Proceedings of the 13 ESSLLI Student Session - Multiple Choices ...
Proceedings of the 13th ESSLLI Student Session
4–15 August 2008, Hamburg, Germany
Kata Balogh (editor)
Copyright © to the authors
Contents
Preface
Martin Avanzini
POP∗ and Semantic Labelling using SAT
Timo Baumann
Simulating Spoken Dialogue With a Focus on Realistic Turn-Taking
Christopher Brumwell
Epistemic Modals in Dialogue
Bert Le Bruyn
Bare predication and kinds
James Burton
Diagrammatic Reasoning with Enhanced Static Constraints
Gemma Celestino
Fictional Contingencies
Michael Franke
Meaning & Inference in Case of Conflict
Michael Hartwig
Towards a New Characterisation of Chomsky’s Hierarchy via Acceptance Probability
Simon Hopp
Distance Effects in Sentence Processing
Pierre Lison
A Salience-driven Approach to Speech Recognition for Human-Robot Interaction
Petar Maksimović – Dragan Doder – Bojan Marinković – Aleksandar Perović
A logic with a conditional probability operator
Scott Martin
A Proof-theoretic Approach to French Pronominal Clitics
Takako Nemoto
Infinite games from an intuitionistic point of view
Ivelina Nikolova
Language Technologies for Instructional Resources in Bulgarian
Yves Peirsman
Word Space Models of Semantic Similarity and Relatedness
Maren Schierloh
Examining the Noticing Function of Output
Andreas Schnabl
Cdiprover3: a Tool for Proving Derivational Complexities of Term Rewriting Systems
Éva Szilágyi
The Rank(s) Of A Totally Lexicalist Syntax
Camilo Thorne
Expressing Conjunctive and Aggregate Queries over Ontologies with Controlled English
Christina Unger – Gianluca Giorgolo
Interrogation in Dynamic Epistemic Logic
Melanie Uth
The Semantic Change of the French -age-Derivation
Grégoire Winterstein
Adversary Implicatures
List of Authors
Preface
This year's Student Session is the thirteenth in the twenty-year history of the annual
European Summer School in Logic, Language and Information. The first edition was
held in Prague in 1996, invented and organized by students, and ever since ESSLLI
has been accompanied by a separate Student Session. The aim of the Student
Session is to give students at all levels (Bachelor, Master, and PhD students) an
opportunity to present and discuss their work in progress and to get
feedback from senior researchers.
As in previous years, the quality of the submissions was high, which made
the selection procedure difficult. This year 17 papers were selected for oral presentation
and 5 for poster presentation from a total of 46 submissions. All the accepted
papers are included in this volume.
I would like to thank the ESSLLI organization, in particular Rineke Verbrugge and
Benedikt Loewe, for their continuous support and for making this session possible. I am grateful
to the StuS Program Committee: the co-chairs Laia Mayol, Manuel Kirschner and
Ji Ruan, for their efforts in coordinating the reviewing process, and the senior area
experts Anke Lüdeling, Paul Egré, Guram Bezhanishvili and Alexander Rabinovich,
for their continuous presence and helpful advice. I also want to thank the anonymous
reviewers, whose detailed comments not only proved invaluable during
the selection procedure, but also provided useful feedback to the authors. Many
thanks to Kluwer Academic Publishers, who offered, as in previous years,
prizes in the “Best Student Paper in the Oral Session” and “Best Student Paper in the
Poster Session” categories.
We are very much looking forward to the 13th ESSLLI Student Session in Hamburg,
and believe that it will again be a very inspiring meeting.
Kata Balogh<br />
Amsterdam, May 2008
POP∗ AND SEMANTIC LABELING USING SAT
Martin Avanzini
University of Innsbruck
Abstract. The polynomial path order (POP∗ for short) is a termination method that induces
polynomial bounds on the innermost runtime complexity of term rewrite systems (TRSs).
Semantic labeling is a transformation technique used for proving termination. In this paper
we propose an efficient implementation of POP∗ together with finite semantic labeling. This
automation works by a reduction to the problem of Boolean satisfiability. Satisfiability of
the resulting formula is checked by a state-of-the-art SAT-solver. We have implemented the
technique and experimental results confirm the feasibility of our approach. By semantic
labeling, we significantly increase the power of POP∗.
Term rewrite systems provide a conceptually simple but powerful abstract model of
computation. In rewriting, proving termination is a long-standing research field, and
consequently termination techniques applicable in an automated setting were introduced
quite early. Earlier research concentrated mainly on direct termination techniques
(TeReSe, 2003). One such technique is the use of recursive path orders (RPOs), for instance
the multiset path order (MPO) (Baader and Nipkow, 1998). Recently, the emphasis
has shifted toward transformation techniques like the dependency pair method (Arts and
Giesl, 2000) or semantic labeling (Zantema, 1995). These methods significantly increase
the possibility to automatically conclude termination.
For direct termination techniques it is often possible to infer upper bounds on the
derivational complexity of a rewrite system R from the termination proof. For instance,
Hofbauer was the first to observe that termination via MPO implies the existence of a
primitive recursive bound on the derivational complexity (Hofbauer, 1992). Here derivational
complexity refers to the function that relates the length of the longest derivation
sequence to the size of the initial term. It is thus quite natural to extend such a termination
analysis of rewrite systems to the analysis of complexity properties. For the study of lower
complexity bounds we recently introduced in (Avanzini and Moser, 2008) the polynomial
path order (POP∗ for short). This order is in essence a miniaturization of MPO, carefully
crafted to induce polynomial bounds on the number of rewrite steps (cf. Theorem 4).
In this work, we show how to increase the power of POP∗ by semantic labeling
(Zantema, 1995). The idea behind semantic labeling is to label the function symbols
of a rewrite system R with semantic information in such a way that direct termination
methods become applicable for the labeled rewrite system Rlab. In order to label R, one
needs to define suitable interpretation and labeling functions for all symbols appearing
in R. Naturally, these functions have to be chosen such that POP∗ is applicable to the
labeled system. To find them automatically, we extend the propositional encoding from
(Avanzini and Moser, 2008). Satisfiability of the constructed formula certifies the existence
of a labeled system Rlab that is compatible with POP∗. Finite semantic labeling is
non-termination preserving and, moreover, it is complexity preserving. Thus from compatibility
of Rlab with POP∗ we conclude that R admits a polynomial runtime complexity
(cf. Lemma 6).
A translation of infinite semantic labeling in conjunction with RPOs has already been
given in (Koprowski and Middeldorp, 2007). Unfortunately, this approach is inapplicable
in our context since the runtime complexity of the original system cannot in general be
related to the runtime complexity of the infinite labeled system. Furthermore, finite semantic
labeling using heuristics is implemented, for instance, in the termination prover TPA
(Koprowski, 2006). We consider the approach presented here favorable, as the choice of a labeling
suitable for the base order can be left to a state-of-the-art SAT-solver.
1 The Polynomial Path Order<br />
We briefly recall the basic concepts of term rewriting; for details, (Baader and Nipkow,
1998) provides a good resource. Let V denote a countably infinite set of variables and F
a signature. The set of terms over F and V is denoted by T(F, V). We write ⊴ for the
subterm relation; the converse is denoted by ⊵ and the strict part of ⊵ by ▷.
A term rewrite system (TRS for short) R over T(F, V) is a set of rewrite rules l → r
such that l, r ∈ T(F, V), l ∉ V and all variables of r also appear in l. In the following, R
will always denote a TRS and in our context, R is finite. A binary relation on T(F, V) is a
rewrite relation if it is compatible with F-operations and closed under substitutions. The
smallest extension of R that is a rewrite relation is denoted by →R. The innermost rewrite
relation $\xrightarrow{i}_R$ is a restriction of →R, where innermost terms have to be reduced first. The
transitive and reflexive closure of a rewrite relation → is denoted by →∗ and we write
s →ⁿ t for the contraction of s to t in n steps. We say that R is (innermost) terminating
if there exists no infinite chain of terms t0, t1, . . . such that ti →R ti+1 (ti $\xrightarrow{i}_R$ ti+1) for
all i ∈ N.
The root symbols of left-hand sides of rewrite rules in R are called defined symbols and
collected in D(R), while all other symbols are called constructor symbols and collected
in C(R). A term f(s1, . . . , sn) is constructor-based with respect to R if f ∈ D(R) and
s1, . . . , sn ∈ T(C(R), V). We write Tcb(R) for the set of all constructor-based terms
over R. If every left-hand side of R is constructor-based then R is called a constructor
TRS. Constructor TRSs allow us to model the computation of functions in a very natural
way. Consider the following TRS:
Example 1 The constructor TRS Rmult is defined by
add(0, y) → y                    mult(0, y) → 0
add(s(x), y) → s(add(x, y))      mult(s(x), y) → add(y, mult(x, y)).
Rmult defines the function symbols add and mult, i.e. D(R) = {add, mult}. Natural
numbers are represented using the constructor symbols from C(R) = {s, 0}. Define the
encoding function ⌜·⌝ : N → T(C(R), ∅) by ⌜0⌝ = 0 and ⌜n + 1⌝ = s(⌜n⌝). Then for
all n, m ∈ N, mult(⌜n⌝, ⌜m⌝) $\xrightarrow{i}{}^{*}_{R}$ ⌜n ∗ m⌝. We say that Rmult computes multiplication
(and addition) on natural numbers. For instance, the system admits the innermost rewrite
sequence mult(s(0), 0) $\xrightarrow{i}$ add(0, mult(0, 0)) $\xrightarrow{i}$ add(0, 0) $\xrightarrow{i}$ 0, computing 1 ∗ 0. Notice
that in the second step we have to reduce the innermost redex mult(0, 0) first.
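The innermost evaluation of Example 1 can be replayed mechanically. The following Python sketch is only an illustration: the tuple representation of terms and the helper names (`encode`, `root_step`, `innermost_step`, `normalize`) are assumptions of this sketch and are not taken from the implementation discussed later. It hard-codes the four rules of Rmult, always contracts the leftmost innermost redex, and counts the rewrite steps.

```python
# Terms are nested tuples: ("0",), ("s", t), ("add", t1, t2), ("mult", t1, t2).
ZERO = ("0",)

def s(t):
    return ("s", t)

def encode(n):
    """Numeral encoding: 0 -> 0, n + 1 -> s(encode(n))."""
    return ZERO if n == 0 else s(encode(n - 1))

def decode(t):
    """Inverse of encode, defined on numerals only."""
    n = 0
    while t != ZERO:
        assert t[0] == "s"
        t, n = t[1], n + 1
    return n

def root_step(t):
    """Apply a rule of R_mult at the root, or return None if none matches."""
    if t[0] == "add":
        x, y = t[1], t[2]
        if x == ZERO:
            return y                              # add(0, y) -> y
        if x[0] == "s":
            return s(("add", x[1], y))            # add(s(x), y) -> s(add(x, y))
    if t[0] == "mult":
        x, y = t[1], t[2]
        if x == ZERO:
            return ZERO                           # mult(0, y) -> 0
        if x[0] == "s":
            return ("add", y, ("mult", x[1], y))  # mult(s(x), y) -> add(y, mult(x, y))
    return None

def innermost_step(t):
    """One innermost step: arguments are reduced before the root is tried."""
    for i in range(1, len(t)):
        reduced = innermost_step(t[i])
        if reduced is not None:
            return t[:i] + (reduced,) + t[i + 1:]
    return root_step(t)

def normalize(t):
    """Rewrite t to normal form; return (normal form, number of steps)."""
    steps = 0
    while True:
        nxt = innermost_step(t)
        if nxt is None:
            return t, steps
        t, steps = nxt, steps + 1

# The sequence from Example 1: mult(s(0), 0) reaches 0 in three steps.
print(normalize(("mult", encode(1), encode(0))))
```

Counting steps in this way for growing inputs gives an empirical impression of the number of rewrite steps that the complexity analysis below bounds.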
In (Lescanne, 1995) it is proposed to conceive the complexity of a rewrite system
R as the complexity of the functions computed by R. Whereas this view falls into the
realm of implicit complexity analysis, we conceive rewriting under R as the evaluation
mechanism of the encoded function. Thus it is natural to define the runtime complexity
based on the number of rewrite steps admitted by R. Let |s| denote the size of a term
s. The (innermost) runtime complexity of a terminating rewrite system R is defined by
$$\mathrm{Dl}_R(m) = \max\{\, n \mid \exists s, t.\ s \xrightarrow{i}{}^{n}\, t,\ s \in T_{cb}(R) \text{ and } |s| \leq m \,\}.$$
To verify whether the runtime complexity of a rewrite system R is polynomially
bounded, we employ the polynomial path order. Similar to the recursion-theoretic characterization
of the polytime functions given in (Bellantoni and Cook, 1992), POP∗ relies
on the separation of safe and normal inputs. For this, the notion of safe mappings is introduced.
A safe mapping safe associates with every n-ary function symbol f the set of
safe argument positions. If f ∈ D(R) then safe(f) ⊆ {1, . . . , n}; for f ∈ C(R) we fix
safe(f) = {1, . . . , n}. The argument positions not included in safe(f) are called normal
and denoted by nrm(f). A precedence is an irreflexive and transitive order on F. The
polynomial path order >pop∗ is an extension of the auxiliary order >pop, both defined in
the following definitions:
Definition 2 Let > be a precedence and safe a safe mapping. We define the order >pop
inductively as follows: s = f(s1, . . . , sn) >pop t if one of the following alternatives holds:
1. f ∈ C(R) and si ≥pop t for some i ∈ {1, . . . , n}, or
2. si ≥pop t for some i ∈ nrm(f), or
3. t = g(t1, . . . , tm) with f ∈ D(R) and f > g and s >pop ti for all 1 ≤ i ≤ m.
Definition 3 Let > be a precedence and safe a safe mapping. We define the polynomial
path order >pop∗ inductively as follows: s = f(s1, . . . , sn) >pop∗ t if either
1. s >pop t, or
2. si ≥pop∗ t for some i ∈ {1, . . . , n}, or
3. t = g(t1, . . . , tm), with f ∈ D(R), f > g, and the following properties hold:
• s >pop∗ ti0 for some i0 ∈ safe(g) and
• either s >pop ti or s ▷ ti and i ∈ safe(g) for all i ≠ i0, or
4. t = f(t1, . . . , tm) and for nrm(f) = {i1, . . . , ip}, safe(f) = {j1, . . . , jq} both
[si1, . . . , sip] (>pop∗)mul [ti1, . . . , tip] and [sj1, . . . , sjq] (≥pop∗)mul [tj1, . . . , tjq] hold.
Here ≥pop∗ (≥pop) denotes the reflexive closure of >pop∗ (>pop) and (>pop∗)mul the multiset
extension of >pop∗. When R ⊆ >pop∗ holds, we say that >pop∗ is compatible with R.
The main theorem from (Avanzini and Moser, 2008) states:
Theorem 4 Let R be a finite constructor TRS compatible with >pop∗, i.e., R ⊆ >pop∗.
Then the runtime complexity of R is polynomial. The polynomial depends only on the
cardinality of F and the sizes of the right-hand sides in R.
We conclude this section by demonstrating the application of POP∗ on the TRS Rmult:
Example 5 Reconsider the rewrite system Rmult from Example 1. We suppose that the
second argument of addition (add) is safe (safe(add) = {2}) and that all arguments of
multiplication (mult) are normal (safe(mult) = ∅). Furthermore let the precedence >
be defined as mult > add > s. Then Rmult is compatible with >pop∗. As a consequence
of Theorem 4, the number of rewrite steps starting from mult(⌜n⌝, ⌜m⌝) is polynomially
bounded in n and m.
In order to verify compatibility for this particular instance of >pop∗ we need to show that
all the rules in Rmult are strictly decreasing with respect to >pop∗, that is, l >pop∗ r holds
for l → r ∈ Rmult. To exemplify this, consider the rule add(s(x), y) → s(add(x, y)).
We write ⟨i⟩ for the i-th case of Definition 3. From s(x) >pop∗ x by rule ⟨2⟩ we infer
[s(x)] (>pop∗)mul [x]. Furthermore [y] (≥pop∗)mul [y] holds and thus by rule ⟨4⟩ we obtain
add(s(x), y) >pop∗ add(x, y). Finally, from this and add > s we conclude by one application
of rule ⟨3⟩ that add(s(x), y) >pop∗ s(add(x, y)) holds.
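The compatibility claim of Example 5 can also be checked mechanically by transcribing Definitions 2 and 3 directly. The Python sketch below is a naive recursive checker written for illustration only; it is not the SAT-based procedure developed in this paper. Its assumptions: variables are strings, other terms are tuples, the precedence is given as an explicit (transitively closed) set of pairs, and the multiset extension is computed by cancelling common elements, which presupposes that the compared relation is a strict partial order. All names are hypothetical.

```python
def is_var(t):
    return isinstance(t, str)

def proper_subterm(s, t):
    """s ▷ t: t occurs strictly below the root of s."""
    return not is_var(s) and any(a == t or proper_subterm(a, t) for a in s[1:])

def mul_ext(gt, ms, ns):
    """Return (M >=mul N, M >mul N) for the multiset extension of gt."""
    ms, ns = list(ms), list(ns)
    for x in list(ms):                      # cancel elements common to both
        if x in ns:
            ms.remove(x)
            ns.remove(x)
    geq = all(any(gt(m, n) for m in ms) for n in ns)
    return geq, geq and bool(ms)

class PolynomialPathOrder:
    def __init__(self, defined, prec, safe):
        # defined symbols, pairs (f, g) with f > g, safe positions per symbol
        self.defined, self.prec, self.safe = defined, prec, safe

    def safe_pos(self, f, arity):
        if f not in self.defined:           # constructors: all positions safe
            return set(range(1, arity + 1))
        return self.safe.get(f, set())

    def split(self, t):
        """Return (normal arguments, safe arguments) of a non-variable term."""
        safe = self.safe_pos(t[0], len(t) - 1)
        nrm = [a for i, a in enumerate(t[1:], 1) if i not in safe]
        saf = [a for i, a in enumerate(t[1:], 1) if i in safe]
        return nrm, saf

    def gt_pop(self, s, t):                 # Definition 2
        if is_var(s):
            return False
        f = s[0]
        candidates = s[1:] if f not in self.defined else self.split(s)[0]
        if any(si == t or self.gt_pop(si, t) for si in candidates):  # cases 1, 2
            return True
        return (not is_var(t) and f in self.defined and (f, t[0]) in self.prec
                and all(self.gt_pop(s, ti) for ti in t[1:]))         # case 3

    def gt(self, s, t):                     # Definition 3
        if is_var(s):
            return False
        if self.gt_pop(s, t):                                        # case 1
            return True
        if any(si == t or self.gt(si, t) for si in s[1:]):           # case 2
            return True
        if is_var(t):
            return False
        f, g = s[0], t[0]
        if f in self.defined and (f, g) in self.prec:                # case 3
            safe_g = self.safe_pos(g, len(t) - 1)
            for i0 in sorted(safe_g):
                if self.gt(s, t[i0]) and all(
                        self.gt_pop(s, ti) or (i in safe_g and proper_subterm(s, ti))
                        for i, ti in enumerate(t[1:], 1) if i != i0):
                    return True
        if f == g and len(s) == len(t):                              # case 4
            (nrm_s, saf_s), (nrm_t, saf_t) = self.split(s), self.split(t)
            return (mul_ext(self.gt, nrm_s, nrm_t)[1]
                    and mul_ext(self.gt, saf_s, saf_t)[0])
        return False

R_MULT = [
    (("add", ("0",), "y"), "y"),
    (("add", ("s", "x"), "y"), ("s", ("add", "x", "y"))),
    (("mult", ("0",), "y"), ("0",)),
    (("mult", ("s", "x"), "y"), ("add", "y", ("mult", "x", "y"))),
]
PREC = {("mult", "add"), ("add", "s"), ("mult", "s")}   # mult > add > s
ORDER = PolynomialPathOrder({"add", "mult"}, PREC, {"add": {2}, "mult": set()})
COMPATIBLE = all(ORDER.gt(l, r) for l, r in R_MULT)
print(COMPATIBLE)
```

Such a checker only tests one given precedence and safe mapping; finding suitable ones is exactly the search problem that the propositional encoding delegates to a SAT-solver.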
2 A Propositional Encoding of POP∗ with Finite Semantic Labeling
In (Zantema, 1995) the transformation technique semantic labeling is introduced. From
R a labeled TRS Rlab is obtained by labeling the function symbols in R with semantic
information. Semantics are given to R by defining a model. A model is an F-algebra
A, i.e. a carrier A equipped with operations fA : Aⁿ → A for every n-ary symbol
f ∈ F, such that for every rule l → r ∈ R and any assignment α : V → A, the equality
[α]A(l) = [α]A(r) holds. Here [α]A(t) denotes the interpretation of t with assignment α,
inductively defined by [α]A(t) = α(t) if t ∈ V and [α]A(t) = fA([α]A(t1), . . . , [α]A(tn))
if t = f(t1, . . . , tn). The system is then labeled according to a labeling ℓ for A, i.e. a set
of mappings ℓf : Aⁿ → A for every n-ary function symbol f ∈ F.¹
For every assignment α, the mapping labα(t) is defined by labα(t) = t if t ∈ V and
labα(f(t1, . . . , tn)) = fa(labα(t1), . . . , labα(tn)) where a = ℓf([α]A(t1), . . . , [α]A(tn)).
The labeled TRS Rlab is obtained by labeling all rules for all assignments α, that is
Rlab = {labα(l) → labα(r) | l → r ∈ R and assignment α}.
The main theorem from (Zantema, 1995) states that Rlab is terminating if and only if R
is terminating. In the following, we restrict to algebras B with carrier B = {true, false};
however, the approach is extensible to arbitrary finite carriers.
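To make the construction of Rlab concrete, the following Python sketch labels Rmult over the Boolean carrier. The model and the labeling are hand-picked assumptions for illustration (0 is interpreted as false, s as constantly true, add as disjunction, mult as conjunction, and each symbol is labeled by the value of its first argument); they are not the ones a SAT-solver would necessarily find, and all function names are hypothetical.

```python
from itertools import product

R_MULT = [
    (("add", ("0",), "y"), "y"),
    (("add", ("s", "x"), "y"), ("s", ("add", "x", "y"))),
    (("mult", ("0",), "y"), ("0",)),
    (("mult", ("s", "x"), "y"), ("add", "y", ("mult", "x", "y"))),
]

# Assumed Boolean model: a numeral is interpreted as "is non-zero".
MODEL = {
    "0": lambda: False,
    "s": lambda a: True,
    "add": lambda a, b: a or b,
    "mult": lambda a, b: a and b,
}
# Assumed labeling: label by the interpretation of the first argument.
LABELING = {
    "0": lambda: False,
    "s": lambda a: a,
    "add": lambda a, b: a,
    "mult": lambda a, b: a,
}

def variables(t):
    if isinstance(t, str):
        return {t}
    return set().union(*(variables(a) for a in t[1:]))

def assignments(rule):
    """All assignments of Booleans to the variables occurring in a rule."""
    vs = sorted(variables(rule[0]) | variables(rule[1]))
    for values in product([False, True], repeat=len(vs)):
        yield dict(zip(vs, values))

def interpret(t, alpha, model):
    """[alpha](t): evaluate t in the algebra under assignment alpha."""
    if isinstance(t, str):
        return alpha[t]
    return model[t[0]](*(interpret(a, alpha, model) for a in t[1:]))

def label(t, alpha, model, labeling):
    """lab_alpha(t): attach to each symbol the label computed from its arguments."""
    if isinstance(t, str):
        return t
    a = labeling[t[0]](*(interpret(x, alpha, model) for x in t[1:]))
    return ((t[0], a),) + tuple(label(x, alpha, model, labeling) for x in t[1:])

def is_model(rules, model):
    """Model condition: both sides of every rule have equal interpretations."""
    return all(interpret(l, al, model) == interpret(r, al, model)
               for l, r in rules for al in assignments((l, r)))

def labeled_system(rules, model, labeling):
    return {(label(l, al, model, labeling), label(r, al, model, labeling))
            for l, r in rules for al in assignments((l, r))}

assert is_model(R_MULT, MODEL)
for lhs, rhs in sorted(labeled_system(R_MULT, MODEL, LABELING), key=repr):
    print(lhs, "->", rhs)
```

Because the carrier is finite, every rule gives rise to finitely many labeled copies, which is what makes an exhaustive propositional encoding over all assignments feasible.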
To encode a Boolean function b : Bⁿ → B, we make use of unique propositional atoms
bw for every sequence of arguments w = w1, . . . , wn ∈ Bⁿ. The atom bw will denote
the result of applying b to w1, . . . , wn. Let a1, . . . , an be propositional formulas. To
impose restrictions on the encoded function b, we introduce the formula ⌜b⌝(a1, . . . , an)
such that for a satisfying assignment ν the equality ν(⌜b⌝(a1, . . . , an)) = bν(a1),...,ν(an)
holds. For instance with ⌜b⌝(a1, a2) ↔ r we assert that the encoded function b satisfies
b(ν(a1), ν(a2)) = ν(r).
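The text does not spell out how ⌜b⌝(a1, . . . , an) is built. One possible construction, assumed here purely for illustration, is a disjunction over all argument tuples w that selects the atom bw whenever the arguments evaluate to w. The Python sketch below represents formulas as nested tuples (a hypothetical representation), builds this disjunction, and checks by brute force that its value under every valuation ν equals the value of the atom indexed by (ν(a1), . . . , ν(an)).

```python
from itertools import product

def atom(name):
    return ("atom", name)

def encode_function(b, args):
    """Formula for ⌜b⌝(a1, ..., an): a case distinction over all tuples w."""
    cases = []
    for w in product([False, True], repeat=len(args)):
        # The literals are jointly true exactly when the arguments evaluate to w.
        literals = [a if wi else ("not", a) for a, wi in zip(args, w)]
        cases.append(("and", *literals, atom((b, w))))
    return ("or", *cases)

def evaluate(f, nu):
    """Evaluate a formula under the valuation nu (a dict from atom names to bools)."""
    op = f[0]
    if op == "atom":
        return nu[f[1]]
    if op == "not":
        return not evaluate(f[1], nu)
    if op == "and":
        return all(evaluate(g, nu) for g in f[1:])
    if op == "or":
        return any(evaluate(g, nu) for g in f[1:])
    raise ValueError(op)

def check(b, args, names):
    """Check nu(⌜b⌝(a1, ..., an)) = nu(b_w), w = (nu(a1), ..., nu(an)), for all nu."""
    atoms = list(names) + [(b, w) for w in product([False, True], repeat=len(args))]
    formula = encode_function(b, args)
    for values in product([False, True], repeat=len(atoms)):
        nu = dict(zip(atoms, values))
        w = tuple(evaluate(a, nu) for a in args)
        if evaluate(formula, nu) != nu[(b, w)]:
            return False
    return True

print(check("b", [atom("p"), ("not", atom("q"))], ["p", "q"]))
```

The size of this formula is exponential in the arity n, which is unproblematic for the small arities typical of the rewrite systems considered here.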
For every assignment α : V → A and term t appearing in R we introduce the atom
intα,t and, for t ∉ V, the atom labα,t. The meaning of intα,t will be the result of [α]B(t), and labα,t
will denote the label of the root symbol of t under α. In order to ensure this for t =
f(t1, . . . , tn) and a particular assignment α, we define
$$\mathrm{INT}_\alpha(t) = \mathrm{int}_{\alpha,t} \leftrightarrow \ulcorner f_{\mathcal{B}} \urcorner(\mathrm{int}_{\alpha,t_1}, \dots, \mathrm{int}_{\alpha,t_n}), \text{ and}$$
$$\mathrm{LAB}_\alpha(t) = \mathrm{lab}_{\alpha,t} \leftrightarrow \ulcorner \ell_f \urcorner(\mathrm{int}_{\alpha,t_1}, \dots, \mathrm{int}_{\alpha,t_n}).$$
Furthermore for t ∈ V we set INTα(t) = intα,t ↔ α(t). We extend ⊵ to TRSs as follows:
R ⊵ t if l ⊵ t or r ⊵ t for some rule l → r ∈ R. Besides the model condition, the above
constraints have to be enforced for every term appearing in R. This is covered by
$$\mathrm{LAB}(R) = \bigwedge_{\alpha} \Bigl( \bigwedge_{R \trianglerighteq t} \bigl(\mathrm{INT}_\alpha(t) \wedge \mathrm{LAB}_\alpha(t)\bigr) \wedge \bigwedge_{l \to r \in R} \bigl(\mathrm{int}_{\alpha,l} \leftrightarrow \mathrm{int}_{\alpha,r}\bigr) \Bigr).$$
¹ The definition from (Zantema, 1995) allows the labeling of a subset of F and leaves other symbols
unchanged. In our context, this has no consequence and simplifies the translation.
Assume ν is a satisfying assignment for LAB(R) and Rlab denotes the system obtained by
labeling R according to the encoded labeling and model. In order to show compatibility
of Rlab with POP∗, we need to find a precedence > and safe mapping safe such that
Rlab ⊆ >pop∗ holds for the induced order >pop∗. To compare the labeled versions of two
concrete terms s, t ∈ T(F, V) under a particular assignment α, we define
$$\ulcorner s >_{\mathsf{pop}*} t \urcorner_\alpha = \ulcorner s >^{(1)}_{\mathsf{pop}*} t \urcorner_\alpha \vee \ulcorner s >^{(2)}_{\mathsf{pop}*} t \urcorner_\alpha \vee \ulcorner s >^{(3)}_{\mathsf{pop}*} t \urcorner_\alpha \vee \ulcorner s >^{(4)}_{\mathsf{pop}*} t \urcorner_\alpha.$$
Here $\ulcorner s >^{(i)}_{\mathsf{pop}*} t \urcorner_\alpha$ refers to the encoding of case ⟨i⟩ from Definition 3. We discuss
the cases ⟨2⟩ – ⟨4⟩; case ⟨1⟩, the comparison using the weaker order >pop, is obtained
similarly.
Note that si = t implies labα(si) = labα(t). Thus case ⟨2⟩ is perfectly captured
by $\ulcorner f(s_1, \dots, s_n) >^{(2)}_{\mathsf{pop}*} t \urcorner_\alpha = \top$ ² if si = t holds for some si. Otherwise, we define
$$\ulcorner f(s_1, \dots, s_n) >^{(2)}_{\mathsf{pop}*} t \urcorner_\alpha = \bigvee_{i=1}^{n} \ulcorner s_i >_{\mathsf{pop}*} t \urcorner_\alpha.$$
For f ∈ F and a formula a representing
the label, the formula SF(fa, i) (NRM(fa, i)) asserts that, depending on the valuation of
a, the i-th position of ftrue or ffalse is safe (normal). Likewise, for f, g ∈ F, the formula
⌜fa > gb⌝ is defined such that for a satisfying assignment ν, fν(a) > gν(b) is asserted.
Assume the unlabeled symbol f is a defined symbol of R. We define for f ≠ g
$$\ulcorner f(s_1, \dots, s_n) >^{(3)}_{\mathsf{pop}*} g(t_1, \dots, t_m) \urcorner_\alpha = \ulcorner f_{\mathrm{lab}_{\alpha,s}} > g_{\mathrm{lab}_{\alpha,t}} \urcorner \wedge \bigvee_{i_0=1}^{m} \Bigl( \ulcorner s >_{\mathsf{pop}*} t_{i_0} \urcorner_\alpha \wedge \mathrm{SF}(g_{\mathrm{lab}_{\alpha,t}}, i_0) \wedge \bigwedge_{i=1, i \neq i_0}^{m} \bigl( \ulcorner s >^{(1)}_{\mathsf{pop}*} t_i \urcorner_\alpha \vee ( \mathrm{SF}(g_{\mathrm{lab}_{\alpha,t}}, i) \wedge \ulcorner s \rhd t_i \urcorner ) \bigr) \Bigr).$$
Here we employ that the superterm property ▷ is closed under labeling. Additionally
we add the rule fa(x1, . . . , xn) → c with c a fresh constant to the labeled system and
require fa > c in the precedence. This guarantees that fa is defined with respect to
Rlab, as otherwise case ⟨3⟩ is not applicable. Alternatively one could encode whether fa is
defined and adapt the encoding of case ⟨3⟩ accordingly, but experimental findings indicate
that the described approach is favorable.
² We use ⊤ and ⊥ to denote truth and falsity in propositional formulas.
To encode multiset comparisons, we make use of multiset covers (Schneider-Kamp,
Thiemann, Annov, Codish and Giesl, 2007). A multiset cover is a pair of total mappings
γ : {1, . . . , n} → {1, . . . , n} and ε : {1, . . . , n} → B, encoded using fresh atoms γi,j and
εi. The underlying idea is that for the comparison [s1, . . . , sn] (≥pop∗)mul [t1, . . . , tn] to
hold, every term tj has to be covered by some term si (encoded as γi,j = true), either by
si = tj (εi = true) or si >pop∗ tj (εi = false). For the case si = tj, si must not cover
any element besides tj. To assert a correct encoding of (γ, ε), we introduce the formula
⌜(γ, ε)⌝. By means of multiset covers we are able to encode case ⟨4⟩ using one multiset
comparison. We define
$$\ulcorner f(s_1, \dots, s_n) >^{(4)}_{\mathsf{pop}*} f(t_1, \dots, t_n) \urcorner_\alpha = (\mathrm{lab}_{\alpha,s} \leftrightarrow \mathrm{lab}_{\alpha,t}) \wedge \ulcorner (\gamma, \varepsilon) \urcorner \wedge \bigvee_{i=1}^{n} \bigl( \mathrm{NRM}(f_{\mathrm{lab}_{\alpha,s}}, i) \wedge \neg \varepsilon_i \bigr) \wedge \bigwedge_{i=1}^{n} \bigwedge_{j=1}^{n} \Bigl( \gamma_{i,j} \rightarrow \bigl( (\mathrm{SF}(f_{\mathrm{lab}_{\alpha,s}}, i) \leftrightarrow \mathrm{SF}(f_{\mathrm{lab}_{\alpha,t}}, j)) \wedge (\varepsilon_i \rightarrow \ulcorner s_i = t_j \urcorner) \wedge (\neg \varepsilon_i \rightarrow \ulcorner s_i >_{\mathsf{pop}*} t_j \urcorner_\alpha) \bigr) \Bigr)$$
where we restrict comparisons of arguments by their kind. Assuming STRICT(R) and
SMSL(R) cover the restrictions on the precedence and safe mapping, satisfiability of
$$\mathrm{POP}^{*}_{SL}(R) = \bigwedge_{\alpha} \bigwedge_{l \to r \in R} \ulcorner l >_{\mathsf{pop}*} r \urcorner_\alpha \wedge \mathrm{SM}_{SL}(R) \wedge \mathrm{STRICT}(R) \wedge \mathrm{LAB}(R)$$
certifies the existence of a model B and labeling ℓ such that the rewrite system
$$R'_{lab} = R_{lab} \cup \{ f_a(x_1, \dots, x_n) \to c \mid f \in D(R) \text{ and } f_a \in C(R_{lab}) \}$$
is compatible with >pop∗. Since every rewrite sequence in R translates to a sequence in
Rlab, by Theorem 4 it is an easy exercise to prove the following lemma:
Lemma 6 Let R be a finite constructor TRS and assume POP∗SL(R) is satisfiable. Then
the induced runtime complexity is polynomial.
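The multiset covers used for case ⟨4⟩ above can be illustrated without a SAT-solver by enumerating all candidate pairs (γ, ε). The Python sketch below is a brute-force illustration under stated assumptions: γ is represented as a tuple mapping the index j of tj to the index i of the covering si, indices start at 0, the base order is passed in as an arbitrary function `gt`, and the strict comparison additionally requires some εi to be false, mirroring the disjunct ¬εi in the encoding (the restriction to normal positions is left out). The function names are hypothetical.

```python
from itertools import product

def covers(ss, ts, gt):
    """Yield all multiset covers (gamma, eps) witnessing [ss] >=mul [ts]."""
    n = len(ss)
    assert n == len(ts)
    for gamma in product(range(n), repeat=n):          # gamma[j] = i: s_i covers t_j
        for eps in product([False, True], repeat=n):   # eps[i]: s_i covers by equality
            ok = True
            for j, i in enumerate(gamma):
                if eps[i]:
                    # Equality case: s_i equals t_j and covers nothing else.
                    ok = ok and ss[i] == ts[j] and gamma.count(i) == 1
                else:
                    # Strict case: s_i is greater than every element it covers.
                    ok = ok and gt(ss[i], ts[j])
            if ok:
                yield gamma, eps

def geq_mul(ss, ts, gt):
    """[ss] >=mul [ts]: some multiset cover exists."""
    return any(True for _ in covers(ss, ts, gt))

def gt_mul(ss, ts, gt):
    """[ss] >mul [ts]: some cover uses at least one strict comparison."""
    return any(not all(eps) for _, eps in covers(ss, ts, gt))

def greater(a, b):
    return a > b

print(gt_mul([3, 1], [2, 1], greater))   # 3 covers 2 strictly, 1 covers 1 by equality
print(gt_mul([2, 1], [1, 2], greater))   # equal multisets admit no strict cover
```

In the propositional encoding the same conditions are stated over the atoms γi,j and εi, so that the SAT-solver searches for the cover instead of enumerating all of them.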
3 Experimental Results<br />
We implemented <strong>the</strong> encoding <strong>of</strong> POP ∗ with semantic labeling (denoted by POP ∗ SL )<br />
in OCaml and compare it to <strong>the</strong> implementation without labeling from (Avanzini and<br />
Moser, 2008) (denoted by POP ∗ ) and an implementation <strong>of</strong> a restricted class <strong>of</strong> polynomial<br />
interpretations (denoted by SMC). To check satisfiability <strong>of</strong> <strong>the</strong> obtained formulas<br />
we employ <strong>the</strong> MiniSat SAT-solver (Eén and Sörensson, 2003).<br />
SMC refers to a restricted class of polynomial interpretations: every constructor symbol<br />
is interpreted by a strongly linear polynomial, i.e. a polynomial of shape<br />
$P(x_1, \dots, x_n) = \sum_{i=1}^{n} x_i + c$ with $c \in \mathbb{N}$, $c \geq 1$. Furthermore, each defined symbol is interpreted by a<br />
simple-mixed polynomial<br />
$P(x_1, \dots, x_n) = \sum_{i_j \in \{0,1\}} a_{i_1 \dots i_n} x_1^{i_1} \cdots x_n^{i_n} + \sum_{i=1}^{n} b_i x_i^2$ with coefficients<br />
in $\mathbb{N}$. For this class of polynomial interpretations it is trivial to check that they<br />
induce polynomial bounds on the runtime complexity. To find these interpretations automatically<br />
we employ cdiprover3 (Moser and Schnabl, 2008).<br />
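To make the two polynomial shapes concrete, the following Python sketch evaluates a strongly linear and a simple-mixed polynomial at given natural numbers. The function names and the coefficient encoding (a dictionary from exponent tuples in {0,1}^n to coefficients) are illustrative assumptions and are not taken from the SMC implementation or from cdiprover3.<br />

```python
from itertools import product


def strongly_linear(args, c):
    """P(x1..xn) = x1 + ... + xn + c with c >= 1 (shape used for constructors)."""
    if c < 1:
        raise ValueError("the constant c must be at least 1")
    return sum(args) + c


def simple_mixed(args, a, b):
    """P(x1..xn) = sum over i_j in {0,1} of a[i1..in] * x1^i1 * ... * xn^in
                   + sum over i of b[i] * xi^2   (shape used for defined symbols).

    `a` maps exponent tuples in {0,1}^n to natural-number coefficients
    (missing tuples count as 0); `b` lists the coefficients of the squares.
    Both encodings are assumptions made for this sketch.
    """
    total = 0
    for exps in product((0, 1), repeat=len(args)):
        term = a.get(exps, 0)
        for x, e in zip(args, exps):
            if e:
                term *= x
        total += term
    return total + sum(bi * x * x for bi, x in zip(b, args))


# Hypothetical example: P(x, y) = x*y + 2*x + 1 + y^2 evaluated at (2, 3).
value = simple_mixed([2, 3], {(1, 1): 1, (1, 0): 2, (0, 0): 1}, [0, 1])
```

Since every monomial has degree at most two, interpretations of this shape stay easy to inspect by hand.<br />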
The table below presents experimental results based on two testbeds. Testbed T consists<br />
of the 957 examples from the Termination Problem Database 4.0³ (TPDB) that were<br />
automatically verified terminating in the competition of 2007⁴. Testbed C is a restriction<br />
of T where only constructor TRSs have been considered (449 in total). Experimental<br />
results, performed on a PC with 512 MB of RAM and a 2.4 GHz Intel® Pentium™ IV<br />
processor, are collected in Table 1⁵.<br />
Table 1: Experimental results on TPDB 4.0.<br />
Result | POP∗ (T / C) | POP∗_SL (T / C) | SMC (T / C)<br />
Yes | 65 / 41 | 128 / 74 | 156 / 83<br />
Maybe | 892 / 408 | 800 / 370 | 495 / 271<br />
Timeout (60 sec.) | 0 / 0 | 29 / 5 | 306 / 95<br />
Average Time Yes (sec.) | 0.037 | 0.130 | 0.183<br />
The results confirm that semantic labeling significantly increases the power of POP∗,<br />
yielding results comparable to SMC. Notably, the union of the yes-instances<br />
of the three methods consists of 218 examples for testbed T and 112 for testbed C. For<br />
these 112 out of 449 constructor TRSs we are able to conclude a polynomial runtime<br />
complexity. Interestingly, POP∗_SL and SMC succeed on quite different ranges of systems.<br />
There are 29 constructor TRSs that only POP∗_SL can deal with, whereas 38 constructor<br />
yes-instances of SMC cannot be handled by POP∗_SL. Table 1 shows that SMC runs into<br />
a timeout on 306 of the 957 systems in T and on 95 of the 449 systems in C, i.e. on roughly<br />
every third and every fifth system, respectively. This indicates that purely<br />
semantic methods similar to SMC tend to become impractical when the size of the input system<br />
increases. Compared to this, the number of timeouts of POP∗_SL is rather low, confirming<br />
the feasibility of our new approach.<br />
We perform various optimizations in our implementation. First of all, the constraint<br />
formula can be reduced during construction. In combination with this, it is usually beneficial<br />
to construct the formula lazily. For example, $\ulcorner f(s_1, \dots, s_n) >^{(2)}_{\mathsf{pop}*} s_i \urcorner_{\alpha}$ reduces to ⊤,<br />
and thus one can directly conclude $\ulcorner f(s_1, \dots, s_n) >_{\mathsf{pop}*} s_i \urcorner_{\alpha} = \top$ without constructing<br />
encodings for the other cases. Furthermore, $s >_{\mathsf{pop}*} t$ is doomed to failure if t contains<br />
variables not appearing in s; in this case we replace the constraint by ⊥. SAT-solvers<br />
expect their input in CNF, and a naive conversion is worst-case exponential in size. We employ the transformation<br />
proposed in (Plaisted and Greenbaum, 1986) to obtain an equisatisfiable CNF linear in size.<br />
This approach is analogous to Tseitin’s transformation (Tseitin, 1968) but additionally<br />
takes the polarity of subformulas into account, usually resulting in shorter transformations.<br />
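The idea behind a polarity-aware definitional CNF can be sketched in a few lines of Python. This is a minimal illustration and not the OCaml implementation used in the paper: the tuple-based formula representation, the function names and the brute-force satisfiability check are assumptions made for this example. Each compound subformula receives a fresh variable p, and only the implication direction needed for the polarity of its occurrence is emitted.<br />

```python
from itertools import product


def pg_cnf(formula):
    """Polarity-aware definitional CNF for formulas built from
    ("var", name), ("not", f), ("and", f, g) and ("or", f, g).
    Returns (clauses, num_vars); literals are non-zero ints as in DIMACS."""
    clauses, ids, counter = [], {}, [0]

    def fresh():
        counter[0] += 1
        return counter[0]

    def lit(f, polarity):
        kind = f[0]
        if kind == "var":
            if f[1] not in ids:
                ids[f[1]] = fresh()
            return ids[f[1]]
        if kind == "not":
            return -lit(f[1], -polarity)  # negation flips the polarity
        if kind not in ("and", "or"):
            raise ValueError("unknown connective: %r" % (kind,))
        a, b = lit(f[1], polarity), lit(f[2], polarity)
        p = fresh()  # fresh variable naming this occurrence
        if kind == "and" and polarity > 0:   # p -> (a and b)
            clauses.extend([[-p, a], [-p, b]])
        elif kind == "and":                  # (a and b) -> p
            clauses.append([-a, -b, p])
        elif polarity > 0:                   # p -> (a or b)
            clauses.append([-p, a, b])
        else:                                # (a or b) -> p
            clauses.extend([[-a, p], [-b, p]])
        return p

    root = lit(formula, +1)
    clauses.append([root])  # the whole formula must hold
    return clauses, counter[0]


def satisfiable(clauses, num_vars):
    """Brute-force check, only meant for tiny examples."""
    for bits in product((False, True), repeat=num_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            return True
    return False
```

A full Tseitin-style encoding would emit both implication directions for every subformula; dropping the unused direction is what makes the resulting clause set shorter while keeping it equisatisfiable.<br />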
³ Available at http://www.lri.fr/~marche/tpdb.<br />
⁴ Cf. http://www.lri.fr/~marche/termination-competition/2007/.<br />
⁵ Detailed results available at http://homepage.uibk.ac.at/~csae2496/esslli08.<br />
4 Conclusion<br />
In this paper we have shown how to automatically verify polynomial runtime complexities<br />
of rewrite systems. For that we employ semantic labeling and the polynomial path order<br />
POP∗. Our automation works by a reduction to SAT and by employing a state-of-the-art<br />
SAT-solver. To the best of our knowledge, this is the first SAT encoding of recursive path orders<br />
with finite semantic labeling. The experimental results confirm the feasibility of our approach.<br />
Moreover, they demonstrate that by semantic labeling we significantly increase<br />
the power of POP∗.<br />
Our research also seems comparable to (Bonfante, Marion and Péchoux, 2007), where<br />
recursive path orders together with strongly linear polynomial quasi-interpretations are<br />
employed in the complexity analysis. However, this method relies on caching techniques<br />
to achieve polytime computability. In contrast, we only demand an eager evaluation<br />
strategy.<br />
In future work we will strengthen the applicability of our methods. We are currently investigating<br />
the integration of POP∗ into the dependency pair framework for an automatic<br />
complexity analysis as proposed in (Hirokawa and Moser, 2008). As this framework allows<br />
<strong>the</strong> use <strong>of</strong> argument filterings (Kusakari, Nakamura and Toyama, 1999) and usable<br />
rules (Arts and Giesl, 2000), we expect a significant increase in <strong>the</strong> ability to automatically<br />
verify polynomial runtime complexities.<br />
Finally we want to mention another exciting field of application. There is long-standing interest<br />
in the functional programming community in automatically verifying complexity properties<br />
of programs. For brevity we just mention (Rosendahl, 1989; Anderson, Khoo,<br />
Andrei and Luca, 2005; Bonfante et al., 2007). Rewriting naturally models <strong>the</strong> evaluation<br />
<strong>of</strong> functional programs, and termination behavior <strong>of</strong> functional programs via transformations<br />
to rewrite systems has been extensively studied. For instance, one recent approach is<br />
described in (Giesl, Swiderski, Schneider-Kamp and Thiemann, 2006) where Haskell programs<br />
are covered. In joint work with Hirokawa, Middeldorp and Moser (Avanzini, Hirokawa,<br />
Middeldorp and Moser, 2007) we propose a translation from (a subset of higher-order)<br />
Scheme programs to term rewrite systems. The transformation is designed to be<br />
complexity preserving and thus allows <strong>the</strong> study <strong>of</strong> <strong>the</strong> complexity <strong>of</strong> a Scheme program<br />
P by <strong>the</strong> analysis <strong>of</strong> <strong>the</strong> transformed rewrite system R. Hence from compatibility <strong>of</strong> R<br />
with POP ∗ we can directly conclude that <strong>the</strong> number <strong>of</strong> evaluation steps <strong>of</strong> <strong>the</strong> Scheme<br />
program P is polynomially bounded with respect to <strong>the</strong> input sizes. All necessary steps<br />
can be performed mechanically and thus we arrive at a completely automatic complexity<br />
analysis for Scheme, and eagerly evaluated functional programs in general.<br />
References<br />
Anderson, H., Khoo, S.-C., Andrei, S. and Luca, B. (2005). Calculating polynomial<br />
runtime properties, Proc. 3rd APLAS, pp. 230–246.<br />
Arts, T. and Giesl, J. (2000). Termination of term rewriting using dependency pairs, TCS<br />
236(1-2): 133–178.<br />
Avanzini, M., Hirokawa, N., Middeldorp, A. and Moser, G. (2007). Proving termination<br />
of Scheme programs by rewriting. Draft⁶.<br />
Avanzini, M. and Moser, G. (2008). Complexity analysis by rewriting, Proc. 9th FLOPS,<br />
Vol. 4989 of LNCS, pp. 130–146.<br />
⁶ Available at http://cl-informatik.uibk.ac.at/~georg/list.publications<br />
Baader, F. and Nipkow, T. (1998). Term Rewriting and All That, Cambridge University<br />
Press.<br />
Bellantoni, S. and Cook, S. A. (1992). A new recursion-theoretic characterization of the<br />
polytime functions, CC 2: 97–110.<br />
Bonfante, G., Marion, J.-Y. and Péchoux, R. (2007). Quasi-interpretation synthesis by<br />
decomposition, Proc. 4th ICTAC, Vol. 4711 of LNCS, pp. 410–424.<br />
Eén, N. and Sörensson, N. (2003). An extensible SAT-solver, Proc. 6th SAT, Vol. 2919 of<br />
LNCS, pp. 502–518.<br />
Giesl, J., Swiderski, S., Schneider-Kamp, P. and Thiemann, R. (2006). Automated termination<br />
analysis for Haskell: From term rewriting to programming languages, Proc.<br />
17th RTA, Vol. 4098 of LNCS, pp. 297–312.<br />
Hirokawa, N. and Moser, G. (2008). Automated complexity analysis based on the dependency<br />
pair method, Proc. 4th IJCAR. To appear.<br />
Hofbauer, D. (1992). Termination proofs by multiset path orderings imply primitive recursive<br />
derivation lengths, TCS 105(1): 129–140.<br />
Koprowski, A. (2006). TPA: Termination proved automatically, Proc. 17th RTA, pp. 257–<br />
266.<br />
Koprowski, A. and Middeldorp, A. (2007). Predictive labeling with dependency pairs<br />
using SAT, Proc. 21st CADE, Vol. 4603 of LNCS, pp. 410–425.<br />
Kusakari, K., Nakamura, M. and Toyama, Y. (1999). Argument filtering transformation,<br />
Proc. 1st PPDP, Vol. 1702 of LNCS, pp. 47–61.<br />
Lescanne, P. (1995). Termination of rewrite systems by elementary interpretations, Formal<br />
Aspects of Computing 7(1): 77–90.<br />
Moser, G. and Schnabl, A. (2008). Proving quadratic derivational complexities using<br />
context dependent interpretations, Proc. 19th RTA. To appear.<br />
Plaisted, D. A. and Greenbaum, S. (1986). A structure-preserving clause form translation,<br />
J. Symb. Comput. 2(3): 293–304.<br />
Rosendahl, M. (1989). Automatic complexity analysis, Proc. 4th FPCA, pp. 144–156.<br />
Schneider-Kamp, P., Thiemann, R., Annov, E., Codish, M. and Giesl, J. (2007). Proving<br />
termination using recursive path orders and SAT solving, Proc. 6th FroCoS, number<br />
4720 in LNCS, pp. 267–282.<br />
TeReSe (2003). Term Rewriting Systems, Vol. 55 of CTTCS, Cambridge University Press.<br />
Tseitin, G. (1968). On the complexity of derivation in propositional calculus, SCML, Part<br />
2 pp. 115–125.<br />
Zantema, H. (1995). Termination of term rewriting by semantic labelling, FI 24(1/2): 89–<br />
105.<br />
SIMULATING SPOKEN DIALOGUE<br />
WITH A FOCUS ON REALISTIC TURN-TAKING<br />
Timo Baumann<br />
University <strong>of</strong> Potsdam<br />
Abstract. We present a system for testing turn-taking strategies in a simulation environment,<br />
in which artificial dialogue participants exchange audio streams in real time – unlike earlier<br />
turn-taking simulations, which interchanged unambiguous symbolic messages. Dialogue<br />
participants autonomously determine <strong>the</strong>ir turn-taking behaviour, based on <strong>the</strong>ir analysis <strong>of</strong><br />
the incoming audio. We use machine-learning methods to classify the continuous audio<br />
signal into symbolic turn-taking states. We experiment with various rule sets and show how<br />
simple, local management rules can create realistic behavioural patterns.<br />
1 Introduction<br />
Turn-taking management, i. e. deciding who may speak when in a dialogue, is an important<br />
subtask <strong>of</strong> interaction management. The classical model <strong>of</strong> turn-taking (Sacks,<br />
Schegl<strong>of</strong>f and Jefferson, 1974) describes turn-taking as locally managed (depending only<br />
on a local context) and predictive (upcoming turn endings are signalled in advance by <strong>the</strong><br />
interplay of syntax, semantics and prosody). Current speech dialogue systems (SDSes), on<br />
the other hand, use reactive turn-taking schemes, with the turn being taken after a silence<br />
<strong>of</strong> fixed length or <strong>of</strong> contextually determined length (Ferrer, Shriberg and Stolcke, 2002).<br />
This limits <strong>the</strong> interactivity <strong>of</strong> SDSes, as turns have to be separated by intervening silence.<br />
The prediction <strong>of</strong> turn endings (EoT prediction) has been investigated by a number<br />
<strong>of</strong> authors. Schlangen (2006) trains classifiers to predict <strong>the</strong> end <strong>of</strong> turn (EoT) but uses<br />
features that are not calculated strictly incrementally. Turn-management has also been<br />
studied before, but typically in simulation systems that interchange symbolic messages<br />
and work in a centrally managed environment (Padilha, 2006). In <strong>the</strong> present paper, we<br />
combine <strong>the</strong> efforts for EoT-prediction and turn-taking simulation. We propose an incremental<br />
classification of speech into speech states that control the system’s turn-taking. We<br />
first evaluate the classification on its own and then in combination with different turn-management<br />
strategies in a dialogue simulation environment.<br />
Dialogue simulation itself has a long-standing tradition in the development of SDSes,<br />
but <strong>the</strong> main focus seems to be on <strong>the</strong> improvement <strong>of</strong> dialogue strategies (Schatzmann,<br />
Weilhammer, Stuttle and Young, 2006) and audio is usually just used to trigger realistic<br />
ASR errors (López-Cózar, De la Torre, Segura and Rubio, 2003), which contrasts with<br />
<strong>the</strong> focus <strong>of</strong> <strong>the</strong> present paper: Our goal is to show how realistic turn-taking behaviour<br />
can be simulated using only local context for <strong>the</strong> classification <strong>of</strong> speech into classes relevant<br />
to turn-taking management combined with simple, locally managed rules. Dialogue<br />
strategies in general are not locally managed, and thus learning dialogue strategies seems<br />
to require the more complex reinforcement learning rather than the simple classifier training<br />
that we use.<br />
Figure 1: A human user conversing with an artificial DP in our interaction environment<br />
(structured as in section 2). A dialogue recorder wiretaps <strong>the</strong>ir conversation.<br />
We do not (and do not need to) take into account <strong>the</strong> content <strong>of</strong> <strong>the</strong> dialogues and<br />
in fact we limit our speech analysis to simple prosodic features for <strong>the</strong> EoT prediction.<br />
Thus, for this work, we abstract away from all questions <strong>of</strong> content management and let<br />
our dialogue participants speak randomly selected pre-recorded utterances – though with<br />
proper turn-taking.<br />
The remainder <strong>of</strong> <strong>the</strong> paper is structured as follows: Section 2 describes <strong>the</strong> system<br />
architecture and Section 3 <strong>the</strong> corpora we use. Section 4 evaluates <strong>the</strong> speech state classification<br />
and Section 5 demonstrates and evaluates some simple turn-management strategies.<br />
We close with conclusions and ideas for fur<strong>the</strong>r work.<br />
2 Architecture <strong>of</strong> <strong>the</strong> Interaction Environment<br />
Our architecture defines an interaction environment in which dialogue participants (DPs)<br />
communicate with each o<strong>the</strong>r. Interaction is purely non-symbolic, using asynchronous<br />
audio streams over RTP (Schulzrinne, Casner, Frederick and Jacobson, 2003). No<br />
common clock or other synchronisation is required between DPs. The architecture provides<br />
a headset tool for human DPs, and monitoring tools to listen to ongoing dialogues and to<br />
record <strong>the</strong>m to disk.<br />
Figure 1 shows two dialogue participants – one human, one artificial – conversing in<br />
<strong>the</strong> environment described above. The artificial DP on <strong>the</strong> right <strong>of</strong> figure 1 is structured<br />
as described below.<br />
Artificial DPs are realized as modular and extensible collections <strong>of</strong> event-driven s<strong>of</strong>tware<br />
agents in <strong>the</strong> open agent architecture, OAA (Martin, Cheyer and Moran, 1999).<br />
In <strong>the</strong> OAA each s<strong>of</strong>tware agent advertises its own abilities to solve problems (such as<br />
generating utterances) and may itself request o<strong>the</strong>r agents to solve sub-problems (e. g.<br />
sending data over RTP). For audio processing inside <strong>the</strong> DP we rely on <strong>the</strong> Sphinx-4<br />
framework (Walker, Lamere, Kwok, Raj, Singh, Gouvea, Wolf and Woelfel, 2004) which<br />
we extended for our audio-processing pipeline. In <strong>the</strong> current system, we do not yet use<br />
Sphinx’ abilities as a speech recognizer and most o<strong>the</strong>r modules that would be needed for<br />
a real dialogue system are missing. These are obvious enhancements for later versions.<br />
2.1 Speech Generation<br />
Speech generation consists of a synthesizer and a dispatcher. The synthesizer currently<br />
selects from a corpus of pre-recorded utterances and will be extended to include text-to-speech.<br />
To make turn-taking management harder and the system more realistic, a fixed<br />
delay of 100 ms between the signal to the module and the onset of the recorded utterance is<br />
introduced at this point.¹ This delay is realized by sending 100 ms of recorded silence<br />
before the utterance; utterances are also followed by 100 ms of recorded silence. (If<br />
DPs were to send digital zeros directly before and after their utterances, speech state<br />
classification, as described below, would become trivial.)<br />
The speech dispatcher continuously sends an RTP stream in packets <strong>of</strong> 10 ms, ei<strong>the</strong>r<br />
audio from a file or sine waves if so instructed by <strong>the</strong> syn<strong>the</strong>sizer, or silence (digital zero).<br />
It can also be ordered to interrupt <strong>the</strong> audio and to revert to silence. The dispatcher also<br />
publishes its current speech state which may be one <strong>of</strong> sil, start <strong>of</strong> turn (SoT), talk, or end<br />
<strong>of</strong> turn (EoT) to <strong>the</strong> DP it is part <strong>of</strong>.<br />
2.2 Speech Analysis<br />
Speech analysis focuses solely on local prosodic analysis for <strong>the</strong> classification <strong>of</strong> <strong>the</strong><br />
listening state (which should reflect <strong>the</strong> interlocutor’s speech state, as described above).<br />
In order to be effective, classification must happen with as short a lag as possible. While<br />
short lags would allow for reactive behaviour, we aim to predict when <strong>the</strong> interlocutor’s<br />
end <strong>of</strong> turn is approaching in order to achieve smooth turn changes and counter-balance<br />
<strong>the</strong> 100 ms lag before a response can be uttered by <strong>the</strong> speech generation.<br />
We use machine learning to classify each received frame (10 ms) <strong>of</strong> audio as silence (sil),<br />
ongoing talk (talk) or end <strong>of</strong> turn (EoT). Classification is based exclusively on signal<br />
power, pitch and derived features. Our pitch extraction is modelled after <strong>the</strong> first three<br />
steps <strong>of</strong> <strong>the</strong> YIN algorithm (de Cheveigné and Kawahara, 2002). As no smoothing or dynamic<br />
programming is applied to <strong>the</strong> pitch extraction, results are computed incrementally<br />
in real-time and become available instantaneously. The algorithm runs at several times<br />
real-time on average hardware. On the corpora described below, the gross error rate is<br />
1.6 % when measured against the well-known ESPS algorithm (Talkin, 1995).<br />
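The paper does not spell out its pitch extractor, so the following Python sketch only illustrates the two quantities that the early steps of YIN arrive at: the difference function and its cumulative mean normalization. Taking the global minimum of the normalized function within a lag range is an assumption made here for simplicity, and all function and parameter names are hypothetical.<br />

```python
import math


def difference(frame, max_lag):
    """d(tau) = sum_j (x[j] - x[j + tau])^2 over a window of fixed length."""
    window = len(frame) - max_lag
    return [sum((frame[j] - frame[j + tau]) ** 2 for j in range(window))
            for tau in range(max_lag + 1)]


def cumulative_mean_normalized(d):
    """d'(0) = 1 and d'(tau) = d(tau) * tau / (d(1) + ... + d(tau))."""
    out, running = [1.0], 0.0
    for tau in range(1, len(d)):
        running += d[tau]
        out.append(d[tau] * tau / running if running > 0 else 1.0)
    return out


def estimate_f0(frame, sample_rate, min_lag, max_lag):
    """Return a pitch estimate in Hz from the lag with the smallest d'."""
    d_norm = cumulative_mean_normalized(difference(frame, max_lag))
    best_lag = min(range(min_lag, max_lag + 1), key=lambda tau: d_norm[tau])
    return sample_rate / best_lag


# Hypothetical test signal: a 100 Hz sine sampled at 16 kHz.
SAMPLE_RATE = 16000
sine = [math.sin(2 * math.pi * 100 * n / SAMPLE_RATE) for n in range(650)]
f0 = estimate_f0(sine, SAMPLE_RATE, min_lag=40, max_lag=250)
```

Because no smoothing across frames is involved, each frame can be processed as soon as it has been received, which matches the incremental operation described above.<br />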
In order to track changes over time, we derive features by windowing over past values<br />
<strong>of</strong> pitch and power with sizes ranging from 20 to 500 ms. While <strong>the</strong> features calculated<br />
on smaller windows help to smooth and to remove outliers due to failures <strong>of</strong> <strong>the</strong> pitch<br />
extraction, <strong>the</strong> larger windows are expected to capture long-term trends. We calculate <strong>the</strong><br />
arithmetic mean and <strong>the</strong> range <strong>of</strong> <strong>the</strong> values, <strong>the</strong> mean difference between values within<br />
<strong>the</strong> window and <strong>the</strong> relative position <strong>of</strong> <strong>the</strong> minimum and maximum. We also perform<br />
a linear regression and use its slope, <strong>the</strong> MSE <strong>of</strong> <strong>the</strong> regression and <strong>the</strong> error <strong>of</strong> <strong>the</strong><br />
regression for <strong>the</strong> last value in <strong>the</strong> window.<br />
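The window features listed above can be sketched as follows. This is an illustrative Python version under stated assumptions: "mean difference" is read as the mean of successive differences, relative positions are expressed as index divided by window length minus one, and the regression error for the last value is the observed value minus the fitted one. The function name and the dictionary keys are hypothetical.<br />

```python
def window_features(values):
    """Features over one window of past pitch or power values (len >= 2)."""
    n = len(values)
    if n < 2:
        raise ValueError("a window needs at least two values")
    mean = sum(values) / n
    lo, hi = min(values), max(values)
    # Least-squares line v = slope * t + intercept over t = 0 .. n-1.
    t_mean = (n - 1) / 2
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - mean) for t, v in enumerate(values))
    slope = sxy / sxx
    intercept = mean - slope * t_mean
    residuals = [v - (slope * t + intercept) for t, v in enumerate(values)]
    return {
        "mean": mean,
        "range": hi - lo,
        # Assumption: mean of the differences between neighbouring values.
        "mean_diff": sum(values[i + 1] - values[i] for i in range(n - 1)) / (n - 1),
        # Assumption: position of the first minimum/maximum, scaled to [0, 1].
        "pos_min": values.index(lo) / (n - 1),
        "pos_max": values.index(hi) / (n - 1),
        "slope": slope,
        "mse": sum(r * r for r in residuals) / n,
        "last_error": residuals[-1],
    }
```

With 10 ms frames, the window sizes from 20 to 500 ms mentioned in the text correspond to between 2 and 50 values per window.<br />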
¹ In a dialogue system NLG and TTS would require processing time; for humans there is a delay between<br />
starting to plan an utterance and the start of the articulation (Levinson, 1983).<br />
2.3 Turn-Taking Management<br />
The turn-taking management agent determines whether to start or stop emitting utterances<br />
on the basis of the states of the generation and analysis modules. An important aspect in<br />
turn-taking management is robustness. To be robust, the turn-taking strategy must not<br />
depend on its interlocutor acting and reacting in certain ways. Naturally, “good” dialogue<br />
will only evolve from friendly dialogue partners, but the turn-management strategy must<br />
prevent deadlocks due to the interlocutor’s behaviour.<br />
Upon receiving dialogue state change notifications from the analysis module, the<br />
agent decides whether to emit messages to the generation module, ordering it to talk or to<br />
hush, according to a defined turn-taking strategy. Messages are only emitted with certain<br />
probabilities, which were determined empirically to lead to natural performance: the<br />
probability to start an utterance is set to 0.1, and to hush during an utterance to 0.3.<br />
If no action is taken, the agent sleeps for a short while (currently<br />
50 ms) and is awakened if another message is received (for example, EoT changing to<br />
sil). Thus, exact timings are non-deterministic and differ randomly between agents.<br />
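The probabilistic decision step can be illustrated with a short Python sketch. The probabilities 0.1 and 0.3 and the 50 ms sleep are taken from the text; the concrete rule set below (start when the floor seems free, hush on a clash) is a hypothetical example and not one of the strategies evaluated in the paper, and the function name is an assumption.<br />

```python
import random

P_START = 0.1   # probability to start an utterance (from the text)
P_HUSH = 0.3    # probability to hush during an utterance (from the text)
SLEEP_MS = 50   # sleep between decisions when no action is taken (from the text)


def decide(own_state, listening_state, rng=random):
    """Return "talk", "hush" or None for one wake-up of the agent.

    own_state is the dispatcher's speech state (sil, SoT, talk, EoT);
    listening_state is the classified state of the interlocutor.
    The rules are a hypothetical illustration.
    """
    own_silent = own_state == "sil"
    floor_free = listening_state in ("sil", "EoT")
    if own_silent and floor_free and rng.random() < P_START:
        return "talk"
    if not own_silent and listening_state == "talk" and rng.random() < P_HUSH:
        return "hush"
    return None  # the caller would sleep for SLEEP_MS or until a new message
```

Because each wake-up triggers an action only with a fixed probability, two agents running the same rules will rarely act at exactly the same moment, which is what makes the timings differ between agents.<br />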
3 Corpora<br />
We perform our experiments with two different corpora, one <strong>of</strong> simple pseudo-speech, one<br />
<strong>of</strong> read speech. Each corpus contains material from two different speakers (one female,<br />
one male) for which we train separate speech analyzers, in order to be able to simulate<br />
dialogues with one male and one female each.<br />
For pseudo-speech our speakers repeatedly uttered the syllable /ba/ instead of the actually<br />
occurring syllables in a script of 50 utterances (questions, informative sentences,<br />
confirmations, etc). By always uttering <strong>the</strong> same syllable, we remove segment-inherent<br />
influences on power and pitch variation, while at <strong>the</strong> same time retaining sentence intonation.<br />
For read speech we relied on <strong>the</strong> two major speakers <strong>of</strong> <strong>the</strong> Kiel Corpus <strong>of</strong><br />
Read Speech, KCoRS (IPDS, 1994). That corpus contains some 600 utterances for each<br />
speaker.<br />
The two corpora differ in size and complexity. Our controlled pseudo-speech poses<br />
hardly any problem for pitch-extraction and does not contain voiceless speech, silence<br />
during <strong>the</strong> occlusion <strong>of</strong> voiceless plosives or o<strong>the</strong>r potentially “difficult” audio. The<br />
KCoRS on <strong>the</strong> o<strong>the</strong>r hand contains far more training material. Also, as <strong>the</strong> pseudo-speech<br />
does not convey any semantic meaning, subjects in a listening test for <strong>the</strong> evaluation <strong>of</strong><br />
generated turn-taking patterns would not be distracted by nonsense dialogue.<br />
The performance <strong>of</strong> a speech state classifier on both <strong>of</strong> our corpora is likely to be better<br />
than on a corpus of real dialogue speech as it is more homogeneous (especially compared to<br />
speaker-independent speech state classification). Thus, our results should be considered<br />
an upper bound on realistic results.<br />
The start and end <strong>of</strong> each utterance were hand-annotated and each 10 ms <strong>of</strong> audio was<br />
assigned to one <strong>of</strong> <strong>the</strong> listening states as described above with EoT being assigned to<br />
frames in <strong>the</strong> vicinity <strong>of</strong> ± 50 ms <strong>of</strong> <strong>the</strong> utterance end. For <strong>the</strong> turn-taking management<br />
experiments, we crop the audio files so that each utterance is preceded and succeeded by<br />
100 ms <strong>of</strong> silence.<br />
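The frame labelling described above can be sketched in Python as follows. The 10 ms frame size and the ± 50 ms EoT region come from the text; identifying a frame by its start time, treating the region boundaries as inclusive, and the function name are assumptions made for this illustration.<br />

```python
FRAME_MS = 10        # frame length (from the text)
EOT_MARGIN_MS = 50   # EoT region around the utterance end (from the text)


def label_frames(total_ms, utterances):
    """Label every 10 ms frame as "sil", "talk" or "EoT".

    utterances is a list of hand-annotated (start_ms, end_ms) pairs.
    """
    labels = []
    for index in range(total_ms // FRAME_MS):
        t = index * FRAME_MS  # assumption: a frame is identified by its start
        label = "sil"
        for start, end in utterances:
            if abs(t - end) <= EOT_MARGIN_MS:
                label = "EoT"  # EoT takes priority over talk and silence
                break
            if start <= t < end:
                label = "talk"
        labels.append(label)
    return labels
```

Under these assumptions an EoT region always covers eleven frames, half of which lie in the silence after the utterance.<br />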
4 Speech Analysis Evaluation<br />
We used the machine learning toolkit Weka (Witten and Frank, 2000) to train various<br />
speaker-dependent classifiers. For the evaluation, 80 % of each corpus was used as<br />
training set and 20 % as test set. Tables 1 and 2 show the results of the OneR, J48 and<br />
JRip algorithms for each corpus. OneR finds the most predictive feature to be the dynamic<br />
range of frame energy over the last 100 or 200 ms. JRip outperforms J48, but<br />
has far worse training complexity. Separation of speech and silence (which here is the<br />
recorded silence in the corpus, not digital zero) is done with high accuracy. Recognition<br />
of EoT regions is of lower quality, but still surpasses results in (Schlangen, 2006).²<br />
classifier | female speaker: Acc. (%) Fsil Ftalk FEoT FAR (%) | male speaker: Acc. (%) Fsil Ftalk FEoT FAR (%)<br />
OneR | 96.1 0.98 0.96 0.00 21.4 | 92.8 0.96 0.93 0.13 65.5<br />
J48 | 94.8 0.98 0.95 0.50 68.9 | 96.3 0.97 0.97 0.71 64.3<br />
JRip | 95.3 0.98 0.95 0.55 68.3 | 96.2 0.97 0.97 0.80 59.2<br />
Stateful JRip | 95.9 0.98 0.95 0.59 48.4 | 95.5 0.97 0.96 0.72 50.0<br />
Stateful JRip, shifted | 96.2 0.98 0.96 0.59 48.4 | 96.4 0.97 0.97 0.80 47.5<br />
Table 1: Accuracy, per-class f-measures and false alarm rate (FAR) for various speech state<br />
classifiers for the pseudo-speech corpus.<br />
classifier | female speaker: Acc. (%) Fsil Ftalk FEoT FAR (%) | male speaker: Acc. (%) Fsil Ftalk FEoT FAR (%)<br />
OneR | 94.5 0.97 0.96 0.03 65.4 | 93.7 0.92 0.96 0.10 80.7<br />
J48 | 97.3 0.98 0.98 0.61 71.1 | 96.1 0.96 0.98 0.42 84.1<br />
JRip | 96.6 0.97 0.98 0.73 61.1 | 95.9 0.97 0.96 0.61 65.7<br />
Stateful JRip | 96.4 0.96 0.98 0.70 31.9 | 94.9 0.97 0.96 0.58 50.0<br />
Stateful JRip, shifted | 96.9 0.97 0.98 0.74 31.6 | 95.5 0.97 0.96 0.64 48.9<br />
Table 2: Accuracy, per-class f-measures and false alarm rate (FAR) for various speech state<br />
classifiers for the KCoRS speakers.<br />
While <strong>the</strong> data and <strong>the</strong>ir states are sequential in nature, <strong>the</strong> classifiers as described<br />
above evaluate each frame independently. At <strong>the</strong> same time, recognizing <strong>the</strong> o<strong>the</strong>r speaker’s<br />
start or end <strong>of</strong> turn a little too late or too early hardly matters, while frequently<br />
changing <strong>the</strong> listening state may lead to bad dialogue behaviour. This is measured in <strong>the</strong><br />
false alarm rate (FAR), defined as <strong>the</strong> proportion <strong>of</strong> over-generated state changes.<br />
The analysis of classification output showed that wrong classifications would often<br />
last for only one frame. We implemented a stateful classifier that only changes state<br />
after two consecutive identical classifications by the underlying classifier. This strongly decreases<br />
FAR but introduces systematic errors in the classification (every actual state change will<br />
be registered one frame too late) and reduces precision/recall measures. When this delay is<br />
accounted for in the evaluation, the stateful classifier also outperforms the base classifier<br />
in these measures.<br />
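The stateful classifier described above can be illustrated with a small sketch. This is not the implementation used in the experiments; the function names, the label strings and the `required` parameter are assumptions chosen for illustration. The sketch only shows the idea of accepting a state change after two consecutive agreeing frame classifications.

```python
def stateful_filter(frame_labels, required=2):
    """Smooth a per-frame label sequence.

    The state only changes after `required` consecutive frames carry the
    same label and that label differs from the current state.
    """
    if not frame_labels:
        return []
    state = frame_labels[0]
    candidate, run = None, 0
    smoothed = []
    for label in frame_labels:
        if label == state:
            # The underlying classifier agrees with the current state.
            candidate, run = None, 0
        elif label == candidate:
            run += 1
        else:
            candidate, run = label, 1
        if candidate is not None and run >= required:
            state = candidate
            candidate, run = None, 0
        smoothed.append(state)
    return smoothed


def count_state_changes(labels):
    """Number of positions where the label differs from its predecessor."""
    return sum(1 for a, b in zip(labels, labels[1:]) if a != b)


raw = ["sil", "sil", "talk", "sil", "sil", "talk", "talk", "talk", "sil", "sil"]
smoothed = stateful_filter(raw)
```

In this toy sequence the one-frame "talk" blip at the third frame is suppressed, while the real change to "talk" is registered one frame late, which is the systematic delay mentioned in the text.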
The results show that the complexity of the KCoRS is counterbalanced by its 10 times<br />
larger size. This may indicate that speech state classification for real dialogue speech<br />
would be feasible with a sufficiently large corpus and speaker-normalized prosodic features.<br />
5 Simple Strategies for Turn-Taking<br />
We outline some simple strategies for turn control. Their purpose is to exemplify how<br />
very restricted, locally managed behaviour with some simple rules can already lead to<br />
acceptable turn-taking behaviour, as postulated by the local management model of Sacks<br />
et al. (1974), without the need for a dialogue history or complex temporal reasoning.<br />
2 Results cannot be easily compared, as Schlangen (2006) recognizes turn-final words using prosodic<br />
and syntactic features on a more complex corpus, reaching an f-measure of 0.36.<br />
<strong>Proceedings</strong> <strong>of</strong> <strong>the</strong> <strong>13</strong> th <strong>ESSLLI</strong> <strong>Student</strong> <strong>Session</strong><br />
measure      strategy 1          strategy 2          strategy 3<br />
gap          14.0 %   351 ms     18.7 %   358 ms     17.4 %   362 ms<br />
speaker a    31.4 %  1259 ms     35.9 %  1009 ms     36.5 %  1079 ms<br />
speaker b    39.3 %  1415 ms     39.8 %  1165 ms     40.8 %  1225 ms<br />
clash        15.4 %  1184 ms      5.6 %   317 ms      5.3 %   278 ms<br />
Table 3: Distribution and mean duration of dialogue states for three turn-taking strategies<br />
with pseudo-speech.<br />
measure      strategy 1          strategy 2          strategy 3<br />
gap          14.1 %   528 ms     20.7 %   477 ms     18.9 %   454 ms<br />
speaker a    36.2 %  1764 ms     40.5 %  1456 ms     34.7 %  1232 ms<br />
speaker b    26.2 %  1437 ms     24.8 %  1307 ms     42.0 %  1540 ms<br />
clash        23.5 %  1915 ms      4.0 %   253 ms      4.4 %   243 ms<br />
Table 4: Distribution and mean duration of dialogue states for three turn-taking strategies<br />
with KCoRS speakers.<br />
5.1 Measuring Turn-Management Success<br />
The dialogue state can be described by the current speech state of each of the dialogue<br />
participants, with each speech state being either talk or sil. For two-party dialogue, this<br />
results in four states: two “good” states where exactly one of the dialogue participants is<br />
talking, and two “bad” states: clashes, when both participants talk simultaneously, and<br />
gaps, with neither of them talking.<br />
According to Sacks et al. (1974), speakers try to optimize their behaviour so as to<br />
minimize the occurrence of both clashes and gaps. That is why we choose clashes and<br />
gaps as basic measures for turn-taking success. Slight gaps and clashes occur all the time,<br />
but they are not always perceptually relevant. We thus decided to calculate the proportion<br />
of clashes and gaps over the course of the dialogue as well as their mean duration.<br />
For evaluation purposes, we set up two artificial dialogue participants and let them talk<br />
with each other for about 10 minutes for each of the following strategies. We recorded<br />
the internal states and calculated the described measures. The audio itself was recorded<br />
but not further analyzed in the evaluation. The results of the strategies described below<br />
are shown in tables 3 and 4.<br />
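The measures described in this section can be sketched as follows. The sketch assumes that the recorded internal states are available as two equally long sequences of per-frame booleans (True for talk, False for sil) and that each frame covers a fixed duration; the function names and the default `frame_ms` value are hypothetical and not taken from the system described here.

```python
from itertools import groupby


def dialogue_state(a_talks, b_talks):
    """Map the two speech states to one of the four dialogue states."""
    if a_talks and b_talks:
        return "clash"
    if a_talks:
        return "speaker a"
    if b_talks:
        return "speaker b"
    return "gap"


def turn_taking_measures(a_frames, b_frames, frame_ms=10):
    """Return {state: (share in percent, mean duration in ms)}.

    frame_ms is an assumed frame length, not a value from the paper.
    """
    states = [dialogue_state(a, b) for a, b in zip(a_frames, b_frames)]
    if not states:
        return {}
    # Collect the length (in frames) of every uninterrupted run of a state.
    runs = {}
    for state, group in groupby(states):
        runs.setdefault(state, []).append(sum(1 for _ in group))
    total = len(states)
    return {
        state: (100.0 * sum(lengths) / total,
                frame_ms * sum(lengths) / len(lengths))
        for state, lengths in runs.items()
    }


a = [True, True, True, False, False, False, True, True]
b = [False, False, True, True, False, False, False, False]
measures = turn_taking_measures(a, b)
```

Each entry pairs the proportion of dialogue time spent in a state with the mean length of its uninterrupted stretches, mirroring the two figures reported per cell in the tables.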
5.2 Strategy 1: Talk When Nobody Talks<br />
Rule: Start an utterance when neither you nor your interlocutor is talking. (Implicitly:<br />
Continue talking until your utterance is finished.)<br />
The performance of this strategy strongly depends on the round-trip time from one<br />
agent’s decision to take the turn until the other agent notices the turn being taken. The<br />
shorter the lags introduced by the talking agent’s internal communication, audio transmission,<br />
prosodic processing and classification, and the listening agent’s internal communication,<br />
the more likely it is for a dialogue participant to notice its interlocutor talking (and<br />
then listen until the interlocutor has finished) before it has started talking itself. For longer lags,<br />
the DP will decide to talk even though its interlocutor may already have started talking.<br />
As can be seen, this strategy leads to a large number of clashes.<br />
5.3 Strategy 2: Hush When Both Talk<br />
Rule as above, plus: Stop your utterance when both you and your interlocutor are talking.<br />
The rule proves effective in reducing simultaneous talk, as clashes are reduced by 65 %<br />
(pseudo-speech) and over 80 % (KCoRS) respectively. At the same time, this strategy<br />
introduces utterance truncations, where an utterance is stopped prematurely.<br />
(Actually, the majority of utterances (71 % for pseudo-speech) were truncated, but many of<br />
these truncations occur in the silent phases before or after the actual talk and do not have<br />
any deteriorating effect on the perceived turn-taking performance.) Truncations could be<br />
reduced with a higher probability to hush during SoT.<br />
5.4 Strategy 3: Start Talking Early<br />
The previous strategies only react after turns have started or ended. In order to initiate<br />
actions early and anticipate turn changes, this strategy exploits the EoT class of the<br />
speech analysis (which was ignored before) in the first rule: Start an utterance when you<br />
are not talking and your interlocutor is ending their turn or has already finished.<br />
By starting utterance planning before the interlocutor’s preceding utterance is finished,<br />
the dialogue participant can hide some of the lag introduced by its speech generation<br />
module. The duration of both gaps and clashes is reduced compared to strategy 2, for<br />
gaps because turns will be taken over more quickly and for clashes due to the original<br />
talk-owner noticing the turn-change earlier, avoiding the start of a new utterance.<br />
The durations for gaps and clashes with this strategy are similar to those reported<br />
for parts of the Verbmobil corpus by Weilhammer and Rabold (2003), with 363 ms and<br />
331 ms respectively. 3 Performance could be further improved by using a lower probability<br />
to hush during EoT.<br />
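The three rule sets can be summarized in a single decision function. The following is only a sketch under stated assumptions: the perceived state of the interlocutor is taken to be one of the strings "sil", "talk" or "EoT", strategies 1 and 2 treat "EoT" like "talk" (they ignore the EoT class), the hush rule is applied deterministically rather than with a probability, and the function name `decide` and its return values are hypothetical.

```python
def decide(strategy, self_talking, other_state):
    """Return "start", "stop" or None for one decision step.

    strategy:     1, 2 or 3, as numbered in the text
    self_talking: True if this participant is currently talking
    other_state:  perceived state of the interlocutor ("sil", "talk", "EoT")
    """
    other_talking = other_state != "sil"
    if not self_talking:
        # Strategies 1 and 2: talk when nobody talks.
        if strategy in (1, 2) and not other_talking:
            return "start"
        # Strategy 3: also start while the interlocutor is ending the turn.
        if strategy == 3 and other_state in ("EoT", "sil"):
            return "start"
        return None
    # Strategies 2 and 3: hush when both talk.
    if strategy in (2, 3) and other_talking:
        return "stop"
    # Strategy 1 (and the default case): keep talking until finished.
    return None
```

All three rule sets only look at the current states of the two participants, which is what makes them locally managed: no dialogue history enters the decision.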
6 Conclusion and Future Directions<br />
We have presented a flexible, modular architecture for dialogue strategy evaluation where<br />
arbitrary pairings of human users and artificial dialogue participants can be created. We<br />
have discussed a case study in this environment, where pairs of artificial DPs converse in<br />
real time via audio. Each DP autonomously decides on its turn-taking behaviour (start<br />
or stop talking) based on a local analysis of the audio signal and using machine-learned<br />
classifiers. We tested these with corpora of simplified speech and achieved good recognition<br />
performance. Three implemented turn-management rulesets, all of them locally managed<br />
in the sense of Sacks et al. (1974), i.e. not requiring dialogue memory, were<br />
shown to create increasingly realistic behavioural patterns.<br />
We plan to use the components developed for this system in an interactive speech<br />
dialogue system. For the speech state classification, we will need normalized prosodic<br />
features that allow for speaker-independent speech state classification. At the same time,<br />
ASR will make features relative to syllable information (stress patterns, speech rate, ...)<br />
accessible, as well as word hypotheses. We may also want to look into classifier confidence<br />
scores, only emitting speech state changes if the classifier is reasonably certain.<br />
In real dialogue, the problem of hesitations arises. Our classification will have to be<br />
extended to distinguish hesitational interruptions from normal EoT. We would also like to<br />
identify positions in a turn where a back-channelling utterance might be appropriate.<br />
3 Note that their numbers are for turn changes only, while we do not distinguish between gaps at turn<br />
changes and at turn continuations.<br />
Acknowledgements<br />
I would like to thank my supervisor David Schlangen for his constant guidance and support<br />
and the anonymous reviewers for their insightful comments and suggestions.<br />
References<br />
de Cheveigné, A. and Kawahara, H. (2002). YIN, a fundamental frequency estimator for<br />
speech and music, The Journal of the Acoustical Society of America 111(4): 1917–1930.<br />
Ferrer, L., Shriberg, E. and Stolcke, A. (2002). Is the speaker done yet? Faster and more<br />
accurate end-of-utterance detection using prosody, Proceedings of the International<br />
Conference on Spoken Language Processing (ICSLP2002), Denver, USA.<br />
IPDS (1994). The Kiel corpus of read speech, CD-ROM.<br />
Levinson, S. C. (1983). Pragmatics, Cambridge Textbooks in Linguistics, Cambridge<br />
University Press.<br />
López-Cózar, R., De la Torre, A., Segura, J. and Rubio, A. (2003). Assessment of dialogue<br />
systems by means of a new simulation technique, Speech Communication<br />
40(3): 387–407.<br />
Martin, D., Cheyer, A. and Moran, D. (1999). The Open Agent Architecture: a framework<br />
for building distributed software systems, Applied Artificial Intelligence 13(1/2): 91–128.<br />
URL: citeseer.ist.psu.edu/martin99open.html<br />
Padilha, E. G. (2006). Modelling Turn-taking in a Simulation of Small Group Discussion,<br />
PhD thesis, School of Informatics, University of Edinburgh, Edinburgh, UK.<br />
Sacks, H., Schegloff, E. A. and Jefferson, G. A. (1974). A simplest systematics for the<br />
organization of turn-taking for conversation, Language 50: 696–735.<br />
Schatzmann, J., Weilhammer, K., Stuttle, M. and Young, S. (2006). A survey of statistical<br />
user simulation techniques for reinforcement-learning of dialogue management<br />
strategies, The Knowledge Engineering Review 21(02): 97–126.<br />
Schlangen, D. (2006). From reaction to prediction: Experiments with computational<br />
models of turn-taking, Interspeech 2006, Pittsburgh, USA.<br />
URL: http://www.ling.uni-potsdam.de/ das/papers/schlangen intersp2006.pdf<br />
Schulzrinne, H., Casner, S., Frederick, R. and Jacobson, V. (2003). RTP: A Transport<br />
Protocol for Real-Time Applications, RFC 3550 (Standard).<br />
URL: http://www.ietf.org/rfc/rfc3550.txt<br />
Talkin, D. (1995). A robust algorithm for pitch tracking (RAPT), in W. B. Kleijn and K. K.<br />
Paliwal (eds), Speech Coding and Synthesis, Elsevier, chapter 14, pp. 495–518.<br />
Walker, W., Lamere, P., Kwok, P., Raj, B., Singh, R., Gouvea, E., Wolf, P. and Woelfel,<br />
J. (2004). Sphinx-4: A flexible open source framework for speech recognition,<br />
Technical Report SMLI TR2004-0811, Sun Microsystems Inc.<br />
Weilhammer, K. and Rabold, S. (2003). Durational aspects in turn taking, Proc. of the<br />
ICPhS, Barcelona, Spain.<br />
URL: http://www.phonetik.uni-muenchen.de/Publications/WeilhammerRabold-03-ICPhS.pdf<br />
Witten, I. H. and Frank, E. (2000). Data Mining: Practical Machine Learning Tools and<br />
Techniques with Java Implementations, Morgan Kaufmann.<br />
EPISTEMIC MODALS IN DIALOGUE<br />
Chris Brumwell<br />
University of Amsterdam<br />
Abstract. I present an update semantics for epistemic modals in which a formula of the<br />
form might φ acts on a context Γ by introducing a salient possibility constructed from φ<br />
into Γ. This theory is meant to account for the intuitions and data that suggest that assertions<br />
of epistemic modals do not provide information to the participants in a conversation, but<br />
instead suggest certain possibilities for their consideration. Among these data is the important<br />
empirical fact that epistemic modals can answer questions. To account for this, I also define a<br />
semantics for questions and show that in this system epistemic modals can count as answers<br />
to questions.<br />
1 Introduction and Motivations<br />
In the classic picture of communication given in Stalnaker (1978), a conversation is a<br />
process of distinguishing between various possibilities, or ways the world might be. It is<br />
clear, however, that in a conversation not all possibilities are given equal attention by the<br />
interlocutors. People talking about whether or not John murdered Jack are not trying to<br />
distinguish a possibility in which chocolate makes cats sick from a possibility in which<br />
chocolate doesn’t make cats sick. In this paper, I call the possibilities the interlocutors are<br />
most interested in salient possibilities.<br />
Asking a question is the canonical way of introducing salient possibilities into a discourse:<br />
questions introduce possibilities corresponding to their different answers. But<br />
other constructions introduce salient possibilities as well. The disjunction “Jones works<br />
at a bank or a hospital” introduces the salient possibilities that Jones works at a bank and<br />
that Jones works at a hospital. Constructions containing indefinite NPs such as “somebody<br />
stole the jewels” can introduce salient possibilities corresponding to various instantiations<br />
of “somebody”. Free choice commands such as “Take any apple you like” introduce salient<br />
possibilities corresponding to your various choices. Finally, a statement expressing epistemic<br />
modality such as “John might be hiding upstairs” introduces the salient possibility<br />
that John is hiding upstairs.<br />
Recent work by Groenendijk (2007) proposes an analysis of disjunction<br />
and existential quantification that captures their potential to introduce salient possibilities<br />
into a dialogue. In this paper, I formalize the notion of a salient possibility and use it<br />
to define a dynamic semantics for questions and epistemic modals. In the semantics,<br />
a question introduces salient possibilities corresponding to its possible answers, and an<br />
epistemic modal of the form might φ introduces a salient possibility constructed from φ<br />
and, following Veltman (1996), tests the common ground to see whether it is consistent<br />
with φ.<br />
Salient possibilities are almost perfectly suited for an analysis of epistemic modals.<br />
Unlike other kinds of assertions, an assertion of an epistemic modal does not contribute<br />
information to a conversation. Instead, its function is to call attention to certain possibilities<br />
that the conversational participants should, for some reason, find interesting. Thus,<br />
to analyze epistemic modals one must develop a framework in which assertions can significantly<br />
change a context without providing information. Since this paper’s framework<br />
postulates that epistemic modals affect the salient possibilities in a context rather than its<br />
information, the non-informative yet non-trivial effects of epistemic modals are properly<br />
represented.<br />
One advantage of this analysis is that it is able to account for the felicity of a modalized<br />
construction as an answer to a question. For example:<br />
(1) A: Where are my keys?<br />
B: They might be in the basement.<br />
(2) A: Are John and Bill coming to <strong>the</strong> party?<br />
B: They might.<br />
In dialogue (1), B doesn’t answer A’s question by saying where her keys are (because,<br />
if he’s acting felicitously, he doesn’t know where they are), but by suggesting a possibility<br />
for her to consider. Similarly in (2): B suggests that A should not overlook the possibility<br />
that Bill and John come to the party. If she really dislikes them, the very possibility that<br />
they attend may be reason enough for her to skip the party.<br />
Enemies of salient possibilities may think that a modal answer to a question really says<br />
nothing more than “I don’t know”, or “Any answer is consistent with my knowledge”. Against<br />
this, consider the following case: suppose A is frantically looking for her husband Joe,<br />
and comes across B, who has never met Joe and has never given him a thought. If<br />
she asks him “Where is Joe?” and he responds “I don’t know”, this is perfectly acceptable.<br />
However, if he responds “He might be in Boston”, this is completely infelicitous: if A takes<br />
him seriously, she’s on her way to a wild goose chase. Intuitively, this is because she<br />
seriously takes into account his (inappropriate) suggestion to consider the possibility that<br />
Joe is in Boston.<br />
A classical partition theory of questions has difficulty accounting for (1) and (2). This<br />
is because in a partition theory an answer to a question has to give information.<br />
However, as dialogues (1) and (2) demonstrate, answers to questions do not need to be<br />
informative: it suffices that they suggest informative answers. Below, I give a more detailed<br />
and formal discussion of the problem partition theories of questions face from non-informative<br />
answers to questions, and discuss the similarities and differences between the<br />
theory presented in this paper and a partition theory.<br />
This analysis also accounts for a puzzling feature of the behavior of epistemic modals<br />
under attitude reports. Statements of the form x believes that might φ mean, in part,<br />
that the attitude holder x considers φ to be a salient possibility. For example, suppose<br />
that John has never given a thought to what the weather is like in Amsterdam. Then (3)<br />
certainly seems wrong:<br />
(3) John believes it might be raining in Amsterdam.<br />
Using this paper’s theory, one could account for (3) by analyzing a belief state as composed<br />
of both information and salient possibilities. The content of (3) then states, roughly,<br />
that it’s consistent with John’s beliefs that it’s raining in Amsterdam and that this is a salient<br />
possibility in his belief state. Several contemporary theories of epistemic modality do not<br />
appeal to any notion similar to that of a salient possibility, and hence have no clear way<br />
of accounting for (3) (e.g. DeRose (1991) and Egan et al. (2005); for similar reasons<br />
these theories also have problems accounting for the question and answer data presented<br />
above). Due to constraints on length we will not formalize this theory of the interaction<br />
between epistemic modals and attitude reports below.<br />
As mentioned above, the analysis is carried out in a dynamic semantic framework. In<br />
dynamic semantics, the meaning of a formula is not identified with its truth conditions,<br />
but rather with the way it changes a context. More specifically, our theory is a version of<br />
update semantics in the style of Veltman (1996), i.e. we give a definition of an information<br />
state and the meanings of formulas are functions from information states to information<br />
states.<br />
2 Questions and Salient Possibilities<br />
In this section, we define a first-order language with a question operator and an epistemic<br />
possibility operator. We then define the structures (information states) used to give a<br />
semantics for this language and define the notion of a salient possibility. Finally, we give<br />
the semantics for this language and define what it means for a formula to be an answer to<br />
a question. This definition will allow modal and non-modal formulas to answer questions.<br />
DEFINITION 1. We define the languages L1, L2, and L3 as follows:<br />
(i) If P is an n-place predicate and t1...tn are terms, then P(t1...tn) ∈ L1<br />
(ii) If φ,ψ ∈ L1, then φ ∧ ψ ∈ L1 and ¬φ ∈ L1<br />
(iii) If φ ∈ L1, then ⋄φ ∈ L2<br />
(iv) If φ,ψ ∈ L2, then φ ∧ ψ ∈ L2 and ¬φ ∈ L2<br />
(v) If φ ∈ L1, then ?φ ∈ L3<br />
(vi) If φ,ψ ∈ L3, then φ ∧ ψ ∈ L3<br />
The language L we discuss in this paper is defined as L = L1 ∪ L2 ∪ L3. As a notational<br />
convention, we write atomic sentences (i.e. atomic formulas with no free variables) and<br />
Boolean combinations of atomic sentences as p, q, ¬q, p ∧ q, etc.<br />
In a standard update semantics, information states are sets of indices, where an index<br />
assigns an individual from a domain D to each constant of the language and an n-ary<br />
relation to each n-place predicate. In this paper’s framework, an information state is a set<br />
of sets of indices A such that there is an I* ∈ A with Im ⊆ I* for all Im ∈ A. The intuition<br />
behind this definition is that this maximal set I* represents the common ground at a point<br />
in a conversation. Any subset of I* is a possible future state of the common ground, and<br />
hence could be a possibility that the discourse participants are interested in. However,<br />
recalling the introduction, not all such subsets are of interest to the discourse<br />
participants. With that in mind we think of the subsets Im of I* as salient possibilities. We<br />
formally define information states below:<br />
DEFINITION 2. Let I be the set of all indices for the language L. We define an information<br />
state to be a set Γ = {P1,...,Pn,...} such that:<br />
(i) Pi ⊆ I for all i<br />
(ii) For some i, Pi = ∅<br />
(iii) There is an i such that for all j, Pj ⊆ Pi. This maximal set Pi is called the common<br />
ground.<br />
We write CG (common ground) for the maximal set Pi defined in (iii), and write<br />
Γ = {CG,P1,...,Pn,...,∅}. In some cases, we refer to information states as contexts.<br />
Though every element of an information state is a salient possibility (except the empty<br />
set, which is present to simplify the definition of an answer to a question), the sets in an<br />
information state do not exhaust its salient possibilities. Rather, the salient possibilities<br />
in an information state are generated by closing it under union and intersection. Salient<br />
possibilities are defined this way because, intuitively, if P1 and P2 are salient possibilities<br />
in a context, then if they are not mutually exclusive it is also possible that they both obtain.<br />
Thus, their intersection should count as a salient possibility as well. Similar reasoning<br />
supports considering the union of salient possibilities to be a salient possibility.<br />
DEFINITION 3. Let Γ be an information state. Then 〈Γ〉, the set of salient possibilities in<br />
Γ, is defined as the smallest set such that:<br />
(i) If P ∈ Γ, then P ∈ 〈Γ〉<br />
(ii) If P1, P2 ∈ 〈Γ〉, then P1 ∪ P2 ∈ 〈Γ〉<br />
(iii) If P1, P2 ∈ 〈Γ〉, then P1 ∩ P2 ∈ 〈Γ〉.<br />
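For a finite information state, the closure in Definition 3 can be computed by a simple fixpoint iteration. The following sketch assumes that indices are represented by arbitrary hashable values and possibilities by frozensets of indices; the function name `salient_possibilities` is chosen for illustration only.

```python
def salient_possibilities(gamma):
    """Close a finite information state under pairwise union and intersection.

    gamma: a set of frozensets of indices.
    Returns the smallest superset of gamma that is closed under both operations.
    """
    closure = set(gamma)
    changed = True
    while changed:
        changed = False
        # Iterate over snapshots so that the set can grow inside the loop.
        for p1 in list(closure):
            for p2 in list(closure):
                for new in (p1 | p2, p1 & p2):
                    if new not in closure:
                        closure.add(new)
                        changed = True
    return closure


CG = frozenset({1, 2, 3, 4})
P1 = frozenset({1, 2})
P2 = frozenset({2, 3})
closure = salient_possibilities({CG, P1, P2, frozenset()})
```

In this toy state, the closure adds the union {1, 2, 3} and the intersection {2} to the four original elements.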
We need one more concept in order to define the semantics of wh-questions. On our<br />
analysis, wh-questions introduce salient possibilities corresponding to each of their possible<br />
answers into an information state. To represent the possible answers to a wh-question,<br />
we use the relations defined in Definition 5 (Definition 4 is a standard account of satisfaction,<br />
which is necessary for articulating Definition 5):<br />
DEFINITION 4. Let φ, ψ ∈ L1, let i be an index, and let g be a variable assignment function.<br />
(i) If φ = Qt1...tn, then i |= φ [g] iff 〈[t1]_{i,g},...,[tn]_{i,g}〉 ∈ i(Q)<br />
(ii) i |= φ ∧ ψ [g] iff i |= φ [g] and i |= ψ [g]<br />
(iii) i |= ¬φ [g] iff i ⊭ φ [g]<br />
DEFINITION 5. Let φ ∈ L1, and let i and j be indices. We say that i ≡ j (mod φ) if for all<br />
assignments g, i |= φ [g] iff j |= φ [g]<br />
Given a formula φ, Definition 5 defines the conditions under which two indices give<br />
the same answer to the question ?φ. For a sentence φ of L1, i ≡ j (mod φ) will hold as<br />
long as i and j assign φ the same truth value. But for a formula of L1 with free variables,<br />
congruence modulo φ requires that the indices assign the same denotations (or just similar<br />
denotations if the formula contains both free variables and constants) to predicates that<br />
occur in φ. The following examples illustrate how this definition works.<br />
EXAMPLE 1.<br />
(i) Let ?φ = ?Px (Who came to the party?) i ≡ j (mod φ) if i(P) = j(P), or informally, if the<br />
same people came to the party according to indices i and j.<br />
(ii) Let ?φ = ?Ibx (Who did Bill invite to the party?) i ≡ j (mod φ) if<br />
{d ∈ D | 〈i(b), d〉 ∈ i(I)} = {d ∈ D | 〈j(b), d〉 ∈ j(I)}.<br />
(iii) Let ?φ = ?p (Did Alice help Bill?) i ≡ j (mod φ) if i |= p iff j |= p.<br />
In our update semantics, the effect of a formula on an information state will be defined<br />
in terms of the effects it has on certain elements of the information state. Thus, to state<br />
our update semantics for information states we require an update semantics for sets of<br />
indices as well. The update semantics for sets of indices is fairly simple, and is roughly the<br />
same as that given in Veltman (1996).<br />
DEFINITION 6. Let φ ∈ L1 ∪ L2 be a sentence, and let P be a set of indices. We define the<br />
update of P with φ, written P[φ], as follows:<br />
(i) P[p] = {i ∈ P | i |= p} (ii) P[φ ∧ ψ] = P[φ][ψ]<br />
(iii) P[¬φ] = {i ∈ P | i ∉ P[φ]} (iv) P[⋄φ] = P if P[φ] ≠ ∅<br />
(v) P[⋄φ] = ∅ if P[φ] = ∅<br />
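The clauses of Definition 6 translate almost directly into a recursive function. The following is a minimal sketch, assuming a propositional fragment with two hypothetical atoms, indices modelled as the sets of atoms they make true, and formulas encoded as nested tuples; none of these encoding choices come from the definition itself.<br />

```python
from itertools import combinations

ATOMS = ("p", "q")  # hypothetical atomic sentences


def all_indices(atoms=ATOMS):
    """Every index is modelled as the frozenset of atoms it makes true."""
    return frozenset(
        frozenset(c)
        for r in range(len(atoms) + 1)
        for c in combinations(atoms, r)
    )


def update(P, phi):
    """Return P[phi] following clauses (i)-(v) of Definition 6."""
    op = phi[0]
    if op == "atom":    # (i) keep the indices that verify the atom
        return frozenset(i for i in P if phi[1] in i)
    if op == "and":     # (ii) update sequentially
        return update(update(P, phi[1]), phi[2])
    if op == "not":     # (iii) keep the indices that do not survive phi
        return P - update(P, phi[1])
    if op == "might":   # (iv), (v) a consistency test on P
        return P if update(P, phi[1]) else frozenset()
    raise ValueError(f"unknown operator: {op!r}")


I = all_indices()
p, q = ("atom", "p"), ("atom", "q")
```

For instance, update(I, ("might", p)) returns I unchanged, whereas updating with ⋄p after ¬p yields the empty set.<br />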
We now state our update semantics for information states.<br />
DEFINITION 7. Let Γ = {CG, P1,...,Pn, ∅} be an information state, and let φ ∈ L be a<br />
sentence. We define the update of Γ with φ as follows:<br />
(i) Γ[p] = {CG[p], P1[p],...,Pn[p], ∅}<br />
(ii) Γ[¬φ] = {CG[¬φ], P1[¬φ],...,Pn[¬φ], ∅}<br />
(iii) Γ[φ ∧ ψ] = Γ[φ][ψ]<br />
(iv) Γ[⋄φ] = {CG[⋄φ], P1[⋄φ],...,Pn[⋄φ], ∅} if there is a P ∈ Γ such that P[φ] = P<br />
(v) Γ[⋄φ] = {CG[⋄φ], CG[φ], P1,...,Pn, ∅} if there is no P ∈ Γ such that P[φ] = P<br />
(vi) Γ[?φ] = Γ ∪ {{i | i ≡ j (mod φ)} | j ∈ CG}.<br />
Clauses (i) - (vi) apply as long as CG[φ] ≠ ∅. In the degenerate case that CG[φ] = ∅, we set<br />
Γ[φ] = {∅}, the absurd state.<br />
In the semantics defined above, although epistemic modals can change information<br />
states they cannot have a non-trivial effect on the common ground. This is as it should<br />
be: only constructions that provide information should change the common ground, and<br />
epistemic modals do not play that role in a dialogue. Thus, this semantics complies with<br />
the requirement set forth in the introduction: epistemic modals change a context in a<br />
significant yet non-informative manner.<br />
More specifically, epistemic modals change a context by drawing attention to certain<br />
possibilities. However, the manner in which an epistemic modal accomplishes this depends<br />
on the possibilities that are already salient in the dialogue’s context. If the possibility<br />
an epistemic modal calls attention to is not under discussion at all, then the epistemic<br />
modal adds this possibility to the set of salient possibilities in the context, acting in the<br />
manner specified in clause (v) (see example 4 below). But if this possibility is already<br />
under consideration, an epistemic modal draws attention to it by eliminating salient possibilities<br />
that are inconsistent with it from the context. In this latter case, epistemic modals<br />
act in the manner specified in clause (iv) (see examples 2 and 3 below).<br />
An epistemic modal acts in accordance with clause (iv) when it functions as an answer<br />
to a question. Questions introduce several salient possibilities in a context, and an<br />
epistemic modal acts to draw attention to some answers rather than others. But epistemic<br />
modals aren’t always used to answer questions. For example, they can be used to provide<br />
someone with a warning:<br />
(4) A: Alice and I are going fishing in Leiden tomorrow.<br />
B: It might be illegal to fish in Leiden.<br />
A: Oh, I hadn’t thought to check that; thanks.<br />
B draws A’s attention to the possibility that fishing is illegal in Leiden, a possibility that<br />
A had overlooked but should investigate. Here, it is essential that B’s utterance contributes<br />
a new salient possibility to the context.<br />
Using this framework, we now define the conditions under which a formula φ answers<br />
a question ψ. Note that this definition admits full and partial answers.<br />
DEFINITION 8. Let φ ∈ L, and let ψ ∈ L3. We say that φ answers ψ if 〈{I,∅}[ψ][φ]〉 ⊂ 〈{I,∅}[ψ]〉.<br />
Thus, φ answers ψ if φ removes some salient possibilities that ψ introduces. This<br />
notion of answerhood should be familiar from a partition theory of questions: in both<br />
cases, answering a question amounts to eliminating some of the possibilities it introduces.<br />
But an important, unique feature of this definition is that an answer doesn’t necessarily give<br />
information: it suffices that an answer suggest certain possibilities for the questioner to<br />
consider.<br />
We close this section by working through a few examples. We use the following notational<br />
conventions: {p} = {i ∈ I | i |= p}, {¬p} = {i ∈ I | i ⊭ p} etc.<br />
Example 2: A Polar Question.<br />
Recall example (2), and let p and q be the propositions ‘Bill is coming to the party’ and ‘John<br />
is coming to the party’ respectively. Let Γ = {I, ∅}; then ⋄p ∧ ⋄q answers ?p ∧ ?q: Γ[?p ∧<br />
?q] = {I, {p}, {¬p}, {q}, {¬q}, ∅} = Γ1. Then: Γ1[⋄p ∧ ⋄q] = {I, {p}, {q}, ∅} = Γ2, and<br />
since 〈Γ2〉 ⊂ 〈Γ1〉, ⋄p ∧ ⋄q answers ?p ∧ ?q.<br />
Example 3: A Wh-Question.<br />
Consider the question ‘Who likes to paint?’, and note that ‘Bill might like to paint’ felicitously<br />
answers this question. Let Px be ‘x likes to paint’, and let b be Bill. Let Γ = {I, ∅}.<br />
Then: Γ[?Px] = Γ ∪ {{i | i ≡ j (mod Px)} | j ∈ I}<br />
= Γ ∪ {{i | i(P) = D*} | D* ⊆ D} = Γ1. Then<br />
Γ1[⋄Pb] = Γ ∪ {{i | i(P) = D*}[⋄Pb] | D* ⊆ D}<br />
= Γ ∪ {{i | i(b) ∈ i(P) and i(P) = D*} | D* ⊆ D such that i(b) ∈ D*}<br />
Since Γ1[⋄Pb] ⊂ Γ1, ⋄Pb is an answer to ?Px.<br />
Examples 3 and 4 bring out an important feature of this paper’s framework: epistemic<br />
modals behave much like questions. Both questions and epistemic modals draw attention<br />
to certain possibilities without committing the speaker to a position on whether or not<br />
these possibilities are actual. Epistemic modals, however, are stronger than questions:<br />
modals draw attention to fewer possibilities than questions, suggesting that the chosen<br />
possibilities are somehow more important than the ignored possibilities. The notion of a<br />
salient possibility allows us to represent this similarity between questions and epistemic<br />
modals in a fully formal way.<br />
Example 4: Raising Issues Without Questions.<br />
Recall (4), and let p and q be ‘Alice and A are going fishing in Leiden tomorrow’ and ‘It’s<br />
illegal to fish in Leiden’ respectively. Let Γ = {I, ∅}. Then<br />
Γ[p][⋄q] = {{p}, {p ∧ q}, ∅}. Here, since no possibility in Γ[p] satisfied q, the epistemic<br />
modal acted to add the possibility {p ∧ q} to the context. Thus, even though no questions<br />
have been asked in this context, B is able to bring A’s attention to some issue by using an<br />
epistemic modal.<br />
Example 5: Infelicitous Answer.<br />
Responding to a polar question ?φ with ⋄φ ∧ ⋄¬φ should not count as answering the question:<br />
rather, responding to a question with ‘maybe, maybe not’ is a deliberate and almost<br />
reticent refusal to answer the question. Our semantics allows us to account for this: {I,<br />
∅}[?p][⋄p ∧ ⋄¬p] = {I, {p}, {¬p}, ∅}[⋄p ∧ ⋄¬p]<br />
= {I, {p}, {¬p}, ∅}[⋄p][⋄¬p] = {I, {p}, ∅}[⋄¬p] = {I, {p}, {¬p}, ∅}. Thus,<br />
⋄p ∧ ⋄¬p does not answer ?p. Moreover, ⋄p ∧ ⋄¬p is actually equivalent to ?p in this information<br />
state.<br />
In general, ?φ and ⋄φ ∧ ⋄¬φ are equivalent in any information state that is consistent with<br />
both φ and ¬φ, so polar questions can almost be defined using epistemic modals (if we assume<br />
that polar questions presuppose that both of their answers are possible, polar questions<br />
can be defined in terms of the epistemic modality operator).<br />
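Examples 2, 4 and 5 can be checked mechanically for the propositional case. The sketch below is an illustration, not a definitive implementation, and rests on several assumptions of our own: indices are sets of true atoms, only polar questions are covered, the common ground is taken to be the largest member of a state, the blocks of a question are computed inside the common ground, the empty possibility does not count as a witness for clause (iv), clause (v) leaves the existing possibilities P1,...,Pn untouched (the reading under which Example 5 comes out as stated), and states themselves are compared in place of 〈Γ〉. All function names are hypothetical.<br />

```python
from itertools import combinations

ATOMS = ("p", "q")  # hypothetical atomic sentences
EMPTY = frozenset()
I = frozenset(frozenset(c) for r in range(len(ATOMS) + 1)
              for c in combinations(ATOMS, r))


def upd(P, phi):
    """Update of a set of indices (Definition 6)."""
    op = phi[0]
    if op == "atom":
        return frozenset(i for i in P if phi[1] in i)
    if op == "and":
        return upd(upd(P, phi[1]), phi[2])
    if op == "not":
        return P - upd(P, phi[1])
    if op == "might":
        return P if upd(P, phi[1]) else EMPTY
    raise ValueError(f"unknown operator: {op!r}")


def cg(state):
    """The common ground, assumed here to be the largest member of the state."""
    return max(state, key=len)


def update_state(state, phi):
    """Update of an information state (Definition 7), polar questions only."""
    op = phi[0]
    if op == "and":                                  # clause (iii)
        return update_state(update_state(state, phi[1]), phi[2])
    if op == "ask":                                  # clause (vi), blocks kept inside CG
        common = cg(state)
        yes = upd(common, phi[1])
        blocks = {yes if j in yes else common - yes for j in common}
        return frozenset(state | blocks)
    if not upd(cg(state), phi):                      # degenerate case: absurd state
        return frozenset({EMPTY})
    if op == "might" and not any(P and upd(P, phi[1]) == P for P in state):
        # clause (v), assuming P1,...,Pn stay as they are; CG[might phi] = CG here
        return frozenset(state | {upd(cg(state), phi[1])})
    return frozenset(upd(P, phi) for P in state)     # clauses (i), (ii), (iv)


def answers(phi, psi):
    """Definition 8, comparing the states themselves (an assumption)."""
    asked = update_state(frozenset({I, EMPTY}), psi)
    return update_state(asked, phi) < asked          # proper subset


p, q = ("atom", "p"), ("atom", "q")
START = frozenset({I, EMPTY})
```

Under these assumptions, answers(("and", ("might", p), ("might", q)), ("and", ("ask", p), ("ask", q))) holds as in Example 2, while ⋄p ∧ ⋄¬p fails to answer ?p and has the same effect on {I, ∅} as ?p itself, as in Example 5.<br />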
3 Comparison With a Partition Semantics of Questions<br />
In this section, we will slightly change our semantics to yield a partition theory of questions,¹<br />
and examine the difficulties it faces. These difficulties will bring to light problems<br />
that any partition theory of questions faces in accounting for non-informative answers to<br />
questions, and point to an important feature of the theory above that allows it to account<br />
for non-informative answers. For ease of exposition, we only consider polar questions: in<br />
this section, suppose that we only allow atomic sentences to be well-formed elements of<br />
L1.<br />
Using our terminology, in a partition theory of questions a question divides the common<br />
ground into the salient possibilities corresponding to its different answers. Crucially,<br />
salient possibilities are not added to the context as they were in section 2. Thus, to state<br />
a partition theory of questions in our framework we have to alter the definition of an<br />
information state: we no longer assume an information state contains a maximal set of<br />
indices, and for purposes of this section we remove clause (ii) from the definition of an<br />
information state.<br />
Since information states no longer contain a common ground, clause (v) in the update<br />
semantics for information states is difficult to translate to this new system. For purposes<br />
of this section, then, we also remove clause (v) from this definition, and stipulate that<br />
epistemic modals always change an information state according to clause (iv).<br />
Our partition theory of questions results from changing definition 8 and clause (vi) in<br />
definition 7 to the following.<br />
DEFINITION 9.<br />
(i) Let Γ = {P1,...,Pn} be an information state, and let ?φ ∈ L. Then we define<br />
Γ[?φ] = {P1[φ], P1[¬φ],...,Pn[φ], Pn[¬φ]}<br />
(ii) Let φ ∈ L and let ψ ∈ L3. We say that φ answers ψ if {I}[ψ][φ] ⊂ {I}[ψ].<br />
An immediate problem with this theory is that modal formulas can eliminate blocks of<br />
a partition. This is the case because after a question ?p, ⋄p will eliminate any possibility<br />
that was just updated with ¬p. While this is good in so far as under this theory modal formulas<br />
can answer questions, it also has disastrous consequences. Since modal formulas<br />
can eliminate blocks of a partition, they can provide as much information as non-modal<br />
formulas: for any information state Γ, Γ[?p][⋄p] = Γ[?p][p]. This is the case because both<br />
p and ⋄p will eliminate the possibilities from Γ that have been updated with ¬p and have<br />
no effect on the possibilities that have been updated with p. This is a bad result: Γ[?p][⋄p][¬p]<br />
should be consistent, but Γ[?p][p][¬p] shouldn’t be. While modals and non-modals<br />
should both count as answers to questions, they should not answer questions in the same<br />
way.<br />
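The collapse of ⋄p into p under the partition theory can be reproduced in a small sketch. It assumes, as before, that indices are sets of true atoms and that formulas are nested tuples; the function names ask and tell are hypothetical, and tell applies an update to every block, which is how we read the stipulation that modals always act according to clause (iv).<br />

```python
from itertools import combinations

ATOMS = ("p", "q")  # hypothetical atomic sentences
I = frozenset(frozenset(c) for r in range(len(ATOMS) + 1)
              for c in combinations(ATOMS, r))


def upd(P, phi):
    """Update of a set of indices (Definition 6)."""
    op = phi[0]
    if op == "atom":
        return frozenset(i for i in P if phi[1] in i)
    if op == "and":
        return upd(upd(P, phi[1]), phi[2])
    if op == "not":
        return P - upd(P, phi[1])
    if op == "might":
        return P if upd(P, phi[1]) else frozenset()
    raise ValueError(f"unknown operator: {op!r}")


def ask(state, phi):
    """Definition 9(i): split every block into its phi part and its not-phi part."""
    return frozenset(b for P in state
                     for b in (upd(P, phi), upd(P, ("not", phi))))


def tell(state, phi):
    """Update every block with phi (modals always act as in clause (iv))."""
    return frozenset(upd(P, phi) for P in state)


p = ("atom", "p")
asked = ask(frozenset({I}), p)
```

In this sketch tell(asked, ("might", p)) and tell(asked, p) are the same state, so a further update with ¬p empties every block in both cases, which is the bad result described above.<br />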
On a more general level, the problem with the partition semantics is that any update<br />
has to provide information or add possibilities, and possibilities can only be removed by<br />
information. This leads to trouble with epistemic modals: if one lets an epistemic modal<br />
answer a question, it must provide information and hence function far too much like a<br />
non-modal. But, on the other hand, if one posits that an epistemic modal doesn’t provide<br />
information, there is no way to say how it could change an information state in a way that<br />
answers a question.<br />
¹ For purposes of this paper, a partition semantics for questions is a semantics that holds: (i) a question<br />
changes a context by partitioning an information state, and (ii) to answer a question is to remove blocks<br />
from this partition. The partition semantics given in Groenendijk (1999) is similar to the one we present in<br />
this section.<br />
In the framework presented above this problem is dealt with by separating the common<br />
ground, and hence the information, from the salient possibilities. This change makes non-informative<br />
answers to questions possible: epistemic modals can eliminate possibilities<br />
without changing the information in the common ground. However, by connecting the<br />
meaning of a question to its possible answers in a context, and by identifying answers to<br />
questions with the elimination of possibilities, this approach retains much of the spirit of<br />
the partition theory of questions.<br />
4 Further Issues and Expansions of the System<br />
In this section, I will discuss some expansions of the system defined above and consider<br />
two objections to it.<br />
First, I will discuss the objections. Though the idea that epistemic modals can answer<br />
wh-questions or other complex questions by suggesting possible answers is quite natural,<br />
some readers may find the suggestion that epistemic modals answer polar questions by<br />
suggesting possible answers a bit odd. After all, someone asking a polar question clearly<br />
has both possibilities in mind, so how can simply making one of them more salient in the<br />
context count as felicitously answering her question?<br />
Dealing with this objection involves delving into the pragmatics of epistemic modals,<br />
and more specifically the pragmatic role that salient possibilities play in a context. This<br />
topic would take a great deal of space to treat, and is beyond the scope of this paper. But<br />
to respond to the objection we note that one very plausible pragmatic principle governing<br />
the use of epistemic modals is that, in general, one should only focus attention on some<br />
possibility if one has some reason to believe that it is the case. To see this, note how<br />
infelicitous dialogue (5) sounds:<br />
(5) A: Are John and Bill coming to the party?<br />
B: They might.<br />
A: Why do you say that?<br />
B: I don’t know; they just might.<br />
Thus, pragmatically, answering a polar question with an epistemic modal can commit the<br />
speaker to having some reason to believe that the possibility made salient by her answer<br />
actually obtains. This pragmatic dimension of epistemic modals makes it clear how a<br />
speaker can answer a polar question simply by making one of the possible answers rather<br />
than the other salient in the context.<br />
Another objection to this framework questions the idea that, given the informal description<br />
of salient possibilities in the introduction, it makes sense to say that epistemic<br />
modals actually eliminate salient possibilities that questions introduce. After all, if a question<br />
is answered by an epistemic modal, its possible answers that are inconsistent with the<br />
epistemic modal aren’t completely forgotten about. But in the formal system, these possibilities<br />
have the same status as many other possibilities that the interlocutors haven’t<br />
given any thought to. Thus, this objection concludes, holding that epistemic modals actually<br />
eliminate salient possibilities from a context is far too strong.<br />
We take this objection seriously, and admit that the definition of salient possibilities<br />
given above is too coarse. A better definition would make salience into a scalar notion.<br />
With a scalar notion of salience, we could say that the salient possibilities eliminated by<br />
an epistemic modal acting as an answer to a question are less salient than those still in<br />
the context, but more salient than many other subsets of the common ground. A potential<br />
candidate for this scale is defined below:<br />
Scale.<br />
Let Γ = {CG, P1,...,Pn, ∅} be an information state, and let P ⊆ CG.<br />
(i) P is 1-salient if P ∈ 〈Γ〉 and CG − P ∉ 〈Γ〉<br />
(ii) P is 2-salient if P ∈ 〈Γ〉 and CG − P ∈ 〈Γ〉<br />
(iii) P is 3-salient if P ∉ 〈Γ〉 and CG − P ∈ 〈Γ〉<br />
(iv) P is 4-salient if P ∉ 〈Γ〉 and CG − P ∉ 〈Γ〉<br />
Here, 1-salient propositions are most salient, and 4-salient propositions are least salient.<br />
In general, after an epistemic modal answers a question it changes its possible answers<br />
from 2-salient propositions to either 1-salient propositions or 3-salient propositions, thus<br />
making possible answers either more or less salient and not rendering any forgotten. Thus,<br />
replacing an absolute notion of salience with a scalar one solves the problem raised by the<br />
objection.<br />
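The four-way scale can be written as a small classification function. This is a sketch under two assumptions of our own: the second condition of each clause is read as concerning the complement of P within CG, and the salient possibilities 〈Γ〉 are passed in directly as a set. The name salience and the toy indices w1 to w4 are hypothetical.<br />

```python
def salience(P, cg, salient):
    """Return the salience level (1-4) of P, for P a subset of cg."""
    complement = cg - P  # assumed reading: the complement of P within CG
    if P in salient:
        return 1 if complement not in salient else 2
    return 3 if complement in salient else 4


# Toy common ground with four indices; p holds at w1 and w2 only.
CG = frozenset({"w1", "w2", "w3", "w4"})
P_YES = frozenset({"w1", "w2"})
P_NO = CG - P_YES

after_question = {P_YES, P_NO}  # both answers to ?p are salient
after_modal = {P_YES}           # the modal answer has removed the ¬p block
```

With these values both answers are 2-salient after the question; after the modal answer the p block becomes 1-salient and the ¬p block 3-salient, while an arbitrary subset such as {w1} stays 4-salient.<br />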
In this paper’s semantics, epistemic modals can only focus attention on possibilities that<br />
are subsets of the common ground. This is problematic because some uses of epistemic<br />
modals make possibilities that lie outside of the common ground salient in a conversation.<br />
(6) A: There aren’t any deer in this part of the forest.<br />
B: (2 hours later) Look over there! Hoofprints! There might be deer after all.<br />
These modal assertions also challenge previously accepted information without directly<br />
contradicting it. To account for this use of epistemic modals, one could posit that if ⋄φ<br />
is inconsistent with the common ground of an information state, then ⋄φ acts on this<br />
information state by: (i) introducing a salient possibility corresponding to the revision of<br />
CG with φ, (ii) transforming the information state’s common ground into the union of this<br />
revision and the old common ground, and (iii) performing a similar operation on the other<br />
possibilities in the information state. Thus, though the paper’s theory itself cannot account<br />
for uses of epistemic modals like (6), augmented with a theory of belief revision it can<br />
provide an elegant analysis.<br />
Acknowledgements<br />
I would like to thank Paul Dekker.<br />
References<br />
De Rose, K. (1991). Epistemic possibilities, Philosophical Review 100: 581–605.<br />
Egan, A., Hawthorne, J. and Weatherson, B. (2005). Epistemic modals in context, in<br />
G. Preyer and P. Peter (eds), Contextualism in Philosophy, Oxford University Press,<br />
Oxford, pp. 131–170.<br />
Groenendijk, J. (1999). The logic of interrogation, in T. Matthews and D. Strolovitch<br />
(eds), The Proceedings of the Ninth Conference on Semantics and Linguistic Theory,<br />
CLC Publications, Ithaca, NY, pp. 109–126.<br />
Groenendijk, J. (2007). Inquisitive semantics: Two possibilities for disjunction.<br />
Groenendijk, J. and Stokhof, M. (1997). Questions, in J. van Benthem and A. T. Meulen<br />
(eds), Handbook of Logic and Language, Elsevier.<br />
Stalnaker, R. (1978). Assertion, Syntax and Semantics 9.<br />
Veltman, F. (1996). Defaults in update semantics, Journal of Philosophical Logic 25(3).<br />
BARE PREDICATION AND KINDS ∗<br />
Bert Le Bruyn<br />
Utrecht University<br />
Abstract. This paper treats the distinction between singular nominal predication with and<br />
without indefinite article in languages like Dutch. The former variant is referred to as non-bare<br />
predication, the latter as bare predication. I make the following claims: (i) temporal<br />
analyses of the distinction between bare and non-bare predication are on the wrong track,<br />
(ii) bare predication needn’t be analyzed as a lexical phenomenon, (iii) non-bare predication<br />
should be analyzed as kind-membership predication.<br />
1 Introduction<br />
In order to understand the role played by the indefinite article in predicate position it is instructive<br />
to look at instances of singular nominal predication in which the indefinite article<br />
does not appear. These instances are subsumed under the notion of bare predication (see<br />
(Kupferman, 1991), (Broekhuis, Keizer and Den Dikken, 2003), (de Swart, Winter and<br />
Zwarts, 2005), (de Swart, Winter and Zwarts, 2007), (Matushansky and Spector, 2005),<br />
(Déprez, 2005), (Munn and Schmitt, 2005), (Roy, 2006), (Beyssade and Dobrovie-Sorin,<br />
2005)). In English bare predication is marginal but a language like Dutch seems to have<br />
a productive paradigm:<br />
(1) (a) Jan is slager. (litt. John is butcher) (b) Jan is moslim. (litt. John is muslim)<br />
(c) Jan is Belg. (litt. John is Belgian) (d) Jan is hertog. (litt. John is duke)<br />
Nouns that typically occur in bare predication are linked to professions (1a), religions<br />
(1b), nationalities (1c) and titles (1d). It is important to note that this is not an idiosyncrasy<br />
of Dutch but a pervasive phenomenon in Romance and Germanic languages (examples<br />
taken from (de Swart et al., 2007)):<br />
(2) Es negrero. (Spanish, litt. Is trader in black slaves); João é médico. (Portuguese,<br />
litt. John is doctor); Gianni è dottore. (Italian, litt. John is doctor); Jean est<br />
médecin. (French, litt. John is doctor); Olivier var skuespiller. (Danish, litt. Oliver<br />
was actor); Herr Weber är katolik. (Swedish, litt. Mr Weber is catholic); Han er<br />
lærer. (Norwegian, litt. He is teacher); Er ist praktizierender Katholik. (German,<br />
litt. He is practicing catholic).<br />
∗ This paper should be read as a working paper that presents thoughts and bits of analysis that are not<br />
finished yet. I’m very grateful to audiences at ConSOLE XVI, my UiL-OTS kermit-lecture and the LSB<br />
2008 Linguists’ Day and to the reviewers of the ESSLLI student session for very useful comments and<br />
discussion. Special thanks also to Min Que, Gianluca Giorgolo, Dorota Klimek, Sander Lestrade, Joost<br />
Zwarts and Henriëtte de Swart.<br />
In this paper I will defend three claims concerning bare predication. The first is that analyses<br />
that reduce the distinction between bare and non-bare predication to a temporal one<br />
are not on the right track (see paragraph 2). The second is that a purely lexical approach to<br />
bare predication is not tenable (see paragraph 3). The third and final one is that non-bare<br />
predication should be analyzed as kind-membership predication (see paragraph 4).<br />
2 Bare predication and time<br />
When comparing sentences (3a) and (3b) most informants tend to say that the a-variant is<br />
more ‘eventive’ than the b-variant (Roy, 2006):<br />
(3) (a) Paul est acteur. (French, litt. Paul is actor)<br />
(b) Paul est un acteur. (French, litt. Paul is an actor)<br />
This intuition has led linguists to explore a temporal analysis of bare predication. In its<br />
simplest form it would state that bare predication is concerned with transient properties<br />
whereas non-bare predication is concerned with permanent ones. The most convincing<br />
argument in favour of this analysis comes from ‘lifetime effects’:<br />
(4) (a) Paul était médecin. (French, litt. Paul was doctor)<br />
(b) Paul était un médecin. (French, litt. Paul was a doctor)<br />
Sentence (4a) can be understood as stating that Paul used to be a doctor and that he’s<br />
retired now. Sentence (4b) can only mean that Paul is dead. Under the assumption that<br />
non-bare predication is concerned with permanent properties the interpretation of sentence<br />
(4b) follows: to cancel a permanent property one has to cancel the existence of the<br />
entity the property applies to. The problem this analysis faces is that it predicts that inherently<br />
transient properties should always occur bare in predicate position. This prediction<br />
is not borne out (cf. (de Swart et al., 2007)):<br />
(5) ?? Marie est fille. (French, litt. Mary is girl)<br />
Another temporal approach to bare predication is the one presented in (Roy, 2006) (variants<br />
are (Munn and Schmitt, 2005) and (Déprez, 2005)). Roy assumes all nouns come<br />
with an event argument that has to be bound. When bound by the indefinite article it is<br />
signalled that the predication holds for the maximal event around the ‘time of utterance’<br />
(given by the Tense on the copula) and that this event cannot be split up into smaller intervals.<br />
When bound by Tense it is signalled that the maximal event can be split up. The<br />
facts that led to this analysis are presented in (6) and (7):<br />
(6) (a) Jean est professeur le jour, danseur la nuit.<br />
(French, litt. John is teacher by day, dancer by night)<br />
(b) ?? Jean est un professeur le jour, un danseur la nuit.<br />
(French, litt. John is a teacher by day, a dancer by night)<br />
(7) (a) Paul est devenu chanteur.<br />
(French, litt. Paul has become singer)<br />
(b) ?? Paul est devenu un chanteur.<br />
(French, litt. Paul has become a singer)<br />
The reason why the b-variants are out on Roy’s analysis is that adverbials like le jour<br />
... la nuit (‘by day ... by night’) and verbs like devenir (‘become’) split up the ‘time of<br />
utterance’. This is depicted for the adverbials in (8) and for the verb in (9).<br />
(8) [Figure not reproduced: le jour ... la nuit splitting up the time of utterance]<br />
(9) [Figure not reproduced: devenir splitting up the time of utterance]<br />
It is important to note that in the absence of temporal adverbials or verbs like devenir there is<br />
no clear reason in Roy’s analysis to prefer bare over non-bare predication or vice versa.<br />
In order to account for preferences like in (5) Roy has to assume that whenever world<br />
knowledge makes it implausible / impossible that the maximal event is split up the indefinite<br />
article is obligatory and that whenever world knowledge makes it plausible / possible<br />
that the maximal event is split up the indefinite article ceases to be obligatory.<br />
The problem Roy’s analysis faces is that the incompatibility of non-bare predication with<br />
temporal adverbials or verbs like devenir is only a strong tendency that surfaces as an<br />
epiphenomenon. To show this it is necessary to anticipate the analysis presented in paragraph<br />
4. There it is claimed that non-bare predication signals kind-membership. A sentence<br />
like (10), for example, would mean that White Fang belongs to the kind wolf.<br />
(10) White Fang is een wolf.<br />
(Dutch, litt. White Fang is a wolf)<br />
What makes kind-membership special is that in general one cannot change from one kind<br />
into another. White Fang, for example, cannot turn into a sheep or a wild boar. This explains why<br />
non-bare predication in general is incompatible with temporal adverbials or verbs like devenir.<br />
There are however instances of transformations in nature and in folklore: e.g. the<br />
transformation from a caterpillar into a butterfly and from a man into a werewolf. The former<br />
can be described in a sentence with the verb devenir and the latter in a sentence with<br />
temporal adverbials. Roy’s analysis predicts that in these sentences non-bare predication<br />
is not allowed. An analysis that takes non-bare predication to signal kind-membership<br />
predicts the opposite. As shown by the acceptability of (11) and (12) it is the latter that<br />
makes the right prediction.<br />
(11) In Lady Hawke is Rutger Hauer ’s nachts een wolf en overdag een mens.<br />
(Dutch, lit. In Lady Hawke is Rutger Hauer by night a wolf and by day a man)<br />
(12) La chenille est devenue un papillon.<br />
(French, lit. The caterpillar has become a butterfly)<br />
From <strong>the</strong> preceding I conclude that <strong>the</strong> existing analyses that try to reduce <strong>the</strong> distinction<br />
between bare and non-bare predication to a temporal one are not on <strong>the</strong> right track. It was<br />
important to establish this given that most existing analyses are cast in temporal terms<br />
whereas <strong>the</strong> one I will defend in paragraph 4 is not.<br />
3 Bare predication and the lexicon<br />
In the literature on bare predication one of the following positions is often taken: (i)<br />
nouns that usually appear in non-bare predication are marked in the lexicon (see e.g.<br />
(Matushansky and Spector, 2005)); (ii) nouns that usually appear in bare predication are<br />
marked in the lexicon (see e.g. (de Swart et al., 2005), (de Swart et al., 2007)). In this<br />
section it will be argued that purely lexical standpoints like (i) and (ii) should be amended.<br />
In order to do so it will be shown that:<br />
(a) all nouns that usually appear in bare predication can appear in non-bare predication;<br />
(b) all nouns that usually appear in non-bare predication can appear in bare predication.<br />
It should be noted that (a) and (b) don’t constitute decisive arguments against lexical<br />
analyses. They do however make <strong>the</strong>m less appealing.<br />
3.1 Bare predication nouns<br />
As stated in paragraph 1 there is a subclass of nouns that usually appear in bare predication.<br />
They include nouns related to professions, religions, nationalities and titles. It is<br />
however well known that these nouns appear fairly frequently in non-bare predication too<br />
(see e.g. (de Swart et al., 2005), (de Swart et al., 2007)). When they do, they allow for<br />
both their normal interpretation and an enriched one. This will be illustrated on the basis of<br />
(13):<br />
(13) (a) Sil is beenhouwer. (Dutch, lit. Sil is butcher)<br />
(b) Sil is een beenhouwer. (Dutch, lit. Sil is a butcher)<br />
The a-variant is the unmarked one and simply states that Sil works as a butcher. The b-variant<br />
has the same interpretation but also allows an interpretation according to which<br />
Sil is not a butcher but has the characteristics we usually associate with butchers. A<br />
typical person the b-variant would apply to is a violent boxer. The enriched interpretation<br />
projects the (stereotypical) characteristics that are associated with a profession onto<br />
an individual. From a lexical standpoint one could see the enriched interpretation as an<br />
instance of coercion. Note though that if we store in our world knowledge that butcher is<br />
a profession we can get the same coercion effect to arise.<br />
3.2 Non-bare predication nouns<br />
The majority of nouns in languages like Dutch usually appear in non-bare predication.<br />
To date these nouns have been defined negatively: they are those that are not related to<br />
professions, religions, nationalities and titles.<br />
In the literature there are two claims about nouns appearing in bare predication. The<br />
first is that they are usually [+human] (cf. (Matushansky and Spector, 2005) and (Roy,<br />
2006)). The second is that nouns referring to kinds (which would be a subset of non-bare<br />
predication nouns) can never appear in bare predication (cf. (Kupferman, 1991) and<br />
(Roy, 2006)). In order to argue that all non-bare predication nouns can in principle appear<br />
in bare predication, the strongest claim would therefore be that even [-human] and<br />
[+kind] nouns can appear in bare predication. This is the claim I defend here.<br />
A noun that meets both <strong>the</strong> [-human] and <strong>the</strong> [+kind] criterion is wolf. An example <strong>of</strong><br />
wolf in non-bare predication was given in (10). Its bare variant would look as follows:<br />
(14) Ik ben wolf. (Dutch, lit. I am wolf)<br />
Even though (14) might seem ungrammatical at first sight it is acceptable in Dutch under<br />
a very specific interpretation, viz. <strong>the</strong> one in which wolf is a role in a game (e.g. <strong>the</strong><br />
werewolf game). This should not come as a surprise given that it is <strong>of</strong>ten claimed that<br />
bare predication nouns refer to roles in society:<br />
“[Bare predication nouns] usually [...] denote specific roles in society: professions, religions<br />
or nationalities. Other nominals (non-human or human) that are not related to such<br />
roles generally resist taking up a bare nominal position.” (de Swart et al., 2007)<br />
Under <strong>the</strong> assumption that any noun can be reinterpreted as referring to a role in a game<br />
<strong>the</strong>re is no reason to expect a principled limit on nouns appearing in bare predication.<br />
Note that <strong>the</strong> reinterpretation referred to can be seen as a coercion mechanism from a lexical<br />
standpoint. Once again it is not obvious though that we couldn’t get <strong>the</strong> same effect<br />
through world knowledge.<br />
3.3 Conclusion<br />
In 3.1. and 3.2. it was argued that any noun can appear in both bare predication and<br />
non-bare predication. As noted before <strong>the</strong>se facts cannot be seen as decisive arguments<br />
against a lexical approach. They do however make lexical approaches less appealing and<br />
clear <strong>the</strong> road for non-lexical analyses like <strong>the</strong> one that will be presented in paragraph 4.<br />
4 Bare predication and kinds<br />
In this paragraph I will introduce <strong>the</strong> basic ingredients for an analysis in which non-bare<br />
predication is seen as kind-membership predication. The basic claim is that a sentence<br />
involving non-bare predication should be interpreted as ‘X belongs to the kind Y’. The<br />
paragraph is organized as follows. I first present my background assumptions about kinds<br />
and articles (4.1. and 4.2.). Afterwards I present a pragmatic analysis <strong>of</strong> <strong>the</strong> contrast between<br />
bare and non-bare predication (4.3). I close <strong>the</strong> paragraph defending <strong>the</strong> claim that<br />
<strong>the</strong>re is a one-to-one correspondence between non-bare predication and kind-membership<br />
predication (4.4).<br />
4.1 Background on kinds<br />
I follow Chierchia (1998) in his intuition that kinds are regularities that occur in nature.<br />
This translates into two constraints on kinds and their instantiations. The first (see (15))<br />
captures the intuition that for something to be regular it should be possible to hypothesize that there<br />
could be more than one of it. Note though that for K to qualify as a kind in w0 it is not<br />
necessary for there to be more than one, or even a single, instantiation of K in w0 (this<br />
makes it possible to talk about unicorns, dodos and new inventions as kinds).<br />
(15) For K to be a kind in w0 <strong>the</strong>re has to be at least one world in which K has more<br />
than one instantiation.<br />
The second constraint (see (16)) captures the intuition that the instantiations of kinds<br />
behave in a regular way, i.e. that their kind-membership is not accidental. Note though<br />
that it does not prohibit kinds from displaying properties that vary over time, nor individuals<br />
from starting or ceasing to be instantiations of a kind (this is left to world knowledge).<br />
(16) If k is an instantiation of the kind K in w0 at tn and if k exists in a world wn<br />
accessible from w0 at tn, then k is an instantiation of the kind K in wn at tn.<br />
I will call (15) <strong>the</strong> non-uniqueness constraint and (16) <strong>the</strong> non-accidentality constraint on<br />
kinds and <strong>the</strong>ir instantiations.<br />
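One possible way of spelling out (15) and (16) is sketched below. The notation is an assumption introduced here for illustration only and is not part of the original formulation: Inst(x, K, w, t) reads ‘x is an instantiation of K in w at t’, E(x, w, t) reads ‘x exists in w at t’, and R is the accessibility relation between worlds. The time argument is left out of (15'), since (15) does not mention times, and tn is left free in (16'), as it is in (16).<br />

```latex
% (15') Non-uniqueness: K is a kind in w_0 only if some world
%       contains two distinct instantiations of K.
\mathrm{Kind}(K, w_0) \;\rightarrow\;
  \exists w\, \exists x\, \exists y\,
  \bigl[\, x \neq y \wedge \mathrm{Inst}(x, K, w) \wedge \mathrm{Inst}(y, K, w) \,\bigr]

% (16') Non-accidentality: kind-membership carries over to every
%       accessible world in which the individual exists.
\forall k\, \forall w_n\,
  \bigl[\, \mathrm{Inst}(k, K, w_0, t_n) \wedge R(w_0, w_n) \wedge E(k, w_n, t_n)
  \;\rightarrow\; \mathrm{Inst}(k, K, w_n, t_n) \,\bigr]
```

On this rendering (15') places no requirement on w0 itself, which is what allows kinds without instantiations in w0, and (16') quantifies over worlds but not over times, which is what leaves room for change over time.<br />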
4.2 Background on articles<br />
I follow Partee (1987) in assuming that articles are default type-shifters from type ⟨e,t⟩ to<br />
type e or type ⟨⟨e,t⟩,t⟩. In short this means that they are markers of argumenthood and<br />
that they cannot be omitted in the absence of other determiners in argument position:<br />
(17) *I have cat.<br />
(18) *Man came to see me.<br />
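The type-shifting view of articles can be made concrete with a small Haskell sketch over a toy domain. Everything in it (the entities, the predicates, and the names iotaShift and aShift) is a hypothetical illustration rather than Partee's own formalization: predicates play the role of type ⟨e,t⟩, the definite shift returns an entity only when the predicate picks out exactly one, and the indefinite shift returns a generalized quantifier.<br />

```haskell
-- Toy model of type-shifting over a small finite domain.
-- All names below are illustrative assumptions, not taken from Partee (1987).

data Entity = Felix | Tom | Rex
  deriving (Eq, Show, Enum, Bounded)

-- Type <e,t>: a predicate over entities.
type Pred = Entity -> Bool

-- Type <<e,t>,t>: a generalized quantifier.
type GQ = Pred -> Bool

domain :: [Entity]
domain = [minBound .. maxBound]

cat, dog :: Pred
cat x = x `elem` [Felix, Tom]
dog x = x == Rex

-- Definite shift, <e,t> to e: defined only if exactly one entity
-- satisfies the predicate (uniqueness).
iotaShift :: Pred -> Maybe Entity
iotaShift p = case filter p domain of
  [x] -> Just x
  _   -> Nothing

-- Indefinite shift, <e,t> to <<e,t>,t>: existential quantification,
-- which says nothing about how many entities satisfy the predicate.
aShift :: Pred -> GQ
aShift p q = any (\x -> p x && q x) domain
```

In this toy model iotaShift dog succeeds because there is a single dog, whereas iotaShift cat is undefined; aShift cat works regardless of the number of cats, which mirrors the claim that the indefinite is unmarked for uniqueness.<br />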
I fur<strong>the</strong>rmore follow (Hawkins, 1991) and (Farkas, 2002) in assuming that <strong>the</strong> definite<br />
article is a uniqueness marker whereas <strong>the</strong> indefinite article is unmarked for uniqueness.<br />
This means that (19) signals that <strong>the</strong>re is only one teacher present in a particular setting<br />
whereas (20) is in principle neutral with respect to <strong>the</strong>re being one or more teachers.<br />
(19) I saw <strong>the</strong> teacher.<br />
(20) I saw a teacher.<br />
As Hawkins and Farkas note, however, by choosing the indefinite<br />
instead of the definite the speaker triggers the implicature that there is more than one<br />
teacher.<br />
Finally, in line with Partee’s type-shifting analysis I expect indefinite articles to be omissible<br />
in predicate position. The instances <strong>of</strong> bare predication treated in this paper show that<br />
this expectation is borne out. The crucial question is why <strong>the</strong>y cannot always be omitted.<br />
The answer, I claim, does not lie in <strong>the</strong> semantics but in <strong>the</strong> pragmatics. The pragmatic<br />
analysis I defend is presented in 4.3.<br />
4.3 Non-bare predication and non-uniqueness<br />
The analysis I defend is cast in (Weak) Bi-directional Optimality Theory (cf. (Blutner,<br />
2000)) and is based on five standard assumptions. The first is that bare and non-bare<br />
predication are truth-conditionally equivalent (cf. (Partee, 1987)). The second assumption<br />
is that both bare and non-bare predication in principle trigger an implicature of non-uniqueness.<br />
This assumption builds on <strong>the</strong> insights <strong>of</strong> Hawkins and Farkas according to<br />
whom not using <strong>the</strong> definite triggers an implicature <strong>of</strong> non-uniqueness. The third assumption<br />
is that non-bare predication is syntactically more marked than bare predication (cf.<br />
(de Swart and Zwarts, To appear)). Syntactic markedness can be understood in terms <strong>of</strong><br />
projections: whereas non-bare predication involves DPs, bare predication only involves<br />
NPs (or NumPs). The fourth assumption is that conveying non-uniqueness is semantically<br />
more marked than conveying neutrality with respect to uniqueness (cf. (de Swart and<br />
Zwarts, To appear)). Semantic markedness can be understood in terms <strong>of</strong> compatibility:<br />
non-uniqueness is compatible with neutrality but neutrality is not necessarily compatible<br />
with non-uniqueness. The fifth and final assumption is that unmarked forms and meanings<br />
are preferred over marked forms and meanings (a standard assumption in <strong>the</strong> OT<br />
literature). The resulting (Weak) Bi-directional OT tableau is presented in (21).<br />
(21)<br />
What comes out <strong>of</strong> this analysis is that bare predication is neutral with respect to uniqueness<br />
whereas non-bare predication marks non-uniqueness.<br />
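The pairing of forms and meanings described above can be reproduced with a minimal Haskell sketch of weak bidirectional optimization. It is not a reconstruction of tableau (21): the numeric markedness costs and the names blocks and superOptimal are assumptions made for illustration. Following the first assumption (truth-conditional equivalence), every form is taken to be compatible with every meaning. A pair is kept if no already accepted pair shares its meaning with a less marked form, or shares its form with a less marked meaning.<br />

```haskell
import Data.List (sortOn)

data Form = Bare | NonBare deriving (Eq, Show)
data Meaning = Neutral | NonUnique deriving (Eq, Show)

type Pair = (Form, Meaning)

-- Assumed markedness costs: lower means less marked.
formCost :: Form -> Int
formCost Bare    = 0
formCost NonBare = 1

meaningCost :: Meaning -> Int
meaningCost Neutral   = 0
meaningCost NonUnique = 1

-- All form-meaning pairs compete with each other.
candidates :: [Pair]
candidates = [(f, m) | f <- [Bare, NonBare], m <- [Neutral, NonUnique]]

-- p blocks q if p expresses the same meaning with a less marked form,
-- or gives the same form a less marked meaning.
blocks :: Pair -> Pair -> Bool
blocks (f', m') (f, m) =
  (m' == m && formCost f' < formCost f) ||
  (f' == f && meaningCost m' < meaningCost m)

-- Visit pairs from least to most marked; accept a pair unless an
-- already accepted (super-optimal) pair blocks it.
superOptimal :: [Pair] -> [Pair]
superOptimal = foldl step [] . sortOn cost
  where
    cost (f, m) = formCost f + meaningCost m
    step accepted p
      | any (`blocks` p) accepted = accepted
      | otherwise                 = accepted ++ [p]
```

Under these assumptions superOptimal candidates keeps exactly two pairs: the unmarked form with the unmarked meaning, and the marked form with the marked meaning. The pair (NonBare, NonUnique) survives because its only competitors, (Bare, NonUnique) and (NonBare, Neutral), are themselves blocked by (Bare, Neutral).<br />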
4.4 Kinds and non-bare predication<br />
In 4.1. I claimed - on the basis of common intuitions - that kinds are subject to a non-uniqueness<br />
constraint. In 4.3. I claimed - on the basis of standard assumptions - that<br />
non-bare predication marks non-uniqueness whereas bare predication is neutral with respect<br />
to uniqueness. When we combine both claims it follows that non-bare predication<br />
is best suited to signal kind-membership. As I will demonstrate in what follows this is<br />
indeed what it does in languages like Dutch. I will show this on the basis of five predictions<br />
that follow from the claim that there is a one-to-one correspondence between non-bare<br />
predication and kind-membership predication.<br />
The first prediction is that all predication involving kind-membership has to involve <strong>the</strong><br />
indefinite article. That this is <strong>the</strong> case has been suggested by (Kupferman, 1991) and<br />
(Roy, 2006) and as far as I know this has never been challenged. Note that (14) is not<br />
a counterexample. (14) shows that bare predication may involve nouns that are usually<br />
associated with kinds but it is not an instance of kind-membership predication. Note also<br />
that kinds are not restricted to plants or animals but may involve things as diverse as bottles<br />
and chairs, in so far as they show sufficiently regular behaviour (see 4.1).<br />
The second prediction my claim about non-bare predication and kind-membership makes<br />
is that bare predication should be concerned with <strong>the</strong> predication <strong>of</strong> properties that are unlike<br />
those that link a kind to its instantiations. In view <strong>of</strong> <strong>the</strong> non-accidentality constraint<br />
on kinds and <strong>the</strong>ir instantiations (see 4.1) it is <strong>the</strong>n predicted that bare predication is concerned<br />
with accidental properties. To see that this is exactly what happens it is instructive<br />
to look at those nouns that usually appear in bare predication: nouns linked to professions,<br />
religions, nationalities and titles. These “do not depend on the inherent, natural properties<br />
of a person or what the person actually does, but on the social or cultural status of that<br />
person” (de Swart et al., 2007).<br />
The third prediction is that whenever a noun that is usually associated with kinds is used<br />
in bare predication it is reinterpreted in such a way that it no longer predicates a non-accidental<br />
property. An example was given in (14): being a wolf in (14) is an accidental<br />
property that comes with <strong>the</strong> distribution <strong>of</strong> roles in a game.<br />
The fourth prediction is that whenever a noun that is usually not associated with kinds<br />
is used in non-bare predication it is reinterpreted in such a way that it starts predicating<br />
non-accidental properties. An example was given in (13b): for Sil to be a butcher is no<br />
longer seen as an accidental property but rather as something that is linked to his inherent<br />
properties. This explains why Sil needn’t be a butcher by profession to make (13b) true.<br />
The fifth prediction is that whenever it is not clear whether something is an accidental<br />
property or not there is variation in the predication that is used. One telling example is<br />
that of diseases like alcoholism. According to some, alcoholism is a disease that people<br />
may or may not get; according to others, alcoholics are themselves responsible and are<br />
not sick in the classical meaning of the word. Interestingly this division is reflected in the<br />
use of the more clinical alcoholieker (Dutch, ‘alcoholic’) and the more popular drinker<br />
(Dutch, ‘drinker’). On Google I found the former 43 times in bare predication and 8 times<br />
in non-bare predication whereas the latter appeared 364 times in non-bare predication and<br />
only 6 times in bare predication. 1<br />
5 Conclusion<br />
This paper started out as an investigation into <strong>the</strong> role <strong>of</strong> <strong>the</strong> indefinite article in predicate<br />
position. The analysis I defended is that through its competition with <strong>the</strong> bare form it<br />
marks non-uniqueness which in turn can be linked to kind-membership predication. This<br />
analysis is attractive in at least three respects. The first is that <strong>the</strong> indefinite article maintains<br />
its standard semantics and pragmatics and is not reduced to a vacuous item. The<br />
second is that it <strong>of</strong>fers a formalizable alternative to temporal analyses that were shown to<br />
make wrong predictions. The third is that it brings toge<strong>the</strong>r intuitions and claims from<br />
work on kinds and work on bare predication that lend <strong>the</strong>mselves to an interesting remix.<br />
1 The Google search was done on www.google.nl (restricted to Dutch pages) and concerned searches of<br />
the form “is drinker” / “is alcoholieker”.<br />
References<br />
Beyssade, C. and Dobrovie-Sorin, C. (2005). Bare predicate nominals in Dutch, Proceedings<br />
of SALT 15.<br />
Blutner, R. (2000). Some aspects <strong>of</strong> optimality in natural language interpretation, Journal<br />
<strong>of</strong> Semantics 17.<br />
Broekhuis, H., Keizer, E. and Den Dikken, M. (2003). Modern grammar <strong>of</strong> Dutch. Occasional<br />
papers 4, Tilburg University, Tilburg.<br />
de Swart, H., Winter, Y. and Zwarts, J. (2005). Bare predicate nominals in Dutch, in<br />
E. Maier, C. Bary and J. Huitink (eds), Proceedings of SuB9.<br />
de Swart, H., Winter, Y. and Zwarts, J. (2007). Bare nominals and reference to capacities,<br />
Natural Language and Linguistic Theory 25.<br />
de Swart, H. and Zwarts, J. (To appear). Nominals with and without an article: Distribution,<br />
interpretation and variation, in P. Hendriks, H. de Hoop, I. Krämer, H. de Swart<br />
and J. Zwarts (eds), Conflicts in Interpretation.<br />
Déprez, V. (2005). Morphological number, semantic number and bare nouns, Lingua 115.<br />
Farkas, D. (2002). Specificity distinctions, Journal <strong>of</strong> Semantics 19.<br />
Hawkins, J. (1991). On (in)definite articles: implicatures and (un)grammaticality prediction,<br />
Journal <strong>of</strong> Linguistics 27.<br />
Kupferman, L. (1991). Structure événementielle de l’alternance un / ∅ devant les noms<br />
humains attributs, Langage 102.<br />
Matushansky, O. and Spector, B. (2005). Tinker, tailor, soldier, spy, in E. Maier, C. Bary<br />
and J. Huitink (eds), <strong>Proceedings</strong> <strong>of</strong> SuB9.<br />
Munn, A. and Schmitt, C. (2005). Number and indefinites, Lingua 115.<br />
Partee, B. (1987). Noun phrase interpretation and type-shifting principles, in J. Groenendijk,<br />
D. de Jongh and M. Stokhof (eds), Studies in Discourse Representation<br />
Theory and the Theory of Generalized Quantifiers, Foris, Dordrecht.<br />
Roy, I. (2006). Non-verbal predications: a syntactic analysis of predicational copular<br />
sentences, PhD thesis, University of Southern California.<br />
DIAGRAMMATIC REASONING<br />
WITH ENHANCED STATIC CONSTRAINTS<br />
James Burton<br />
University <strong>of</strong> Brighton<br />
Abstract. This paper reports on ongoing work to create a proof-carrying Domain Specific<br />
Embedded Language (DSEL) for diagrammatic logics, using Euler diagrams as a case study.<br />
The DSEL is written in Haskell with type system extensions that allow the exploitation of<br />
a combination of ideas from Constructive Type Theory. These extensions offer an increase<br />
in expressiveness over Hindley-Milner type systems and have been used for program verification.<br />
We use these extensions to create enhanced static constraints to enforce invariants<br />
on diagrams and transformations (inference rules). Our work is at an early stage and we<br />
describe the goals and challenges ahead. The major goal is to create a DSEL for generalized<br />
constraint diagrams, a visual logic expressive enough to be useful for modelling software,<br />
and to extract the types of the resulting diagrams for use as software artefacts.<br />
1 Introduction<br />
A great deal <strong>of</strong> effort is spent on attempts to increase s<strong>of</strong>tware reliability and <strong>the</strong> productivity<br />
<strong>of</strong> programmers, by both <strong>the</strong> research community and <strong>the</strong> s<strong>of</strong>tware industry. Of<br />
<strong>the</strong> techniques employed (development methodologies, systematic modelling, automated<br />
testing), formal methods have been little used outside <strong>of</strong> <strong>the</strong> most safety-critical sectors<br />
where <strong>the</strong>y are used to verify semantic properties <strong>of</strong> s<strong>of</strong>tware and to assure desired runtime<br />
conditions. We believe <strong>the</strong> benefits <strong>of</strong> <strong>the</strong>ir more widespread use could be great, but<br />
<strong>the</strong> impact <strong>of</strong> factors inhibiting adoption needs to be reduced. These factors may include<br />
<strong>the</strong> fact that existing techniques are seen as difficult to use, time-consuming and requiring<br />
specialised expertise. There is, <strong>the</strong>refore, a need for more “lightweight” formal methods<br />
which are accessible to programmers with a minimum <strong>of</strong> specialised training and which<br />
fit in seamlessly with the tools they employ. Sheard has said that enabling programmers<br />
to make statements about semantic properties of the code they write directly, rather than<br />
turning to external tools with high barriers to entry (likely to be written by, and for, mathematicians),<br />
will make it more likely that they do so; in short, that the semantic gap<br />
between the tools for programming and those for formal reasoning is damaging to the<br />
cause of both (Sheard, 2004).<br />
At the same time as the Unified Modelling Language (UML) was adopted as a standard<br />
visual language for modelling software in the 1990s, breakthroughs occurred in the<br />
use of diagrams as visual logics (Shin, 1994; Hammer, 1995). Shin proved soundness and<br />
completeness results for <strong>the</strong> so-called Venn-II reasoning system, equivalent in expressive<br />
power to Monadic First Order Logic, and research began into a number <strong>of</strong> diagrammatic<br />
reasoning systems varying in notation and expressive power. The connection between<br />
<strong>the</strong> new formalised diagrams and those used in s<strong>of</strong>tware modelling was quickly made.<br />
Although <strong>the</strong> UML works well to describe <strong>the</strong> architecture <strong>of</strong> a system it is not always expressive<br />
enough to capture all invariants we might wish to enforce, a fact which led to <strong>the</strong><br />
development <strong>of</strong> <strong>the</strong> (non-graphical) Object Constraint Language (OCL). Kent proposed<br />
constraint diagrams as a purely diagrammatic alternative to <strong>the</strong> OCL, more appropriately<br />
complementing <strong>the</strong> UML’s visual nature (Kent, 1997). The constraint diagram in figure 1<br />
shows a constraint in a library management system. Amongst o<strong>the</strong>r things it states that<br />
people can only borrow books that are in <strong>the</strong> collections <strong>of</strong> libraries <strong>the</strong>y have joined.<br />
Figure 1: A constraint diagram and an Euler diagram.<br />
There are many reasons that we might want to use diagrams to represent information,<br />
including <strong>the</strong> potential <strong>of</strong> diagrams for well matchedness and free rides. A diagram is<br />
well matched to its subject if it presents <strong>the</strong> key features <strong>of</strong> that subject effectively and in<br />
a way that seems intuitively clear to <strong>the</strong> viewer (Gurr and Tourlas, 2000). A well matched<br />
diagram can make certain reasoning tasks appear to be easier when compared with a symbolic<br />
representation <strong>of</strong> <strong>the</strong> same information. Free rides occur when a diagram provides<br />
some information ‘naturally’ or ‘for free’ which would need to be explicitly stated in, or<br />
derived from, a symbolic representation (Shimojima, 2004). For example, in <strong>the</strong> Euler diagram<br />
in figure 1, <strong>the</strong> fact that <strong>the</strong> contour Spaniels is placed within GunDogs asserts directly<br />
that Spaniels ⊆ GunDogs but also allows <strong>the</strong> viewer to infer Spaniels ⊆ Dogs and<br />
Spaniels ∩ Cats = ∅. Details <strong>of</strong> well matchedness and free rides in constraint diagrams<br />
can be found in (Stapleton and Delaney, 2008). In some circumstances <strong>the</strong> expressive<br />
power <strong>of</strong> diagrams can produce ambiguity, or lead <strong>the</strong> viewer to make false inferences.<br />
However, many diagrammatic notations now have formal, unambiguous semantics, <strong>of</strong><br />
which Euler and constraint diagrams are prominent examples.<br />
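The free ride described for the Euler diagram can be illustrated with a small Haskell sketch. Since the figure is not reproduced here, the arrangement of contours is an assumption inferred from the inferences stated in the text (Spaniels inside GunDogs, GunDogs inside Dogs, Dogs and Cats drawn apart), and the names insideOf, separate, subsetOf and disjoint are hypothetical. In the symbolic encoding the derived facts must be computed explicitly, whereas the diagram shows them at a glance.<br />

```haskell
import Data.List (nub)

type Contour = String

-- Facts drawn directly in the diagram (assumed layout).
-- Each pair is (inner contour, outer contour).
insideOf :: [(Contour, Contour)]
insideOf = [("Spaniels", "GunDogs"), ("GunDogs", "Dogs")]

-- Pairs of contours drawn with no overlap.
separate :: [(Contour, Contour)]
separate = [("Dogs", "Cats")]

-- All contours enclosing c, directly or indirectly, including c itself.
-- Terminates as long as the containment facts contain no cycle.
enclosing :: Contour -> [Contour]
enclosing c =
  nub (c : concat [enclosing outer | (inner, outer) <- insideOf, inner == c])

-- Derived subset relation: b encloses a.
subsetOf :: Contour -> Contour -> Bool
subsetOf a b = b `elem` enclosing a

-- Derived disjointness: some contour enclosing a is drawn apart
-- from some contour enclosing b.
disjoint :: Contour -> Contour -> Bool
disjoint a b =
  or [ (x, y) `elem` separate || (y, x) `elem` separate
     | x <- enclosing a, y <- enclosing b ]
```

With these facts, subsetOf "Spaniels" "Dogs" and disjoint "Spaniels" "Cats" both hold, although neither was stated directly; each requires a traversal of the containment facts that the viewer of the diagram gets for free.<br />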
Our ultimate goal is to create a Domain Specific Embedded Language (DSEL) for<br />
several systems <strong>of</strong> diagrammatic reasoning, with two main aims: to explore <strong>the</strong> benefits<br />
and boundaries <strong>of</strong> <strong>the</strong> emerging style <strong>of</strong> programming that mixes formal methods with<br />
programming, and to support <strong>the</strong> work which aims to establish visual logics as a valuable<br />
tool in formal methods.<br />
The DSEL will be written in Haskell and will consist <strong>of</strong> statically verified code which<br />
will allow <strong>the</strong> user to manipulate and reason with a variety <strong>of</strong> visual logics such as Euler<br />
diagrams, spider diagrams and constraint diagrams (see Section 2). The DSEL, therefore,<br />
shares one of the primary aims of visual logics: to make formal reasoning more accessible<br />
and widely used. Reasoning about design and reasoning about implementation have traditionally<br />
taken place in separate phases of the software process, with the onus on the programmer<br />
to bridge the gap between the two. One of the benefits of combining both activities in one<br />
phase is that constraints modelled by a programmer using <strong>the</strong> DSEL will form s<strong>of</strong>tware<br />
components in <strong>the</strong>ir own right, resulting in diagrams with <strong>the</strong> same type as functions in<br />
<strong>the</strong> modelled system. This suggests that such constraints could eventually form part <strong>of</strong><br />
working s<strong>of</strong>tware, perhaps as part <strong>of</strong> a “trusted kernel” used by o<strong>the</strong>r components, following<br />
<strong>the</strong> approach <strong>of</strong> (Kiselyov and Shan, 2007). The form and function <strong>of</strong> <strong>the</strong> DSEL will<br />
<strong>the</strong>refore be closely linked — a formally specified language to assist formal reasoning.<br />
Advances in Programming Language Theory are typically explored in research languages<br />
before percolating into more widely used languages. This is especially true <strong>of</strong><br />
modern functional languages and, in particular, Haskell. The Haskell type system with<br />
the extensions provided by the GHC compiler makes it possible to explore what Sheard<br />
called (when speaking <strong>of</strong> <strong>the</strong> closely related language Ωmega) “a new point in <strong>the</strong> design<br />
space <strong>of</strong> formal reasoning systems — part programming language, part logical framework”<br />
(Sheard, 2004) and to do so directly within <strong>the</strong> environment <strong>of</strong> a practical language<br />
with efficient implementations. The language features that enable this can be used to emulate<br />
<strong>the</strong> behaviour <strong>of</strong> fully dependently typed languages such as Epigram (Altenkirch,<br />
McBride and McKinna, 2005), resulting in what have been called “pseudo-dependently<br />
typed” systems, described in Section 4. The syntactic clarity, referential transparency<br />
and similarity to ma<strong>the</strong>matical notation <strong>of</strong> functional languages are also <strong>of</strong> benefit to us.<br />
These features help us in our goal to minimise syntactic differences between <strong>the</strong> DSEL<br />
and <strong>the</strong> diagrammatic logics we implement, making it easier to demonstrate a clear mapping<br />
between <strong>the</strong> two. The point <strong>of</strong> this mapping is to demonstrate “literal preservation<br />
<strong>of</strong> syntactic relations under denotation”, as Hammer states <strong>the</strong> conditions for resemblance<br />
between a sign and that which it signifies (Hammer, 1995).<br />
In Section 2 we describe reasoning with Euler diagrams. Section 3 gives an overview<br />
<strong>of</strong> type <strong>the</strong>oretic features making <strong>the</strong>ir way into programming languages while Section<br />
4 looks ahead to <strong>the</strong> form our DSEL will take, using Euler diagrams as a case study. In<br />
Section 5 we consider <strong>the</strong> goals <strong>of</strong> <strong>the</strong> research, evaluate <strong>the</strong> strategies used to reach <strong>the</strong>m<br />
and identify some <strong>of</strong> <strong>the</strong> challenges ahead.<br />
2 Reasoning with Euler diagrams<br />
Although diagrams have <strong>of</strong>ten been used to aid understanding in ma<strong>the</strong>matical pro<strong>of</strong>s,<br />
<strong>the</strong>y have until fairly recently been treated as informal and secondary to formalized symbolic<br />
content. In <strong>the</strong> 1990s <strong>the</strong> work <strong>of</strong> Shin began to put diagrams on a different standing<br />
by proving soundness and completeness results for <strong>the</strong> Venn-II reasoning system, an extension<br />
and formalisation <strong>of</strong> earlier work by Venn and Peirce (Shin, 1994). Stapleton<br />
provides a summary <strong>of</strong> <strong>the</strong> history <strong>of</strong> diagrammatic reasoning since <strong>the</strong>n, which is now<br />
a rapidly evolving and active research area (Stapleton, 2007). What makes such logics<br />
interesting, given <strong>the</strong> existence <strong>of</strong> mature symbolic reasoning techniques, is <strong>the</strong> combination<br />
<strong>of</strong> formal reasoning with <strong>the</strong> compact and intuitive nature <strong>of</strong> diagrams referred<br />
to previously. We expect that this, and <strong>the</strong> efforts to create supporting tools, will make<br />
formal reasoning more accessible to non-logicians.<br />
An Euler diagram is a collection <strong>of</strong> closed curves called contours which represent sets,<br />
within an enclosing rectangle. Figure 2 shows an example with three contours, labelled<br />
A, B and C. Containment, intersection and disjointness are represented by <strong>the</strong> placement<br />
<strong>of</strong> contours, so <strong>the</strong> same diagram asserts C ⊆ A and B ∩ C = ∅. A zone is a set <strong>of</strong><br />
points in <strong>the</strong> diagram that can be described as being inside certain contours and outside<br />
all others. The diagram in figure 2 has five zones: one inside A but outside B and C, one<br />
inside A and C but outside B, and so forth. The region outside all contours is also a<br />
zone. Shading within a zone asserts the emptiness of the set represented by that zone. So,<br />
the shading of the diagram in figure 2 asserts A ∩ B = ∅ and A − C = ∅.<br />
Figure 2: An Euler diagram.<br />
Reasoning is carried out by <strong>the</strong> application <strong>of</strong> rules which transform one diagram into<br />
ano<strong>the</strong>r, such as Add Contour and Remove Shading; a sound and complete set is given in<br />
(Stapleton, Masth<strong>of</strong>f, Flower, Fish and Sou<strong>the</strong>rn, 2007). A pro<strong>of</strong> using Euler diagrams is<br />
formed by applying <strong>the</strong>se rules repeatedly to transform an initial diagram (<strong>the</strong> premise)<br />
into <strong>the</strong> target diagram (<strong>the</strong> conclusion); figure 3 shows a short example. The Add Shaded<br />
Zone rule is applied to transform d1 to d2. A new shaded zone can be added at any<br />
time since both a shaded zone and a missing zone assert <strong>the</strong> emptiness <strong>of</strong> <strong>the</strong> represented<br />
set; both d1 and d2 state that A and B are disjoint. The Add Contour rule is applied<br />
to transform d2 to d3. The new contour C intersects all existing zones without changing<br />
<strong>the</strong>ir shading. Since this operation introduces no new shading and <strong>the</strong> way that C is added<br />
ensures that no missing zones are created, d2 and d3 have <strong>the</strong> same meaning.<br />
Figure 3: An Euler diagram pro<strong>of</strong>.<br />
The diagrams are formalised using an abstract syntax. The abstraction <strong>of</strong> Euler diagrams<br />
that we present here is obtained from (Stapleton et al., 2007). Each zone is represented<br />
as a tuple <strong>of</strong> <strong>the</strong> set <strong>of</strong> labels <strong>of</strong> contours that <strong>the</strong> zone is inside and <strong>the</strong> set <strong>of</strong><br />
labels <strong>of</strong> contours <strong>the</strong> zone is outside. For example, in diagram d1, figure 3, <strong>the</strong> only zone<br />
inside A has <strong>the</strong> abstraction ({A}, {B}). Diagrams are represented as a tuple <strong>of</strong> <strong>the</strong> set<br />
of labels (L), the set of zones (Z) and the set of shaded zones (Z*). Thus, diagram d2 in<br />
figure 3 has abstraction:<br />
$$\langle L = \{A, B\},\; Z = \{(\{A\}, \{B\}), (\{B\}, \{A\}), (\{A, B\}, \emptyset), (\emptyset, \{A, B\})\},\; Z^{*} = \{(\{A, B\}, \emptyset)\} \rangle$$<br />
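The abstraction can be transcribed almost literally as ordinary Haskell values. The following is a minimal value-level sketch, not the type-level encoding developed later in the paper; the names Label, Zone, Diagram, zone, d2 and shadedAreZones are illustrative assumptions, as is the choice of Char for labels and of Data.Set (from the containers package) for sets.<br />

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

-- Contour labels; plain characters are an illustrative choice.
type Label = Char

-- A zone: the labels of the contours it lies inside and outside.
data Zone = Zone
  { inside  :: Set Label
  , outside :: Set Label
  } deriving (Eq, Ord, Show)

-- A diagram: the labels L, the zones Z and the shaded zones Z*.
data Diagram = Diagram
  { labels :: Set Label
  , zones  :: Set Zone
  , shaded :: Set Zone
  } deriving (Eq, Show)

-- Hypothetical helper: build a zone from two lists of labels.
zone :: [Label] -> [Label] -> Zone
zone ins outs = Zone (Set.fromList ins) (Set.fromList outs)

-- Diagram d2 of figure 3, transcribed from the tuple above.
d2 :: Diagram
d2 = Diagram
  { labels = Set.fromList "AB"
  , zones  = Set.fromList [zone "A" "B", zone "B" "A", zone "AB" "", zone "" "AB"]
  , shaded = Set.fromList [zone "AB" ""]
  }

-- Sanity check: every shaded zone is one of the diagram's zones.
shadedAreZones :: Diagram -> Bool
shadedAreZones d = shaded d `Set.isSubsetOf` zones d
```

In this form well-formedness can only be checked at run time, which is the gap that static constraints are meant to close.<br />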
There are a number <strong>of</strong> logics that extend this system <strong>of</strong> Euler diagrams, including spider<br />
diagrams (Howse, Stapleton and Taylor, 2005) and <strong>the</strong> constraint diagrams mentioned<br />
previously.<br />
3 Dependent Typing and Pro<strong>of</strong>-Carrying Code<br />
The Curry-Howard Isomorphism has a long history and arises from <strong>the</strong> observation <strong>of</strong> a<br />
correspondence between Hilbert-style deductive logic and combinatory models <strong>of</strong> computation.<br />
The work <strong>of</strong> Martin-Löf cast it as a more general principle linking logical formalisms<br />
and <strong>the</strong> type systems <strong>of</strong> programming languages (Martin-Löf, 1984). Ra<strong>the</strong>r<br />
than classifying values, types can be viewed as propositions; a value inhabiting type T<br />
corresponds to a pro<strong>of</strong> <strong>of</strong> T. Martin-Löf’s type <strong>the</strong>ory can be used as an environment for<br />
programming with dependent types (Nordstrom, Petersson and Smith, 1990). Dependent<br />
type systems are so-called because types may depend on a value, such as List a n, <strong>the</strong><br />
type <strong>of</strong> collections <strong>of</strong> elements <strong>of</strong> type a with length n. For different values <strong>of</strong> n we have<br />
different types. A sketch <strong>of</strong> <strong>the</strong> logical rules for type-safe list operations is given as type<br />
judgements below. We assume <strong>the</strong> types Nat (<strong>of</strong> Peano numbers with constructors Zero<br />
and Succ n) and List a n. Γ is a typing context and Γ ⊢ σ type means that σ is a type in<br />
Γ.<br />
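$$\Gamma \vdash \mathsf{Nat}\ \mathsf{type} \qquad \Gamma \vdash \mathsf{Zero} : \mathsf{Nat} \qquad \frac{\Gamma \vdash n : \mathsf{Nat}}{\Gamma \vdash \mathsf{Succ}\ n : \mathsf{Nat}}$$<br />
$$\frac{\Gamma \vdash t\ \mathsf{type}}{\Gamma \vdash \mathit{empty}\ t : \mathsf{List}\ t\ \mathsf{Zero}}$$<br />
$$\frac{\Gamma \vdash t\ \mathsf{type} \quad \Gamma \vdash x : t \quad \Gamma \vdash n : \mathsf{Nat} \quad \Gamma \vdash l : \mathsf{List}\ t\ n}{\Gamma \vdash \mathit{cons}\ x\ l : \mathsf{List}\ t\ (\mathsf{Succ}\ n)}$$<br />
$$\frac{\Gamma \vdash t\ \mathsf{type} \quad \Gamma \vdash n : \mathsf{Nat} \quad \Gamma \vdash l : \mathsf{List}\ t\ (\mathsf{Succ}\ n)}{\Gamma \vdash \mathit{tail}\ l : \mathsf{List}\ t\ n}$$<br />
$$\frac{\Gamma \vdash t\ \mathsf{type} \quad \Gamma, n : \mathsf{Nat} \vdash l : \mathsf{List}\ t\ (\mathsf{Succ}\ n)}{\Gamma \vdash \mathit{head}\ l : t}$$<br />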
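The same rules can be approximated in Haskell with a GADT indexed by type-level Peano numbers. This is a sketch under stated assumptions: Zero and Succ are empty types standing in for the values of Nat, and the names Empty, Cons, safeHead, safeTail and toList are illustrative rather than taken from any library.<br />

```haskell
{-# LANGUAGE GADTs #-}

-- Empty types used purely as type-level Peano numbers.
data Zero
data Succ n

-- Lists whose length is recorded in the second type parameter.
data List a n where
  Empty :: List a Zero
  Cons  :: a -> List a n -> List a (Succ n)

-- head is only defined on lists of length Succ n, as in the rule above.
safeHead :: List a (Succ n) -> a
safeHead (Cons x _) = x

-- tail removes one element and decrements the length index.
safeTail :: List a (Succ n) -> List a n
safeTail (Cons _ xs) = xs

-- Forget the length index, e.g. for printing.
toList :: List a n -> [a]
toList Empty       = []
toList (Cons x xs) = x : toList xs
```

An application such as safeHead Empty is rejected by the type checker, because Zero does not match Succ n; no run-time check is involved.<br />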
Dependent type <strong>the</strong>ory makes Curry-Howard (or propositions-as-types) useful in practical<br />
ways. The resulting type systems form <strong>the</strong> basis <strong>of</strong> automated <strong>the</strong>orem provers<br />
(Bertot and Casteran, 2004) and, on <strong>the</strong> o<strong>the</strong>r hand, purely functional and total programming<br />
languages (Altenkirch et al., 2005). The same insights inform more widely used<br />
languages at an accelerating rate, especially Haskell, which plays <strong>the</strong> dual rôle <strong>of</strong> research<br />
language and practical tool. The type system <strong>of</strong> Haskell with extensions is flexible<br />
enough to emulate many aspects <strong>of</strong> dependent typing and to create programs whose types<br />
act as pro<strong>of</strong> that <strong>the</strong>ir implementation conforms to <strong>the</strong>ir specification.<br />
4 Haskell and <strong>the</strong> DSEL for Euler Diagrams<br />
Implementation of our diagrammatic DSEL is at the prototype stage. Its foundation is a type-level<br />
Set library which encodes and ensures constraints such as set membership, disjointness<br />
and so on. Above this will sit <strong>the</strong> implementation <strong>of</strong> several diagrammatic logics.<br />
Two diagrammatic transformations corresponding to inference rules in an Euler diagram<br />
system are presented as type judgements below.<br />
In a language such as Haskell we may not mix types and terms in <strong>the</strong> way described in<br />
Section 3. The collection of techniques used to achieve what is often called “pseudo-dependent<br />
typing” includes type-level representations of the indexing term supplied to the<br />
type constructor. To use the example from Section 3: since we have no type-level numbers,<br />
we represent n in List a n by types formed from the empty Haskell type constructors Z and<br />
Succ n, such as Succ (Succ Z).<br />
It is important to distinguish type-level from term-level computations. At the term<br />
level of a programming language with partial functions the result of any function may be<br />
undefined (⊥), and so programs are not proofs. Type functions like union below are not<br />
functions over values, are defined extensionally and exclude the undefined. The DSEL<br />
comprises two main components: the domain-specific, dependently typed theory<br />
of diagrammatic reasoning, which provides assurances about the correct formation of<br />
diagrams and application of reasoning rules, and the interactive front end, which makes use<br />
of this type system and is subject to the usual limitations of the host language. Although<br />
we do not use a dependently typed host language, our approach is similar in spirit to<br />
(Oury and Swierstra, 2008) who use Agda to enforce sophisticated constraints statically<br />
in a series <strong>of</strong> DSELs.<br />
Since type-level values are distinct from terms, special measures are required to handle<br />
<strong>the</strong>m at runtime. We use a combination <strong>of</strong> techniques involving empty and existential<br />
types (Peyton Jones, 2008) to do this. As an example <strong>of</strong> our strategy, <strong>the</strong> types A, B and<br />
C below are empty types used to represent <strong>the</strong> labels <strong>of</strong> contours in a diagram:<br />
data A ; data B ; data C<br />
data L a where<br />
&nbsp;&nbsp;AL :: L A<br />
&nbsp;&nbsp;BL :: L B<br />
data Nil<br />
data t ⊲ ts<br />
The type L a lifts labels into a more general type, allowing us to consider labels <strong>of</strong> any<br />
type. The type constructors Nil and ⊲ are used as <strong>the</strong> building blocks <strong>of</strong> sets <strong>of</strong> labels.<br />
LBox and LSetBox use “existential boxing” to wrap type-level values <strong>of</strong> LSet t, allowing<br />
us to handle <strong>the</strong> outer type at runtime but for <strong>the</strong> “boxed” value to remain available for<br />
inspection by constraints:<br />
data LSet t where<br />
&nbsp;&nbsp;Empty :: LSet Nil<br />
&nbsp;&nbsp;Ins :: L a → LSet t → LSet (a ⊲ t)<br />
data LBox = ∀a. LBox (L a)<br />
data LSetBox = ∀t. LSetBox (LSet t)<br />
By creating a function fromChar :: Char → LBox we can box runtime values and insert<br />
<strong>the</strong>m into boxed sets with a function that calls on fromChar, insertChar :: Char →<br />
LSetBox → LSetBox. When insertChar is used to add elements to a set <strong>of</strong> type<br />
LSetBox, a correspondence is enforced between <strong>the</strong> collection <strong>of</strong> values and <strong>the</strong> type<br />
<strong>of</strong> its LSet t parameter. The value <strong>of</strong> a collection can be seen as fully determined by<br />
<strong>the</strong> type <strong>of</strong> this parameter, which is a pro<strong>of</strong> ensuring that inserted elements are members<br />
<strong>of</strong> <strong>the</strong> resulting collection. Assurances for <strong>the</strong> semantics <strong>of</strong> sets may be encoded using<br />
constraints written using Indexed Type Families (Peyton Jones, 2008).<br />
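The boxing strategy described in this section can be sketched as a small compilable module. Several details are assumptions made for illustration: the ASCII operator :> stands in for ⊲, a constructor CL is added for the label C, fromChar and insertChar return Maybe so that unknown characters are handled (the text gives them the types Char → LBox and Char → LSetBox → LSetBox), and labelChar, toChars and unbox are hypothetical helpers for inspecting a boxed set.<br />

```haskell
{-# LANGUAGE GADTs, TypeOperators, ExistentialQuantification #-}

-- Empty types representing contour labels.
data A
data B
data C

-- Building blocks for type-level sets of labels.
data Nil
data t :> ts
infixr 5 :>

-- Term-level witnesses for the label types.
data L a where
  AL :: L A
  BL :: L B
  CL :: L C

-- A set of labels whose contents are mirrored in its type index.
data LSet t where
  Empty :: LSet Nil
  Ins   :: L a -> LSet t -> LSet (a :> t)

-- Existential boxes hide the index so values can be built at run time.
data LBox    = forall a. LBox (L a)
data LSetBox = forall t. LSetBox (LSet t)

-- Box a run-time character; Nothing for characters that are not labels.
fromChar :: Char -> Maybe LBox
fromChar 'A' = Just (LBox AL)
fromChar 'B' = Just (LBox BL)
fromChar 'C' = Just (LBox CL)
fromChar _   = Nothing

-- Insert a run-time character into a boxed set.
insertChar :: Char -> LSetBox -> Maybe LSetBox
insertChar c (LSetBox s) =
  case fromChar c of
    Nothing       -> Nothing
    Just (LBox l) -> Just (LSetBox (Ins l s))

-- Recover the character for a label witness.
labelChar :: L a -> Char
labelChar AL = 'A'
labelChar BL = 'B'
labelChar CL = 'C'

-- List the labels held in an indexed set, most recently inserted first.
toChars :: LSet t -> String
toChars Empty     = ""
toChars (Ins l s) = labelChar l : toChars s

-- Open a box just long enough to read its labels.
unbox :: LSetBox -> String
unbox (LSetBox s) = toChars s
```

This sketch does not yet reject duplicate labels; that is the kind of assurance the type-family constraints mentioned above would have to add.<br />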
4.1 Judgement Rules<br />
We model <strong>the</strong> tuples found in <strong>the</strong> Euler diagram abstraction with <strong>the</strong> types Z l1 l2 (zones)<br />
and D l z z* (diagrams). The type judgements below are a fragment of a self-contained<br />
type <strong>the</strong>ory <strong>of</strong> Euler diagrams based on <strong>the</strong> abstract syntax given in (Stapleton et al.,<br />
2007). Once complete, this type <strong>the</strong>ory will be implemented using <strong>the</strong> techniques in <strong>the</strong><br />
previous section to produce a DSEL with enhanced static constraints.<br />
Two kinds of element appear in the judgement rules: typing judgements, e.g. x : a,<br />
and type constraints, e.g. Γ ⊢ C x y type, meaning that the type C can be formed in<br />
<strong>the</strong> context Γ. Type constraints presumed to be defined in <strong>the</strong> Set library such as Disjoint<br />
appear capitalised, while functions from types to types are in lower case, e.g. union.<br />
We use <strong>the</strong> constraints Label, LabelSet, Zone and ZoneSet to restrict <strong>the</strong> input to type<br />
constructors.<br />
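Supplied with disjoint sets of labels, l1 and l2, Z constructs a zone:<br />
$$\frac{\Gamma \vdash \mathsf{LabelSet}\ l_1\ \mathsf{type} \quad \Gamma \vdash \mathsf{LabelSet}\ l_2\ \mathsf{type} \quad \Gamma \vdash \mathsf{Disjoint}\ l_1\ l_2\ \mathsf{type}}{\Gamma \vdash \mathsf{Z}\ l_1\ l_2\ \mathsf{type}}$$<br />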
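One way to realise a Disjoint premise in GHC is as a type-level predicate that must reduce to True before a zone can be constructed. The sketch below uses promoted data kinds and closed type families, which are more recent GHC features than those the paper relies on, so it should be read as an illustration of the idea and not as the DSEL's implementation. The names Member, mkZone and zoneA are hypothetical, labels are restricted to a three-element kind for brevity, and type-level lists stand in for sets.<br />

```haskell
{-# LANGUAGE DataKinds, KindSignatures, TypeFamilies, TypeOperators,
             UndecidableInstances #-}

import Data.Type.Bool

-- Labels, used only at the type level via promotion.
data Label = A | B | C

-- Type-level membership test on lists of labels.
type family Member (x :: Label) (xs :: [Label]) :: Bool where
  Member x '[]       = 'False
  Member x (x ': xs) = 'True
  Member x (y ': xs) = Member x xs

-- Two label lists are disjoint if no element of the first occurs in the second.
type family Disjoint (xs :: [Label]) (ys :: [Label]) :: Bool where
  Disjoint '[]       ys = 'True
  Disjoint (x ': xs) ys = Not (Member x ys) && Disjoint xs ys

-- A zone indexed by the labels it is inside (l1) and outside (l2).
data Zone (l1 :: [Label]) (l2 :: [Label]) = Zone

-- Smart constructor: only type checks when l1 and l2 are disjoint.
mkZone :: (Disjoint l1 l2 ~ 'True) => Zone l1 l2
mkZone = Zone

-- Accepted: {A} and {B} are disjoint.
zoneA :: Zone '[ 'A] '[ 'B]
zoneA = mkZone

-- Rejected by the type checker if uncommented, since A occurs on both sides:
-- bad :: Zone '[ 'A] '[ 'A, 'B]
-- bad = mkZone
```

The constraint plays the role of the third premise of the rule: if it cannot be discharged, the zone type cannot be formed through mkZone.<br />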
The syntactic rules state that, given a diagram D l z z*, the zones z form a superset of<br />
the shaded zones z*. Also, for each zone Z l1 l2 in z and z*, l1 and l2 form a partition of<br />
l. The Invs rule applies these constraints to a diagram:<br />
$$\frac{\Gamma \vdash \mathsf{Invs}\ l\ z\ \mathsf{type} \quad \Gamma \vdash \mathsf{Invs}\ l\ z^{*}\ \mathsf{type} \quad \Gamma \vdash \mathsf{Subset}\ z^{*}\ z\ \mathsf{type}}{\Gamma \vdash \mathsf{D}\ l\ z\ z^{*}\ \mathsf{type}}$$<br />
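The Inv rule applies the relevant constraint to an individual zone, and Invs applies Inv to<br />
every zone in a set. The base case for Invs is:<br />
$$\frac{\Gamma \vdash \mathsf{Label}\ l\ \mathsf{type}}{\Gamma \vdash \mathsf{Invs}\ l\ \mathsf{Nil}\ \mathsf{type}}$$<br />
The inductive case for Invs is:<br />
$$\frac{\Gamma \vdash \mathsf{Label}\ l\ \mathsf{type} \quad \Gamma \vdash \mathsf{ZoneSet}\ (z \triangleleft zs)\ \mathsf{type} \quad \Gamma \vdash \mathsf{Inv}\ l\ z\ \mathsf{type} \quad \Gamma \vdash \mathsf{Invs}\ l\ zs\ \mathsf{type}}{\Gamma \vdash \mathsf{Invs}\ l\ (z \triangleleft zs)\ \mathsf{type}}$$<br />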
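Since l1 ∩ l2 = ∅, they partition l if l1 ∪ l2 = l:<br />
$$\frac{\Gamma \vdash \mathsf{Z}\ l_1\ l_2\ \mathsf{type} \quad \Gamma \vdash u : \mathit{union}\ l_1\ l_2 \quad \Gamma \vdash \mathsf{LabelSet}\ ls\ \mathsf{type} \quad \Gamma \vdash \mathsf{Eq}\ l\ u\ \mathsf{type}}{\Gamma \vdash \mathsf{Inv}\ ls\ (\mathsf{Z}\ l_1\ l_2)\ \mathsf{type}}$$<br />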
The quotes that begin <strong>the</strong> following subsections are from (Stapleton et al., 2007) from<br />
which we take reasoning rules and translate <strong>the</strong>m to typing judgements. The invariants are<br />
not tested after applying <strong>the</strong> rules since previous judgements guarantee that if a diagram<br />
can be formed, <strong>the</strong> invariants have been met.<br />
4.1.1 Remove Shaded Zone<br />
“A shaded zone can be removed but only if <strong>the</strong>re is at least one zone inside each contour<br />
in <strong>the</strong> resulting diagram and <strong>the</strong> zone outside all <strong>the</strong> contours remains”. In figure 4, <strong>the</strong><br />
Remove Shaded Zone rule can be applied to transform d1 into d2.<br />
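$$\frac{\Gamma \vdash \mathsf{Zone}\ x\ \mathsf{type} \quad \Gamma \vdash \mathsf{D}\ l\ z\ z^{*}\ \mathsf{type} \quad \Gamma \vdash z' : \mathit{delete}\ x\ z \quad \Gamma \vdash z^{*\prime} : \mathit{delete}\ x\ z^{*} \quad \Gamma \vdash \mathsf{Member}\ x\ z^{*}\ \mathsf{type}}{\Gamma \vdash \mathit{transform}\ \mathsf{RemoveShadedZone}\ x\ (\mathsf{D}\ l\ z\ z^{*}) : (\mathsf{D}\ l\ z'\ z^{*\prime})}$$<br />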
4.1.2 Add Contour<br />
Figure 4: Three Euler diagrams.<br />
“A contour can be added to a diagram provided its label is not already in <strong>the</strong> diagram. Each<br />
zone is split into two zones (one inside and one outside <strong>the</strong> new contour), and shading is<br />
preserved”. In figure 4 <strong>the</strong> Add Contour rule can be applied to transform d1 into d3.<br />
Before we can add contours we need a way <strong>of</strong> replacing all zones z : Z l1 l2 in a set<br />
with two copies <strong>of</strong> itself, one with an extra label added to l1, one with that same label<br />
added to l2.<br />
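$$\frac{\Gamma \vdash \mathsf{Label}\ c\ \mathsf{type}}{\Gamma \vdash \mathit{splitZones}\ c\ \mathsf{Nil} : \mathsf{Nil}}$$<br />
$$\frac{\Gamma \vdash \mathsf{ZoneSet}\ (z \triangleleft zs)\ \mathsf{type} \quad \Gamma \vdash \mathsf{Label}\ c\ \mathsf{type} \quad \Gamma \vdash z_2 : \mathit{insertLabel}\ \mathsf{Excl}\ c\ z \quad \Gamma \vdash z_1 : \mathit{insertLabel}\ \mathsf{Incl}\ c\ z}{\Gamma \vdash \mathit{splitZones}\ c\ (z \triangleleft zs) : (z_1 \triangleleft z_2 \triangleleft (\mathit{splitZones}\ c\ zs))}$$<br />
$$\frac{\Gamma \vdash \mathsf{Z}\ l_1\ l_2\ \mathsf{type} \quad \Gamma \vdash \mathsf{Label}\ c\ \mathsf{type} \quad \Gamma \vdash l_3 : c \triangleleft l_1}{\Gamma \vdash \mathit{insertLabel}\ \mathsf{Incl}\ c\ (\mathsf{Z}\ l_1\ l_2) : (\mathsf{Z}\ l_3\ l_2)}$$<br />
$$\frac{\Gamma \vdash \mathsf{Z}\ l_1\ l_2\ \mathsf{type} \quad \Gamma \vdash \mathsf{Label}\ c\ \mathsf{type} \quad \Gamma \vdash l_3 : c \triangleleft l_2}{\Gamma \vdash \mathit{insertLabel}\ \mathsf{Excl}\ c\ (\mathsf{Z}\ l_1\ l_2) : (\mathsf{Z}\ l_1\ l_3)}$$<br />
$$\frac{\Gamma \vdash \mathsf{Label}\ c\ \mathsf{type} \quad \Gamma \vdash \mathsf{D}\ l\ z\ z^{*}\ \mathsf{type} \quad \Gamma \vdash l' : c \triangleleft l \quad \Gamma \vdash z' : \mathit{splitZones}\ c\ z \quad \Gamma \vdash z^{*\prime} : \mathit{splitZones}\ c\ z^{*}}{\Gamma \vdash \mathit{transform}\ \mathsf{AddContour}\ c\ (\mathsf{D}\ l\ z\ z^{*}) : (\mathsf{D}\ l'\ z'\ z^{*\prime})}$$<br />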
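The Add Contour judgements translate fairly directly into type-level functions. The following sketch assumes promoted data kinds and closed type families (later GHC features than those discussed in the paper) and uses type-level lists in place of the Set library; the family names InsertIncl, InsertExcl, SplitZones and AddContour, the synonym D1 and the value check are all hypothetical. Like the final rule above, it does not test that the new label is absent from the diagram.<br />

```haskell
{-# LANGUAGE DataKinds, KindSignatures, TypeFamilies, TypeOperators,
             UndecidableInstances #-}

import Data.Type.Equality ((:~:) (..))

-- Labels, zones and diagrams, used at the type level via promotion.
data Label   = A | B | C
data Zone    = Z [Label] [Label]       -- inside labels, outside labels
data Diagram = D [Label] [Zone] [Zone] -- labels, zones, shaded zones

-- Add the new label to the "inside" component of a zone.
type family InsertIncl (c :: Label) (z :: Zone) :: Zone where
  InsertIncl c ('Z l1 l2) = 'Z (c ': l1) l2

-- Add the new label to the "outside" component of a zone.
type family InsertExcl (c :: Label) (z :: Zone) :: Zone where
  InsertExcl c ('Z l1 l2) = 'Z l1 (c ': l2)

-- Replace every zone by two copies, one inside and one outside contour c.
type family SplitZones (c :: Label) (zs :: [Zone]) :: [Zone] where
  SplitZones c '[]       = '[]
  SplitZones c (z ': zs) = InsertIncl c z ': InsertExcl c z ': SplitZones c zs

-- Add a contour: extend the labels and split both zone collections.
type family AddContour (c :: Label) (d :: Diagram) :: Diagram where
  AddContour c ('D l z zs) = 'D (c ': l) (SplitZones c z) (SplitZones c zs)

-- A one-contour diagram with no shading.
type D1 = 'D '[ 'A] '[ 'Z '[ 'A] '[], 'Z '[] '[ 'A] ] '[]

-- Compile-time check: Refl only type checks if both sides reduce to the same type.
check :: AddContour 'C D1
     :~: 'D '[ 'C, 'A]
            '[ 'Z '[ 'C, 'A] '[], 'Z '[ 'A] '[ 'C], 'Z '[ 'C] '[ 'A], 'Z '[] '[ 'C, 'A] ]
            '[]
check = Refl
```

Because the result of the transformation is computed by the type checker, a definition such as check documents the expected diagram and fails to compile if the rule is encoded incorrectly.<br />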
5 Conclusions and Fur<strong>the</strong>r Work<br />
We have presented part <strong>of</strong> a DSEL for Euler diagrams that closely mirrors <strong>the</strong>ir abstract<br />
syntax and which allows us to inherit <strong>the</strong> definitions <strong>of</strong> reasoning rules in a seamless<br />
way. We have extended the approach of section 4.1 to a complete set of reasoning rules,<br />
providing a type <strong>the</strong>oretical version <strong>of</strong> Euler diagrams. Providing a self-contained type<br />
<strong>the</strong>ory for <strong>the</strong> DSEL (beginning with <strong>the</strong> simplest case <strong>of</strong> a set <strong>of</strong> rules for reasoning with<br />
Euler diagrams and extending this to more complex cases) will make results relating to<br />
<strong>the</strong> logics (soundness, completeness, etc.) transferable, giving <strong>the</strong> DSEL <strong>the</strong> status <strong>of</strong> a<br />
reasoning tool in its own right.<br />
Our goal is to extend <strong>the</strong> current approach to more expressive notations, such as generalized<br />
constraint diagrams, which are expressive enough to be used when modelling<br />
s<strong>of</strong>tware (Stapleton and Delaney, 2008). It is ultimately expected that <strong>the</strong> DSEL will be<br />
used by higher level tools which allow <strong>the</strong> user to select from contextually legitimate diagram<br />
transformations. Diagrams created using <strong>the</strong> DSEL (with or without <strong>the</strong> support <strong>of</strong><br />
additional tools) will have a type which captures <strong>the</strong> modelled constraint. If <strong>the</strong> modelled<br />
s<strong>of</strong>tware is written in <strong>the</strong> same language as <strong>the</strong> constraint and <strong>the</strong>re is a correspondence<br />
between <strong>the</strong> datatypes used in each, we may be able to use <strong>the</strong> constraint as part <strong>of</strong> a<br />
“trusted kernel” exporting a safe subset <strong>of</strong> constructors via <strong>the</strong> module system. This scenario,<br />
in which <strong>the</strong> programmer uses tools to model constraints <strong>the</strong>n applies <strong>the</strong>m directly<br />
within <strong>the</strong> implementation phase, will provide a more unified and, ideally, a more usable<br />
programming/verification environment than exists today.<br />
Combining types with terms requires careful design. Some <strong>of</strong> <strong>the</strong> solutions, such as<br />
existential boxing, introduce levels <strong>of</strong> indirection which are unnecessary in more specialised<br />
environments and which may threaten to obscure <strong>the</strong> relationship with underlying<br />
diagrammatic logics, at least superficially. If we were to use a language such as<br />
Coq or Epigram to implement <strong>the</strong> DSEL it is possible that we could find a more natural<br />
expression <strong>of</strong> many types and constraints. We believe however, given our central aim <strong>of</strong><br />
accessibility, that <strong>the</strong>se risks are <strong>of</strong>fset by <strong>the</strong> benefits <strong>of</strong> using a more practical and accessible<br />
language than is available in <strong>the</strong> current generation <strong>of</strong> dependently typed systems.<br />
The limitations <strong>of</strong> <strong>the</strong>se techniques and how <strong>the</strong>y might be used to form a general strategy<br />
to combine verification and programming are some <strong>of</strong> <strong>the</strong> subjects <strong>of</strong> <strong>the</strong> research. The<br />
research will support <strong>the</strong> longer term goals <strong>of</strong> <strong>the</strong> diagrammatic reasoning community by<br />
providing an implementation <strong>of</strong> various visual logics which can be clearly linked to <strong>the</strong>ir<br />
related abstract syntax. Once extended to <strong>the</strong> case <strong>of</strong> constraint diagrams, <strong>the</strong> DSEL has<br />
<strong>the</strong> potential to shrink <strong>the</strong> toolchain used by programmers who wish to make statements<br />
about <strong>the</strong> semantic properties <strong>of</strong> <strong>the</strong> code <strong>the</strong>y write. There are a number <strong>of</strong> interesting<br />
challenges involved in reaching that point, such as <strong>the</strong> issue <strong>of</strong> extracting <strong>the</strong> type <strong>of</strong> a<br />
diagram in a usable form. The work reported in this paper is a first step towards achieving<br />
<strong>the</strong>se goals.<br />
Acknowledgements<br />
I would like to express my sincere thanks to John Howse, Gem Stapleton and Richard<br />
Bosworth for <strong>the</strong>ir support and encouragement, and to <strong>the</strong> anonymous reviewers for <strong>the</strong>ir<br />
helpful comments. The author is supported by EPSRC Grant EP/P501318/1.<br />
References<br />
Altenkirch, T., McBride, C. and McKinna, J. (2005). Why dependent types matter, Available<br />
online http://www.cs.nott.ac.uk/~txa/publ/ydtm.pdf Accessed 01/02/08.<br />
Bertot, Y. and Casteran, P. (2004). Interactive Theorem Proving and Program Development,<br />
Springer-Verlag.<br />
Gurr, C. and Tourlas, K. (2000). Towards <strong>the</strong> principled design <strong>of</strong> s<strong>of</strong>tware engineering<br />
diagrams, <strong>Proceedings</strong> <strong>of</strong> 22nd International Conference on S<strong>of</strong>tware Engineering,<br />
ACM Press, pp. 509–518.<br />
Hammer, E. (1995). Logic and Visual Information, CSLI, Stanford.<br />
Howse, J., Stapleton, G. and Taylor (2005). Spider diagrams, LMS Journal <strong>of</strong> Computation<br />
and Ma<strong>the</strong>matics 8: 145–194.<br />
Kent, S. (1997). Constraint diagrams: Visualizing invariants in object oriented modelling,<br />
<strong>Proceedings</strong> <strong>of</strong> OOPSLA97, ACM Press, pp. 327–341.<br />
Kiselyov, O. and Shan, C.-C. (2007). Lightweight static capabilities, Electronic Notes in<br />
Theoretical Computer Science 174(7): 79–104.<br />
Martin-Löf, P. (1984). Constructive ma<strong>the</strong>matics and computer programming, Royal Society<br />
<strong>of</strong> London Philosophical Transactions Series A pp. 501–518.<br />
Nordstrom, B., Petersson, K. and Smith, J. M. (1990). Programming in Martin-Löf’s Type<br />
Theory, OUP.<br />
Oury, N. and Swierstra, W. (2008). The power of pi, Submitted to ICFP 2008.<br />
Available online http://www.cs.nott.ac.uk/~wss/Publications/ThePowerOfPi.pdf Accessed<br />
01/05/08.<br />
Peyton Jones, S. (2008). GHC language features, Accessed 01/02/08<br />
http://www.haskell.org/ghc/docs/latest/html/users_guide/ghc-language-features.html.<br />
Sheard, T. (2004). Languages of the future, SIGPLAN Notices 39(12): 119–132.<br />
Shimojima, A. (2004). Inferential and expressive capacities <strong>of</strong> graphical representations:<br />
Survey and some generalizations, <strong>Proceedings</strong> <strong>of</strong> Diagrams 2004, Vol. 2980<br />
<strong>of</strong> LNAI, Springer, pp. 18–21.<br />
Shin, S. J. (1994). The Logical Status <strong>of</strong> Diagrams, CUP.<br />
Stapleton, G. (2007). Diagrammatic logics: Past, present and future, International Conference<br />
on Logic, Navya Nyaya and Applications, Jadavpur University, pp. 4–15.<br />
Stapleton, G. and Delaney, A. (2008). Evaluating and generalizing constraint diagrams,<br />
Accepted for Journal <strong>of</strong> Visual Languages and Computing. Available online from<br />
JVLC.<br />
Stapleton, G., Masth<strong>of</strong>f, J., Flower, J., Fish, A. and Sou<strong>the</strong>rn, J. (2007). Automated <strong>the</strong>orem<br />
proving in Euler diagrams systems, Journal <strong>of</strong> Automated Reasoning 39: 431–<br />
470.<br />
FICTIONAL CONTINGENCIES<br />
Gemma Celestino<br />
University <strong>of</strong> British Columbia & LOGOS Research Group<br />
Abstract. I argue that fictional contingencies, such as the one that, in Tolstoy’s Anna Karenina,<br />
Anna Karenina might not have fallen for Vronsky, pose a serious problem for a descriptivist<br />
and possible worlds view of fiction such as the one defended by David Lewis and<br />
Gregory Currie. Their view cannot account for the fact that in Tolstoy’s Anna Karenina, it<br />
is Anna Karenina herself who contingently falls for Vronsky. In Tolstoy’s Anna Karenina,<br />
Anna Karenina falls for Vronsky in <strong>the</strong> actual world but she fails to fall for him in some<br />
possible world.<br />
1 Introduction<br />
An interesting issue that arises within <strong>the</strong> topic <strong>of</strong> fiction is <strong>the</strong> issue <strong>of</strong> how to account for<br />
the intuitive contingencies of fictional characters. For at least some of the things that happen<br />
to fictional characters within a story are supposed to happen only contingently. Certain views<br />
of fiction could account for these modal properties of fictional characters in a way<br />
that I think is mistaken, and in this paper I shall argue why.<br />
Gregory Currie recently advanced such an account in his “Characters and Contingency”<br />
(2003). But his account is one that must be attractive to any follower <strong>of</strong> <strong>the</strong><br />
Lewis-Currie descriptivist view <strong>of</strong> fictional names, or <strong>of</strong> what I take would be a natural<br />
two-dimensionalist extension <strong>of</strong> Robert Stalnaker’s position on true negative existentials<br />
and related matters. The account, in fact, only makes sense within a possible worlds<br />
framework <strong>of</strong> fiction. In short, <strong>the</strong> descriptivist view is <strong>the</strong> view that fictional names,<br />
unlike ordinary proper names, are, or are used by <strong>the</strong> author <strong>of</strong> <strong>the</strong> fiction, as non-rigid<br />
definite descriptions.<br />
First, I shall explain <strong>the</strong> problem <strong>of</strong> fictional contingencies and argue that <strong>the</strong> explanation<br />
Currie offered does not work. I will also argue that this is a real problem for the descriptivist<br />
view of fiction. Secondly, I shall consider other alternatives to descriptivism<br />
within the possible worlds framework and conclude that no possible worlds view of<br />
fiction looks promising. Finally, I will end with some positive suggestions that I hope<br />
to develop elsewhere.<br />
2 The Problem <strong>of</strong> Fictional Contingencies<br />
I shall motivate <strong>the</strong> problem I want to address in this paper by introducing <strong>the</strong> following<br />
pair <strong>of</strong> sentences:<br />
(1) Necessarily, someone who did not fall for Vronsky would not be Anna Karenina<br />
(2) Someone who necessarily fell for Vronsky would not be Anna Karenina<br />
Despite <strong>the</strong> apparent inconsistency between <strong>the</strong>se two claims, both seem intuitively<br />
true. (1) is true because anything that a fictional story tells about its characters is essential<br />
to <strong>the</strong>m. Tolstoy’s story about Anna Karenina tells us, among o<strong>the</strong>r things, that Anna<br />
Karenina falls for Vronsky. Hence, unlike what happens to non-fictional people like you<br />
and me, and due to its fictionality, it is a constitutive feature <strong>of</strong> Anna Karenina that she<br />
falls for Vronsky. Thus, it is necessary that she does. (2) is true because Tolstoy’s story<br />
is not a story in which Anna Karenina cannot but fall for Vronsky, but a story in which<br />
Anna Karenina falls for Vronsky only contingently. Thus, anyone who necessarily fell<br />
for Vronsky, who fell for Vronsky not contingently, would not be Anna Karenina.<br />
The apparent incompatibility or tension between (1) and (2) cannot be explained in<br />
terms <strong>of</strong> <strong>the</strong> distinction between truth in fiction and truth simpliciter, or any o<strong>the</strong>r similar<br />
distinction. For both seem to be true on one and the same reading. Neither of them is true<br />
in <strong>the</strong> fiction. Ra<strong>the</strong>r, <strong>the</strong>y are about <strong>the</strong> fictional character Anna Karenina. They specify<br />
some <strong>of</strong> its necessary qualities.<br />
3 The Descriptivist Way Out <strong>of</strong> <strong>the</strong> Problem<br />
The view I want to show wrong in this paper would accept <strong>the</strong> truth <strong>of</strong> both claims and<br />
would explain it as follows: Anna Karenina possibly exists. That is to say, even if –as<br />
we all agree– Anna Karenina does not actually exist, <strong>the</strong>re is some o<strong>the</strong>r possible world<br />
where she does. For to be Anna Karenina is simply to play the Anna Karenina-role, and<br />
to play the Anna Karenina-role merely amounts to satisfying the general definite description<br />
that can be extracted from the story told by Tolstoy, that is, the description constructed out of<br />
everything Tolstoy says about Anna in that story. This description is the exact meaning of the<br />
fictional name ‘Anna Karenina’, at least as it is used by Tolstoy.<br />
On this view, what one does when telling a fiction is to tell a story, which although not<br />
actual, is possible. It is to qualitatively describe part <strong>of</strong> some possible worlds o<strong>the</strong>r than<br />
<strong>the</strong> actual. It is to explain some ways <strong>the</strong> actual world might have been but is not. Thus,<br />
<strong>the</strong> view is that Anna Karenina could have existed and fallen for Vronsky even if in fact<br />
this never occurred and never will in actuality. That Anna Karenina falls for Vronsky<br />
is as possible as my turning <strong>of</strong>f my laptop in a moment.<br />
What would explain <strong>the</strong> truth <strong>of</strong> (1), according to this view, is <strong>the</strong> fact that <strong>the</strong>re is<br />
no possible world where someone plays <strong>the</strong> role <strong>of</strong> Anna Karenina but does not fall for<br />
Vronsky. This is so precisely because part of what it means to play this role is to fall for<br />
Vronsky. Thus, it is true in every world that anyone who plays <strong>the</strong> Anna Karenina-role in<br />
that world falls for Vronsky.<br />
Nevertheless, (2) would be true as well because for every person who plays the Anna-role<br />
in some possible world, there is at least one more world where that same person does<br />
not fall for Vronsky, i.e. a world where she does not play the role of Anna Karenina. (This<br />
would be so because it is impossible to necessarily fall in love.) The existence of these<br />
o<strong>the</strong>r possible worlds is what would explain <strong>the</strong> contingency <strong>of</strong> <strong>the</strong> falling for Vronsky<br />
by Anna Karenina. In “Characters and Contingency”, Currie advances such an account <strong>of</strong><br />
<strong>the</strong> truth <strong>of</strong> (1) and (2).<br />
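The role-based explanation of (1) and (2) can be made concrete in a small toy model. The following is only a sketch under simplifying assumptions: the three worlds, the individuals `jane` and `mary`, and the reduction of the Anna-role to the single condition of falling for Vronsky are hypothetical and introduced purely for illustration.

```python
# Toy possible-worlds model (all names hypothetical, for illustration only).
# Each world maps the individuals existing there to whether they fall for Vronsky.
WORLDS = {
    "w1": {"jane": True, "mary": False},   # story-world: Jane plays the Anna-role
    "w2": {"jane": False, "mary": True},   # story-world: Mary plays the Anna-role
    "w3": {"jane": False, "mary": False},  # nobody plays the Anna-role
}


def anna_role(world):
    """The Anna-role as a function from worlds to individuals (or None).

    Simplifying assumption: the role description is reduced to the single
    condition 'falls for Vronsky', and exactly one individual must meet it.
    """
    players = [x for x, falls in WORLDS[world].items() if falls]
    return players[0] if len(players) == 1 else None


def claim_1():
    """(1): in every world, whoever plays the Anna-role falls for Vronsky."""
    return all(
        WORLDS[w][anna_role(w)] for w in WORLDS if anna_role(w) is not None
    )


def claim_2():
    """(2) as the account reads it: every role-player fails to fall for
    Vronsky in at least one world where she exists."""
    players = {anna_role(w) for w in WORLDS} - {None}
    return all(
        any(x in WORLDS[w] and not WORLDS[w][x] for w in WORLDS)
        for x in players
    )


print(claim_1(), claim_2())
```

In this model both claims come out true, but for different reasons: (1) holds because of how the role is defined, while (2) holds because of facts about the individuals who happen to occupy it.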
The reason why <strong>the</strong> explanation provided above does not work is that it does not explain<br />
what it has to explain, that is, <strong>the</strong> fact that in <strong>the</strong> fiction, Anna Karenina has <strong>the</strong><br />
property <strong>of</strong> falling for Vronsky but only contingently so. This amounts to <strong>the</strong> fact that<br />
Anna Karenina herself must have <strong>the</strong> property in every story-world –i.e. where <strong>the</strong> Anna<br />
Karenina-role is satisfied–, but at <strong>the</strong> same time she (Anna Karenina and no one else)<br />
must fail to have that property while being Anna Karenina at some world, which must<br />
be possible with respect to <strong>the</strong> story-world. But it is Anna Karenina herself who must<br />
have <strong>the</strong> contingent property at one world and lack it at ano<strong>the</strong>r. This is what contingency<br />
means. O<strong>the</strong>rwise, it is not true that Anna Karenina falls for Vronsky in a contingent way,<br />
but that someone else does. The problem is that <strong>the</strong> only way for this view to try to explain<br />
that contingency is by appealing to <strong>the</strong> possible worlds –which are not story-worlds–<br />
where <strong>the</strong> possible persons that, on this view, occupy <strong>the</strong> Anna Karenina-role, and thus<br />
are Anna, in some story-worlds, do not fall for Vronsky and, <strong>the</strong>reby, nei<strong>the</strong>r occupy <strong>the</strong><br />
Anna Karenina-role nor are Anna in <strong>the</strong>m.<br />
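The objection can also be illustrated in a toy model. The sketch below rests on simplifying assumptions: the worlds, the individuals `jane` and `mary`, and the reduction of the Anna-role to the single condition of falling for Vronsky are all hypothetical. It asks whether the value of the role function itself, rather than the individual who happens to occupy the role, ever lacks the property.

```python
# Hypothetical toy model: world -> {individual: falls_for_vronsky}
WORLDS = {
    "w1": {"jane": True, "mary": False},   # story-world, Jane occupies the role
    "w2": {"jane": False, "mary": True},   # story-world, Mary occupies the role
    "w3": {"jane": False, "mary": False},  # not a story-world
}


def anna_at(world):
    """Value of the Anna-role at a world: the unique individual who falls
    for Vronsky there, or None if the role is unoccupied (assumption: the
    role description is reduced to this one condition)."""
    players = [x for x, falls in WORLDS[world].items() if falls]
    return players[0] if len(players) == 1 else None


def anna_has_property_contingently():
    """True only if whoever is Anna falls for Vronsky at some world and
    whoever is Anna fails to do so at some world."""
    occupied = [w for w in WORLDS if anna_at(w) is not None]
    has_it = any(WORLDS[w][anna_at(w)] for w in occupied)
    lacks_it = any(not WORLDS[w][anna_at(w)] for w in occupied)
    return has_it and lacks_it


def occupant_has_property_contingently(person):
    """The same test for a rigidly identified individual such as Jane."""
    values = [WORLDS[w][person] for w in WORLDS if person in WORLDS[w]]
    return any(values) and not all(values)


print(anna_has_property_contingently())
print(occupant_has_property_contingently("jane"))
```

In this model the contingency attaches to Jane, who exists and fails to fall for Vronsky at worlds where she is not Anna, and never to whoever is Anna, which is the shape of the complaint raised in the text.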
I see no way a possible worlds descriptivist view can handle this problem. However, I<br />
can see how one might reply. But <strong>the</strong> replies I envisage seem to be wrong as well.<br />
One might find <strong>the</strong> possible worlds explanation <strong>of</strong> fictional contingencies plausible<br />
and be easily misled into thinking that it is in fact right merely due to a natural tendency<br />
to forget what this possible worlds view tells us being Anna Karenina consists in and,<br />
as a result, come to have <strong>the</strong> following confused thought: that this person who does not<br />
fall for Vronsky in some world in which she does not occupy <strong>the</strong> Anna Karenina-role is,<br />
never<strong>the</strong>less, Anna Karenina also in such a world due to <strong>the</strong> fact that she is Anna Karenina<br />
in one <strong>of</strong> <strong>the</strong> worlds <strong>of</strong> <strong>the</strong> story, where she does occupy <strong>the</strong> Anna Karenina-role and does<br />
fall for Vronsky. But to evaluate this possible worlds view under this impression is to<br />
misunderstand what <strong>the</strong> view (at least, about being Anna Karenina) is.<br />
If that o<strong>the</strong>r person were to be Anna Karenina in any sense also in this o<strong>the</strong>r world<br />
where she does not fall for Vronsky, (1) would not be true. It would not be a necessary<br />
condition for being Anna Karenina to fall for Vronsky, for there would be some possible<br />
worlds where Anna Karenina would not fall for him. These would precisely be <strong>the</strong> worlds<br />
where someone who occupies <strong>the</strong> Anna-role in one <strong>of</strong> <strong>the</strong> story-worlds exists and does<br />
not fall for Vronsky. As I argued above, however, <strong>the</strong>re is no such sense for <strong>the</strong> case <strong>of</strong><br />
being Anna Karenina. To think of that person, let’s say Jane, as being Anna Karenina also<br />
in that other world where she does not fall for Vronsky, only because she occupies the<br />
Anna Karenina-role at some world, is to mistake what being Anna Karenina is, on such<br />
a view, for what being Jane (or, in fact, any other real person) is. Currie explains this as<br />
follows: “Now consider Jane, a respectable inhabitant <strong>of</strong> <strong>the</strong> actual world. In <strong>the</strong> actual<br />
world she does not fall for Vronsky; in fact she never meets him. But, given what I have<br />
said just now, it may well be <strong>the</strong> case that Jane in some o<strong>the</strong>r world does fall for Vronsky;<br />
in that o<strong>the</strong>r world, Jane occupies <strong>the</strong> Anna-role. Does that make Jane, in this world,<br />
Anna Karenina? No. Being Anna is, according to me, something that happens to you in<br />
some worlds and not in o<strong>the</strong>rs. It happens to you in worlds where you occupy <strong>the</strong> Anna<br />
role. In any world in which Jane occupies that role she is Anna. But that does not make<br />
her Anna in this world. Being Anna is not at all like being Jane. The person who is Jane in<br />
one world is Jane in all worlds. Being Jane is a matter <strong>of</strong> being a certain individual; being<br />
Anna, on <strong>the</strong> o<strong>the</strong>r hand, is a matter <strong>of</strong> occupying a certain role. Moving up a semantic<br />
step we can say that “Jane” is a proper name <strong>of</strong> an individual, whereas “Anna”, where it is<br />
<strong>the</strong> proper name <strong>of</strong> anything, is <strong>the</strong> proper name <strong>of</strong> a function from worlds to individuals.<br />
Of course when Tolstoy says that Anna did this or that, we are not from <strong>the</strong> point <strong>of</strong> view<br />
<strong>of</strong> our imaginative engagement with <strong>the</strong> work, to understand this as meaning that a role<br />
did this or that. This is because it is part <strong>of</strong> <strong>the</strong> fiction that “Anna” is <strong>the</strong> name <strong>of</strong> a person.<br />
But “Anna”, as used by Tolstoy, is not in fact <strong>the</strong> name <strong>of</strong> a person, nor does it purport to<br />
be. Names are expressions used in order to pick out individuals, and Tolstoy does not use<br />
“Anna” in order to do this, nor does he expect us to believe that he is. “Anna”, as used by<br />
Tolstoy, is not a name.” (Currie 2003, p. 141)<br />
On the other hand, one might also contemplate the possibility of fictional characters<br />
enjoying a certain autonomy with respect to their stories, such that one could<br />
say, for instance, that Tolstoy’s Anna Karenina could have had a different end. The idea<br />
is that the characters would be well defined from the very beginning of the fiction, which would<br />
open up possibilities for their fate other than the ones that the author chose. Considering<br />
this, one might think that <strong>the</strong> contingency <strong>of</strong> <strong>the</strong> properties <strong>of</strong> <strong>the</strong> characters could be<br />
reduced to <strong>the</strong> contingency <strong>of</strong> <strong>the</strong> writing process itself. Anna Karenina, for instance,<br />
might not have fallen for Vronsky precisely because Tolstoy might not have written that<br />
she did. However, this possibility would not save Currie’s explanation of fictional<br />
contingency, or descriptivism about fictional names, since it is a wholly different explanation<br />
that is not compatible with them. Moreover, one can see that it would not work by considering<br />
the fact that one can write a fiction in which characters have certain properties necessarily<br />
and yet the contingency of the writing process remains: the author could<br />
have written a different story, or this story a bit differently.<br />
The conclusions I think we should draw from all of this go further than the mere conclusion<br />
that the explanation of fictional contingencies I criticized is wrong and should be<br />
rejected. The problem that fictional contingencies pose, and the incorrectness of this explanation,<br />
indicate a deeper, more fundamental problem. They show why at least any<br />
descriptivist view that tries to explain fiction in terms of possible worlds –which seems to<br />
be the only way open to descriptivism– is mistaken, and maybe they even show that fiction cannot<br />
be accounted for in possible worlds terms at all, at least in the case of fictions told by the use of<br />
singular terms such as proper names. In short, the problem is that this possible worlds<br />
descriptivist view cannot explain <strong>the</strong> truth <strong>of</strong> pairs like (1) and (2). For, in particular, it<br />
cannot explain <strong>the</strong> possession <strong>of</strong> any fictional contingency by any fictional character.<br />
4 O<strong>the</strong>r Possible Descriptivist Ways Out<br />
If the possible worlds view has it that being Anna Karenina amounts to satisfying the non-rigid<br />
definite description, which has as a part the description of this woman as falling<br />
for Vronsky, it will not succeed in explaining that Anna Karenina falls for Vronsky only<br />
contingently, for the simple reason that any woman who would be Anna Karenina at<br />
all would be so only in some worlds, and precisely in those worlds where she falls for<br />
Vronsky. One might think, even against what Currie seems to insist, that there are two<br />
ways of being Anna Karenina, though. One of them is the one we already contemplated<br />
and the one that Currie tells us. The other is the one that the possible worlds view would<br />
like to have, while keeping the previous one: to be someone who at some story-world<br />
satisfies the description that ‘Anna Karenina’ is or conveys, even if she does not do<br />
so at some other possible worlds. In this sense anyone who met the description at some<br />
possible world would also be Anna Karenina at all the other worlds where she existed,<br />
even if she did not meet the description in them. This last sense does not seem to be<br />
compatible with the view that claims that ‘Anna Karenina’ is used as a non-rigid definite<br />
description, and that when it is not, when it is used literally, it does not refer at all. But let’s<br />
assume for a moment, for the sake of the argument, that it is.<br />
This way there would be two ways of understanding the relevant pair of claims. According<br />
to the interpretation corresponding to the first sense of being Anna Karenina,<br />
(1) would be true but (2) false. And according to <strong>the</strong> interpretation corresponding to <strong>the</strong><br />
second sense, while (2) would be true, (1) would be false. On neither of these two interpretations<br />
does one get that both claims are true. Intuitively at least, however, they seem to<br />
be true under one and <strong>the</strong> same interpretation. Both claims are about <strong>the</strong> features that<br />
characterize a fictional character, Anna Karenina. One <strong>of</strong> <strong>the</strong>se features is to be someone<br />
who falls for Vronsky; ano<strong>the</strong>r, to be someone who falls for Vronsky in a contingent way.<br />
One might think, though, that <strong>the</strong> intuitive truth <strong>of</strong> <strong>the</strong>se two claims may be very well<br />
accounted for by considering a different interpretation <strong>of</strong> <strong>the</strong>m in each case. However,<br />
<strong>the</strong>re is no independent reason to interpret <strong>the</strong>m this differently. This does not seem to be<br />
why we think <strong>the</strong>y are both true. This way out <strong>of</strong> <strong>the</strong> problem fictional contingencies pose<br />
to this view would be completely ad hoc.<br />
In any case, there is no way on such a view to obtain what the view really needs, namely<br />
that Anna Karenina, one and the same thing, has the property of falling for Vronsky<br />
but lacks it at another possible world. For it is a condition on being Anna Karenina that<br />
she does so contingently, and this is what having a contingent property amounts to. Note that<br />
the independent reason to argue for the legitimacy of using two different interpretations<br />
cannot be that ‘Anna Karenina’ can be used both as a non-rigid definite description and as<br />
a rigid proper name, and that while it is used as a non-rigid definite description in the case<br />
of (1), it is used as a rigid proper name in the case of (2). For, according to the possible<br />
worlds view, only within the fiction is ‘Anna Karenina’ used, or comes to be used, as an ordinary<br />
rigid proper name. We cannot use the proper names that are used in these other possible<br />
worlds, for these proper names are only possible, not actual. Note too that an appeal to the<br />
ambiguity in scope due to <strong>the</strong> interaction between modalities and definite descriptions in<br />
(1) and (2) does not work ei<strong>the</strong>r. For <strong>the</strong> problem is that we are dealing with fiction and<br />
fictional names and hence, <strong>the</strong>re are no individuals that could stand in <strong>the</strong> place <strong>of</strong> <strong>the</strong>se<br />
fictional characters o<strong>the</strong>r than <strong>the</strong> ones that satisfy <strong>the</strong> definite descriptions in question in<br />
each of the possible worlds. By contrast, we can explain the consistency of the following pair<br />
<strong>of</strong> sentences:<br />
(3) Necessarily, <strong>the</strong> Queen <strong>of</strong> England is queen<br />
(4) The Queen <strong>of</strong> England may not have been queen<br />
by noticing the distinction in scope of the occurrences of the definite description ‘the<br />
Queen of England’ in (3) and (4), and explain that (4) can be true compatibly with the<br />
truth of (3) because there is an individual –i.e. the Queen of England– who can exist in<br />
another possible world and not be the Queen of England in it. As I said, unlike in the<br />
case of fiction, this is possible precisely because there is in fact an individual who is the<br />
Queen of England in the actual world, whereas there is no such individual for the definite<br />
description that the fictional name ‘Anna Karenina’ allegedly abbreviates.<br />
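The scope distinction appealed to for (3) and (4) can be written out explicitly. The following is a sketch in a Russellian style; the predicate letter Q for "is Queen of England" and the choice of a conditional form for (3), so that worlds without a queen do not falsify it, are assumptions made here for illustration and are not the author's notation.

```latex
% Requires amssymb for \Box and \Diamond.
% Q(x): x is Queen of England (predicate letter assumed for illustration).

% (3) description inside the scope of the modal operator (de dicto):
\[
\Box\,\forall x\,\bigl[\bigl(Q(x) \land \forall y\,(Q(y) \rightarrow y = x)\bigr) \rightarrow Q(x)\bigr]
\]

% (4) description outside the scope of the modal operator (de re):
\[
\exists x\,\bigl[Q(x) \land \forall y\,(Q(y) \rightarrow y = x) \land \Diamond\,\neg Q(x)\bigr]
\]
```

On this rendering (4) needs an actual witness for the existential quantifier, which is available for ‘the Queen of England’ but not for the description that ‘Anna Karenina’ allegedly abbreviates.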
5 Non-Descriptivist Possible Worlds Views <strong>of</strong> Fiction<br />
One might think that perhaps <strong>the</strong>re are o<strong>the</strong>r possible worlds views <strong>of</strong> fiction that are<br />
not descriptivist that could handle this problem <strong>of</strong> <strong>the</strong> fictional contingencies <strong>of</strong> fictional<br />
characters. I shall very briefly argue that <strong>the</strong> only available ones are not very attractive.<br />
Descriptivism seems to be <strong>the</strong> most plausible possible worlds view <strong>of</strong> fiction.<br />
I see two options: one might defend Meinongianism and say that fictional characters<br />
actually exist in some special mysterious way and that fictional names are like ordinary<br />
proper names that rigidly refer to <strong>the</strong>m. Or one might defend <strong>the</strong> view that fictional characters<br />
are abstract objects, which actually exist and to which <strong>the</strong> fictional names rigidly<br />
refer. Within this last option I see two fur<strong>the</strong>r options: one might say that <strong>the</strong>se abstract<br />
objects are only contingently so, so that in o<strong>the</strong>r worlds <strong>the</strong>se same objects exist but are<br />
concrete instead of abstract in these worlds. The existence of such contingently nonconcrete<br />
objects is defended by Bernard Linsky and Edward N. Zalta, not with respect to fictional<br />
characters but with respect to mere possible objects –i.e. possibilia. Or one might hold<br />
that these abstract objects, like any other abstract objects, are necessarily abstract, in<br />
which case they can only do what their fictions say they do in worlds that are impossible,<br />
for <strong>the</strong>re are things that only concrete objects can do. Thus, if <strong>the</strong>se abstract objects are<br />
to do <strong>the</strong>m, it can only occur in impossible worlds ra<strong>the</strong>r than possible ones. This is <strong>the</strong><br />
Millian view defended by Nathan Salmon.<br />
On the one hand, the first option, Meinongianism, is wholly mysterious and hence not<br />
plausible at all. On the other hand, the only option left that explains fictions in terms of<br />
possibilities is <strong>the</strong> option that sees fictional characters as contingently nonconcrete objects<br />
and, hence, consists in <strong>the</strong> very implausible claim that some actual abstract objects can be<br />
concrete and some actual concrete objects can be abstract. In view <strong>of</strong> <strong>the</strong> alternatives to<br />
descriptivism about fiction, I think we can conclude that fiction should not be dealt with<br />
in terms <strong>of</strong> possible worlds.<br />
6 Some Positive Suggestions<br />
I think this problem is easily solved once we simply abandon <strong>the</strong> idea <strong>of</strong> explaining fiction<br />
in terms of possible worlds. I would like to defend the view that story-worlds are not possible<br />
worlds even if they are ontologically the same kind of thing: that is, sets of sentences<br />
or propositions. The difference between story-worlds and possible worlds would just be<br />
that only the latter represent possibilities with respect to the actual world. The fictional<br />
contingencies <strong>of</strong> fictional characters should be explained by appealing to those worlds<br />
which would be possible but only with respect to <strong>the</strong> world <strong>of</strong> <strong>the</strong> story and not with<br />
respect to our actual world.<br />
This way out <strong>of</strong> <strong>the</strong> problem would be possible because fictional names, on <strong>the</strong> o<strong>the</strong>r<br />
hand, are not abbreviated non-rigid definite descriptions, but merely empty rigid proper<br />
names, that is, proper names that do not have a referent. The meaning <strong>of</strong> fictional names<br />
should be derived, in my view, from the fact that part of the meaning of any proper name<br />
is the meaning of a rigid definite description associated with it. Any proper name N,<br />
when used, in addition to rigidly referring to its bearer, semantically expresses some definite<br />
description like ‘the bearer of N’ or ‘the individual called N’, where the token of the name<br />
N that occurs within that description is used the same way as the name N. This view<br />
about proper names in general is a view that I learnt from Manuel Garcia-Carpintero’s<br />
work. Note that this view does not say that proper names are synonymous with definite<br />
descriptions, which Saul Kripke showed to be incorrect, and that we can compatibly say that<br />
Anna Karenina neither actually nor possibly exists.<br />
Finally, I also think that in addition to the fictional operator ‘in the fiction f’, there is<br />
another fictional operator that we use, whether explicitly or implicitly, in our fictional<br />
discourse. When we use fictional names to talk about fictional characters as such, instead<br />
of as the individuals that these fictional characters represent in the fictions, we either say<br />
‘the fictional character N’ or we just utter the name N. It is my view that even in the latter<br />
case, the expression ‘the fictional character’ is there, though only in an implicit way. It<br />
is the interaction between this expression and fictional names that makes our<br />
discourse about fictional characters meaningful. How this interaction works<br />
is something we have yet to discover; I do not know.<br />
7 Conclusion<br />
I have argued that <strong>the</strong>re is a problem with <strong>the</strong> fictional contingent properties <strong>of</strong> fictional<br />
characters that descriptivism about fiction cannot solve. I have also argued that the<br />
alternative views on fiction that explain it in terms of possible worlds do not seem at all<br />
plausible. Finally, I have offered some positive suggestions for explaining<br />
fiction and the problem posed by fictional contingencies. These are suggestions that<br />
I plan to develop soon.<br />
Acknowledgements<br />
I am grateful for the extremely useful comments on earlier drafts of this work that I<br />
have received from Manuel Garcia-Carpintero, Dominic McIver Lopes, Genoveva Marti,<br />
Francis Jeffry Pelletier, Pablo Rychter and Ori Simchen, as well as for the<br />
patience and interest that Stefano Predelli showed in discussing it with me. I am also<br />
thankful to the anonymous referees for their interesting points, which I have tried to incorporate<br />
into this final version as best I could.<br />
References<br />
Currie, G. (1988). Fictional names, Australasian Journal <strong>of</strong> Philosophy 66.<br />
Currie, G. (1990). The Nature <strong>of</strong> Fiction, Cambridge: Cambridge University Press.<br />
Currie, G. (2003). Characters and contingency, Dialectica 57.<br />
Kripke, S. (1972). Naming and Necessity, Harvard University Press.<br />
Lewis, D. (1978/1983). Truth in fiction, reprinted in David Lewis, Philosophical Papers<br />
I, Oxford: Oxford University Press.<br />
Linsky, B. and Zalta, E. N. (1996). In defense of the contingently nonconcrete, Philosophical<br />
Studies 84(2-3).<br />
Salmon, N. (1998). Nonexistence, Noûs 32.<br />
Stalnaker, R. (1999). Assertion, Context and Content, Oxford: Oxford University Press.<br />
MEANING & INFERENCE IN CASE OF CONFLICT<br />
Michael Franke<br />
Universiteit van Amsterdam<br />
Abstract. This paper applies a model of boundedly rational “level-k thinking” (cf. Stahl<br />
and Wilson, 1995; Crawford, 2003; Camerer, Ho and Chong, 2004) to a classical concern <strong>of</strong><br />
game <strong>the</strong>ory: when is information credible and what shall I do with it if it is not? The<br />
model presented here extends and generalizes recent work in game-<strong>the</strong>oretic pragmatics<br />
(Stalnaker, 2006; Jäger, 2007; Benz and van Rooij, 2007). Pragmatic inference is modeled<br />
as a sequence <strong>of</strong> iterated best responses, defined here in terms <strong>of</strong> <strong>the</strong> interlocutors’ epistemic<br />
states. Credibility considerations are a special case <strong>of</strong> a more general pragmatic inference<br />
procedure at each iteration step. The resulting analysis <strong>of</strong> message credibility improves on<br />
previous game-<strong>the</strong>oretic analyses, is more general and places credibility in <strong>the</strong> linguistic<br />
context where it, arguably, belongs.<br />
1 Semantic Meaning and Credible Information in Signaling Games<br />
The perhaps simplest game-<strong>the</strong>oretic model <strong>of</strong> language use is a signaling game with<br />
meaningful signals. A sender S observes <strong>the</strong> state <strong>of</strong> <strong>the</strong> world t ∈ T in private and<br />
chooses a message m from a set <strong>of</strong> alternatives M all <strong>of</strong> which are assumed to be meaningful<br />
in <strong>the</strong> (unique and commonly known) language shared by S and a receiver R. In<br />
turn, R observes <strong>the</strong> sent message and chooses an action a from a given set A. In general,<br />
<strong>the</strong> pay<strong>of</strong>fs for both S and R depend on <strong>the</strong> state t, <strong>the</strong> sent message m and <strong>the</strong> action a<br />
chosen by <strong>the</strong> receiver. Formally, a SIGNALING GAME WITH MEANINGFUL SIGNALS is<br />
a tuple 〈{S, R} , T, Pr, M, [·] , A, US, UR〉 where Pr ∈ ∆(T ) is a probability distribution<br />
over T ; [·] : M → P(T ) is a semantic denotation function and US,R : M × A × T → R<br />
are utility functions for both sender and receiver. 1 We can conceive <strong>of</strong> such signaling<br />
games as abstract ma<strong>the</strong>matical models <strong>of</strong> a conversational context whose most important<br />
features <strong>the</strong>y represent: <strong>the</strong> interlocutors’ beliefs, behavioral possibilities and preferences.<br />
If a signaling game is a context model, <strong>the</strong> game’s solution concept is what yields a<br />
prediction <strong>of</strong> <strong>the</strong> behavior <strong>of</strong> agents in <strong>the</strong> modelled conversational situation. The following<br />
easy example <strong>of</strong> a scalar implicature, e.g., <strong>the</strong> inference that not all students came<br />
when hearing <strong>the</strong> sentence “Some <strong>of</strong> <strong>the</strong> students came”, makes this distinction clear. A<br />
simple context model for this case is <strong>the</strong> signaling game G1: 2 <strong>the</strong>re are two states t∃¬∀ and<br />
t∀, two messages msome and mall with semantic meaning as indicated and two receiver<br />
interpretation actions a∃¬∀ or a∀ which correspond one-to-one with <strong>the</strong> states; sender and<br />
receiver pay<strong>of</strong>fs are aligned: an implementation <strong>of</strong> <strong>the</strong> standard assumption that conversation<br />
and implicature calculation revolve around <strong>the</strong> cooperative principle (Grice, 1989). A<br />
solution concept, whatever it may be, should then ideally predict that S t∃¬∀ (S t∀) chooses<br />
msome (mall) and the receiver responds with action a∃¬∀ (a∀). 3<br />
1 I will assume throughout that (i) all sets T , M and A are non-empty and finite, that (ii) Pr(t) > 0 for<br />
all t ∈ T , that (iii) for each state t <strong>the</strong>re is at least one message m which is true in that state and that (iv) no<br />
message is contradictory, i.e., <strong>the</strong>re is no m for which [m] = ∅.<br />
2 Unless indicated, I assume that states are equiprobable in example games.<br />
3 For t ∈ T , I write S t as an abbreviation for “a sender <strong>of</strong> type t”.<br />
        a∃¬∀   a∀     msome   mall<br />
t∃¬∀    1,1    0,0    √       −<br />
t∀      0,0    1,1    √       √<br />
G1: “Scalar Implicatures”<br />
<br />
        amate  aignore  mhigh  mlow<br />
thigh   1,1    0,0      √      −<br />
tlow    1,0    0,1      −      √<br />
G2: “Partial Conflict”<br />
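The two context models can be written down as plain data, which makes the tuple definition from above concrete. The following is only a sketch: the class name `SignalingGame`, the helper names and the ASCII labels for states, messages and actions are illustrative choices, not notation from the paper. It reads the first number in each table cell as the sender's payoff (which is what the description of G2 suggests), stores payoffs by state and action only because the tables do not vary them by message, and checks the standing assumptions of footnote 1.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class SignalingGame:
    states: tuple      # T
    messages: tuple    # M
    actions: tuple     # A
    prior: dict        # Pr: state -> probability
    denotation: dict   # [.]: message -> set of states in which it is true
    payoffs: dict      # (state, action) -> (sender utility, receiver utility)


def satisfies_footnote_1(g):
    """Check assumptions (i)-(iv) of footnote 1 for a finite game."""
    non_empty = bool(g.states) and bool(g.messages) and bool(g.actions)
    full_support = all(g.prior[t] > 0 for t in g.states) and sum(g.prior.values()) == 1
    expressible = all(any(t in g.denotation[m] for m in g.messages) for t in g.states)
    no_contradiction = all(len(g.denotation[m]) > 0 for m in g.messages)
    return non_empty and full_support and expressible and no_contradiction


def interests_aligned(g):
    """True if sender and receiver payoffs coincide in every cell."""
    return all(us == ur for us, ur in g.payoffs.values())


HALF = Fraction(1, 2)  # states are equiprobable (footnote 2)

G1 = SignalingGame(
    states=("t_some_not_all", "t_all"),
    messages=("m_some", "m_all"),
    actions=("a_some_not_all", "a_all"),
    prior={"t_some_not_all": HALF, "t_all": HALF},
    denotation={"m_some": {"t_some_not_all", "t_all"}, "m_all": {"t_all"}},
    payoffs={
        ("t_some_not_all", "a_some_not_all"): (1, 1),
        ("t_some_not_all", "a_all"): (0, 0),
        ("t_all", "a_some_not_all"): (0, 0),
        ("t_all", "a_all"): (1, 1),
    },
)

G2 = SignalingGame(
    states=("t_high", "t_low"),
    messages=("m_high", "m_low"),
    actions=("a_mate", "a_ignore"),
    prior={"t_high": HALF, "t_low": HALF},
    denotation={"m_high": {"t_high"}, "m_low": {"t_low"}},
    payoffs={
        ("t_high", "a_mate"): (1, 1),
        ("t_high", "a_ignore"): (0, 0),
        ("t_low", "a_mate"): (1, 0),
        ("t_low", "a_ignore"): (0, 1),
    },
)
```

Under this encoding `interests_aligned(G1)` holds while `interests_aligned(G2)` fails, which is the formal counterpart of calling G1 cooperative and G2 a game of partial conflict.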
It is obvious that in order to arrive at this prediction, a special role has to be assigned to<br />
<strong>the</strong> conventional, semantic meaning <strong>of</strong> <strong>the</strong> messages involved. For instance, in <strong>the</strong> above<br />
example anti-semantic play, as we could call it, that simply reverses <strong>the</strong> use <strong>of</strong> messages,<br />
should be excluded. Most game-<strong>the</strong>oretic models <strong>of</strong> language use hard-wire semantic<br />
meaning into <strong>the</strong> game play, ei<strong>the</strong>r as a restriction on available moves <strong>of</strong> sender and receiver,<br />
or into <strong>the</strong> pay<strong>of</strong>fs, but in both cases effectively enforcing truthfulness and trust.<br />
This is fine as long as conversation is mainly cooperative and preferences aligned. But<br />
let’s face it: <strong>the</strong> central Gricean assumption <strong>of</strong> cooperation is an optimistic idealization<br />
after all; conflict, lies and deceit are as ubiquitous as air. But <strong>the</strong>n, hard-wiring <strong>of</strong> truthfulness<br />
and trust limits <strong>the</strong> applicability <strong>of</strong> our models as it excludes <strong>the</strong> possibility that<br />
senders may wish to mislead <strong>the</strong>ir audience. We should aim for more general models and,<br />
ideally, let <strong>the</strong> agents, not <strong>the</strong> modeller decide when to be truthful and what to trust.<br />
Opposed to hard-wiring truthfulness and trust, <strong>the</strong> most liberal case at <strong>the</strong> o<strong>the</strong>r end<br />
<strong>of</strong> <strong>the</strong> spectrum is to model communication, not considering reputation or fur<strong>the</strong>r psychological<br />
constraints at all, as cheap talk. Here messages do not impose restrictions on<br />
<strong>the</strong> game play and are entirely pay<strong>of</strong>f irrelevant: US,R(m, a, t) = US,R(m ′ , a, t) for all<br />
m, m ′ ∈ M, a ∈ A and t ∈ T . However, if talk is cheap, yet exogenously meaningful, <strong>the</strong><br />
question arises how to integrate semantic meaning into <strong>the</strong> game. Standard solution concepts,<br />
such as sequential equilibrium or rationalizability, are too weak to predict anything<br />
reasonable in this case: <strong>the</strong>y allow for nearly all anti-semantic play and also for babbling,<br />
where signals are sent, as it were, arbitrarily and <strong>the</strong>refore ignored by <strong>the</strong> receiver.<br />
In response to this problem, game <strong>the</strong>orists have proposed various refinements <strong>of</strong> <strong>the</strong><br />
standard solution concepts based on <strong>the</strong> notion <strong>of</strong> credibility. 4 The idea is that semantic<br />
meaning should be respected (in <strong>the</strong> solution concept) wherever this is reasonable in view<br />
<strong>of</strong> <strong>the</strong> possibly diverging preferences <strong>of</strong> interlocutors. As an easy example, look at game<br />
G2 where S is <strong>of</strong> ei<strong>the</strong>r a high quality or a low quality type, and where R would like<br />
to pair with S thigh only, while S wants to pair with R irrespective <strong>of</strong> her type. Interests<br />
are in partial conflict here and, intuitively, a costless, non-committing message mhigh<br />
is not credible, because S tlow would have all reason to send it untruthfully. Therefore,<br />
intuitively, R should ignore whatever S says in this game. In general, if nothing prevents<br />
S from babbling, lying or deceiving, she might as well do so; whenever she even has an<br />
incentive to, she certainly will. For <strong>the</strong> receiver <strong>the</strong> central question becomes: when is a<br />
signal credible and what should I do if it is not?<br />
This paper <strong>of</strong>fers a fresh look at this classical problem <strong>of</strong> game <strong>the</strong>ory. The novelty<br />
is, so to speak, a “linguistic turn”: I suggest that credibility considerations are pragmatic<br />
inferences, in some sense very much alike —and in ano<strong>the</strong>r sense very much unlike—<br />
conversational implicatures. I argue that this linguistic approach to credibility <strong>of</strong> information<br />
improves on <strong>the</strong> classical game-<strong>the</strong>oretic analyses by Farrell (1993) and Rabin<br />
4 The standards in <strong>the</strong> debate about credibility were set by Farrell (1993) for equilibrium and by Rabin<br />
(1990) for rationalizability. I will mainly focus on <strong>the</strong>se two classical papers here for reasons <strong>of</strong> space.<br />
(1990). In order to implement conventional meaning <strong>of</strong> signals in a cheap talk model, <strong>the</strong><br />
present paper takes an epistemic approach to <strong>the</strong> solution <strong>of</strong> games: <strong>the</strong> model presented<br />
in this paper spells out <strong>the</strong> reasoning <strong>of</strong> interlocutors in terms <strong>of</strong> <strong>the</strong>ir beliefs about <strong>the</strong><br />
behavior <strong>of</strong> <strong>the</strong>ir opponents as a sequence <strong>of</strong> iterated best responses (IBR) which takes<br />
semantic meaning as a starting point. For clarity: <strong>the</strong> IBR model places no restriction<br />
whatsoever on <strong>the</strong> use <strong>of</strong> signals; conventional meaning is implemented merely as a focal<br />
element in <strong>the</strong> deliberation <strong>of</strong> agents. This way, <strong>the</strong> IBR model extends recent work<br />
in game-<strong>the</strong>oretic pragmatics (Jäger, 2007; Benz and van Rooij, 2007), to which it adds<br />
generality by taking diverging preferences into account and by implementing <strong>the</strong> basic assumptions<br />
<strong>of</strong> “level-k models” <strong>of</strong> reasoning in games (cf. Stahl and Wilson, 1995; Crawford,<br />
2003; Camerer et al., 2004). In particular, agents in <strong>the</strong> model are assumed to be<br />
boundedly rational in <strong>the</strong> sense that each agent computes only finitely many steps <strong>of</strong> <strong>the</strong><br />
best response sequence. Section 2 scrutinizes <strong>the</strong> notion <strong>of</strong> credibility, section 3 spells out<br />
<strong>the</strong> formal model and section 4 discusses its properties and predictions.<br />
2 Credibility and Pragmatic Inference<br />
The classical idea <strong>of</strong> message credibility is due to Farrell (1993). Farrell seeks an equilibrium<br />
refinement that pays due respect to <strong>the</strong> semantic meaning <strong>of</strong> messages. His notion<br />
<strong>of</strong> credibility is <strong>the</strong>refore tied to a given reference equilibrium as a status quo. According<br />
to Farrell, <strong>the</strong>n, a message m is FARRELL-CREDIBLE with respect to a given equilibrium<br />
if all t ∈ [m] prefer <strong>the</strong> receiver to interpret m literally, i.e., to play a best response to <strong>the</strong><br />
belief Pr(·| [m]) that m is true, over the equilibrium play, while no type t ∉ [m] does.<br />
A number <strong>of</strong> objections can be raised against Farrell-credibility. First <strong>of</strong> all, <strong>the</strong> definition<br />
requires all types in [m] to prefer a literal interpretation <strong>of</strong> m over <strong>the</strong> reference<br />
equilibrium. This makes sense, under Farrell’s Rich Language Assumption (RLA) that<br />
for every X ⊆ T <strong>the</strong>re is a message m with [m] = X. This assumption is prevalent in<br />
game-<strong>the</strong>oretic discussions <strong>of</strong> credibility, but restricts applicability. I will show in section<br />
4 that this assumption seriously restricts Rabin’s (1990) account. But for now, suffice<br />
it to say that, in particular, <strong>the</strong> RLA excludes models like G1, used to study pragmatic<br />
inference in <strong>the</strong> light <strong>of</strong> (partial) inexpressibility. I will drop <strong>the</strong> RLA here to aim for<br />
more generality and compatibility with linguistic pragmatics. 5 Doing so implies amending<br />
Farrell-credibility to require only that some types in [m] prefer a literal interpretation<br />
<strong>of</strong> m over <strong>the</strong> reference equilibrium.<br />
Still, <strong>the</strong>re are fur<strong>the</strong>r problems. Mat<strong>the</strong>ws, Okuno-Fujiwara and Postlewaite (1991)<br />
criticize Farrell-credibility as being too strong. Their argument builds on example G3.<br />
Compared to <strong>the</strong> babbling equilibrium, in which R performs a3, messages m1 and m2 are<br />
intuitively credible: both S t1 , as well as S t2 have good reason to send m1 and m2 respectively.<br />
Communication seems possible and utterly plausible. However, nei<strong>the</strong>r message is<br />
Farrell-credible, because for i, j ∈ {1, 2} and i ≠ j not only S tj , but also S ti prefers R to<br />
play a best response to a literal interpretation <strong>of</strong> mj, which would trigger action aj, over<br />
5 A reviewer points out that <strong>the</strong> RLA has a correspondent in <strong>the</strong> linguistic world in Katz’s (1981) “principle<br />
of effability”. The reviewer supports dropping the RLA, because otherwise pragmatic inference is<br />
limited to context and effort considerations. It is also very common (and, to my mind, reasonable) to restrict<br />
attention to certain alternative expressions only, namely those that are salient (in context) after observing a<br />
message. Of course, game <strong>the</strong>ory is silent as to where <strong>the</strong> alternatives come from, since this is a question<br />
for <strong>the</strong> linguist, perhaps even <strong>the</strong> syntactician (cf. Katzir, 2007).<br />
        a1     a2     a3     m1    m2<br />
t1      4,3    3,0    1,2    √     −<br />
t2      3,0    4,3    1,2    −     √<br />
G3: “Best Message Counts”<br />
<br />
        a1     a2     a3     a4     m12    m23    m13<br />
t1      4,5    5,4    0,0    1,4    √      −      √<br />
t2      0,0    4,5    5,4    1,4    √      √      −<br />
t3      5,4    0,0    4,5    1,4    −      √      √<br />
G4: “Further Iteration”<br />
<strong>the</strong> no-communication outcome a3. The problem with Farrell’s notion is obviously that<br />
just doing better than equilibrium is not enough reason to send a message, when sending<br />
ano<strong>the</strong>r message is even better for <strong>the</strong> sender. When evaluating <strong>the</strong> credibility <strong>of</strong> a<br />
message m, we have to take into account alternative forms that t ∉ [m] might want to<br />
send.<br />
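The objection can be replayed numerically on G3. The sketch below makes two simplifying assumptions of its own: the reference equilibrium is represented only by the single action the receiver plays in it (a3 in the babbling equilibrium), and payoffs are stored by state and action. The function names are illustrative.

```python
from fractions import Fraction

STATES = ("t1", "t2")
ACTIONS = ("a1", "a2", "a3")
PRIOR = {"t1": Fraction(1, 2), "t2": Fraction(1, 2)}
MEANING = {"m1": {"t1"}, "m2": {"t2"}}
# (state, action) -> (sender payoff, receiver payoff), taken from table G3
U = {
    ("t1", "a1"): (4, 3), ("t1", "a2"): (3, 0), ("t1", "a3"): (1, 2),
    ("t2", "a1"): (3, 0), ("t2", "a2"): (4, 3), ("t2", "a3"): (1, 2),
}


def literal_response(m):
    """Receiver's best action under the belief Pr(. | [m]) that m is true."""
    true_states = MEANING[m]
    mass = sum(PRIOR[t] for t in true_states)

    def expected_utility(a):
        return sum(PRIOR[t] / mass * U[(t, a)][1] for t in true_states)

    return max(ACTIONS, key=expected_utility)


def farrell_credible(m, equilibrium_action):
    """All types in [m], and no type outside [m], strictly prefer the
    literal response to m over the reference equilibrium outcome."""
    a_lit = literal_response(m)
    prefer = {t for t in STATES
              if U[(t, a_lit)][0] > U[(t, equilibrium_action)][0]}
    return prefer == MEANING[m]


for m in MEANING:
    print(m, literal_response(m), farrell_credible(m, "a3"))
```

Both messages come out as not Farrell-credible relative to a3: the literal response to m1 is a1, and type t2 also prefers a1 (payoff 3) to a3 (payoff 1), even though t2 would do better still by sending m2.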
Compare this with <strong>the</strong> scalar implicature in G1. Message msome is interpreted as communicating<br />
that <strong>the</strong> true state <strong>of</strong> affairs is t∃¬∀, because in t∀ <strong>the</strong> sender would have used<br />
mall. In o<strong>the</strong>r words, <strong>the</strong> receiver discards a state t ∈ [m] as a possible sender <strong>of</strong> m<br />
because that type has a better message to send. Of course, such pragmatic enrichment<br />
does not make a message intuitively incredible, as it is still used in line with its semantic<br />
meaning. Intuitively speaking, in G1 S even wants R to draw this pragmatic inference.<br />
This is, <strong>of</strong> course, different in G2. In general, if S wants to mislead, she intuitively<br />
wants <strong>the</strong> receiver to adopt a certain belief, but she does not want <strong>the</strong> receiver to realize<br />
that this belief might be false: we could say, somewhat loosely, that S wants her purported<br />
communicative intention to be recognized (and acted upon), but she does not want<br />
her deceptive intention to be recognized. Never<strong>the</strong>less, if <strong>the</strong> receiver does manage to<br />
recognize a deceptive intention, this too may lead to some kind <strong>of</strong> pragmatic inference,<br />
albeit one that <strong>the</strong> sender did not intend <strong>the</strong> receiver to draw. While <strong>the</strong> implicature in G1<br />
rules out a semantically feasible possibility, credibility considerations, in a sense, do <strong>the</strong><br />
exact opposite: message mhigh is pragmatically weakened in G2 by ruling in state tlow.<br />
Despite <strong>the</strong> differences, <strong>the</strong>re is a common core to both implicature and credibility<br />
inference. Here and <strong>the</strong>re, <strong>the</strong> receiver seems to reason: which types <strong>of</strong> senders would<br />
send this message given that I believe it literally? Indeed, exactly this kind <strong>of</strong> reasoning<br />
underlies Benz and van Rooij’s (2007) model <strong>of</strong> implicature calculation for <strong>the</strong> purely<br />
cooperative case. The driving observation <strong>of</strong> this paper is that <strong>the</strong> same reasoning might<br />
not only rule out states t ∈ [m] to yield implicatures but may also rule in states t ∉ [m].<br />
When <strong>the</strong> latter is <strong>the</strong> case, m seems intuitively incredible. Still, <strong>the</strong> reasoning pattern<br />
by which implicatures and credibility-based inferences are computed is <strong>the</strong> same. On<br />
superficial reading, this view on message credibility can be found in Stalnaker (2006): 6<br />
call a message m BVRS-CREDIBLE (Benz, van Rooij, Stalnaker) iff for some types<br />
t ∈ [m], but for no type t ∉ [m], S t ’s expected utility of sending m given that R interprets<br />
literally is at least as great as S t ’s expected utility of sending any alternative message m ′ .<br />
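As a rough check of this definition, the following sketch evaluates BvRS-credibility in G2 and G3. It assumes, beyond what the definition states, that a receiver who is indifferent between several literal best responses mixes uniformly over them; the dictionary layout and function names are illustrative.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

G2 = {
    "prior": {"t_high": HALF, "t_low": HALF},
    "actions": ("a_mate", "a_ignore"),
    "meaning": {"m_high": {"t_high"}, "m_low": {"t_low"}},
    # (state, action) -> (sender payoff, receiver payoff)
    "u": {("t_high", "a_mate"): (1, 1), ("t_high", "a_ignore"): (0, 0),
          ("t_low", "a_mate"): (1, 0), ("t_low", "a_ignore"): (0, 1)},
}

G3 = {
    "prior": {"t1": HALF, "t2": HALF},
    "actions": ("a1", "a2", "a3"),
    "meaning": {"m1": {"t1"}, "m2": {"t2"}},
    "u": {("t1", "a1"): (4, 3), ("t1", "a2"): (3, 0), ("t1", "a3"): (1, 2),
          ("t2", "a1"): (3, 0), ("t2", "a2"): (4, 3), ("t2", "a3"): (1, 2)},
}


def literal_best_actions(game, m):
    """All receiver actions that are optimal under the belief Pr(. | [m])."""
    states = game["meaning"][m]
    mass = sum(game["prior"][t] for t in states)
    eu = {a: sum(game["prior"][t] / mass * game["u"][(t, a)][1] for t in states)
          for a in game["actions"]}
    best = max(eu.values())
    return [a for a in game["actions"] if eu[a] == best]


def sender_value(game, t, m):
    """Expected sender utility of m for type t if R interprets literally
    (uniform mixing over tied receiver actions is an assumption)."""
    best = literal_best_actions(game, m)
    return sum(Fraction(game["u"][(t, a)][0]) for a in best) / len(best)


def bvrs_credible(game, m):
    """m is optimal for some type in [m] and for no type outside [m]."""
    def optimal(t):
        return all(sender_value(game, t, m) >= sender_value(game, t, alt)
                   for alt in game["meaning"])

    senders = {t for t in game["prior"] if optimal(t)}
    inside = senders & game["meaning"][m]
    outside = senders - game["meaning"][m]
    return bool(inside) and not outside
```

With these inputs `bvrs_credible(G2, "m_high")` is false, because the low type also finds m_high optimal, while both messages of G3 come out credible.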
The notion <strong>of</strong> BvRS-credibility matches our intuitions in all <strong>the</strong> cases discussed so far,<br />
but it is, in a sense, self-refuting, as G4 from Mat<strong>the</strong>ws et al. (1991) shows. In this game,<br />
all the available messages m12, m23 and m13 are BvRS-credible, because if R interprets<br />
6 It is unfortunately not entirely clear to me what exactly Stalnaker’s proposal amounts to, as insightful<br />
as it might be, because <strong>the</strong> account is not fully spelled out formally. The basic idea seems to be that<br />
(something like) <strong>the</strong> notion <strong>of</strong> BvRS-credibility, as it is called here, should be integrated as a constraint on<br />
receiver beliefs —believe a message iff it is BvRS-credible— into an epistemic model <strong>of</strong> <strong>the</strong> game toge<strong>the</strong>r<br />
with some appropriate assumption <strong>of</strong> (common) belief in rationality. The class <strong>of</strong> game models that satisfies<br />
rationality and credibility constraints would <strong>the</strong>n ultimately define how signals are used and interpreted.<br />
literally S t1 will use message m12, S t2 will use message m23 and S t3 will use message m13.<br />
No message is used untruthfully by any type. However, if R realizes that exactly S t1 uses<br />
message m12, he would ra<strong>the</strong>r not play a2, but a1. But if <strong>the</strong> sender realizes that message<br />
m12 triggers <strong>the</strong> receiver to play a1, suddenly S t3 wants to send m12 untruthfully. This<br />
example shows that BvRS-credibility is a reliable start, but stops too short. If messages<br />
are deemed credible and <strong>the</strong>refore believed, this may create an incentive to mislead. What<br />
seems needed to rectify <strong>the</strong> formal analysis <strong>of</strong> message credibility is a fully spelled-out<br />
model <strong>of</strong> iterated best responses that starts in <strong>the</strong> Benz-van-Rooij-Stalnaker way and <strong>the</strong>n<br />
carries on iterating. Here is such a model.<br />
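Before turning to the formal definitions, the G4 argument from this section can be traced step by step. The sketch below is not the IBR model itself: it simply computes the literal receiver response, the senders' best messages against it, the receiver's response once he knows who sends what, and the senders' best messages against that. Equiprobable states are assumed (footnote 2), and all names are illustrative.

```python
from fractions import Fraction

STATES = ("t1", "t2", "t3")
ACTIONS = ("a1", "a2", "a3", "a4")
MEANING = {"m12": {"t1", "t2"}, "m23": {"t2", "t3"}, "m13": {"t1", "t3"}}
# (state, action) -> (sender payoff, receiver payoff), taken from table G4
U = {
    ("t1", "a1"): (4, 5), ("t1", "a2"): (5, 4), ("t1", "a3"): (0, 0), ("t1", "a4"): (1, 4),
    ("t2", "a1"): (0, 0), ("t2", "a2"): (4, 5), ("t2", "a3"): (5, 4), ("t2", "a4"): (1, 4),
    ("t3", "a1"): (5, 4), ("t3", "a2"): (0, 0), ("t3", "a3"): (4, 5), ("t3", "a4"): (1, 4),
}


def uniform(states):
    """Uniform belief over the given states (the prior is uniform)."""
    states = list(states)
    return {t: Fraction(1, len(states)) for t in states}


def best_action(belief):
    """Receiver's best action given a belief over states."""
    return max(ACTIONS, key=lambda a: sum(p * U[(t, a)][1] for t, p in belief.items()))


def best_message(t, response):
    """Sender type t's best message given a message -> action map."""
    return max(MEANING, key=lambda m: U[(t, response[m])][0])


# R interprets literally; senders best-respond to that.
literal = {m: best_action(uniform(MEANING[m])) for m in MEANING}
step1 = {t: best_message(t, literal) for t in STATES}

# R realizes which type sends which message; senders best-respond again.
refined = {m: best_action(uniform(t for t in STATES if step1[t] == m)) for m in MEANING}
step2 = {t: best_message(t, refined) for t in STATES}

print(literal, step1, refined, step2, sep="\n")
```

In this run the first sender step is truthful (t1 sends m12, t2 sends m23, t3 sends m13), the refined receiver answers m12 with a1, and in the second sender step t3 switches to m12 although m12 is false in t3.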
3 The IBR Model and its Assumptions<br />
3.1 Assumptions: Focal Meaning and Bounded Rationality<br />
The IBR model presented in this paper rests on three assumptions with which it also sets<br />
itself apart from previous best-response models in formal pragmatics (Jäger, 2007; Benz<br />
and van Rooij, 2007; Jäger, 2008). The first assumption is <strong>the</strong> Focal Meaning Assumption:<br />
semantic meaning is focal in <strong>the</strong> sense that <strong>the</strong> sequence <strong>of</strong> best responses starts with a<br />
purely semantic truth-only sender strategy. Semantic meaning is also assumed focal in<br />
<strong>the</strong> sense that throughout <strong>the</strong> IBR sequence R believes messages to be truthful unless<br />
S has a positive incentive to be untruthful. This is <strong>the</strong> second, so called Truth Ceteris<br />
Paribus Assumption (TCP). These two (epistemic) assumptions assign semantic meaning<br />
its proper place in this model <strong>of</strong> cheap-talk communication.<br />
The third assumption is <strong>the</strong> Bounded Rationality Assumption: I assume that players<br />
in <strong>the</strong> game have limited resources which allow <strong>the</strong>m to reason only up to some finite<br />
iteration depth k. At the same time I take agents to be overconfident: each agent believes<br />
that she is smarter than her opponent. Camerer et al. (2004) make an empirical case for<br />
<strong>the</strong>se assumptions about <strong>the</strong> psychology <strong>of</strong> reasoners. 7 However, for simplicity, I do not<br />
implement Camerer et al.’s (2004) Cognitive Hierarchy Model in full. Camerer et al.<br />
assume that each agent who is able to reason up to strategic depth k has a proper belief<br />
about <strong>the</strong> population distribution <strong>of</strong> players who reason up to depth l < k, but I will<br />
assume here, just to keep things simple, that each player believes that she is exactly one<br />
step ahead <strong>of</strong> her opponent (cf. Crawford, 2003; Crawford, 2007). (I will discuss this<br />
simplifying assumption critically in section 4.)<br />
3.2 Beliefs & Best Responses<br />
Given a signaling game, a SENDER SIGNALING-STRATEGY is a function $\sigma \in S = (\Delta(M))^T$<br />
and a RECEIVER RESPONSE-STRATEGY is a function $\rho \in R = (\Delta(A))^M$.<br />
In order to define which strategies are best responses to a given belief, we need to define<br />
the game-relevant beliefs of both S and R. Since the only uncertainty of S concerns what<br />
R will do, the set of relevant SENDER BELIEFS ΠS is just the set of receiver response-strategies:<br />
ΠS = R. On the receiver’s side, we may say, with some redundancy, that there<br />
7 A good intuitively accessible example why this should be is a so-called beauty contest game (cf. Ho,<br />
Camerer and Weigelt, 1998). Each player from a group <strong>of</strong> size n > 2 chooses a number from 0 to 100. The<br />
player closest to 2/3 <strong>the</strong> average wins. When this game is played with a group <strong>of</strong> subjects who have never<br />
played the game before, the usual group average lies somewhere between 20 and 30. This is quite far from<br />
the group average 0 which we would expect from common (true) belief in rationality. Everybody seems to<br />
believe that they are just a bit smarter than everybody else, without noticing their own limitations.<br />
are three components in any game-relevant belief (cf. Battigalli, 2006): firstly, R has a<br />
prior belief Pr(·) about <strong>the</strong> true state <strong>of</strong> <strong>the</strong> world; secondly, he has a belief about <strong>the</strong><br />
sender’s signaling strategy; and thirdly, he has a posterior belief about <strong>the</strong> true state after<br />
hearing a message. Posteriors should be derived by Bayesian update from <strong>the</strong> former two<br />
components, but also specify R’s beliefs after unexpected surprise messages. Taken together,
the set of relevant RECEIVER BELIEFS ΠR is the set of all triples $\langle \pi^1_R, \pi^2_R, \pi^3_R \rangle$ for
which $\pi^1_R = \Pr$, $\pi^2_R \in S = (\Delta(M))^T$ and $\pi^3_R \in (\Delta(T))^M$ such that for any t ∈ T and
m ∈ M, if $\pi^2_R(t, m) \neq 0$, then:
\[ \pi^3_R(m, t) = \frac{\pi^1_R(t) \times \pi^2_R(t, m)}{\sum_{t' \in T} \pi^1_R(t') \times \pi^2_R(t', m)}. \]
Given a sender belief ρ ∈ ΠS, say that σ is a BEST RESPONSE SIGNALING STRATEGY
to belief ρ iff for all t ∈ T and m ∈ M we have:
\[ \sigma(t, m) \neq 0 \;\rightarrow\; m \in \arg\max_{m' \in M} \sum_{a \in A} \rho(m', a) \times U_S(m', a, t). \]
The set of all such best responses to belief ρ is denoted by S(ρ). Given a receiver belief
πR ∈ ΠR say that ρ is a BEST RESPONSE STRATEGY to belief πR iff for all m ∈ M and
a ∈ A we have:
\[ \rho(m, a) \neq 0 \;\rightarrow\; a \in \arg\max_{a' \in A} \sum_{t \in T} \pi^3_R(m, t) \times U_R(m, a', t). \]
The set of all such best responses to belief πR is denoted by R(πR). Also, if $\Pi'_R \subseteq \Pi_R$ is
a set of receiver beliefs, let $R(\Pi'_R) = \bigcup_{\pi_R \in \Pi'_R} R(\pi_R)$.
3.3 Strategic Types and <strong>the</strong> IBR sequence<br />
In line with <strong>the</strong> Bounded Rationality Assumption <strong>of</strong> Section 3.1, I assume that senders<br />
and receivers are <strong>of</strong> different strategic types. Strategic types correspond to <strong>the</strong> level k <strong>of</strong><br />
strategic depth a player in the game performs (while believing she thereby outperforms her<br />
opponent by exactly one step of reasoning). I will give an inductive definition of strategic<br />
types in terms of players’ beliefs, starting with a fixed strategy $\sigma^*_0$ of S0. 8 Then, for any
k ≥ 0, Rk is characterized by a belief set $\pi^*_{R_k} \subseteq \Pi_R$ that S is a level-k sender and Sk+1 is
characterized by a belief $\pi^*_{S_{k+1}} \in \Pi_S$ that R is a level-k receiver.
I assume that S0 plays according to the signaling strategy $\sigma^*_0$ which simply sends any
true message with equal probability in all states. There need not be any belief to which
this is a best response, as level-0 senders are (possibly irrational) dummies to implement
the Focal Meaning Assumption. R0 then believes that he is facing S0. With unique $\sigma^*_0$,
which sends all messages in M with positive probability (M is finite and contains no
contradictions), R0 is characterized entirely by the unique belief $\pi^*_{R_0}$ that S plays $\sigma^*_0$.
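As a worked illustration of the base step, consider R0's belief in G1 with equiprobable states. The computation below only instantiates the Bayesian update of Section 3.2 with the level-0 sender strategy; the subscript labels are the ones used in table G1.

```latex
% Level-0 sender in G1: each true message is sent with equal probability
\sigma^*_0(t_{\exists\neg\forall}, m_{some}) = 1, \qquad
\sigma^*_0(t_{\forall}, m_{some}) = \sigma^*_0(t_{\forall}, m_{all}) = \tfrac{1}{2}

% R_0's posterior after each message, with prior 1/2 on each state
\pi^3_R(m_{some}, t_{\exists\neg\forall})
  = \frac{\tfrac{1}{2} \times 1}{\tfrac{1}{2} \times 1 + \tfrac{1}{2} \times \tfrac{1}{2}}
  = \tfrac{2}{3}, \qquad
\pi^3_R(m_{some}, t_{\forall})
  = \frac{\tfrac{1}{2} \times \tfrac{1}{2}}{\tfrac{3}{4}}
  = \tfrac{1}{3}, \qquad
\pi^3_R(m_{all}, t_{\forall}) = 1
```

With these posteriors, R0's expected utility after msome is 2/3 for a∃¬∀ against 1/3 for a∀, so his best response to msome is a∃¬∀, and his best response to mall is a∀.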
8 I will write Sk and Rk to refer to a sender or receiver of strategic type k. Likewise, $S^t_k$ refers to a sender
of strategic type k and knowledge type t.
In general, Rk believes that he is facing a level-k sender. For k > 0, Sk is characterized
by a belief $\pi^*_{S_k} \in \Pi_S$. Rk consequently believes that Sk plays a best response $\sigma_k \in S(\pi^*_{S_k})$
to this belief. We can leave this unrestricted and assume that Rk considers any
$\sigma_k \in S(\pi^*_{S_k})$ possible. But it will transpire that for an intuitively appealing analysis of
message credibility we need to assume that Rk takes Sk to be truthful all else being equal<br />
(see also discussion in section 4). We implement the TCP Assumption of Section 3.1 as
a restriction $S^*(\pi^*_{S_k}) \subseteq S(\pi^*_{S_k})$ on signaling strategies held possible by R. Of course,
even when restricted, there need not be a unique signaling strategy here. As a general
tie-break rule, assume the “principle of insufficient reason” that all $\sigma_k \in S^*(\pi^*_{S_k})$ are
equiprobable to Rk. That means that Rk effectively believes that his opponent is playing
the signaling strategy
\[ \sigma^*_k(t, m) = \frac{\sum_{\sigma \in S^*(\pi^*_{S_k})} \sigma(t, m)}{|S^*(\pi^*_{S_k})|}. \]
This fixes Rk’s beliefs about <strong>the</strong> behavior <strong>of</strong> his opponent, but it need not fix Rk’s belief<br />
$\pi^3_R$ about surprise messages. Since this matter is intricate and moreover Rk’s counterfactual
beliefs do not play a crucial role in any examples discussed in this paper, I will not
pursue this issue at all in this paper (but see also footnote 10 below). In general, let us
say that Rk is characterized by any belief whose second component is $\sigma^*_k$ and whose third
component satisfies some (coherent, but possibly vacuous) assumption about the interpretation
of surprise messages. Let $\pi^*_{R_k} \subseteq \Pi_R$ be the set of all such beliefs. Rk is then fully
characterized by $\pi^*_{R_k}$.
In turn, Sk+1 believes that her opponent is a level-k receiver who plays a best response
$\rho_k \in R(\pi^*_{R_k})$. With the above tie-break rule Sk+1 is fully characterized by the belief
\[ \rho^*_k(m, a) = \frac{\sum_{\rho \in R(\pi^*_{R_k})} \rho(m, a)}{|R(\pi^*_{R_k})|}. \]
3.4 Credibility and Inference
Define that a signal m is k-OPTIMAL in t iff $\sigma^*_{k+1}(t, m) \neq 0$. The k-optimal messages
in t are all messages that Rk+1 believes $S^t_{k+1}$ might send (thus taking the TCP
Assumption into account). 9 Similarly, distill from R’s beliefs his INTERPRETATION-STRATEGY
δ : M → P(T ) as given by belief πR: $\delta_{\pi_R}(m) = \{t \in T \mid \pi^3_R(m, t) \neq 0\}$.
This simply is the support of the posterior beliefs of R after receiving message m. Let’s
write δk for the interpretation strategy of a level-k receiver.
For any k > 0, since Sk believes that she faces Rk−1 with interpretation strategy δk−1, wanting
to send message m would intuitively count as an attempt to mislead if sent by $S^t_k$ just in
case t ∉ δk−1(m). Such an attempt would moreover be untruthful if t ∉ [m]. While
Rk−1 would be deceived, Rk would see through the attempted deception. From Rk’s
point of view, who adheres to the TCP Assumption, a message m is incredible if it is
(k − 1)-optimal in some t ∉ [m]. But then Rk will include t in his interpretation of
m: recognizing a deceptive intention leads to pragmatic inference. In general, we should
consider a message m credible unless some type t ∉ [m] would want to use m somewhere
along the IBR sequence; precisely, m is CREDIBLE iff δk(m) ⊆ [m] for all k ≥ 0. 10
9 Without <strong>the</strong> TCP Assumption, 0-optimality would be equivalent to <strong>the</strong> notion <strong>of</strong> an optimal assertion<br />
in Benz and van Rooij (2007).<br />
10 It may seem that messages which would not be sent by any type (after <strong>the</strong> first round or later) come out<br />
credible under this definition, which would not be a good prediction. (Thanks to Daniel Rothschild (p.c.) for<br />
pointing this out to me.) However, this is not quite right: we get into this predicament only for some versions<br />
<strong>of</strong> <strong>the</strong> IBR sequence, not for o<strong>the</strong>rs. It all depends on how <strong>the</strong> receiver forms his counterfactual beliefs. If,<br />
for instance, we assume that R rationalizes observed behavior even if it surprises him, we can keep <strong>the</strong><br />
4 Discussion<br />
      a1   a2   m12  m3<br />
t1    1,1  0,0  √    −<br />
t2    0,0  1,1  √    −<br />
t3    0,0  1,1  −    √<br />
G5: “White Lie”<br />
      Pr(t)  a1   a2   a3   m12  m23<br />
t1    1/8    1,1  0,0  0,0  √    −<br />
t2    3/4    0,0  1,1  0,0  √    √<br />
t3    1/8    0,0  0,0  1,1  −    √<br />
G6: “Some Game without a Name”<br />
The IBR model makes intuitively correct predictions about message credibility for the<br />
games considered so far. In G1, $R_0$ responds to $m_{\text{some}}$ with the appropriate action $a_{\exists\neg\forall}$,<br />
but still interprets $\delta_0(m_{\text{some}}) = \{t_{\exists\neg\forall}, t_{\forall}\}$. In turn, $R_1$ interprets $\delta_1(m_{\text{some}}) = \{t_{\exists\neg\forall}\}$; he<br />
has pragmatically enriched the semantic meaning by taking the sender’s payoff structure<br />
and available messages into account. After one round a fixed point is reached, with fully<br />
revealing credible signaling in accordance with intuition. In G2, IBR predicts that both<br />
$S^{t_{\text{high}}}_1$ and $S^{t_{\text{low}}}_1$ will use $m_{\text{high}}$, which is therefore not credible. In G3, fully revealing<br />
communication is likewise predicted, and for G4 IBR predicts that all messages are credible for $R_0$<br />
and $R_1$, but not for $R_2$, hence incredible as such. In general, the IBR model predicts that<br />
communication in games of pure coordination is always credible:<br />
Proposition 4.1. Take a signaling game with $T = A$ and $U_{S,R}(\cdot, t, t') = c > 0$ if $t = t'$<br />
and 0 otherwise. Then $\delta_k(m) \subseteq [m]$ for all $k$ and $m$.<br />
Proof. Clearly, $\delta_0(m) \subseteq [m]$ for arbitrary $m$. So assume that $\delta_k(m) \subseteq [m]$. In this case<br />
$S^t_{k+1}$ will use $m$ only if $t \in \delta_k(m)$. But then $t \in [m]$ and therefore $\delta_{k+1}(m) \subseteq [m]$.<br />
However, the IBR model does not guarantee in general that communication is credible<br />
even when preferences are perfectly aligned, i.e., $U_S = U_R$. This may seem surprising at<br />
first, but it is due to the possibility of what we could call white lies: untruthful<br />
signaling that is beneficial for the receiver. These may occur if the set of available signals<br />
is not expressive enough. As an easy example, consider G5, where $S^{t_2}$ will use $m_3$<br />
untruthfully to induce action $a_2$, which, however, is best for both receiver and sender.<br />
To understand the central role of the TCP Assumption in the present proposal, consider<br />
the game G6. In G6, $R_0$ has the following posterior beliefs: after hearing message $m_{12}$ he<br />
rules out $t_3$ and believes that $t_2$ is three times as likely as $t_1$; similarly, after hearing message<br />
$m_{23}$ he rules out $t_1$ and believes that $t_2$ is three times as likely as $t_3$. Consequently,<br />
$R_0$ responds to both signals with $a_2$. Now, $S^{t_1}_1$, for instance, does not care which message<br />
she chooses, as far as her expected utilities are concerned. But $R_1$ nevertheless<br />
assumes that $S^{t_1}_1$ speaks truthfully. It’s thanks to the TCP Assumption that IBR predicts<br />
messages to be credible in this game.<br />
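The posterior beliefs of $R_0$ in G6 can be recomputed with a few lines of Python. This is only a sketch: it assumes that the level-0 sender picks uniformly among the messages that are true in her type (an assumption that reproduces the "three times as likely" ratios stated above), and the names `PRIOR`, `MEANING`, `truthful_sender`, `posterior` and `best_response` are illustrative, not taken from the paper.<br />

```python
from fractions import Fraction

# Game G6 as given in the text: prior over types and literal meaning of each message.
PRIOR = {"t1": Fraction(1, 8), "t2": Fraction(3, 4), "t3": Fraction(1, 8)}
MEANING = {"m12": {"t1", "t2"}, "m23": {"t2", "t3"}}
# Payoffs are 1,1 exactly when action a_i meets type t_i, and 0,0 otherwise.
MATCHING_ACTION = {"t1": "a1", "t2": "a2", "t3": "a3"}


def truthful_sender(t):
    """Assumed level-0 sender: uniform over the messages that are true in t."""
    true_msgs = [m for m, types in MEANING.items() if t in types]
    return {m: Fraction(1, len(true_msgs)) for m in true_msgs}


def posterior(m):
    """R0's posterior over types after hearing m, by Bayes' rule."""
    joint = {t: PRIOR[t] * truthful_sender(t).get(m, Fraction(0)) for t in PRIOR}
    total = sum(joint.values())
    return {t: p / total for t, p in joint.items()}


def best_response(m):
    """The expected utility of a_i equals the posterior of t_i, so pick the likeliest type."""
    post = posterior(m)
    likeliest = max(post, key=post.get)
    return MATCHING_ACTION[likeliest]
```

Under these assumptions `posterior("m12")` puts 1/4 on t1 and 3/4 on t2, and `best_response` returns `a2` for both messages, in line with the description of $R_0$ above.<br />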
G6 also shows a difference between the IBR model and Rabin’s (1990) model of credible<br />
communication, which superficially look very similar. Rabin’s model consists of two<br />
components: the first component is a definition of message credibility which is almost a<br />
two-step iteration of best responses starting from the semantic meaning; the second component<br />
is iterated strict dominance around a fixed core set of Rabin-credible messages<br />
being sent truthfully and believed. In particular, Rabin requires for m to be credible that<br />
m induces, when taken literally, exactly the set of all sender-best actions (from the set of<br />
actions that are inducible by some receiver belief) of all t ∈ [m]. This is defensible under<br />
the Rich Language Assumption, but both messages in G6 fail this requirement. Consequently,<br />
with no credible message to restrict iterated strict dominance, Rabin’s model<br />
predicts a total anything-goes for game G6. This shows the limited applicability of approaches<br />
to message credibility that are inseparable from the Rich Language Assumption.<br />
The present notion of message credibility and the IBR model are not restricted in this<br />
sense and fare well with (partial) inexpressibility and the resulting inferences.<br />
10 (continued) definition unchanged: if no type whatsoever has an outstanding reason to send m, the receiver’s posterior<br />
beliefs after m will support any type. So, unless m is tautologous, it is incredible. Still, Rothschild’s<br />
criticism is appropriate: the definition of message credibility offered here is, in a sense, incomplete as long<br />
as we do not properly define the receiver’s counterfactual beliefs; something left for another occasion.<br />
To wrap up: as a solution concept, the epistemic IBR model offers, basically, a set of<br />
beliefs, viz., beliefs obtained under certain assumptions about the psychology of agents<br />
from a sequence of iterated best responses. I do not claim that this model is a reasonable<br />
model for human reasoning in general. Certainly, the simplifying assumption that<br />
players believe that they are facing a level-k opponent, and not possibly a level-l opponent with l < k,<br />
becomes less plausible as k grows, and especially so for agents that have, in<br />
a manner of speaking, already reasoned themselves through a cycle multiple times. (It is<br />
easily verified that for finite M and T the IBR sequence always enters a cycle after some<br />
k ∈ N.) 11 Still, I wish to maintain that the IBR model does capture (our intuitions about)<br />
certain aspects of (idealized) linguistic behavior, namely pragmatic inference in cooperative<br />
and non-cooperative situations. Whether it is a plausible model of belief formation<br />
and reasoning in the envisaged linguistic situations is ultimately an empirical question.<br />
In conclusion, the IBR model offers a novel perspective on message credibility and<br />
the pragmatic inferences based on this notion. The model generalizes existing game-theoretical<br />
models of pragmatic inference by taking conflicting interests into account. It<br />
also generalizes game-theoretic accounts of credibility by giving up the Rich Language<br />
Assumption. The explicitly epistemic perspective on agents’ deliberation assigns a natural<br />
place to semantic meaning in cheap-talk signaling games as a focal starting point. It also<br />
highlights the unity in pragmatic inference: in this model both credibility-based inferences<br />
and implicatures are different outcomes of the same reasoning process.<br />
Acknowledgements<br />
I’d like to thank Tikitu de Jager, Robert van Rooij, Daniel Rothschild, Marc Staudacher<br />
and three anonymous referees for insightful comments, help and discussion. I moreover<br />
benefited greatly from discussing with Gerhard Jäger an early version <strong>of</strong> his paper (Jäger,<br />
2008), which also defines and applies a general iterated best response model different<br />
from what I did here. Also, I am thankful to Sven Lauer for waking my interest by first<br />
explaining to me with enormous patience some puzzles about credibility that I did not<br />
fully understand at <strong>the</strong> time (see Lauer, 2007). Errors are my own.<br />
11 It is tempting to assume that “looping reasoners” may have an Aha-Erlebnis and to extend <strong>the</strong> IBR<br />
sequence by transfinite induction assuming, for instance, that level-ω players best respond to <strong>the</strong> belief<br />
that <strong>the</strong> IBR sequence is circling. I do not know whe<strong>the</strong>r this is necessary and/or desirable for linguistic<br />
applications. We should keep in mind though that in some cases human reasoners may not get to <strong>the</strong> ideal<br />
level <strong>of</strong> reasoning in this model and in o<strong>the</strong>rs <strong>the</strong>y might even go beyond it.<br />
References<br />
Battigalli, P. (2006). Rationalization in signaling games: Theory and applications, International<br />
Game Theory Review 8(1): 67–93.<br />
Benz, A. and van Rooij, R. (2007). Optimal assertions and what <strong>the</strong>y implicate, Topoi<br />
26: 63–78.<br />
Camerer, C. F., Ho, T.-H. and Chong, J.-K. (2004). A cognitive hierarchy model <strong>of</strong> games,<br />
The Quarterly Journal <strong>of</strong> Economics 119(3): 861–898.<br />
Crawford, V. P. (2003). Lying for strategic advantage: Rational and boundedly rational<br />
misrepresentation of intentions, American Economic Review 93(1): 133–149.<br />
Crawford, V. P. (2007). Let’s talk it over: Coordination via preplay communication with<br />
level-k thinking. Unpublished Manuscript.<br />
Farrell, J. (1993). Meaning and credibility in cheap-talk games, Games and Economic<br />
Behavior 5: 514–531.<br />
Grice, P. H. (1989). Studies in <strong>the</strong> Ways <strong>of</strong> Words, Harvard University Press.<br />
Ho, T.-H., Camerer, C. and Weigelt, K. (1998). Iterated dominance and iterated best<br />
response in experimental “p-beauty contests”, The American Economic Review<br />
88(4): 947–969.<br />
Jäger, G. (2007). Game dynamics connects semantics and pragmatics, in A.-V. Pietarinen<br />
(ed.), Game Theory and Linguistic Meaning, Elsevier, pp. 89–102.<br />
Jäger, G. (2008). Game <strong>the</strong>ory in semantics and pragmatics. Manuscript, University <strong>of</strong><br />
Bielefeld.<br />
Katz, J. J. (1981). Language and O<strong>the</strong>r Abstract Objects, Basil Blackwell.<br />
Katzir, R. (2007). Structurally-defined alternatives. To appear in Linguistics and Philosophy.<br />
Lauer, S. (2007). Some kinds <strong>of</strong> deception do not occur: Credibility and <strong>the</strong> maxim <strong>of</strong><br />
sincerity. Unpublished Manuscript. Amsterdam, Stanford.<br />
Mat<strong>the</strong>ws, S. A., Okuno-Fujiwara, M. and Postlewaite, A. (1991). Refining cheap talk<br />
equilibria, Journal <strong>of</strong> Economic Theory 55: 247–273.<br />
Rabin, M. (1990). Communication between rational agents, Journal <strong>of</strong> Economic Theory<br />
51: 144–170.<br />
Stahl, D. O. and Wilson, P. W. (1995). On players’ models <strong>of</strong> o<strong>the</strong>r players: Theory and<br />
experimental evidence, Games and Economic Behavior 10: 218–254.<br />
Stalnaker, R. (2006). Saying and meaning, cheap talk and credibility, in A. Benz, G. Jäger<br />
and R. van Rooij (eds), Game Theory and Pragmatics, Palgrave MacMillan, pp. 83–<br />
100.<br />
TOWARDS A NEW CHARACTERISATION<br />
OF CHOMSKY'S HIERARCHY VIA ACCEPTANCE PROBABILITY<br />
Michael Hartwig<br />
Multimedia University, Cyberjaya, Malaysia<br />
Abstract. Researchers have recently studied the acceptance probability of P and<br />
NP languages, hoping to find new ways of differentiating the two classes. This paper<br />
outlines the author’s findings on the acceptance probability of regular and<br />
context-free languages, which we describe using the notion of a difference shrinking<br />
chain. We also present a first proof technique based on these results, the inflating lemma,<br />
which can separate higher languages from regular languages up to star height 1, and<br />
give some incentives to apply these techniques to higher classes.<br />
1 Introduction<br />
“The major quest for the complexity theory community is finding methods that may<br />
separate classes.” (Buhrmann & Torenvliet 2005) Although impressive progress has<br />
recently been made in complexity theory, the need for new,<br />
creative approaches that may yield methods for separating classes<br />
has not diminished, as the long-outstanding P vs. NP problem nicely exemplifies.<br />
One recent approach studies properties of the acceptance<br />
probability function of such languages, that is, the form of the graph of the<br />
function which takes a natural number n and returns the ratio between<br />
the number of accepted words of length n in the given language and the number of all possible words<br />
of the same length. This study has led to many discoveries, such as the so-called phase<br />
transition in the acceptance probability graph of NP-complete problems (Clote &<br />
Kranakis 2002, Dubois et al. 2000). There has been hope that if we were able to<br />
describe this phase transition with more and more precision (Achlioptas et al.<br />
2001, Kirousis et al. 1998) we would then also be able to separate P from NP.<br />
Unfortunately, this has not yet happened.<br />
Like other researchers we have therefore turned our attention first to smaller classes<br />
such as regular and context-free languages. Given such a language L, we define the<br />
density function $d_L(n) = |L \cap \Sigma^n|$, which counts the number of words of length n in L. The<br />
study of the density of regular languages has a longer history (Schützenberger 1962,<br />
Eilenberg 1974, Rozenberg et al. 1997, Bodirsky et al. 2004). Languages with a density<br />
function that can be bounded from above by a polynomial (i.e. there exists a polynomial<br />
p(x) such that $d_L(n) \le p(n)$) are called sparse. If on the other hand there exists a real<br />
number h > 1 such that $d_L(n) \ge h^n$ for infinitely many n ≥ 0 then L is called dense<br />
(Demaine 2003, Krieger 2007). Notice that the language a*b* is a sparse language, while<br />
the language that includes all words over a binary alphabet that start with the letter a<br />
(i.e. a(a+b)*) is dense. As described in (Szilard et al. 1992, Rozenberg et al. 1997) a<br />
regular language is sparse “if and only if it can be represented as a finite union of<br />
regular expressions of the form xy1*z1...ym*zm, where x, y1, z1, ..., ym, zm are all strings in<br />
Σ*”. Such regular languages are also called SLRE and are equivalent to bounded regular<br />
languages (Habermehl et al. 2000). Nevertheless, it is not difficult to see that the<br />
majority of all regular languages are dense. Flajolet (1987) demonstrated that a regular<br />
language is either sparse or dense, which was recently generalized to context-free<br />
languages (Ilie 2000, Incitti 2000). While it is interesting in its own right to study such<br />
properties, Demaine et al. (2003) showed that only sparse regular languages have<br />
the power to restrict NP-complete problems such that they are polynomially solvable; in<br />
other words, the intersection of such a regular language with an NP-complete<br />
problem results in a language in P. Eisman et al. (2005) proposed another<br />
application, stating that the density function could be used in application areas<br />
such as streaming algorithms, where “rapid computation must be performed (often in a<br />
single pass)”.<br />
Still, we feel that it is often more interesting to study the acceptance probability<br />
$Acc(L, n) = |L \cap \Sigma^n| / |\Sigma^n|$ of a given language rather than its density, that is, the ratio<br />
between the number of accepted words and the number of all possible words of a given length. As<br />
mentioned above, a(a+b)* has exponential density, but its acceptance probability is stable,<br />
Acc(a(a+b)*, n) = 0.5 for n ≥ 1, which seems to describe the quantity of accepted<br />
words more appropriately. Secondly, such a view allows us to treat<br />
sparse and dense languages together and study common properties. In (Hartwig et al. 2006a,<br />
Hartwig et al. 2006b) we showed that the acceptance probability graph is indeed<br />
expressive enough to separate complexity classes, making it an acceptable candidate in<br />
the above-mentioned quest. The objective in using such properties to separate the mentioned<br />
classes is to familiarize ourselves with properties, techniques and applications, and<br />
to get a better understanding of possible uses of acceptance probability<br />
graphs in higher classes. In (Hartwig et al. 2006a) we described the acceptance<br />
probability of very low regular languages and in (Hartwig et al. 2006b) we presented a<br />
proof technique (the inflating lemma) that is powerful enough to separate many higher<br />
languages from regular languages up to star height 1 and can be compared with the<br />
well-known pumping lemma (Sipser 1997) 1 .<br />
Inflating Lemma If L ∈ REG(1) and L has increasing acceptance probability then<br />
there exist a length $n_0$ and a natural number k ≥ 1 such that for all w ∈ L with $|w| \ge n_0$:<br />
$w = pr \in L \rightarrow p(\Sigma^k)^* r \subseteq L$.<br />
An example application is the following proof.<br />
Example (MAJORITY does not belong to REG(1)) L = {w | w ∈ Σ* and w has at least<br />
as many a’s as b’s} ∉ REG(1).<br />
Proof. Acc(L, n) is constantly increasing; hence the inflating lemma can be applied. But<br />
none of the accepted words can be inflated: we can take any word and any position and<br />
insert (or: inflate with) as many b’s as needed until the word has more b’s than a’s.<br />
□<br />
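The inflation step of this proof can be illustrated with a short Python sketch. It only demonstrates that a block of b’s from $(\Sigma^k)^*$ can push any accepted word out of MAJORITY; the function names `in_majority` and `inflation_counterexample` are hypothetical helpers introduced for this illustration.<br />

```python
def in_majority(w: str) -> bool:
    """Membership in MAJORITY: at least as many a's as b's."""
    return w.count("a") >= w.count("b")


def inflation_counterexample(w: str, pos: int, k: int) -> str:
    """Split w = p r at pos and insert j blocks of b^k between p and r.

    The result lies in p (Sigma^k)* r. Choosing j = (#a in w) + 1 guarantees
    that the b's outnumber the a's, because k >= 1.
    """
    p, r = w[:pos], w[pos:]
    j = w.count("a") + 1
    return p + "b" * (j * k) + r
```

For instance, `inflation_counterexample("aab", 1, 2)` yields a word with three b-blocks of length two inserted after the first letter, and that word is no longer in MAJORITY.<br />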
1 Although <strong>the</strong> inflating lemma seems to have only limited applicability <strong>the</strong> following work suggests<br />
that every regular language has ei<strong>the</strong>r increasing, stable or decreasing chains. Fur<strong>the</strong>rmore, if L is regular<br />
and <strong>of</strong> decreasing acceptance probability, <strong>the</strong>n <strong>the</strong> lemma could be applied to <strong>the</strong> complement <strong>of</strong> L.<br />
This paper continues this work by providing an overview of the status of our<br />
work on the acceptance probability of regular and context-free languages over binary<br />
alphabets. We claim that both classes have acceptance probability graphs that can be split<br />
into increasing, decreasing or stable chains with a decreasing (or shrinking)<br />
difference. We think that the minimal number of such chains should be studied in<br />
more detail and related to the size of any program or machine accepting<br />
the language. Knowing that NP-complete problems exhibit phase transitions in their<br />
acceptance probability graphs, switching from difference shrinking to difference<br />
increasing sections and vice versa, we believe that techniques making use of those<br />
properties may contribute to the separation of higher classes, too.<br />
2 Preliminaries<br />
We use the following definitions. The alphabet for all strings is Σ = {a, b}. The length<br />
of a string w is given by |w|; all sets L1, L2, ... are considered subsets of Σ*. A regular<br />
expression e over Σ is built from all symbols in Σ, the symbol λ, the binary operators +<br />
and ·, and the unary operator *. The language specified by a regular expression is denoted by<br />
L(e) and is referred to as a regular language (Kleene 1956, McCulloch et al. 1943). We call<br />
a regular expression unambiguous (or non-overlapping) if and only if its<br />
corresponding NFA is unambiguous. “An NFA is called unambiguous if for each word<br />
w there is at most one path from the initial state to a final state that spells out w.”<br />
(Bruggemann-Klein et al. 2007, Moreira et al. 2005) It is important to know that all<br />
regular languages are unambiguous (Giammarresi et al. 2001) and can hence be<br />
described by an unambiguous regular expression. sh(e) denotes the star height of a<br />
regular expression and REG(1) denotes the class of all regular languages having a star height of 1<br />
or less.<br />
As mentioned in the introduction, the density of a language counts the number of<br />
accepted words per given length and is defined as<br />
$d_L(n) = |L \cap \Sigma^n|$,<br />
while the acceptance probability of a language is defined as the ratio between the<br />
number of accepted words $d_L(n)$ and the number of all words of a given length,<br />
$Acc(L, n) = |L \cap \Sigma^n| / |\Sigma^n|$.<br />
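Both definitions can be evaluated by brute force for small n. The following Python sketch enumerates all words of length n over Σ = {a, b}; the helper names `density` and `acc` are illustrative, and the paper's operator + is written as a character class in Python's regex syntax (so a(a+b)* becomes `a[ab]*`).<br />

```python
import re
from fractions import Fraction
from itertools import product

SIGMA = "ab"  # binary alphabet used throughout the paper


def density(pattern: str, n: int) -> int:
    """d_L(n): number of words of length n over SIGMA that the regex matches fully."""
    regex = re.compile(pattern)
    return sum(1 for w in product(SIGMA, repeat=n) if regex.fullmatch("".join(w)))


def acc(pattern: str, n: int) -> Fraction:
    """Acc(L, n) = d_L(n) / |SIGMA|^n, kept exact as a fraction."""
    return Fraction(density(pattern, n), len(SIGMA) ** n)
```

With these helpers, `density("a*b*", n)` grows linearly in n (a sparse language), while `acc("a[ab]*", n)` stays at one half for every n ≥ 1 even though the density of that language grows exponentially.<br />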
3 Regular acceptance probability<br />
3.1 Low regular languages<br />
Describing <strong>the</strong> acceptance probability <strong>of</strong> a finite language is straightforward.<br />
Lemma (Finite Languages) For any finite language L: Acc(L, n) = O(0).<br />
Pro<strong>of</strong>. If L is finite <strong>the</strong>n <strong>the</strong>re exists a length after which no word is accepted by <strong>the</strong><br />
language. The acceptance probability reaches 0.<br />
□<br />
Regular languages which can be described by a regular expression having star height<br />
0, or using the star operator at most once and only in the form (a+b)*, have<br />
constant acceptance probability.<br />
Lemma (Simple Regular Languages) If L = w1(a+b)*w2 with w1, w2 words then there exists<br />
a constant c such that:<br />
Acc(L, n) = O(c).<br />
Proof. The shortest accepted word of the language L has length |w| = |w1| + |w2|. As<br />
there is only one such shortest word, $Acc(L, |w|) = 1/2^{|w|} = c$. For any length n greater<br />
than |w| we have $d_L(n) = 2 \cdot d_L(n-1)$. Hence the acceptance ratio<br />
remains stable.<br />
□<br />
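The step from the doubling density to a stable ratio can be spelled out as follows, writing |w| = |w1| + |w2|. This is only a restatement of the argument above, not an additional result.<br />

```latex
\begin{align*}
d_L(n) &= 2^{\,n-|w|}
  && \text{for } n \ge |w|, \text{ since } d_L(|w|) = 1 \text{ and } d_L(n) = 2\, d_L(n-1),\\
Acc(L, n) &= \frac{d_L(n)}{|\Sigma^n|} = \frac{2^{\,n-|w|}}{2^{n}} = 2^{-|w|} = c .
\end{align*}
```

For L2 = ab(a+b)*, for example, |w| = 2 and the acceptance probability is 1/4 for every n ≥ 2.<br />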
It is then not difficult to see that any union of simple regular languages (in<br />
the above sense) will again only yield a language with constant acceptance<br />
probability.<br />
Figure 1. Acceptance probability graphs of low regular languages.<br />
Left L1 = {a, aba} (finite), right L2 = ab(a+b)*.<br />
3.2 Regular languages having one star<br />
Languages built upon regular expressions using the star operator at most once also<br />
include languages with a decreasing acceptance probability, if the expression under the star<br />
is not entirely composed of (a+b)* expressions. The length of the expression under the<br />
star defines the step width d, which decomposes the acceptance probability graph into d<br />
chains. We will have d-1 chains with acceptance probability O(0) and one chain<br />
being either stable or decreasing.<br />
Figure 2. Acceptance probability graph of L3 = b(ba)*. L3 has a step width of 2 with one chain being<br />
stable ($d_{L_3}(0) = d_{L_3}(2) = \dots = 0$), while the remaining elements belong to a chain with its peaks<br />
constantly decreasing by ¾.<br />
Lemma (Regular Languages with One Star) For any regular language L = w1w2*w3 with w1,<br />
w3 words and sh(w2) = 0 there exists a minimal length $n_0$ such that for all $n > n_0$:<br />
$Acc(L, n) \le Acc(L, n-|w_2|)$.<br />
Proof. The length d = |w2| is referred to as the step width of this language; it is the distance between the<br />
peaks of the acceptance probability graph. The number of accepted words of any length n can be<br />
traced back to the number of accepted words of length n-|w2|, as we can apply the word under<br />
the star once more. Hence, Acc(L, n) = c · Acc(L, n-|w2|). The constant c is easily determined from w2, and the<br />
fact that the chains are either decreasing or stable is obvious and also follows directly from the<br />
inflating lemma.<br />
□<br />
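As a small check of the lemma on the language L3 = b(ba)* from Figure 2, the following Python sketch computes the two chains for step width 2 by brute-force enumeration. The helper `acc` and the variable names are illustrative; the regex `b(ba)*` is a direct transcription of the expression.<br />

```python
import re
from fractions import Fraction
from itertools import product


def acc(pattern: str, n: int, sigma: str = "ab") -> Fraction:
    """Acc(L, n): fraction of words of length n over sigma matched by the regex."""
    regex = re.compile(pattern)
    hits = sum(1 for w in product(sigma, repeat=n) if regex.fullmatch("".join(w)))
    return Fraction(hits, len(sigma) ** n)


L3 = "b(ba)*"
STEP = 2  # |ba|, the length of the word under the star

chain_peaks = [acc(L3, n) for n in range(1, 10, STEP)]  # n = 1, 3, 5, 7, 9
chain_zero = [acc(L3, n) for n in range(0, 10, STEP)]   # n = 0, 2, 4, 6, 8
# Constant c with Acc(L, n) = c * Acc(L, n - STEP) along the peak chain.
ratios = [later / earlier for earlier, later in zip(chain_peaks, chain_peaks[1:])]
```

Each peak is one quarter of the previous one, which matches the caption's "decreasing by ¾", and the other chain is constantly 0.<br />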
3.3 Regular languages up to star height 1<br />
Regular languages up to star height 1 already provide a wide range of different<br />
acceptance probability graphs.<br />
Lemma (Regular Languages up to Star Height 1) If L ∈ REG(1) then there exist<br />
constants $s$, $u_0, \dots, u_m$ and $v_1, \dots, v_m$ such that:<br />
$d_L(s) = u_0$,<br />
$d_L(s+1) = u_1$,<br />
... ,<br />
$d_L(s+m) = u_m$<br />
$d_L(n) = u_1 d_L(n-v_1) + u_2 d_L(n-v_2) + \dots + u_m d_L(n-v_m)$<br />
Proof. (Sketch) See (Hartwig 2008) for the complete proof. If L ∈ REG(1) then L has an unambiguous regular expression of the following form:<br />
L = L1 + L2 + ... + Lk<br />
where Li = Ri0Ri1...Rit with sh(Rij) ≤ 1<br />
The number of accepted words for each Li is calculated successively, starting from the left. The number of accepted words of length n for Ri0 can be determined from the lengths of all expressions under the star. For example, let<br />
L4 = b (aa + bbb)* b (ab + bba)* b<br />
We would have R4,0 = (aa + bbb)* and R4,1 = (ab + bba)*, which would give us for R4,0:<br />
dR4,0(3) = 1 // as |b| + |b| + |b| = 3<br />
dR4,0(n) = dR4,0(n-|aa|) + dR4,0(n-|bbb|)<br />
= dR4,0(n-2) + dR4,0(n-3)<br />
This process continues until the last expression within Li is reached, adding at each step all the accepted words of the components considered before.<br />
dR4,1(n) = dR4,1(n-|ab|) + dR4,1(n-|bba|) + No_acc_words_for_R4,0<br />
= dR4,1(n-2) + dR4,1(n-3) + dR4,0(n)<br />
In our (simple) case this gives<br />
dL4(n) = dR4,1(n)<br />
The above result (here depending on R4,0 and R4,1) can then be converted into a recursive formula that refers only to itself and has the form required by the lemma. In the example case,<br />
dL4(3) = 1,<br />
dL4(n) = 2dL4(n-2) + 2dL4(n-3) - dL4(n-4) - 2dL4(n-5) - dL4(n-6).<br />
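The conversion step can be checked with a few lines of Python. This is only a sketch under one assumption that is not spelled out in the text: each starred factor of L4 corresponds to the polynomial 1 - x^2 - x^3, so the self-referential recurrence is read off from the square of that polynomial. The helper names (`poly_mul`, `coupled`, `closed`) are hypothetical. The values count decompositions b·x·b·y·b exactly as the coupled recurrences do, so they equal the number of distinct words only insofar as the expression is unambiguous.

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (index = power of x)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


# Assumption: (aa + bbb)* and (ab + bba)* each contribute 1 - x^2 - x^3.
STAR = [1, 0, -1, -1]
DENOM = poly_mul(STAR, STAR)  # 1 - 2x^2 - 2x^3 + x^4 + 2x^5 + x^6
# Moving every non-constant term to the other side yields the recurrence coefficients.
COEFFS = {k: -c for k, c in enumerate(DENOM) if k > 0 and c != 0}


def coupled(n_max):
    """d_{R4,1}(n) computed from the two coupled recurrences in the text."""
    r0 = [0] * (n_max + 1)
    r1 = [0] * (n_max + 1)
    for n in range(3, n_max + 1):
        r0[n] = 1 if n == 3 else r0[n - 2] + r0[n - 3]
        r1[n] = r1[n - 2] + r1[n - 3] + r0[n]
    return r1


def closed(n_max):
    """d_{L4}(n) computed from the single self-referential recurrence."""
    d = [0] * (n_max + 1)
    for n in range(3, n_max + 1):
        if n == 3:
            d[n] = 1
        else:
            d[n] = sum(c * d[n - k] for k, c in COEFFS.items() if n - k >= 0)
    return d


if __name__ == "__main__":
    print(COEFFS)
    print(coupled(12) == closed(12))
```

Both routes should agree for every n, which is what the equality check at the end compares.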
Figure 3. Acceptance probability graphs of higher regular languages up to star height 1. Left: L4 = b (aa + bbb)* b (ab + bba)* b from the above example; right: L5 = a (a+b)* + (b + ba)* with a union operator also outside the star.<br />
To describe the acceptance probability graphs of such regular and higher languages we introduce the notion of a difference shrinking chain.<br />
Definition (Difference Shrinking Chain) We say that a language has a difference shrinking chain if there exist a step width d and a length n0 such that for all i ≥ 0:<br />
|Acc(L, n0+(i+2)d) - Acc(L, n0+(i+1)d)| ≤ |Acc(L, n0+(i+1)d) - Acc(L, n0+i·d)|<br />
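The definition can be tested on a finite prefix of an acceptance probability sequence. The following is a minimal sketch and not the algorithm of (Hartwig 2008): the function names are hypothetical, Acc(L, n) is taken to be dL(n) / 2^n for a binary alphabet, and only finitely many i can be checked, so a positive answer is evidence rather than proof.

```python
from fractions import Fraction


def acc(count, n, alphabet_size=2):
    """Acceptance probability: accepted words of length n over all words of length n."""
    return Fraction(count, alphabet_size ** n)


def is_difference_shrinking(values, n0, d):
    """Check the chain n0, n0+d, n0+2d, ... within the finite list `values`.

    values[n] must hold Acc(L, n). Returns True if consecutive absolute
    differences along the chain never grow.
    """
    chain = values[n0::d]
    diffs = [abs(b - a) for a, b in zip(chain, chain[1:])]
    return all(later <= earlier for earlier, later in zip(diffs, diffs[1:]))


# Illustrative language (an assumption, not from the text): L = (aa)* over {a, b},
# which has exactly one word of every even length and none of odd length.
ACC_VALUES = [acc(1 if n % 2 == 0 else 0, n) for n in range(21)]

if __name__ == "__main__":
    print(is_difference_shrinking(ACC_VALUES, n0=0, d=2))
    print(is_difference_shrinking(ACC_VALUES, n0=1, d=2))
```

Exact fractions are used so that the comparison of differences is not disturbed by floating-point rounding.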
Figure 4. An example language with only difference shrinking chains. A chain is called difference shrinking if, for any length n, the speed of the increase (or decrease) along the chain never grows, i.e. Δ2 ≤ Δ1.<br />
We call a language difference shrinking if there exists a step width d ≥ 1 decomposing the acceptance probability graph into d difference shrinking chains. We call a language regular increasing if it can be decomposed into at least one increasing and zero or more stable chains. (Regular decreasing languages are defined in a similar way.) A language is furthermore called strongly increasing if only one increasing chain completely describes the graph. Similar concepts apply to strongly decreasing languages; such languages are also said to have monotone acceptance probability.<br />
Lemma (Star Height 1 Languages are Difference Shrinking) If L ∈ REG(1), then L is difference shrinking.<br />
Proof. See (Hartwig 2008).<br />
The proof includes an algorithm that computes, for any given regular language, a step width which might not be the minimal step width but which decomposes the language's acceptance probability graph into such difference shrinking chains. It is not difficult to see that most of the regular languages up to star height 1 also have only monotone chains, and we claim that this is also true for the remaining languages.<br />
4 The acceptance probability of context free languages<br />
Calculating the number of accepted words of a regular language with a star height of 2 or higher seems to require a different approach. Let L6 = (w1*w2*)*. We could then compute the number of accepted words of length n as follows: dL6(n) = dL6(1)·dL6(n-1) + dL6(2)·dL6(n-2) + ... A word of length n is a composition of an accepted word of length c ≤ n from w1 and an accepted word of length n-c from w2. Surprisingly, the same approach also works in the calculation of the acceptance probability of a context-free language, as the following examples suggest.<br />
Example. Let G1 be the following grammar:<br />
S => SaN | a<br />
N => bN | bb<br />
We can compute the number of accepted words that are derived from each of the given non-terminals. The rule S => SaN specifies that a terminal word of length n can be constructed from any shorter words derived from S and N as long as the sum of their lengths equals n-1 (n-1, because the letter a takes up one position). This leads to the following:<br />
$d_S(1) = 1$<br />
$d_N(2) = 1$<br />
$d_S(n) = \sum_{i=0}^{n-1} d_S(i)\, d_N(n-1-i)$<br />
$d_N(n) = d_N(n-1)$<br />
With S as the start symbol, we can calculate the number of accepted words for the given grammar as dG1(n) = dS(n). Since the language is also regular (a(abbb*)*), the number of accepted words could also be calculated with dS(1) = 1, dS(2) = dS(3) = 0 and dS(n) = dS(n-1) + dS(n-3) for n ≥ 4, following the approach of the previous sections.<br />
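The two ways of counting for G1 can be compared directly, together with a brute-force enumeration. This is a sketch only: the function names are hypothetical, and the regular expression `a(abb+)*` is our reading of the grammar (a single a followed by blocks consisting of an a and at least two b's), with the recurrence applied from n = 4 onwards and unspecified small values taken as 0.

```python
import re
from itertools import product


def counts_from_grammar(n_max):
    """d_S(n) via the convolution suggested by the rule S => SaN."""
    d_n = [0] * (n_max + 1)
    d_s = [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        if n == 2:
            d_n[n] = 1            # N => bb
        elif n > 2:
            d_n[n] = d_n[n - 1]   # N => bN
        if n == 1:
            d_s[n] = 1            # S => a
        else:
            d_s[n] = sum(d_s[i] * d_n[n - 1 - i] for i in range(n))
    return d_s


def counts_from_recurrence(n_max):
    """d_S(n) via d(n) = d(n-1) + d(n-3), applied for n >= 4."""
    d = [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        if n == 1:
            d[n] = 1
        elif n >= 4:
            d[n] = d[n - 1] + d[n - 3]
    return d


BLOCKS = re.compile(r"a(abb+)*")


def counts_by_enumeration(n_max):
    """Count matching words of each length over {a, b} by exhaustive search."""
    return [
        sum(1 for w in product("ab", repeat=n) if BLOCKS.fullmatch("".join(w)))
        for n in range(n_max + 1)
    ]


if __name__ == "__main__":
    print(counts_from_grammar(10))
    print(counts_from_recurrence(10))
    print(counts_by_enumeration(10))
```

The enumeration grows as 2^n, so it is only meant as a cross-check for small lengths.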
Example. Let G2 be <strong>the</strong> following grammar:<br />
S => aSb | ab<br />
Although this is a truly context-free language, calculating its density remains quite simple, which suggests that the acceptance probability of all context-free languages can be described completely by a form similar to the one presented for star-height 1 languages.<br />
dS(2) = 1<br />
dS(n) = dS(n-2)<br />
Based on the above examples and on the Chomsky-Schützenberger Theorem, which states that for every context-free language accepted by a PDA M = (Q, Σ, Γ, δ, q0, Z0, F) there are a regular language R, the Dyck set D2 and two homomorphisms g, h such that L(M) = h(g^{-1}(D2) ∩ R), we claim that context-free languages are likewise difference shrinking and monotone.<br />
While we can foresee challenges in using our results for higher classes in the construction of new proof techniques, the long outstanding P vs NP problem should provide enough incentive to make an attempt. The phase transition that NP-complete problems exhibit is only possible because the language's acceptance probability switches from sections that are difference shrinking to sections that are difference increasing, as shown in the example below.<br />
Figure 6. Example languages from NP-complete having acceptance probability graphs with sections of increasing difference (some of them indicated).<br />
We think that finding the minimal step width for a given language would help in the search for new proof techniques. As mentioned earlier, the minimal step width should indicate further properties related to the complexity of accepting the language.<br />
5 Conclusions<br />
We have given a first overview of a new attempt at characterizing classes of the Chomsky Hierarchy using properties derived from the languages' acceptance probability graphs. Regular languages up to star height 1 have graphs that can be split into difference shrinking chains. Current research suggests that this also holds for context-free languages. Knowing that NP-complete languages usually have graphs that undergo a phase transition between difference shrinking and difference increasing sections, we recommend further work. In particular, the problem of finding the minimal step width seems to be crucial in the construction of new proof techniques.<br />
Class | Acceptance probability | Properties<br />
finite | Acc(L, n) = 0 | convergent to 0<br />
simple regular | Acc(L, n) = 2 dL(n-1) / 2^n | convergent to a constant (stable)<br />
one star | Acc(L, n) = c dL(n-d) / 2^n | as above & at most one decreasing chain<br />
star height 1 | Acc(L, n) = [ u1dL(n-v1) + u2dL(n-v2) + ... + umdL(n-vm) ] / 2^n | monotone ², difference shrinking chains<br />
regular, context-free | $\mathrm{Acc}(L, n) = \left[\sum_{i=0}^{n-d} d_S(i)\, d_N(n-d-i) \cdots\right] / 2^n$ | monotone, difference shrinking chains ³<br />
context sensitive | Acc(L, n) = ? | ? as above & difference increasing chains, non-monotonic chains<br />
Table 1. Acceptance probability of different classes from the Chomsky Hierarchy (state of the art; the class of context-free languages is currently being investigated).<br />
Acknowledgments<br />
We'd like to thank the anonymous referees for their comments.<br />
References<br />
H. Buhrman & L. Torenvliet (2005). 'A Post's Program for Complexity Theory', BEATCS 85 (pp. 41-51)<br />
P. Clote & E. Kranakis (2002). 'Boolean Functions and Computation Models', Springer<br />
M. Hartwig et al. (2006a). 'In Search of a New Proof Technique', M2USIC06<br />
M. Hartwig et al. (2006b). 'Proving Non Regularity using Acceptance Probability Techniques', CSCM2006<br />
A. Bruggemann-Klein & R. Mesing (2007). 'Regular Expressions into Finite Automata', http://webcourse.cs.technion.ac.il/236826/Spring2005/ho/WCFiles/RegularExpressions into Finite Automata.doc<br />
D. Giammarresi, R. Montalbano, D. Wood (2001). 'Block-Deterministic Languages', ICTCS01<br />
M. Sipser (1997). 'Introduction to the Theory of Computation', PWS Publishing Company (pp. 63ff.)<br />
Footnotes to Table 1:<br />
² Claimed for some languages.<br />
³ Claimed.<br />
O. Dubois et al. (2000). 'Typical Random 3-SAT Formulae and the Satisfiability Threshold', SODA '00 (pp. 126-127)<br />
D. Achlioptas et al. (2001). 'The Phase Transition in 1-in-k SAT and NAE 3-SAT', SODA '01 (pp. 721-722)<br />
L. Kirousis et al. (1998). 'Approximating the unsatisfiability threshold of random formulas', Random Structures and Algorithms 12(3) (pp. 253-269)<br />
D. Achlioptas et al. (2001). 'A Sharp Threshold in Proof Complexity Yields a Lower Bound for Satisfiability Search', Journal of Comp. & Sys. Sci. 68 (2)<br />
M. Hartwig (2008). 'Regular Languages up to Star Height 1 have Difference Shrinking Acceptance Probability', TMFCS-08<br />
M. Bodirsky et al. (2004). 'Efficiently computing the density of regular languages', Proceedings of Latin American INformatics (LATIN'04), pages 262-270, Buenos Aires<br />
M.P. Schützenberger (1962). 'Finite counting automata', Information and Control 5(2), 91-107<br />
S. Eilenberg (1974). 'Automata, Languages, and Machines', Academic Press, Inc., Orlando, Florida, USA<br />
A. Szilard et al. (1992). 'Characterizing Regular Languages with Polynomial Densities', Lecture Notes in Computer Science, Volume 629, Springer, 494-503<br />
G. Rozenberg et al. (1997). 'Handbook of Formal Languages', Chapter 2: Regular Languages, Springer<br />
E. D. Demaine et al. (2003). 'On Universally Easy Classes for NP-complete Problems', Theoretical Computer Science, Vol. 304, pages 471-476<br />
D. Krieger et al. (2007). 'Finding the Growth Rate of a Regular Language in Polynomial Time', CoRR abs/0711.4990<br />
P. Habermehl et al. (2000). 'A Note on SLRE', http://citeseer.ist.psu.edu/375870.html<br />
P. Flajolet (1987). 'Analytic Models and Ambiguity of Context-Free Languages', TCS, 49:283-309<br />
L. Ilie et al. (2000). 'A Characterization of Polyslender Context-Free Languages', Theoret. Informatics Appl., 34(1):77-86<br />
R. Incitti (2000). 'The Growth Function of Context-Free Languages', Theoretical Computer Science, 255:601-605<br />
G. Eisman et al. (2005). 'Approximate Recognition of Non-regular Languages by Finite Automata', Proceedings of the Twenty-Eighth Australasian Computer Science Conference (ACSC2005), Newcastle, Australia<br />
S. Kleene (1956). 'Representation of events in nerve nets and finite automata', Automata Studies, Princeton University Press, Princeton, USA, 3-42<br />
W. S. McCulloch et al. (1943). 'A logical calculus of the ideas immanent in nervous activity', Bull. Math. Biophys, 5:115-133<br />
N. Moreira et al. (2005). 'On the Density of Languages Representing Finite Set Partitions', Journal of Integer Sequences, Vol. 8<br />
DISTANCE EFFECTS IN SENTENCE PROCESSING<br />
Simon Hopp<br />
University of Konstanz<br />
Abstract. This paper reports results from two experiments investigating distance effects in sentence processing. It is well known that the processing difficulty of a dependency relation increases with the distance between the two items concerned. The paper addresses the question of what exactly determines ‘distance’: time, or the amount of linguistic material between the first and the second item. Experiment 1 disentangles these factors and suggests that linguistic material is the source of difficulty. Experiment 2 investigates the role of the characteristics of that intervening material. The logic of this experiment is based on Gibson’s (2000) claim that the ease of integrating a word into the CPPM decreases with the number of newly introduced discourse referents. In particular, Experiment 2 asks whether adverbials which do not introduce new discourse referents have the same effect. The results indicate that while intervening discourse referents elicit the expected effect, adverbials do not show any effect at all.<br />
1 Working Memory and Sentence Processing<br />
In cognitive science there is broad agreement that some kind of store is necessary for all kinds of complex cognitive tasks such as mental arithmetic or language processing. The following example (cf. Gibson 2000) illustrates the need for a short-term store in sentence processing.<br />
(1) The reporter [that the senator attacked] admitted the error.<br />
In (1) the short-term store (or working memory) has to keep the determiner phrase (DP) the reporter active over the period of time in which the relative clause is processed, to ensure that the human sentence parser is able to link the DP to the verb admitted and then to check the grammatical features correctly. Since sentences can contain several dependencies between items and these items can be separated by further items, storing linguistic information over a short time is a basic requirement for sentence processing.<br />
As has long been noticed in linguistic theory, sentences like (1) often lead to processing difficulties (e.g. Just & Carpenter 1992). One of the reasons for this is the distance between the linguistic items that depend on each other. It seems that integrating a word w into the CPPM (Current Partial Phrase Marker) is often adversely affected by the distance between w and the information within the CPPM necessary for integrating w. However, it is still unclear why earlier pieces of the CPPM are difficult to retrieve at later points. Two prominent mechanisms are said to be responsible for forgetting over the short term: the amount of time that passes between two items, and the linguistic material that has to be processed between two items. According to time-based decay, earlier information might already have faded away at the point when it is needed again. In current models of working memory involvement in sentence processing, time-based decay either plays a decisive role (e.g., Lewis & Vasishth 2005) or is taken as one possible candidate for contributing to the cost of integrating a word into the sentence (Levy et al. 2007). The alternatives to theories of time-based decay are event-based models (cf. Lewandowsky et al. 2004). Those models admit that forgetting in working memory is observed over time, but they predict that time is not the crucial factor for this phenomenon. Some event-based models argue that it is rather interference from linguistic material that leads to processing difficulties (e.g. Nairne 1990). Items that have already been processed may be forgotten by the time they are needed again because new incoming material interferes. Clarifying the role of time-based decay versus interference-based forgetting is complicated because the amount of linguistic material and the amount of time are normally confounded.<br />
2 Case Checking as a Test Case<br />
In this paper, I present two experiments that were run to investigate the nature of forgetting in working memory.¹ The focus was on the process of linking and checking in German verb-final clauses adhering to the scheme in (2). When integrating the verb in clause-final position, the case of the NP must be retained until the end of the sentence in order to check it against the case feature of the verb.<br />
(2) .. dass NP[case: X] … {distance} … verb[case: Y]<br />
An example of a verb-final clause in German, as used in the following experiments, is given in (3).<br />
(3) Ich glaube, dass die Studentin das wichtige Buch gelesen hat.<br />
I think that the student(fem) the important book read has<br />
‘I think that the student has read the important book.’<br />
The clause-final auxiliary hat requires nominative case on the NP die Studentin. The case feature of this NP has to be kept in memory over a certain distance until the auxiliary hat is reached. The human sentence parser is then able to link the two dependent items - NP and verb - and to check the case features of both items. If, however, the distance between the verb and the related NP is too long, then working memory is unable to keep the memory trace until the end of the sentence. In this case processing difficulties arise which can be measured experimentally.<br />
As mentioned above, amount of linguistic material and amount of time are normally confounded. In the first experiment the two factors were disentangled to investigate their respective impact on the human sentence parser independently. This builds on related work by Lewandowsky et al. (2004) and Saito & Miyake (2004).<br />
The second experiment focused on sentence complexity according to the Dependency Locality Theory (DLT) (Gibson 2000). The DLT assumes that the cost of integrating a word w increases with the number of new discourse referents intervening between w and the information needed to integrate w. For case-checking in German this prediction has not been tested so far.<br />
¹ The experiments were part of a bigger project on sentence processing together with Markus Bader.<br />
3 Experiment 1: Time-Based Decay versus Interference<br />
As shown in (3), the issue of forgetting in working memory was addressed by investigating the process of case-checking during the parsing of German verb-final clauses. When the verb in clause-final position is integrated, the case of the NP must be retrieved in order to check it against the case feature of the verb. If the intervening distance is too long, essential information about case features will be lost by the later point at which it is needed again. To be able to investigate the nature of ‘distance’, the crucial factors have to be disentangled. This is achieved by manipulating the factors independently. First of all, a procedure was chosen that allowed the stimuli to be presented experimenter-paced in a non-cumulative word-by-word fashion (for details see section Procedure). Two different presentation rates, one slow and one fast, were preset. Second, the intervening material between the related items was manipulated. Sentences as in (3) were created in a long and in a short version. Additional adverbials (e.g. ‘für die letzte Prüfung im Mai’) were inserted for the long versions, as can be seen in (4):<br />
(4) Ich glaube, dass die Studentin (für die letzte Prüfung im Mai)<br />
I think that the student(fem) (for the last exam in May)<br />
das wichtige Buch gelesen hat.<br />
the important book read has<br />
‘I think that the student has read the important book.’<br />
Crossing the two independently manipulated factors led to four different conditions that were presented (see Figure 1). Sentence (a) is a short sentence presented at the fast presentation rate (short-fast). Sentence (b) contains additional material and is also presented at the fast pace (long-fast). Sentences (c) and (d) are both presented at the slow pace. Note that (c) does not contain any additional material (short-slow), whereas (d) contains an additional adverbial (long-slow). Note especially that conditions (b) and (c) differ in the amount of intervening material, but - due to the different presentation rates - they are matched in the amount of time.<br />
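(a) short-fast: NP1 das wichtige Buch V AUX<br />
(b) long-fast: NP1 für die letzte Prüfung im Mai das wichtige Buch V AUX<br />
(c) short-slow: NP1 das wichtige Buch V AUX<br />
(d) long-slow: NP1 für die letzte Prüfung im Mai das wichtige Buch V AUX<br />
Figure 1: Presentation Time of all 4 Sentence Types of Experiment 1<br />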
This design allows the impact of both factors to be analyzed independently. As this experiment partly builds on the work of Lewandowsky et al. (2004), their terminology for the crucial factors is adopted: Time (amount of time) and Event (intervening material).<br />
Participants and Material<br />
16 students of the University of Konstanz participated for course credit or payment. All participants were native speakers of German and naive with respect to the purpose of the experiment.<br />
128 sentences were created, each in 16 versions according to the factors Voice (active versus passive), Status (grammatical versus ungrammatical), Time (fast versus slow) and Event (long versus short). Table 1 shows a sample stimulus item of Experiment 1.<br />
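The 16 versions follow from fully crossing the four two-level factors. As a small illustration (the dictionary layout and the names `FACTORS` and `CONDITIONS` are hypothetical, not part of the experiment software), the conditions can be enumerated as follows:

```python
from itertools import product

# Factor names and levels as given in the text.
FACTORS = {
    "Voice": ("active", "passive"),
    "Status": ("grammatical", "ungrammatical"),
    "Time": ("fast", "slow"),
    "Event": ("long", "short"),
}

# One dictionary per version of a sentence: every combination of factor levels.
CONDITIONS = [dict(zip(FACTORS, levels)) for levels in product(*FACTORS.values())]

if __name__ == "__main__":
    for condition in CONDITIONS:
        print(condition)
```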
Table 1. Sample Stimulus Item of Experiment 1<br />
Intervening material for all "(adverbial)": ([…] für die letzte Prüfung im August […])<br />
([…] for the last exam in August […])<br />
(Active/Grammatical)<br />
Der Dozent hofft, dass die Studentin (adverbial) das wichtige Buch gelesen hat<br />
the lecturer hopes that the(nom) student(fem) (adverbial) the important book read has<br />
'The lecturer hopes that the student has read the important book (for the last exam in August).'<br />
(Passive/Grammatical)<br />
Der Dozent hofft, dass der Studentin (adverbial) das wichtige Buch besorgt wurde<br />
the lecturer hopes that the(dat) student(fem) (adverbial) the important book obtained was<br />
'The lecturer hopes that the important book (for the last exam in August) was obtained for the student.'<br />
(Active/Ungrammatical)<br />
Der Dozent hofft, dass der Studentin (adverbial) das wichtige Buch gelesen hat<br />
the lecturer hopes that the(dat) student(fem) (adverbial) the important book read has<br />
'The lecturer hopes that the student has read the important book (for the last exam in August).'<br />
(Passive/Ungrammatical)<br />
Der Dozent hofft, dass die Studentin (adverbial) das wichtige Buch besorgt wurde.<br />
the lecturer hopes that the(nom) student(fem) (adverbial) the important book obtained was.<br />
'The lecturer hopes that the important book (for the last exam in August) was obtained for the student.'<br />
The length of intervening material and presentation rate were manipulated independently. The factor Event (intervening material) was varied by adding adverbials of six words for the long version (cf. Table 1). The factor Time (presentation rate) was either fast (188ms/word + 25ms/character) or slow (369ms/word + 44ms/character).<br />
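To make the two presentation rates concrete, the following sketch computes how long a stretch of words stays on screen. It assumes that a word is shown for the per-word base time plus the per-character time multiplied by its number of letters, which is our reading of "ms/word + ms/character"; the function names are hypothetical, and whether punctuation counts as a character is not specified in the text. The rates are identified by their parameter values only.

```python
import unicodedata


def word_duration_ms(word, base_ms, per_char_ms):
    """Assumed display time of one word: base time plus time per character."""
    n_chars = len(unicodedata.normalize("NFC", word))  # count 'ü' as one character
    return base_ms + per_char_ms * n_chars


def span_duration_ms(words, base_ms, per_char_ms):
    """Total display time of a sequence of words shown one after another."""
    return sum(word_duration_ms(w, base_ms, per_char_ms) for w in words)


# Material between the NP and the verb, taken from example (4).
SHORT = "das wichtige Buch".split()
LONG = "für die letzte Prüfung im Mai das wichtige Buch".split()

if __name__ == "__main__":
    for base_ms, per_char_ms in ((188, 25), (369, 44)):
        print(base_ms, per_char_ms,
              span_duration_ms(SHORT, base_ms, per_char_ms),
              span_duration_ms(LONG, base_ms, per_char_ms))
```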
Procedure<br />
In both experiments the speeded grammaticality judgment method was used. In this procedure sentences are presented in a word-by-word fashion. Each trial begins with the presentation of the words "Bitte Leertaste drücken" ("Please Press Spacebar") to start the sentence. After the spacebar is pressed, a fixation point appears in the center of the screen for 1050ms. Thereafter the sentence is shown word by word in the center of the screen. Immediately after the last word the participants are asked to judge the grammaticality of the sentence as fast as possible by pressing one of two response buttons. Type of response and response time are recorded automatically. If a subject does not give a response within 2000ms after the last word appeared, the words "zu langsam" ("too slow") are shown and the trial is finished. In both experiments each subject received at least 10 practice items before the experimental sessions started.<br />
In Experiment 1, all sentences were presented in two separate blocks at two<br />
different paces (slow and fast, according to the manipulation of the factor Time).<br />
Every participant completed both blocks within one<br />
experimental session. Each block contained half of the entire set of sentences. Therefore<br />
each participant saw half <strong>of</strong> <strong>the</strong> sentences in <strong>the</strong> slow condition and <strong>the</strong> o<strong>the</strong>r half in <strong>the</strong><br />
fast condition. The order <strong>of</strong> <strong>the</strong> two blocks alternated between participants. The<br />
sentences were presented with filler sentences. The proportion <strong>of</strong> experimental<br />
sentences to filler sentences was 1:1. Filler sentences covered a range <strong>of</strong> various<br />
constructions and were half grammatical and half ungrammatical. Most <strong>of</strong> <strong>the</strong> fillers<br />
served as experimental items in two o<strong>the</strong>r experiments.<br />
Results<br />
The percentages <strong>of</strong> correct judgments in Experiment 1 are shown in Figure 2<br />
(grammatical conditions) and Figure 3 (ungrammatical conditions). Statistical analyses<br />
were conducted with subject as <strong>the</strong> random factor (F1) and with sentences as <strong>the</strong><br />
random factor (F2). The following main effects occurred: First, a significant effect <strong>of</strong><br />
<strong>the</strong> factor Event is obtained (F1(1,15)=22.30, p
4 Experiment 2: The Role <strong>of</strong> Complexity in Sentence Parsing<br />
Experiment 2 investigated the role of sentence complexity according to Gibson’s<br />
Dependency Locality Theory (DLT; Gibson 2000) in the context of verb-final clauses in German.<br />
The DLT is a resource-driven model <strong>of</strong> language processing. The model assumes two<br />
major kinds <strong>of</strong> resource use. First, integrating a new word w into <strong>the</strong> current structure<br />
causes some cost (integration cost). Second, keeping <strong>the</strong> structure in memory also<br />
causes a certain kind <strong>of</strong> cost (storage cost). A central idea <strong>of</strong> <strong>the</strong> DLT is locality. Gibson<br />
assumes that <strong>the</strong> cost <strong>of</strong> integrating a new element into <strong>the</strong> current structure depends on<br />
<strong>the</strong> distance between <strong>the</strong> new element and <strong>the</strong> related element already processed. The<br />
assumption is that the distance is defined by the number of discourse referents that are<br />
newly introduced between <strong>the</strong> items concerned.<br />
If this is so, an interesting question is whe<strong>the</strong>r material not introducing a new<br />
discourse referent also affects <strong>the</strong> ease <strong>of</strong> integrating w into <strong>the</strong> CPPM. This was tested<br />
in Experiment 2 by means of adverbial material. The crucial factors of Experiment 2<br />
<strong>the</strong>refore are: Adverbial and Discourse Referents (DR).<br />
Participants and Material<br />
16 students <strong>of</strong> <strong>the</strong> University <strong>of</strong> Konstanz participated for course credit or payment. All<br />
participants were native speakers <strong>of</strong> German and naive with respect to <strong>the</strong> purpose <strong>of</strong><br />
<strong>the</strong> experiment.<br />
We created 128 sentences, each in 16 versions according to <strong>the</strong> factors Voice<br />
(active versus passive), Status (grammatical versus ungrammatical), Adverbial (NoAdv<br />
versus Adv) and Discourse Referents (0 DR versus 2 DR).<br />
Table 2 shows a sample stimulus item of Experiment 2.<br />
Table 2. Sample Stimulus Item of Experiment 2<br />
Ich vermute, dass […]<br />
I guess, that […]<br />
(NoAdv. / 0 DR)<br />
[…] meine Pr<strong>of</strong>essorin, die sehr gut erklärt, eine freie Stelle ausgeschrieben hat.<br />
[…] my pr<strong>of</strong>essor(fem) who very good explains a vacant position <strong>of</strong>fered has<br />
‘I guess that my pr<strong>of</strong>essor, who explains very well, has <strong>of</strong>fered a vacant position.’<br />
(Adv. / 0 DR)<br />
[…] meine Pr<strong>of</strong>essorin, die immer wieder sehr gut erklärt, eine freie Stelle ausgeschrieben hat.<br />
[…] my pr<strong>of</strong>essor(fem) who again and again very good explains a vacant position <strong>of</strong>fered has<br />
‘I guess that my pr<strong>of</strong>essor, who explains very well repeatedly, has <strong>of</strong>fered a vacant position.’<br />
(NoAdv. / 2 DR)<br />
[…] meine Pr<strong>of</strong>essorin, die dem <strong>Student</strong>en das Skript ausleiht, eine freie Stelle ausgeschrieben hat.<br />
[…] my pr<strong>of</strong>essor(fem) who <strong>the</strong> student(dat) <strong>the</strong> script lends a vacant position <strong>of</strong>fered has<br />
‘I guess that my pr<strong>of</strong>essor, who lends <strong>the</strong> script to <strong>the</strong> student, has <strong>of</strong>fered a vacant position.’<br />
(Adv. / 2 DR)<br />
[…] meine Professorin, die dem Studenten doch noch das Skript ausleiht, eine freie Stelle ausgeschrieben hat.<br />
[…] my professor(fem) who the student(dat) eventually the script lends a vacant position offered has<br />
‘I guess that my pr<strong>of</strong>essor, who eventually lends <strong>the</strong> script to <strong>the</strong> student, has <strong>of</strong>fered a vacant position.’<br />
The complexity <strong>of</strong> relative clauses was manipulated in a two-factorial way. First,<br />
<strong>the</strong> relative clause contains ei<strong>the</strong>r 0 or 2 new NP-related discourse referents. The event<br />
referent introduced by <strong>the</strong> verb is ignored as it is introduced in all four relative clause<br />
types. Second, <strong>the</strong> relative clause does or does not contain an additional adverbial <strong>of</strong><br />
two words. Both factors were crossed. The resulting conditions are shown below in<br />
Figure 4. Relative-clause complexity increases from (a) to (d). Fur<strong>the</strong>rmore, (b) and (c)<br />
are matched according to <strong>the</strong> number <strong>of</strong> words <strong>the</strong>y contain, but <strong>the</strong>y differ in <strong>the</strong>ir<br />
internal structure. As one can see below, (b) contains additional adverbials <strong>of</strong> two words<br />
(“immer wieder”), but does not include any newly introduced discourse referents.<br />
Sentence type (c), on <strong>the</strong> o<strong>the</strong>r hand, only introduces two new discourse referents<br />
(“<strong>Student</strong>en” and “Skript”).<br />
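The contrast between length in words and number of new discourse referents can be made concrete with a small sketch. It uses the relative-clause wording from Table 2; the per-word referent flags are an illustrative annotation added here, not part of the original materials, and the helper names are hypothetical.

```python
# Each relative clause is a list of (word, introduces_new_NP_referent) pairs.
# The boolean flags are an illustrative annotation, not the original coding.
CLAUSES = {
    "a": [("die", False), ("sehr", False), ("gut", False), ("erklärt", False)],
    "b": [("die", False), ("immer", False), ("wieder", False),
          ("sehr", False), ("gut", False), ("erklärt", False)],
    "c": [("die", False), ("dem", False), ("Studenten", True),
          ("das", False), ("Skript", True), ("ausleiht", False)],
    "d": [("die", False), ("dem", False), ("Studenten", True),
          ("doch", False), ("noch", False),
          ("das", False), ("Skript", True), ("ausleiht", False)],
}


def length_in_words(clause):
    """Pure linear length of the clause."""
    return len(clause)


def new_referents(clause):
    """Number of new NP-related discourse referents in the clause."""
    return sum(1 for _, is_new in clause if is_new)


SUMMARY = {
    label: (length_in_words(clause), new_referents(clause))
    for label, clause in CLAUSES.items()
}

for label, (words, referents) in SUMMARY.items():
    print(label, words, referents)
```

Under this annotation, (b) and (c) both contain six words but differ in referent count (0 versus 2), which is the comparison that separates a length-based account from a referent-based one.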
Procedure<br />
In this experiment <strong>the</strong> same procedure, <strong>the</strong> speeded grammaticality judgment task, as in<br />
Experiment 1 was used. In Experiment 2 the presentation time was not<br />
manipulated. The experiment was conducted in one block. A presentation rate of<br />
252ms per word + additional 28ms per letter was used.<br />
(a) NP1 die sehr gut erklärt NP2 V<br />
(b) NP1 die immer wieder sehr gut erklärt NP2 V<br />
(c) NP1 die dem Studenten das neue Skript ausleiht NP2 V<br />
(d) NP1 die dem Studenten doch noch das neue Skript ausleiht NP2 V<br />
Figure 4: Length of Relative Clauses (According to the Number of Words)<br />
Results<br />
The percentages <strong>of</strong> correct judgments in Experiment 2 are provided in Figure 5 (for<br />
grammatical conditions) and Figure 6 (for ungrammatical conditions). Statistical<br />
analyses revealed main effects for <strong>the</strong> factors Status (F1(1,15)= 26.57, p < .001;<br />
F2(1,15)= 2<strong>13</strong>.43, p
Figure 5. Percentages of correct judgments for Grammatical Sentences (y-axis: Percentage Correct (%); bar values listed as Active / Passive)<br />
noNP, noAdv: 92 / 89<br />
2NPs, noAdv: 86 / 83<br />
noNP, Adverbial: 91 / 87<br />
2NPs, Adverbial: 79 / 78<br />
Figure 6. Percentages of correct judgments for Ungrammatical Sentences (y-axis: Percentage Correct (%); bar values listed as Active / Passive)<br />
noNP, noAdv: 61 / 60<br />
2NPs, noAdv: 56 / 51<br />
noNP, Adv: 68 / 58<br />
2NPs, Adv: 56 / 51<br />
5 General Discussion<br />
In Experiment 1, <strong>the</strong> factors Time and Event were disentangled to investigate <strong>the</strong> nature <strong>of</strong><br />
distance in sentence processing. The experiment had a clear-cut outcome for both factors.<br />
First, the factor Event clearly affects sentence processing. This can be seen especially in<br />
ungrammatical passive sentences. In that condition the percentage of correct<br />
judgments is about 14% lower for long than for short sentences. As earlier<br />
experimental work has shown, ungrammatical passive sentences are always judged less<br />
reliably (cf. Bader & Bayer 2006). More material to process increases processing difficulty<br />
immensely, which results in a higher error rate <strong>of</strong> long sentences compared to short<br />
sentences. Second, <strong>the</strong> factor Time does not seem to affect sentence processing as predicted<br />
by time-based models. For short sentences, <strong>the</strong> slow presentation rate resulted in better<br />
performance than <strong>the</strong> fast presentation rate. This goes against <strong>the</strong> predictions. Long ra<strong>the</strong>r<br />
than short time intervals should affect sentence processing adversely (note that <strong>the</strong> fast<br />
presentation rate was not too fast, as can be seen in high percentages <strong>of</strong> correct judgments<br />
with up to 92%). For long sentences <strong>the</strong> presentation rate had no effect at all. The results<br />
suggest that time-based decay does not contribute to <strong>the</strong> difficulty <strong>of</strong> integrating a new word<br />
into <strong>the</strong> CPPM.<br />
Experiment 2 has two major results. First, confirming prior results, <strong>the</strong> number <strong>of</strong> new<br />
discourse referents had a major effect. Sentences containing two new discourse referents in<br />
<strong>the</strong> relative clause received significantly more judgment errors. Second, an intervening<br />
adverbial had no effect at all. This can be seen clearly in the sentences which were equal in<br />
length according to the number of words they contain, but which were manipulated with<br />
different material. Sentences that contained new discourse referents but no additional<br />
adverbial received substantially more judgment errors than sentences containing the same<br />
number of words, but only containing additional adverbials. The results suggest that the<br />
pure linear distance between w and information necessary to integrate w cannot be <strong>the</strong><br />
source <strong>of</strong> <strong>the</strong> observed difficulty. In particular, finding no differences between (a) versus (b)<br />
and (c) versus (d), but a substantial difference between (b) and (c) (cf. Figure 4) argues<br />
against <strong>the</strong>ories assuming that time or pure length - not introducing a new discourse referent<br />
- leads to forgetting in working memory. The results <strong>the</strong>refore support <strong>the</strong> Dependency<br />
Locality Theory <strong>of</strong> Gibson (2000).<br />
References<br />
M. Bader & J. Bayer (2006). Case and Linking in Language Comprehension. Evidence<br />
from German, Springer, Dordrecht.<br />
E. Gibson (2000). ‘The dependency locality <strong>the</strong>ory: A distance-based <strong>the</strong>ory <strong>of</strong><br />
linguistic complexity’. In A. Marantz et al. (eds.), Image, Language, Brain. MIT Press.<br />
S. Hopp & M. Bader (in prep.). ‘Forgetting in Working Memory: Interference versus<br />
Decay? Evidence from German Sentence Processing’.<br />
M. A. Just & P. A. Carpenter (1992). ‘A Capacity Theory <strong>of</strong> Comprehension: Individual<br />
Differences in Working Memory’. Psychological Review, vol. 99, no.1.<br />
R. L. Lewis & S. Vasishth. (2005). ‘An activation-based model <strong>of</strong> sentence processing<br />
as skilled memory retrieval’. Cognitive Science 29.<br />
R. Levy et al. (2007). ‘The syntactic complexity <strong>of</strong> Russian relative clauses’. Paper<br />
presented at <strong>the</strong> Annual Conference on Human Sentence Processing – CUNY 2007,<br />
San Diego, CA.<br />
S. Lewandowsky et al. (2004). ‘Time does not cause forgetting in short-term serial<br />
recall’. Psychonomic Bulletin & Review 11.<br />
J. S. Nairne (1990). ‘A feature model of immediate memory’. Memory & Cognition 18.<br />
S. Saito & A. Miyake (2004). ‘On the nature of forgetting and the processing-storage<br />
relationship in reading span performance’. Journal of Memory and Language 20.<br />
A SALIENCE-DRIVEN APPROACH TO<br />
SPEECH RECOGNITION FOR HUMAN-ROBOT INTERACTION<br />
Pierre Lison<br />
German Research Center for Artificial Intelligence<br />
Abstract. We present an implemented model for speech recognition in natural environments<br />
which relies on contextual information about salient entities to prime utterance recognition.<br />
The hypo<strong>the</strong>sis underlying our approach is that, in situated human-robot interaction, speech<br />
recognition performance can be significantly enhanced by exploiting knowledge about <strong>the</strong><br />
immediate physical environment and <strong>the</strong> dialogue history. To this end, visual salience (objects<br />
perceived in <strong>the</strong> physical scene) and linguistic salience (previously referred-to objects<br />
within <strong>the</strong> current dialogue) are integrated into a single cross-modal salience model. The<br />
model is dynamically updated as <strong>the</strong> environment evolves, and is used to establish expectations<br />
about uttered words which are most likely to be heard given <strong>the</strong> context. The update is<br />
realised by continuously adapting the word-class probabilities specified in the statistical language<br />
model. The present article discusses <strong>the</strong> motivations behind our approach, describes<br />
our implementation as part <strong>of</strong> a distributed, cognitive architecture for mobile robots, and<br />
reports <strong>the</strong> evaluation results on a test suite.<br />
1 Introduction<br />
Recent years have seen increasing interest in service robots endowed with communicative<br />
capabilities. In many cases, <strong>the</strong>se robots must operate in open-ended environments<br />
and interact with humans using natural language to perform a variety <strong>of</strong> service-oriented<br />
tasks. Developing cognitive systems for such robots remains a formidable challenge.<br />
S<strong>of</strong>tware architectures for cognitive robots are typically composed <strong>of</strong> several cooperating<br />
subsystems, such as communication, computer vision, navigation and manipulation<br />
skills, and various deliberative processes such as symbolic planners (Langley, Laird and<br />
Rogers, 2005).<br />
These subsystems are highly interdependent. It is not enough to equip <strong>the</strong> robot with<br />
basic functionalities for dialogue comprehension and production to make it interact naturally<br />
in situated dialogues. We also need to find meaningful ways to relate language,<br />
action and situated reality, and enable <strong>the</strong> robot to use its perceptual experience to continuously<br />
learn and adapt itself to <strong>the</strong> environment.<br />
The first step in comprehending spoken dialogue is automatic speech recognition [ASR].<br />
For robots operating in real-world noisy environments, and dealing with utterances pertaining<br />
to complex, open-ended domains, this step is particularly error-prone. In spite <strong>of</strong><br />
continuous technological advances, <strong>the</strong> performance <strong>of</strong> ASR remains for most tasks at<br />
least an order <strong>of</strong> magnitude worse than that <strong>of</strong> human listeners (Moore, 2007).<br />
One strategy for addressing this issue is to use context information to guide <strong>the</strong> speech<br />
recognition by percolating contextual constraints to <strong>the</strong> statistical language model (Gruenstein,<br />
Wang and Seneff, 2005). In this paper, we follow this approach by defining a context-sensitive<br />
language model which exploits information about salient objects in the visual<br />
scene and linguistic expressions in <strong>the</strong> dialogue history to prime recognition. To this end,<br />
a salience model integrating both visual and linguistic salience is used to dynamically<br />
compute lexical activations, which are incorporated into <strong>the</strong> language model at runtime.<br />
Our approach departs from previous work on context-sensitive speech recognition by<br />
modeling salience as inherently cross-modal, instead <strong>of</strong> relying on just one particular<br />
modality such as gesture (Chai and Qu, 2005), eye gaze (Qu and Chai, 2007) or dialogue<br />
state (Gruenstein et al., 2005). The FUSE system described in (Roy and Mukherjee, 2005)<br />
is a closely related approach, but limited to <strong>the</strong> processing <strong>of</strong> object descriptions, whereas<br />
our system was designed from <strong>the</strong> start to handle generic situated dialogues (cf. §3.3).<br />
The structure <strong>of</strong> <strong>the</strong> paper is as follows: in <strong>the</strong> next section we briefly introduce <strong>the</strong><br />
s<strong>of</strong>tware architecture in which our system has been developed. We <strong>the</strong>n describe <strong>the</strong><br />
salience model, and explain how it is utilised within <strong>the</strong> language model used for ASR.<br />
We finally present <strong>the</strong> evaluation <strong>of</strong> our approach, followed by conclusions.<br />
Figure 1: Robotic platform (left) and example <strong>of</strong> a real visual scene (right)<br />
2 Architecture<br />
Our approach has been implemented as part <strong>of</strong> a distributed cognitive architecture (Hawes,<br />
Sloman, Wyatt, Zillich, Jacobsson, Kruijff, Brenner, Berginc and Skocaj, n.d.). Each subsystem<br />
consists <strong>of</strong> a number <strong>of</strong> processes, and a working memory. The processes can<br />
access sensors, effectors, and <strong>the</strong> working memory to share information within <strong>the</strong> subsystem.<br />
Figure 2 illustrates <strong>the</strong> spoken dialogue comprehension. Numbers 1-11 in <strong>the</strong><br />
figure indicate the usual sequential order for the processes.<br />
The speech recognition utilises Nuance Recognizer v8.5 toge<strong>the</strong>r with a statistical language<br />
model (§ 3.4). For <strong>the</strong> online update <strong>of</strong> word class probabilities according to <strong>the</strong><br />
salience model, we use <strong>the</strong> “just-in-time grammar” functionality provided by Nuance.<br />
Syntactic parsing is based on an incremental chart parser 1 for Combinatory Categorial<br />
Grammar (Steedman and Baldridge, 2003), and yields a set of interpretations – that is,<br />
logical forms expressed as ontologically rich, relational structures (Baldridge and Kruijff,<br />
2001). Figure 3 gives an example of such a logical form.<br />
1 Built on top of the OpenCCG NLP library: http://openccg.sf.net<br />
Figure 2: Schematic view of the architecture for spoken dialogue comprehension<br />
These interpretations are <strong>the</strong>n packed into a single representation (Oepen and Carroll,<br />
2000; Kruijff, Lison, Benjamin, Jacobsson and Hawes, in submission), a technique which<br />
enables us to efficiently handle syntactic ambiguity.<br />
Once <strong>the</strong> packed logical form is built, it is retrieved by <strong>the</strong> dialogue recognition module,<br />
which performs dialogue-level analysis tasks such as discourse reference resolution<br />
and dialogue move interpretation, and consequently updates <strong>the</strong> dialogue structure.<br />
@w1:cognition(want ∧<br />
ind ∧<br />
pres ∧<br />
(i1 : person ∧ I ∧<br />
sg ∧<br />
(t1 : action-motion ∧ take ∧<br />
y1 : person ∧<br />
(m1 : thing ∧ mug ∧<br />
unique ∧<br />
sg ∧<br />
specific singular)) ∧<br />
(y1 : person ∧ you ∧<br />
sg))<br />
Figure 3: Logical form generated for <strong>the</strong> utterance ‘I want you to take <strong>the</strong> mug’<br />
Linguistic interpretations must finally be associated with extra-linguistic knowledge<br />
about <strong>the</strong> environment – dialogue comprehension hence needs to connect with o<strong>the</strong>r subarchitectures<br />
like vision, spatial reasoning or planning. We realise this information binding<br />
between different modalities via a specific module, called <strong>the</strong> “binder”, which is responsible<br />
for the ontology-based mediation across modalities (Jacobsson, Hawes, Kruijff<br />
and Wyatt, 2008).<br />
3 Approach<br />
3.1 Motivation<br />
As psycholinguistic studies have shown, humans do not process linguistic utterances in<br />
isolation from o<strong>the</strong>r modalities. Eye-tracking experiments notably highlighted that, during<br />
utterance comprehension, humans combine, in a closely time-locked fashion, linguistic<br />
information with scene understanding and world knowledge (Altmann and Kamide,<br />
2004; Knoeferle and Crocker, 2006).<br />
These observations – along with many o<strong>the</strong>rs – <strong>the</strong>refore provide solid evidence for <strong>the</strong><br />
embodied and situated nature <strong>of</strong> language and cognition (Lak<strong>of</strong>f, 1987; Barsalou, 1999).<br />
Humans thus systematically exploit dialogue and situated context to guide attention<br />
and help disambiguate and refine linguistic input by filtering out unlikely interpretations.<br />
Our approach is essentially an attempt to reproduce this mechanism in a robotic system.<br />
3.2 Salience modeling<br />
In our implementation, we define salience using two main sources <strong>of</strong> information:<br />
1. <strong>the</strong> salience <strong>of</strong> objects in <strong>the</strong> perceived visual scene;<br />
2. <strong>the</strong> linguistic salience or “recency” <strong>of</strong> linguistic expressions in <strong>the</strong> dialogue history.<br />
In <strong>the</strong> future, o<strong>the</strong>r sources could be added, for instance <strong>the</strong> possible presence <strong>of</strong> gestures<br />
(Chai and Qu, 2005), eye gaze tracking (Qu and Chai, 2007), entities in large-scale<br />
space (Zender and Kruijff, 2007), or <strong>the</strong> integration <strong>of</strong> a task model – as salience generally<br />
depends on intentionality (Landragin, 2006).<br />
3.2.1 Visual salience<br />
Via <strong>the</strong> “binder”, we can access <strong>the</strong> set <strong>of</strong> objects currently perceived in <strong>the</strong> visual scene.<br />
Each object is associated with a concept name (e.g. printer) and a number <strong>of</strong> features,<br />
for instance spatial coordinates or qualitative properties like colour, shape or size.<br />
Several features can be used to compute <strong>the</strong> salience <strong>of</strong> an object. The ones currently<br />
used in our implementation are (1) the object size and (2) its distance relative to the robot<br />
(i.e. spatial proximity). Other features could also prove to be helpful, like the reachability<br />
<strong>of</strong> <strong>the</strong> object, or its distance from <strong>the</strong> point <strong>of</strong> visual focus – similarly to <strong>the</strong> spread <strong>of</strong><br />
visual acuity across <strong>the</strong> human retina. To derive <strong>the</strong> visual salience value for each object,<br />
we assign a numeric value for <strong>the</strong> two variables, and <strong>the</strong>n perform a weighted addition.<br />
The associated weights are determined via regression tests.<br />
At the end of the processing, we end up with a set $E_v$ of visual objects, each of which<br />
is associated with a numeric salience value $s(e_k)$, with $1 \leq k \leq |E_v|$.<br />
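The weighted addition described above can be sketched as follows. This is a minimal illustration and not the system's implementation: the class and function names, the feature scaling, the maximum distance and the weight values are all assumptions, whereas the paper determines the weights via regression tests.

```python
from dataclasses import dataclass


@dataclass
class VisualObject:
    concept: str     # concept name, e.g. "printer"
    size: float      # assumed to be normalised to [0, 1]
    distance: float  # assumed distance to the robot, in metres


def visual_salience(obj, w_size=0.5, w_proximity=0.5, max_distance=5.0):
    """Weighted addition of object size and spatial proximity.

    The weights and max_distance are placeholder values chosen for the example.
    """
    # Map distance to a proximity score in [0, 1]: closer objects score higher.
    proximity = max(0.0, 1.0 - obj.distance / max_distance)
    return w_size * obj.size + w_proximity * proximity


scene = [VisualObject("mug", 0.2, 1.0), VisualObject("printer", 0.8, 5.0)]
salience = {obj.concept: visual_salience(obj) for obj in scene}
print(salience)
```

In this toy scene the small but nearby mug ends up more salient than the large printer at the edge of the assumed range.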
3.2.2 Linguistic salience<br />
There is a vast amount <strong>of</strong> literature on <strong>the</strong> topic <strong>of</strong> linguistic salience. Roughly speaking,<br />
linguistic salience can be characterised ei<strong>the</strong>r in terms <strong>of</strong> hierarchical recency, according<br />
to a tree-like model <strong>of</strong> discourse structure, or in terms <strong>of</strong> linear recency <strong>of</strong> mention<br />
(Kelleher, 2005). Our implementation can theoretically handle both types of linguistic<br />
salience, but, at <strong>the</strong> time <strong>of</strong> writing, only <strong>the</strong> linear recency is calculated.<br />
To compute the linguistic salience, we extract a set $E_l$ of potential referents from the<br />
discourse structure, and for each referent $e_k$ we assign a salience value $s(e_k)$ equal to<br />
<strong>the</strong> distance (measured on a logarithmic scale) between its last mention and <strong>the</strong> current<br />
position in <strong>the</strong> discourse structure.<br />
3.2.3 Cross-modal salience model<br />
Once <strong>the</strong> visual and linguistic salience are computed, we can proceed to <strong>the</strong>ir integration<br />
into a cross-modal statistical model. We define the set $E$ as the union of the visual and<br />
linguistic entities, $E = E_v \cup E_l$, and devise a probability distribution $P(E)$ on this set:<br />
$$P(e_k) = \frac{\delta_v \, I_{E_v}(e_k) \, s_v(e_k) + \delta_l \, I_{E_l}(e_k) \, s_l(e_k)}{|E|} \qquad (1)$$<br />
where $I_A(x)$ is the indicator function of set $A$, $s_v$ and $s_l$ denote the salience values $s(e_k)$<br />
assigned to visual and linguistic entities respectively, and $\delta_v$, $\delta_l$ are factors controlling the<br />
relative importance of each type of salience. They are determined empirically, subject to<br />
the following constraint to normalise the distribution:<br />
$$\delta_v \sum_{e_k \in E_v} s_v(e_k) + \delta_l \sum_{e_k \in E_l} s_l(e_k) = |E| \qquad (2)$$<br />
The statistical model $P(E)$ thus simply reflects the salience of each visual or linguistic<br />
entity: <strong>the</strong> more salient, <strong>the</strong> higher <strong>the</strong> probability.<br />
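A minimal sketch of how the distribution (1) and the normalisation constraint (2) fit together is given here. The function name and the `ratio_v` / `ratio_l` parameters are hypothetical: the paper determines the two factors empirically, whereas this sketch simply fixes their ratio and rescales both so that constraint (2) holds.

```python
def cross_modal_distribution(s_v, s_l, ratio_v=1.0, ratio_l=1.0):
    """Build P(e) over E = E_v ∪ E_l from visual and linguistic salience values.

    s_v, s_l: dicts mapping entity -> salience value.
    ratio_v, ratio_l: assumed relative importance of the two modalities.
    """
    entities = set(s_v) | set(s_l)
    n = len(entities)
    total = ratio_v * sum(s_v.values()) + ratio_l * sum(s_l.values())
    if n == 0 or total <= 0:
        raise ValueError("need at least one entity with positive salience")

    # Rescale so that delta_v * sum(s_v) + delta_l * sum(s_l) == |E|  (constraint 2).
    scale = n / total
    delta_v, delta_l = ratio_v * scale, ratio_l * scale

    # dict.get(e, 0.0) plays the role of the indicator functions in equation (1).
    return {
        e: (delta_v * s_v.get(e, 0.0) + delta_l * s_l.get(e, 0.0)) / n
        for e in entities
    }


P = cross_modal_distribution(
    s_v={"mug": 0.5, "laptop": 0.3},
    s_l={"mug": 0.2, "ball": 0.4},
)
print(P)
```

An entity that is both visible and recently mentioned (here the mug) accumulates salience from both modalities and therefore receives the largest share of the probability mass.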
3.3 Lexical activation<br />
In order for <strong>the</strong> salience model to be <strong>of</strong> any use for speech recognition, a connection<br />
between <strong>the</strong> salient entities and <strong>the</strong>ir associated words in <strong>the</strong> ASR vocabulary needs to<br />
be established. To this end, we define a lexical activation network, which lists, for each<br />
possible salient entity, <strong>the</strong> set <strong>of</strong> words activated by it. The network specifies <strong>the</strong> words<br />
which are likely to be heard when <strong>the</strong> given entity is present in <strong>the</strong> environment or in<br />
<strong>the</strong> dialogue history. It can <strong>the</strong>refore include words related to <strong>the</strong> object denomination,<br />
subparts, common properties or affordances. The salient entity laptop will activate words<br />
like ‘laptop’, ‘notebook’, ‘screen’, ‘opened’, ‘ibm’, ‘switch on/<strong>of</strong>f’, ‘close’, etc. The list<br />
is structured according to word classes, and a weight can be set on each word to modulate<br />
<strong>the</strong> lexical activation: supposing a laptop is present, <strong>the</strong> word ‘laptop’ should receive a<br />
higher activation than, say, <strong>the</strong> word ‘close’, which is less situation specific.<br />
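One possible in-memory representation of such a network is sketched below. The nested-dictionary layout, the word-class labels and all weights are hypothetical; the text only states that the list is structured by word class and that each word can carry a weight.<br />

```python
# Hypothetical lexical activation network: entity -> word class -> {word: weight}.
# The words come from the laptop example in the text; the weights are invented.
ACTIVATION_NETWORK = {
    "laptop": {
        "noun": {"laptop": 1.0, "notebook": 0.8, "screen": 0.6, "ibm": 0.4},
        "verb": {"switch": 0.5, "close": 0.3},
        "adjective": {"opened": 0.4},
    },
}


def activated_words(entity, network=ACTIVATION_NETWORK):
    """Return {word: weight} for every word activated by `entity`."""
    words = {}
    for by_class in network.get(entity, {}).values():
        for word, weight in by_class.items():
            # Keep the strongest weight if a word is listed in several classes.
            words[word] = max(weight, words.get(word, 0.0))
    return words


if __name__ == "__main__":
    print(activated_words("laptop"))
```

With these illustrative weights, 'laptop' is activated more strongly than the less situation-specific 'close', and an unknown entity activates nothing.<br />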
The use of lexical activation networks is a key difference between our model and that of Roy<br />
and Mukherjee (2005), which relies on a measure of “descriptive fitness” to modify the<br />
word probabilities. One advantage of our approach is the ability to go beyond object<br />
descriptions and activate word types denoting subparts, properties or affordances of<br />
objects (see footnote 2).<br />
If <strong>the</strong> probability <strong>of</strong> specific words is increased, we need to re-normalise <strong>the</strong> probability<br />
distribution. One solution would be to decrease <strong>the</strong> probability <strong>of</strong> all non-activated words<br />
accordingly. This solution, however, suffers from a significant drawback: our vocabulary<br />
contains many context-independent words like ‘thing’, or ‘place’, whose probability<br />
should remain constant. To address this issue, we mark an explicit distinction in our<br />
vocabulary between context-dependent and context-independent words.<br />
In the current implementation, the lexical activation network is constructed semi-manually,<br />
using a simple lexicon extraction algorithm. We start with <strong>the</strong> list <strong>of</strong> possible<br />
salient entities, which is given by<br />
1. <strong>the</strong> set <strong>of</strong> physical objects <strong>the</strong> vision subsystem can recognise ;<br />
2. <strong>the</strong> set <strong>of</strong> nouns specified in <strong>the</strong> CCG lexicon with ‘object’ as ontological type.<br />
For each entity, we <strong>the</strong>n extract its associated lexicon by matching domain-specific syntactic<br />
patterns against a corpus <strong>of</strong> dialogue transcripts.<br />
3.4 Language modeling<br />
We now detail <strong>the</strong> language model used for <strong>the</strong> speech recognition – a class-based trigram<br />
model enriched with contextual information provided by <strong>the</strong> salience model.<br />
3.4.1 Corpus generation<br />
We need a corpus to train any statistical language model. Unfortunately, no corpus <strong>of</strong><br />
situated dialogue adapted to our task domain was available. Collecting in-domain data via<br />
Wizard <strong>of</strong> Oz experiments is a very costly and time-consuming process, so we decided<br />
to follow <strong>the</strong> approach advocated in (Weilhammer, Stuttle and Young, 2006) instead and<br />
generate a class-based corpus from a task grammar we had at our disposal.<br />
Practically, we first collected a small set of WOz experiments, totalling about 800<br />
utterances. This set is of course too small to be directly used as a corpus for language<br />
model training, but sufficient to get an intuitive idea of the kind of utterances we had to<br />
deal with.<br />
Footnote 2: In the context of a laptop object, ‘screen’ and ‘switch on/off’ would for instance be activated.<br />
Based on it, we designed a domain-specific context-free grammar able to cover most<br />
<strong>of</strong> <strong>the</strong> utterances. Weights were <strong>the</strong>n automatically assigned to each grammar rule by<br />
parsing our initial corpus, hence leading to a small stochastic context-free grammar.<br />
As a last step, this grammar is randomly traversed a large number <strong>of</strong> times, which gives<br />
us <strong>the</strong> generated corpus.<br />
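The final generation step can be sketched as a random traversal of a stochastic context-free grammar. The toy grammar, its rule weights and the function names below are hypothetical, and the sketch emits plain words rather than the class-based corpus described above.<br />

```python
import random

# Hypothetical toy grammar: nonterminal -> list of (probability, right-hand side).
# In the paper the weights are estimated by parsing the WOz utterances.
GRAMMAR = {
    "S": [(0.6, ["take", "NP"]), (0.4, ["this", "is", "NP"])],
    "NP": [(0.7, ["the", "N"]), (0.3, ["the", "ADJ", "N"])],
    "N": [(0.5, ["ball"]), (0.5, ["box"])],
    "ADJ": [(1.0, ["red"])],
}


def generate(symbol, grammar, rng):
    """Expand `symbol` top-down, sampling one rule per nonterminal."""
    if symbol not in grammar:  # terminal symbol: emit the word itself
        return [symbol]
    weights = [prob for prob, _ in grammar[symbol]]
    options = [rhs for _, rhs in grammar[symbol]]
    rhs = rng.choices(options, weights=weights, k=1)[0]
    words = []
    for child in rhs:
        words.extend(generate(child, grammar, rng))
    return words


def generate_corpus(grammar, size, seed=0):
    """Traverse the grammar `size` times and return the sampled sentences."""
    rng = random.Random(seed)
    return [" ".join(generate("S", grammar, rng)) for _ in range(size)]


if __name__ == "__main__":
    for sentence in generate_corpus(GRAMMAR, 5):
        print(sentence)
```

Because rules are sampled in proportion to their weights, frequent constructions of the seed corpus dominate the generated one.<br />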
3.4.2 Salience-driven, class-based language models<br />
The objective <strong>of</strong> <strong>the</strong> speech recognizer is to find <strong>the</strong> word sequence W ∗ which has <strong>the</strong><br />
highest probability given <strong>the</strong> observed speech signal O and a set E <strong>of</strong> salient objects:<br />
$$W^* = \arg\max_W \underbrace{P(O \mid W)}_{\text{acoustic model}} \times \underbrace{P(W \mid E)}_{\text{salience-driven language model}} \qquad (3)$$<br />
For a trigram language model, the probability of the word sequence $P(w_1^n \mid E)$ is:<br />
$$P(w_1^n \mid E) \simeq \prod_{i=1}^{n} P(w_i \mid w_{i-1} w_{i-2}; E) \qquad (4)$$<br />
Our language model is class-based, so it can be further decomposed into word-class<br />
and class transition probabilities. The class transition probabilities reflect the language<br />
syntax; we assume they are independent of salient objects. The word-class probabilities,<br />
however, do depend on context: for a given class (e.g. noun), the probability of hearing<br />
the word ‘laptop’ will be higher if a laptop is present in the environment. Hence:<br />
$$P(w_i \mid w_{i-1} w_{i-2}; E) = \underbrace{P(w_i \mid c_i; E)}_{\text{word-class probability}} \times \underbrace{P(c_i \mid c_{i-1}, c_{i-2})}_{\text{class transition probability}} \qquad (5)$$<br />
We now define the word-class probabilities $P(w_i \mid c_i; E)$:<br />
$$P(w_i \mid c_i; E) = \sum_{e_k \in E} P(w_i \mid c_i; e_k) \times P(e_k) \qquad (6)$$<br />
To compute $P(w_i \mid c_i; e_k)$, we use the lexical activation network specified for $e_k$:<br />
$$P(w_i \mid c_i; e_k) = \begin{cases} P(w_i \mid c_i) + \alpha_1 & \text{if } w_i \in \text{activatedWords}(e_k) \\ P(w_i \mid c_i) - \alpha_2 & \text{if } w_i \notin \text{activatedWords}(e_k) \wedge w_i \in \text{contextDependentWords} \\ P(w_i \mid c_i) & \text{else} \end{cases} \qquad (7)$$<br />
The optimum value of $\alpha_1$ is determined using regression tests. $\alpha_2$ is computed relative<br />
to $\alpha_1$ in order to keep the sum of all probabilities equal to 1:<br />
$$\alpha_2 = \frac{|\text{activatedWords}|}{|\text{contextDependentWords}| - |\text{activatedWords}|} \times \alpha_1$$<br />
These word-class probabilities are dynamically updated as the environment and the<br />
dialogue evolve, and are incorporated into the language model at runtime.<br />
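The word-class update can be sketched as follows. This is a minimal sketch under stated assumptions: all word sets are restricted to a single word class, every activated word is also context-dependent, and the base probabilities, the value of α1 and the function names are hypothetical.<br />

```python
def prob_given_entity(word, base, activated, context_dependent, alpha1):
    """P(w | c; e_k): boost activated words, penalise the other context-dependent ones.

    base: dict word -> P(w | c) for one word class (hypothetical values).
    activated: words of that class activated by e_k (subset of context_dependent).
    """
    n_act = len(activated)
    n_dep = len(context_dependent)
    # alpha2 is tied to alpha1 so that the class distribution still sums to 1.
    alpha2 = n_act / (n_dep - n_act) * alpha1
    if word in activated:
        return base[word] + alpha1
    if word in context_dependent:
        return base[word] - alpha2
    return base[word]  # context-independent words keep their probability


def word_class_prob(word, base, entity_probs, activation, context_dependent, alpha1):
    """P(w | c; E): mixture over salient entities weighted by P(e_k)."""
    return sum(
        prob_given_entity(word, base, activation[e], context_dependent, alpha1) * p
        for e, p in entity_probs.items()
    )


if __name__ == "__main__":
    base = {"laptop": 0.2, "ball": 0.3, "box": 0.3, "thing": 0.2}
    context_dependent = {"laptop", "ball", "box"}
    activation = {"laptop": {"laptop"}, "ball": {"ball"}}
    entity_probs = {"laptop": 0.75, "ball": 0.25}
    updated = {
        w: word_class_prob(w, base, entity_probs, activation, context_dependent, 0.1)
        for w in base
    }
    print(updated, sum(updated.values()))
```

In this toy setting the context-independent word 'thing' keeps its base probability, while mass moves from 'box' towards the words activated by the salient entities.<br />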
4 Evaluation<br />
4.1 Evaluation procedure<br />
We evaluated our approach using a test suite <strong>of</strong> 250 spoken utterances recorded during<br />
Wizard <strong>of</strong> Oz experiments. The participants were asked to interact with <strong>the</strong> robot while<br />
looking at a specific visual scene. We designed 10 different visual scenes by systematic<br />
variation <strong>of</strong> <strong>the</strong> nature, number and spatial configuration <strong>of</strong> <strong>the</strong> objects presented. Figure<br />
4 gives an example <strong>of</strong> a visual scene.<br />
The interactions could include descriptions, questions and commands. No particular<br />
tasks were assigned to <strong>the</strong> participants. The only constraint we imposed was that all<br />
interactions with <strong>the</strong> robot had to be related to <strong>the</strong> shared visual scene.<br />
Figure 4: Sample visual scene including three objects: a box, a ball, and a chocolate bar.<br />
4.2 Results<br />
Table 1 summarises our experimental results. Due to space constraints, we focus our<br />
analysis on <strong>the</strong> WER <strong>of</strong> our model compared to <strong>the</strong> baseline – that is, compared to a<br />
class-based trigram model not based on salience.<br />
Table 1: Comparative results of recognition performance (Word Error Rate, WER)<br />
Vocabulary size | Classical LM | Salience-driven LM<br />
≈ 200 words | 25.04 % (NBest 3: 20.72 %) | 24.22 % (NBest 3: 19.97 %)<br />
≈ 400 words | 26.68 % (NBest 3: 21.98 %) | 23.85 % (NBest 3: 19.97 %)<br />
≈ 600 words | 28.61 % (NBest 3: 24.59 %) | 23.99 % (NBest 3: 20.27 %)<br />
4.3 Analysis<br />
As the results show, the use of a salience model can enhance the recognition performance<br />
in situated interactions: with a vocabulary of about 600 words, the WER is reduced<br />
by 16.1 % relative to the baseline (from 28.61 % to 23.99 %). According to the Sign test, the differences for the<br />
last two tests (400 and 600 words) are statistically significant. As we could expect, the<br />
salience-driven approach is especially helpful when operating with a larger vocabulary,<br />
where the expectations provided by the salience model can really make a difference in<br />
word recognition.<br />
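The relative reduction quoted above follows directly from the WER figures in Table 1. The short sketch below redoes the arithmetic for all three vocabulary sizes; the function name is an assumption, the numbers are those of the table.<br />

```python
def relative_reduction(baseline, system):
    """Relative WER reduction in percent: (baseline - system) / baseline * 100."""
    return (baseline - system) / baseline * 100.0


# Top-hypothesis WER values from Table 1: vocabulary size -> (classical LM, salience-driven LM).
WER = {200: (25.04, 24.22), 400: (26.68, 23.85), 600: (28.61, 23.99)}

if __name__ == "__main__":
    for size, (classical, salience) in WER.items():
        print(size, round(relative_reduction(classical, salience), 1))
```

For the 600-word vocabulary this gives the 16.1 % reported in the text; the reduction shrinks as the vocabulary gets smaller.<br />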
The word error rate remains never<strong>the</strong>less quite high. This is due to several reasons.<br />
The major issue is that <strong>the</strong> words causing most recognition problems are – at least in<br />
our test suite – function words like prepositions, discourse markers, connectives, auxiliaries,<br />
etc., and not content words. Unfortunately, <strong>the</strong> use <strong>of</strong> function words is usually not<br />
context-dependent, and hence not influenced by salience. We estimated that 89 % <strong>of</strong> <strong>the</strong><br />
recognition errors were due to function words. Moreover, our chosen test suite is constituted<br />
<strong>of</strong> “free speech” interactions, which <strong>of</strong>ten include lexical items or grammatical<br />
constructs outside <strong>the</strong> range <strong>of</strong> our language model.<br />
5 Conclusion<br />
We have presented an implemented model for speech recognition based on <strong>the</strong> concept <strong>of</strong><br />
salience. This salience is defined via visual and linguistic cues, and is used to compute<br />
degrees <strong>of</strong> lexical activations, which are in turn applied to dynamically adapt <strong>the</strong> ASR<br />
language model to <strong>the</strong> robot’s environment and dialogue state.<br />
As future work we will examine <strong>the</strong> potential extension <strong>of</strong> our approach in three directions.<br />
First, we are investigating how to use <strong>the</strong> situated context to perform some priming<br />
<strong>of</strong> function words like prepositions or discourse markers. Second, we wish to take o<strong>the</strong>r<br />
information sources into account, particularly <strong>the</strong> integration <strong>of</strong> a task model, relying on<br />
data made available by <strong>the</strong> symbolic planner. And finally, we want to go beyond speech<br />
recognition, and investigate the relevance of such a salience model for the development of<br />
a robust understanding system for situated dialogue.<br />
Acknowledgements<br />
My thanks go to G.-J. Kruijff, H. Zender, M. Wilson and N. Yampolska for <strong>the</strong>ir insightful comments.<br />
The research reported in this article was supported by <strong>the</strong> EU FP6 IST Cognitive Systems<br />
Integrated project Cognitive Systems for Cognitive Assistants “CoSy” FP6-004250-IP.<br />
References<br />
Altmann, G. T. and Kamide, Y. (2004). Now you see it, now you don’t: Mediating<br />
<strong>the</strong> mapping between language and <strong>the</strong> visual world, Psychology Press, New York,<br />
pp. 347–386.<br />
Baldridge, J. and Kruijff, G.-J. M. (2001). Coupling CCG and hybrid logic dependency<br />
semantics, ACL ’02: Proceedings of the 40th Annual Meeting on Association for<br />
Computational Linguistics, ACL, Morristown, NJ, USA, pp. 319–326.<br />
Barsalou, L. W. (1999). Perceptual symbol systems., Behavioral & Brain Sciences 22(4).<br />
Chai, J. Y. and Qu, S. (2005). A salience driven approach to robust input interpretation in<br />
multimodal conversational systems, <strong>Proceedings</strong> <strong>of</strong> Human Language Technology<br />
Conference and Conference on Empirical Methods in Natural Language Processing<br />
2005, Association for Computational Linguistics, Vancouver, Canada, pp. 217–224.<br />
Gruenstein, A., Wang, C. and Seneff, S. (2005). Context-sensitive statistical language<br />
modeling, <strong>Proceedings</strong> <strong>of</strong> INTERSPEECH 2005, pp. 17–20.<br />
Hawes, N., Sloman, A., Wyatt, J., Zillich, M., Jacobsson, H., Kruijff, G.-J. M., Brenner,<br />
M., Berginc, G. and Skocaj, D. (n.d.). Towards an integrated robot with multiple<br />
cognitive functions., AAAI, AAAI Press, pp. 1548–1553.<br />
Jacobsson, H., Hawes, N., Kruijff, G.-J. and Wyatt, J. (2008). Crossmodal content binding<br />
in information-processing architectures, <strong>Proceedings</strong> <strong>of</strong> <strong>the</strong> 3rd ACM/IEEE International<br />
Conference on Human-Robot Interaction (HRI), Amsterdam, The Ne<strong>the</strong>rlands.<br />
Kelleher, J. (2005). Integrating visual and linguistic salience for reference resolution, in<br />
N. Creaney (ed.), <strong>Proceedings</strong> <strong>of</strong> <strong>the</strong> 16th Irish conference on Artificial Intelligence<br />
and Cognitive Science (AICS-05), Portstewart, Nor<strong>the</strong>rn Ireland.<br />
Knoeferle, P. and Crocker, M. (2006). The coordinated interplay <strong>of</strong> scene, utterance, and<br />
world knowledge: evidence from eye tracking, Cognitive Science 30(3): 481–529.<br />
Kruijff, G.-J. M., Lison, P., Benjamin, T., Jacobsson, H. and Hawes, N. (in submission).<br />
Incremental, multi-level processing for comprehending situated dialogue in human-robot<br />
interaction, Connection Science.<br />
Lak<strong>of</strong>f, G. (1987). Women, fire and dangerous things: what categories reveal about <strong>the</strong><br />
mind, University <strong>of</strong> Chicago Press, Chicago.<br />
Landragin, F. (2006). Visual perception, language and gesture: A model for <strong>the</strong>ir understanding<br />
in multimodal dialogue systems, Signal Processing 86(12): 3578–3595.<br />
Langley, P., Laird, J. E. and Rogers, S. (2005). Cognitive architectures: Research issues<br />
and challenges, Technical report, Institute for <strong>the</strong> Study <strong>of</strong> Learning and Expertise,<br />
Palo Alto.<br />
Moore, R. K. (2007). Spoken language processing: piecing toge<strong>the</strong>r <strong>the</strong> puzzle, Speech<br />
Communication: Special Issue on Bridging <strong>the</strong> Gap Between Human and Automatic<br />
Speech Processing 49: 418–435.<br />
Oepen, S. and Carroll, J. (2000). Ambiguity packing in constraint-based parsing - practical<br />
results, <strong>Proceedings</strong> <strong>of</strong> <strong>the</strong> 1st Conference <strong>of</strong> <strong>the</strong> North America Chapter <strong>of</strong> <strong>the</strong><br />
Association <strong>of</strong> Computational Linguistics, Seattle, WA, pp. 162–169.<br />
Qu, S. and Chai, J. (2007). An exploration <strong>of</strong> eye gaze in spoken language processing for<br />
multimodal conversational interfaces, <strong>Proceedings</strong> <strong>of</strong> <strong>the</strong> Conference <strong>of</strong> <strong>the</strong> North<br />
America Chapter <strong>of</strong> <strong>the</strong> Association <strong>of</strong> Computational Linguistics, pp. 284–291.<br />
Roy, D. and Mukherjee, N. (2005). Towards situated speech understanding: visual context<br />
priming <strong>of</strong> language models, Computer Speech & Language (2): 227–248.<br />
Steedman, M. and Baldridge, J. (2003). Combinatory categorial grammar. MS Draft 4.<br />
Weilhammer, K., Stuttle, M. N. and Young, S. (2006). Bootstrapping language models<br />
for dialogue systems, <strong>Proceedings</strong> <strong>of</strong> INTERSPEECH 2006, Pittsburgh, PA.<br />
Zender, H. and Kruijff, G.-J. M. (2007). Towards generating referring expressions in<br />
a mobile robot scenario, Language and Robots: <strong>Proceedings</strong> <strong>of</strong> <strong>the</strong> Symposium,<br />
Aveiro, Portugal, pp. 101–106.<br />
A LOGIC WITH A CONDITIONAL PROBABILITY OPERATOR<br />
Petar Maksimović, Dragan Doder, Bojan Marinković and Aleksandar Perović<br />
Ma<strong>the</strong>matical Institute <strong>of</strong> <strong>the</strong> Serbian Academy <strong>of</strong> Sciences and Arts, Belgrade, Serbia<br />
Abstract. This paper presents a sound and strongly complete axiomatization <strong>of</strong> <strong>the</strong> reasoning<br />
about linear combinations <strong>of</strong> conditional probabilities, including comparative statements.<br />
The developed logic is decidable, with a PSPACE containment for <strong>the</strong> decision procedure.<br />
1 Introduction<br />
The present paper constitutes an effort to proceed along the lines of the research presented<br />
in (Fagin, Halpern and Megiddo, 1990; Lukasiewicz, 2002; Ognjanović and Rašković,<br />
1996; Ognjanović and Rašković, 1999; Ognjanović and Rašković, 2000; Ognjanović,<br />
Marković and Rašković, 2005; Ognjanović, Perović and Rašković, 2008; Rašković, Ognjanović<br />
and Marković, 2004), on the formal development of probabilistic logics, where<br />
probability statements are expressed by probabilistic operators expressing bounds on the<br />
probability of a propositional formula.<br />
The main technical novelty of this paper is a sound and<br />
strongly complete axiomatization of the reasoning about linear combinations of conditional<br />
probabilities, which also allows for qualitative statements. For instance, we formally<br />
write the statement “the conditional probability of α given β is at least the sum of<br />
the conditional probability of α given γ and twice the conditional probability of γ given α” as<br />
$$CP(\alpha, \beta) \geq CP(\alpha, \gamma) + 2 \cdot CP(\gamma, \alpha).$$<br />
It should be noted that all <strong>of</strong> <strong>the</strong> probabilities we use are Kolmogorov-style. We also prove<br />
that <strong>the</strong> developed logic is decidable.<br />
As is well known, the conditional probability of α given β has meaning only if<br />
$P(\beta) > 0$, and is, by definition, calculated by<br />
$$P(\alpha \mid \beta) = \frac{P(\alpha \wedge \beta)}{P(\beta)}.$$<br />
To avoid technical difficulties, we will adopt the convention that $0^{-1} = 1$. Namely, it is<br />
more convenient to assume that ${}^{-1}$ is a total operation, with this being considered usual<br />
practice in quantifier elimination for the theory of real closed fields. In this way, we make<br />
sure that conditional events are always defined.<br />
The rest <strong>of</strong> <strong>the</strong> paper is organized as follows. In Section 2 <strong>the</strong> syntax <strong>of</strong> <strong>the</strong> logic is<br />
given and <strong>the</strong> class <strong>of</strong> measurable probabilistic models is described. Section 3 contains<br />
<strong>the</strong> corresponding axiomatization and introduces <strong>the</strong> notion <strong>of</strong> deduction. A pro<strong>of</strong> <strong>of</strong> <strong>the</strong><br />
completeness <strong>the</strong>orem is presented in Section 4, whereas <strong>the</strong> decidability <strong>of</strong> <strong>the</strong> logic is<br />
analyzed in Section 5. Concluding remarks are in Section 6.<br />
2 Syntax and semantics<br />
Let $Var = \{p_n \mid n < \omega\}$ be the set of propositional variables. The corresponding set of all<br />
propositional formulas over $Var$ will be denoted by $For_C$, where C stands for classical,<br />
and is defined in the usual way. Propositional formulas will be denoted by α, β and γ,<br />
possibly with indices.<br />
Definition 1 The set $Term$ of all probabilistic terms is recursively defined as follows:<br />
• $Term(0) = \{s \mid s \in \mathbb{Q}\} \cup \{CP(\alpha, \beta) \mid \alpha, \beta \in For_C\}$.<br />
• $Term(n + 1) = Term(n) \cup \{(f + g), (s \cdot g), (-f) \mid f, g \in Term(n), s \in \mathbb{Q}\}$.<br />
• $Term = \bigcup_{n=0}^{\infty} Term(n)$. □<br />
Probabilistic terms will be denoted by f, g and h, possibly with indices. To simplify<br />
notation, we introduce the following convention: $f + g$ is $(f + g)$, $f + g + h$ is $((f + g) + h)$.<br />
For $n > 3$, $\sum_{i=1}^{n} f_i$ is $((\cdots((f_1 + f_2) + f_3) + \cdots) + f_n)$. Similarly, $-f$ is $(-f)$ and $f - g$<br />
is $(f + (-g))$.<br />
If α and β are propositional formulas, <strong>the</strong>n <strong>the</strong> probabilistic term CP (α, β) reads “<strong>the</strong><br />
conditional probability <strong>of</strong> α given β”. To simplify notation, we will write P (α) instead<br />
<strong>of</strong> CP (α, ⊤), where ⊤ is an arbitrary tautology instance.<br />
Definition 2 A basic probabilistic formula is any formula of the form $f \geq 0$. Furthermore,<br />
we define the following abbreviations:<br />
• $f \leq 0$ is $-f \geq 0$;<br />
• $f > 0$ is $\neg(f \leq 0)$;<br />
• $f < 0$ is $\neg(f \geq 0)$;<br />
• $f = 0$ is $f \geq 0 \wedge f \leq 0$;<br />
• $f \neq 0$ is $\neg(f = 0)$;<br />
• $f \geq g$ is $f - g \geq 0$.<br />
We define $f \leq g$, $f > g$, $f < g$, $f = g$ and $f \neq g$ in a similar way. □<br />
We define the notion of a probabilistic formula as a Boolean combination of basic<br />
probabilistic formulas. As in the propositional case, ¬ and ∧ are the primitive connectives,<br />
while all of the other connectives are introduced in the usual way. Probabilistic formulas<br />
will be denoted by φ, ψ and θ, possibly with indices. The set of all probabilistic formulas<br />
will be denoted by $For_P$.<br />
By “formula” we mean either a classical formula or a probabilistic formula. We do<br />
not allow for the mixing of those types of formulas, nor for the nesting of the probability<br />
operator P. Formulas will be denoted by Φ, Ψ and Θ, possibly with indices. The set of<br />
all formulas will be denoted by $For$.<br />
We define the notion of a model as a special kind of Kripke model. Namely, a model<br />
M is any tuple $\langle W, H, \mu, v \rangle$ such that:<br />
• W is a nonempty set. As usual, its elements will be called worlds.<br />
• H is an algebra of sets over W.<br />
• $\mu : H \to [0, 1]$ is a finitely additive probability measure.<br />
• $v : For_C \times W \to \{0, 1\}$ is a truth assignment (see footnote 1) compatible with ¬ and ∧. That is,<br />
$v(\neg\alpha, w) = 1 - v(\alpha, w)$ and $v(\alpha \wedge \beta, w) = v(\alpha, w) \cdot v(\beta, w)$.<br />
For a given model M, let $[\alpha]_M$ be the set of all $w \in W$ such that $v(\alpha, w) = 1$. If<br />
the context is clear, we will write $[\alpha]$ instead of $[\alpha]_M$. We say that M is measurable if<br />
$[\alpha] \in H$ for all $\alpha \in For_C$.<br />
Definition 3 Let $M = \langle W, H, \mu, v \rangle$ be any measurable model. We define the satisfiability<br />
relation $\models$ recursively as follows:<br />
• $M \models \alpha$ if $v(\alpha, w) = 1$ for all $w \in W$.<br />
• $M \models f \geq 0$ if $f^M \geq 0$, where $f^M$ is recursively defined in the following way:<br />
– $s^M = s$.<br />
– $CP(\alpha, \beta)^M = \mu([\alpha \wedge \beta]) \cdot \mu([\beta])^{-1}$.<br />
– $(f + g)^M = f^M + g^M$.<br />
– $(s \cdot g)^M = s \cdot g^M$.<br />
– $(-f)^M = -(f^M)$.<br />
• $M \models \neg\phi$ if $M \not\models \phi$.<br />
• $M \models \phi \wedge \psi$ if $M \models \phi$ and $M \models \psi$. □<br />
A formula Φ is satisfiable if there is a measurable model M such that $M \models \Phi$; Φ is<br />
valid if it is satisfied in every measurable model. We say that the set T of formulas is<br />
satisfiable if there is a measurable model M such that $M \models \Phi$ for all $\Phi \in T$.<br />
Notice that the last two clauses of Definition 3 provide validity of each tautology instance.<br />
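For a finite model, the recursive evaluation of terms can be sketched in Python. This is an illustrative sketch only: it assumes finitely many worlds with H the full power set (so every model is measurable), and the tuple encoding of formulas and terms and all class and method names are hypothetical.<br />

```python
from fractions import Fraction


def inv(x):
    """Total inverse, following the convention 0^{-1} = 1."""
    return Fraction(1) if x == 0 else 1 / x


class Model:
    """Finite model: `weights` maps worlds to Fractions summing to 1,
    `valuation` maps each world to the set of variables true in it."""

    def __init__(self, weights, valuation):
        self.weights = weights
        self.valuation = valuation

    def holds(self, alpha, w):
        """alpha is a variable name, ('not', a) or ('and', a, b)."""
        if isinstance(alpha, str):
            return alpha in self.valuation[w]
        if alpha[0] == "not":
            return not self.holds(alpha[1], w)
        if alpha[0] == "and":
            return self.holds(alpha[1], w) and self.holds(alpha[2], w)
        raise ValueError(f"unknown connective: {alpha[0]!r}")

    def mu(self, alpha):
        """Measure of the set of worlds in which alpha holds."""
        return sum((p for w, p in self.weights.items() if self.holds(alpha, w)),
                   Fraction(0))

    def value(self, f):
        """f is a Fraction, ('CP', a, b), ('+', f, g), ('*', s, g) or ('-', f)."""
        if isinstance(f, Fraction):
            return f
        if f[0] == "CP":
            return self.mu(("and", f[1], f[2])) * inv(self.mu(f[2]))
        if f[0] == "+":
            return self.value(f[1]) + self.value(f[2])
        if f[0] == "*":
            return f[1] * self.value(f[2])
        if f[0] == "-":
            return -self.value(f[1])
        raise ValueError(f"unknown term constructor: {f[0]!r}")

    def satisfies_geq0(self, f):
        """Truth of the basic probabilistic formula f >= 0 in this model."""
        return self.value(f) >= 0


if __name__ == "__main__":
    m = Model(
        {"w1": Fraction(1, 2), "w2": Fraction(1, 4), "w3": Fraction(1, 4)},
        {"w1": {"p", "q"}, "w2": {"q"}, "w3": set()},
    )
    print(m.value(("CP", "p", "q")))
    print(m.satisfies_geq0(("+", ("CP", "p", "q"), ("-", Fraction(1, 2)))))
```

Exact rationals are used so that comparisons with 0 are not affected by rounding. When the conditioning formula has measure 0, the numerator is 0 as well, so the convention on the inverse makes the term evaluate to 0.<br />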
3 Axiomatization<br />
In this section we will introduce <strong>the</strong> axioms and inference rules and prove that <strong>the</strong> proposed<br />
axiomatization is sound and strongly complete with respect to <strong>the</strong> class <strong>of</strong> all measurable<br />
models. The set <strong>of</strong> axioms from our axiomatic system, which we denote AXLPCP,<br />
is divided into three groups: axioms for propositional reasoning, axioms for probabilistic<br />
reasoning and arithmetical axioms.<br />
Axioms for propositional reasoning<br />
A1. $\tau(\Phi_1, \ldots, \Phi_n)$, where $\tau(p_1, \ldots, p_n) \in For_C$ is any tautology and the $\Phi_i$ are either all<br />
propositional or all probabilistic.<br />
Axioms for probabilistic reasoning<br />
A2. $P(\alpha) \geq 0$;<br />
A3. $P(\top) = 1$;<br />
A4. $P(\bot) = 0$;<br />
A5. $P(\alpha \leftrightarrow \beta) = 1 \to P(\alpha) = P(\beta)$;<br />
A6. $P(\alpha \vee \beta) = P(\alpha) + P(\beta) - P(\alpha \wedge \beta)$;<br />
A7. $(P(\alpha \wedge \beta) = r \wedge P(\beta) = s) \to CP(\alpha, \beta) = r \cdot s^{-1}$.<br />
Footnote 1: 1 stands for “true”, while 0 stands for “false”.<br />
Arithmetical axioms<br />
A8. $r \geq s$, whenever $r \geq s$;<br />
A9. $s \cdot r = sr$;<br />
A10. $s + r = s + r$;<br />
A11. $f + g = g + f$;<br />
A12. $(f + g) + h = f + (g + h)$;<br />
A13. $f + 0 = f$;<br />
A14. $f - f = 0$;<br />
A15. $(r \cdot f) + (s \cdot f) = (r + s) \cdot f$;<br />
A16. $s \cdot (f + g) = (s \cdot f) + (s \cdot g)$;<br />
A17. $r \cdot (s \cdot f) = (r \cdot s) \cdot f$;<br />
A18. $1 \cdot f = f$;<br />
A19. $f \geq g \vee g \geq f$;<br />
A20. $(f \geq g \wedge g \geq h) \to f \geq h$;<br />
A21. $f \geq g \to f + h \geq g + h$;<br />
A22. $(f \geq g \wedge s > 0) \to s \cdot f \geq s \cdot g$.<br />
Inference rules<br />
R1. From Φ and Φ → Ψ infer Ψ.<br />
R2. From α infer P (α) = 1.<br />
R3. From the set of premises $\{\phi \to f \geq -n^{-1} \mid n = 1, 2, 3, \ldots\}$ infer $\phi \to f \geq 0$.<br />
Let us briefly comment on the axioms and inference rules. The axioms A1-A7 provide<br />
the required properties of probability, while the axioms A8-A22 provide the properties<br />
required for computation. Among the inference rules, R1 is modus ponens, R2 resembles<br />
necessitation, while R3 ensures that non-Archimedean probabilities are not permitted.<br />
Definition 4 A formula Φ is a theorem ($\vdash \Phi$) if there is an at most countable sequence<br />
of formulas $\Phi_0, \Phi_1, \ldots, \Phi$, such that every $\Phi_i$ is either an axiom or is derived from the<br />
preceding formulas of the sequence by an inference rule. In this paper we will also use<br />
the notion of deducibility. A formula Φ is deducible from a set T of sentences ($T \vdash \Phi$) if<br />
there is an at most countable sequence of formulas $\Phi_0, \Phi_1, \ldots, \Phi$, such that every $\Phi_i$ is<br />
an axiom or a formula from the set T, or is derived from the preceding formulas by an<br />
inference rule. Equivalently, Φ is a theorem if it is deducible from the empty set. A set<br />
of sentences T is consistent if there is at least one formula from $For_C$, and at least one<br />
formula from $For_P$, that are not deducible from T. Otherwise, T is inconsistent. A set T<br />
is deductively closed if for every $\Phi \in For$, if $T \vdash \Phi$, then $\Phi \in T$. □<br />
Observe that the length of the inference may be any successor ordinal less than the<br />
first uncountable ordinal $\omega_1$. Using a straightforward induction on the length of the inference,<br />
one can easily show that the above axiomatization is sound with respect to the class<br />
of all measurable models.<br />
4 Completeness<br />
Theorem 1 (Deduction theorem) Suppose that T is an arbitrary set of formulas and that<br />
$\Phi, \Psi \in For$. Then, $T \vdash \Phi \to \Psi$ iff $T \cup \{\Phi\} \vdash \Psi$.<br />
Proof: If $T \vdash \Phi \to \Psi$, then clearly $T \cup \{\Phi\} \vdash \Phi \to \Psi$, so, by modus ponens (R1),<br />
$T \cup \{\Phi\} \vdash \Psi$. Conversely, let $T \cup \{\Phi\} \vdash \Psi$. As in the classical case, we use<br />
induction on the length of the inference to prove that $T \vdash \Phi \to \Psi$. The proof differs from the<br />
classical one only in the cases where we apply the infinitary inference rule R3.<br />
Suppose that Ψ is the formula $\phi \to f \geq 0$ and that $T \vdash \Phi \to (\phi \to f \geq -n^{-1})$ for all<br />
n. Since the formula $(p_0 \to (p_1 \to p_2)) \leftrightarrow ((p_0 \wedge p_1) \to p_2)$ is a tautology, we obtain<br />
$T \vdash (\Phi \wedge \phi) \to f \geq -n^{-1}$ for all n (A1). Now, by R3, $T \vdash (\Phi \wedge \phi) \to f \geq 0$. Hence, by<br />
the same tautology, $T \vdash \Phi \to \Psi$. □<br />
The next technical lemma will be used in <strong>the</strong> construction <strong>of</strong> a maximally consistent<br />
extension <strong>of</strong> a consistent set <strong>of</strong> formulas.<br />
Lemma 2 Suppose that T is a consistent set of formulas. If $T \cup \{\phi \to f \geq 0\}$ is inconsistent,<br />
then there is a positive integer n such that $T \cup \{\phi \to f < -n^{-1}\}$ is consistent.<br />
Proof: The proof is by reductio ad absurdum. Thus, let us suppose<br />
that $T \cup \{\phi \to f < -n^{-1}\}$ is inconsistent for all n. By the Deduction theorem, we can<br />
conclude that<br />
$$T \vdash \phi \to f \geq -n^{-1}$$<br />
for all n. By R3, $T \vdash \phi \to f \geq 0$, so T is inconsistent; a contradiction. □<br />
Definition 5 Suppose that T is a consistent set of formulas and that For_P = {φ_i | i =<br />
0, 1, 2, 3, . . .}. We define a completion T* of T recursively as follows:<br />
1. T_0 = T ∪ {α ∈ For_C | T ⊢ α} ∪ {P(α) = 1 | T ⊢ α}.<br />
2. If T_i ∪ {φ_i} is consistent, then T_{i+1} = T_i ∪ {φ_i}.<br />
3. If T_i ∪ {φ_i} is inconsistent, then:<br />
(a) If φ_i has the form ψ → f ≥ 0, then T_{i+1} = T_i ∪ {ψ → f < −n^{−1}}, where n<br />
is a positive integer such that T_{i+1} is consistent. The existence of such an n is<br />
guaranteed by Lemma 2.<br />
(b) Otherwise, T_{i+1} = T_i. □<br />
Obviously, each T_i is consistent. In the next theorem we prove that T*, the union of the sets T_i, is deductively<br />
closed, consistent and maximal with respect to For_P.<br />
Theorem 3 Suppose that T is a consistent set of formulas and that T* is constructed as<br />
above. Then:<br />
1. T* is deductively closed, that is, T* ⊢ Φ implies Φ ∈ T*.<br />
2. There is φ ∈ For_P such that φ ∉ T*.<br />
3. For each φ ∈ For_P, either φ ∈ T*, or ¬φ ∈ T*.<br />
Proof: We prove only the first clause, since the remaining clauses can be proved<br />
in the same way as in the classical case. It is sufficient to prove the<br />
following four claims:<br />
(i) Each instance of any axiom is in T*.<br />
(ii) If Φ ∈ T* and Φ → Ψ ∈ T*, then Ψ ∈ T*.<br />
(iii) If α ∈ T*, then P(α) = 1 ∈ T*.<br />
(iv) If {φ → f ≥ −n^{−1} | n = 1, 2, 3, . . .} is a subset of T*, then φ → f ≥ 0 ∈ T*.<br />
(i): If Φ ∈ For_C, then Φ ∈ T_0. Otherwise, there is a nonnegative integer i such that<br />
Φ = φ_i. Since ⊢ φ_i, T_i ⊢ φ_i as well, so φ_i ∈ T_{i+1}.<br />
(ii): If Φ, Φ → Ψ ∈ For_C, then Ψ ∈ T_0. Otherwise, let Φ = φ_i, Ψ = φ_j, and Φ →<br />
Ψ = φ_k. Then, Ψ is a deductive consequence of each T_l, where l ≥ max(i, k) + 1.<br />
Let ¬Ψ = φ_m. If φ_m ∈ T_{m+1}, then ¬Ψ is a deductive consequence of each T_n, where<br />
n ≥ m + 1. So, for every n ≥ max(i, k, m) + 1, T_n ⊢ Ψ ∧ ¬Ψ, a contradiction.<br />
Thus, ¬Ψ ∉ T*. On the other hand, if also Ψ ∉ T*, we have that T_n ∪ {Ψ} ⊢ ⊥ and<br />
T_n ∪ {¬Ψ} ⊢ ⊥ for n ≥ max(j, m) + 1, contradicting the consistency of T_n.<br />
Thus, Ψ ∈ T*.<br />
(iii): If α ∈ T*, then α ∈ T_0, so P(α) = 1 ∈ T_0.<br />
(iv): Suppose that {φ → P(α) ≥ −n^{−1} | n = 1, 2, 3, . . .} is a subset of T*. We want<br />
to prove that φ → P(α) ≥ 0 ∈ T*. The proof is by reductio ad absurdum. So,<br />
let φ → P(α) ≥ 0 = φ_i and let us suppose that T_i ∪ {φ_i} is inconsistent. By 3.(a) of<br />
Definition 5, there is a positive integer n such that<br />
T_{i+1} = T_i ∪ {φ → P(α) < −n^{−1}}<br />
and T_{i+1} is consistent. Then, for all sufficiently large k, T_k ⊢ φ → P(α) < −n^{−1}<br />
and T_k ⊢ φ → P(α) ≥ −n^{−1}, so T_k ⊢ φ → ψ for all ψ ∈ For_P. In particular,<br />
T_k ⊢ φ → P(α) ≥ 0, i.e., T_k ⊢ φ_i for all sufficiently large k. But φ_i ∉ T*, so φ_i is<br />
inconsistent with all T_k, k ≥ i. It follows that each T_k is inconsistent for sufficiently large<br />
k, a contradiction.<br />
Thus, T_i ∪ {φ_i} is consistent, so φ → P(α) ≥ 0 ∈ T_{i+1}.<br />
□<br />
For the given completion T*, we define a canonical model M* as follows:<br />
• W is the set of all functions w : For_C → {0, 1} with the following properties:<br />
– w is compatible with ¬ and ∧.<br />
– w(α) = 1 for each α ∈ T*.<br />
• v : For_C × W → {0, 1} is defined by v(α, w) = 1 iff w(α) = 1.<br />
• H = {[α] | α ∈ For_C}.<br />
• µ : H → [0, 1] is defined by µ([α]) = sup{s ∈ [0, 1] ∩ Q | T* ⊢ P(α) ≥ s}.<br />
Lemma 4 M* is a measurable model.<br />
Proof: We need to prove that H is an algebra of sets and that µ is a finitely additive<br />
probability measure. It is easy to see that H is an algebra of sets, since [α] ∩ [β] = [α ∧ β],<br />
[α] ∪ [β] = [α ∨ β] and W \ [α] = [¬α]. Concerning µ, it is sufficient to prove that A3, A4<br />
and A6 are satisfied in M*. Here we only sketch the proof for A6, which<br />
provides finite additivity of µ.<br />
Let µ([α]) = a, µ([β]) = b and µ([α ∧ β]) = c. We claim that<br />
µ([α ∨ β]) = a + b − c.<br />
This is an immediate consequence of the following facts:<br />
• µ([γ]) = sup{s ∈ Q | T* ⊢ P(γ) ≥ s}, γ ∈ For_C.<br />
• The real function F(x, y, z) = x + y − z is continuous.<br />
• For each r, s ∈ Q, T* ⊢ r ≥ s iff r ≥ s.<br />
• Q^3 is dense in R^3.<br />
Namely, for each positive ε, there are positive δ_1, δ_2, δ_3 such that for all ⟨r_1, r_2, r_3⟩ ∈<br />
((a − δ_1, a] × (b − δ_2, b] × (c − δ_3, c]) ∩ Q^3,<br />
r_1 + r_2 − r_3 ∈ (a + b − c − ε, a + b − c + ε).<br />
In particular, for each s′, s″ ∈ Q such that<br />
a + b − c − ε < s′ ≤ r_1 + r_2 − r_3 ≤ s″ < a + b − c + ε,<br />
using the axioms about rational numbers, we have that<br />
T* ⊢ s′ ≤ r_1 + r_2 − r_3 ≤ s″,<br />
i.e., µ([α ∨ β]) = µ([α]) + µ([β]) − µ([α ∧ β]). □<br />
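The additivity identity established in Lemma 4 can be checked numerically on a small finite model. The sketch below is only an illustration: the two letters `p` and `q`, the four worlds and their weights are assumptions chosen for the example, and µ is computed by summing world weights rather than through the supremum used in the canonical model.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy model: worlds are truth assignments to two letters,
# each carrying an illustrative weight (the weights sum to 1).
LETTERS = ("p", "q")
WORLDS = list(product((False, True), repeat=len(LETTERS)))
WEIGHT = dict(zip(WORLDS, (Fraction(1, 10), Fraction(2, 10),
                           Fraction(3, 10), Fraction(4, 10))))


def extension(formula):
    """Return [formula]: the set of worlds in which the formula holds."""
    return frozenset(w for w in WORLDS if formula(dict(zip(LETTERS, w))))


def mu(event):
    """Finitely additive measure: sum of the weights of the worlds in the event."""
    return sum((WEIGHT[w] for w in event), Fraction(0))


alpha = lambda v: v["p"]
beta = lambda v: v["q"]

a = mu(extension(alpha))
b = mu(extension(beta))
c = mu(extension(lambda v: alpha(v) and beta(v)))
union = mu(extension(lambda v: alpha(v) or beta(v)))

print(union == a + b - c)  # the identity mu([α ∨ β]) = a + b − c
```

Exact rationals are used so that the comparison is not affected by floating-point rounding.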
Theorem 5 (Strong completeness theorem) Every consistent set of formulas has a measurable<br />
model.<br />
Proof: Let T be a consistent set of formulas. We can extend it to a maximally consistent<br />
set T*, and define a canonical model M*, as above. By induction on the complexity<br />
of the formulas we can prove that M* |= Φ iff Φ ∈ T*.<br />
To begin the induction, let Φ = α ∈ For_C. If α ∈ T*, i.e., T* ⊢ α, then by definition<br />
of M*, M* |= α. Conversely, if M* |= α, by the completeness of classical propositional<br />
logic, T* ⊢ α, and α ∈ T*.<br />
Let us suppose that f ≥ 0 ∈ T*. Then, using the axioms for ordered commutative<br />
rings, we can prove that<br />
\[ T^* \vdash f = s + \sum_{i=1}^{m} s_i \cdot CP(\alpha_i, \beta_i) \quad \text{and} \quad T^* \vdash s + \sum_{i=1}^{m} s_i \cdot CP(\alpha_i, \beta_i) \geq 0, \]<br />
for some s, s_i ∈ Q and some α_i, β_i ∈ For_C such that T* ⊢ P(β_i) > 0. Let a_i = µ([α_i])<br />
and b_i = µ([β_i]). It remains to prove that<br />
\[ s + \sum_{i=1}^{m} s_i \cdot a_i \cdot b_i^{-1} \geq 0. \tag{1} \]<br />
As in the proof of Lemma 4, we can show that (1) is an immediate consequence<br />
of the following facts:<br />
• µ([γ]) = sup{s ∈ Q | T* ⊢ P(γ) ≥ s}, γ ∈ For_C.<br />
• The real function \( F(x_1, \dots, x_m, y_1, \dots, y_m) = s + \sum_{i=1}^{m} s_i \cdot x_i \cdot y_i^{-1} \) is continuous.<br />
• For each r, s ∈ Q, T* ⊢ r ≥ s iff r ≥ s.<br />
• Q^k is dense in R^k.<br />
For the other direction, let M* |= f ≥ 0. If f ≥ 0 ∉ T*, by construction of T*,<br />
there is a positive integer n such that f < −n^{−1} ∈ T*. Reasoning as above, we have that<br />
M* |= f < 0, which is a contradiction. So, f ≥ 0 ∈ T*.<br />
Let Φ = ¬φ ∈ For_P. Then M* |= ¬φ iff M* ⊭ φ iff φ ∉ T* iff (by Theorem 3)<br />
¬φ ∈ T*.<br />
Finally, let Φ = φ ∧ ψ ∈ For_P. M* |= φ ∧ ψ iff M* |= φ and M* |= ψ iff φ, ψ ∈ T*<br />
iff (by Theorem 3) φ ∧ ψ ∈ T*. □<br />
5 Decidability<br />
Theorem 6 Satisfiability of probabilistic formulas is decidable.<br />
Proof: Up to equivalence, each probabilistic formula is a finite disjunction of finite<br />
conjunctions of literals, where a literal is either a basic probabilistic formula or a negation<br />
of a basic probabilistic formula. Thus, it is sufficient to show the decidability of the<br />
satisfiability problem for formulas of the form<br />
\[ \bigwedge_{i} f_i \geq 0 \;\wedge\; \bigwedge_{j} g_j < 0. \tag{2} \]<br />
Suppose that p_1, . . . , p_n are all of the propositional letters appearing in (2). Let A_1, . . . , A_{2^n}<br />
be all of the formulas of the form<br />
±p_1 ∧ · · · ∧ ±p_n,<br />
where +p = p and −p = ¬p. Clearly, the A_i are pairwise disjoint and form a partition of ⊤.<br />
Furthermore, for each α appearing in (2) there is a unique set I_α ⊆ {1, . . . , 2^n} such that<br />
\[ \alpha \leftrightarrow \bigvee_{i \in I_\alpha} A_i \]<br />
is a tautology. Now we can equivalently rewrite (2) as<br />
\[ \bigwedge_{i} \sum_{i'} s_{ii'}\, CP\Big(\bigvee_{k \in I_{\alpha_{ii'}}} A_k,\ \bigvee_{l \in I_{\beta_{ii'}}} A_l\Big) \geq 0 \;\wedge\; \bigwedge_{j} \sum_{j'} s_{jj'}\, CP\Big(\bigvee_{k \in I_{\alpha_{jj'}}} A_k,\ \bigvee_{l \in I_{\beta_{jj'}}} A_l\Big) < 0. \]<br />
Let σ_i(x_1, . . . , x_{2^n}), δ_j(x_1, . . . , x_{2^n}) be the formulas<br />
\[ \sum_{i'} s_{ii'} \cdot \Big(\sum_{k \in I_{\alpha_{ii'}}} x_k\Big) \cdot \Big(\sum_{l \in I_{\beta_{ii'}}} x_l\Big)^{-1} \geq 0 \]<br />
and<br />
\[ \sum_{j'} s_{jj'} \cdot \Big(\sum_{k \in I_{\alpha_{jj'}}} x_k\Big) \cdot \Big(\sum_{l \in I_{\beta_{jj'}}} x_l\Big)^{-1} < 0. \]<br />
Then, it is easy to see that (2) is satisfiable iff the sentence<br />
\[ \exists x_1 \dots \exists x_{2^n} \Big(\bigwedge_{i} \sigma_i(\bar{x}) \wedge \bigwedge_{j} \delta_j(\bar{x})\Big) \]<br />
is satisfied in the ordered field of reals. Since the latter question is decidable, we have our<br />
claim. □<br />
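The first step of this reduction, the passage from a propositional formula α to its index set I_α over the atoms A_1, . . . , A_{2^n}, can be sketched in a few lines. The code is a minimal illustration under stated assumptions: formulas are represented as Python predicates over truth assignments, the atom ordering is an arbitrary choice, and the function names `atoms` and `index_set` are hypothetical. The final decision step over the ordered field of reals is not implemented here.

```python
from itertools import product


def atoms(letters):
    """Enumerate the 2^n atoms ±p1 ∧ ... ∧ ±pn as truth assignments."""
    return [dict(zip(letters, signs))
            for signs in product((True, False), repeat=len(letters))]


def index_set(formula, letters):
    """Return I_alpha: the 1-based indices of the atoms A_i on which alpha holds."""
    return {i for i, atom in enumerate(atoms(letters), start=1) if formula(atom)}


letters = ["p1", "p2"]
alpha = lambda v: v["p1"] or v["p2"]       # p1 ∨ p2
beta = lambda v: v["p1"] and not v["p2"]   # p1 ∧ ¬p2

I_alpha = index_set(alpha, letters)
I_beta = index_set(beta, letters)
print(sorted(I_alpha), sorted(I_beta))
```

With this ordering the atoms are A_1 = p1 ∧ p2, A_2 = p1 ∧ ¬p2, A_3 = ¬p1 ∧ p2 and A_4 = ¬p1 ∧ ¬p2, so each formula is equivalent to the disjunction of the atoms in its index set. The enumeration is exponential in the number of letters, which is acceptable for a decidability argument but not an efficient procedure.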
It should be noted that this logic can be embedded into the logic described in (Fagin<br />
et al., 1990), whose decision procedure is in PSPACE. Also, the<br />
rewriting of formulas from our logic into that logic can be accomplished in linear time:<br />
CP(α, β) is equivalent to<br />
w(α ∧ β) / w(β),<br />
which is representable in the logic of (Fagin et al., 1990).<br />
Thus, we conclude that our logic is also decidable in PSPACE.<br />
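The rewriting step mentioned above can be pictured as a single pass over the syntax tree of a formula. The following sketch assumes a hypothetical encoding of formulas as nested tuples and only performs the syntactic replacement of CP(α, β) by w(α ∧ β) / w(β); it does not implement the decision procedure of (Fagin et al., 1990).

```python
def rewrite_cp(term):
    """Replace every CP(alpha, beta) node by w(alpha ∧ beta) / w(beta).

    Terms are nested tuples such as ("CP", "a", "b") or ("+", t1, t2);
    strings and numbers are leaves. Each node is visited exactly once.
    """
    if not isinstance(term, tuple):
        return term
    head, *args = term
    if head == "CP":
        alpha, beta = args
        return ("/", ("w", ("and", alpha, beta)), ("w", beta))
    return (head, *(rewrite_cp(arg) for arg in args))


# Hypothetical encoding of CP(a, b) + CP(c, d) >= 0.95
formula = (">=", ("+", ("CP", "a", "b"), ("CP", "c", "d")), 0.95)
rewritten = rewrite_cp(formula)
print(rewritten)
```

Because the arguments of CP are propositional formulas, the traversal never needs to look inside them, which is what keeps the pass linear in the size of the input.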
6 Conclusion<br />
In this paper we introduced a sound and strongly complete axiomatic system for the probabilistic<br />
logic with the conditional probability operator CP, which allows for linear combinations<br />
and comparative statements. As noted in (van der Hoek, 1997), it is not<br />
possible to give a finitary strongly complete axiomatization for such a system. In our case<br />
strong completeness was made possible by adding an infinitary rule of inference.<br />
The obtained formalism is quite expressive and allows for the representation of uncertain<br />
knowledge, where uncertainty is modeled by probability formulas. For instance,<br />
a conditional statement of the form “the sum of probabilities of α given β and γ given δ is<br />
at least 0.95” can be written as<br />
CP(α, β) + CP(γ, δ) ≥ 0.95.<br />
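To make the reading of such a statement concrete, the sketch below evaluates it in a small finite model. Everything in the model is an assumption made for illustration: the four worlds, their weights, and the convention of raising an error when the conditioning event has probability zero.

```python
from fractions import Fraction

# Hypothetical finite model: each world has a weight and the set of
# propositional letters true in it. The weights sum to 1.
WORLDS = {
    "w1": (Fraction(1, 2), {"alpha", "beta", "delta"}),
    "w2": (Fraction(1, 4), {"beta", "gamma", "delta"}),
    "w3": (Fraction(1, 8), {"alpha", "gamma"}),
    "w4": (Fraction(1, 8), set()),
}


def prob(*letters):
    """Probability that all the given letters hold."""
    return sum((weight for weight, true in WORLDS.values()
                if set(letters) <= true), Fraction(0))


def cp(a, b):
    """Conditional probability of a given b, as a ratio of measures."""
    pb = prob(b)
    if pb == 0:
        # Assumption of this sketch: conditioning on a null event is an error.
        raise ValueError("conditioning event has probability 0")
    return prob(a, b) / pb


total = cp("alpha", "beta") + cp("gamma", "delta")
print(total >= Fraction(95, 100))
```

In this particular model the two conditional probabilities are 2/3 and 1/3, so the statement holds.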
A similar approach can be applied to de Finetti style conditional probabilities. Future<br />
research will also consider the possibility of dealing with probabilistic first-order formulas.<br />
References<br />
Fagin, R., Halpern, J. Y. and Megiddo, N. (1990). A logic for reasoning about probabilities, Information<br />
and Computation 87(1/2): 78–128.<br />
Lukasiewicz, T. (2002). Probabilistic default reasoning with conditional constraints, Annals<br />
of Mathematics and Artificial Intelligence 34: 35–88.<br />
Ognjanović, Z., Marković, Z. and Rašković, M. (2005). Completeness theorem for a<br />
logic with imprecise and conditional probabilities, Publications de l’Institut Mathématique,<br />
nouvelle série 78(92): 35–49.<br />
Ognjanović, Z., Perović, A. and Rašković, M. (2008). Logics with the qualitative probability<br />
operator, Logic Journal of IGPL 16(2): 105–120.<br />
Ognjanović, Z. and Rašković, M. (1996). A logic with higher order probabilities, Publications<br />
de l’Institut Math. (NS) 60(74): 1–4.<br />
Ognjanović, Z. and Rašković, M. (1999). Some probability logics with new types of<br />
probability operators, Journal of Logic and Computation 9(2): 181–195.<br />
Ognjanović, Z. and Rašković, M. (2000). Some first-order probability logics, Theoretical<br />
Computer Science 247(1-2): 191–212.<br />
Rašković, M., Ognjanović, Z. and Marković, Z. (2004). A logic with conditional probabilities,<br />
in J. Leite and J. Alferes (eds), 9th European Conference JELIA’04 Logics in<br />
Artificial Intelligence, Vol. 3229, Springer-Verlag, pp. 226–238.<br />
van der Hoek, W. (1997). Some considerations on the logic pfd: a logic combining<br />
modality and probability, Journal of Applied Non-Classical Logics 7(3): 287–307.<br />
A PROOF-THEORETIC APPROACH TO FRENCH PRONOMINAL CLITICS ◦<br />
Scott Martin<br />
The Ohio State University<br />
Abstract. This paper sketches an account of the behavior of French pronominal clitics in<br />
CVG, a proof-theoretic categorial grammar formalism. The approach shown here differs<br />
from most categorial analyses of French clitics in that it treats clitics as noun phrases rather<br />
than as functions that operate on under-saturated verb phrases. Basic French cliticization,<br />
clitics in infinitival constructions, and both auxiliary and non-auxiliary clitic climbing are<br />
analyzed.<br />
1 Introduction<br />
Cliticization in French is a set of phenomena in which pronominal complements to a<br />
verbal host are systematically realized as affixes. Linguistic generalizations about these<br />
phenomena have been structured using several different frameworks, with Sag & Miller’s<br />
(1997) HPSG treatment of French clitics as morphological affixes being the most comprehensive<br />
and successful. Categorial accounts of cliticization phenomena, among them<br />
Kraak (1998) for French and Morrill & Gavarro (1992) for Catalan, have largely analyzed<br />
clitics as functors over under-saturated verb phrases. Stabler (2001) and Amblard (2006)<br />
are two recent approaches to French clitics in the Minimalist Grammar formalism, both<br />
of which treat them as syntactic elements with certain feature sets.<br />
In this paper, I give a preliminary account of some of the phenomena involving French<br />
clitics using Convergent Grammar (CVG), a categorial grammar framework that uses natural<br />
deduction with hypothetical proof. 1 This treatment is limited to a subset of what<br />
Bonami & Boye (2005) call French Pronominal Clitics (FPCs), specifically, those FPCs<br />
that appear as verbal complements. From Kraak (1998) I borrow the idea of a specialized<br />
combinatory mode for FPC attachment to a verbal host (analogous to her •ca) that is<br />
“stronger” than normal Complement Merge and reflects the status of clitic attachment as<br />
a process more morphological than syntactic. In contrast to Kraak’s and much other work<br />
on FPCs in categorial frameworks, however, the account sketched here partly follows the<br />
work of Stabler and Amblard in analyzing FPCs not as functors over verb phrases but<br />
as sets of morphological features that also represent a syntactic and semantic argument,<br />
much like ordinary NPs.<br />
Drawing on Sag & Miller’s work on French clitics as inspiration, the analysis reflected<br />
here relies mainly on properly-structured lexical axioms to describe the behavior of FPCs.<br />
Basic instances of cliticization are considered as well as more complicated situations,<br />
such as argument composition and the interaction of FPCs with infinitivals. However,<br />
this paper does not take a firm stance on the question of whether cliticization phenomena<br />
should be considered syntactic or morphological, since CVG’s tectogrammatical terms<br />
represent syntactic dependency relations and do not necessarily correspond exactly to<br />
surface word order or prosodic form.<br />
◦ For many helpful comments and suggestions on this and earlier drafts of this paper, I am grateful to<br />
Yusuke Kubota, Carl Pollard, Chris Worth, and three anonymous ESSLLI reviewers.<br />
1 Pollard (2007) provides an introduction to CVG.<br />
2 Pronominal Complement Clitics in French<br />
French verbs take canonical complements in a manner that resembles complement selection<br />
for their English analogs: the verbal head combines with its complement(s) to the<br />
right and with its subject to the left to form a finite or infinitive clause. When certain<br />
complements are pronominalized, however, they can optionally appear to the immediate<br />
left of the verb in a variant form as proclitics. The following data, replicated in part from<br />
(1) in Sag & Miller (1997), show the verb voir ‘to see’ with its complement realized both<br />
canonically and as a proclitic: 2<br />
(1) a. Marie voit Jean. ‘Marie sees Jean.’<br />
b. Marie voit lui. ‘Marie sees him.’ [boldface = prosodic stress]<br />
c. Marie le voit.<br />
Marie ACC.3S sees<br />
‘Marie sees him.’<br />
The cliticized configuration is given in (1c), with the complement in its clitic form (le)<br />
instead of the canonical one (here Jean, or lui with appropriate stress).<br />
Among the other distinctive characteristics of complement FPCs noted by Kraak (1998),<br />
the ones that bear most on the account given here are that:<br />
• as verbal complements, they do not co-occur with their non-pronominal or non-cliticized<br />
versions (exemplified in (1)).<br />
• they do not serve as the complement to bare past participles. This fact gives rise to<br />
an instance of the phenomenon known as “clitic climbing”:<br />
(2) a. *Marie a le vu. ‘Marie saw him.’<br />
b. Marie l’a vu.<br />
Marie ACC.3S has seen<br />
‘Marie saw him.’<br />
Here, (2a) is unacceptable because although the clitic le is the accusative complement<br />
of vu, it must be realized on the tense auxiliary form a as in (2b). However,<br />
causatives and certain verbs of perception exhibit different behavior. For these<br />
verbs, it is possible for some of their arguments to be realized as clitics on the<br />
upstairs verb and some on the downstairs one:<br />
(3) Jean le fera la réparer.<br />
Jean ACC.3S make.FUT ACC.3FS repair<br />
‘Jean will make him repair it.’<br />
(From Abeille, Godard and Miller (1995, example (2a)).)<br />
2 I adopt Bonami & Boye’s (2005) scheme here for annotating morphological features.<br />
• no syntactic material except another clitic can intervene between an FPC and its<br />
host verb. This fact distinguishes cliticized complements from their canonical counterparts<br />
in which certain adverbials can occur between a verb and its complements:<br />
(4) a. Marie l’a souvent dit à lui.<br />
Marie ACC.3S has often said to him<br />
‘Marie has often said it to him.’<br />
b. Marie l’a dit souvent à lui.<br />
Marie ACC.3S has said often to him<br />
‘Marie has often said it to him.’<br />
c. Marie le lui a souvent dit.<br />
Marie ACC.3S DAT.3S has often said<br />
‘Marie has often said it to him.’<br />
d. *Marie le lui souvent a dit.<br />
Marie ACC.3S DAT.3S often has said<br />
‘Marie has often said it to him.’<br />
e. *Marie le souvent lui a dit.<br />
Marie ACC.3S often DAT.3S has said<br />
‘Marie has often said it to him.’<br />
(Example (4d) is from Kraak (1998, (7d)).) Here, (4d) and (4e) show the disallowed<br />
intervention of the adverbial souvent ‘often’ between an FPC and its host verb,<br />
while (4b) demonstrates the allowable intervention of souvent in the canonical form.<br />
• they are normally realized on the verb they complement, illustrated here with an<br />
embedded infinitival:<br />
(5) a. *Marie le veut voir. ‘Marie wants to see him.’<br />
b. Marie veut le voir.<br />
Marie wants ACC.3S to see<br />
‘Marie wants to see him.’<br />
The cliticized accusative le here is the complement of the infinitive voir, and does<br />
not attach to the upstairs verb veut.<br />
These are the most basic facts about cliticization of declarative verbal complements in<br />
French. FPCs also occur in passive constructions and in constructions like those in (6):<br />
(6) a. i. Pierre reste fidèle à Jean.<br />
‘Pierre remains faithful to Jean.’<br />
ii. Pierre lui reste fidèle.<br />
Pierre DAT.3S remains faithful<br />
‘Pierre remains faithful to him.’<br />
b. i. Marie connaît la fin de l’histoire.<br />
‘Marie knows the end of the story.’<br />
ii. Marie en connaît la fin.<br />
Marie GEN.3S knows the end<br />
‘Marie knows the end of it.’<br />
(Both are from Sag & Miller (1997, example 3).) Constructions involving FPCs like those<br />
in (6) are similar to the clitic climbing that occurs with auxiliaries like avoir (as shown in<br />
(2)).<br />
In §3, I sketch an analysis of the basic facts about cliticization in some of the situations<br />
described above.<br />
3 Accounting for the Data<br />
Sag & Miller (1997) give extensive argumentation for considering clitics as morphological<br />
rather than syntactic in nature. Their account constrains the inflectional paradigm<br />
of French verbs, treating clitics as pronominal affixes that reduce the valence requirements<br />
of a given verb. In examining French clitics from a deductive perspective, Kraak<br />
(1998) instead describes cliticization as occurring on a “sliding scale” between morphology<br />
(affix-host attachment) and syntax (complement selection). The view presented here<br />
is more in line with Kraak’s in that it uses CVG tectogrammatical proof terms to describe<br />
the combinatoric potential of functions and arguments.<br />
However, this account diverges from Kraak’s and most other categorial grammar treatments<br />
in that it construes FPCs as regular pronominal NPs, instead of formulating them<br />
as functors over under-saturated verb phrases. This approach allows the semantics to be<br />
nearly identical between canonical and cliticized forms by specifying a separate mode of<br />
complement selection specifically for clitics.<br />
3.1 FPCs as a Local Dependency<br />
Because cliticization differs from the canonical form of complement selection (⊸_C) in<br />
various ways, a separate implication mode, called ⊸_PC (for proclitic), is used. As a local<br />
implication mode, it has modus ponens (elimination) but not hypothetical proof (introduction),<br />
which is used in CVG for non-local extractions. The elimination (or “merge”)<br />
rule for ⊸_PC is as follows: 3<br />
Proclitic Merge<br />
If Γ ⊢ a, x : A, C ⊣ ∆<br />
and Γ′ ⊢ f, v : A ⊸_PC B, C ⊃ D ⊣ ∆′<br />
then Γ, Γ′ ⊢ (^{PC} a f), v(x) : B, D ⊣ ∆, ∆′<br />
This rule formalizes the affixation of clitics to a verbal host, taking into account both<br />
the syntactic and semantic proof terms. This new ⊸_PC implication mode allows lexical<br />
axioms to specify the cliticized complement mode of combination as opposed to<br />
the canonical one, and is central to the account of clitic behavior sketched here. As a<br />
mnemonic meant to reflect French word order in derivational history, function application<br />
for ⊸_PC writes an FPC to the left of its host. This rule also states that hypotheses present<br />
in both the syntactic context (to the left of ⊢) and the semantic co-context (to the right<br />
of ⊣) of both premises are propagated into the conclusion. This ensures that the application<br />
of this rule does not have any effect on any non-local extractions (filler-gap path<br />
information), stored quantifiers, or anaphoric pronouns.<br />
3 A CVG sign is a triple made up of the prosodic/phonological form, syntactic tectogrammatical term,<br />
and semantic content. For brevity, I omit the prosodic element and only include the syntactic tecto-term and<br />
semantic denotation.<br />
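The Proclitic Merge rule can be read as a function from two judgments to a third. The sketch below is one possible rendering under stated assumptions, not an implementation taken from the CVG literature: the `Judgment` class, the tuple encodings of A ⊸PC B and C ⊃ D, the use of exact type equality in place of type membership, and the co-context labels `d1` and `d2` are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Judgment:
    """One premise or conclusion: Γ ⊢ tecto, sem : tecto_type, sem_type ⊣ Δ."""
    context: Tuple[str, ...]    # Γ, syntactic hypotheses
    tecto: Any                  # tectogrammatical (syntactic) term
    sem: Any                    # semantic term; functors are Python callables
    tecto_type: Any             # atomic name, or ("PC", A, B) for A ⊸PC B
    sem_type: Any               # atomic name, or ("->", C, D) for C ⊃ D
    cocontext: Tuple[str, ...]  # Δ, semantic co-context


def proclitic_merge(arg: Judgment, fun: Judgment) -> Judgment:
    """Apply Proclitic Merge; raise TypeError if the premises do not fit."""
    t, s = fun.tecto_type, fun.sem_type
    if not (isinstance(t, tuple) and len(t) == 3 and t[0] == "PC"):
        raise TypeError("functor must have tecto type A ⊸PC B")
    if not (isinstance(s, tuple) and len(s) == 3 and s[0] == "->"):
        raise TypeError("functor must have semantic type C ⊃ D")
    if arg.tecto_type != t[1] or arg.sem_type != s[1]:
        raise TypeError("argument types do not match the functor")
    return Judgment(
        context=arg.context + fun.context,        # Γ, Γ′
        tecto=("PC", arg.tecto, fun.tecto),       # clitic written to the left
        sem=fun.sem(arg.sem),                     # v(x)
        tecto_type=t[2],                          # B
        sem_type=s[2],                            # D
        cocontext=arg.cocontext + fun.cocontext,  # Δ, Δ′
    )


# Illustrative premises. Exact type equality stands in for type membership,
# so `le` is given the type "Acc∩Pcl" directly (a simplification).
le = Judgment((), "le", "b", "Acc∩Pcl", "Ind", ("d1",))
voit2 = Judgment(
    (), "voit2", lambda y: lambda x: ("see", x, y),
    ("PC", "Acc∩Pcl", ("SU", "Nom", "Fin")),
    ("->", "Ind", ("->", "Ind", "Prop")),
    ("d2",),
)
vp = proclitic_merge(le, voit2)
print(vp.tecto, vp.tecto_type, vp.cocontext)
```

The result carries both co-contexts unchanged, which mirrors the point made above that the rule leaves stored material untouched. There is deliberately no counterpart of hypothetical proof for this mode.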
With this new implication mode and merge rule, an account of FPC behavior as demonstrated<br />
in §2 is possible that requires no other machinery than the CVG merge rules described<br />
in Pollard (2007). All that remains is to correctly specify the necessary lexical<br />
axioms. First are the canonical forms of the verbs and complements: 4<br />
⊢ Marie, marie′ : Nom, Ind<br />
⊢ Jean, jean′ : Acc, Ind<br />
⊢ lui_1, a : Acc, Ind<br />
⊢ voit_1, λyλx.see′(x, y) : (Acc \ Pcl) ⊸_C (Nom ⊸_SU Fin), Ind ⊃ (Ind ⊃ Prop)<br />
The new type Pcl is assigned to proclitics in order to differentiate them from their canonical<br />
counterparts. Here, voit selects a complement of type Acc \ Pcl to indicate that it<br />
does not combine with proclitics in canonical complement position: the set complement<br />
specifies all inhabitants of type Acc except those that inhabit Pcl. Next, the lexicon is<br />
extended to reflect the syntactic/morphological features of le and the cliticization mode<br />
of complement selection for voir:<br />
⊢ le, b : Acc ∩ 3Sg ∩ Pcl, Ind<br />
⊢ voit_2, λyλx.see′(x, y) : (Acc ∩ Pcl) ⊸_PC (Nom ⊸_SU Fin), Ind ⊃ (Ind ⊃ Prop)<br />
These axioms allow <strong>the</strong> following pro<strong>of</strong> terms for <strong>the</strong> data in (1): 5<br />
(7) a. ⊢ ( SU Marie (voit1 Jean C )), see ′ (marie ′ , jean ′ ) : Fin, Prop<br />
b. ⊢ ( SU Marie (voit1 lui1 C )), see ′ (marie ′ , a) : Fin, Prop<br />
c. ⊢ ( SU Marie ( PC le voit2 )), see ′ (marie ′ , b) : Fin, Prop<br />
Aside from the different implication mode, the only difference between the canonical<br />
form of voit (voit1 ) and the cliticized variant (voit2 ) is that the argument to voit2 must<br />
be of the intersective type Acc ∩ Pcl. The type 3Sg represents the argument’s agreement<br />
features. So stated, this selectional restriction ensures that voit2 can only combine in<br />
cliticized mode with accusative complements that are also proclitics, as desired. It is<br />
important to note that not only are the semantics of both variants of voit identical, but<br />
both cliticized and canonical complements are of the same semantic type (Ind) as well.<br />
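The selectional behavior of voit1 and voit2 can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not CVG's proof theory: tectogrammatical types are modeled as sets of atomic labels, Acc \ Pcl and Acc ∩ Pcl as required/excluded label sets, and the names `Word`, `Verb`, and `merge`, as well as the entry for Jean, are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Word:
    form: str
    types: frozenset  # atomic tectogrammatical types the word inhabits
    sem: str          # semantic constant

@dataclass(frozen=True)
class Verb:
    form: str
    mode: str            # "C" = Complement Merge, "PC" = proclitic mode
    required: frozenset  # types the complement must inhabit
    excluded: frozenset  # types the complement must NOT inhabit
    pred: str            # name of the semantic predicate

def merge(verb, comp, mode):
    """Return the meaning (a function of the subject), or None if no proof exists."""
    if mode != verb.mode:
        return None
    if not verb.required <= comp.types or (verb.excluded & comp.types):
        return None
    return lambda subj: f"{verb.pred}({subj}, {comp.sem})"

# Toy lexicon; the entry for Jean is an assumption made for this sketch.
jean = Word("Jean", frozenset({"Acc"}), "jean'")
le = Word("le", frozenset({"Acc", "3Sg", "Pcl"}), "b")
voit1 = Verb("voit", "C", frozenset({"Acc"}), frozenset({"Pcl"}), "see'")   # Acc \ Pcl
voit2 = Verb("voit", "PC", frozenset({"Acc", "Pcl"}), frozenset(), "see'")  # Acc ∩ Pcl
```

In this toy model, `merge(voit1, le, "C")` yields no result, mirroring the absence of a proof for a proclitic in canonical complement position, while `merge(voit2, le, "PC")` succeeds with the same semantics as the canonical variant.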
4 The basic tectogrammatical types used here are Nom for nominative NPs, Acc for accusative NPs, and<br />
Fin for finite clauses. The basic semantic types are the hyperintensional types Ind, the type of individual<br />
concepts, and Prop, the type of propositions. In addition to the new combinatory mode ⊸PC, implicative<br />
tectogrammatical types are constructed using ⊸SU and ⊸C, which invoke Subject Merge and Complement<br />
Merge, respectively.<br />
5 For clarity, the proof terms given in this account show the semantics but not the co-context, since quantification,<br />
wh-phrases, and anaphoric binding are not discussed here.<br />
3.2 “Clitic Climbing” and Tense Auxiliaries<br />
The axioms for tense auxiliaries are structured so that they take the complements of their<br />
verbal complement. Past-participial verbs in turn need to be specified in such a way that<br />
the proclitic merge rule does not apply to them. This approach is reminiscent of the<br />
argument composition approach employed by Sag & Miller (1997) and Abeillé, Godard<br />
& Sag (1998). The axioms necessary to describe the “climbing” behavior in (2) are the<br />
following:<br />
⊢ aA, λvv<br />
: ((A \ Pcl) ⊸C (Nom ⊸SU Psp)) ⊸C ((A ∩ Pcl) ⊸PC (Nom ⊸SU Fin)),<br />
(Ind ⊃ (Ind ⊃ Prop)) ⊃ (Ind ⊃ (Ind ⊃ Prop))<br />
⊢ vu, λyλxsee ′ (x, y) : (Acc \ Pcl) ⊸C (Nom ⊸SU Psp), Ind ⊃ (Ind ⊃ Prop)<br />
The tense auxiliary form a (from avoir) is schematically defined to combine with a verb<br />
in past participial form missing its complement, of polymorphic type A, to yield a finite<br />
sentence missing both that same A complement and a nominative subject. In this way, the<br />
A-type complement is “passed along” from the past participle to the tense auxiliary, whose<br />
semantics are just to apply the identity function to the meaning of its past-participial<br />
complement.<br />
A proof term that correctly predicts the allowed form of (2b) is then possible: 6<br />
(8) ⊢ ( SU Marie ( PC le (aAcc vu C ))), see ′ (marie ′ , b) : Fin, Prop<br />
No proof is available for the disallowed form in (2a) because the lexical axiom vu only<br />
uses the ⊸C mode of implication, and as a result proclitics cannot directly combine with<br />
it.<br />
3.3 FPCs in Infinitival Constructions<br />
Ensuring that cliticized complements of infinitival complements stay on the infinitive-form<br />
verb, as depicted in (5), can also be accomplished with well-formulated lexical<br />
axioms. This ends up being simply a matter of making sure that infinitive-form verbs can<br />
take proclitic complements and the verbs that select infinitivals cannot:<br />
⊢ voir1 , λyλxsee ′ (x, y)<br />
: (Acc ∩ Pcl) ⊸PC (Nom ⊸SU Inf), Ind ⊃ (Ind ⊃ Prop)<br />
⊢ veut, λPλxwant ′ (x, P (x))<br />
: (Nom ⊸SU Inf) ⊸C (Nom ⊸SU Fin), (Ind ⊃ Prop) ⊃ (Ind ⊃ Prop)<br />
The semantic representation of veut given here is the “equi” version of the denotation<br />
λP∈Propλx∈Indwant ′ (x, P ) that might be used where veut takes a sentential complement,<br />
as in Marie veut qu’elle gagne ‘Marie wants that she wins’.<br />
With the lexicon so extended, a proof term for (5b) can be derived:<br />
(9) ⊢ ( SU Marie (veut ( PC le voir) C )), want ′ (marie ′ , see ′ (marie ′ , b)) : Fin, Prop<br />
A derivation for (5a) is not possible because veut does not employ the ⊸PC mode of<br />
combination required for FPCs.<br />
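The semantic term in (9) can be checked by β-reduction from the lexical semantics given for le, voir1, and veut. The step-by-step layout below is a reconstruction for illustration, not a derivation taken from the paper.

```latex
\begin{align*}
\textit{le voir} &:\ (\lambda y\,\lambda x.\,\mathrm{see}'(x,y))(b)
  \;\to_\beta\; \lambda x.\,\mathrm{see}'(x,b) \\
\textit{veut (le voir)} &:\ (\lambda P\,\lambda x.\,\mathrm{want}'(x,P(x)))(\lambda x.\,\mathrm{see}'(x,b))
  \;\to_\beta^{*}\; \lambda x.\,\mathrm{want}'(x,\mathrm{see}'(x,b)) \\
\textit{Marie veut le voir} &:\ (\lambda x.\,\mathrm{want}'(x,\mathrm{see}'(x,b)))(\mathrm{marie}')
  \;\to_\beta\; \mathrm{want}'(\mathrm{marie}',\mathrm{see}'(\mathrm{marie}',b))
\end{align*}
```

The “equi” effect is visible in the second line: the subject variable x of veut is passed to the infinitival predicate, so marie′ ends up as the subject of both want′ and see′.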
6 Note that the tectogrammatical proof term in (8) does not describe the phonological elision between le<br />
and a that occurs in French.<br />
3.4 FPCs and Non-auxiliary Composition<br />
Extending CVG to account for FPCs that combine with argument composition verbs other<br />
than auxiliaries, whose behavior is exemplified in (6), requires defining special lexical<br />
axioms for those verbs. Similar to the data examined so far, “non-local pronominal affixation”<br />
(in the terminology of Sag & Miller (1997)) is very short distance in nature, and as<br />
such employs the local implication ⊸PC that was introduced to handle procliticization.<br />
It is not necessary to invoke CVG’s hypothetical proof machinery for handling extraction<br />
phenomena to explain the data in (6).<br />
Here, a strategy is adopted of composing a predicative adjectival (for example, fidèle)<br />
or transitive verb (like connaît) with a version of its complement that is itself expecting<br />
a complement. The necessary extensions to the lexicon for the data in (6a) are the<br />
following: 7<br />
⊢ Pierre, pierre ′ : Nom, Ind<br />
⊢ lui2 , d : Dat ∩ 3Sg ∩ Pcl, Ind<br />
⊢ fidèle, λyλxfaithful ′ (x, y) : (Dat \ Pcl) ⊸C (Nom ⊸SU Adj),<br />
Ind ⊃ (Ind ⊃ Prop)<br />
⊢ reste, λPλyλxremain ′ (P (x, y))<br />
: ((Dat \ Pcl) ⊸C (Nom ⊸SU Adj)) ⊸C ((Dat ∩ Pcl) ⊸PC (Nom ⊸SU Fin)),<br />
(Ind ⊃ (Ind ⊃ Prop)) ⊃ (Ind ⊃ (Ind ⊃ Prop))<br />
These axioms describe fidèle as an adjective missing a dative complement to form an<br />
adjectival small clause and the form of rester that takes an adjectival complement that is<br />
itself missing its complement. These extensions permit a proof term for (6a-ii):<br />
(10) ⊢ ( SU Pierre ( PC lui2 (reste fidèle C ))), remain ′ (faithful ′ (pierre ′ , d)) : Fin, Prop<br />
(A full derivation of (10) is given in Figure 1 in the appendix.) With a few further extensions<br />
to the lexicon, (6b) can also be accounted for:<br />
⊢ connaît, λf λyλxknow ′ (x, f(y))<br />
: ((De \ Pcl) ⊸C Acc) ⊸C ((De ∩ Pcl) ⊸PC (Nom ⊸SU Fin)),<br />
(Ind ⊃ Ind) ⊃ (Ind ⊃ Prop)<br />
⊢ fin, end ′ : N, Ind<br />
⊢ la, λf λxf(x) : N ⊸SP ((De \ Pcl) ⊸C Acc), Ind ⊃ (Ind ⊃ Ind)<br />
⊢ en, e : De ∩ Pcl, Ind<br />
Here, connaît is formulated as just an ordinary transitive verb except that it selects an<br />
accusative complement that is itself missing its De complement. The definite article la<br />
is treated as a function from common nouns (type N) to possessive NPs (functions from<br />
canonical de-phrases to accusatives), using the specifier combinatory mode ⊸SP. The<br />
clitic en is represented as an axiom whose type is the intersection of De and Pcl. These<br />
axioms allow a proof term like the one in (10) for (6b-ii):<br />
7 This account assumes the analysis of predicatives given by Pollard (2006) pp. 52–65, for example, for<br />
adjectival small clauses of the type Nom ⊸SU Adj.<br />
(11) ⊢ ( SU Marie ( PC en (connaît (la fin SP ) C ))), know ′ (marie ′ , end ′ (e)) : Fin, Prop<br />
The lexical axioms introduced here predict that FPCs in non-auxiliary composition<br />
contexts behave in a way largely parallel to that of FPCs that combine with auxiliary<br />
verbs. The main difference between FPCs with auxiliaries and with non-auxiliaries is that<br />
the complement types for non-auxiliaries must be more constrained than the free-ranging<br />
polymorphic complement allowed by auxiliaries. Since this approach does not appeal<br />
to CVG’s unbounded dependency machinery, instead relying on axioms that specify the<br />
⊸PC local dependency, these instances of cliticization are guaranteed to remain short-distance.<br />
If FPCs in non-auxiliary composition contexts were construed as non-local<br />
extractions, it would be difficult to rule out constructions like (12), for example, which do<br />
not occur in French: 8<br />
(12) *Marie lui_i reste certaine que Céline a donné le livre _i.<br />
4 Conclusions and Future Work<br />
This paper sketches a proof-theoretic account of the behavior of FPCs as complements.<br />
For local cliticization, a new valence implication mode ⊸PC is introduced to differentiate<br />
procliticization from the canonical form of verbal complement selection. Combined with<br />
properly-formulated lexical axioms, this new mode can account for some of the behavior<br />
of FPCs, including the basic instances of cliticization, FPCs in infinitival constructions,<br />
and two forms of “clitic climbing” via an argument composition analysis.<br />
The analysis given here departs from traditional categorial analyses of cliticization<br />
by construing FPCs as special instances of NPs. An advantage of this approach is that<br />
a cliticized complement has the same semantics as its canonical counterpart and a nearly<br />
identical tectogrammatical form. This fact, in combination with the new ⊸PC mode<br />
of implication for FPC affixation, allows lexical axioms to more strictly constrain the<br />
behavior of FPCs in comparison to other types of verbal complements. This ability may<br />
be central to correctly predicting, for example, the distribution of souvent as shown in (4).<br />
This approach suffers, however, from the proliferation of lexical axioms that must occur<br />
since all verbs that take complements need at least two distinct representations in the<br />
lexicon. Such a requirement would have especially adverse implications for computational<br />
applications like parsing. Since very often, as with voit1 and voit2 , the canonical<br />
form of a verb closely resembles its cliticized variant, it is clear that a lexical rule associating<br />
these forms is crucial to the success of this type of approach. The instances of<br />
auxiliary and non-auxiliary composition presented here are also largely similar between<br />
cliticized and non-cliticized versions. A general account of FPCs in French along the<br />
lines of the analyses presented here must include a mapping between these similar forms<br />
that captures their common linguistic and information-structural characteristics.<br />
Future work on FPCs will aim to develop a correspondence between canonical and<br />
cliticized verb forms that predicts FPC behavior in a general way. This work will need to<br />
account for multiple clitic constructions, the rigid (and sometimes idiosyncratic) ordering<br />
of FPC clusters, agreement between FPCs and past participles, FPCs in passive, causative,<br />
and perceptual-verb constructions, and the enclitic attachment to imperative-form verbs<br />
in French.<br />
8 This example is due to Carl Pollard (personal communication, March 18, 2008).<br />
References<br />
Abeillé, A., Godard, D. and Miller, P. (1995). Causatifs et Verbes de Perception en<br />
Français, Actes du Deuxième Colloque Langues et Grammaire, Paris VIII, Saint<br />
Denis.<br />
Abeillé, A., Godard, D. and Sag, I. A. (1998). Two Kinds of Composition in French<br />
Complex Predicates, Syntax and Semantics: Complex Predicates in Nonderivational<br />
Syntax 30: 1–41.<br />
Amblard, M. (2006). Treating clitics with minimalist grammars, in S. Wintner (ed.),<br />
Proceedings of the Eleventh Conference on Formal Grammar, CSLI Publications,<br />
pp. 9–20.<br />
Bonami, O. and Boyé, G. (2005). French pronominal clitics and the design of Paradigm<br />
Function Morphology, in G. Booij, L. Ducceschi, B. Fradin, E. Guevara, A. Ralli<br />
and S. Scalise (eds), Proceedings of the Fifth Mediterranean Morphology Meeting,<br />
pp. 291–322.<br />
Kraak, E. (1998). A Deductive Account of French Object Clitics, Syntax and Semantics:<br />
Complex Predicates in Nonderivational Syntax 30: 271–312.<br />
Morrill, G. and Gavarro, A. (1992). Catalan Clitics, in A. Lecomte (ed.), Word Order in<br />
Categorial Grammar, Editions Adosa, Clermont-Ferrand, pp. 211–232.<br />
Pollard, C. (2006). Higher Order Grammar: A Tutorial. Unpublished ms., available at<br />
http://www.ling.osu.edu/∼hana/hog/pollard2006-synners.pdf.<br />
Pollard, C. (2007). Nonlocal dependencies via variable contexts, in R. Muskens (ed.),<br />
Proceedings of the Workshop on New Directions in Type-Theoretic Grammar. ESSLLI<br />
2007, Dublin.<br />
Sag, I. A. and Miller, P. H. (1997). French Clitic Movement without Clitics or Movement,<br />
Natural Language and Linguistic Theory 15(3): 573–639.<br />
Stabler, E. P. (2001). Recognizing Head Movement, LACL ’01: Proceedings of the 4th International<br />
Conference on Logical Aspects of Computational Linguistics, Springer-Verlag,<br />
London, UK, pp. 245–260.<br />
Appendix A: Full Derivation<br />
⊢ reste : ((Dat \ Pcl) ⊸C (Nom ⊸SU Adj)) ⊸C ((Dat ∩ Pcl) ⊸PC (Nom ⊸SU Fin)) ⊢ fidèle : (Dat \ Pcl) ⊸C (Nom ⊸SU Adj)<br />
⊢ (reste fidèle C ) : (Dat ∩ Pcl) ⊸PC (Nom ⊸SU Fin)<br />
⊢ ( PC lui2 (reste fidèle C )) : Nom ⊸SU Fin<br />
⊢ lui2 : Dat ∩ 3Sg ∩ Pcl<br />
⊢ Pierre : Nom<br />
⊢ ( SU Pierre ( PC lui2 (reste fidèle C ))) : Fin<br />
⊢ λPλyλxremain ′ (P (x, y)) : (Ind ⊃ (Ind ⊃ Prop)) ⊃ (Ind ⊃ (Ind ⊃ Prop)) ⊢ λyλxfaithful ′ (x, y) : Ind ⊃ (Ind ⊃ Prop)<br />
⊢ λyλxremain ′ (faithful ′ (x, y)) : Ind ⊃ (Ind ⊃ Prop)<br />
⊢ λxremain ′ (faithful ′ (x, d)) : Ind ⊃ Prop<br />
⊢ d : Ind<br />
⊢ pierre ′ : Ind<br />
⊢ remain ′ (faithful ′ (pierre ′ , d)) : Prop<br />
Figure 1: Full derivation of (10), with tecto-terms (above) and semantic terms (below) given separately for space considerations.
INFINITE GAMES<br />
FROM AN INTUITIONISTIC POINT OF VIEW<br />
Takako Nemoto<br />
Tohoku University<br />
Abstract. In this paper, we consider determinacy in Brouwerian intuitionistic mathematics.<br />
We give some examples of games such that the character of this mathematical setting (the<br />
lack of the law of excluded middle and the adoption of the continuity principle) makes the<br />
behavior of determinacy drastically different from that in the classical setting.<br />
1 Introduction<br />
Games on $\mathbb{N}^{\mathbb{N}}$ have been of great interest in mathematical logic for a long time. On one<br />
hand, determinacy of games has been used as a strong tool to investigate Baire space $\mathbb{N}^{\mathbb{N}}$<br />
or Cantor space $\{0, 1\}^{\mathbb{N}}$. On the other hand, as is well known, determinacy statements<br />
are quite sensitive to the mathematical setting: for example, with the axiom of choice,<br />
full determinacy is inconsistent, and determinacy of analytic games is beyond ZFC.<br />
The ultimate purpose of the author is to know how Baire space and Cantor space vary<br />
depending on settings other than the usual ones. As the first step toward this, she has been<br />
investigating the promising tool, determinacy, in these settings. Among these are subsystems<br />
of second order arithmetic, much weaker ones than ZFC (cf. (Nemoto, Ould MedSalem<br />
and Tanaka, 2007), (Nemoto, 2008)).<br />
This paper treats another setting, Brouwerian intuitionistic mathematics. It denies the<br />
law of excluded middle (LEM) and adopts the continuity principle, asserting that all<br />
functions from $\mathbb{N}^{\mathbb{N}}$ to $\mathbb{N}^{\mathbb{N}}$ or to $\mathbb{N}$ are continuous (for details, see Section 2). We give<br />
some examples of games which show that the continuity principle and the lack of LEM<br />
make the behavior of determinacy drastically different from that in the classical setting.<br />
To explicate the role of classical principles in determinacy, we also treat predeterminacy,<br />
a formalization of determinacy in intuitionistic mathematics, in classical<br />
mathematics.<br />
2 Axioms of the intuitionistic mathematics<br />
In this section, we clarify the mathematical setting of this paper.<br />
The logical constants have their constructive meanings and the rules of intuitionistic<br />
logic are employed. In particular, a disjunctive statement A∨B means that there exists a proof<br />
of A or one of B, and an existential statement ∃x ∈ V [A(x)] means that there exist an element<br />
a of V and a proof of A(a). A statement A is decidable if A ∨ ¬A holds. A set X ⊆ V<br />
is decidable if the statement a ∈ X is decidable for each a ∈ V .<br />
An infinite sequence α of natural numbers α(0), α(1), α(2), ... may be determined by<br />
some finitely described algorithm, i.e., the n-th element α(n) of α is the result of the<br />
algorithm for input n. Sometimes, however, such an infinite sequence may be constructed<br />
step by step by choosing its elements one by one. In this case, the construction of the<br />
sequence is never finished: At any point in time, only finitely many elements have been<br />
chosen, and so we can only know a finite part <strong>of</strong> <strong>the</strong> sequence.<br />
The latter construction is not permitted in constructive mathematics, and so this<br />
point divides intuitionistic mathematics from constructive mathematics.<br />
Note that every infinite sequence, even if it is given by an algorithm, can be regarded<br />
as the result of a step-by-step construction. This is the reason we do not distinguish infinite<br />
sequences of natural numbers by their manner of construction.<br />
Let $\mathbb{N}$ be the set of natural numbers. $X^{\mathbb{N}}$ is the set of infinite sequences from $X$.<br />
In particular $\mathbb{N}^{\mathbb{N}}$ is called Baire space and $2^{\mathbb{N}}$ is called Cantor space. $X^n$ is the set<br />
of sequences from $X$ of length $n$ and X
The strict fan theorem<br />
For a fan $S$ and a decidable bar $B$ in $S$, there is a bounded sub-bar $B' \subseteq B$ in $S$.<br />
While König’s lemma and the strict fan theorem are equivalent in classical mathematics,<br />
they are not in intuitionistic mathematics. Indeed, we can construct a so-called<br />
intuitionistic counterexample, i.e., a fan $T$ which has sequences of any finite length<br />
such that we cannot prove that $T$ has an infinite path, i.e., $\alpha : \mathbb{N} \to \mathbb{N}$ such that $\alpha n \in T$ for<br />
all $n$. Let $i^n \in \{0, 1\}^n$ be such that $i^n(k) = i$ for all $k < n$ and let $i^{\mathbb{N}} \in \{0, 1\}^{\mathbb{N}}$ be such<br />
that $i^{\mathbb{N}}(n) = i$ for all $n$. Define $T \subseteq \{i^n : i < 2, n \in \mathbb{N}\}$ by<br />
$0^n \in T \leftrightarrow$ there is no $k < n$ such that $p_{k+i} = 9$ for all $i < 99$, or the least such $k$ is even,<br />
$1^n \in T \leftrightarrow$ there is no $k < n$ such that $p_{k+i} = 9$ for all $i < 99$, or the least such $k$ is odd,<br />
where $p_k$ denotes the $k$-th digit of the decimal expansion of $\pi$. We can easily see that $T$<br />
is a fan which has sequences of any finite length and that if $T$ has an infinite path $\alpha$, then<br />
$\alpha = 0^{\mathbb{N}}$ or $\alpha = 1^{\mathbb{N}}$. Assume that $T$ has an infinite path $\alpha$. If $\alpha(0) = 0$ (resp. 1), then we<br />
must have a proof of the statement “if there is an uninterrupted run of 9s of length 99<br />
in the decimal expansion of $\pi$, the least such one starts at an even (resp. odd) digit.” Up<br />
to now, we do not have any proof of such statements, and so there is no infinite path in $T$.<br />
(If we have a proof in the future, we can find another so-called counterexample using another<br />
unsolved problem in a similar way.)<br />
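Membership of a constant sequence in T is decidable because only digits below position n + 98 are inspected. The sketch below makes this concrete; the function names, the 0-based indexing of digits, and the toy digit source used in place of π (with run length 2 instead of 99) are assumptions made for illustration only.

```python
def least_run_start(digits, n, run):
    """Least k < n with digits(k + i) == 9 for all i < run, or None if there is none."""
    for k in range(n):
        if all(digits(k + i) == 9 for i in range(run)):
            return k
    return None

def in_T(bit, n, digits, run=99):
    """Decide whether the constant sequence bit^n (bit is 0 or 1) belongs to T."""
    k = least_run_start(digits, n, run)
    if k is None:
        return True           # no run starts below n: both 0^n and 1^n are in T
    return k % 2 == bit       # otherwise only the parity of the least start matters

# Toy digit source standing in for the decimal expansion of pi (hypothetical data):
# a run of two 9s starts at position 3, which is odd.
toy_digits = lambda k: [1, 4, 1, 9, 9, 5][k] if k < 6 else 0

short_0 = in_T(0, 3, toy_digits, run=2)   # no run starts below 3
long_0 = in_T(0, 4, toy_digits, run=2)    # run found at odd position 3
long_1 = in_T(1, 4, toy_digits, run=2)
```

Each individual query terminates, so T is a decidable tree; what cannot be decided in advance is which of the two branches survives at every length.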
3 Determinacy in intuitionistic mathematics<br />
In this section, we introduce the notion of determinacy and its variants.<br />
For $A \subseteq \mathbb{N}^{\mathbb{N}}$, the game $G(A)$ in $\mathbb{N}^{\mathbb{N}}$ is defined as follows. Two players, called players<br />
I and II, starting with player I, alternately choose a natural number to construct $\alpha \in \mathbb{N}^{\mathbb{N}}$.<br />
Player I wins if and only if the resulting play $\alpha$ is in $A$. Player II wins if and only if player<br />
I does not win. A strategy for player I (resp. II) is a function which assigns a natural<br />
number to each even-(resp. odd-)length sequence in N
Veldman (2004) gave three formalizations of determinacy in intuitionistic mathematics.<br />
$G(A)$ is strongly determinate if, in $G(A)$, either player I or player II has a winning<br />
strategy. This is the simplest formalization, but almost no game is strongly determinate.<br />
$G(A)$ is determinate from the viewpoint of player I if the following holds: if for every strategy $\tau$ of player<br />
II there is $\alpha \in_{II} \tau$ with $\alpha \in A$, then player I has a winning strategy in $G(A)$. This<br />
statement corresponds to the classical statement “if player II has no winning strategy,<br />
then player I has one in $G(A)$,” which is classically equivalent to “$G(A)$ is determinate.”<br />
To describe the last, we need a new notion. An anti-strategy for player I in $G(A)$ is<br />
a function $\eta$ which assigns $\alpha \in_{II} \tau$ to each strategy $\tau$ for player II in $G(A)$. An anti-strategy<br />
$\eta$ for player I secures $A$ if, for any strategy $\tau$ for player II, $\eta(\tau) \in A$. $G(A)$<br />
is predeterminate from the viewpoint of player I if the following holds: if he has an anti-strategy securing $A$,<br />
then he has a winning strategy in $G(A)$.<br />
Note that $G(A)$ is predeterminate from the viewpoint of player I if $G(A)$ is determinate<br />
from his viewpoint.<br />
Moreover, in a game $G(X)$ in $\mathbb{N}^{\mathbb{N}}$ (or spread $[S]$), the second axiom of continuous<br />
choice yields the converse, i.e., predeterminacy implies determinacy: a strategy for<br />
a player can be regarded as a function from $\mathbb{N}$ to $\mathbb{N}$, and if there is $\alpha \in_{II} \tau$ with<br />
$\alpha \in X$ for every strategy $\tau$ for player II, then by the second axiom of continuous choice an<br />
anti-strategy for player I securing $X$ is given by a code $\eta$ of a continuous function.<br />
The intuitionistic determinacy theorem (Veldman, 2004, Theorem 3.5) If $[S]$ is a II-finitary<br />
branching spread, i.e., $S$ is a spread-law such that, for every odd-length $s \in S$,<br />
there are at most finitely many $n$ with $s * \langle n \rangle \in S$, then $G_{[S]}(A)$ is predeterminate from<br />
the viewpoint of player I for every $A \subseteq [S]$.<br />
In particular, if $A \subseteq \{0, 1\}^{\mathbb{N}}$, $G_{\{0,1\}^{\mathbb{N}}}(A)$ is predeterminate from the viewpoint of player<br />
I. Veldman (2004) also gave $A \subseteq \mathbb{N}^{\mathbb{N}}$ such that $G(A)$ is not predeterminate from the<br />
viewpoint of player I.<br />
Remark The notion of predeterminacy can be formalized from the viewpoint of player II<br />
and we can obtain similar results to the last theorem.<br />
4 Variations of games and predeterminacy<br />
In this section, we consider other variations of games in intuitionistic mathematics.<br />
For these games, we can define the three formalizations of determinacy in the same way.<br />
4.1 2-length games in $\{0, 1\}^{\mathbb{N}} \times \{0, 1\}$<br />
This subsection treats one of the simplest cases in which fewer strategies are allowed than<br />
in the classical context. $\{0, 1\}^{\mathbb{N}} \times \{0, 1\}$ denotes the product topological space of Cantor<br />
space and the discrete space $\{0, 1\}$.<br />
For given $A \subseteq \{0, 1\}^{\mathbb{N}} \times \{0, 1\}$, the game $G_1(A)$ is defined as follows:<br />
• Player I chooses $\alpha \in \{0, 1\}^{\mathbb{N}}$.<br />
• Player II chooses $i \in \{0, 1\}$.<br />
• Player I wins if $(\alpha, i) \in A$ and player II wins if player I does not win.<br />
Although $\{0, 1\}^{\mathbb{N}} \times \{0, 1\}$ is homeomorphic to Cantor space, we must be<br />
sensitive to the order type of the indexing set for the sequences.<br />
In this game, a strategy for player I is his initial move $\alpha$, and a strategy for player<br />
II is a function from $\{0, 1\}^{\mathbb{N}}$ to $\{0, 1\}$. The continuity principle forces all the strategies<br />
for player II to be continuous, and so we may regard a strategy $\tau$ for player II as a code<br />
of a continuous function such that $(\tau|\alpha)(0) \in \{0, 1\}$ for all $\alpha \in \{0, 1\}^{\mathbb{N}}$. B = {s ∈<br />
{0, 1} 0} is a decidable bar in the fan $\{0, 1\}^{\mathbb{N}}$. Then, by the strict fan theorem,<br />
there is a bounded sub-bar $B' \subseteq B$. Take $n = n_\tau$ such that $\mathrm{lh}(s) < n$ for every $s \in B'$.<br />
Then, $\{0, 1\}^n$ is also a bar in $\{0, 1\}^{\mathbb{N}}$, and, for every $\alpha, \beta \in \{0, 1\}^{\mathbb{N}}$, $\alpha n = \beta n$ implies<br />
$(\tau|\alpha)(0) = (\tau|\beta)(0)$. Thus we can regard $\tau$ as a function from $\{0, 1\}^{n_\tau}$ to $\{0, 1\}$, which can<br />
be coded by a natural number. Because an anti-strategy $\eta$ for player I is a function from<br />
the set of all strategies for player II to the set of plays in this game, it can be regarded as<br />
a function from $\mathbb{N}$ with the discrete topology to $\{0, 1\}^{\mathbb{N}} \times \{0, 1\}$.<br />
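One way to see that such a strategy is a finite object is to store its truth table as the bits of an integer. The encoding below is a hypothetical choice made for illustration (the paper does not fix one); it returns the pair (n, table), which a standard pairing function could fold into a single natural number.

```python
from itertools import product

def encode(n, tau):
    """Code a strategy tau : {0,1}^n -> {0,1} as the pair (n, truth-table integer)."""
    table = 0
    for idx, s in enumerate(product((0, 1), repeat=n)):
        if tau(s):
            table |= 1 << idx      # bit idx records tau's reply to the idx-th prefix
    return n, table

def decode(code):
    """Recover a strategy on n-bit prefixes from its code."""
    n, table = code
    def tau(s):
        # product() lists prefixes in lexicographic order, so the index of s
        # is s read as a binary numeral (the empty prefix has index 0)
        idx = int("".join(map(str, s)), 2) if n else 0
        return (table >> idx) & 1
    return tau

# Example strategy (hypothetical): reply with the XOR of the first and third bit.
xor_tau = lambda s: s[0] ^ s[2]
code = encode(3, xor_tau)
same = all(decode(code)(s) == xor_tau(s) for s in product((0, 1), repeat=3))
```

Since the codes range over a discrete set, an anti-strategy can be handled as an ordinary function on codes, with no continuity condition to check.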
The following examples show that even games for simple sets, such as open or closed sets, are<br />
not predeterminate from the viewpoint of player I.<br />
Example 1 An open game $G_1(A)$ which is not predeterminate from the viewpoint of<br />
player I: Define $A_i = \{0^n * \langle 1, i \rangle : n \in \mathbb{N}\}$ and $A = \{(\alpha, i) : \exists n[\alpha n \in A_i]\}$. Then $A$ is<br />
open. Let $\eta$ be the anti-strategy for player I which assigns $(0^{n_\tau} * \langle 1, \tau(0^{n_\tau}) \rangle * 0^{\mathbb{N}}, \tau(0^{n_\tau}))$ to<br />
each strategy $\tau$ for player II. Then $\eta(\tau) \in A$ for each strategy $\tau$ for player II, and so $\eta$ is<br />
an anti-strategy for player I securing $A$. On the other hand, it is clear that player I has no<br />
winning strategy in $G_1(A)$.<br />
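The anti-strategy of Example 1 can be played out concretely once a strategy for player II is given as a function on prefixes of a known length. The sketch below assumes that representation; the names `eta` and `in_A`, the lazy representation of α as a Python function, and the search bound are all hypothetical choices made for illustration.

```python
def eta(n_tau, tau):
    """Anti-strategy of Example 1 against tau, which reads only the first n_tau bits."""
    i = tau((0,) * n_tau)              # II's reply to every alpha that starts with 0^{n_tau}
    head = (0,) * n_tau + (1, i)       # 0^{n_tau} * <1, i>
    alpha = lambda k: head[k] if k < len(head) else 0   # ... continued by 0^N
    return alpha, i

def in_A(alpha, i, bound):
    """Look for an initial segment of alpha of the form 0^m * <1, i> with m < bound."""
    for m in range(bound):
        if alpha(m) == 1:              # the first 1 fixes the only candidate segment
            return alpha(m + 1) == i
    return False

# A sample strategy for player II (hypothetical): parity of the first three bits.
parity = lambda s: sum(s) % 2
alpha, i = eta(3, parity)
reply = parity(tuple(alpha(k) for k in range(3)))   # what II actually answers to alpha
won = in_A(alpha, i, bound=10)
```

Player I's sequence is built only after inspecting τ, which is exactly why η secures A without yielding a single winning move α that works against every τ.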
Example 2 A closed game $G_1(B)$ which is not predeterminate from the viewpoint of<br />
player I: Let $T$ be an intuitionistic counterexample to König’s lemma, i.e., an unbounded<br />
binary tree without infinite paths. Let $T_i = \{t * i^n \mid t \in T \wedge n \in \mathbb{N}\}$. Then $B =<br />
\{(\alpha, i) \mid \forall n[\alpha n \in T_i]\}$ is a closed set. If player I had a winning strategy $\alpha$ in $G_1(B)$, $\alpha$<br />
would be an infinite path of $T$. Thus player I cannot have a winning strategy in $G_1(B)$.<br />
On the other hand, player I has an anti-strategy securing $B$. Fix an enumeration of $T$ and<br />
let $t_n$ be the minimum $s \in T$ such that $\mathrm{lh}(s) = n$ with respect to this enumeration. Let<br />
$\eta$ be the anti-strategy for player I which assigns $(t_n * (\tau(t_n))^{\mathbb{N}}, \tau(t_n))$ to each strategy<br />
$\tau : \{0, 1\}^n \to \{0, 1\}$ for player II. Clearly $\eta$ secures $B$.<br />
4.2 ω + 1 length games in {0, 1} N × {0, 1}<br />
In this subsection, we consider another kind of game in {0, 1}^N × {0, 1}.<br />
For given A ⊆ {0, 1} N × {0, 1}, <strong>the</strong> game G2(A) is defined as follows.<br />
• Player I and player II alternately choose i ∈ {0, 1} to form α ∈ {0, 1} N .<br />
• After α is formed, player I chooses i ∈ {0, 1}.<br />
• Player I wins G2(A) if and only if (α, i) ∈ A.<br />
In this game, a strategy σ for player I is a pair (σ0, σ1) of functions σ0 : ⋃_{n∈N} {0, 1}^{2n} →<br />
{0, 1} and σ1 : {0, 1}^N → {0, 1}. By the strict fan theorem, as in<br />
the last subsection, we can regard σ1 as a function from {0, 1}^n to {0, 1} for some n ∈ N.<br />
A strategy for player II is a function τ : ⋃_{n∈N} {0, 1}^{2n+1} → {0, 1}, which can be<br />
regarded as an element of {0, 1}^N. Then an anti-strategy η for player I is a function from<br />
{0, 1}^N to {0, 1}^N × {0, 1}, which can be regarded as a pair (η0, η1) of codes of continuous<br />
functions such that, for any strategy τ for player II, (η0|τ, (η1|τ)(0)) ∈II τ. By the<br />
strict fan theorem, there is n such that for any strategies τ and τ ′, τn = τ ′n implies<br />
(η1|τ)(0) = (η1|τ ′)(0), and so we can regard η1 as a function from {0, 1}^n to {0, 1}.<br />
Theorem 1 For any C ⊆ {0, 1} N × {0, 1}, G2(C) is predeterminate from <strong>the</strong> viewpoint<br />
<strong>of</strong> player I.<br />
Proof. For i ∈ {0, 1}, set Ci = {α : (α, i) ∈ C}. Assume that η = (η0, η1) is an<br />
anti-strategy for player I securing C and that η1 can be regarded as a function from {0, 1}^n<br />
to {0, 1} for some n. Note that, in G_{{0,1}^N}(C0 ∪ C1), η0 is an anti-strategy for player I<br />
securing C0 ∪ C1. Let σ0 be a winning strategy for player I constructed in the proof of<br />
the intuitionistic determinacy theorem in G_{{0,1}^N}(C0 ∪ C1). Set Pσ0 = {α : α ∈I σ0}.<br />
Note that Pσ0 is a spread. By the proof of the intuitionistic determinacy theorem, for any<br />
α ∈ Pσ0, there exists a strategy δ for player II with η0|δ = α. By the second axiom of<br />
continuous choice, there exists a code ζ of a continuous function such that, for any<br />
α ∈ Pσ0, ζ|α is a strategy for player II with η0|(ζ|α) = α. By the strict fan theorem, there<br />
exists a natural number N such that, for any α and β in Pσ0, αN = βN implies (ζ|α)n =<br />
(ζ|β)n. Then we can define σ1 : Pσ0 → {0, 1} by σ1(α) = η1((ζ|α)n), since σ1(α) is<br />
determined by αN. Define a new strategy σ = (σ0, σ1) for player I in G2(C). Then, for<br />
any (α, i) ∈I σ, the strategy δ = ζ|α for player II satisfies (α, i) = (η0|δ, (η1|δ)(0)), and so<br />
σ is a winning strategy for player I in G2(C). □<br />
Comparing this theorem with the examples in the last subsection, we can conclude that<br />
predeterminacy depends on how the players construct the sequence rather than on which sequence<br />
they construct.<br />
4.3 ω + 2-length game in {0, 1} N × {0, 1} 2<br />
Next we consider slightly longer games.<br />
For a given set A ⊆ {0, 1} N × {0, 1} 2 , consider <strong>the</strong> following game G3(A).<br />
• First, player I and player II alternately choose n ∈ {0, 1} to form α ∈ {0, 1} N .<br />
• After α is formed, player I chooses i ∈ {0, 1} and player II chooses j ∈ {0, 1}.<br />
• Player I wins if (α, 〈i, j〉) ∈ A and player II wins if player I does not win.<br />
Similarly to the previous subsection, a strategy σ for player I is a pair (σ0, σ1), where<br />
σ0 is a function from ⋃_{n∈N} {0, 1}^{2n} to {0, 1} and where σ1 is a function from {0, 1}^N to {0, 1}.<br />
We can regard σ1 as a function from {0, 1}^n to {0, 1} for some n ∈ N.<br />
A strategy τ for player II is a pair (τ0, τ1), where τ0 is a function from ⋃_{n∈N} {0, 1}^{2n+1}<br />
to {0, 1} and where τ1 is a function from {0, 1}^N × {0, 1} to {0, 1}. Note that since τ1 is<br />
continuous, its restriction τ1,i to {0, 1}^N × {i} is also continuous, and so we can regard τ1<br />
as a pair (τ10, τ11) of functions from {0, 1}^{ni} to {0, 1} for suitable ni (i ∈ {0, 1}).<br />
Hence, the set of strategies for player II can be regarded as {0, 1}^N × N, and so an anti-strategy<br />
for player I can be regarded as a function η from {0, 1}^N × N to {0, 1}^N × {0, 1}^2<br />
such that η(τ) ∈II τ for each strategy τ for player II.<br />
As in <strong>the</strong> case <strong>of</strong> G1(X), we have <strong>the</strong> following examples. For any s ∈ {0, 1}
Example 3 Recall Ai defined in Example 1. Then <strong>the</strong> open game G3(A ′ ) defined by<br />
A ′ = {(α, 〈i, j〉) : ∃n[(αn) ′ ∈ Aj]} is not predeterminate from <strong>the</strong> viewpoint <strong>of</strong> player I.<br />
Example 4 Recall Ti defined in Example 2. Then <strong>the</strong> closed game G3(B ′ ) defined by<br />
B ′ = {(α, 〈i, j〉) : ∀n(αn) ′ ∈ Tj} is not predeterminate from <strong>the</strong> viewpoint <strong>of</strong> player I.<br />
5 Predeterminacy in classical mathematics<br />
In this section, we consider predeterminacy in classical mathematics in order to investigate<br />
the role of classical principles in predeterminacy. Note that all the definitions<br />
and statements in this section are made in classical mathematics, which includes the<br />
countable axiom of choice.<br />
Recall that, in intuitionistic mathematics, an anti-strategy is a function η such that<br />
η(τ) ∈II τ for each strategy τ for player II. We translate this definition into classical<br />
mathematics, bearing in mind that every function on N^N is continuous in intuitionistic<br />
mathematics:<br />
Let G(X) be any of the games treated in the previous sections. An anti-strategy for player<br />
I in G(X) is a continuous function which assigns some α ∈II τ to every continuous strategy<br />
τ for player II in G(X). An anti-strategy η for player I in G(X) secures X if η(τ) ∈ X<br />
for all continuous strategies τ for player II. G(X) is predeterminate from the viewpoint<br />
of player I if the following holds:<br />
if player I has an anti-strategy η securing X, then player I has a winning<br />
strategy in G(X).<br />
Note that the ordinary determinacy statement can be read as “if there is a<br />
function η such that η(τ) ∈II τ and η(τ) ∈ X for all strategies τ for player II, then player<br />
I has a winning strategy in G(X).”<br />
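For comparison, the two statements can be written side by side. This is only a restatement of the definitions above, under the assumption that "σ is a winning strategy for player I" means that every play α ∈I σ lies in X; the only difference is the continuity requirement on η and τ in the first line.<br />

```latex
\begin{align*}
\text{Predeterminacy:}\quad
 & \bigl(\exists \eta\ \text{continuous}\ \forall \tau\ \text{continuous}\;
   [\eta(\tau) \in_{II} \tau \wedge \eta(\tau) \in X]\bigr)
   \rightarrow \exists \sigma\, \forall \alpha\,
   [\alpha \in_{I} \sigma \rightarrow \alpha \in X] \\
\text{Determinacy:}\quad
 & \bigl(\exists \eta\ \forall \tau\;
   [\eta(\tau) \in_{II} \tau \wedge \eta(\tau) \in X]\bigr)
   \rightarrow \exists \sigma\, \forall \alpha\,
   [\alpha \in_{I} \sigma \rightarrow \alpha \in X]
\end{align*}
```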
For X ⊆ N^N, strategies for the players in the game G(X) can be regarded as functions from N<br />
to N, and so all the strategies are continuous. Therefore the condition “continuous” on<br />
strategies has no effect in the games G(X), but it does in the games G1(X), G2(X) and G3(X).<br />
Moreover, the continuity in the definition of anti-strategy is essential in the following<br />
discussion.<br />
As mentioned in (Veldman, 2004, 1.1), the intuitionistic determinacy theorem also holds<br />
in classical mathematics. In particular, for all A ⊆ {0, 1}^N, G_{{0,1}^N}(A) is predeterminate<br />
from the viewpoint of player I in classical mathematics.<br />
Now we consider, in classical mathematics, the predeterminacy of the games G1(X), G2(X) and G3(X)<br />
defined in the previous section. Due to König’s lemma, the<br />
classical counterpart of the strict fan theorem, a continuous<br />
function {0, 1}^N → {0, 1} or {0, 1}^N → {0, 1}^N is, also in classical mathematics, given by its code η as defined<br />
in Section 2. In particular, a strategy for player II in G1(A) can be seen as a function<br />
τ : {0, 1}^n → {0, 1} for some n, and an anti-strategy for player I in G2(A) can be seen as<br />
a pair (η0, η1) of a code η0 of a continuous function and η1 : {0, 1}^m → {0, 1} for some m.<br />
The game G1(A) is not predeterminate from the viewpoint of player I, where A is<br />
the set defined in Example 1. For closed games, the situation differs: whereas<br />
Example 2 is a closed game which is not predeterminate from the viewpoint of player I<br />
in intuitionistic mathematics, we will show that there is no such closed game in<br />
classical mathematics.<br />
For X ⊆ {0, 1} N × {0, 1} and s ∈ {0, 1}
6 Further problems<br />
Predeterminacy of closed games G3(X) in classical mathematics The first problem<br />
the author is interested in is whether the closed games G3(X) are predeterminate<br />
in classical mathematics. It is expected to be settled by analyzing the properties of continuous<br />
functions on Cantor space.<br />
Classical investigation of predeterminacy We can consider various formalizations of<br />
predeterminacy in classical mathematics other than the one defined in Section 5, e.g.,<br />
If player I has an anti-strategy η such that η(τ) ∈ A for each continuous strategy<br />
τ for player II, then player I has a continuous winning strategy in G(A).<br />
Note that the italicized part is newly added. Again, for games G(X) in N^N, this modification<br />
has no effect. However, we can easily find X ⊆ {0, 1}^N which is not predeterminate in<br />
this sense but which is predeterminate in the sense of Section 5. The author expects that<br />
investigating these variations will clarify how continuity constrains functions on Baire<br />
space or Cantor space.<br />
Constructive reverse mathematical analysis of predeterminacy Constructive reverse<br />
mathematics measures the strength of mathematical statements in terms of nonconstructive<br />
principles, using constructive mathematics as a base theory. Constructive mathematics<br />
is mathematics which is based on intuitionistic logic, but which does not<br />
adopt the axioms introduced in Section 2. Therefore it is included both in classical mathematics<br />
and in intuitionistic mathematics.<br />
(1) The role of the second axiom of continuous choice for predeterminacy Under<br />
the second axiom of continuous choice, predeterminacy implies determinacy. This implication<br />
needs only a fragment of the second axiom of continuous choice, and it is natural<br />
to ask exactly how strong a fragment is required. If we measure the strength of fragments<br />
by the complexity of R in the axiom, the difficulty lies in the reduction of general formulas<br />
of the form ∀α∃βR(α, β) to the form ∀τ∃σ∀α(α ∈I σ ∧ α ∈II τ → R ′(α)).<br />
(2) Equivalences between predeterminacy and intuitionistic axioms (Veldman,<br />
200x) proposed intuitionistic second order arithmetic and proved that the predeterminacy<br />
of open subsets of II-finitary branching spreads in N is equivalent to the strict fan theorem<br />
over the system BIM, which corresponds to RCA0, a popular classical base theory in the field<br />
called Friedman-Simpson’s reverse mathematics (cf. (Simpson, 1999)). The author of the<br />
present paper is now looking for similar equivalences beyond open sets. The first task in<br />
this direction is to find a suitable intuitionistic axiom to compare with. One candidate<br />
is the almost-fan theorem proposed in (Veldman, 2001).<br />
(3) The role of LEM for predeterminacy In the proof of Theorem 2, we use the law<br />
of excluded middle. It seems impossible to prove it without this classical law, because<br />
of the counterexample B of Example 2 in intuitionistic mathematics. The next natural question<br />
is what fragment of the classical law (such as the excluded middle or double negation<br />
elimination) is necessary and sufficient for determinacy or predeterminacy statements.<br />
(Akama, Berardi, Hayashi and Kohlenbach, 2004) discovered a hierarchy consisting of<br />
these fragments over Heyting arithmetic HA, which is the constructive counterpart to<br />
Peano arithmetic. The author of the present paper is trying to measure predeterminacy or determinacy<br />
statements along this hierarchy.<br />
(4) Equivalences between predeterminacy and classical axioms Since we treat<br />
predeterminacy also in classical mathematics, it is natural to consider a Friedman-Simpson-style<br />
reverse mathematical study of predeterminacy. Using constructive mathematics<br />
as a base theory, we can make a finer reverse mathematical study of predeterminacy.<br />
Acknowledgements<br />
Some parts of this paper were done as the final assignment of the Master Class 2006/2007 in<br />
logic at the Mathematical Research Institute, the Netherlands. The author would like to express<br />
her gratitude to her supervisor, Dr. Wim Veldman, who introduced her to the appeal<br />
of intuitionistic mathematics.<br />
References<br />
Akama, Y., Berardi, S., Hayashi, S. and Kohlenbach, U. (2004). An arithmetical hierarchy<br />
of the law of excluded middle and related principles, in H. Ganzinger (ed.),<br />
Proceedings of the Nineteenth Annual IEEE Symp. on Logic in Computer Science,<br />
LICS 2004, IEEE Computer Society Press, pp. 192–201.<br />
Nemoto, T. (2008). Determinacy of Wadge classes and subsystems of second order arithmetic.<br />
Accepted for publication in Math. Log. Q., available at<br />
http://www.math.tohoku.ac.jp/~sa4m20/wadge.pdf.<br />
Nemoto, T., Ould MedSalem, M. and Tanaka, K. (2007). Infinite games in the Cantor<br />
space and subsystems of second order arithmetic, Math. Log. Q. 53: 226–236.<br />
Simpson, S. G. (1999). Subsystems of second order arithmetic, Springer.<br />
Veldman, W. (2001). Almost the fan theorem, Technical report, Department of Mathematics,<br />
University of Nijmegen.<br />
Veldman, W. (2004). The problem of the determinacy of infinite games from an intuitionistic<br />
point of view, Technical report, Department of Mathematics, University of<br />
Nijmegen. To appear in the proceedings of Logic, Games and Philosophy: Foundational<br />
Perspectives, Prague 2004.<br />
Veldman, W. (200x). Brouwer’s fan theorem as an axiom and as a contrast to Kleene’s<br />
alternative. Preprint.<br />
LANGUAGE TECHNOLOGIES FOR INSTRUCTIONAL RESOURCES IN<br />
BULGARIAN<br />
Ivelina Nikolova<br />
University <strong>of</strong> S<strong>of</strong>ia<br />
Abstract. This paper describes a system that applies language technologies to instructional<br />
materials in order to provide computer-aided design of test items. The approach<br />
employs lexical and syntactic information obtained with techniques such as POS tagging,<br />
constituency parsing and term extraction. The system compiles a list of central terms<br />
for the instructional materials, creates drafts of fill-in-the-blank questions and suggests possible<br />
distractors. The experiment is carried out on Bulgarian high-school textbooks in geography,<br />
biology and history.<br />
1 Introduction and related work<br />
Asking questions is a way to keep students' attention in class and verify their understanding.<br />
Depending on the type of education and the goal of the teacher, questions can<br />
be asked in different forms: orally, as a short written examination, as a game,<br />
etc. One common technique is asking multiple-choice questions, which have become<br />
even more popular in recent years because they are also applicable to e-learning.<br />
However, designing thousands of tests is a time- and effort-consuming educational activity.<br />
All questions in the test should be carefully tuned for the target group of test-takers<br />
and should neither underestimate nor overestimate their knowledge. Hence the teaching experts<br />
who prepare the tests must have much broader knowledge of the field than<br />
the content which is explicitly included in the particular textbook, and they have to tune<br />
the tests to the knowledge of the test-takers. One of the most difficult tasks in producing<br />
test items is to decide whether a question really has its answer in the instructional<br />
materials.<br />
These difficulties gave rise to a relatively new research area dealing with support for<br />
the generation of test items and for answer and distractor suggestion. Generation of multiple-choice<br />
questions with the help of NLP technologies is an active area in which different tools for<br />
text processing are used to transform the facts from the instructional materials<br />
into questions which can be used for student assessment. One of the most interesting approaches<br />
in this respect is presented by (Mitkov, Ha and Karamis, 2006), who apply<br />
language technologies (LT) to the generation of test items for English, focusing on the automatic<br />
choice of distractors. They report that test development is sped up<br />
about 6-10 times compared to manual test elicitation. Their approach is not domain<br />
specific and can be applied to any area. Other authors actively working in the area are<br />
(Aldabe, De Lacalle and Maritxalar, 2007), who focus on the different types of<br />
question models, with application primarily in language learning. We are not familiar<br />
with any related work concerning this activity for learning materials in Bulgarian except<br />
for the previous work of the author (Nikolova, 2007). Our efforts are thus strongly inspired<br />
by the growing interest in this field, which is due to its significant practical importance.<br />
On the other hand, we are motivated and encouraged by the presence of sophisticated<br />
LT for the Bulgarian language, which enable relatively complex text preprocessing, so the<br />
automatic acquisition of learning objects from raw texts does not start from scratch.<br />
This article presents the idea of the author's master's thesis, which is still work in<br />
progress. The aim is to develop a workbench that supports test designers with language technologies<br />
applied to the instructional materials. The task has three aspects: (1) suggestion<br />
of key terms for (2) question generation and (3) distractor suggestion. For our purpose<br />
the text is preprocessed by a number of previously available LT modules, and lexical and<br />
syntactic features are extracted and kept in meta-data format. Those features are used<br />
later on for the generation of the draft learning objects. The experiment described in the<br />
article has been applied to three different domain areas: Geography, Biology and History.<br />
The materials are taken from textbooks for the 9th, 10th and 11th grade respectively.<br />
The remaining part <strong>of</strong> this article is organised as follows: we first sketch <strong>the</strong> general<br />
architecture <strong>of</strong> <strong>the</strong> system in section 2; in section 3 we describe <strong>the</strong> data processing;<br />
section 4 explains in detail <strong>the</strong> experiment done so far; section 5 concerns <strong>the</strong> evaluation<br />
at <strong>the</strong> current stage <strong>of</strong> <strong>the</strong> experiment; section 6 presents <strong>the</strong> conclusion and issues for<br />
future work.<br />
Figure 1: Workbench supporting <strong>the</strong> development <strong>of</strong> multiple-choice test items.<br />
2 Workbench description<br />
The system suggests draft learning objects to the test designers in order to help them during<br />
test item preparation. As shown in Fig. 1, the instructional materials are supplied<br />
by the test maker. They are preprocessed and two main data sets are created: (a) a<br />
list of key terms (terms central to the supplied text), whose construction is<br />
explained in section 4.1, and (b) lexical and syntactic information about the supplied<br />
text, which is kept in metadata format. The user may then obtain all possible questions<br />
generated from the supplied material or only those related to a certain key term she is interested<br />
in. If the system does not find appropriate sentences that contain the term and<br />
match its internal question templates (explained in section 4.2), it returns a list of<br />
pointers to the text, containing the local context in which the term appears, and a list of<br />
related concepts, generated by the same model as the distractors.<br />
3 Data processing<br />
Our task is to support test makers in building educational resources,<br />
namely test questions and a vocabulary of important concepts for the domain. We do this<br />
by applying language technologies to the raw instructional materials to obtain linguistic<br />
resources, which are loaded into a workbench that helps the test designers in their<br />
work. The processing passes through several phases, as shown in Fig. 2.<br />
Figure 2: Data processing.<br />
The instructional material is taken in plain text format and is first parsed with<br />
an NP extractor, which yields nouns and noun phrases that form a list of<br />
potential key terms to be suggested to the test designers. At the same time<br />
as the extracted terms are marked, an inverted index is produced. It contains a list<br />
of the extracted NPs (nouns and noun phrases) and their corresponding absolute positions<br />
in the text. A threshold for the importance of the extracted terms is set and all NPs with<br />
frequency higher than the threshold are included in the list of key terms. In addition, all<br />
the NPs that contain a noun which is a key term are also included in the key terms list.<br />
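The selection rule described above can be sketched as follows. This is a minimal illustration under assumptions, not the system's actual code: the NP extractor output is modelled as a list of (lemmatized term, position) pairs, single-word terms are treated as nouns, and the function names build_inverted_index and select_key_terms are hypothetical.<br />

```python
from collections import defaultdict

def build_inverted_index(occurrences):
    """Map each extracted NP to the list of its absolute positions in the text."""
    index = defaultdict(list)
    for term, position in occurrences:
        index[term].append(position)
    return dict(index)

def select_key_terms(index, threshold):
    """Apply the two inclusion rules: frequency above threshold, or contains a key noun."""
    # Rule 1: every NP whose frequency is higher than the threshold.
    frequent = {t for t, positions in index.items() if len(positions) > threshold}
    # Assumption: a single-word key term is a noun.
    key_nouns = {t for t in frequent if " " not in t}
    # Rule 2: every NP that contains a noun which is itself a key term.
    containing = {t for t in index if any(w in key_nouns for w in t.split())}
    return frequent | containing
```

With the occurrences ("river", 0), ("river", 10), ("river", 25), ("river basin", 40), ("climate", 55) and a threshold of 2, the noun "river" passes the frequency rule and "river basin" is added by the second rule, while "climate" is left out.<br />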
During the next phase the raw text is tagged for POS categories. We found it<br />
practical to use the SVMTool by (Gimenéz and Márquez, 2004), which was trained<br />
on the newspaper part of BulTreeBank 1 . The proper names recognised by the tagger<br />
were added to the list of key terms, and then the output was processed with the multilingual<br />
statistical parsing engine of Dan Bikel (Bikel, 2004), which is an implementation<br />
and extension of the Collins parser, referred to below as (Collins, 1999). The parsing model<br />
was trained on BulTreeBank. All the syntactic and lexical information obtained in these<br />
phases is kept in meta-format and used later to produce draft learning objects<br />
(key terms, test items), which are suggested to the test designers.<br />
1 HPSG-based Syntactic Treebank of Bulgarian (BulTreeBank), http://bultreebank.org/<br />
4 The experiment<br />
4.1 Key terms suggestion<br />
We build our approach on the understanding that questions given to the learner concern<br />
terms that are central to the domain. These terms serve as a basis<br />
for the learned material and represent a specific domain vocabulary. Here they<br />
are referred to as key terms. Although verbs might also qualify as good key terms<br />
in some domains, in this experiment we pay attention only to nouns and noun phrases<br />
as potential key terms. They were extracted by the classic frequency-based approach to automatic term<br />
extraction. To overcome the problem of inflection in<br />
the language, the raw texts were first lemmatized and then parsed with the NP extractor<br />
Morena. Once we obtained a list of nouns (LN) and a list of noun phrases (LNP), we had to<br />
rank them in order to extract only the most important ones, which are the focus of our<br />
approach and of the users' queries. We applied two different techniques for measuring term<br />
importance over LN: simple frequency counting and TF-IDF weighting. Like<br />
Mitkov et al. (2006), we noticed that TF-IDF produces worse results, as it tends to<br />
give low score to frequently used words (for example ���������� - economy) which are<br />
actually quite important in <strong>the</strong> case <strong>of</strong> instructional materials (it is common to repeat <strong>the</strong><br />
same information to <strong>the</strong> learners in order to force <strong>the</strong>m to better remember it). At <strong>the</strong><br />
same time sorting <strong>the</strong> list <strong>of</strong> nouns by <strong>the</strong>ir frequencies, after removing <strong>the</strong> stop words,<br />
gave us quite satisfying results.<br />
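The contrast between the two rankings can be illustrated with a minimal sketch. The toy documents, the stop-word list and the function names below are assumptions made for illustration; they are not the data or the code used in the experiment.<br />

```python
import math
from collections import Counter

# Hypothetical stop-word list and toy lemmatized documents (illustration only).
STOP_WORDS = {"the", "of", "and", "in", "is", "a"}

def frequency_ranking(docs):
    """Rank lemmas by raw corpus frequency, ignoring stop words."""
    counts = Counter(w for doc in docs for w in doc if w not in STOP_WORDS)
    return counts.most_common()

def tf_idf_scores(docs):
    """Score each lemma as total term frequency * log(N / document frequency)."""
    n_docs = len(docs)
    tf = Counter(w for doc in docs for w in doc if w not in STOP_WORDS)
    df = Counter(w for doc in docs for w in set(doc) if w not in STOP_WORDS)
    return {w: tf[w] * math.log(n_docs / df[w]) for w in tf}

docs = [
    ["the", "economy", "of", "japan", "is", "a", "market", "economy"],
    ["the", "world", "economy", "and", "trade"],
    ["rural", "economy", "in", "the", "region"],
]
ranking = frequency_ranking(docs)   # "economy" comes first
scores = tf_idf_scores(docs)        # "economy" occurs in every document
```

In this toy corpus the term that occurs in every document ranks first by frequency, while its IDF factor is log(3/3), so its TF-IDF score falls below that of terms occurring only once.<br />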
Word frequency f_i    Number of words w_f with frequency f_i<br />
(upper part of the table: w_f ≤ f_i)<br />
55    1<br />
46    1<br />
22    6<br />
20    1<br />
18    1<br />
16    1<br />
14    1<br />
12    5<br />
10    5<br />
8    6<br />
(lower part of the table: w_f ≥ f_i)<br />
6    8<br />
4    44<br />
2    174<br />
Table 1: Word frequency distribution in a text of about 1000 words.<br />
To set the threshold between important and less important terms, in previous experiments<br />
we examined existing test items, prepared manually by the test designers,<br />
that cover the same material as the corpora we are processing. The test items were parsed<br />
with an NP extractor. We checked the popularity of the NPs extracted from the test items<br />
in the whole corpus, and the lowest popularity was accepted as a threshold. After repeating<br />
the same procedure for different domain corpora, we noticed that the importance border<br />
lies near the term frequency that equals the number of words having that frequency. For<br />
example, in a comparatively short text we have the distribution shown in Table 1, where the<br />
threshold is set to frequency f = 7.<br />
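The crossover point can be located mechanically. The sketch below uses the values of Table 1; the function name and the rule of taking the midpoint between the two neighbouring frequencies are assumptions made for illustration, not a description of the implemented system.<br />

```python
# Table 1 as (word frequency f_i, number of words w_f with that frequency).
TABLE_1 = [(55, 1), (46, 1), (22, 6), (20, 1), (18, 1), (16, 1), (14, 1),
           (12, 5), (10, 5), (8, 6), (6, 8), (4, 44), (2, 174)]

def crossover_threshold(distribution):
    """Find where w_f stops being <= f_i when scanning from high to low frequency.

    Returns the midpoint between the last frequency with w_f <= f_i and the
    first lower frequency with w_f > f_i (the midpoint rule is an assumption),
    or None if the distribution never crosses over.
    """
    rows = sorted(distribution, reverse=True)
    for (f_hi, w_hi), (f_lo, w_lo) in zip(rows, rows[1:]):
        if w_hi <= f_hi and w_lo > f_lo:
            return (f_hi + f_lo) / 2
    return None

threshold = crossover_threshold(TABLE_1)
```

For Table 1 the crossover lies between the rows for frequencies 8 and 6, which agrees with the threshold f = 7 reported above.<br />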
Once the threshold is set, we consider all the terms above it as key terms, which<br />
should be suggested to the test-makers. We then add all NPs that contain key terms<br />
to the list of key terms. For example, along with the term (economy) from the materials<br />
in geography we add the following NPs:<br />
������� ���������� (rural economy),<br />
�������� ���������� (world economy),<br />
���������� ���������� (national economy),<br />
������� ���������� (market economy),<br />
���������� ������� ���������� (national market economy),<br />
���������� �������� ���������� (contemporary world economics),<br />
������� ���������� (Japanese economy),<br />
��������� ���������� (natural economy),<br />
���������� ������� ������� ���������� (contemporary modern rural economy)<br />
Removing <strong>the</strong> NPs containing stop words prevented <strong>the</strong> use <strong>of</strong> phrases like �������<br />
���������� (<strong>the</strong>ir economy). After <strong>the</strong> POS tagging <strong>the</strong> recognised proper nouns were<br />
also added to <strong>the</strong> list <strong>of</strong> key terms and <strong>the</strong> final list <strong>of</strong> key terms was formed.<br />
4.2 Question generation<br />
To select clauses that are appropriate for question generation, a module processes<br />
the lexico-syntactic information collected during the preprocessing phase and decides<br />
that a clause is eligible if:<br />
(1) it contains at least one key term,<br />
(2) <strong>the</strong> term is in a NPA clause <strong>of</strong> its VPS 2 (<strong>the</strong> NPA clause is <strong>the</strong> subject daughter <strong>of</strong><br />
VPS phrase) and<br />
(3) <strong>the</strong> clause is finite.<br />
If all three conditions hold, we consider that the term is in the subject phrase of<br />
the sentence, which means that it is central to the meaning of the sentence, and we apply a<br />
rule which replaces the focal term with a blank. The system additionally checks that<br />
the sentences do not refer to figures, tables or appendixes.<br />
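The eligibility test and the blanking rule can be sketched over a bracketed parse as follows. This is a minimal illustration under stated assumptions: the English toy tree, the function names and the `is_finite` flag (finiteness is taken as given rather than computed) are hypothetical, and only single-word key terms are handled.<br />

```python
def parse(bracketed):
    """Parse a bracketed tree such as '(S (NPA (N cat)))' into (label, children)."""
    tokens = bracketed.replace("(", " ( ").replace(")", " ) ").split()

    def read(pos):
        label = tokens[pos + 1]            # tokens[pos] is "("
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                child, pos = read(pos)
                children.append(child)
            else:
                children.append(tokens[pos])   # a word (leaf)
                pos += 1
        return (label, children), pos + 1

    return read(0)[0]

def leaves(node):
    """Return the words of a subtree in sentence order."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1] for w in leaves(child)]

def find(node, label):
    """Return the first node with the given label (depth-first), or None."""
    if isinstance(node, str):
        return None
    if node[0] == label:
        return node
    for child in node[1]:
        hit = find(child, label)
        if hit is not None:
            return hit
    return None

def subject_words(tree):
    """Words of the NPA that is a direct daughter of the first VPS node."""
    vps = find(tree, "VPS")
    if vps is None:
        return []
    for child in vps[1]:
        if not isinstance(child, str) and child[0] == "NPA":
            return leaves(child)
    return []

def make_stem(tree, key_term, is_finite=True):
    """Return (stem, answer) if the clause is eligible, otherwise None."""
    if not is_finite or key_term not in subject_words(tree):
        return None
    stem = " ".join("..." if w == key_term else w for w in leaves(tree))
    return stem, key_term

tree = parse("(S (VPS (NPA (N (NN heredity))) (VPC (V (VB preserves)) "
             "(NPA (N (NN species))))) (PUNC .))")
```

With this toy tree, a key term in the subject NPA yields a stem, while a key term that occurs only inside the VPC is rejected.<br />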
For example in <strong>the</strong> materials <strong>of</strong> Biology <strong>the</strong> terms �������������� (heredity) and<br />
������������ (inheritance) are key terms. And we have <strong>the</strong> following information about<br />
<strong>the</strong> constituents for one <strong>of</strong> <strong>the</strong> sentences which contain <strong>the</strong> terms.<br />
(S (VPS (NPA (N (NN ������������)) (PP (Prep (IN ��)) (Ncfsd ����������������) (CoordP (Conj (C (CC<br />
�)) (Ncnsd ��������������)) (ConjArg (NPA (N (NN ��������)) (PP (Prep (IN �)) (N (NN ���������))))))))<br />
(VPC (V (T (RP ��)) (Pron (Ppxta ��)) (V (VB ��������))) (NPA (A (JJ �����)) (N (NN �����))))) (PUNC .))<br />
Whichever of the two terms the user chooses, the system will try to produce a stem<br />
from this sentence, because it satisfies the three necessary conditions. It will replace<br />
the suggested key term with a blank and suggest the key term as the answer.<br />
E.g. ������������ �� ��� � �������������� �������� � ��������� �� �� �������� ����� ������<br />
(Due to ... and inheritance <strong>the</strong> species remain unchanged for long periods.)<br />
2 NPA: head-adjunct noun phrase; VPS: head-subject verb phrase. For full definitions see HPSG-based<br />
Syntactic Treebank of Bulgarian (BulTreeBank), BulTreeBank Project Technical Report 05, 2004,<br />
http://bultreebank.org/TechRep/BTB-TR05.pdf<br />
correct answer: ����������������(<strong>the</strong> heredity)<br />
In <strong>the</strong> following sentence, again <strong>the</strong> key term �������������� is present.<br />
(S (VPS (NPA (CoordP (ConjArg (NPA (N (NN �����������)) (PP (Prep (IN ��)) (Ncfsd ����������������)<br />
(CoordP (Conj (C (CC �))) (Ncfsd �������������))))) (Conj (C (CC �))) (ConjArg (N (NN ������������)))) (PP<br />
(Prep (IN )) (Ncmpd ) (Pron (Ppetdp3 )))) (VPC (V (VB )) (NPA (A (JJ )) (N (NN )) (IN ))) (Ncfsd )) (PUNC .))<br />
The term is part of the subject phrase, so it is possible to make a fill-in-the-blank<br />
question, where <strong>the</strong> blank will replace <strong>the</strong> focal term ����������������.<br />
����������� �� ��� � ������������� � ������������ �� ���������������� �� �� ��������� ������<br />
�� �����������<br />
(The study <strong>of</strong> ... and variability and <strong>the</strong> discovery <strong>of</strong> <strong>the</strong>ir regularities are <strong>the</strong> basic tasks <strong>of</strong> genetics.)<br />
correct answer: ����������������(heredity)<br />
Apart from replacing the focal term with a blank, we do not apply any other transformation<br />
to the chosen sentence.<br />
4.3 Distractor generation<br />
For the purpose of our application we need to suggest distractors in two cases: (1) when<br />
questions are generated automatically and (2) when a key term was chosen by the designer<br />
but no questions could be generated for it. In the second case only related concepts<br />
are shown to the user (they are extracted by the same principle as distractors, which is<br />
why we explain their construction in this section).<br />
In well-designed multiple-choice tests, the distractors are always semantically close<br />
to the correct answer (as well as to each other, in a sense). To find such distractors,<br />
in previous studies we tried paragraph clustering in order to define groups of text<br />
sections with similar topics, but on short texts this methodology does not give<br />
promising results. Because of that we chose a rather simple working solution. We examined<br />
existing tests for beginner level and noticed that most of the distractors<br />
looked very similar at first sight. They were mostly phrases with the same noun and<br />
different modifiers or, conversely, with the same modifier and different nouns.<br />
That is why we suggest as distractors NPs that contain the<br />
same noun as the key term chosen by the user but a different modifier.<br />
Conversely, we also change the noun of the chosen key<br />
term and suggest phrases with the same modifier and a different noun. All these phrases are<br />
taken from the NP list generated in the first stage.<br />
An example:<br />
Constant modifier Constant noun<br />
�������� �������� (natural complex) ������� ���������� (rural economy)<br />
�������� ���� (natural zone) �������� ���������� (world economy)<br />
�������� ��������� (natural component) ���������� ���������� (national economy)<br />
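The selection of distractors from the NP list can be sketched as follows. The representation of an NP as a (modifier, noun) pair, the English glosses used as data and the function name are assumptions made for illustration only.<br />

```python
def suggest_distractors(key_term, noun_phrases):
    """Split candidate NPs into same-modifier and same-noun distractors.

    key_term and every NP are (modifier, noun) pairs; the key term itself
    is never returned as a distractor.
    """
    modifier, noun = key_term
    same_modifier = [np for np in noun_phrases
                     if np[0] == modifier and np[1] != noun]
    same_noun = [np for np in noun_phrases
                 if np[1] == noun and np[0] != modifier]
    return same_modifier, same_noun

# English glosses of NPs from the geography example (illustration only).
NP_LIST = [
    ("natural", "complex"), ("natural", "zone"), ("natural", "component"),
    ("rural", "economy"), ("world", "economy"), ("national", "economy"),
    ("natural", "economy"),
]
same_modifier, same_noun = suggest_distractors(("natural", "economy"), NP_LIST)
```

For the key term (natural, economy) the first list reproduces the constant-modifier column of the table above and the second list its constant-noun column.<br />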
5 Evaluation<br />
At the current stage the system has been tested by three teachers who are professional<br />
test designers. Each of them is a specialist in one of the three areas and also has a degree<br />
in one of the others. They experimented with materials in three domains:<br />
biology, geography and history. Each designer had to choose 20 key terms in total and<br />
to evaluate with a YES/NO mark (YES: acceptable question, with or without the need for<br />
changes; NO: unacceptable question) the questions produced by the system for<br />
the chosen key terms.<br />
From the materials in biology and geography useful definitions were extracted, and<br />
the designers appreciated them, while for the history domain mainly proper names<br />
were helpful. On average, 61% of the generated fill-in-the-blank questions were reported as<br />
acceptable by the designers (with or without post-editing). The professionals<br />
reported that the context and the distractors helped them a lot, because they gave them<br />
more options for finding the information needed to correct an ill-formed<br />
question. The remaining questions were discarded mainly for three reasons: some<br />
of the sentences had a general meaning and did not represent a specific definition; some<br />
were discarded because the blank was ambiguous, with too many possible<br />
options for a correct answer; and in others the chosen term was not central to the sentence that<br />
was chosen.<br />
The designers were especially satisfied with the high quality of the key terms, which<br />
served as a cross-reference over the whole material. They found them useful for<br />
systematising the topics on which the student could be examined. The key terms also saved<br />
them time, because they could use the vocabulary of key terms as a summary of the<br />
contents. A deeper analysis of the speed-up of the process will be done after improving<br />
the user interface of the system.<br />
The test designers were certain that <strong>the</strong> so-prepared question items are useful only in<br />
<strong>the</strong> case <strong>of</strong> beginner level testing, where deep understanding is not required and learners<br />
are taught mostly basic definitions.<br />
6 Conclusion and future work<br />
This experiment represents a step towards automatic test generation and shows the<br />
advances gained by using more sophisticated tools and deeper processing of the instructional<br />
materials.<br />
Although the approach is intended to be domain independent, we consider Biology and<br />
Geography more suitable, as they produce better results than History. One of the reasons is that<br />
in history pure one-sentence definitions are rare and many references<br />
are normally used. In this domain proper names, which were also included<br />
in the list of key terms, played an important role.<br />
As this article presents work in progress, we plan to go deeper in the data analysis<br />
by adding dependency parsing. Then we can observe the subject and object clauses and<br />
make additional inferences. We will also try different techniques for distractor selection,<br />
such as term similarity measures over the corpus, and different types of questions.<br />
We plan to improve the user interface, because it is a major issue affecting the<br />
efficiency of the test designers' work. Overall, we plan a deeper evaluation of the<br />
system, including Classical Test Theory and error analysis, in order to improve the produced<br />
items.<br />
7 Acknowledgements<br />
My thanks go to my supervisor Galia Angelova and to Atanas Chanev, who kindly<br />
provided models for the SVMTool and Dan Bikel’s parser for Bulgarian.<br />
References<br />
Aldabe, I., De Lacalle, M. L. and Maritxalar, M. (2007). Automatic acquisition of didactic<br />
resources: generating test-based questions, in I. F. de Castro (ed.), Proceedings of<br />
SINTICE 07, pp. 105–111.<br />
Bikel, D. (2004). A distributional analysis of a lexicalized statistical parsing model, in<br />
D. Lin and D. Wu (eds), Proceedings of EMNLP.<br />
URL: http://www.cis.upenn.edu/~dbikel/software.html#stat-parser<br />
Collins, M. (1999). Head-Driven Statistical Models for Natural Language Parsing, PhD<br />
thesis, University of Pennsylvania.<br />
Giménez, J. and Márquez, L. (2004). SVMTool: A general POS tagger generator based on<br />
support vector machines, Proceedings of the 4th International Conference LREC’04.<br />
Mitkov, R., Ha, L. A. and Karamanis, N. (2006). A computer-aided environment for generating<br />
multiple-choice test items, Natural Language Engineering 12(2): 177–194.<br />
Nikolova, I. (2007). Supporting the development of multiple-choice tests in Bulgarian<br />
by language technologies, in E. Paskaleva and M. Slavcheva (eds), Proceedings of<br />
the Workshop A Common Natural Language Processing Paradigm for Balkan Languages,<br />
pp. 31–34.<br />
WORD SPACE MODELS OF<br />
SEMANTIC SIMILARITY AND RELATEDNESS<br />
Yves Peirsman<br />
University <strong>of</strong> Leuven & Research Foundation – Flanders<br />
Abstract. Word Space Models provide a convenient way <strong>of</strong> modelling word meaning in<br />
terms <strong>of</strong> a word’s contexts in a corpus. This paper investigates <strong>the</strong> influence <strong>of</strong> <strong>the</strong> type <strong>of</strong><br />
context features on <strong>the</strong> kind <strong>of</strong> semantic information that <strong>the</strong> models capture. In particular,<br />
we make a distinction between semantic similarity and semantic relatedness. It is shown<br />
that <strong>the</strong> strictness <strong>of</strong> <strong>the</strong> context definition correlates with <strong>the</strong> models’ ability to identify<br />
semantically similar words: syntactic approaches perform better than bag-<strong>of</strong>-word models,<br />
and small context windows are better than larger ones. For semantic relatedness, however,<br />
syntactic features and small context windows are at a clear disadvantage. Second-order bag-of-word<br />
models perform below average across the board.<br />
1 Introduction<br />
Word Space Models have become <strong>the</strong> standard approach to <strong>the</strong> computational modelling<br />
<strong>of</strong> lexical semantics (Landauer and Dumais, 1997; Lin, 1998; Schütze, 1998; Padó and<br />
Lapata, 2007). They indeed <strong>of</strong>fer a convenient way <strong>of</strong> capturing <strong>the</strong> meaning <strong>of</strong> a word<br />
simply on <strong>the</strong> basis <strong>of</strong> <strong>the</strong> contexts in which it is used in a corpus. In that way, <strong>the</strong>y can<br />
retrieve <strong>the</strong> most similar words for a given target word. Yet, <strong>the</strong>re is no agreement on how<br />
context should be defined exactly. Context features vary from sentences or paragraphs to<br />
single words, with or without <strong>the</strong> addition <strong>of</strong> syntactic relations. While all <strong>the</strong>se features<br />
definitely capture some semantic information, it is only to be expected that <strong>the</strong> choice <strong>of</strong><br />
context definition has an influence on <strong>the</strong> kind <strong>of</strong> semantic relatives that <strong>the</strong> Word Space<br />
Models will find.<br />
It is well known that words may be semantically related along a number <strong>of</strong> dimensions<br />
(Cruse, 1986). In <strong>the</strong> NLP literature, similarity takes up a central position, with synonymy<br />
as <strong>the</strong> most obvious example. But <strong>the</strong>re are o<strong>the</strong>r types <strong>of</strong> semantic relations, too. For<br />
instance, two words like doctor and hospital have a clear connection, although <strong>the</strong>y are<br />
in no way semantically similar. Recovering this semantic relatedness from a corpus may<br />
have to proceed along different lines than <strong>the</strong> modelling <strong>of</strong> semantic similarity. Specific<br />
Word Space Models may thus have a bias towards one or <strong>the</strong> o<strong>the</strong>r <strong>of</strong> <strong>the</strong>se relations. In <strong>the</strong><br />
literature, however, <strong>the</strong> investigation <strong>of</strong> this semantic behaviour <strong>of</strong> Word Space Models<br />
has only recently come to <strong>the</strong> fore (Sahlgren, 2006; Peirsman, Heylen and Speelman,<br />
2007).<br />
In this paper, we investigate eleven Word Space Models, representing three broad<br />
classes, with respect to <strong>the</strong>ir performance in <strong>the</strong> fields <strong>of</strong> semantic similarity and semantic<br />
relatedness. It will be shown that <strong>the</strong>re is no such thing as a single best Word Space<br />
Model: <strong>the</strong> ranking <strong>of</strong> <strong>the</strong> approaches depends on <strong>the</strong> type <strong>of</strong> semantic information we<br />
want to find. The paper is structured as follows: in <strong>the</strong> next section, we will introduce <strong>the</strong><br />
different context models and <strong>the</strong> two types <strong>of</strong> semantic relationship that we investigate.<br />
Section 3 <strong>the</strong>n presents <strong>the</strong> precise setup <strong>of</strong> our experiments, while section 4 discusses<br />
<strong>the</strong>ir results. Section 5 wraps up with conclusions and an outlook for future research.<br />
2 Word Space Models<br />
2.1 Competing definitions <strong>of</strong> context<br />
All Word Space Models <strong>of</strong> lexical semantics rely on <strong>the</strong> so-called distributional hypo<strong>the</strong>sis<br />
(Harris, 1954), which claims that words with similar meanings occur in similar contexts.<br />
From this hypo<strong>the</strong>sis, it follows that semantic similarity can be modelled in terms <strong>of</strong><br />
contextual or distributional similarity. This is done by constructing for each target word<br />
a so-called context vector, which contains <strong>the</strong> scores <strong>of</strong> its target word for all possible<br />
context features. These scores can be <strong>the</strong> number <strong>of</strong> times that <strong>the</strong> contextual feature<br />
co-occurs with <strong>the</strong> target, or more <strong>of</strong>ten, some kind <strong>of</strong> weighted frequency that captures<br />
<strong>the</strong> statistical link between <strong>the</strong> target word and that feature. The distributional similarity<br />
between two words is <strong>the</strong>n calculated as <strong>the</strong> similarity between <strong>the</strong>ir vectors, on <strong>the</strong> basis<br />
<strong>of</strong> a function like <strong>the</strong> cosine. In this way, it is possible to find for each target word <strong>the</strong> n<br />
most distributionally similar words in any given corpus. We call <strong>the</strong>se words <strong>the</strong> nearest<br />
neighbours <strong>of</strong> <strong>the</strong> target.<br />
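As an illustration, the retrieval of nearest neighbours with the cosine can be sketched as follows. The sparse dictionary representation, the toy co-occurrence counts and the function names are assumptions made for this example, not the implementation used in our experiments.<br />

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[f] * v[f] for f in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def nearest_neighbours(target, vectors, n):
    """Return the n words whose context vectors are most similar to the target's."""
    scored = [(cosine(vectors[target], vec), word)
              for word, vec in vectors.items() if word != target]
    return [word for _, word in sorted(scored, reverse=True)[:n]]

# Toy context vectors: context feature -> (weighted) co-occurrence score.
VECTORS = {
    "doctor": {"patient": 4, "treat": 3, "work": 1},
    "nurse": {"patient": 3, "treat": 2, "work": 2},
    "car": {"drive": 5, "road": 3, "work": 1},
}
neighbours = nearest_neighbours("doctor", VECTORS, 2)
```

Here nurse shares most of its context features with doctor and is therefore ranked above car.<br />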
Based on <strong>the</strong> definition <strong>of</strong> context, it is possible to define a hierarchy <strong>of</strong> Word Space<br />
Models, each with its own kind <strong>of</strong> contextual features. At <strong>the</strong> top <strong>of</strong> <strong>the</strong> tree we make a distinction<br />
between document-based and word-based approaches. Document-based models<br />
use sentences, paragraphs or documents as dimensions, and count how <strong>of</strong>ten a target word<br />
appears in each <strong>of</strong> <strong>the</strong>se entities in <strong>the</strong> corpus (Landauer and Dumais, 1997; Sahlgren,<br />
2006). Word-based models, by contrast, take not <strong>the</strong> context itself, but features from this<br />
context as dimensions. They can be subdivided into syntactic and bag-<strong>of</strong>-word models.<br />
So-called bag-<strong>of</strong>-word or co-occurrence models take into account all words within a predefined<br />
distance <strong>of</strong> <strong>the</strong> target word (generally with <strong>the</strong> exception <strong>of</strong> semantically empty<br />
words like articles, etc.), whereas syntactic models consider only those words to which<br />
<strong>the</strong> target is syntactically related. Sometimes <strong>the</strong> features <strong>of</strong> such syntactic models consist<br />
<strong>of</strong> <strong>the</strong>se syntactically related words alone (Padó and Lapata, 2007), sometimes <strong>the</strong>y are<br />
formed by the word plus its relation (Lin, 1998). Finally we can distinguish between first-order<br />
and second-order approaches. First-order bag-of-word approaches count the context<br />
words directly (Levy and Bullinaria, 2001), while second-order bag-<strong>of</strong>-word approaches<br />
sum <strong>the</strong> vectors <strong>of</strong> <strong>the</strong>se context words. In this last case, <strong>the</strong> target’s context vector thus<br />
contains frequency information about <strong>the</strong> context words <strong>of</strong> its (first-order) context words<br />
(Schütze, 1998). Although it is in principle possible to construct second-order syntactic<br />
models, to our knowledge no implementation has been presented in <strong>the</strong> literature.<br />
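The difference between first-order and second-order bag-of-word vectors can be made concrete with a small sketch. The toy sentences and function names are assumptions made for illustration; raw counts are used and the filtering of semantically empty words is omitted for brevity.<br />

```python
from collections import Counter

def first_order_vectors(sentences, window):
    """Count, for every word, the words occurring within `window` positions of it."""
    vectors = {}
    for sent in sentences:
        for i, target in enumerate(sent):
            context = sent[max(0, i - window):i] + sent[i + 1:i + 1 + window]
            vectors.setdefault(target, Counter()).update(context)
    return vectors

def second_order_vectors(sentences, window):
    """Sum the first-order vectors of each target's context words."""
    first = first_order_vectors(sentences, window)
    second = {}
    for target, contexts in first.items():
        total = Counter()
        for word, count in contexts.items():
            # One copy of the context word's vector per co-occurrence.
            for feature, value in first[word].items():
                total[feature] += count * value
        second[target] = total
    return second

SENTENCES = [["doctor", "treats", "patient"], ["nurse", "treats", "patient"]]
first = first_order_vectors(SENTENCES, window=1)
second = second_order_vectors(SENTENCES, window=1)
```

In the first-order space doctor is described only by treats, whereas in the second-order space it inherits the context words of treats, including nurse and patient.<br />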
2.2 Semantic similarity and semantic relatedness<br />
While it is claimed that all Word Space Models capture some kind <strong>of</strong> semantic information,<br />
so far we have only very limited knowledge about <strong>the</strong> influence <strong>of</strong> <strong>the</strong> context<br />
definition on <strong>the</strong> types <strong>of</strong> semantic relationship that <strong>the</strong> models find. In this paper we<br />
investigate two such types: semantic similarity and semantic relatedness. The first applies<br />
to synonyms (e.g., plane and airplane), hyponyms and hypernyms (e.g., bird and<br />
blackbird) and co-hyponyms (e.g., blackbird and robin) — two words with a relationship<br />
<strong>of</strong> similarity between <strong>the</strong> concepts <strong>the</strong>y refer to. Semantic relatedness, by contrast, exists<br />
between words whose concepts are not necessarily similar, but still related, for instance<br />
because <strong>the</strong>y belong to <strong>the</strong> same script, frame or lexical field. This is true for pairs like<br />
bird and beak or plane and pilot. Note that it is not possible to draw a clear boundary between<br />
semantic similarity and semantic relatedness. Take the word pair pepper–salt, for<br />
instance. These two words are clearly semantically similar, since <strong>the</strong>y both refer to spices.<br />
At <strong>the</strong> same time, however, <strong>the</strong>y are also semantically related: not only do <strong>the</strong>y both belong<br />
to <strong>the</strong> lexical fields <strong>of</strong> food or spices, <strong>the</strong>y also <strong>of</strong>ten co-occur toge<strong>the</strong>r in <strong>the</strong> phrase<br />
salt and pepper. Instead <strong>of</strong> mutually exclusive classes, semantic similarity and relatedness<br />
can thus better be thought <strong>of</strong> as <strong>the</strong> two ends <strong>of</strong> a continuum, or two perpendicular<br />
axes in a two-dimensional plane.<br />
For many NLP applications, similarity might be <strong>the</strong> most important relation to model.<br />
In typical Query Expansion, for instance, only semantically similar words (synonyms or<br />
possibly hyponyms) make for a desired extension <strong>of</strong> a search query. Similarly, in Question<br />
Answering a word in <strong>the</strong> question should only be matched with semantically similar<br />
words in <strong>the</strong> database where <strong>the</strong> computer looks for <strong>the</strong> answer. Semantic similarity, however,<br />
is just one way in which words may be related in our mental lexicon, as suggested<br />
by psycholinguistic association experiments. According to Aitchison (2003), the four<br />
major types <strong>of</strong> associations that people give in response to a cue word are, in order <strong>of</strong><br />
frequency, co-ordination (co-hyponyms like pepper and salt), collocation (like salt and<br />
water), superordination (hypernyms like butterfly and insect) and synonymy (like starved<br />
and hungry). A similar observation is made by Schulte im Walde and Melinger (2005).<br />
Comparing <strong>the</strong> results <strong>of</strong> <strong>the</strong>ir German verb association experiment with GermaNet, <strong>the</strong>y<br />
note that only 6% <strong>of</strong> <strong>the</strong> associations are synonyms, 14% are hypernyms and 16% are<br />
hyponyms, while no less than 54% <strong>of</strong> <strong>the</strong> associations are unrelated to <strong>the</strong>ir cue words in<br />
<strong>the</strong> GermaNet taxonomy. Although part <strong>of</strong> this can be explained by <strong>the</strong> incompleteness<br />
<strong>of</strong> <strong>the</strong> database, such results will be difficult to replicate with models <strong>of</strong> semantic similarity.<br />
After all, <strong>the</strong>se are meant to prefer synonyms over hypernyms and co-hyponyms, and<br />
even exclude collocates altoge<strong>the</strong>r. The best Word Space Models <strong>of</strong> semantic similarity<br />
may thus not be <strong>the</strong> best models <strong>of</strong> relatedness, and vice versa.<br />
Despite <strong>the</strong> wealth <strong>of</strong> research into Word Space Models, studies into <strong>the</strong>ir semantic<br />
characteristics are scarce. Most often one model is applied to a specific computational-linguistic<br />
task, and “comparisons between the (...) models have been few and far between<br />
in <strong>the</strong> literature” (Padó and Lapata, 2007, p. 166). Sahlgren (2006) is one exception to this<br />
rule. Focusing on document-based and first-order bag-<strong>of</strong>-word models, he showed that <strong>the</strong><br />
latter are better geared towards <strong>the</strong> modelling <strong>of</strong> paradigmatic (similarity) relations, while<br />
<strong>the</strong> former have a clear bias towards syntagmatic relations. Unfortunately, Sahlgren left<br />
out a number <strong>of</strong> popular word space approaches, like those based on syntactic relations or<br />
second-order co-occurrences. Peirsman et al. (2007) also included syntactic models, but<br />
concentrated on similarity relations only. This article thus sets out to fill <strong>the</strong>se gaps in <strong>the</strong><br />
literature, by discussing a wide variety <strong>of</strong> model types from <strong>the</strong> perspectives <strong>of</strong> similarity<br />
as well as relatedness.<br />
3 Experimental setup<br />
We investigate three classes <strong>of</strong> Word Space Models, for a total <strong>of</strong> eleven approaches: five<br />
first-order bag-<strong>of</strong>-word models, five second-order bag-<strong>of</strong>-word models and one syntactic<br />
model. Our corpus is <strong>the</strong> 300 million word Twente Nieuws Corpus <strong>of</strong> Dutch newspaper<br />
articles, collected at <strong>the</strong> University <strong>of</strong> Twente and parsed by <strong>the</strong> Alpino parser at <strong>the</strong><br />
University <strong>of</strong> Groningen. As our test set, we selected from this corpus <strong>the</strong> 10,000 most<br />
frequent nouns. For each <strong>of</strong> <strong>the</strong>se, we had all models retrieve <strong>the</strong> 100 most similar neighbours<br />
from <strong>the</strong> 9,999 remaining nouns in <strong>the</strong> set.<br />
The bag-<strong>of</strong>-word models, both first-order and second-order, varied <strong>the</strong> size <strong>of</strong> <strong>the</strong> context<br />
window <strong>the</strong>y took into account — 1, 3, 5, 10 or 20 words to ei<strong>the</strong>r side <strong>of</strong> <strong>the</strong> target —<br />
for a total <strong>of</strong> ten models. Sentence boundaries were ignored; article boundaries were not.<br />
The syntactic model considered eight different types <strong>of</strong> syntactic dependency relations,<br />
in which <strong>the</strong> target word could be (1) <strong>the</strong> subject <strong>of</strong> verb v, (2) <strong>the</strong> direct object <strong>of</strong> verb<br />
v, (3) a prepositional complement <strong>of</strong> verb v introduced by preposition p, (4) <strong>the</strong> head <strong>of</strong><br />
an adverbial prepositional phrase (PP) <strong>of</strong> verb v introduced by preposition p, (5) modified<br />
by adjective a, (6) postmodified by a PP with head n introduced by preposition p,<br />
(7) modified by an apposition with head n, or (8) coordinated with head n. Each specific<br />
instantiation <strong>of</strong> <strong>the</strong> variables v, p, a, or n was responsible for a new context feature.<br />
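As a rough illustration of the first-order bag-of-word setup described above, the following Python sketch counts the context words that fall within a symmetric window around each target. The function name, the toy tokens and the decision to filter stop words after the window has been cut are assumptions made for illustration; they are not taken from the authors' implementation.

```python
from collections import Counter, defaultdict


def window_cooccurrences(articles, targets, window, stop_words=frozenset()):
    """Count context words within `window` tokens on either side of each target.

    `articles` is a list of token lists. Because each article is one flat
    list, windows cross sentence boundaries but never article boundaries.
    """
    counts = defaultdict(Counter)
    for tokens in articles:
        for i, word in enumerate(tokens):
            if word not in targets:
                continue
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i and tokens[j] not in stop_words:
                    counts[word][tokens[j]] += 1
    return counts


# Hypothetical toy data: two tiny "articles".
articles = [
    ["the", "monkey", "eats", "a", "banana", "in", "the", "zoo"],
    ["a", "monkey", "lives", "in", "the", "zoo"],
]
counts = window_cooccurrences(
    articles, {"monkey", "zoo"}, window=3, stop_words={"the", "a", "in"}
)
print(counts["monkey"])
```

A syntactic model would fill the same kind of table, except that each feature would be a relation-word pair such as subj of fly rather than a word in a window.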
The o<strong>the</strong>r parameter settings were shared by all eleven models:<br />
• Dimensionality: For all approaches, we used <strong>the</strong> 2,000 most frequent contextual<br />
features in <strong>the</strong> corpus as dimensions. This is a simple but common way <strong>of</strong> reducing<br />
<strong>the</strong> o<strong>the</strong>rwise huge dimensionality <strong>of</strong> <strong>the</strong> vectors, which leads to state-<strong>of</strong>-<strong>the</strong>-art<br />
results, particularly for <strong>the</strong> syntactic model (Levy and Bullinaria, 2001; Padó and<br />
Lapata, 2007). For <strong>the</strong> syntactic model <strong>the</strong>se dimensions are <strong>the</strong> 2,000 most frequent<br />
syntactic features, like subj <strong>of</strong> fly. For <strong>the</strong> bag-<strong>of</strong>-word models, <strong>the</strong>y are<br />
formed by <strong>the</strong> 2,000 most frequent words in <strong>the</strong> corpus. Function words and o<strong>the</strong>r<br />
semantically empty words were excluded a priori on <strong>the</strong> basis <strong>of</strong> a stop list.<br />
• Frequency cut-<strong>of</strong>f: Depending on <strong>the</strong> context size, we established a cut-<strong>of</strong>f value n,<br />
so that <strong>the</strong> models ignored those features that occurred toge<strong>the</strong>r with <strong>the</strong> target fewer<br />
than n times. For context size 3, this cut-off was fixed at 3; for the larger context<br />
sizes it was 5. The syntactic model and the bag-of-word model with context size<br />
1 did not use a cut-off, since applying one led to data sparseness.<br />
• Frequency weighting: As is usual in <strong>the</strong> literature, <strong>the</strong> context vectors <strong>of</strong> <strong>the</strong> target<br />
words did not contain <strong>the</strong> simple frequencies <strong>of</strong> <strong>the</strong> features. Instead, <strong>the</strong>y listed<br />
<strong>the</strong> point-wise mutual information between each feature and <strong>the</strong> target word. This<br />
measure expresses whe<strong>the</strong>r <strong>the</strong> two occur toge<strong>the</strong>r more or less <strong>of</strong>ten in <strong>the</strong> corpus<br />
than we expect on <strong>the</strong> basis <strong>of</strong> <strong>the</strong>ir individual relative frequencies.<br />
• Similarity measure: Finally, <strong>the</strong> distributional similarity between two target words<br />
was measured by <strong>the</strong> cosine between <strong>the</strong>ir context vectors.<br />
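The weighting and similarity steps in the list above can be sketched as follows. This is a minimal sketch under stated assumptions: point-wise mutual information is taken in its usual form, log(p(t, f) / (p(t) p(f))), probabilities are estimated from the co-occurrence table itself, and negative values are kept. The function names and toy counts are hypothetical, and the restriction to the 2,000 most frequent features is left out for brevity.

```python
import math
from collections import Counter


def pmi_vectors(counts, min_count=1):
    """Turn raw co-occurrence counts into PMI-weighted sparse vectors.

    `counts` maps target -> Counter(feature -> frequency). Target-feature
    pairs seen fewer than `min_count` times are dropped (frequency cut-off).
    """
    total = sum(sum(c.values()) for c in counts.values())
    target_totals = {t: sum(c.values()) for t, c in counts.items()}
    feature_totals = Counter()
    for c in counts.values():
        feature_totals.update(c)

    vectors = {}
    for t, c in counts.items():
        vec = {}
        for f, n in c.items():
            if n < min_count:
                continue
            # PMI = log( p(t, f) / (p(t) * p(f)) )
            vec[f] = math.log((n * total) / (target_totals[t] * feature_totals[f]))
        vectors[t] = vec
    return vectors


def cosine(u, v):
    """Cosine between two sparse vectors stored as dicts."""
    dot = sum(x * v[f] for f, x in u.items() if f in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy counts.
toy_counts = {
    "cat": Counter({"purr": 4, "eat": 2}),
    "dog": Counter({"bark": 4, "eat": 2}),
    "car": Counter({"drive": 6}),
}
vectors = pmi_vectors(toy_counts)
print(cosine(vectors["cat"], vectors["dog"]))
print(cosine(vectors["cat"], vectors["car"]))
```

Retrieving the 100 nearest neighbours of a target then amounts to sorting all other targets by this cosine value.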
4 Results<br />
4.1 Semantic similarity<br />
We evaluated <strong>the</strong> ability <strong>of</strong> our models to find semantically similar words on <strong>the</strong> basis <strong>of</strong><br />
a comparison with Dutch EuroWordNet (Vossen, 1998). This lexical database contains<br />
more than 34,000 sets <strong>of</strong> noun synonyms and <strong>the</strong> relations that exist between <strong>the</strong>m. Two<br />
evaluation measures were applied. First, we focused on <strong>the</strong> general ability <strong>of</strong> our models<br />
to capture semantic similarity. Then we looked into <strong>the</strong> distribution <strong>of</strong> four more specific<br />
similarity relations.<br />
Figure 1: Wu & Palmer similarity scores between target and nearest neighbour (y-axis: Wu & Palmer score, 0.0 to 1.0; x-axis: word space models).<br />
Model labels: syn = syntactic model; cn = first-order bag-of-words; ccn = second-order bag-of-words,<br />
where n is the context size (number of words on either side of the target).<br />
The general performance <strong>of</strong> <strong>the</strong> models was quantified by <strong>the</strong> average Wu and Palmer<br />
score between a target word and its single nearest neighbour (Wu and Palmer, 1994). This<br />
Wu and Palmer score is a popular way <strong>of</strong> measuring <strong>the</strong> semantic similarity between two<br />
words, based on <strong>the</strong>ir depth and <strong>the</strong>ir distance from each o<strong>the</strong>r in a taxonomic structure<br />
like EuroWordNet. If either the target or its nearest neighbour was not present in the<br />
database, the pair was simply ignored. In order to make the results directly comparable<br />
across models, we restricted the results to the 4,183 target words with a nearest neighbour<br />
in EuroWordNet for all models. The resulting Wu and Palmer scores are given in Figure 1.<br />
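For readers unfamiliar with the measure, the Wu and Palmer score is commonly written as 2 · depth(lcs) / (depth(a) + depth(b)), where lcs is the lowest node that subsumes both words. The sketch below applies that formula to a hypothetical toy taxonomy. It assumes a tree in which every node has a single parent and counts depth in nodes from the root; EuroWordNet itself is richer than this, so the code illustrates the formula only and not the evaluation pipeline used here.

```python
def path_to_root(node, parent):
    """Return the list [node, parent of node, ..., root]."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def depth(node, parent):
    """Depth counted in nodes from the root, so the root has depth 1."""
    return len(path_to_root(node, parent))


def wu_palmer(a, b, parent):
    """Wu & Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    ancestors_a = set(path_to_root(a, parent))
    # The first node on b's path to the root that also subsumes a
    # is the lowest common subsumer.
    lcs = next(n for n in path_to_root(b, parent) if n in ancestors_a)
    return 2 * depth(lcs, parent) / (depth(a, parent) + depth(b, parent))


# Hypothetical toy taxonomy (child -> parent).
parent = {
    "animal": "entity",
    "insect": "animal",
    "bird": "animal",
    "butterfly": "insect",
    "sparrow": "bird",
}
print(wu_palmer("butterfly", "insect", parent))
print(wu_palmer("butterfly", "sparrow", parent))
```

In this toy tree a hypernym pair scores higher than a pair that only meets two levels up, which is the behaviour the evaluation relies on.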
Figure 1 shows a clear decrease in Wu and Palmer score as <strong>the</strong> definition <strong>of</strong> context<br />
becomes less strict. A Friedman test indeed confirms <strong>the</strong> influence <strong>of</strong> <strong>the</strong> type <strong>of</strong> Word<br />
Space Model on performance (Friedman chi-squared = 3541.575, df = 10, p-value <<br />
.001). The syntactic model achieves <strong>the</strong> highest average similarity score by far, followed<br />
by <strong>the</strong> first-order bag-<strong>of</strong>-word models and finally <strong>the</strong> second-order bag-<strong>of</strong>-word models.<br />
Moreover, small contexts appear to model semantic similarity better than large ones. A<br />
test <strong>of</strong> multiple comparisons after Friedman showed that <strong>the</strong> differences between all pairs<br />
<strong>of</strong> models are indeed statistically significant at <strong>the</strong> .05 level, except for those between<br />
context sizes 1 and 3 (both first-order and second-order) and that between <strong>the</strong> first-order<br />
model with context size 20 and <strong>the</strong> second-order model with context size 5.<br />
Of course, this general similarity score does not give any information about what specific<br />
type <strong>of</strong> similarity relation <strong>the</strong> models find. We <strong>the</strong>refore defined four taxonomic<br />
similarity relations, again with EuroWordNet as a gold standard. Synonyms were defined<br />
as words in <strong>the</strong> same synonym set as <strong>the</strong> target word, hypernyms as words exactly one<br />
node above <strong>the</strong> target, hyponyms those one node below and co-hyponyms as words one<br />
node below any <strong>of</strong> <strong>the</strong> target’s hypernyms. Toge<strong>the</strong>r, <strong>the</strong>se relations make up <strong>the</strong> target’s<br />
EuroWordNet environment. Note that our strict definition of these relationships does not<br />
allow for more than one or two steps in the tree, and thus disregards possible hypernyms<br />
or hyponyms that are more than one step away from the target. This approach ensures the<br />
reliability of our gold standard, but constitutes a test that a relatively low percentage of<br />
nearest neighbours will pass. Figure 2 shows how the single nearest neighbours of our target<br />
words are distributed over the four similarity relations. Again we restricted ourselves<br />
to the 4,183 target words with a neighbour in EuroWordNet for all models.<br />
Figure 2: Distribution of semantic similarity relations for all models (y-axis: frequency; x-axis: word space models;<br />
legend: co-hyponym, hypernym, hyponym, synonym).<br />
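The four relation definitions can be made concrete with a small sketch. It assumes, for simplicity, that every word belongs to exactly one synonym set and that the taxonomy is given as a mapping from each set to its direct hypernym sets. The identifiers and toy entries are hypothetical and do not reflect the actual EuroWordNet format.

```python
def classify_relation(target, neighbour, synsets, hypernyms):
    """Label `neighbour` relative to `target`, or return None.

    `synsets` maps word -> synset id; `hypernyms` maps synset id -> set of
    direct parent synset ids.
    """
    s_t, s_n = synsets.get(target), synsets.get(neighbour)
    if s_t is None or s_n is None:
        return None  # pair not covered by the database
    if s_t == s_n:
        return "synonym"
    parents_t = hypernyms.get(s_t, set())
    parents_n = hypernyms.get(s_n, set())
    if s_n in parents_t:
        return "hypernym"  # neighbour is exactly one node above the target
    if s_t in parents_n:
        return "hyponym"  # neighbour is exactly one node below the target
    if parents_t & parents_n:
        return "co-hyponym"  # both sit one node below a shared hypernym
    return None  # outside the target's environment


# Hypothetical toy database.
synsets = {
    "butterfly": "butterfly.n",
    "moth": "moth.n",
    "insect": "insect.n",
    "bug": "insect.n",
    "animal": "animal.n",
}
hypernyms = {
    "butterfly.n": {"insect.n"},
    "moth.n": {"insect.n"},
    "insect.n": {"animal.n"},
}
for pair in [("butterfly", "insect"), ("butterfly", "moth"), ("butterfly", "animal")]:
    print(pair, classify_relation(*pair, synsets, hypernyms))
```

The last pair comes back as None because animal is two steps above butterfly, which mirrors the strictness discussed above.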
Not surprisingly, <strong>the</strong> number <strong>of</strong> retrieved similarity relations mirrors <strong>the</strong> general Wu<br />
and Palmer similarity score. Again <strong>the</strong> syntactic model performs best: 51.2% <strong>of</strong> its single<br />
nearest neighbours that occur in EuroWordNet are situated in <strong>the</strong> environment <strong>of</strong> <strong>the</strong> target<br />
word. This precision drops to between 40.5% and 26.4% for <strong>the</strong> first-order bag-<strong>of</strong>-word<br />
methods and even lower for <strong>the</strong> second-order models. As above, <strong>the</strong> performance <strong>of</strong><br />
<strong>the</strong> models seems to depend on <strong>the</strong> strictness <strong>of</strong> <strong>the</strong>ir context definition. The stricter <strong>the</strong>y<br />
view context — i.e., syntactic context ra<strong>the</strong>r than a bag <strong>of</strong> words, smaller context windows<br />
ra<strong>the</strong>r than large ones — <strong>the</strong> more examples <strong>of</strong> semantic similarity <strong>the</strong>y find. This pattern<br />
remains unchanged when a larger number <strong>of</strong> nearest neighbours is taken into account.<br />
With one exception, <strong>the</strong> distribution <strong>of</strong> <strong>the</strong> four relations is comparable across <strong>the</strong> different<br />
models. Co-hyponyms figure most prominently among <strong>the</strong> nearest neighbours,<br />
followed by synonyms, hypernyms and hyponyms. Only <strong>the</strong> syntactic model finds an<br />
unexpectedly high number <strong>of</strong> hypernyms. This can probably be explained by <strong>the</strong> way<br />
syntactic relations are typically inherited in a taxonomy: all characteristics <strong>of</strong> a (prototypical)<br />
concept (can fly, for instance) also apply to its hypernyms, so that <strong>the</strong>se are <strong>of</strong>ten<br />
most similar in terms <strong>of</strong> syntactic distribution in a corpus.<br />
4.2 Semantic relatedness<br />
The results in the previous section do not necessarily express the overall quality of the investigated<br />
Word Space Models. It is possible that the models that scored relatively badly<br />
in the similarity experiments are simply biased towards a different kind of semantic relation.<br />
In this second round of experiments we therefore turn our attention from semantic<br />
similarity to semantic relatedness.<br />
Figure 3: Evolution of the precision, recall and F-score of the first-order bag-of-word<br />
model with context size 5 in its retrieval of associations (x-axis: number of nearest neighbours, 0 to 100).<br />
For this task, we relied on a psycholinguistic experiment <strong>of</strong> human associations, described<br />
in De Deyne and Storms (in press). In this experiment, participants were asked<br />
to list three different word associations for 1,424 cue words. Each word was presented<br />
to at least 82 participants, resulting in a total <strong>of</strong> 381,909 responses. For instance, aap<br />
(‘monkey’) triggered <strong>the</strong> response zoo (‘zoo’) 27 times, aarde (‘earth’) prompted planeet<br />
(‘planet’) 14 times and bikini (‘bikini’) elicited vakantie (‘holiday’) 6 times. These examples<br />
show that this experiment taps into a different kind <strong>of</strong> semantic relationship than<br />
the previous one. Note that, for now, we ignore the fact that association strength is<br />
<strong>of</strong>ten asymmetric (Michelbacher, Evert and Schütze, 2007).<br />
In order to make <strong>the</strong> results comparable to those in section 4.1, we reduced <strong>the</strong> data set<br />
to those cue words and associations that belong to <strong>the</strong> 10,000 most frequent nouns in our<br />
corpus. This gave a gold standard <strong>of</strong> 768 cue words with a total <strong>of</strong> 31,862 different cue–<br />
association pairs. When <strong>the</strong>se associations are checked against EuroWordNet, we indeed<br />
find that only 8% belong to the EuroWordNet environment of their cue word. Of<br />
these, 9% are synonyms, 19% are hypernyms, 16% are hyponyms and 56% are co-hyponyms.<br />
We evaluated <strong>the</strong> Word Space Models against this gold standard by counting <strong>the</strong> number<br />
<strong>of</strong> associations that <strong>the</strong>y find as <strong>the</strong> nearest neighbours to <strong>the</strong> cue words. If we consider<br />
just one nearest neighbour, <strong>the</strong> results already show a considerable difference from<br />
<strong>the</strong> previous experiments. As <strong>the</strong> top chart in Figure 4 indicates, <strong>the</strong> syntactic model still<br />
performs best, with 340 associations (a precision <strong>of</strong> .443), followed by <strong>the</strong> first-order and<br />
<strong>the</strong>n <strong>the</strong> second-order bag-<strong>of</strong>-word models. However, within <strong>the</strong> bag-<strong>of</strong>-word models, <strong>the</strong><br />
ideal context size has changed. The first-order bag-<strong>of</strong>-word models with context sizes 10<br />
and 20 have 299 and 293 associations among <strong>the</strong>ir single nearest neighbours, respectively.<br />
For 768 targets, this gives precision values <strong>of</strong> .389 and .382. Then we find context sizes 5<br />
(n = 281, P = .366), 3 (n = 269, P = .350) and 1 (n = 228, P = .297). Larger contexts<br />
thus outperform <strong>the</strong>ir smaller competitors here. Note that <strong>the</strong> two best models share only<br />
90 correct predictions, which indicates that <strong>the</strong>y have different preferences among <strong>the</strong> associations.<br />
A look at <strong>the</strong> data suggests that <strong>the</strong> syntactic model indeed picks out those<br />
associations that are also semantically similar to <strong>the</strong>ir target word, while <strong>the</strong> first-order<br />
bag-<strong>of</strong>-word models with large contexts cover collocational relatedness better. With <strong>the</strong><br />
second-order models, finally, context size 3 seems optimal.<br />
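The precision values quoted above follow from dividing the reported hit counts by the 768 cue words. The short check below recomputes them; the dictionary layout and variable names are illustrative only.

```python
CUE_WORDS = 768  # cue words in the gold standard after filtering

# Associations found among single nearest neighbours, as reported in the text.
hits_at_1 = {"syn": 340, "c1": 228, "c3": 269, "c5": 281, "c10": 299, "c20": 293}

precision_at_1 = {model: hits / CUE_WORDS for model, hits in hits_at_1.items()}
for model, p in precision_at_1.items():
    print(f"{model}: {p:.3f}")
```

With a single neighbour per cue word, the number of retrieved items equals the number of cue words, so precision is simply hits divided by 768.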
When we consider one nearest neighbour, <strong>the</strong> models cannot find more than 768 associations,<br />
and recall thus stays extremely low. We <strong>the</strong>refore increased <strong>the</strong> number <strong>of</strong> nearest<br />
neighbours from 1 to 100 and calculated <strong>the</strong> precision, recall and F-score at each step.<br />
By way <strong>of</strong> example, Figure 3 plots <strong>the</strong> evolution <strong>of</strong> <strong>the</strong>se values for <strong>the</strong> best-performing<br />
model. The bottom bar chart in Figure 4, <strong>the</strong>n, shows <strong>the</strong> maximum F-score <strong>of</strong> all <strong>the</strong><br />
models. The syntactic approach has lost its lead, which suggests that it is able to model<br />
only a small number <strong>of</strong> associations well — probably those that also score highly on<br />
similarity. Instead it is now <strong>the</strong> first-order bag-<strong>of</strong>-word model with context size 5 that<br />
outclasses all o<strong>the</strong>rs, with an F-score <strong>of</strong> .127 (P = .112, R = .148) at 55 neighbours.<br />
Extending <strong>the</strong> context window to 10 words brings <strong>the</strong> F-score down to .122 (P = .102,<br />
R = .150, 61 neighbours); reducing <strong>the</strong> window to 3 words takes it to .120 (P = .104,<br />
R = .143, 57 neighbours). Next, we have <strong>the</strong> bag-<strong>of</strong>-word model with context size 20<br />
(F = .115, P = .102, R = .133, 54 neighbours) and only then the syntactic model<br />
(F = .111, P = .102, R = .123, 50 neighbours). Large contexts now score slightly worse<br />
than intermediate ones, which probably strike <strong>the</strong> best balance between similarity relations<br />
and collocational links. Second-order models never attain an F-score above .10, and<br />
nei<strong>the</strong>r do <strong>the</strong> smallest context windows, which are thus clearly biased towards similarity.<br />
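The evaluation at k nearest neighbours can be sketched as below. The sketch assumes that precision and recall are pooled over all cue words (retrieved pairs against all gold cue-association pairs) and that the F-score is the harmonic mean of the two; the text does not spell out these details, so they should be read as assumptions. The function name and the toy neighbour lists are hypothetical.

```python
def precision_recall_f(neighbours, gold, k):
    """Precision, recall and F-score when the top-k neighbours of each cue are kept.

    `neighbours` maps cue -> ranked list of neighbours;
    `gold` maps cue -> set of associations.
    """
    retrieved = correct = 0
    for cue, ranked in neighbours.items():
        top_k = ranked[:k]
        retrieved += len(top_k)
        correct += sum(1 for word in top_k if word in gold.get(cue, set()))
    relevant = sum(len(assocs) for assocs in gold.values())
    precision = correct / retrieved if retrieved else 0.0
    recall = correct / relevant if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical toy data; only aap -> zoo and aarde -> planeet come from the text.
gold = {"aap": {"zoo", "banaan"}, "aarde": {"planeet"}}
neighbours = {
    "aap": ["chimpansee", "zoo", "banaan"],
    "aarde": ["planeet", "maan", "zon"],
}
for k in (1, 3):
    print(k, precision_recall_f(neighbours, gold, k))
```

Under the harmonic-mean assumption, the reported values for context size 5 are consistent: 2 × .112 × .148 / (.112 + .148) ≈ .127. Small deviations for the other models are to be expected, because the published precision and recall figures are rounded.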
4.3 Discussion<br />
In part, our experiments have confirmed earlier results in <strong>the</strong> literature. For instance,<br />
Sahlgren (2006) already noted that with first-order bag-<strong>of</strong>-word models, larger contexts<br />
score better in his association experiment, while smaller contexts score better in <strong>the</strong> synonymy<br />
test. Peirsman et al. (2007) found even better results for a syntactic model in<br />
Dutch, at least with respect to semantic similarity evaluated against EuroWordNet. Both<br />
findings are borne out by our experiments.<br />
At <strong>the</strong> same time, our results add some new insights to <strong>the</strong>se earlier observations. We<br />
have shown that <strong>the</strong> syntactic model and <strong>the</strong> bag-<strong>of</strong>-word models with context size 1 are<br />
most biased towards semantic similarity. The syntactic model scored best in our first<br />
round <strong>of</strong> experiments, while <strong>the</strong> results <strong>of</strong> <strong>the</strong> bag-<strong>of</strong>-word models with context size 1<br />
were ei<strong>the</strong>r not statistically different from or better than those <strong>of</strong> models with larger context<br />
windows. When it came to <strong>the</strong> discovery <strong>of</strong> semantic associations, however, context<br />
size 1 proved <strong>the</strong> least advisable choice, and <strong>the</strong> syntactic model was outperformed by<br />
all first-order bag-of-word models with an intermediate or large context window. Second-order<br />
bag-of-word models scored below average in both experiments. They probably only<br />
show <strong>the</strong>ir power when data sparseness is an issue, as with Word Sense Discrimination<br />
(Schütze, 1998) or with corpora smaller than ours.<br />
Figure 4: Frequency of associations among single nearest neighbours (top; y-axis: frequency) and maximal<br />
F-scores for all models (bottom; y-axis: F-score). The x-axis of both charts lists the word space models.<br />
5 Conclusions and future research<br />
In this paper, we investigated the influence of the context definition on the ability of<br />
several Word Space Models to capture two kinds of semantic information: semantic<br />
similarity and semantic relatedness. We studied a total of eleven Word Space Models:<br />
one syntactic approach and ten bag-<strong>of</strong>-word models with context sizes 1, 3, 5, 10 and<br />
20, first-order as well as second-order. Both for semantic similarity and semantic relatedness,<br />
first-order models clearly beat <strong>the</strong>ir second-order competitors. However, while<br />
syntactic models gave <strong>the</strong> best results for semantic similarity, first-order bag-<strong>of</strong>-word approaches<br />
with intermediate to large context windows fared better in <strong>the</strong> retrieval <strong>of</strong> associated<br />
words.<br />
In <strong>the</strong> short term, we aim to extend <strong>the</strong> repository <strong>of</strong> Word Space Models that we are<br />
investigating — document-based models and second-order syntactic models are particularly<br />
high on our list. In the longer term, we will try to determine whether the differences we<br />
observed in <strong>the</strong> modelling <strong>of</strong> semantic relations between word types also play a role in<br />
Word Sense Discrimination. In this task, all contexts <strong>of</strong> a word are clustered in order to<br />
automatically find <strong>the</strong> multiple senses <strong>of</strong> that word. Given <strong>the</strong> results here, we suspect that<br />
different kinds <strong>of</strong> polysemy or homonymy may not demand <strong>the</strong> same context definitions.<br />
References<br />
Aitchison, J. (2003). Words in the Mind. An Introduction to the Mental Lexicon, Oxford:<br />
Blackwell.<br />
Cruse, D. A. (1986). Lexical Semantics, London: Cambridge University Press.<br />
De Deyne, S. and Storms, G. (in press). Word associations: Norms for 1,424 Dutch words<br />
in a continuous task, Behaviour Research Methods.<br />
Harris, Z. (1954). Distributional structure, Word 10(2–3): 146–162.<br />
Landauer, T. K. and Dumais, S. T. (1997). A solution to Plato’s problem: The Latent<br />
Semantic Analysis <strong>the</strong>ory <strong>of</strong> <strong>the</strong> acquisition, induction, and representation <strong>of</strong> knowledge,<br />
Psychological Review 104: 211–240.<br />
Levy, J. P. and Bullinaria, J. A. (2001). Learning lexical properties from word usage<br />
patterns: Which context words should be used, in R. French and J. Sougne (eds),<br />
Connectionist Models <strong>of</strong> Learning, Development and Evolution: <strong>Proceedings</strong> <strong>of</strong> <strong>the</strong><br />
6th Neural Computation and Psychology Workshop, London: Springer, pp. 273–<br />
282.<br />
Lin, D. (1998). Automatic retrieval and clustering <strong>of</strong> similar words, <strong>Proceedings</strong> <strong>of</strong><br />
COLING-ACL98, Montreal, Canada, pp. 768–774.<br />
Michelbacher, L., Evert, S. and Schütze, H. (2007). Asymmetric association measures,<br />
<strong>Proceedings</strong> <strong>of</strong> <strong>the</strong> International Conference on Recent Advances in Natural Language<br />
Processing (RANLP-07), Borovets, Bulgaria.<br />
Padó, S. and Lapata, M. (2007). Dependency-based construction <strong>of</strong> semantic space models,<br />
Computational Linguistics 33(2): 161–199.<br />
Peirsman, Y., Heylen, K. and Speelman, D. (2007). Finding semantically related words in<br />
Dutch. Co-occurrences versus syntactic contexts, Proceedings of the CoSMO Workshop,<br />
Roskilde, Denmark, pp. 9–16.<br />
Sahlgren, M. (2006). The Word-Space Model. Using Distributional Analysis to Represent<br />
Syntagmatic and Paradigmatic Relations Between Words in High-dimensional<br />
Vector Spaces, PhD <strong>the</strong>sis, Stockholm University.<br />
Schulte im Walde, S. and Melinger, A. (2005). Identifying Semantic Relations and Functional<br />
Properties <strong>of</strong> Human Verb Associations, <strong>Proceedings</strong> <strong>of</strong> <strong>the</strong> joint Conference<br />
on Human Language Technology and Empirical Methods in Natural Language Processing,<br />
Vancouver, Canada, pp. 612–619.<br />
Schütze, H. (1998). Automatic word sense discrimination, Computational Linguistics<br />
24(1): 97–124.<br />
Vossen, P. (ed.) (1998). EuroWordNet: a Multilingual Database with Lexical Semantic<br />
Networks for European Languages, Dordrecht: Kluwer.<br />
Wu, Z. and Palmer, M. (1994). Verb semantics and lexical selection, <strong>Proceedings</strong> <strong>of</strong> <strong>the</strong><br />
32nd Annual Meeting <strong>of</strong> <strong>the</strong> Association for Computational Linguistics (ACL-94),<br />
Las Cruces, NM, pp. 133–138.<br />
EXAMINING THE NOTICING FUNCTION OF OUTPUT<br />
Maren Schierloh<br />
Michigan State University<br />
Abstract. Following Izumi and Bigelow’s research (Izumi and Bigelow, 2000), this study<br />
re-investigates <strong>the</strong> noticing function <strong>of</strong> output; that is, whe<strong>the</strong>r producing <strong>the</strong> target language<br />
focuses learners’ attention on second language (L2) structures in subsequent input. Izumi<br />
and Bigelow found no effects <strong>of</strong> output on ei<strong>the</strong>r noticing or acquisition. They attributed<br />
<strong>the</strong>ir findings to limitations in operationalizing noticing via underlining, coupled with <strong>the</strong><br />
relative difficulty <strong>of</strong> <strong>the</strong> target-structure (past-hypo<strong>the</strong>tical-conditional). Under <strong>the</strong> premise<br />
that <strong>the</strong> learner’s developmental level and attentional resources may constrain noticing, this<br />
partial replication addresses whe<strong>the</strong>r a less difficult structure may yield greater noticing and,<br />
consequently, greater L2 gains. Fifteen intermediate ESL learners were randomly assigned<br />
to two experimental groups (EGs) and one control group (CG). The first EG was given opportunities<br />
for output that elicited <strong>the</strong> past hypo<strong>the</strong>tical conditional (more difficult structure),<br />
while <strong>the</strong> second EG had opportunities to produce <strong>the</strong> present hypo<strong>the</strong>tical conditional (less<br />
difficult structure). The CG was not prompted to produce output that required use <strong>of</strong> ei<strong>the</strong>r<br />
structure. All groups engaged in follow-up reading and underlining activities. The reading<br />
texts modeled target-like use <strong>of</strong> <strong>the</strong> relevant structure for both EGs. Methodological<br />
triangulation measured noticing through underlining <strong>of</strong> <strong>the</strong> target-structure and stimulated<br />
recall to elicit data about cognitive processes involved. Additionally, noticing and L2 gains<br />
were assessed based on participants’ performance on subsequent essay-writing activities and<br />
posttests. Quantitative raw data revealed no effect <strong>of</strong> output (EGs vs. CG) or difficulty-level<br />
(EG1 vs. EG2) on <strong>the</strong> underlining <strong>of</strong> target forms in subsequent texts. Qualitative stimulated<br />
recall data, however, showed that output influences subsequent noticing <strong>of</strong> certain input<br />
elements; e.g. ’This is a good word for my essay’. Overall findings suggest that output<br />
can trigger noticing <strong>of</strong> vocabulary and fur<strong>the</strong>r illustrate how methodological triangulation<br />
can enhance insights into learners’ L2 processes. Thus, this study has ramifications for both<br />
classroom practices and research methodology.<br />
1 Introduction<br />
In <strong>the</strong> past decade <strong>of</strong> second language acquisition (SLA) research, <strong>the</strong> notion that noticing<br />
is essential for <strong>the</strong> acquisition <strong>of</strong> new linguistic systems has been a matter <strong>of</strong> debate<br />
(Jourdenais, 2001; Leow, 2002; Robinson, 2001; Schmidt, 2001; Simard and Wong, 2001;<br />
Tomlin and Villa, 1994; Truscott, 1998). Much <strong>of</strong> <strong>the</strong> argumentation is grounded in <strong>the</strong><br />
difficulty <strong>of</strong> operationalizing and measuring <strong>the</strong> second language (L2) learner’s internal<br />
cognitive processes. Research in SLA and cognitive science has raised questions as to<br />
<strong>the</strong> type and amount <strong>of</strong> ’attention’ necessary for language learning, <strong>the</strong> specific aspects <strong>of</strong><br />
language that are more likely to be noticed, and <strong>the</strong> extent to which <strong>the</strong> developmental<br />
level <strong>of</strong> <strong>the</strong> learner determines what is noticed.<br />
Recently, researchers have turned <strong>the</strong>ir attention to <strong>the</strong> role output plays in noticing.<br />
The oral or written production <strong>of</strong> language may consciously induce learners to realize<br />
<strong>the</strong> gap between what <strong>the</strong>y want to say and what <strong>the</strong>y can say. This noticing <strong>of</strong> linguistic<br />
limitations may prompt learners to seek solutions in subsequent input. A study by<br />
Izumi and Bigelow (2000) centered on <strong>the</strong> noticing function <strong>of</strong> output. They investigated<br />
whe<strong>the</strong>r L2 written output promotes noticing <strong>of</strong> form in subsequent text. They compared<br />
an experimental group, which produced output, to a control group, which did not produce<br />
any output but engaged in comprehension-based activities instead. The noticing <strong>of</strong> <strong>the</strong><br />
participants was operationalized through <strong>the</strong> participants’ underlining <strong>of</strong> <strong>the</strong> target structure<br />
in written text. Both groups underlined <strong>the</strong> same amount and Izumi and Bigelow<br />
concluded that output does not trigger noticing. Because Izumi and Bigelow’s inquiry<br />
may inform L2 pedagogy, <strong>the</strong> present study partially replicates<br />
<strong>the</strong>ir study by asking analogous research questions and by implementing a similar design.<br />
Yet, to achieve a more valid measure <strong>of</strong> noticing, this study uses stimulated recall to tap<br />
into learners’ cognitive processes. In addition to <strong>the</strong> stimulated recall data, this study<br />
quantitatively and qualitatively analyzes <strong>the</strong> data from learners’ underlining and written<br />
production to better examine a possible relationship between output, noticing and L2 development.<br />
This research also addresses whe<strong>the</strong>r a cognitively less demanding structure<br />
may have an effect on noticing by <strong>the</strong> learner. The following section provides a review<br />
<strong>of</strong> <strong>the</strong> literature on noticing, followed by sections detailing <strong>the</strong> difficulties associated with<br />
measuring noticing, <strong>the</strong> role output plays in noticing as well as <strong>the</strong> role <strong>of</strong> <strong>the</strong> learner<br />
level. The third section details <strong>the</strong> research methodology, and <strong>the</strong> subsequent sections<br />
provide a discussion <strong>of</strong> findings and limitations and a conclusion.<br />
2 Review <strong>of</strong> <strong>the</strong> Literature<br />
2.1 Noticing and SLA<br />
Since Schmidt (1990) first proposed his well-known “noticing hypo<strong>the</strong>sis”, a large body<br />
<strong>of</strong> SLA and cognitive science research has focused on <strong>the</strong> role <strong>of</strong> noticing, or conscious<br />
attention 1 , in promoting L2 development (Alanen, 1995; Leow, 2002; Rosa and O’Neill,<br />
1999). The noticing hypo<strong>the</strong>sis claims that noticing requires awareness and is a necessary<br />
condition for second language acquisition. Yet, some research findings are not in line with<br />
<strong>the</strong> premise that conscious attention is a necessary prerequisite for L2 acquisition (Gass,<br />
Svetics and Lemelin, 2003; Robinson, 1995).<br />
Truscott rejects <strong>the</strong> crucial role <strong>of</strong> noticing in <strong>the</strong> L2 learning process from a <strong>the</strong>oretical<br />
perspective, maintaining that noticing only advances metalinguistic knowledge but not<br />
competence. He fur<strong>the</strong>r contends that “awareness is not only unnecessary but also unhelpful”<br />
(Truscott, 1998, page 126). Such a narrow account <strong>of</strong> <strong>the</strong> role <strong>of</strong> noticing in SLA<br />
is certainly challenged by substantial L2 research data indicating that noticing facilitates<br />
L2 learning (Ellis, 1994; Long, 1996; Robinson, 1995; Swain and Lapkin, 1998).<br />
2.1.1 Operationalizing and Measuring Noticing<br />
At <strong>the</strong> heart <strong>of</strong> <strong>the</strong> ongoing debate on <strong>the</strong> role <strong>of</strong> noticing in SLA is <strong>the</strong> difficulty in<br />
operationalizing it, which requires introspection and assessment <strong>of</strong> learner-internal cognitive<br />
activities. For example, Schmidt (2001) operationalized noticing in terms <strong>of</strong> <strong>the</strong><br />
learners’ self-reporting ei<strong>the</strong>r during or immediately after exposure to <strong>the</strong> input; yet <strong>the</strong><br />
lack <strong>of</strong> self-reporting should not be interpreted as a lack <strong>of</strong> awareness, as some thinking<br />
processes may be difficult to verbalize (Jourdenais, 2001; Schmidt, 2001). As such, <strong>the</strong><br />
challenge facing <strong>the</strong> measurement <strong>of</strong> noticing is to accurately link observable behaviors<br />
by language learners to <strong>the</strong> construct <strong>of</strong> noticing. Methodologies used to qualitatively and<br />
1 Due to terminological vagueness <strong>of</strong> ’noticing’ resulting from related terms such as ’attention’<br />
(Leow, 2002) and ’awareness’ (Tomlin and Villa, 1994) in <strong>the</strong> noticing literature, Schmidt’s definition <strong>of</strong><br />
noticing as ’conscious attention’ has been adopted for <strong>the</strong> present study (Schmidt, 2001). Schmidt equates<br />
consciousness with awareness and/or attention.<br />
quantitatively account for a learner’s noticing <strong>of</strong> specific target language features fall<br />
into two categories: online, which measure <strong>the</strong> language learner’s noticing during performance<br />
<strong>of</strong> a certain language task, and <strong>of</strong>fline, which employ post-treatment assessment<br />
<strong>of</strong> noticing. Nei<strong>the</strong>r online nor <strong>of</strong>fline methodologies enable an absolute account <strong>of</strong> <strong>the</strong><br />
learners’ attentional processes.<br />
Online methodologies include, for example, think-aloud protocols which require <strong>the</strong><br />
participants to monitor and orally self-report <strong>the</strong>ir mental processes while <strong>the</strong>y perform a<br />
certain language task. Izumi and Bigelow used <strong>the</strong> online methodology <strong>of</strong> participants’<br />
underlining <strong>of</strong> “<strong>the</strong> word, words, or parts <strong>of</strong> <strong>the</strong> words that are [felt to be] particularly<br />
necessary for subsequent production” (Izumi and Bigelow, 2000, page 250). Izumi and<br />
Bigelow characterize underlining as an au<strong>the</strong>ntic procedure readers naturally do during<br />
a reading task, and argue that <strong>the</strong> marking <strong>of</strong> words would not occur without conscious<br />
awareness <strong>of</strong> <strong>the</strong> importance <strong>of</strong> that particular word or phrase. In partially replicating<br />
Izumi and Bigelow, <strong>the</strong> present study utilizes underlining as one integral attribute <strong>of</strong> <strong>the</strong><br />
triangulated measurement <strong>of</strong> noticing.<br />
The advantage <strong>of</strong> online measures, as opposed to post-exposure measures, is <strong>the</strong>ir instantaneous<br />
access to L2 processing, thus minimizing <strong>the</strong> risk <strong>of</strong> possible memory decay<br />
by <strong>the</strong> L2 learner (Gass and Mackey, 2000). Yet, stimulated recall has evolved as a<br />
sound and widely used <strong>of</strong>fline method to obtain data <strong>of</strong> <strong>the</strong> language learner’s thought<br />
processes. During stimulated recall, learners are prompted with a stimulus (e.g. <strong>the</strong>ir<br />
written products or a video showing <strong>the</strong>m engaging in <strong>the</strong> language task)<br />
and are asked to report on <strong>the</strong>ir thought processes while performing <strong>the</strong> language task.<br />
Note, however, that a lack <strong>of</strong> evidence <strong>of</strong> noticing in an online or <strong>of</strong>fline protocol does not<br />
necessarily imply absence <strong>of</strong> noticing.<br />
2.1.2 Developmental Level as a Factor in Noticing<br />
In addition to <strong>the</strong> concern over how noticing data should be collected and analyzed, current<br />
SLA research has scrutinized connections between <strong>the</strong> difficulty level <strong>of</strong> <strong>the</strong> target<br />
language input and <strong>the</strong> learner’s attentional resources (Ellis, 1994; Gass et al., 2003; Philp,<br />
2003; VanPatten, 1996). Long (1996), for instance, found that <strong>the</strong> pr<strong>of</strong>iciency <strong>of</strong> <strong>the</strong><br />
learner may modulate noticing. Advanced learners may benefit from <strong>the</strong> increasing automaticity<br />
which allows <strong>the</strong>m to attend to more complex structures. A recent study by Philp<br />
(2003) similarly revealed that <strong>the</strong> developmental level <strong>of</strong> <strong>the</strong> learners was one factor<br />
determining accurate recall <strong>of</strong> <strong>the</strong> reformulation by <strong>the</strong> native speaker. Thus, developmental<br />
readiness may constrain <strong>the</strong> learner’s attention to aspects <strong>of</strong> more difficult structures. In<br />
a similar vein, Robinson (1995) argued that <strong>the</strong> extent to which a language learner may<br />
notice a particular form <strong>of</strong> <strong>the</strong>ir linguistic limitations is dependent on <strong>the</strong> demands <strong>of</strong> <strong>the</strong><br />
pedagogical task.<br />
In <strong>the</strong> research by Izumi and Bigelow (2000), <strong>the</strong> study to be partially replicated here,<br />
<strong>the</strong> past hypo<strong>the</strong>tical conditional was selected as <strong>the</strong> target structure, based on <strong>the</strong> rationale<br />
that this structure poses some difficulty to <strong>the</strong> learner, which may trigger noticing <strong>of</strong><br />
linguistic limitations. Yet, given learner level and attentional capacities for <strong>the</strong> target structure,<br />
it is <strong>of</strong> present interest whe<strong>the</strong>r reduced cognitive demands may yield greater noticing<br />
and, in turn, greater L2 gains.<br />
2.2 The Noticing Function <strong>of</strong> Output<br />
Underlying <strong>the</strong> relationship between noticing and SLA is <strong>the</strong> question <strong>of</strong> under what circumstances<br />
L2 learners may notice linguistic forms. Is it through input or through output,<br />
or both in combination? While <strong>the</strong> essential role <strong>of</strong> input for SLA is universally accepted,<br />
<strong>the</strong> sufficiency <strong>of</strong> input for acquisition has been debated since Swain first proposed her<br />
Output Hypo<strong>the</strong>sis (Swain, 1985) in reaction to Krashen’s view <strong>of</strong> primacy <strong>of</strong> comprehensible<br />
input (Krashen, 1982). While Swain does not negate <strong>the</strong> importance <strong>of</strong> input,<br />
she argues that “L2 output pushes learners to process language more deeply (with more<br />
mental effort) than does input” (Swain, 1995). A series <strong>of</strong> studies by Swain and Lapkin<br />
revealed noticing as one <strong>of</strong> <strong>the</strong> main reasons why producing output mediates L2 development<br />
(Swain and Lapkin, 1995; Swain and Lapkin, 1998). As such, <strong>the</strong>ir argument corresponds<br />
to Schmidt’s Noticing Hypo<strong>the</strong>sis. Because output focuses <strong>the</strong> learner’s attention<br />
on <strong>the</strong> L2 structures <strong>the</strong>y produce (<strong>the</strong>ir interlanguage), it enables <strong>the</strong>m to compare <strong>the</strong>ir<br />
interlanguage to <strong>the</strong> target language <strong>the</strong>y receive, <strong>the</strong>reby attending to <strong>the</strong>ir linguistic limitations<br />
(Gass and Varonis, 1994). If relevant input is immediately available afterwards,<br />
<strong>the</strong> noticing <strong>of</strong> <strong>the</strong> gap may cause <strong>the</strong> learner to process <strong>the</strong> subsequent input with more<br />
focused attention. This hypo<strong>the</strong>sis was investigated by Izumi and Bigelow (2000),<br />
whose study constitutes <strong>the</strong> basis <strong>of</strong> <strong>the</strong> research reported here.<br />
2.3 Izumi and Bigelow 2000<br />
Izumi and Bigelow addressed <strong>the</strong> issue <strong>of</strong> output and noticing in <strong>the</strong>ir study guided by<br />
two questions: (1) “Do output activities promote <strong>the</strong> noticing <strong>of</strong> linguistic form in subsequent<br />
input?” and (2) “Do <strong>the</strong>se output-input-activities result in improved production <strong>of</strong><br />
<strong>the</strong> target form?” (Izumi and Bigelow, 2000, page 247). They compared an EG, which<br />
was engaged in output tasks (essay writing and text reconstruction) to a CG, which did<br />
not produce any written output. Both groups received <strong>the</strong> same textual input for <strong>the</strong> subsequent<br />
reading and underlining activity; however, <strong>the</strong> groups were given different purposes<br />
for underlining, which may have influenced participants’ attentional focus. In <strong>the</strong><br />
present study, all participants received <strong>the</strong> same instructions for <strong>the</strong> reading and underlining<br />
activity. In Izumi and Bigelow (2000), noticing <strong>of</strong> <strong>the</strong> target form (past hypo<strong>the</strong>tical<br />
conditional in English 2 ) was assessed through underlining and through <strong>the</strong> demonstration<br />
<strong>of</strong> uptake (correct use <strong>of</strong> <strong>the</strong> target form by <strong>the</strong> learner) as a complementary measure <strong>of</strong><br />
noticing and acquisition <strong>of</strong> that form. The study presented here did not treat uptake as a<br />
distinct measurement <strong>of</strong> noticing or acquisition, but qualitatively examined to what extent<br />
learners’ uptake corresponds to prior noticing. Izumi and Bigelow attributed <strong>the</strong>ir<br />
non-significant findings to <strong>the</strong> relative difficulty <strong>of</strong> <strong>the</strong> target structure. Thus, this study<br />
investigates learners’ noticing when engaging with a less complex yet similar structure:<br />
<strong>the</strong> present hypo<strong>the</strong>tical conditional 3 .<br />
Except for one statistically significant increase <strong>of</strong> performance from <strong>the</strong> pretest to <strong>the</strong><br />
second posttest <strong>of</strong> <strong>the</strong> experimental group, Izumi and Bigelow found no statistically<br />
significant between-group differences on any measure. Both groups underlined nearly <strong>the</strong><br />
same percentage <strong>of</strong> conditional-related forms. They concluded that output did not draw<br />
<strong>the</strong> learner’s attention to <strong>the</strong> targeted form, and non-significant results were attributed to<br />
2 e.g. If Lisa had traveled to Spain, she would have seen <strong>the</strong> Olympic Games.<br />
3 e.g. If Lisa traveled to Spain, she would see <strong>the</strong> Olympic Games.<br />
effects <strong>of</strong> input flood and individual variation. I argue that underlining as a single measure<br />
gives an insufficient account <strong>of</strong> learners’ cognitive processes, and I hypo<strong>the</strong>size that <strong>the</strong><br />
output treatment could have been observed to trigger noticing if additional qualitative and<br />
quantitative measures had been employed. Therefore, <strong>the</strong> present study follows Izumi and<br />
Bigelow’s suggestion to implement “methodological triangulation as <strong>the</strong> research design<br />
allows” (Izumi and Bigelow, 2000, page 271) by operationalizing noticing through target-structure<br />
underlining and reporting <strong>of</strong> conscious attention during <strong>the</strong> stimulated recall<br />
session. In o<strong>the</strong>r words, through triangulated data collection, noticing is investigated<br />
from multiple perspectives.<br />
3 Research Questions and Hypo<strong>the</strong>ses<br />
In order to validly replicate Izumi and Bigelow’s study (Izumi and Bigelow, 2000), similar<br />
research questions are pursued along with <strong>the</strong>ir congruent hypo<strong>the</strong>ses:<br />
RQ1: Do output activities promote noticing <strong>of</strong> linguistic form in subsequent input?<br />
RQ2: Do <strong>the</strong>se output-input activities result in improved production <strong>of</strong> <strong>the</strong> target<br />
form?<br />
It is hypo<strong>the</strong>sized that <strong>the</strong> experimental groups, which are required to produce output,<br />
would show greater noticing <strong>of</strong> <strong>the</strong> target-structure contained in <strong>the</strong> input than <strong>the</strong> control<br />
group, which does not produce output requiring <strong>the</strong> use <strong>of</strong> <strong>the</strong> target-structure. Fur<strong>the</strong>rmore,<br />
on <strong>the</strong> posttests, <strong>the</strong> experimental groups are expected to demonstrate greater gains<br />
in accuracy <strong>of</strong> <strong>the</strong>ir use <strong>of</strong> <strong>the</strong> target form than <strong>the</strong> control group. Given that prior research<br />
found <strong>the</strong> language learner’s developmental level to be associated with attentional<br />
resources available for <strong>the</strong> target-structure, it is hypo<strong>the</strong>sized that a less difficult target-structure<br />
promotes greater noticing and greater L2 gains. Thus, <strong>the</strong> present study is fur<strong>the</strong>r<br />
guided by <strong>the</strong> following research question:<br />
RQ3: Does <strong>the</strong> present hypo<strong>the</strong>tical conditional, as a less difficult structure, promote<br />
greater noticing compared to <strong>the</strong> past-hypo<strong>the</strong>tical-conditional structure?<br />
4 Methodology<br />
4.1 Participants<br />
Fifteen intermediate ESL learners enrolled in <strong>the</strong> second semester ESL academic writing<br />
class at Michigan State University have participated up to this point. <strong>Student</strong>s’ enrollment<br />
in <strong>the</strong> ESL academic writing class is determined by a placement test or by passing <strong>the</strong><br />
previous course. The ESL learners were from a variety <strong>of</strong> L1 backgrounds including<br />
Cantonese, Japanese, Korean and Arabic with an average <strong>of</strong> 8.7 years <strong>of</strong> previous English<br />
study 4 . Three students have lived in <strong>the</strong> United States for more than two years, and <strong>the</strong><br />
remaining students have resided <strong>the</strong>re for at least a year. Upon completion <strong>of</strong> <strong>the</strong> questionnaire,<br />
participants were randomly assigned to one <strong>of</strong> <strong>the</strong> two experimental groups (EGs) or to<br />
<strong>the</strong> single control group (CG).<br />
4 It needs to be noted that <strong>the</strong> different native languages <strong>of</strong> <strong>the</strong> learners affect <strong>the</strong>ir proximity to (distance<br />
from) English, which could make some structures easier (more difficult) to process for some learners than<br />
for o<strong>the</strong>rs. The native language <strong>of</strong> <strong>the</strong> participant was not systematically investigated here.<br />
4.2 Procedures<br />
The experiment followed a pretest-posttest design. The researcher met one-on-one with<br />
each participant for about 1 or 1.5 hours depending on whe<strong>the</strong>r <strong>the</strong> participants chose<br />
to take part in <strong>the</strong> stimulated recall session or not. The participants were informed <strong>of</strong><br />
<strong>the</strong> sequence <strong>of</strong> <strong>the</strong> activities before <strong>the</strong>y completed <strong>the</strong> pretest (see Appendix A for an<br />
example) and <strong>the</strong> reading and writing activities. Participants assigned to <strong>the</strong> first experimental<br />
group (EG1) composed an essay that elicited <strong>the</strong> past hypo<strong>the</strong>tical conditional<br />
(Appendix B), whereas participants assigned to <strong>the</strong> second experimental group (EG2)<br />
composed an essay that elicited <strong>the</strong> present hypo<strong>the</strong>tical conditional. Participants in <strong>the</strong><br />
control group (CG) engaged in a writing task that did not require <strong>the</strong> use <strong>of</strong> ei<strong>the</strong>r structure.<br />
Each participant subsequently received input that modeled <strong>the</strong> correct use <strong>of</strong> <strong>the</strong><br />
relevant target structures (Appendix C); yet, for <strong>the</strong> CG, <strong>the</strong> reading text did not serve<br />
as a model. All groups were instructed to ei<strong>the</strong>r underline what [<strong>the</strong>y] feel is important<br />
for re-writing <strong>the</strong> essay or underline what [<strong>the</strong>y] feel is important for writing an essay<br />
about this topic. By leaving <strong>the</strong> words to be underlined unspecified, <strong>the</strong> learner’s attentional<br />
foci were not predisposed. Before <strong>the</strong> participants carried out <strong>the</strong> actual task, <strong>the</strong><br />
grammar-focused underlining was demonstrated to <strong>the</strong> students using a passage that did<br />
not contain <strong>the</strong> target-structure 5 . Following <strong>the</strong> reading and underlining activity, all participants<br />
in <strong>the</strong> EGs reproduced <strong>the</strong>ir initial essay, whereas <strong>the</strong> CG wrote about<br />
<strong>the</strong> EGs’ initial essay topic for <strong>the</strong> first time. The immediate posttest was administered<br />
upon completion <strong>of</strong> <strong>the</strong> second essay writing activity or after <strong>the</strong> stimulated recall session<br />
depending on whe<strong>the</strong>r or not participants took part in <strong>the</strong> stimulated recall interview. The<br />
delayed posttest was given after one week had passed 6 . Four participants <strong>of</strong> each EG and<br />
three participants <strong>of</strong> <strong>the</strong> CG volunteered to be videotaped during <strong>the</strong> reading activity.<br />
To better track <strong>the</strong>ir focus during <strong>the</strong> reading and underlining task, <strong>the</strong> videotaped participants<br />
were asked to read aloud. Immediately following completion <strong>of</strong> <strong>the</strong> second essay,<br />
<strong>the</strong> videotape was rewound and played to <strong>the</strong> learner. While <strong>the</strong> videotape was playing, <strong>the</strong><br />
researcher stopped <strong>the</strong> tape after episodes that appeared to involve noticing <strong>of</strong> linguistic<br />
features (i.e. underlining or hesitation) and asked <strong>the</strong> student to describe his/her thoughts<br />
during that time. English was used during all interactions between <strong>the</strong> participants and<br />
<strong>the</strong> researcher, which were audio recorded for transcription purposes.<br />
5 Results and Discussion<br />
The first research question asked whe<strong>the</strong>r output activities promote noticing <strong>of</strong> grammatical<br />
features in subsequent input. In a restricted way, <strong>the</strong> hypo<strong>the</strong>sis predicting greater<br />
noticing <strong>of</strong> <strong>the</strong> target forms for <strong>the</strong> EGs than <strong>the</strong> CG was not confirmed (p 0.5) 7 . All<br />
participants underlined vocabulary items ra<strong>the</strong>r than <strong>the</strong> grammatical cues in <strong>the</strong> reading<br />
text. However, <strong>the</strong> present study does show that output had an effect on learners’<br />
attentional foci and input processing. While no participant appeared to notice <strong>the</strong> target<br />
form, most participants’ attention was drawn to <strong>the</strong> vocabulary in order to process <strong>the</strong><br />
main message <strong>of</strong> <strong>the</strong> input passages. The predicted effect <strong>of</strong> output in promoting noticing<br />
5 Modeling familiarizes <strong>the</strong> learners with <strong>the</strong> underlining procedure and increases precision <strong>of</strong> <strong>the</strong> measure.<br />
6 Three students did not show up for <strong>the</strong> delayed posttest.<br />
7 Wilcoxon signed-rank tests were used for within-group comparisons.<br />
<strong>of</strong> <strong>the</strong> correct use <strong>of</strong> conditional sentences was not supported in this study. Similarly,<br />
<strong>the</strong> output-input-output treatment did not alter <strong>the</strong> students’ level <strong>of</strong> performance on <strong>the</strong> immediate<br />
and delayed posttests when compared to <strong>the</strong> input-output treatment. However,<br />
output appeared to trigger noticing <strong>of</strong> vocabulary, style, and some content issues. This<br />
finding will be discussed in more detail below.<br />
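How a within-group (pretest vs. posttest) comparison with a Wilcoxon signed-rank test works can be sketched in Python.<br />
The function name signed_rank_exact and the scores below are hypothetical placeholders, not data from this study,<br />
and a group size of five is an assumption. The sketch ranks the absolute pretest-posttest differences and obtains an<br />
exact two-sided p-value by enumerating every possible assignment of signs to those ranks.<br />

```python
from itertools import product


def signed_rank_exact(pre, post):
    """Return (W+, exact two-sided p) for paired scores; zero differences are dropped."""
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0

    # Rank the absolute differences, giving tied values their average rank.
    abs_sorted = sorted(abs(d) for d in diffs)

    def rank_of(x):
        first = abs_sorted.index(x) + 1
        count = abs_sorted.count(x)
        return first + (count - 1) / 2

    ranks = [rank_of(abs(d)) for d in diffs]
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = sum(ranks) / 2

    # Under the null hypothesis each difference is equally likely to be + or -.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - mean_w) >= abs(w_plus - mean_w) - 1e-9:
            extreme += 1
    return w_plus, extreme / 2 ** n


# Hypothetical pretest and posttest scores for five learners who all improve.
pretest = [2, 3, 1, 4, 2]
posttest = [3, 5, 4, 8, 7]
print(signed_rank_exact(pretest, posttest))
```

Even when all five hypothetical learners improve, <strong>the</strong> exact two-sided p-value is 2/32 = 0.0625,<br />
which illustrates how little a within-group test can show for groups <strong>of</strong> this size.<br />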
The second research question addressed <strong>the</strong> acquisition issue and inquired whe<strong>the</strong>r<br />
output-input activities result in improved production <strong>of</strong> <strong>the</strong> target form. The present<br />
study did not yield clear results in support <strong>of</strong> such a relationship, mainly because <strong>the</strong><br />
noticing scores could not be sufficiently squared with posttest scores as <strong>the</strong>re was a lack<br />
<strong>of</strong> grammar-related noticing with all candidates during <strong>the</strong> treatment phase. Put ano<strong>the</strong>r<br />
way, <strong>the</strong> posttests do not provide a measure <strong>of</strong> <strong>the</strong> effect <strong>of</strong> noticing. Future research will<br />
need to use correlation analyses in order to square <strong>the</strong> underlining, stimulated recall, and<br />
second essay scores (as a measure <strong>of</strong> noticing) with gain on individualized vocabulary<br />
posttests. While data from <strong>the</strong> underlining, second essay, and stimulated recall point to a<br />
link between noticing and <strong>the</strong> subsequent use <strong>of</strong> noticed items, it would be premature<br />
to claim a causal relationship between noticing and acquisition.<br />
Under <strong>the</strong> premise that attentional resources constrain noticing, <strong>the</strong> third research question<br />
asked whe<strong>the</strong>r a less difficult structure promotes greater noticing than a more difficult<br />
structure. The results <strong>of</strong> this study suggest that <strong>the</strong> less difficult structure had no effect<br />
on noticing or L2 learning 8 . There was no notable difference between EG1 and EG2 performance<br />
on any measure. Of course, any interpretation <strong>of</strong> <strong>the</strong> test-, noticing-, or essay<br />
scores would be unconvincing, given that only three candidates could be compared to<br />
ano<strong>the</strong>r set <strong>of</strong> three candidates. The small number <strong>of</strong> participants notwithstanding, one<br />
possible explanation for this finding might be that <strong>the</strong> less difficult structure was not significantly<br />
easier. The production and processing <strong>of</strong> <strong>the</strong> present-hypo<strong>the</strong>tical-conditional<br />
may have been just as cognitively demanding as <strong>the</strong> past-hypo<strong>the</strong>tical-conditional. Thus,<br />
<strong>the</strong> results <strong>of</strong> this research cannot validate (or invalidate) <strong>the</strong> claim that <strong>the</strong> demands<br />
<strong>of</strong> <strong>the</strong> targeted grammatical structure or <strong>the</strong> complexity <strong>of</strong> <strong>the</strong> pedagogical task have no<br />
effect on learners’ attentional resources. In order to tap into a possible relationship between<br />
cognitive demands and noticing, <strong>the</strong> fact that one task is indeed cognitively more<br />
demanding must first be established. Before this research project is fur<strong>the</strong>r pursued, <strong>the</strong><br />
relative difficulty <strong>of</strong> both structures needs to be evaluated with a larger number <strong>of</strong> ESL<br />
learners.<br />
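To make the sample-size caveat concrete, the following minimal Python sketch enumerates the exact null distribution of the Mann-Whitney U statistic (the test named in footnote 8) for two groups of three. The scores are hypothetical, not the study's data, and are assumed to contain no ties; the function name `exact_two_sided_p` is an illustrative choice. With three candidates per group, even complete separation of the two groups gives a two-sided p-value of 2/20 = 0.1, so a comparison of this size cannot reach the conventional 0.05 level.

```python
from itertools import combinations
from math import comb


def exact_two_sided_p(group_a, group_b):
    """Exact two-sided Mann-Whitney U p-value by enumerating all group splits.

    Assumes that no two scores are tied, so the ranks are simply 1..N.
    """
    pooled = sorted(group_a + group_b)
    ranks = {score: rank for rank, score in enumerate(pooled, start=1)}
    n_a, n_total = len(group_a), len(pooled)
    # U for group A: its rank sum minus the smallest possible rank sum.
    min_rank_sum = n_a * (n_a + 1) // 2
    u_observed = sum(ranks[s] for s in group_a) - min_rank_sum
    u_mean = n_a * (n_total - n_a) / 2
    # Under the null hypothesis every choice of n_a ranks is equally likely.
    extreme = 0
    for chosen in combinations(range(1, n_total + 1), n_a):
        u = sum(chosen) - min_rank_sum
        if abs(u - u_mean) >= abs(u_observed - u_mean):
            extreme += 1
    return extreme / comb(n_total, n_a)


# Hypothetical scores (not the study's data): the groups do not overlap at all.
eg1 = [14, 15, 17]
eg2 = [9, 10, 12]
smallest_p = exact_two_sided_p(eg1, eg2)
print(smallest_p)  # 0.1
```

Any other arrangement of six untied scores yields a larger p-value, which is why a larger number of participants is needed before the group comparison can be interpreted.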
8 Mann-Whitney U tests were used for between-group comparisons.<br />
Although the research hypotheses of this study have not been supported, this research<br />
demonstrates that noticing has occurred. These insights contrast with Izumi and Bigelow’s<br />
conclusion that output does not trigger learners’ noticing (Izumi and Bigelow, 2000). The<br />
present study demonstrates that output treatment influences learners’ subsequent cognitive<br />
processes, e.g., “that is good way to say. I also wanna say something like that, but my<br />
essay is not so good, so I try to remember”. As such, output focused the learner’s attention<br />
on specific linguistic features in the output, and those noticed features were then compared<br />
to the features the learner had produced in their first writing activity. Yet, the data from<br />
this study leave open whether (and to what extent) the noticed features were<br />
incorporated into the interlanguage system. Chaudron (1985) argued that L2 learning involves<br />
two stages: first, the perception of input (noticing), and second, the integration of<br />
intake into the learner’s interlanguage system. Gass and Varonis (1994) similarly suggested<br />
that learners need to apperceive input before it can become intake. According to<br />
Ellis “Intake occurs when learners take features into <strong>the</strong>ir short- or medium-term memories,<br />
whereas interlanguage change occurs only when <strong>the</strong>y become part <strong>of</strong> long term<br />
memory” (Ellis, 1997, page 119). Accordingly, <strong>the</strong> learner has to convert from preliminary<br />
to final intake. It would be interesting to investigate whe<strong>the</strong>r <strong>the</strong> learners in this<br />
study process <strong>the</strong> new linguistic items (e.g., vocabulary) beyond noticing and immediate<br />
intake, in order to contribute to <strong>the</strong>ory building on input, intake, and L2 acquisition. A<br />
possible way <strong>of</strong> approaching this would be to include a delayed essay production task to<br />
see whe<strong>the</strong>r, and to what extent, <strong>the</strong> learners arrived at <strong>the</strong> “final intake stage”.<br />
The overall findings <strong>of</strong> <strong>the</strong> present study indicate that learners processed <strong>the</strong> input<br />
primarily for meaning. Although no form-focused comparisons were invoked, EG candidates<br />
noticed a difference between <strong>the</strong>ir word choice and style and those <strong>of</strong> <strong>the</strong> native<br />
speaker. These findings are in line with VanPatten (1996), who proposed the following input<br />
processing principles:<br />
1. The Primacy <strong>of</strong> Meaning Principle: Learners process input for meaning before <strong>the</strong>y<br />
process it for form.<br />
2. The Primacy <strong>of</strong> Content Words Principle: Learners process content words in <strong>the</strong><br />
input before anything else.<br />
3. The Lexical Preference Principle: Learners will tend to rely on lexical items as<br />
opposed to grammatical form to get meaning when both encode <strong>the</strong> same semantic<br />
information 9 .<br />
Applying VanPatten’s principles to <strong>the</strong> present study, it might be that all participants<br />
processed meaningful elements in <strong>the</strong> input while reading <strong>the</strong> input text. This may explain<br />
why <strong>the</strong>y did not underline grammatical elements such as modals like would and could,<br />
auxiliaries and past participles. Because learners were not capable of attending to both vocabulary<br />
and grammar at the same time, the past/present hypothetical conditional may have been processed<br />
only peripherally.<br />
Based on <strong>the</strong> input processing principles, VanPatten (1996) investigated <strong>the</strong> effects<br />
<strong>of</strong> processing instruction, revealing that learners’ focal attention during processing can<br />
be directed toward <strong>the</strong> relevant grammatical items and, in turn, enhance L2 learning.<br />
Follow-up research should investigate whe<strong>the</strong>r input enhancement or specific instructions<br />
to underline grammatical structures (e.g., <strong>the</strong> past/present hypo<strong>the</strong>tical conditional) would<br />
enhance noticing, intake, and L2 acquisition. The present study deliberately gave no specific<br />
instructions for the underlining other than to underline what is important for subsequent<br />
production: the study’s objective was to see whether output which requires<br />
use of a particular structure results in underlining of that particular structure in the subsequent<br />
input passage. If the learners had been told to underline grammatical structures, their<br />
attentional foci would have been predisposed, as was the case in Izumi and Bigelow<br />
(2000).<br />
9 Only the relevant subset of the entire set of input processing principles is presented here.<br />
The findings in the present study also raise important methodological issues that should<br />
be addressed in future studies that investigate the role of noticing in SLA. First and foremost,<br />
this study has shown that triangulated or multiple data-elicitation measures can<br />
provide a much richer picture of learners’ internal processes. In this study, the<br />
underlining, the essays, the test scores, and the verbal reports from the stimulated recall<br />
session all helped to piece together the role of output and noticing in second language<br />
acquisition. Although verbal stimulated recall reports cannot provide a complete reflection<br />
<strong>of</strong> actual internal processing, <strong>the</strong>y provided useful information as to how learners’<br />
minds process language information when <strong>the</strong> learners articulated <strong>the</strong>ir concerns (e.g., I<br />
wanna say something like that, but my essay is not so good, so I try to remember) or<br />
when <strong>the</strong>y made comparisons to <strong>the</strong>ir first essay (e.g., This is a big word I want to remember).<br />
Learners underlined <strong>the</strong> words that captured <strong>the</strong> author’s key message, and<br />
<strong>the</strong>ir comments reflected <strong>the</strong>ir intent in seeking meaning and better vocabulary for use in<br />
their second essay. The stimulated recall protocols obtained in this study collectively<br />
demonstrate that <strong>the</strong> learners did not attend to grammatical features. Additionally, <strong>the</strong><br />
data from <strong>the</strong> first and second essays illustrate that students improved <strong>the</strong>ir expression<br />
and word choice, but not <strong>the</strong>ir grammatical accuracy. Izumi and Bigelow were unable to<br />
draw such conclusions as <strong>the</strong>y limited <strong>the</strong>ir measurement <strong>of</strong> noticing and <strong>the</strong>ir measurement<br />
<strong>of</strong> acquisition to <strong>the</strong> underlining <strong>of</strong> conditional related items and posttest scores,<br />
respectively. Izumi and Bigelow found that output does not prompt <strong>the</strong> learners to “notice<br />
<strong>the</strong> gap”. The present study, by contrast, reveals that some learners were aware that <strong>the</strong>y<br />
could not express themselves as fully as they wished (e.g., I want to say negotiate in<br />
my essay, but I don’t remember it). They noticed their restricted lexicon and searched<br />
for more appropriate words in the input passage. In other words, students became aware of lexical<br />
gaps, which directed their attention to vocabulary in subsequent input.<br />
6 Limitations and Future Research<br />
Although <strong>the</strong> present study sheds some light on meaning-focused processing and noticing<br />
as well as methodological issues, <strong>the</strong>re are some limitations that need to be acknowledged.<br />
First and foremost, <strong>the</strong> small number <strong>of</strong> participants clearly limits <strong>the</strong> generalization <strong>of</strong><br />
findings to a broader variety of L2 learners 10 . Continuing this research with a minimum<br />
of twenty-one participants will reveal whether the current trends hold true. Further<br />
study may include a short questionnaire asking non-stimulated-recall participants what they noticed<br />
and what they assume the purpose of the reading and writing tasks<br />
was.<br />
The testing instruments employed in this study are limited in length and scope, which<br />
may have affected the measurement of L2 attainment. Whereas a more comprehensive<br />
test <strong>of</strong> <strong>the</strong> past-hypo<strong>the</strong>tical-conditional may yield more valid results, it may also prompt<br />
participants to pay closer attention to <strong>the</strong> form in <strong>the</strong> input passage. Consequently, a tenable<br />
comparison between output and no-output treatment would be difficult, as all groups<br />
would produce <strong>the</strong> target form to <strong>the</strong> same extent. As mentioned earlier, in order to better<br />
understand <strong>the</strong> relationship between attention and learning, future research may develop<br />
tests that examine students’ acquisition <strong>of</strong> noticed vocabulary items. For such measurement,<br />
individualized delayed posttests in which <strong>the</strong> noticed (underlined and commented)<br />
items are assessed in terms <strong>of</strong> adequate usage and comprehension would be appropriate.<br />
10 The fact that the participants were willing to take part in the study outside of class time may have led<br />
to a participant body that is more motivated and eager to improve than the average intermediate ESL learner.<br />
7 Conclusions<br />
The purpose <strong>of</strong> this study was to investigate <strong>the</strong> effects <strong>of</strong> output and cognitive demands<br />
on noticing and second language acquisition. It makes two contributions. First,<br />
this study has demonstrated how multiple perspectives can help to obtain insights into<br />
learners’ cognitive processes. Second, the results of this study support the noticing<br />
function of output to some extent. Output-input treatment has been shown to trigger comparison<br />
of the learner’s interlanguage lexicon with language produced by a native speaker.<br />
Fur<strong>the</strong>rmore, this study demonstrates that learners primarily attend to meaning, which<br />
is in line with VanPatten’s input processing principles (VanPatten, 1996). However, <strong>the</strong><br />
overall results do not allow for clear conclusions. Much more research is needed to find<br />
<strong>the</strong> extent to which learners notice specific features in <strong>the</strong> input as well as to explore <strong>the</strong><br />
very mechanisms of noticing. Until then, what takes place in the<br />
learner’s head remains complex and opaque.<br />
References<br />
Alanen, R. (1995). Input enhancement and rule representation in second language acquisition,<br />
in R. Schmidt (ed.), Attention and Awareness in Foreign Language Learning,<br />
University <strong>of</strong> Hawai’i Press, Honolulu.<br />
Chaudron, C. (1985). Intake: On models and methods for discovering learners’ processing<br />
<strong>of</strong> input, Studies in Second Language Acquisition 7(1): 1–14.<br />
Ellis, R. (1994). Factors in <strong>the</strong> incidental acquisition <strong>of</strong> second language vocabulary from<br />
oral input: A review essay, Applied Language Learning 5(1): 1–32.<br />
Ellis, R. (1997). SLA Research and Language Teaching, Oxford University Press, Oxford.<br />
Gass, S. and Mackey, A. (2000). Stimulated recall methodology in second language<br />
research, Lawrence Erlbaum Associates, London.<br />
Gass, S., Svetics, I. and Lemelin, S. (2003). Differential effects <strong>of</strong> attention, Language<br />
Learning 53(3): 497–545.<br />
Gass, S. and Varonis, E. M. (1994). Input, interaction and second language production,<br />
Studies in Second Language Acquisition 16(3): 283–302.<br />
Izumi, S. and Bigelow, M. (2000). Does output promote noticing and second language<br />
acquisition?, TESOL Quarterly 34(2): 239–287.<br />
Jourdenais, R. (2001). Cognition, instruction and protocol analysis, in P. Robinson (ed.),<br />
Cognition and Second Language Instruction, Cambridge University Press, New<br />
York.<br />
Krashen, S. (1982). Principles and Practice in Second Language Acquisition, Pergamon,<br />
Oxford.<br />
Leow, R. P. (2002). Models, attention, and awareness in SLA, Studies in Second Language<br />
Acquisition 24(1): 113–119.<br />
Long, M. (1996). The role <strong>of</strong> linguistic environment in second language acquisition,<br />
in W. C. Ritchie and T. K. Bhatia (eds), The Handbook <strong>of</strong> Language Acquisition,<br />
Academic Press, San Diego.<br />
Philp, J. (2003). Constraints on noticing <strong>the</strong> gap, Studies in Second Language Acquisition<br />
25(1): 99–126.<br />
Robinson, P. (1995). Attention, memory, and <strong>the</strong> noticing hypo<strong>the</strong>sis, Language Learning<br />
45(2): 283–331.<br />
Robinson, P. (2001). Individual differences, cognitive abilities, aptitude complexes and<br />
learning conditions in second language acquisition, Second Language Research<br />
17(4): 368–392.<br />
Rosa, E. and O’Neill, M. D. (1999). Explicitness, intake and <strong>the</strong> issue <strong>of</strong> awareness,<br />
Studies in Second Language Acquisition 21(4): 511–556.<br />
Schmidt, R. (1990). The role <strong>of</strong> consciousness in second language learning, Applied<br />
Linguistics 11(2): 129–158.<br />
Schmidt, R. (2001). Attention, in P. Robinson (ed.), Cognition and Second Language<br />
Instruction, Cambridge University Press, New York.<br />
Simard, D. and Wong, W. (2001). Alertness, orientation and detection, Studies in Second<br />
Language Acquisition 23(1): 103–124.<br />
Swain, M. (1985). Communicative competence: Some roles <strong>of</strong> comprehensible input and<br />
comprehensible output in its development, in S. Gass and C. Madden (eds), Input in<br />
Second Language Acquisition, Heinle & Heinle, Boston.<br />
Swain, M. (1995). Three functions of output in second language learning, in G. Cook and<br />
B. Seidlhofer (eds), Principles and practice in applied linguistics: Studies in honor<br />
of H. Widdowson, Oxford University Press, Oxford.<br />
Swain, M. and Lapkin, S. (1995). Problems in output and <strong>the</strong> cognitive processes <strong>the</strong>y<br />
generate: A step towards second language learning, Applied Linguistics 16(3): 371–<br />
391.<br />
Swain, M. and Lapkin, S. (1998). Interaction and second language learning: Two adolescent<br />
French immersion students working together, Modern Language Journal<br />
82(3): 320–337.<br />
Tomlin, R. and Villa, V. (1994). Attention in cognitive science and second language<br />
acquisition, Studies in Second Language Acquisition 16(2): 183–204.<br />
Truscott, J. (1998). Noticing in second language acquisition: A critical review, Second<br />
Language Research 14(2): 103–135.<br />
VanPatten, B. (1996). Input processing and grammar instruction in second language<br />
acquisition, Ablex, Westport.<br />
CDIPROVER3: A TOOL FOR PROVING DERIVATIONAL COMPLEXITIES<br />
OF TERM REWRITING SYSTEMS<br />
Andreas Schnabl<br />
University <strong>of</strong> Innsbruck<br />
Abstract. This paper describes cdiprover3, a tool for proving termination of term rewrite<br />
systems by polynomial interpretations and context dependent interpretations. The methods<br />
used by cdiprover3 induce small bounds on <strong>the</strong> derivational complexity <strong>of</strong> <strong>the</strong> considered<br />
system. We explain <strong>the</strong> tool in detail, and give an overview <strong>of</strong> <strong>the</strong> employed pro<strong>of</strong> methods.<br />
1 Introduction<br />
Term rewriting is a Turing complete model <strong>of</strong> computation, which is conceptually closely<br />
related to declarative and (first-order) functional programming. One <strong>of</strong> its most studied<br />
properties, termination, is also a central problem in computer science. This property is<br />
undecidable in general, but many partial decision methods have been developed in <strong>the</strong><br />
last decades. Beyond showing termination <strong>of</strong> a given rewriting system, some <strong>of</strong> <strong>the</strong>se<br />
methods can also give bounds on different measures <strong>of</strong> its complexity. As suggested in<br />
(H<strong>of</strong>bauer and Lautemann, 1989), a natural way <strong>of</strong> measuring <strong>the</strong> complexity <strong>of</strong> a term<br />
rewrite system is to analyze its derivational complexity. The derivational complexity is<br />
a function which relates <strong>the</strong> size <strong>of</strong> a term and <strong>the</strong> maximal number <strong>of</strong> rewrite steps that<br />
can be executed starting from any term of that size in the given rewrite system. We<br />
are particularly interested in small, i.e., polynomial, upper bounds on this function. An<br />
alternative to measuring derivational complexity is the constructor discipline<br />
mentioned in (Lescanne, 1995). There, one considers the complexity of the function<br />
that is encoded by a constructor system. It is either measured by the number of rewrite<br />
steps needed to bring <strong>the</strong> term into normal form (Bonfante, Cichon, Marion and Touzet,<br />
n.d.; Avanzini and Moser, 2008), or by counting <strong>the</strong> number <strong>of</strong> steps needed by some<br />
evaluation mechanism different from standard term rewriting (Marion, 2003; Bonfante,<br />
Marion and Péchoux, 2007).<br />
In this paper, we describe cdiprover3, a tool which uses polynomial and context-dependent<br />
interpretations in order to prove termination and complexity bounds of term<br />
rewrite systems. The tool, its predecessors, and full experimental data are available at<br />
http://cl-informatik.uibk.ac.at/~aschnabl/experiments/cdi/ .<br />
Polynomial interpretations, introduced in (Lankford, 1979), are a standard direct termination<br />
pro<strong>of</strong> method. Besides showing termination <strong>of</strong> rewrite systems, <strong>the</strong>y also provide<br />
an easy way to extract upper bounds on <strong>the</strong> derivational complexity (H<strong>of</strong>bauer and<br />
Lautemann, 1989). However, as noticed in (H<strong>of</strong>bauer, 2001), this <strong>of</strong>ten heavily overestimates<br />
<strong>the</strong> derivational complexity. Context dependent interpretations, also introduced in<br />
(H<strong>of</strong>bauer, 2001), are an effort to improve <strong>the</strong>se upper bounds.<br />
The remainder <strong>of</strong> this paper is organised as follows: Section 2 outlines <strong>the</strong> basics <strong>of</strong><br />
term rewriting needed to state all relevant results. In Section 3, we briefly describe polynomial<br />
and context dependent interpretations, which are used by cdiprover3. Section<br />
4 describes <strong>the</strong> implementation <strong>of</strong> cdiprover3, and mentions some experimental results.<br />
In Section 5, we explain <strong>the</strong> input and output <strong>of</strong> cdiprover3 in detail. Last, in<br />
Section 6, we state conclusions and potential future work.<br />
2 Term Rewriting<br />
In this section, we review some basics <strong>of</strong> term rewriting. We only cover <strong>the</strong> concepts<br />
which are relevant to this paper. A general introduction to term rewriting can be found in<br />
(Baader and Nipkow, 1998; TeReSe, 2003), for instance.<br />
A term rewrite system (TRS) R consists <strong>of</strong> a signature F, a countably infinite set <strong>of</strong><br />
variables V disjoint from F, and a finite set <strong>of</strong> rewrite rules l → r, where l and r are terms<br />
such that l ∉ V and all variables which occur in r also occur in l. The signature F defines<br />
a set <strong>of</strong> function symbols, and assigns to each function symbol f its arity. We assume that<br />
every signature contains at least one function symbol <strong>of</strong> arity 0. The set <strong>of</strong> terms built<br />
from F and V is denoted by T (F, V). The set <strong>of</strong> terms T (F) without any variables is<br />
called <strong>the</strong> set <strong>of</strong> ground terms over F. A function symbol is defined if it occurs at <strong>the</strong><br />
root <strong>of</strong> a left hand side <strong>of</strong> a rewrite rule. All non-defined function symbols are called<br />
constructors. A constructor based term is a term containing exactly one defined function<br />
symbol, which appears at <strong>the</strong> root <strong>of</strong> that term. We call <strong>the</strong> total number <strong>of</strong> function<br />
symbol and variable occurrences in a term t its size, denoted by |t|. A substitution is a<br />
mapping σ : Dom(σ) → T (F, V), where Dom(σ) is a finite subset <strong>of</strong> V. The result <strong>of</strong><br />
replacing all occurrences <strong>of</strong> variables x ∈ Dom(σ) in a term t by σ(x) is denoted by tσ.<br />
A context is a term $C[\Box]$ containing a single occurrence of a fresh function symbol $\Box$ of<br />
arity 0. If we replace $\Box$ with a term $t$, we denote the resulting term by $C[t]$. Given a TRS<br />
$R$ and two terms $s$, $t$, we say that $s$ rewrites to $t$ ($s \to_R t$) if there exist a context $C$, a<br />
substitution $\sigma$ and a rewrite rule $l \to r$ in $R$ such that $s = C[l\sigma]$ and $t = C[r\sigma]$. The<br />
transitive closure of this relation is $\to_R^+$. The reflexive and transitive closure is $\to_R^*$. We<br />
write $\to_R^n$ to express $n$-fold composition of $\to_R$. A TRS $R$ is terminating if there exists<br />
no infinite chain of terms $t_0, t_1, \dots$ such that $t_i \to_R t_{i+1}$ for each $i \in \mathbb{N}$. For a terminating<br />
TRS $R$, the derivation length of a ground term $t$ is defined as $dl_R(t) = \max\{n \mid \exists s : t \to_R^n s\}$.<br />
The derivational complexity is the function $dc_R : \mathbb{N} \to \mathbb{N}$ which maps $n$ to<br />
$\max\{dl_R(t) \mid |t| = n\}$.<br />
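The notions of size, rewrite step and derivation length can be made concrete with a minimal Python sketch. It is an illustration only and not part of cdiprover3: the term encoding (nested tuples, variables as strings), the function names, and the two addition rules used as input are assumptions made here. The derivation length is computed by exhaustive search, which is only feasible for tiny terms of a terminating system.

```python
# Terms are nested tuples ("f", arg1, ..., argn); variables are plain strings.
# The two rules below (addition on unary numbers) are an illustrative choice.
RULES = [
    (("+", ("0",), "y"), "y"),
    (("+", ("s", "x"), "y"), ("s", ("+", "x", "y"))),
]


def size(term):
    """|t|: number of function symbol and variable occurrences in t."""
    if isinstance(term, str):
        return 1
    return 1 + sum(size(arg) for arg in term[1:])


def match(pattern, term, subst):
    """Extend subst so that pattern instantiated by it equals term, else None."""
    if isinstance(pattern, str):  # a variable of the rule
        if pattern in subst:
            return subst if subst[pattern] == term else None
        return {**subst, pattern: term}
    if isinstance(term, str) or pattern[0] != term[0] or len(pattern) != len(term):
        return None
    for p_arg, t_arg in zip(pattern[1:], term[1:]):
        subst = match(p_arg, t_arg, subst)
        if subst is None:
            return None
    return subst


def substitute(term, subst):
    """Apply the substitution to every variable occurrence of term."""
    if isinstance(term, str):
        return subst.get(term, term)
    return (term[0],) + tuple(substitute(arg, subst) for arg in term[1:])


def successors(term, rules):
    """All terms reachable from term in exactly one rewrite step."""
    result = []
    for lhs, rhs in rules:  # rewrite at the root
        subst = match(lhs, term, {})
        if subst is not None:
            result.append(substitute(rhs, subst))
    if not isinstance(term, str):  # rewrite inside an argument (the context)
        for i in range(1, len(term)):
            for new_arg in successors(term[i], rules):
                result.append(term[:i] + (new_arg,) + term[i + 1:])
    return result


def derivation_length(term, rules):
    """dl_R(term); the recursion only ends if the rewrite system terminates."""
    return max((1 + derivation_length(t, rules) for t in successors(term, rules)),
               default=0)


term = ("+", ("s", ("s", ("0",))), ("0",))  # the term +(s(s(0)), 0)
print(size(term), derivation_length(term, RULES))  # 5 3
```

The derivational complexity at n would be the maximum of `derivation_length` over all terms of size n, which this sketch does not attempt to enumerate.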
3 Used Termination Pro<strong>of</strong> Methods<br />
3.1 Polynomial Interpretations<br />
An F-algebra A for some signature F consists of a carrier A and interpretation functions<br />
$\{f_A : A^n \to A \mid f \in F, n = \mathrm{arity}(f)\}$. Given an assignment $\alpha : V \to A$, we denote the<br />
evaluation of a term $t$ into A by $[\alpha]_A(t)$. It is defined inductively as follows:<br />
$[\alpha]_A(x) = \alpha(x)$ for $x \in V$<br />
$[\alpha]_A(f(t_1, \dots, t_n)) = f_A([\alpha]_A(t_1), \dots, [\alpha]_A(t_n))$ for $f \in F$<br />
A well-founded monotone F-algebra is a pair (A, >) where A is an F-algebra and > is<br />
a well-founded proper order such that for every function symbol f ∈ F, fA is monotone<br />
with respect to >. It is compatible with a TRS R if for every rewrite rule l → r in R<br />
and every assignment α, [α]A(l) > [α]A(r) holds. It is a well-known fact that a TRS R<br />
is terminating if and only if <strong>the</strong>re exists a well-founded monotone algebra that is compatible<br />
with R. A polynomial interpretation (Lankford, 1979) is an interpretation into a<br />
well-founded monotone algebra (A, >) such that A ⊆ N, > is <strong>the</strong> standard order on <strong>the</strong><br />
natural numbers, and $f_A$ is a polynomial for every function symbol $f$. If a polynomial<br />
interpretation is compatible with a TRS $R$, then we clearly have $dl_R(t) \le [\alpha]_A(t)$ for all<br />
terms $t$.<br />
Example 1. Consider <strong>the</strong> TRS R with <strong>the</strong> following rewrite rules over <strong>the</strong> signature containing<br />
<strong>the</strong> function symbols 0 (arity 0), s (arity 1), + and - (arity 2). The system is<br />
example SK90/2.11.trs in <strong>the</strong> termination problems database 1 (TPDB), which is <strong>the</strong><br />
standard benchmark for termination provers:<br />
+(0, y) → y<br />
+(s(x), y) → s(+(x, y))<br />
-(0, y) → 0<br />
-(x, 0) → x<br />
-(s(x), s(y)) → -(x, y)<br />
The following interpretation functions build a compatible polynomial interpretation A<br />
over <strong>the</strong> carrier N:<br />
$+_A(x, y) = 2x + y \qquad -_A(x, y) = 3x + 3y \qquad s_A(x) = x + 2 \qquad 0_A = 1$<br />
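The compatibility claim of Example 1 can be spot-checked mechanically. The sketch below evaluates both sides of each rule under the interpretation of Example 1 for all variable values below a small bound. The encoding of terms as nested tuples, the function names and the bound are assumptions made here for illustration; a finite sample can refute compatibility but cannot prove it.

```python
from itertools import product

# Interpretation functions of Example 1 over the natural numbers.
INTERPRETATION = {
    "+": lambda x, y: 2 * x + y,
    "-": lambda x, y: 3 * x + 3 * y,
    "s": lambda x: x + 2,
    "0": lambda: 1,
}

# Rewrite rules of Example 1; terms are nested tuples, variables are strings.
RULES = [
    (("+", ("0",), "y"), "y"),
    (("+", ("s", "x"), "y"), ("s", ("+", "x", "y"))),
    (("-", ("0",), "y"), ("0",)),
    (("-", "x", ("0",)), "x"),
    (("-", ("s", "x"), ("s", "y")), ("-", "x", "y")),
]


def evaluate(term, assignment):
    """[alpha]_A(term) for an assignment alpha given as a dict."""
    if isinstance(term, str):
        return assignment[term]
    args = [evaluate(arg, assignment) for arg in term[1:]]
    return INTERPRETATION[term[0]](*args)


def compatible_on_samples(rules, bound=6):
    """Check [alpha](l) > [alpha](r) for all variable values below bound."""
    for x, y in product(range(bound), repeat=2):
        alpha = {"x": x, "y": y}
        if any(evaluate(l, alpha) <= evaluate(r, alpha) for l, r in rules):
            return False
    return True


print(compatible_on_samples(RULES))  # True
print(evaluate(("+", ("s", ("s", ("0",))), ("0",)), {}))  # 11
```

The second line of output is the value of the ground term +(s(s(0)), 0), which by the inequality above bounds its derivation length from above.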
A strongly linear interpretation is a polynomial interpretation such that every interpretation<br />
function $f_A$ has the form $f_A(x_1, \dots, x_n) = \sum_{i=1}^{n} x_i + c$, $c \in \mathbb{N}$. A surprisingly<br />
simple property is that compatibility with a strongly linear interpretation induces a linear<br />
upper bound on <strong>the</strong> derivational complexity (Schnabl, 2007).<br />
A linear polynomial interpretation is a polynomial interpretation where each interpretation<br />
function $f_A$ has the shape $f_A(x_1, \dots, x_n) = \sum_{i=1}^{n} a_i x_i + c$, $a_i \in \mathbb{N}$, $c \in \mathbb{N}$.<br />
For instance, <strong>the</strong> interpretation given in Example 1 is a linear polynomial interpretation.<br />
Because <strong>of</strong> <strong>the</strong>ir simplicity, this class <strong>of</strong> polynomial interpretations is <strong>the</strong> one most commonly<br />
used in automatic termination provers. As illustrated by Example 2 below, if only<br />
a single one <strong>of</strong> <strong>the</strong> coefficients ai in any <strong>of</strong> <strong>the</strong> functions fA is greater than 1, <strong>the</strong>re might<br />
already exist derivations whose length is exponential in <strong>the</strong> size <strong>of</strong> <strong>the</strong> starting term.<br />
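The distinction between the two classes is purely syntactic, as the following hedged sketch shows. Interpretations are stored as coefficient lists plus a constant; this encoding, the function names and the second interpretation `ALL_ONES` are assumptions made here (`ALL_ONES` only serves as a contrast and is not claimed to be compatible with any system). One way to see where the linear bound comes from: under a strongly linear interpretation the value of a ground term is the sum of the constants of its symbol occurrences, hence at most the largest constant times the size of the term.

```python
def classify(interpretation):
    """Return 'strongly linear' if every coefficient equals 1, else 'linear'.

    interpretation maps each function symbol to (coefficients, constant),
    standing for f_A(x1, ..., xn) = a1*x1 + ... + an*xn + c.
    """
    coefficients = [a for coeffs, _ in interpretation.values() for a in coeffs]
    return "strongly linear" if all(a == 1 for a in coefficients) else "linear"


def ground_value(term, interpretation):
    """Value of a ground term (nested tuples) under the interpretation."""
    coeffs, constant = interpretation[term[0]]
    return constant + sum(a * ground_value(arg, interpretation)
                          for a, arg in zip(coeffs, term[1:]))


def size(term):
    """Number of function symbol occurrences of a ground term."""
    return 1 + sum(size(arg) for arg in term[1:])


# The interpretation of Example 1 in coefficient form.
EXAMPLE_1 = {"+": ([2, 1], 0), "-": ([3, 3], 0), "s": ([1], 2), "0": ([], 1)}
# Hypothetical interpretation with all coefficients equal to 1, for contrast.
ALL_ONES = {"+": ([1, 1], 1), "s": ([1], 1), "0": ([], 0)}

term = ("+", ("s", ("0",)), ("0",))
print(classify(EXAMPLE_1), classify(ALL_ONES))  # linear strongly linear
print(ground_value(term, ALL_ONES), size(term))  # 2 4
```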
Example 2. Consider <strong>the</strong> TRS S with <strong>the</strong> following single rule over <strong>the</strong> signature containing<br />
<strong>the</strong> function symbols a, b (arity 1), and c (arity 0). The system is example<br />
SK90/2.50.trs in <strong>the</strong> TPDB:<br />
a(b(x)) → b(b(a(x)))<br />
The following interpretation functions build a compatible linear polynomial interpretation<br />
A over N:<br />
$a_A(x) = 3x \qquad b_A(x) = x + 1 \qquad c_A = 0$<br />
If we start a rewrite sequence from the term $a^n(b(c))$, we reach the normal form $b^{2^n}(a^n(c))$<br />
after $2^n - 1$ rewriting steps. Therefore, the derivational complexity of S is at least exponential.<br />
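The exponential growth in Example 2 can be observed directly with a small simulation. Because a and b are unary, a term can be written as the word of its symbols read from the root inwards, with the constant c left implicit; the rule a(b(x)) → b(b(a(x))) then becomes the word replacement "ab" → "bba". This word encoding, the function name and the choice of always rewriting the leftmost redex are assumptions of the sketch.

```python
def steps_to_normal_form(n):
    """Rewrite a^n(b(c)) with a(b(x)) -> b(b(a(x))) and count the steps.

    Returns the number of rewrite steps and the normal form as a word.
    """
    word = "a" * n + "b"
    steps = 0
    while "ab" in word:
        word = word.replace("ab", "bba", 1)  # rewrite the leftmost redex
        steps += 1
    return steps, word


for n in range(5):
    steps, word = steps_to_normal_form(n)
    print(n, steps, word.count("b"))
```

For each n the loop prints the step count next to the number of b symbols in the normal form, which can be compared with the values stated in Example 2.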
1 http://www.lri.fr/~marche/tpdb/.<br />
3.2 Context Dependent Interpretations<br />
Even though polynomial interpretations provide an easy way to obtain an upper bound<br />
on <strong>the</strong> derivational complexity <strong>of</strong> a TRS, <strong>the</strong>y are not very suitable for proving polynomial<br />
derivational complexity. Strongly linear interpretations only capture linear derivational<br />
complexity, but even a slight generalization already admits examples of exponential<br />
derivational complexity, as illustrated by Example 2. In (H<strong>of</strong>bauer, 2001), context dependent<br />
interpretations are introduced. They use an additional parameter (usually denoted<br />
by ∆) in <strong>the</strong> interpretation functions, which changes in <strong>the</strong> course <strong>of</strong> evaluating <strong>the</strong> interpretation<br />
<strong>of</strong> a term, thus making <strong>the</strong> interpretation dependent on <strong>the</strong> context. This way <strong>of</strong><br />
computing interpretations also allows us to bridge <strong>the</strong> gap between linear and polynomial<br />
derivational complexity.<br />
Definition 3. A context-dependent interpretation C for some signature F consists of functions<br />
$\{f_C[\Delta] : (\mathbb{R}_0^+)^n \to \mathbb{R}_0^+ \mid f \in F, n = \mathrm{arity}(f), \Delta \in \mathbb{R}^+\}$ and $\{f_C^i : \mathbb{R}^+ \to \mathbb{R}^+ \mid f \in F, i \in \{1, \dots, \mathrm{arity}(f)\}\}$.<br />
Given a ∆-assignment $\alpha : \mathbb{R}^+ \times V \to \mathbb{R}_0^+$, the evaluation of a<br />
term $t$ by C is denoted by $[\alpha, \Delta]_C(t)$. It is defined inductively as follows:<br />
$[\alpha, \Delta]_C(x) = \alpha(\Delta, x)$ for $x \in V$<br />
$[\alpha, \Delta]_C(f(t_1, \dots, t_n)) = f_C[\Delta]([\alpha, f_C^1(\Delta)]_C(t_1), \dots, [\alpha, f_C^n(\Delta)]_C(t_n))$ for $f \in F$<br />
Definition 4. For each $\Delta \in \mathbb{R}^+$, let $>_\Delta$ be the order defined by $a >_\Delta b \iff a - b \ge \Delta$.<br />
A context-dependent interpretation C is compatible with a TRS $R$ if for all rewrite rules<br />
$l \to r$ in $R$, all $\Delta \in \mathbb{R}^+$, and every ∆-assignment $\alpha$, we have $[\alpha, \Delta]_C(l) >_\Delta [\alpha, \Delta]_C(r)$.<br />
Definition 5. A ∆-linear interpretation is a context-dependent interpretation C whose<br />
interpretation functions have the form<br />
\[ f_C[\Delta](z_1, \dots, z_n) = \sum_{i=1}^{n} a_{(f,i)} z_i + \sum_{i=1}^{n} b_{(f,i)} z_i \Delta + c_f \Delta + d_f \qquad f^i_C(\Delta) = \frac{\Delta}{a_{(f,i)} + b_{(f,i)} \Delta} \]<br />
with $a_{(f,i)}, b_{(f,i)}, c_f, d_f \in \mathbb{N}$ and $a_{(f,i)} + b_{(f,i)} \neq 0$ for all $f \in F$, $1 \le i \le n$. If we have<br />
$a_{(f,i)} \in \{0, 1\}$ for all f, i, we also call it a ∆-restricted interpretation.<br />
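The evaluation scheme of Definition 3, specialised to the ∆-linear shape of Definition 5, can be sketched in Python as follows.<br />
This is an illustrative sketch and not part of cdiprover3: the term encoding (nested tuples) and the names COEFFS, RULES, evaluate and compatible_on_samples are assumptions made for this example. The coefficients are those of the interpretation given in Example 8, the rules are those of the TRS listed in Figure 1, and compatibility is only sampled at a few points with ∆-independent assignments rather than proved.<br />

```python
from fractions import Fraction

# Hypothetical encoding: a variable is a string, f(t1, ..., tn) is the tuple (f, t1, ..., tn).
# COEFFS[f] = (a, b, c, d), where a[i] and b[i] are the coefficients of argument i+1.
# The values are the coefficients of the interpretation in Example 8.
COEFFS = {
    "+": ([1, 1], [1, 0], 1, 0),
    "-": ([1, 1], [0, 0], 1, 0),
    "s": ([1], [0], 1, 1),
    "0": ([], [], 0, 0),
}

def evaluate(term, delta, alpha):
    """Compute [alpha, delta]_C(term) for the Delta-linear interpretation COEFFS."""
    if isinstance(term, str):
        return alpha(delta, term)
    f, args = term[0], term[1:]
    a, b, c, d = COEFFS[f]
    value = c * delta + d
    for i, sub in enumerate(args):
        # The i-th argument is evaluated under f^i_C(delta) = delta / (a + b * delta).
        z = evaluate(sub, delta / (a[i] + b[i] * delta), alpha)
        value += a[i] * z + b[i] * z * delta
    return value

RULES = [
    (("+", ("0",), "y"), "y"),
    (("+", ("s", "x"), "y"), ("s", ("+", "x", "y"))),
    (("-", ("0",), "y"), ("0",)),
    (("-", "x", ("0",)), "x"),
    (("-", ("s", "x"), ("s", "y")), ("-", "x", "y")),
]

def compatible_on_samples(rules, deltas, values):
    """Check lhs - rhs >= delta on sample points (a sanity check, not a proof)."""
    for lhs, rhs in rules:
        for delta in deltas:
            for vx in values:
                for vy in values:
                    env = {"x": vx, "y": vy}
                    alpha = lambda _delta, var: env[var]  # assignment that ignores delta
                    if evaluate(lhs, delta, alpha) - evaluate(rhs, delta, alpha) < delta:
                        return False
    return True

sample_deltas = [Fraction(1, 10), Fraction(1), Fraction(7, 2)]
sample_values = [Fraction(0), Fraction(2), Fraction(5)]
print(compatible_on_samples(RULES, sample_deltas, sample_values))
```

Exact rational arithmetic is used so that the comparison with ∆ is not disturbed by rounding; for the second rule, for instance, the difference between both sides is exactly ∆.<br />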
We consider ∆-linear interpretations because of the similarity between the functions<br />
$f_C[\Delta]$ and the interpretation functions of linear polynomial interpretations. Another point<br />
of interest is that the simple syntactic restriction to ∆-restricted interpretations yields a<br />
quadratic upper bound on the derivational complexity. Moreover, because of the special<br />
shape of ∆-linear interpretations, we need no additional monotonicity criterion for our<br />
main theorems:<br />
Theorem 6 ((Moser and Schnabl, 2008)). Let R be a TRS and suppose that there exists<br />
a compatible ∆-linear interpretation. Then R is terminating and $dc_R(n) = 2^{O(n)}$.<br />
Theorem 7 ((Schnabl, 2007)). Let R be a TRS and suppose that there exists a compatible<br />
∆-restricted interpretation. Then R is terminating and $dc_R(n) = O(n^2)$.<br />
Example 8. Consider the TRS given in Example 1 again. A compatible ∆-restricted (and<br />
∆-linear) interpretation C is built from the following interpretation functions:<br />
\[ +_C[\Delta](x, y) = (1 + \Delta)x + y + \Delta \qquad +^1_C(\Delta) = \frac{\Delta}{1 + \Delta} \qquad +^2_C(\Delta) = \Delta \]<br />
\[ -_C[\Delta](x, y) = x + y + \Delta \qquad -^1_C(\Delta) = \Delta \qquad -^2_C(\Delta) = \Delta \]<br />
\[ s_C[\Delta](x) = x + \Delta + 1 \qquad s^1_C(\Delta) = \Delta \qquad 0_C[\Delta] = 0 \]<br />
Note that this interpretation gives a quadratic upper bound on the derivational complexity.<br />
However, from the polynomial interpretation given in Example 1, we can only infer an exponential<br />
upper bound (Hofbauer and Lautemann, 1989). Consider the term $P_{n,n}$, where<br />
we define $P_{0,n} = s^n(0)$ and $P_{m+1,n} = +(P_{m,n}, 0)$. We have $|P_{n,n}| = 3n + 1$. For every<br />
$m, n \in \mathbb{N}$, $P_{m+1,n}$ rewrites to $P_{m,n}$ in $n + 1$ steps. Therefore, $P_{n,n}$ reaches its normal form<br />
$s^n(0)$ after $n(n + 1)$ rewriting steps. Hence, the derivational complexity is also $\Omega(n^2)$ for<br />
this example, so the inferred bound $O(n^2)$ is tight.<br />
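The lower-bound argument can be checked mechanically for small n. The following Python sketch builds the terms P_{m,n} and counts the steps of an innermost normalisation using only the two rules for +.<br />
The tuple encoding of terms and the function names are assumptions made for this illustration; the sketch follows one particular (innermost) rewriting strategy.<br />

```python
def s_power(n):
    """Build the term s^n(0)."""
    term = ("0",)
    for _ in range(n):
        term = ("s", term)
    return term

def p_term(m, n):
    """Build P_{m,n}, where P_{0,n} = s^n(0) and P_{m+1,n} = +(P_{m,n}, 0)."""
    term = s_power(n)
    for _ in range(m):
        term = ("+", term, ("0",))
    return term

def size(term):
    """Number of function symbols in a ground term."""
    return 1 + sum(size(sub) for sub in term[1:])

def normalize(term):
    """Innermost normalisation; returns (normal form, number of rewrite steps)."""
    if term[0] == "0":
        return term, 0
    if term[0] == "s":
        nf, steps = normalize(term[1])
        return ("s", nf), steps
    # term is +(left, right): normalise both arguments first
    left, steps_left = normalize(term[1])
    right, steps_right = normalize(term[2])
    steps = steps_left + steps_right
    wrappers = 0
    while left[0] == "s":       # +(s(x), y) -> s(+(x, y))
        left = left[1]
        wrappers += 1
        steps += 1
    steps += 1                  # +(0, y) -> y
    result = right
    for _ in range(wrappers):
        result = ("s", result)
    return result, steps

for n in range(6):
    normal_form, steps = normalize(p_term(n, n))
    print(n, size(p_term(n, n)), steps, normal_form == s_power(n))
```

For each n the loop prints the term size, the number of steps and whether the normal form is s^n(0); these can be compared against 3n + 1 and n(n + 1).<br />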
4 Implementation<br />
cdiprover3 is written entirely in OCaml<sup>2</sup>. It employs the libraries of the termination<br />
prover TTT2<sup>3</sup>. From these libraries, it uses the functionality for handling TRSs and SAT encodings,<br />
as well as an interface to the SAT solver MiniSAT<sup>4</sup>. Not counting these libraries, the tool<br />
consists of about 1700 lines of OCaml code. About 25% of that code is devoted to<br />
the manipulation of polynomials and extensions of polynomials that stem from our use<br />
of the parameter ∆. Another 35% is used for constructing parametric interpretations<br />
and building suitable Diophantine constraints (see below) which enforce the necessary<br />
conditions for termination. Using TTT2’s library for propositional logic and its interface<br />
to MiniSAT, 15% of the code deals with encoding Diophantine constraints into SAT. The<br />
remaining code is used for parsing input options and the given TRS, generating output,<br />
and controlling the program flow.<br />
In order to find polynomial interpretations automatically, Diophantine constraints are<br />
generated according to <strong>the</strong> procedure described in (Contejean, Marché, Tomás and Urbain,<br />
2005). Putting an upper bound on the coefficients makes the problem finite. Essentially<br />
following (Fuhs, Giesl, Middeldorp, Schneider-Kamp, Thiemann and Zankl, 2007),<br />
we then encode the (finite domain) constraints into a propositional satisfiability problem.<br />
This problem is given to MiniSAT. From a satisfying assignment for the SAT problem,<br />
we construct a polynomial interpretation which is monotone and compatible with the<br />
given TRS.<br />
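To illustrate why bounding the coefficients makes the search finite, the following Python sketch looks for a linear polynomial interpretation of the two rules for + by plain enumeration.<br />
It is a simplified stand-in and not the method used by cdiprover3: brute-force enumeration replaces the SAT encoding, the coefficient-wise comparison of linear polynomials is used as a sufficient criterion for a strict decrease, and all names (SIGNATURE, RULES, poly, decreases, search) are assumptions made for this example.<br />

```python
from itertools import product

SIGNATURE = {"+": 2, "s": 1, "0": 0}
RULES = [
    (("+", ("0",), "y"), "y"),
    (("+", ("s", "x"), "y"), ("s", ("+", "x", "y"))),
]

def poly(term, interp):
    """Return (variable coefficients, constant) of the linear polynomial of a term."""
    if isinstance(term, str):
        return {term: 1}, 0
    arg_coeffs, const = interp[term[0]]
    coeffs = {}
    for a, sub in zip(arg_coeffs, term[1:]):
        sub_coeffs, sub_const = poly(sub, interp)
        const += a * sub_const
        for var, k in sub_coeffs.items():
            coeffs[var] = coeffs.get(var, 0) + a * k
    return coeffs, const

def decreases(lhs, rhs, interp):
    """Sufficient check: every variable coefficient is >=, the constant is strictly >."""
    lc, l0 = poly(lhs, interp)
    rc, r0 = poly(rhs, interp)
    return all(lc.get(var, 0) >= k for var, k in rc.items()) and l0 > r0

def search(bound):
    """Enumerate all interpretations with coefficients up to the given bound."""
    symbols = sorted(SIGNATURE)
    spaces = []
    for f in symbols:
        # argument coefficients in 1..bound (monotone), constant in 0..bound
        ranges = [range(1, bound + 1)] * SIGNATURE[f] + [range(0, bound + 1)]
        spaces.append(list(product(*ranges)))
    for choice in product(*spaces):
        interp = {f: (c[:-1], c[-1]) for f, c in zip(symbols, choice)}
        if all(decreases(lhs, rhs, interp) for lhs, rhs in RULES):
            return interp
    return None

print(search(1))
print(search(2))
```

With bound 1 all argument coefficients are forced to 1, so no interpretation of this shape orients the second rule; with bound 2 the search space is still small (216 candidates) and contains a solution.<br />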
This procedure is also the basis of the automatic search for ∆-linear and ∆-restricted<br />
interpretations. The starting point of that search is an interpretation with uninstantiated<br />
coefficients. If we want to be able to apply Theorem 6 or 7, we need to find coefficients<br />
which make the resulting interpretation compatible with the given TRS. Furthermore,<br />
we need to make sure that no divisions by zero occur in the interpretation functions.<br />
Again, we encode these properties into Diophantine constraints on the coefficients of a<br />
∆-linear or ∆-restricted interpretation. The encoding is an adaptation of the procedure in<br />
(Contejean et al., 2005) to context-dependent interpretations. It is described in detail in<br />
(Schnabl, 2007; Moser and Schnabl, 2008). Once we have built the constraints, we continue<br />
using the same techniques as for searching polynomial interpretations: we encode<br />
the constraints in a propositional satisfiability problem, apply the SAT solver, and use a<br />
satisfying assignment to construct a context-dependent interpretation.<br />
2 http://caml.inria.fr.<br />
3 http://colo6-c703.uibk.ac.at/ttt2.<br />
4 http://minisat.se.<br />
Table 1: Performance of cdiprover3<br />
Method | SL | SL+∆-restricted | ∆-linear | ∆-restricted | ∆-restricted | ∆-restricted | ∆-restricted<br />
-i -b X | 31 | 31 | 31 | 3 | 7 | 15 | 31<br />
# success | 41 | 87 | 83 | 83 | 86 | 86 | 86<br />
average success time (ms) | 20 | 3010 | 5527 | 3652 | 4041 | 4008 | 3986<br />
# timeout | 0 | 237 | 797 | 144 | 189 | 221 | 238<br />
Table 1 shows experimental results of applying cdiprover3 to the 957 known terminating<br />
examples of the TPDB. The tests were performed single-threaded on a 2.40 GHz<br />
Intel® Core™ 2 Duo with 2 GB of memory. For each system, cdiprover3 was given<br />
a timeout of 60 seconds. All times in the table are given in milliseconds. The method<br />
SL denotes strongly linear interpretations. In all tests, we called cdiprover3 with the<br />
options -i -b X (see Section 5 below), where X is specified in the second row of the<br />
table. As we can see, cdiprover3 is currently able to prove polynomial derivational<br />
complexity for 87 of the 368 known terminating non-duplicating rewrite systems of the<br />
TPDB (duplicating rewrite systems have at least exponential derivational complexity, so<br />
this restriction is harmless here). The results indicate that an upper bound of 7 on the coefficient<br />
variables suffices to capture all examples in our test set. Therefore, 3 and 7 seem<br />
to be good candidates for default values of the -b option. However, it should be noted<br />
that our handling of the divisions introduced by the functions $f^i_C$ is computationally rather<br />
expensive, which is indicated by the number of timeouts and the average time needed<br />
for successful proofs. This also explains the slight decrease in performance when we<br />
extend the search space to ∆-linear interpretations. However, there is one system which<br />
can be handled by ∆-linear interpretations, but not by ∆-restricted interpretations: system<br />
SK90/2.50 in the TPDB, which we mentioned in Example 2.<br />
5 Using cdiprover3<br />
cdiprover3 is called from the command line. The basic usage pattern for cdiprover3<br />
is<br />
$ ./cdiprover3 <br />
• The timeout argument specifies the maximum number of seconds until cdiprover3 stops<br />
looking for a suitable interpretation.<br />
• The file argument specifies the path to the file which contains the considered TRS.<br />
• For the options, the following switches are available:<br />
-c defines the desired subclass of the searched polynomial or context-dependent<br />
interpretation. The following values are legal:<br />
linear, simple, simplemixed, quadratic: These classes correspond to the respective<br />
subclasses of polynomial interpretations, as defined in (Steinbach,<br />
1992). Linear polynomial interpretations imply an exponential upper<br />
bound on the derivational complexity. The other classes imply a double<br />
exponential upper bound, cf. (Hofbauer and Lautemann, 1989).<br />
pizerolinear, pizerosimple, pizerosimplemixed, pizeroquadratic: For these<br />
values, cdiprover3 tries to find a polynomial interpretation with the<br />
following restrictions: defined function symbols are interpreted by linear,<br />
simple, simple-mixed, or quadratic polynomials, respectively. Constructors<br />
are interpreted by strongly linear polynomials. These interpretations<br />
guarantee that the derivation length of all constructor-based terms is polynomial<br />
(Bonfante et al., n.d.).<br />
sli: This option corresponds to strongly linear interpretations. As mentioned<br />
in Section 3, they induce a linear upper bound on the derivational complexity<br />
of a compatible TRS.<br />
deltalinear: This value specifies that the tool should search for a ∆-linear<br />
interpretation. By Theorem 6, compatibility with such an interpretation<br />
implies an exponential upper bound on the derivational complexity.<br />
deltarestricted: This option corresponds to ∆-restricted interpretations. By<br />
Theorem 7, they induce a quadratic upper bound.<br />
-b sets the upper bound for the coefficient variables. The default value<br />
for this bound is 3.<br />
-i This switch activates an incremental strategy for handling the upper bound on<br />
the coefficient variables. First, cdiprover3 tries to find a solution using<br />
an intermediate upper bound of 1 (which corresponds to encoding each coefficient<br />
variable by one bit). Whenever the tool fails to find a proof for some<br />
upper bound b, it checks whether b is equal to the bound specified by the<br />
-b option. If that is the case, then the search for a proof is abandoned. Otherwise,<br />
b is set to the minimum of the bound specified by the -b option and<br />
$2(b+1)-1$ (which corresponds to increasing the number of bits used for each<br />
coefficient variable by 1).<br />
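The sequence of bounds tried by the incremental strategy can be sketched in Python as follows. The function name bound_schedule is an assumption made for this illustration; the sketch lists the bounds that would be tried if every attempt failed, and it does not call cdiprover3 or a SAT solver.<br />

```python
def bound_schedule(max_bound):
    """Bounds tried by the incremental strategy, assuming every attempt fails."""
    if max_bound < 1:
        raise ValueError("max_bound must be at least 1")
    bounds = []
    b = 1                      # start with one bit per coefficient variable
    while True:
        bounds.append(b)
        if b == max_bound:     # bound given by -b reached: give up
            return bounds
        b = min(max_bound, 2 * (b + 1) - 1)

print(bound_schedule(31))  # [1, 3, 7, 15, 31]
print(bound_schedule(5))   # [1, 3, 5]
```

Each step from b to 2(b+1)−1 adds one bit per coefficient variable, so a bound of 31 is reached after at most five attempts.<br />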
If the -c switch is not specified, then the standard strategy for proving polynomial<br />
derivational complexity is employed. First, cdiprover3 looks for a strongly linear<br />
interpretation. If that is not successful, then it searches for a suitable ∆-restricted<br />
interpretation. The input TRS files are expected to have the same format as the files in the<br />
TPDB. The format specification for this database is available at<br />
http://www.lri.fr/~marche/tpdb/format.html.<br />
The output given by cdiprover3, as exemplified by Example 9, is structured as<br />
follows. The first line contains a short answer to the question whether the given TRS<br />
is terminating: YES, MAYBE, or TIMEOUT. TIMEOUT means that cdiprover3 was<br />
still busy after the specified timeout. MAYBE means that a termination proof could not<br />
be found, and cdiprover3 gave up before time ran out. The answer YES indicates<br />
that an interpretation of the given class has been found which guarantees termination of<br />
the given TRS. It is followed by the inferred bound on the derivational complexity and a<br />
listing of the interpretation functions. After the interpretation functions, the elapsed time<br />
between the call of cdiprover3 and the output of the proof is given. In all cases, the<br />
answer is concluded by statistics stating the total number of monomials in the constructed<br />
Diophantine constraints, and the upper bound for the coefficients that was used in the last<br />
call to MiniSAT.<br />
Example 9. Given the TRS shown in Example 1, cdiprover3 produces the output<br />
shown in Figure 1. The interpretations in Example 8 and in the output are equivalent.<br />
Note that the parameter ∆ in the interpretation functions $f_C[\Delta]$ is treated like another<br />
argument of the function. The interpretation functions $f^i_C$ are represented by f tau i in the<br />
output.<br />
6 Conclusion<br />
In this paper, we have presented the (as far as we know) first tool which is specifically<br />
designed for automatically proving polynomial derivational complexity of term rewriting.<br />
We have also given a brief introduction to the applied proof methods. With our current<br />
implementation, we are able to prove polynomial derivational complexity for 87 of<br />
the 368 known terminating non-duplicating rewrite systems of the TPDB. By adding new<br />
termination methods to our tool which can prove polynomial derivational complexity of<br />
rewrite systems, we could extend the range of problems that the prover can solve. The<br />
matchbounds technique comes to mind here, which induces a linear upper bound on the<br />
derivational complexity of the considered system (Geser, Hofbauer, Waldmann and Zantema,<br />
2007; Korp and Middeldorp, 2007). Another avenue for future work is the search for<br />
other subclasses of context-dependent interpretations which imply non-quadratic and non-linear,<br />
but polynomial upper bounds on the derivational complexity. A further possibility<br />
would be to find more efficient ways of handling the divisions introduced by the functions<br />
$f^i_C$. Results in this area would help to further improve the power of cdiprover3.<br />
References<br />
Avanzini, M. and Moser, G. (2008). Complexity analysis by rewriting, Proc. 9th FLOPS,<br />
Vol. 4989 of LNCS, pp. 130–146.<br />
Baader, F. and Nipkow, T. (1998). Term Rewriting and All That, Cambridge University<br />
Press.<br />
Bonfante, G., Cichon, A., Marion, J.-Y. and Touzet, H. (n.d.). Algorithms with polynomial<br />
interpretation termination proof, J. Funct. Program. (1): 33–53.<br />
Bonfante, G., Marion, J.-Y. and Péchoux, R. (2007). Quasi-interpretation synthesis by<br />
decomposition, Proc. 4th ICTAC, Vol. 4711 of LNCS, pp. 410–424.<br />
Contejean, E., Marché, C., Tomás, A. P. and Urbain, X. (2005). Mechanically proving<br />
termination using polynomial interpretations, J. Autom. Reason. 34(4): 325–363.<br />
Fuhs, C., Giesl, J., Middeldorp, A., Schneider-Kamp, P., Thiemann, R. and Zankl, H.<br />
(2007). SAT solving for termination analysis with polynomial interpretations, Proc.<br />
SAT 2007, Vol. 4501 of LNCS, pp. 340–354.<br />
Geser, A., Hofbauer, D., Waldmann, J. and Zantema, H. (2007). On tree automata that<br />
certify termination of left-linear term rewriting systems, Inf. Comput. 205(4): 512–534.<br />
Hofbauer, D. (2001). Termination proofs by context-dependent interpretations, Proc. 12th<br />
RTA, Vol. 2051 of LNCS, pp. 108–121.<br />
Hofbauer, D. and Lautemann, C. (1989). Termination proofs and the length of derivations,<br />
Proc. 3rd RTA, Vol. 355 of LNCS, pp. 167–177.<br />
Korp, M. and Middeldorp, A. (2007). Proving termination of rewrite systems using<br />
bounds, Proc. 18th RTA, Vol. 4533 of LNCS, pp. 273–287.<br />
Lankford, D. (1979). On proving term-rewriting systems are noetherian, Technical Report<br />
MTP-2, Math. Dept., Louisiana Tech. University.<br />
Lescanne, P. (1995). Termination of rewrite systems by elementary interpretations, Formal<br />
Aspects of Computing 7(1): 77–90.<br />
Marion, J.-Y. (2003). Analysing the implicit complexity of programs, Inf. Comput.<br />
183(1): 2–18.<br />
Moser, G. and Schnabl, A. (2008). Proving quadratic derivational complexities using<br />
context dependent interpretations, Proc. 19th RTA. Accepted for publication.<br />
Schnabl, A. (2007). Context Dependent Interpretations<sup>5</sup>, Master’s thesis, Universität<br />
Innsbruck.<br />
Steinbach, J. (1992). Proving polynomials positive, Proc. 12th FSTTCS, Vol. 652 of<br />
LNCS, pp. 191–202.<br />
TeReSe (2003). Term Rewriting Systems, Vol. 55 of Cambridge Tracts in Theoretical<br />
Computer Science, Cambridge University Press.<br />
5 Available online at http://cl-informatik.uibk.ac.at/~aschnabl/<br />
Figure 1: Output produced by cdiprover3.<br />
$ cat tpdb-4.0/TRS/SK90/2.11.trs<br />
(VAR x y)<br />
(RULES<br />
+(0,y) -> y<br />
+(s(x),y) -> s(+(x,y))<br />
-(0,y) -> 0<br />
-(x,0) -> x<br />
-(s(x),s(y)) -> -(x,y)<br />
)<br />
(COMMENT Example 2.11 (Addition and Subtraction) in \cite{SK90})<br />
$ ./cdiprover3 -i tpdb-4.0/TRS/SK90/2.11.trs 60<br />
YES<br />
QUADRATIC upper bound on the derivational complexity<br />
This TRS is terminating using the deltarestricted interpretation<br />
-(delta, X1, X0) = + 1*X0 + 1*X1 + 0 + 0*X0*delta + 0*X1*delta + 1*delta<br />
s(delta, X0) = + 1*X0 + 1 + 0*X0*delta + 1*delta<br />
0(delta) = + 0 + 0*delta<br />
+(delta, X1, X0) = + 1*X0 + 1*X1 + 0 + 0*X0*delta + 1*X1*delta + 1*delta<br />
- tau 1(delta) = delta/(1 + 0 * delta)<br />
- tau 2(delta) = delta/(1 + 0 * delta)<br />
s tau 1(delta) = delta/(1 + 0 * delta)<br />
+ tau 1(delta) = delta/(1 + 1 * delta)<br />
+ tau 2(delta) = delta/(1 + 0 * delta)<br />
Time: 0.024418 seconds<br />
Statistics:<br />
Number <strong>of</strong> monomials: 187<br />
Last formula building started for bound 1<br />
Last SAT solving started for bound 1<br />
THE RANK(S) OF A TOTALLY LEXICALIST SYNTAX<br />
Éva Szilágyi<br />
University of Pécs<br />
Abstract. Our project works on the implementation of a totally lexicalist grammar. The<br />
syntax has now been worked out; in this approach it resembles a dependency grammar, but word<br />
order is also handled. In harmony with the idea of total lexicalism, neither PS-trees nor<br />
transformations exist. We use rank parameters, in the spirit of Optimality Theory, for expressing word order<br />
variations in a language. A special kind of rank parameter accounts for Hungarian focus phenomena,<br />
which cause radical surface changes in word order (beyond intonational effects).<br />
The system is implemented in a relational database (SQL).<br />
1 Introduction<br />
Predicates seek their arguments in every language of the world, and adjuncts<br />
seek their joining points too. We claim that only 8-10 operations work in languages,<br />
but their effectiveness differs. This can be ordered by rank parameters: a universal<br />
tool (as in Optimality Theory (Archangeli and Langendoen, 1997)) with language-specific<br />
settings. Our project aims to develop an MT system based on GASG (Generalized Argument<br />
Structure Grammar), a totally lexicalist theory (Alberti, 1999). We are primarily linguists,<br />
so our high-priority goal is linguistic. Lexicalist theories are successful nowadays,<br />
and we aim to try out this extreme form of lexicalism both theoretically and practically.<br />
For us this is more important than efficiency in size, speed or time.<br />
The lexicon is stored in a relational database. The essence of relational databases lies in the<br />
definition of relations. Relations describe facts and also constitute the database. Each<br />
entity is an n-tuple: the elements of a tuple stand in a relation and constitute a record.<br />
The elements are attributes, which constitute the fields of a record. A relation is a table<br />
in the database, where each row (record) is an n-tuple and each column is an attribute<br />
(Halassy, 1994). We chose Microsoft SQL Server 2005 for our implementation, so we have a<br />
complete and complex database management framework.<br />
A morphophonological component has been transferred from our former project. Now<br />
the rules of syntax are being built in. The main component will be the semantic component:<br />
the implementation of the DRT-based (Kamp, van Genabith and Reyle, 2004; Asher and<br />
Lascarides, 2003) ReALIS dynamic semantic system (Alberti, 2005).<br />
GASG is a monostratal declarative grammar which is considered to be “totally lexicalist”.<br />
Total lexicalism means that all information is in the description of the lexical<br />
items, and the combination of lexical elements is driven exclusively by unification. Thus, it can<br />
be considered a modified unificational categorial grammar (even function application<br />
is omitted). It carries on radical lexicalism, introduced by (Karttunen, 1986), which states<br />
that if the lexicon is sufficiently rich, then sentences can be produced by unification in such a way that<br />
phrase structure is practically redundant; moreover, phrase structure leads to spurious ambiguities. Works<br />
in computational linguistics (for example (Schneider, 2005)) also come to the point that<br />
reducing phrase structure could be useful. Many applications rely on phrase structure,<br />
because otherwise a dependency grammar, without restrictions on word order, is not efficient<br />
in computation. GASG accounts for word order by rank parameters, so giving up phrase<br />
structure does not result in exponential running time of the analyzing algorithm.<br />
Thus, the ’rules’ mentioned above are not really rules, but properties which can be unified.<br />
Requested arguments and their realizations are properties, too. Word order requirements<br />
are also properties: requirements of different strengths. Our grammar model uses rank<br />
parameters for expressing word order, which means that a requirement can not only be<br />
fulfilled or violated, but can also compete with (partially) incompatible requirements.<br />
A special variant of these rank parameters also expresses those cases where focus (or<br />
another operator) “re-orders” the words (compared to a neutral sentence). In written<br />
Hungarian sentences there is no other sign of focus (in spoken sentences there is emphasis<br />
as well).<br />
2 Rank parameters<br />
Primitive syntactic relations (like being before or after each other) can be considered as a<br />
direct precedence requirement in the description of the lexical item. This is because if an<br />
element is in a relationship with a head, it wants to be its neighbour. To give a short example<br />
in Hungarian: a definite article needs a noun immediately after itself (1a). If an adjective<br />
is there, it also needs the noun to be immediately after itself (1b). If this noun has a<br />
possessive suffix, the suffix wants the possessor between the article and the adjective (1c).<br />
Another adjective, expressing nationality, has to be before the noun (1d). Both adjectives<br />
cannot immediately precede the noun: nationality gets priority in this case. Since sentences are linear,<br />
a head theoretically has only two neighbours. In practice, languages usually pick their<br />
complements from one direction.<br />
(1) a. a tanárom<br />
the teacher-Poss1Sg<br />
’my teacher’<br />
b. az okos tanárom<br />
the clever teacher-Poss1Sg<br />
’my clever teacher’<br />
c. az én okos tanárom / *az okos én tanárom<br />
the I clever teacher-Poss1Sg / the clever I teacher-Poss1Sg<br />
’my clever teacher’<br />
d. az én okos magyar tanárom<br />
the I clever Hungarian teacher-Poss1Sg<br />
’my clever Hungarian teacher’<br />
These relations can be expressed by a parameter, called a rank parameter: a number<br />
expressing how close two lexical items need to be to each other to express the relationship<br />
between them. So now we can calculate how a requirement can be satisfied<br />
indirectly (or partially). In the case of (1a), and for the nationality adjective in (1d), the<br />
requirement is satisfied directly. The requirement of the article in (1b)<br />
or of the adjective okos ’clever’ in (1d) is satisfied indirectly.<br />
Figure 1: Indirect satisfaction in (1d).<br />
Rank parameters also show in which direction the satisfying word should be. This is expressed<br />
by a character. It can be a, b or c, referring to a following position, a previous position, or both.<br />
We differentiate two types of rank parameters based on the way requirements are satisfied.<br />
Recessive rank parameters (r) give neighbourhood relations (as in (1a-d)), and<br />
they are satisfied either if the two items are immediately adjacent or if another element with a stronger<br />
rank (a smaller number) is wedged in between<sup>1</sup>. In Figure 1, the strength-5 requirement of the determiner<br />
az ’the’ towards the noun tanárom ’teacher-Poss1Sg’ is satisfied. This is a case of partial<br />
or indirect satisfaction (Alberti, 1999) (see further examples in (6-7)). Of conflicting<br />
dominant rank parameters (d), only the strongest one can be satisfied; all others are deleted<br />
(see Section 6).<br />
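The satisfaction condition for recessive rank parameters can be illustrated with a small Python sketch over the noun phrases in (1).<br />
This is a simplified model and not the SQL implementation: only the rank 5 of the article is taken from the text, while the ranks assigned to the other words and the names RANKS, HEAD, satisfied and all_satisfied are hypothetical choices made for this example. Every requirement is assumed to be of the form "immediately precede the head noun".<br />

```python
# Hypothetical rank values: only the value 5 for the article "az" comes from the text.
# Smaller numbers denote stronger requirements.
RANKS = {"az": 5, "én": 4, "okos": 3, "magyar": 2}
HEAD = "tanárom"

def satisfied(words, dependent, head=HEAD, ranks=RANKS):
    """A recessive requirement to immediately precede the head is satisfied
    directly if the dependent is adjacent to the head, and indirectly if every
    word wedged in between carries a stronger requirement (smaller rank number)."""
    i, h = words.index(dependent), words.index(head)
    if i > h:
        return False
    between = words[i + 1:h]
    return all(ranks[w] < ranks[dependent] for w in between)

def all_satisfied(words):
    """Check the requirements of all dependents in the phrase."""
    return all(satisfied(words, w) for w in words if w != HEAD)

print(all_satisfied("az én okos magyar tanárom".split()))  # True
print(all_satisfied("az okos én tanárom".split()))          # False
```

Under these assumed ranks, the order in (1d) satisfies every requirement, whereas the starred order in (1c) fails because the weaker requirement of én is wedged between okos and the noun.<br />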
Dominant parameters come, language-specifically, from either syntax or semantics. For<br />
example, in Hungarian the subject of a sentence precedes the verb by a dominant semantic<br />
rank parameter, and in no language is it morpheme-marked (thus, it is not a separate<br />
lexical item). In contrast, the subject obligatorily precedes the verb in English, even if it is<br />
semantically empty. Dominant parameters also play an important part in Hungarian<br />
focus phenomena (see examples (6-11)).<br />
3 Predicates and arguments, heads and complements<br />
Argument structures are considered as entities. Their elements are given by a stock table<br />
<strong>of</strong> argument types. Therefore, an argument is formed by a relationship between <strong>the</strong> argument<br />
structure and an argument type. For example, <strong>the</strong> Hungarian verb lakik ’live’ has<br />
two arguments: <strong>the</strong> one who lives somewhere and <strong>the</strong> place where <strong>the</strong> one lives.<br />
Argument types are described by a numeric parameter which places the argument on<br />
a scale from agentive to patient-like. Types that fall outside the central frame,<br />
which describes relations between subjects and objects, get a neutral parameter.<br />
In Hungarian we treat nominal parts of speech as having more than one argument<br />
structure: they can be arguments themselves, which is their basic role and in most languages<br />
their only one (2a), or they can be nominal predicates, because the copula is phonetically<br />
null in Hungarian in the present tense, third person singular (2b). We also count the short<br />
possessive form here, which searches for a possessive suffix (2c).<br />
(2) a. Péter Budapesten lakik.<br />
Peter-NOM Budapest-SUPERESS live-3Sg<br />
’Peter lives in Budapest.’<br />
1 Wedging in has perceptional limits.<br />
b. Annak a fiúnak a neve Péter.<br />
That-DAT <strong>the</strong> boy-DAT <strong>the</strong> name-Poss3Sg Peter-NOM<br />
’That boy’s name is Peter.’<br />
c. Péter kalapja.<br />
Peter-NOM hat-Poss3Sg<br />
’Peter’s hat’<br />
We store the required complements in the same way, in a case frame. The<br />
word ’case’ has an extended meaning here: we record all forms such as infinitives and<br />
postpositional phrases, as well as fixed phrases that can replace a case-suffixed word form<br />
(compare (3a) with (3b)). Cases are therefore stored as a relationship between the case frame<br />
and a case type.<br />
(3) a. Péter elárult pár dolgot Mariról.<br />
Peter-NOM disclose-Past3Sg couple thing-ACC Mary-DELAT<br />
’Peter disclosed a couple <strong>of</strong> things about Mary.’<br />
b. Péter elárult pár dolgot Marival kapcsolatban.<br />
Peter-NOM disclose-Past3Sg couple thing-ACC Mary-INS relation-INESS<br />
’Peter disclosed a couple <strong>of</strong> things about Mary/related to Mary.’<br />
Sometimes a lexical item does not select a particular case for its argument. The verb<br />
lakik ’live’ has two cases for its arguments: the first argument gets the nominative case<br />
through the link between the argument and the case. The other one is of a wildcard type: ’not<br />
specified’. Leaving this argument unfilled may make the sentence ungrammatical, even<br />
though at this point we do not know the exact case (case type) it is realized as. Therefore,<br />
argument types and case types can be linked, too. For the ’PLACE’ type argument, several<br />
case types can be selected, as these examples show:<br />
(4) a. Péter egy szép házban lakik.<br />
Peter-NOM a nice house-INESS live-3Sg<br />
’Peter lives in a nice house’<br />
b. Péter Budapesten lakik.<br />
Peter-NOM Budapest-SUPERESS live-3Sg<br />
’Peter lives in Budapest.’<br />
c. Péter az iskola mellett lakik.<br />
Peter-NOM <strong>the</strong> school-NOM next-POSTPOS live-3Sg<br />
’Peter lives next to <strong>the</strong> school.’<br />
Syntax may account for adjuncts too. A suffixed noun is an adjunct when <strong>the</strong> suffix is<br />
compositional, but all those compositional elements are complements which are required<br />
by ano<strong>the</strong>r element. In this case <strong>the</strong> suffix (or <strong>the</strong> lexical item: ott ’<strong>the</strong>re’) tells about<br />
itself that it is an adjunct requiring a noun.<br />
4 Rank parameters in operation<br />
Rank parameters are established descriptively, from experience. In what follows, some Hungarian<br />
examples show how they work.<br />
In Hungarian a head-complement relation is given by a strength-7 rank parameter. We<br />
do not give any direction because (since lexical items are morphemes) the place of the<br />
complement is underspecified at this point. Semantic requirements search for an aspectualization<br />
argument in the pre-verbal position. There is always an argument giving aspect:<br />
usually it is a pre-verb (5a) 2 or a bare NP (5b), or occasionally it can be the verb itself (5c)<br />
(Alberti, 2004).<br />
(5) a. Péter megírta a leckét.<br />
Peter-NOM Perf+write-Past3Sg <strong>the</strong> homework-ACC<br />
’Peter has written <strong>the</strong> homework.’<br />
b. Már három hete újságot árulok.<br />
Already three week newspaper-ACC sell-1Sg<br />
’I have been selling newspaper for three weeks already.’<br />
c. Péter csalódik Mariban.<br />
Peter-NOM get-disappointed-3Sg Mary-INESS<br />
’Peter gets disappointed in Mary.’<br />
Pre-verbs have two rank parameters, both recessive. In neutral sentences like (6a)<br />
<strong>the</strong> pre-verb el ’away’ must precede indul ’starts going’, given by a strong (r2b) rank<br />
parameter. The emphasis is on <strong>the</strong> pre-verb, and <strong>the</strong> verb has no emphasis, so practically<br />
<strong>the</strong>y form one phonological word. In o<strong>the</strong>r cases, like in (6b), <strong>the</strong> pre-verb may follow <strong>the</strong><br />
verb by a weaker (r3a) rank parameter. This time <strong>the</strong>y are separate phonological words.<br />
(6) a. Péter elindul horgászni.<br />
Peter-NOM away+go3Sg fish-INF<br />
’Peter goes fishing.’<br />
b. Péter ’horgászni indul el. / ’Péter indul el horgászni.<br />
Peter-NOM fish-INF go-3Sg away / Peter-NOM go-3Sg away fish-INF<br />
’Why Peter goes away is that he will fish.’ / ’It is Peter who goes fishing.’<br />
Sometimes a certain argument gives aspect. For example, <strong>the</strong> verb lakik ’live’ has an<br />
argument for ’PLACE’, and it is in <strong>the</strong> preceding position with a strong (r2b) rank (7a),<br />
or in <strong>the</strong> following position with a weaker (r3a) rank (7b) 3 .<br />
(7) a. Péter Budapesten lakik.<br />
Peter-NOM Budapest-SUPERESS live-3Sg<br />
’Peter lives in Budapest.’<br />
b. *Péter lakik Budapesten / ’Péter lakik Budapesten.<br />
Peter-NOM live-3Sg Budapest-SUPERESS<br />
’*Peter lives in Budapest.’ / ’It is Peter who lives in Budapest.’<br />
There are even more special cases when a verb having a pre-verb still gets <strong>the</strong> aspect<br />
from ano<strong>the</strong>r argument.<br />
2 Pre-verbs in Hungarian are considered complements (as in other theories), because they are<br />
separate words. It is a matter of orthography that a pre-verb immediately preceding the verb<br />
is written together with it.<br />
3 In <strong>the</strong> examples apostrophe means strong emphasis. Besides word order, this denotes focus in a Hungarian<br />
sentence.<br />
(8) a. Péter Budapesten szállt meg.<br />
Peter-NOM Budapest-SUPERESS stay-Past3Sg Perf<br />
’Peter stayed in Budapest.’<br />
b. *Péter megszállt Budapesten / Péter ’megszállt Budapesten.<br />
Peter-NOM Perf+stay-Past3Sg Budapest-SUPERESS<br />
’*Peter stayed in Budapest.’ / ’What Peter did in Budapest was that he stayed <strong>the</strong>re.’<br />
As we can see in (8b), the first sentence, without emphasis, is ungrammatical. The<br />
second variant is grammatical, but never neutral: a focus throws the locative<br />
back, so only the weaker requirement can be satisfied (see further in the next two sections).<br />
The aspect-giving argument has to be stored with two rank parameters in every case.<br />
5 Focus in Hungarian<br />
Focus in Hungarian is marked by emphasis and word order (Kiss, 2000). In the<br />
following examples, (9a) is a neutral sentence and (9b-c) are variants with focus<br />
on different complements of the verb.<br />
(9) a. Mari süteményt süt Péternek.<br />
Mary-NOM cookie-ACC bake-3Sg Peter-DAT<br />
’Mary is baking cookies for Peter.’<br />
b. Mari ’Péternek süt süteményt.<br />
’It is Peter for whom Mary is baking cookies.’<br />
c. Mari ’süteményt süt Péternek (és nem kenyeret).<br />
’Those are cookies (and not bread) what Mary is baking for Peter.’<br />
In our solution focus is a separate lexical item 4 , because it influences o<strong>the</strong>r elements<br />
in <strong>the</strong> sentence by its own requirements. It searches for two o<strong>the</strong>r elements: <strong>the</strong> focused<br />
element and a verb. Focus gives <strong>the</strong> verb a strong dominant rank parameter to be in <strong>the</strong><br />
following position (d6a). 5<br />
In <strong>the</strong> previous section we claimed that <strong>the</strong> aspect-giving argument (mostly a pre-verb)<br />
has to be stored with two rank parameters. In neutral sentences (as in (6a)) <strong>the</strong> stronger<br />
(r2b) rank parameter is satisfied. But when a focus comes (see (6b)), <strong>the</strong> requirement <strong>of</strong><br />
<strong>the</strong> pre-verb cannot be satisfied. The weaker (r3a) requirement is still <strong>the</strong>re, and it can be<br />
satisfied.<br />
6 Processing<br />
The search starts from the finite verb. Elements that turn out not to be required by the<br />
verb or any of its complements (mostly adjuncts) are legitimate if they find an element to<br />
attach to.<br />
The first step of the process is to check dominant rank parameters. (In Figure 2, the focused<br />
element tortát ’cake-ACC’ directly precedes the verb hozott ’bring-Past3Sg’.) Then<br />
all conflicting requirements are deleted:<br />
4 Although it is phonetically null in Hungarian, in some languages it is a morpheme (e.g. Eskimo,<br />
Quechua, Tamil). This explains why we consider it a separate lexical item.<br />
5 Progressive form <strong>of</strong> telic situations may work <strong>the</strong> same.<br />
Figure 2: Processing.<br />
1. Ranks applying to the same element from the same element (in Figure 2, r3a between<br />
the pre-verb be ’in’ and the verb; only r7b remains);<br />
2. All other ranks between the two elements (r7a from the verb to the focused tortát,<br />
and r7c from tortát to the verb);<br />
3. Ranks applying to another element with a reverse direction (the r7b rank of the verb to<br />
the subject Péter ’Peter-NOM’ changes 6 to r7c, because the subject can be anywhere<br />
around the verb if there is a focus);<br />
4. The dominant rank parameter wins if there are two conflicting requirements of the<br />
same element (between Péter ’Peter-NOM’ and the verb hozott ’bring-Past3Sg’<br />
there are r7c and d7a; owing to the focus the former remains in this sentence, but<br />
in a neutral sentence d7a applies).<br />
The next step is to check recessive rank parameters: ei<strong>the</strong>r two elements are neighbours<br />
directly or <strong>the</strong>re is ano<strong>the</strong>r element between <strong>the</strong>m which is required with a stronger rank<br />
parameter (this may bring adjoining elements). In <strong>the</strong> example it goes as follows:<br />
1. egy tortát ’a cake-ACC’, a szobába ’<strong>the</strong> room-ILLAT’, hozott be ’bring-Past3Sg in’<br />
are neighbours directly;<br />
2. in be a szobába ’in <strong>the</strong> room-ILLAT’ <strong>the</strong> definite article is in between, but it has a<br />
stronger rank parameter (r5a against r7c);<br />
3. Péter and hozott have egy tortát in between, owing to the strength-6 rank parameter<br />
introduced by the focus.<br />
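The recessive check described above can be sketched as follows. This is an illustrative reading only, not the implementation under development: the function name, its parameters and the word order of the example sentence (reconstructed from the steps listed above) are assumptions, and the handling of dominant parameters is left out.<br />

```python
from typing import Dict, List

# Word order reconstructed from the processing steps above (an assumption).
WORDS: List[str] = ["Péter", "egy", "tortát", "hozott", "be", "a", "szobába"]


def recessive_satisfied(
    words: List[str],
    requirer: int,
    target: int,
    strength: int,
    direction: str,
    wedge_strength: Dict[int, int],
) -> bool:
    """Check one recessive requirement between two word positions.

    direction: 'a' (target follows), 'b' (target precedes) or 'c' (either).
    wedge_strength maps the position of an intervening word to the strength
    of the rank with which that word is itself required.
    """
    n = len(words)
    if not (0 <= requirer < n and 0 <= target < n) or requirer == target:
        raise ValueError("positions must be distinct indices into words")
    if direction == "a" and target < requirer:
        return False
    if direction == "b" and target > requirer:
        return False
    lo, hi = sorted((requirer, target))
    # Every word strictly in between must be wedged in by a stronger rank,
    # i.e. by a smaller number than the strength of this requirement.
    return all(
        wedge_strength.get(i, strength) < strength for i in range(lo + 1, hi)
    )
```

For instance, the strength-7 requirement linking be (position 4) and szobába (position 6) is accepted when the article at position 5 is recorded with strength 5, as in `recessive_satisfied(WORDS, 4, 6, 7, "c", {5: 5})`, and rejected when nothing licenses the intervening word.<br />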
In our system, in contrast to phrase-structure grammars, any element can be focused.<br />
Sometimes the verb does not follow the focused element immediately. An adjoining<br />
word may follow it, wedging itself in by a stronger rank parameter, as (10) shows:<br />
(10) a. Péter egy lánnyal találkozott.<br />
Peter-NOM a girl-INS meet-Past3Sg<br />
’Peter met a girl.’<br />
6 Practically its direction is deleted, see section 2.<br />
b. Péter egy ’okos lánnyal találkozott.<br />
Peter-NOM a clever girl-INS meet-Past3Sg<br />
’It was a clever girl whom Peter met.’<br />
c. Péter ’két okos lánnyal találkozott.<br />
Peter-NOM two clever girl-INS meet-Past3Sg<br />
’It was two clever girls whom Peter met.’<br />
(11) a. Péter olvasott egy verset Adytól.<br />
Peter-NOM read-Past3Sg a poem-ACC Ady-ABL<br />
’Peter read a poem by Ady.’<br />
b. *Péter egy ’verset Adytól olvasott.<br />
Peter-NOM a poem-ACC Ady-ABL read-Past3Sg<br />
In (11) the focused element (verset ’poem-ACC’) has a complement (Adytól ’Ady-ABL’),<br />
but complements are required with a strength-7 rank parameter, which is weaker<br />
than the strength-6 rank parameter between the focus and the verb. The complement therefore<br />
cannot wedge in between the focused element and the verb, and (11b) is ungrammatical.<br />
7 Conclusion<br />
We are working on the implementation of this system, in which predicate-argument and<br />
head-complement relations, adjuncts and word order are all handled in the lexicon. Rank<br />
parameters account for word order variations in a language, and for other phenomena such as<br />
scrambling (which shows clear differences between languages) or focus and the progressive<br />
(which are sometimes invisible). The next step will be a semantic component, because<br />
we believe that intelligent applications can only be built on a real linguistic basis, which<br />
requires fine semantics.<br />
Acknowledgements<br />
I am grateful to <strong>the</strong> Hungarian National Scientific Research Fund (OTKA K60595) for<br />
<strong>the</strong>ir contribution to my costs.<br />
References<br />
Alberti, G. (1999). GASG: The grammar <strong>of</strong> total lexicalism, Working Papers in <strong>the</strong> Theory<br />
<strong>of</strong> Grammar 6(1). Theoretical Linguistics Programme, Budapest University and<br />
Research Institute for Linguistics, Hungarian Academy <strong>of</strong> Sciences.<br />
Alberti, G. (2004). Climbing for aspect with no rucksack, in K. É. Kiss and H. van<br />
Riemsdijk (eds), Verb Clusters; A study <strong>of</strong> Hungarian, German and Dutch, Linguistics<br />
Today 69, John Benjamins, Amsterdam:Philadelphia, pp. 253–289.<br />
Alberti, G. (2005). ReALIS. Doctoral dissertation at Hungarian Academy <strong>of</strong> Sciences,<br />
ms. HAS Research Institute for Linguistics and University <strong>of</strong> Pécs.<br />
URL: http://lingua.btk.pte.hu/gelexi.asp<br />
Archangeli, D. and Langendoen, T. D. (eds) (1997). Optimality Theory: an Overview,<br />
Blackwell, Oxford.<br />
Asher, N. and Lascarides, A. (2003). Logics <strong>of</strong> Conversation, Cambridge University<br />
Press, Cambridge.<br />
Halassy, B. (1994). Az adatbázis-tervezés alapjai és titkai [Basics and Secrets <strong>of</strong> Designing<br />
a Database], IDG, Budapest.<br />
Kamp, H., van Genabith, J. and Reyle, U. (2004). Discourse representation <strong>the</strong>ory. ms.<br />
to appear in Handbook <strong>of</strong> Philosophical Logic.<br />
URL: http://www.ims.uni-stuttgart.de/~hans<br />
Karttunen, L. (1986). Radical lexicalism, Report No. CSLI-86-68, CSLI Publications.<br />
Kiss, K. É. (2000). Az egyszerű mondat szerkezete [The Structure of the Simple Sentence],<br />
in F. Kiefer (ed.), Strukturális magyar nyelvtan I. Mondattan [Structural Hungarian<br />
Grammar Vol. 1 Syntax], Vol. 7., Akadémiai Kiadó, Budapest, pp. 79–177.<br />
Schneider, G. (2005). A broad-coverage, representationally minimal LFG parser: Chunks<br />
and f-structures are sufficient, in M. Butt and T. H. King (eds), <strong>Proceedings</strong> <strong>of</strong> <strong>the</strong><br />
LFG05 Conference, CSLI Publications, University <strong>of</strong> Bergen, pp. 388–407.<br />
EXPRESSING CONJUNCTIVE AND AGGREGATE QUERIES OVER<br />
ONTOLOGIES WITH CONTROLLED ENGLISH<br />
Camilo Thorne<br />
Free University <strong>of</strong> Bozen-Bolzano<br />
Abstract. We propose to characterize <strong>the</strong> computational complexity <strong>of</strong> answering questions<br />
in ontology-mediated controlled language interfaces to structured data sources by expressing<br />
ontology-based data access in controlled English. This means: compositionally mapping a<br />
controlled subset <strong>of</strong> English into knowledge bases and formal queries for which <strong>the</strong> computational<br />
complexity <strong>of</strong> ontology-based data access is known. In <strong>the</strong> present paper, we extend<br />
this approach to conjunctive queries and to conjunctive queries with aggregation functions.<br />
1 Introduction<br />
Lately, there has been renewed interest within the computational linguistics community<br />
(Minock, 2005; Lesmo and Robaldo, 2007) in natural language interfaces to databases<br />
(NLIDBs), which aim at managing relational databases (DBs) with natural language (NL).<br />
In particular, robust interfaces supporting controlled fragments (CLs)<br />
of English and based on ontologies, computational semantics and deep semantic parsing<br />
have been developed by, for instance, the Attempto project (Bernstein et al., 2003;<br />
Fuchs et al., 2005). Controlled languages are fragments of NL tailored to fit data management<br />
tasks, typically by restricting their vocabulary (and syntax), thereby<br />
stripping them of ambiguity, whether structural or semantic. Controlled languages allow<br />
a trade-off between the coverage and the accuracy of the translation of questions into<br />
formal queries. Ontologies (conceptualizations of the domain) mediate<br />
between the CL’s vocabulary and the domain terminology.<br />
However, some important issues regarding controlled English interfaces have not been,<br />
to the best of our knowledge, fully addressed. One of them is the tractability or intractability<br />
of processing CL information requests and utterances, viz., how difficult is<br />
declaring and accessing structured data with a controlled English interface? By difficulty<br />
we mean computational complexity. We believe that one way of addressing this issue<br />
consists in expressing ontology based data access with CLs. By this we mean designing<br />
declarative and interrogative controlled subsets of English that compositionally map,<br />
through a semantic mapping ⟦.⟧ (taken from NL formal semantics), into formal queries,<br />
ontologies and database facts, their meaning representations (MRs). Ontology based data<br />
access provides the logical underpinning of accessing structured data w.r.t. ontologies, and<br />
its computational complexity provides a measure of how difficult a task it might be.<br />
The main purpose of this paper is twofold. On the one hand, we will say what it means to<br />
express ontology based data access in CL. On the other hand, we will proceed to express<br />
in controlled English a class of formal queries known as conjunctive queries. Conjunctive<br />
queries are attractive in that with them we reach an optimal computational complexity. Last,<br />
but not least, we will extend our controlled language to cover aggregate queries, which<br />
are conjunctive queries to which the basic SQL aggregation functions, COUNT, MIN, MAX<br />
and SUM, have been added.<br />
2 Ontology Based Data Access<br />
Accessing and declaring data w.r.t. an ontology or conceptualization can be characterized<br />
in terms <strong>of</strong> formal logic as follows (Rosati, 2007). A relational query q <strong>of</strong> arity n is a<br />
formal expression q(x) ← Qyβ(x, y), where q(x) is <strong>the</strong> head and x denotes a sequence<br />
<strong>of</strong> n variables, <strong>the</strong> query’s distinguished variables, and Qyβ(x, y) is <strong>the</strong> body, a first order<br />
logic (FOL) quantified boolean combination <strong>of</strong> relational atoms where <strong>the</strong> distinguished<br />
variables occur free and <strong>the</strong> o<strong>the</strong>rs (<strong>the</strong> sequence y) bound to a quantifier. Qy denotes<br />
<strong>the</strong> sequence <strong>of</strong> its quantifier prefixes. When no confusion arises, we shall abbreviate<br />
Qyβ(x, y) with Φ[x]. A query is said to be boolean if its arity is n = 0. A collection<br />
<strong>of</strong> such queries is called a query language. A relational database (DB) D is a finite set<br />
<strong>of</strong> ground atoms over a schema R := {R1, ..., Rn}, where, for i ∈ [1, n], Ri is a relation<br />
symbol <strong>of</strong> arity m ≥ 1, and over a countably infinite domain Dom <strong>of</strong> constants. The<br />
active domain adom(D) <strong>of</strong> D is <strong>the</strong> set <strong>of</strong> constants that occur in D (a finite subset <strong>of</strong><br />
Dom). An ontology O is a set <strong>of</strong> FOL axioms that make explicit a certain number <strong>of</strong><br />
constraints holding over a domain. They are typically defined over some fragment <strong>of</strong><br />
FOL called an ontology language. This language should be rich enough to express DBs<br />
(i.e., DB atoms). The pair 〈O, D〉 is called a knowledge base (KB), and can be seen as a<br />
FOL logical <strong>the</strong>ory: a set <strong>of</strong> ground atoms (<strong>the</strong> DB) plus a set <strong>of</strong> axioms (<strong>the</strong> ontology).<br />
A ground substitution is a function σ(.) from Var(q), the set of variables of q, into Dom.<br />
Substitutions are extended to sequences of variables in the standard way. KBs and substitutions<br />
give rise to the certain answers semantics of a query q of arity n over a KB 〈O, D〉, denoted<br />
q(〈O, D〉). It consists in collecting the values in adom(D) of all the ground substitutions<br />
σ(.) for which 〈O, D〉 logically entails qσ, where qσ denotes the grounding of q by σ(.).<br />
Formally, $q(\langle O, D \rangle) := \{\sigma(x) \in \mathit{adom}(D)^n \mid \sigma \text{ s.t. } \langle O, D \rangle \models q\sigma\}$. To investigate its<br />
computational complexity we must look at the associated recognition problem:<br />
Definition 1. (QA) The KB query answering (QA) decision problem is the FOL entailment<br />
problem stated as follows: given a KB 〈O, D〉, a sequence $c \in \mathit{Dom}^n$ of n constants, a<br />
CQ q of arity n and distinguished variables x, check if there exists a ground substitution<br />
σ(.) s.t. σ(x) = c and 〈O, D〉 |= qσ holds, where qσ is the grounding of q by σ(.).<br />
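To make the definition concrete, the following minimal sketch enumerates ground substitutions over the active domain for a conjunctive query. It makes a strong simplifying assumption: the ontology O is empty, so that entailment reduces to checking that the grounded atoms belong to D. With a non-empty ontology some form of reasoning is needed, which is not shown. All names, the '?' prefix for variables and the toy facts are hypothetical.<br />

```python
from itertools import product
from typing import List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]  # (relation symbol, arguments)
Database = Set[Atom]                # a finite set of ground atoms


def active_domain(db: Database) -> Set[str]:
    """The constants that occur in the database."""
    return {c for _, args in db for c in args}


def answers(head: Tuple[str, ...], body: List[Atom], db: Database) -> Set[Tuple[str, ...]]:
    """Brute-force answers to q(head) <- body over db, assuming O is empty.

    Terms starting with '?' are variables; all other terms are constants.
    Every head variable is assumed to occur in the body.
    """
    variables = sorted({t for _, args in body for t in args if t.startswith("?")})
    adom = sorted(active_domain(db))
    result = set()
    for values in product(adom, repeat=len(variables)):
        sigma = dict(zip(variables, values))
        grounded = {(r, tuple(sigma.get(t, t) for t in args)) for r, args in body}
        if grounded <= db:  # with O empty, entailment is membership in D
            result.add(tuple(sigma[v] for v in head))
    return result


def qa(head: Tuple[str, ...], body: List[Atom], db: Database, c: Tuple[str, ...]) -> bool:
    """The decision problem: is the tuple c among the answers?"""
    return tuple(c) in answers(head, body, db)


# Invented facts, for illustration only.
DB: Database = {("loves", ("Mary", "John")), ("hates", ("John", "Paul"))}
```

A boolean query has an empty head, so its answer set is either `{()}` (true) or empty (false). For a fixed query the loop runs over at most #(adom(D))^k substitutions, where k is the number of variables, which is why the size of the data is the parameter of interest.<br />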
When we focus on #(adom(D)) (the number of constants of D) while considering<br />
constant both size(q) (the number of symbols of the query) and #(O) (the number of<br />
axioms), we speak, following Vardi (1982), of the data complexity of QA. Such<br />
complexity will depend on the query language and the ontology language chosen (Rosati,<br />
2007).<br />
The certain answers semantics can provide a formal semantics for ontology mediated<br />
CL data access interfaces, and QA’s data complexity both a measure of their difficulty and<br />
a criterion for optimality. To implement this strategy we need, we believe, to go through<br />
two stages: (i) We need to choose an ontology language and a query language for which<br />
the computational complexity of QA is known and for which data complexity is optimal.<br />
(ii) We need to express QA in controlled English.<br />
3 Expressing QA with Controlled English<br />
A compositional translation ⟦.⟧, as proposed and conceived by Montague (Montague,<br />
1970), is a function that homomorphically maps a fragment of natural language (English<br />
in our case) into, basically, FOL augmented with the types, the lambda abstraction and the<br />
function application constructs of the simply typed λ-calculus, a.k.a. λ-FOL. It assigns<br />
to each NL utterance a λ-FOL formula: its meaning representation (MR). The key feature of<br />
compositional translations is that they can be made to map declarative fragments of NL<br />
into ontology languages and interrogative fragments into query languages.<br />
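As a rough illustration of how such a translation composes meanings by function application, the sketch below represents lexical meanings as Python closures that build a formula string. It is not the grammar of this paper: the function names are hypothetical, the entry for proper names (λP.P(c)) is the standard type-raised one and is assumed here, and fresh variable names are generated only to avoid accidental capture.<br />

```python
import itertools

_counter = itertools.count(1)


def reset_variables() -> None:
    """Restart variable numbering (keeps the output reproducible)."""
    global _counter
    _counter = itertools.count(1)


def fresh() -> str:
    """Return a variable name not used so far, e.g. 'x1', 'x2', ..."""
    return f"x{next(_counter)}"


def somebody(p):
    """λP.∃x P(x), of type (e → t) → t."""
    v = fresh()
    return f"∃{v}({p(v)})"


def proper_name(c):
    """λP.P(c): assumed type-raised entry for a proper name."""
    return lambda p: p(c)


def transitive(rel):
    """λα.λx.α(λy.rel(x, y)), of type ((e → t) → t) → (e → t)."""
    return lambda alpha: lambda x: alpha(lambda y: f"{rel}({x}, {y})")


def sentence(np, vp):
    """S → NP VP: the subject meaning is applied to the verb phrase meaning."""
    return np(vp)


mary = proper_name("Mary")
loves = transitive("loves")
```

After `reset_variables()`, `sentence(mary, loves(somebody))` yields the string `∃x1(loves(Mary, x1))`: the verb meaning first consumes its object, and the subject meaning is then applied to the resulting property.<br />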
Definition 2. (Expressing QA) Given an ontology language L and a query language Q,<br />
expressing QA in controlled English consists in: (i) Defining a grammar G and a compositional<br />
translation ⟦.⟧ for a controlled declarative fragment L(G) s.t. ⟦.⟧ maps L(G)<br />
into L. (ii) Defining a grammar G′ and a compositional translation ⟦.⟧ for a controlled<br />
interrogative fragment L(G′) s.t. ⟦.⟧ maps L(G′) into Q.<br />
We have dealt elsewhere with the problem of expressing KBs and ontology languages<br />
by expressing, in particular, the DL-Lite$_{R,\sqcap}$ ontology language or logic and, in general,<br />
the DL-Lite family of DLs (Calvanese, De Giacomo, Lembo, Lenzerini and Rosati,<br />
2007). Description logics (DLs) are knowledge representation logics that conceptually<br />
model a domain in terms of classes, roles (binary relations among classes) and inheritance<br />
relations between classes and roles. In (Bernardi, Calvanese and Thorne, 2007; Thorne,<br />
2007) we define a declarative CL, Lite-English, and a compositional translation ⟦.⟧, and<br />
show that:<br />
Theorem 1. (Bernardi et al., 2007) For every sentence S in the CL Lite-English,<br />
there exists a DL-Lite$_{R,\sqcap}$ assertion α s.t. ⟦S⟧ = α. Conversely, every DL-Lite$_{R,\sqcap}$<br />
assertion α is the image by ⟦.⟧ of some sentence S in Lite-English.<br />
To get the whole picture we now need to look at query languages. It turns out that<br />
QA for DL-Lite$_{R,\sqcap}$ is optimal w.r.t. data complexity, falling under LOGSPACE (actually,<br />
$\mathrm{AC}^0$), a minimal complexity class, when we choose as query language the class of<br />
relational queries known as rule-based conjunctive queries (CQs). Conjunctive queries<br />
are queries over a schema R whose body is a conjunction of existentially quantified relational<br />
atoms. Expressing query languages w.r.t. which QA’s computational complexity is<br />
optimal can shed light on the conditions under which the task of accessing data w.r.t. an<br />
ontology with CL might be a relatively easy task.<br />
4 Expressing Conjunctive Queries<br />
In this section we will show how to express graph-shaped simple conjunctive queries, a<br />
subclass of the class of CQs, for which QA is optimal too. A typical boolean graph-shaped<br />
query over, say, the constant Mary and the binary predicates loves and hates is<br />
(1) q() ← ∃x∃y(loves(Mary, x) ∧ hates(x, y))<br />
which we would like to express through the CL Y/N-question<br />
(2) Does Mary love somebody who hates somebody?<br />
And a typical non-boolean graph-shaped query over the same set of relational symbols<br />
(i.e., the schema {loves, hates}) is<br />
(3) q(x) ← ∃y(loves(x, y) ∧ hates(y, x))<br />
(Lexical rule)    (Value of ⟦.⟧ on word and category)<br />
Det → some    λP.λQ.∃x(P(x) ∧ Q(x)): (e → t) → ((e → t) → (e → t))<br />
Pro_i → somebody    λP.∃xP(x): (e → t) → t<br />
Pro_i^- → anybody    λP.∃xP(x): (e → t) → t<br />
Coord → and    λP.λQ.∃x(P(x) ∧ Q(x)): (e → t) → ((e → t) → (e → t))<br />
Relpro_i → who    λP.λx.P(x): (e → t) → (e → t)<br />
Pro_i → him    λP.P(x): (e → t) → t<br />
Pro_i → himself    λP.P(x): (e → t) → t<br />
Intpro → which    λP.λQ.λx.P(x) ∧ Q(x): (e → t) → (e → t)<br />
Intpro_i → who_i    λP.λx.P(x): (e → t) → (e → t)<br />
NPgap_i → ɛ    λP.P(x): (e → t) → t<br />
N_i → man,...    λx.man(x): e → t,...<br />
IV_i → runs,...    λx.run(x): e → t,...<br />
IV_i^- → run,...    λx.run(x): e → t,...<br />
TV_{i,j} → loves,...    λα.λx.α(λy.loves(x, y)): ((e → t) → t) → (e → t),...<br />
TV_{i,j}^- → love,...    λα.λx.α(λy.loves(x, y)): ((e → t) → t) → (e → t),...<br />
TV_{i,j}^p → loved,...    λα.λx.α(λy.loves(x, y)): ((e → t) → t) → (e → t),...<br />
Adj_i → mortal,...    λx.mortal(x): e → t,...<br />
Pn_i → Mary,...    λP.P(Mary): (e → t) → t,...<br />
Table 1: Lexical rules for GCQ-English.<br />
which we would like to express through the CL Wh-question (containing an anaphoric<br />
pronoun)<br />
(4) Who loves somebody who hates him?<br />
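To make the two query shapes concrete, the following minimal Python sketch evaluates the boolean query (1) and the non-boolean query behind question (4) over a toy database. The facts and the function names are hypothetical and serve only as illustration; nothing here is part of the formal development.<br />

```python
# Toy database: hypothetical facts, for illustration only.
loves = {("Mary", "John"), ("John", "Ann"), ("Ann", "Mary")}
hates = {("John", "Sue"), ("Ann", "John")}

# Active domain: every constant that occurs in some fact.
domain = {a for pair in loves | hates for a in pair}


def q_boolean():
    """Query (1): is there an x and a y with loves(Mary, x) and hates(x, y)?"""
    return any(("Mary", x) in loves and (x, y) in hates
               for x in domain for y in domain)


def q_who():
    """Question (4): every x who loves some y such that y hates x."""
    return {x for x in domain
            if any((x, y) in loves and (y, x) in hates for y in domain)}


print(q_boolean())  # Mary loves John, and John hates Sue
print(q_who())      # John loves Ann, and Ann hates John
```

The existential quantifiers become `any(...)` over the active domain, and the distinguished variable of the non-boolean query becomes the variable of a set comprehension.<br />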
Definition 3. (GCQs) A non-boolean graph-shaped simple conjunctive query (GCQ) of<br />
arity ≤ 1 is a CQ over a schema R composed of relation symbols of arity ≤ 2, of the form<br />
q := q(x) ← Φ[x], where the body Φ[x] is inductively defined as:<br />
Φ[x] := A_{i_0}(x) ∧ ... ∧ A_{i_m}(x) ∧ R_{j_0}(x, x) ∧ ... ∧ R_{j_m}(x, x) ∧ R_{j_0}(x, c) ∧ ... ∧ R_{j_m}(x, c).<br />
Φ[x] := Φ′[x] ∧ ∃y(A_{i_0}(x) ∧ ... ∧ A_{i_m}(x) ∧ R_{j_0}(x, y) ∧ ... ∧ R_{j_m}(x, y) ∧ R_{j_0}(y, x) ∧ ... ∧ R_{j_m}(y, x) ∧ Φ′′[y]).<br />
Note that this definition allows for empty sequences of conjuncts, e.g., |A_{i_0}(x) ∧ ... ∧<br />
A_{i_m}(x)| ≥ 0 (where |.| is the function that returns the number of predicates in the body<br />
of a relational query). A boolean GCQ is a query of the form q := q() ← ∃yΦ[y], where<br />
Φ[y] is the body of a non-boolean GCQ.<br />
4.1 Expressing Conjunctive Queries with GCQ-English<br />
GCQs are captured by the interrogative CL GCQ-English. Questions in GCQ-English<br />
fall under two main classes: (i) Wh-questions, which map into non-boolean GCQs, and<br />
(ii) Y/N-questions, which map into boolean GCQs. For simplicity, we assume grammars<br />
to be phrase structure grammars augmented with semantic actions. Phrase structure<br />
grammars are composed of two sets of rewriting rules: lexical rules (a.k.a. lexicons)<br />
and phrase-structure rules. Table 2 shows the phrase-structure rules of GCQ-English's<br />
grammar and Table 1 its lexicon. The lexicon is in turn divided into two sets: a closed<br />
set of function word rules, which express (at the semantic level) logical operations and<br />
connectives, and an open set of content word rules (nouns, adjectives, verbs), a feature we<br />
convey through dots.<br />
(Rule)    (Semantic Action)<br />
Qwh → Intpro N_i Sgap_i ?    ⟦Qwh⟧ := ⟦Intpro⟧(⟦N_i⟧)(⟦Sgap_i⟧)<br />
Qwh → Intpro_i Sgap_i ?    ⟦Qwh⟧ := ⟦Intpro_i⟧(⟦Sgap_i⟧)<br />
QY/N → does NP_i^- VP_i^- ?    ⟦QY/N⟧ := ⟦NP_i^-⟧(⟦VP_i^-⟧)<br />
QY/N → is NP_i VP_i ?    ⟦QY/N⟧ := ⟦NP_i⟧(⟦VP_i⟧)<br />
Sgap_i → NPgap_i VP_i    ⟦Sgap_i⟧ := ⟦NPgap_i⟧(⟦VP_i⟧)<br />
VP_i → VP_i Coord VP_i    ⟦VP⟧ := ⟦Coord⟧(⟦VP⟧)(⟦VP⟧)<br />
VP_i^- → VP_i^- Coord VP_i^-    ⟦VP_i^-⟧ := ⟦Coord⟧(⟦VP_i⟧)(⟦VP_i^-⟧)<br />
VP_i → TV_{i,j} NP_j    ⟦VP_i⟧ := ⟦TV_{i,j}⟧(⟦NP_j⟧)<br />
VP_i → is Adj_i    ⟦VP_i⟧ := ⟦Adj_i⟧<br />
VP_i → is a N_i    ⟦VP_i⟧ := ⟦N_i⟧<br />
VP_i^- → IV_i^-    ⟦VP_i^-⟧ := ⟦IV_i^-⟧<br />
VP_i → IV_i    ⟦VP_i⟧ := ⟦IV_i⟧<br />
VP_i^- → TV_{i,j}^- NP_j    ⟦VP_i^-⟧ := ⟦TV_{i,j}^-⟧(⟦NP_j⟧)<br />
VP_i → VP_i^p    ⟦VP_i⟧ := ⟦VP_i^p⟧<br />
VP_i^p → TV_{i,j}^p NP_j    ⟦VP_i^p⟧ := ⟦TV_{i,j}^p⟧(⟦NP_j⟧)<br />
NP_i^- → Det^- N_i    ⟦NP_i^-⟧ := ⟦Det^-⟧(⟦N_i⟧)<br />
NP_i → Pro_i    ⟦NP_i⟧ := ⟦Pro_i⟧<br />
NP_i → Det N_i    ⟦NP_i⟧ := ⟦Det⟧(⟦N⟧)<br />
NP_i → Pn_i    ⟦NP_i⟧ := ⟦Pn_i⟧<br />
NP_i → Pro_i    ⟦NP_i⟧ := ⟦Pro_i⟧<br />
N_i → Adj N_i    ⟦N_i⟧ := ⟦Adj⟧(⟦N_i⟧)<br />
N_i → N_i Relpro_i Sgap_i    ⟦N_i⟧ := ⟦Relpro_i⟧(⟦N_i⟧)(⟦Sgap_i⟧)<br />
Table 2: Phrase structure rules for GCQ-English.<br />
The empty expression ɛ is what linguistic theory calls a trace, a placeholder for<br />
the antecedent of the relative pronoun. Symbols occurring in the phrase-structure rewriting<br />
rules are called components and represent the syntactic chunks into which sentences<br />
can be analysed. Symbols that rewrite into words, that is, symbols in the lexicon, are<br />
called categories or terminal components and represent parts of speech, that is, verbs,<br />
common and proper nouns, pronouns, adjectives, etc. Some basic morpho-syntactic and<br />
semantic features are attached to (some) components. The feature .^- means that the component<br />
is of negative polarity; the feature .^p, associated with verbs and verb phrase components,<br />
indicates that the component is to be inflected in the passive voice. Absence of<br />
features indicates that components are in positive polarity and that verbs and verb phrases are in<br />
the active voice. Furthermore, indexes are assigned to components following the standard<br />
set by (Pratt, 2001) to: (i) Resolve intrasentential anaphora: anaphoric pronouns (“him”,<br />
“himself”) resolve with their nearest (antecedent) head noun. (ii) Indicate gap-filler dependencies.<br />
For simplicity, verbs are in the 3rd person singular and in the present tense.<br />
A quick glance at the grammar rules of GCQ-English will convince the reader that,<br />
for instance, the (English) question<br />
(5) Does John love Mary?<br />
and the question<br />
(6) Which man is mortal and loves somebody who hates him?<br />
lie within GCQ-English. By the same token, it is easy to see that the question<br />
(7) *Which teacher gives a lesson to his pupils?<br />
lies outside this CL. Why? Because we have no possessive adjectives (e.g., “his”) and no<br />
ditransitive verbs (e.g., “gives”).<br />
Semantic actions mean that we define the translation ⟦.⟧ by recursion over the syntactic<br />
components of GCQ-English in such a way that the application of each grammar rule,<br />
lexical or otherwise, “triggers” ⟦.⟧ (Jurafsky and Martin, 2000). The intermediate values<br />
of this function are called partial MRs. When we reach the Qwh component of a Wh-question,<br />
⟦.⟧ maps the λ-FOL expression obtained, of the form ⟦Qwh⟧ = λx.Φ[x]: e → t,<br />
into the GCQ q(x) ← Φ[x], where Φ[x] denotes a conjunction of existentially quantified<br />
atoms in which the variable x occurs free. In the case of a Y/N-question, the λ-FOL expression<br />
⟦QY/N⟧ = Φ: t is mapped into the boolean GCQ q() ← Φ, where Φ stands for a<br />
conjunction of existentially quantified atoms with no free variables. Types ensure that ⟦.⟧<br />
always terminates. Given a GCQ-English question Q, we can compute ⟦Q⟧ as follows:<br />
(i) We compute the parse tree of Q. (ii) We compute ⟦Q⟧ bottom-up, from leaves to root,<br />
as in Figure 1. We start by assigning a λ-expression to the leaves. Then, at each internal<br />
node, we unify types and compute the λ-application and the β-reduction of its children.<br />
We omit types for reasons of space. In the end we obtain, at the root of the tree, a GCQ.<br />
In Figure 1, the circle delimits an island and the dotted line marks a gap-filler dependency<br />
imposed by the use of the pronoun.<br />
Figure 1: Translating ”Who loves Mary?”.<br />
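The bottom-up computation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: formulas are plain strings, predicates of type e → t are Python functions from a term to a formula, the entry for “John” is a hypothetical analogue of the one for “Mary”, and the wh-pronoun is modelled as binding the variable x left free by the gap, which simplifies the lexical entry of Table 1.<br />

```python
# Partial MRs as Python values: formulas are strings, predicates of type
# e -> t are functions from a term (string) to a formula (string).

mary = lambda P: P("Mary")    # Pn -> Mary    : λP.P(Mary)
john = lambda P: P("John")    # Pn -> John    : hypothetical, same shape
gap = lambda P: P("x")        # NPgap -> ɛ    : λP.P(x), x stays free

# TV -> loves : λα.λx.α(λy.loves(x, y))
loves = lambda alpha: lambda x: alpha(lambda y: f"loves({x}, {y})")


def who(body):
    """Assumed treatment of 'who': bind the free x and emit a GCQ."""
    return f"q(x) ← {body}"


def translate_who_loves_mary():
    np = mary            # NP_j   -> Pn_j
    vp = loves(np)       # VP_i   -> TV_{i,j} NP_j   gives λx.loves(x, Mary)
    sgap = gap(vp)       # Sgap_i -> NPgap_i VP_i    gives loves(x, Mary)
    return who(sgap)     # Qwh    -> Intpro_i Sgap_i ?


def translate_does_john_love_mary():
    vp = loves(mary)     # VP -> TV NP
    body = john(vp)      # QY/N -> does NP VP ?      gives loves(John, Mary)
    return f"q() ← {body}"


print(translate_who_loves_mary())
print(translate_does_john_love_mary())
```

Each assignment corresponds to one phrase-structure rule of Table 2, and Python function application plays the role of λ-application followed by β-reduction.<br />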
Lemma 1. (Expressing GCQs) For every question Q in GCQ-English, there exists a<br />
GCQ q s.t. ⟦Q⟧ = q. Conversely, every GCQ q is the image by ⟦.⟧ of some question Q in<br />
GCQ-English.<br />
Proof. (Sketch) We prove each implication separately:<br />
(⇒) We need to show that for every Wh-question Q in GCQ-English there exists a<br />
GCQ q of distinguished variable x and body Φ[x] s.t. ⟦Q⟧ = q(x) ← Φ[x]. Given<br />
that the only recursive components in GCQ-English's grammar are verb phrases<br />
(VPs) and nominals (Ns), this can be proved by an easy simultaneous induction on<br />
Ns and VPs and by discarding all possible parse states where components do not<br />
satisfy co-indexing, polarity and voice constraints. For Y/N-questions we reason<br />
analogously.<br />
(⇐) We will prove, by induction on the body Φ[x] of a non-boolean GCQ q of distinguished<br />
variable x, that we can construct a question Q such that q is the image of<br />
Q by ⟦.⟧. The result will then follow both for boolean and non-boolean GCQs. Recall<br />
that Ns translate into unary predicates, TVs into binary predicates and Pns into<br />
constants:<br />
– (Basis) q(x) ← Φ[x] is the image of the question “which A_{i_0} who is a A_{i_1} who<br />
. . . who is a A_{i_m} R_{j_0}s himself and . . . and R_{j_m}s himself and R_{j_0}s c and . . . and<br />
R_{j_m}s c and is R_{j_0}d by c and . . . and is R_{j_m}d by c?”.<br />
– (Inductive step) q(x) ← Φ[x] is the image of the question “which Φ′[x] R_{j_0}s<br />
and R_{j_1}s and . . . and R_{j_m}s some A_{i_0} who is a A_{i_1} and who is a A_{i_2} and . . . and<br />
who is a A_{i_m} and who R_{j_0}s him and . . . and who R_{j_m}s him and who Φ′′[y]?”,<br />
by induction hypothesis on Φ′[x] and Φ′′[y]. ✷<br />
Theorem 2. (Expressing QA) The QA problem for Lite-English and GCQ-English<br />
is in LOGSPACE w.r.t. data complexity.<br />
Proof. It follows immediately from Theorem 1 and Lemma 1. ✷<br />
5 Expressing Aggregate Queries<br />
The question we now need to answer is: how can we expand the coverage of our CL<br />
without compromising the tractability of QA? In this section we propose to cover graph-shaped<br />
aggregate queries, that is, GCQs augmented with (some of) the basic SQL aggregation<br />
functions, COUNT, MIN, MAX and SUM. These functions are defined on finite subsets<br />
of Dom ∪ Q, i.e., on DB domains plus the linearly ordered set of rational numbers, and<br />
take values in Q, that is, they compute a rational number. For the purposes of the current<br />
paper, we will restrict our analysis to only two of them, namely MAX and MIN, although<br />
this analysis can be easily generalized to cover all of these functions.<br />
Aggregates arise frequently in domains and systems containing numerical data, e.g.<br />
geographical domains and systems. One of them, the GEOQUERY geography database<br />
system, comes with a NL interface that supports NL questions expressing such functions<br />
(Mooney, 2007). The corpus of these questions showed that user questions basically<br />
conveyed either a CQ or a CQ with aggregation functions (see Table 3).<br />
             CQs       Aggregations    Negation<br />
Questions    34.54%    65.35%          0.11%<br />
Table 3: Frequency of CQs in GEOQUERY.<br />
Most importantly, answering CQs (and a fortiori GCQs) with aggregation functions over DL-Lite ontologies<br />
is polynomial w.r.t. data complexity. So, what do these queries look like and what<br />
kind of questions do we want to have in our CL? We would like to capture queries over<br />
unary predicates computing a maximum like<br />
(8) q(max(n)) ← height(n) ∧ odd(n)<br />
with a CL Wh-question like<br />
(9) Which is <strong>the</strong> greatest height that is odd?<br />
(Rule)    (Semantic action)<br />
VP_{i,j} → COP NP_j    ⟦VP_{i,j}⟧ := ⟦COP⟧(⟦NP_j⟧)<br />
(Lexical rule)    (Value of ⟦.⟧ on word and category)<br />
Det → the greatest    λP.max(P): (Q → t) → Q<br />
Det → the smallest    λP.min(P): (Q → t) → Q<br />
Det → some    λP.λQ.∃n(P(n) ∧ Q(n)): (Q → t) → ((Q → t) → (Q → t))<br />
Pro_i → something    λP.∃nP(n): (Q → t) → t<br />
Pro_i^- → anything    λP.∃nP(n): (Q → t) → t<br />
Pro_i → it    λP.P(n): (Q → t) → t<br />
Pro_i → itself    λP.P(n): (Q → t) → t<br />
Coord → and    λP.λQ.∃n(P(n) ∧ Q(n)): (Q → t) → ((Q → t) → (Q → t))<br />
Relpro_i → that    λP.λn.P(n): (Q → t) → (Q → t)<br />
Intpro_i → which    λP.λn.P(n): (Q → t) → (Q → t)<br />
COP_{i,j} → is    λn.λm.n ≈ m: Q → (Q → t)<br />
NPgap_i → ɛ    λP.P(n): (Q → t) → t<br />
N_i → height,...    λn.height(n): Q → t,...<br />
Adj → odd,...    λn.odd(n): Q → t,...<br />
Table 4: Grammar rules for AGCQ-English.<br />
Or queries computing a sum<br />
(10) q(sum(n)) ← height(n) ∧ odd(n)<br />
with <strong>the</strong> question<br />
(11) Which is <strong>the</strong> sum <strong>of</strong> all heights that are odd?<br />
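A minimal Python sketch of how queries (8) and (10) could be evaluated is given below. The extension of height is hypothetical, `fractions.Fraction` stands in for the rationals Q, and the function names are illustrative only.<br />

```python
from fractions import Fraction

# Hypothetical extension of the unary predicate height over Q.
height = {Fraction(3), Fraction(4), Fraction(7), Fraction(10)}


def odd(n):
    """odd(n): n is an odd integer (as a rational with denominator 1)."""
    return n.denominator == 1 and n.numerator % 2 == 1


def bindings():
    """All values of n that satisfy the body height(n) ∧ odd(n)."""
    return {n for n in height if odd(n)}


def q_max():
    """Query (8): q(max(n)) ← height(n) ∧ odd(n)."""
    return max(bindings())


def q_sum():
    """Query (10): q(sum(n)) ← height(n) ∧ odd(n)."""
    return sum(bindings(), Fraction(0))


print(q_max())  # greatest odd height in the toy data
print(q_sum())  # sum of the odd heights in the toy data
```

The body is evaluated first, exactly as for a GCQ, and the aggregation function is then applied to the resulting finite set of bindings. With an empty set of bindings `max` raises an error in this sketch; how an empty group should be treated is left open here.<br />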
Definition 4. (AGCQs) A graph-shaped conjunctive aggregate query (AGCQ) over a relational<br />
schema R is a query of the form q(α(n)) ← Φ[n], where α ∈ {min, max}, n<br />
is q's distinguished variable, a numerical variable, and Φ[n] is the body of a non-boolean<br />
GCQ. Note that there are no boolean AGCQs.<br />
5.1 Expressing Aggregate Queries with AGCQ-English<br />
To express AGCQs in CL we extend GCQ-English into a new fragment of English<br />
called AGCQ-English as follows. The aggregation functions min and max are conveyed<br />
in English by definite NPs like “the smallest N” and “the greatest N” respectively, only<br />
this time they must denote not a set of properties but a numeric value. The<br />
symbol N stands for a nominal component that denotes sets of numerical values. The<br />
rest of the expression behaves in a manner similar to a determiner. We must thus start by<br />
extending our set of primitive λ-FOL types from {e, t} to {e, t, Q} and allowing for new<br />
determiners of type (Q → t) → Q.<br />
Definition 5. (Aggregate Determiners) An aggregate determiner is any of the following:<br />
(i) The determiner “the greatest”, associated with the aggregation function max<br />
and of partial MR λP.max(P): (Q → t) → Q. (ii) The determiner “the smallest”, associated with the aggregation function min<br />
and of partial MR λP.min(P): (Q → t) → Q.<br />
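The types in Definition 5 can be illustrated with higher-order functions. In the Python sketch below, a predicate of type Q → t is a function from `Fraction` to `bool`, and the determiners are evaluated over a hypothetical finite slice of Q; the data, the domain and all names are assumptions made for the example. The copula follows the entry for “is” in Table 4.<br />

```python
from fractions import Fraction

# Hypothetical finite slice of Q standing in for the database domain.
DOMAIN = [Fraction(k) for k in range(1, 11)]


def the_greatest(P):
    """λP.max(P): (Q → t) → Q, evaluated over DOMAIN."""
    return max(n for n in DOMAIN if P(n))


def the_smallest(P):
    """λP.min(P): (Q → t) → Q, evaluated over DOMAIN."""
    return min(n for n in DOMAIN if P(n))


def copula(n):
    """'is' as an identity predicate: maps n to the predicate λm. m ≈ n."""
    return lambda m: m == n


HEIGHTS = {Fraction(3), Fraction(4), Fraction(7)}   # hypothetical data
height = lambda n: n in HEIGHTS                      # Q → t
odd = lambda n: n.denominator == 1 and n.numerator % 2 == 1
odd_height = lambda n: height(n) and odd(n)          # body Φ[n]

# Predicates of type Q → t of the shape λm. m ≈ α(λn.Φ[n]).
is_greatest_odd_height = copula(the_greatest(odd_height))
is_smallest_height = copula(the_smallest(height))

print(is_greatest_odd_height(Fraction(7)))
print(is_smallest_height(Fraction(3)))
```

An aggregate determiner consumes a set of numbers and returns a single number, which is why its result can be fed to the copula rather than to a verb phrase expecting a set of properties.<br />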
Figure 2: Translating “Which is the greatest height?”.<br />
Once aggregate determiners have been introduced, there are three steps left to finish<br />
covering aggregate queries with AGCQ-English. (i) We introduce a new interrogative<br />
pronoun “which” of semantics λP.λn.P(n): (Q → t) → (Q → t), where P is a predicate<br />
symbol of type Q → t (the type of sets of numbers) and n a variable of type Q (the<br />
type of numeric values). (ii) We introduce new entries for function words to take into<br />
account the new basic type Q. (iii) We introduce the identity predicate “is” (of category<br />
COP, for copula) of semantics λn.λm.m ≈ n: Q → (Q → t). The reader can see in Table<br />
4 the (new) lexical rules that extend CL coverage to aggregations. The semantic mapping<br />
⟦.⟧ is then computed in the standard way over the parse tree of an AGCQ-English<br />
question, only it will now output, at the root of the tree, a λ-FOL expression of the form<br />
λm.m ≈ α(λn.Φ[n]): Q → t that ⟦.⟧ will proceed to map into q(α(n)) ← Φ[n]. The<br />
reader can see a sample run of the procedure in Figure 2. Whence:<br />
Lemma 2. For every question Q in AGCQ-English, there exists an AGCQ q s.t. ⟦Q⟧ = q.<br />
Conversely, every AGCQ q is the image by ⟦.⟧ of some question Q in AGCQ-English.<br />
Proof. (Sketch) As before, the first implication is proved by simultaneous induction on<br />
the Ns and TVs of question Q. The second implication is proved by induction on the body<br />
of the AGCQ q. ✷<br />
Theorem 3. QA is in P for Lite-English and AGCQ-English.<br />
Proof. It follows from Theorem 1 and Lemma 2. ✷<br />
6 Conclusions and Fur<strong>the</strong>r Work<br />
We have provided a number of guidelines on how to characterize the computational<br />
complexity of CL interfaces to ontology-driven data access and management systems.<br />
This is achieved by expressing QA in controlled English. We have also shown that<br />
we reach tractability when we choose DL-Lite as ontology language and GCQs and<br />
AGCQs as query languages, for which two CLs, GCQ-English and AGCQ-English,<br />
have been introduced. As further work we plan to extend the coverage of AGCQ-English<br />
to the rest of the basic SQL functions, namely COUNT and SUM, and to substantiate (or<br />
validate) the intuitiveness of these CLs and of their English constructs by analysing more<br />
question corpora.<br />
Acknowledgements<br />
I would like to thank my supervisors, R. Bernardi and D. Calvanese, together with I. Pratt,<br />
for their help and suggestions.<br />
References<br />
Bernardi, R., Calvanese, D. and Thorne, C. (2007). Lite Natural Language, Proceedings<br />
of the 7th International Workshop on Computational Semantics (IWCS-7).<br />
Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M. and Rosati, R. (2007).<br />
Tractable Reasoning and Efficient Query Answering in Description Logics: The<br />
DL-Lite Family, JAR.<br />
Jurafsky, D. and Martin, J. (2000). Speech and Language Processing, Prentice Hall.<br />
Lesmo, L. and Robaldo, L. (2007). Use of Ontologies in Practical NL Query Interpretation,<br />
Proceedings of AI*IA 2007.<br />
Minock, M. (2005). A Phrasal Approach to Natural Language Interfaces over Databases,<br />
Natural Language Processing and Information Systems, 10th International Conference<br />
on Applications of Natural Language to Information Systems (NLDB 2005).<br />
Montague, R. (1970). Universal Grammar, Theoria (36).<br />
Mooney, R. J. (2007). Learning for Semantic Parsing, Proceedings of CICLing 2007.<br />
Pratt, I. (2001). On the Semantic Complexity of some Fragments of English, Technical<br />
report, Department of Computer Science – University of Manchester.<br />
Rosati, R. (2007). The Limits of Querying Ontologies, Proceedings of the Eleventh International<br />
Conference on Database Theory (ICDT 2007).<br />
Thorne, C. (2007). Managing Structured Data with Controlled English - An Approach<br />
Based on Description Logics, Proceedings of ESSLLI 2007 Student Session.<br />
Vardi, M. (1982). The Complexity of Relational Query Languages, Proceedings of the<br />
Fourteenth Annual ACM Symposium on Theory of Computing.<br />
INTERROGATION IN DYNAMIC EPISTEMIC LOGIC ∗<br />
Christina Unger – Gianluca Giorgolo<br />
UiL-OTS, Universiteit Utrecht<br />
1 Introduction<br />
Questions still exhibit an aura of mystery and challenge, as is common with<br />
natural language phenomena that lie on the border between semantics and speech acts.<br />
Several treatments have been proposed within denotational semantics; nevertheless, there<br />
is no strong consensus about what kind of semantic object their meaning is. One of the<br />
earliest and best-known approaches to the semantics of interrogatives was introduced<br />
by (Hamblin, 1973) and further developed by (Karttunen, 1977). Their line of work reduces<br />
the meaning of interrogatives to propositions by letting questions denote the set<br />
of possible or true answers. A slightly different approach is the partition semantics of<br />
(Higginbotham and May, 1981) and (Groenendijk and Stokhof, 1984). It is based on the<br />
intuition that the meanings of questions are partitions of the logical space constituted by<br />
the mutually exclusive possibilities that can serve as answers.<br />
In this paper we propose Dynamic Epistemic Logic (DEL) as a powerful tool for a<br />
unified treatment of all question types. We will start with an epistemic interpretation of<br />
Propositional Dynamic Logic and show how to use it to formalize yes/no questions and<br />
answerhood. We will then extend it with public announcements and add public questions<br />
as well as a possibility to embed questions. Finally we sketch how this can also be generalized<br />
to the case of constituent questions along the lines of Groenendijk & Stokhof. In the<br />
last section we will give an outlook on additional benefits of using DEL as a framework,<br />
e.g. the interaction of questions and presuppositions.<br />
2 Yes/no questions<br />
The basis for our investigations is propositional dynamic logic (PDL), an extension of<br />
propositional logic with programs, under an epistemic interpretation. If P is a set of<br />
propositions and A a set of relational atoms, with p ∈ P an arbitrary proposition and i<br />
ranging over A, the language is given by<br />
φ ::= ⊤ | p | ¬φ | φ ∧ φ | [π] φ<br />
π ::= i | π ; π | π ∪ π | π ∗ | TEST φ<br />
The main idea of giving this PDL language an epistemic interpretation (see e.g. (van<br />
Benthem, van Eijck and Kooi, 2006)) is that relational atoms represent epistemic accessibilities<br />
of single agents. The picture is thus the following: the state of knowledge of<br />
a group of agents is modeled as a multimodal S5 Kripke model M = (W, V, R), where<br />
W is a non-empty set of worlds, V is a valuation function that assigns to every basic<br />
proposition the set of all worlds where that proposition is true, and R is a function that<br />
assigns to every agent i an equivalence relation ∼i, where w ∼i w′ expresses that i cannot<br />
distinguish between w and w′, i.e. that w and w′ are epistemic alternatives for i.<br />
∗ For valuable comments we are very grateful to the referees and to Jan van Eijck.<br />
The semantics is defined with respect to a model M = (W, V, R), with the usual<br />
interpretation for ⊤, p, negation, and conjunction. The interpretation of [π] φ is given by:<br />
M, w |= [π] φ iff for all w′ with (w, w′) ∈ ⟦π⟧^M : M, w′ |= φ<br />
where ⟦π⟧^M is the meaning of the epistemic construct π, given as follows: basic epistemic<br />
constructs i are interpreted by ∼i and composed ones by means of regular operations on<br />
relations: ⟦π ; π′⟧^M = ⟦π⟧^M ◦ ⟦π′⟧^M where ◦ is relational composition, ⟦π ∪ π′⟧^M =<br />
⟦π⟧^M ∪ ⟦π′⟧^M, ⟦π∗⟧^M = (⟦π⟧^M)∗ where ∗ is the reflexive transitive closure, and TEST is the<br />
usual test of dynamic logics with ⟦TEST φ⟧^M = {(w, w) | w ∈ W and M, w |= φ}. The<br />
epistemic modalities thus express knowledge of an agent or a group of agents, including<br />
higher-order knowledge. Common knowledge among a group of agents is given by the<br />
reflexive transitive closure of the union of all individual accessibilities of agents in the<br />
group: [(i ∪ j ∪ . . .)∗] φ expresses that φ is common knowledge.¹ As an example, consider<br />
the following model (arrows in both directions are drawn as simple lines, and reflexive<br />
arrows are not drawn):<br />
[Figure: Kripke model with three worlds w0, w1 and w2 over the propositions p, q and r, with one edge labelled i, j and two edges labelled j.]<br />
Connections between two worlds with label i indicate that agent i confuses these<br />
worlds. What is depicted is a knowledge state in which it is common knowledge among<br />
i and j that p (i.e. both know that p and both know that they both know p, etc.) and that<br />
r → q, and where i also knows q but does not know r, whereas j is ignorant both about q<br />
and r.<br />
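The semantics above is directly executable on finite models. The Python sketch below is a small model checker for the language, run on a hypothetical three-world model chosen to match the verbal description of the knowledge state just given; the valuation, the choice of w0 as evaluation point and the tuple encoding of formulas and programs are all assumptions made for the example.<br />

```python
from itertools import product

# Hypothetical model: p holds everywhere, q in w0 and w1, r only in w1.
W = {"w0", "w1", "w2"}
V = {"p": {"w0", "w1", "w2"}, "q": {"w0", "w1"}, "r": {"w1"}}


def equivalence(blocks):
    """Equivalence relation (set of pairs) induced by a partition of W."""
    return {(u, v) for b in blocks for u, v in product(b, repeat=2)}


R = {"i": equivalence([{"w0", "w1"}, {"w2"}]),   # i confuses w0 and w1
     "j": equivalence([{"w0", "w1", "w2"}])}     # j confuses all worlds


def compose(r, s):
    """Relational composition: first r, then s."""
    return {(u, w) for (u, v) in r for (v2, w) in s if v == v2}


def star(r):
    """Reflexive transitive closure of r over W."""
    closure = {(w, w) for w in W} | set(r)
    while True:
        bigger = closure | compose(closure, closure)
        if bigger == closure:
            return closure
        closure = bigger


# Formulas: ("top",) ("atom", p) ("not", f) ("and", f, g) ("box", prog, f)
# Programs: ("agent", i) ("seq", a, b) ("union", a, b) ("star", a) ("test", f)

def rel(prog):
    """Meaning of a program as a binary relation on W."""
    tag = prog[0]
    if tag == "agent":
        return R[prog[1]]
    if tag == "seq":
        return compose(rel(prog[1]), rel(prog[2]))
    if tag == "union":
        return rel(prog[1]) | rel(prog[2])
    if tag == "star":
        return star(rel(prog[1]))
    if tag == "test":
        return {(w, w) for w in W if holds(w, prog[1])}
    raise ValueError(tag)


def holds(w, f):
    """Truth of formula f at world w."""
    tag = f[0]
    if tag == "top":
        return True
    if tag == "atom":
        return w in V[f[1]]
    if tag == "not":
        return not holds(w, f[1])
    if tag == "and":
        return holds(w, f[1]) and holds(w, f[2])
    if tag == "box":
        return all(holds(v, f[2]) for (u, v) in rel(f[1]) if u == w)
    raise ValueError(tag)


p, q, r = ("atom", "p"), ("atom", "q"), ("atom", "r")
common = ("star", ("union", ("agent", "i"), ("agent", "j")))
r_implies_q = ("not", ("and", r, ("not", q)))

print(holds("w0", ("box", common, p)))            # common knowledge of p
print(holds("w0", ("box", common, r_implies_q)))  # common knowledge of r → q
print(holds("w0", ("box", ("agent", "i"), q)))    # i knows q
print(holds("w0", ("box", ("agent", "i"), r)))    # i does not know r
print(holds("w0", ("box", ("agent", "j"), q)))    # j does not know q
```

Programs are interpreted as sets of pairs of worlds, so sequencing, union and iteration reduce to ordinary operations on relations, and [π] φ is checked by following the relation of π from the current world.<br />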
2.1 Direct yes/no questions and answerhood<br />
Now it is possible to formulate the ideas of Groenendijk & Stokhof's partition semantics<br />
in E-PDL. Given the language above, we can define an additional process F φ:<br />
F φ =def (TEST φ ; G ; TEST φ) ∪ (TEST ¬φ ; G ; TEST ¬φ), where G = W × W<br />
With respect to the semantics given above, F φ denotes an equivalence relation:<br />
⟦F φ⟧^M = {(w, w′) | M, w |= φ iff M, w′ |= φ}<br />
1 The possibility to express common knowledge makes E-PDL more expressive than simple multi-agent<br />
epistemic logics that arise from extending propositional logic with knowledge modalities.<br />
The process F thus partitions the model with respect to a formula. Using the restriction<br />
operation on relations, we will refer to the partition cell containing a world w (in a<br />
model M) as [w]_{Fφ}. An example is given in the following figure, where the model is<br />
partitioned with respect to p.<br />
[Figure: a model partitioned with respect to p; the cell [w]_{Fp} containing the world w is marked.]<br />
Important to note is that a question is not a formula, as in (Groenendijk and Stokh<strong>of</strong>,<br />
1984), but a process. This might seem like a minor difference, but it will allow us in <strong>the</strong><br />
next section to use questions as updates and <strong>the</strong>reby talk about <strong>the</strong> communicative act <strong>of</strong><br />
questioning.<br />
Defining <strong>the</strong> process F already suffices to define what it means for a formula to be a<br />
true or a possible answer to a question.<br />
Definition 1. A formula ψ is a true answer to <strong>the</strong> question whe<strong>the</strong>r φ, w.r.t. a model M<br />
and a world w, if for all w ′ ∈ W : M, w ′ |= ψ iff w ′ ∈ {v | (w, v) ∈ [w]F φ}.<br />
Definition 2. A formula ψ is a possible (or: appropriate) answer to <strong>the</strong> question whe<strong>the</strong>r<br />
φ, w.r.t. a model M, if <strong>the</strong>re is some w ∈ W , such that ψ is a true answer to <strong>the</strong> question<br />
whe<strong>the</strong>r φ w.r.t. M and w.<br />
In o<strong>the</strong>r words, a possible answer is a formula with a denotation that spans exactly one<br />
<strong>of</strong> <strong>the</strong> partition cells induced by <strong>the</strong> question; it is a true answer if this partition cell is<br />
<strong>the</strong> actual one with respect to a particular world w. One small difference from <strong>the</strong><br />
picture <strong>of</strong> Groenendijk & Stokh<strong>of</strong> is that we do not rely on entailment between questions<br />
to define answerhood. Entailment between questions can never<strong>the</strong>less be explicated:<br />
<strong>the</strong> question whe<strong>the</strong>r φ entails <strong>the</strong> question whe<strong>the</strong>r ψ if for all M:<br />
⟦F φ⟧^M ⊆ ⟦F ψ⟧^M<br />
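Definitions 1 and 2 can be checked mechanically on a finite model. The following sketch is only an illustration under stated assumptions: formulas are modelled as Python predicates over worlds, and the model and the helper names (denotation, cells, is_true_answer, is_possible_answer) are hypothetical.<br />

```python
def denotation(worlds, holds):
    """The set of worlds at which a formula (given as a predicate) holds."""
    return frozenset(w for w in worlds if holds(w))

def cells(worlds, phi):
    """The (at most two) non-empty cells induced by the question whether phi."""
    yes = denotation(worlds, phi)
    no = frozenset(worlds) - yes
    return {c for c in (yes, no) if c}

def is_true_answer(worlds, phi, psi, w):
    """Definition 1: psi holds at exactly the worlds in w's cell."""
    own_cell = next(c for c in cells(worlds, phi) if w in c)
    return denotation(worlds, psi) == own_cell

def is_possible_answer(worlds, phi, psi):
    """Definition 2: psi is a true answer with respect to some world."""
    return any(is_true_answer(worlds, phi, psi, w) for w in worlds)

# Hypothetical model: val[w] is the set of atoms true at world w.
val = {"w0": {"q"}, "w1": {"p"}, "w2": set()}
worlds = set(val)
p = lambda w: "p" in val[w]
q = lambda w: "q" in val[w]
not_p = lambda w: not p(w)
```

Here ¬p is a true answer to the question whether p at w0, whereas q is not even a possible answer, because its denotation covers only part of a cell.<br />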
Up to now, we have a logic with basic propositions and boolean combinations, toge<strong>the</strong>r<br />
with epistemic operations on <strong>the</strong>se, that represent knowledge <strong>of</strong> (groups <strong>of</strong>) agents. This<br />
gave us a possibility to talk about questions as partitioning processes and about formulas<br />
being answers to questions. However, it tells us nothing about what it means to pose<br />
a question in a communication, about its effects, and about what it means to answer it,<br />
because we have no means yet to talk about communicative actions. For that we will<br />
move to a Dynamic Epistemic Logic that also contains public announcements and public<br />
questions.<br />
3 Questioning and answering<br />
Dynamic Epistemic Logic provides a logical framework for reasoning about knowledge <strong>of</strong><br />
(groups <strong>of</strong>) agents and change <strong>of</strong> this knowledge due to communication. Ano<strong>the</strong>r modal<br />
operator, taken from Public Announcement Logic (Plaza, 1989), is added to E-PDL, that<br />
models <strong>the</strong> event <strong>of</strong> all agents being told simultaneously and transparently that a certain<br />
formula holds. Change <strong>of</strong> knowledge induced by announcements corresponds to updates<br />
<strong>of</strong> knowledge states. Analogously, we add a modal operator for public questions and<br />
a one-place predicate to turn a formula φ into a formula WH φ. Thus, <strong>the</strong> language is<br />
extended in <strong>the</strong> following way:<br />
φ ::= ... | [!φ] φ | [?φ] φ | WH φ<br />
The communicative effect <strong>of</strong> a public announcement is given by a restriction operation on<br />
epistemic models:<br />
M, w |= [!φ] φ ′ iff M, w |= φ implies M | φ, w |= φ ′<br />
where M | φ is <strong>the</strong> restriction <strong>of</strong> M to φ, i.e. <strong>the</strong> epistemic model M ′ = (W ′ , V ′ , R ′ )<br />
with W ′ = {w ∈ W | M, w |= φ}, V ′ <strong>the</strong> restriction <strong>of</strong> V to W ′ , and R ′ <strong>the</strong> result <strong>of</strong><br />
restricting each ∼i to W ′ × W ′ .<br />
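The restriction semantics of public announcements can be sketched in a few lines. The code below is a minimal illustration, not an implementation of E-PDL: the dictionary layout of a model and the helper names (atom, knows, restrict, after_announcement) are assumptions, and formulas are represented as functions from a model and a world to a truth value.<br />

```python
def atom(name):
    """Atomic proposition: true at w iff name is in the valuation of w."""
    return lambda m, w: name in m["V"][w]

def knows(agent, f):
    """[agent] f: f holds at every world the agent cannot tell apart from w."""
    return lambda m, w: all(f(m, v) for (u, v) in m["R"][agent] if u == w)

def restrict(m, f):
    """M | f: keep the f-worlds and cut valuation and relations down to them."""
    kept = {w for w in m["W"] if f(m, w)}
    return {
        "W": kept,
        "V": {w: m["V"][w] for w in kept},
        "R": {a: {(u, v) for (u, v) in r if u in kept and v in kept}
              for a, r in m["R"].items()},
    }

def after_announcement(f, g):
    """[!f] g: if f holds at w, then g holds at w in the restricted model."""
    return lambda m, w: (not f(m, w)) or g(restrict(m, f), w)

# Hypothetical two-world model in which agent i cannot tell w0 from w1.
W = {"w0", "w1"}
model = {
    "W": W,
    "V": {"w0": {"p"}, "w1": set()},
    "R": {"i": {(u, v) for u in W for v in W}},
}
```

In this model agent i does not know p at w0, but [!p] [i] p holds there, since the announcement removes w1. At w1 the formula [!p] ψ holds vacuously for any ψ, because the announced formula is false.<br />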
The interpretation <strong>of</strong> <strong>the</strong> question update has to be different, because asking a question in<br />
a communicative situation obviously has no effect on <strong>the</strong> knowledge <strong>of</strong> agents. It ra<strong>the</strong>r<br />
creates or shifts <strong>the</strong> focus <strong>of</strong> <strong>the</strong> conversation, in many cases to particular alternatives<br />
among which <strong>the</strong> answer lies. To model <strong>the</strong> focus <strong>of</strong> a conversation we add to R in <strong>the</strong><br />
model an additional accessibility relation FOCUS, which initially denotes W × W and is<br />
visible to all agents. The question update can <strong>the</strong>n be interpreted as resetting FOCUS:<br />
M, w |= [?φ] ψ iff M[FOCUS := ⟦F φ⟧^M ], w |= ψ<br />
where M[FOCUS := ⟦F φ⟧^M ] is like M except that <strong>the</strong> denotation <strong>of</strong> <strong>the</strong> relation<br />
FOCUS is set to <strong>the</strong> denotation <strong>of</strong> F φ.<br />
Answering can <strong>the</strong>n simply be seen as announcement <strong>of</strong> an answer. Note that an appropriate<br />
answer automatically resets <strong>the</strong> FOCUS relation to its default value (i.e. neutral<br />
focus W × W ), because <strong>the</strong> update with <strong>the</strong> answer will eliminate all but one partition<br />
cell. From <strong>the</strong> definition <strong>of</strong> an appropriate answer it also follows immediately that <strong>the</strong><br />
following holds.<br />
Proposition 1. If ψ is an appropriate answer to <strong>the</strong> question whe<strong>the</strong>r φ, <strong>the</strong>n for all<br />
w : M, w |= [?φ][!ψ] φ or M, w |= [?φ][!ψ]¬φ.<br />
It expresses that appropriate answers do indeed answer <strong>the</strong> question. (For announcements<br />
in general it is <strong>of</strong> course not <strong>the</strong> case that ei<strong>the</strong>r [!ψ]φ or [!ψ]¬φ holds.)<br />
This is best demonstrated by an example. Assume two agents Turing (t) and Church<br />
(c), and let p and q be propositions (for example, p for “For every program we can<br />
decide whe<strong>the</strong>r it halts” and q for “There is no general algorithm to decide whe<strong>the</strong>r a<br />
property <strong>of</strong> natural numbers is true or not”). The knowledge state <strong>of</strong> Turing and Church<br />
is depicted in <strong>the</strong> leftmost model <strong>of</strong> <strong>the</strong> figure below. For example, Turing does not know<br />
p, but he knows that Church knows whe<strong>the</strong>r p. These are preconditions usually considered<br />
to hold for questions to be felicitous. So Turing decides to ask whe<strong>the</strong>r p. 2 After updating<br />
with this question, appropriate answers will be p and ¬p (or formulas equivalent to <strong>the</strong>se).<br />
Assuming that w0 is our world <strong>of</strong> reference, <strong>the</strong> true answer is ¬p. Announcement <strong>of</strong> <strong>the</strong><br />
true answer eliminates world w1, which results in Turing knowing ¬p. In fact he now also<br />
happens to know q, whereas Church is still ignorant about it.<br />
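The example can be replayed in code. The sketch below is a reconstruction under assumptions: the valuation (w0: ¬p, q; w1: p, ¬q; w2: ¬p, ¬q) and the indistinguishability classes are inferred from the prose, not read off the figure, and all function and variable names are hypothetical.<br />

```python
from itertools import product

W = {"w0", "w1", "w2"}
# Assumed valuation, inferred from the prose description of the example.
V = {"w0": {"q"}, "w1": {"p"}, "w2": set()}

def equivalence(blocks):
    """Equivalence relation built from a list of indistinguishability classes."""
    return {(u, v) for b in blocks for u, v in product(b, repeat=2)}

# Assumed relations: Turing confuses w0 and w1, Church confuses w0 and w2.
R = {"t": equivalence([{"w0", "w1"}, {"w2"}]),
     "c": equivalence([{"w0", "w2"}, {"w1"}])}

def knows(rel, w, prop):
    return all(prop(v) for (u, v) in rel if u == w)

def p(w): return "p" in V[w]
def q(w): return "q" in V[w]
def not_p(w): return not p(w)

# Neutral focus, then the question update ?p sets FOCUS to the relation F p.
focus_neutral = equivalence([W])
focus_after_question = {(u, v) for u, v in product(W, repeat=2) if p(u) == p(v)}

# The announcement of the true answer restricts everything to the not-p worlds.
W2 = {w for w in W if not_p(w)}
R2 = {a: {(u, v) for (u, v) in r if u in W2 and v in W2} for a, r in R.items()}
focus_after_answer = {(u, v) for (u, v) in focus_after_question
                      if u in W2 and v in W2}
```

Under these assumptions, Turing initially does not know ¬p at w0 but knows that Church knows whether p. After the announcement he knows both ¬p and q, Church still does not know q, and the focus relation again relates all remaining worlds, i.e. it is neutral on the restricted model.<br />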
[Figure: three epistemic models over worlds w0, w1 and w2, with indistinguishability links labelled t and c. The update ?p leads from the first model to the second; the announcement !¬p leads from the second to the third, which contains only w0 and w2, linked by c.]<br />
3.1 Embedded yes/no questions<br />
This approach is not at all restricted to direct questions but can straightforwardly deal with<br />
embedded questions as well, by means <strong>of</strong> <strong>the</strong> predicate WH. Embedded questions differ<br />
from direct questions in that <strong>the</strong>y seem to refer to a specific partition cell (namely <strong>the</strong> true<br />
one) and not <strong>the</strong> partitioning as a whole, yet do not give away which partition cell is <strong>the</strong><br />
true one. We already have <strong>the</strong> means to achieve this:<br />
⟦WH φ⟧^{M,w} =def {w′ | (w, w′) ∈ [w]F φ}<br />
The formula WH φ can now be embedded in o<strong>the</strong>r formulas, for example in statements<br />
about <strong>the</strong> knowledge <strong>of</strong> agents. E.g., “Ian knows whe<strong>the</strong>r Penicillin was flown in” can<br />
be represented as [i] (WH p). In general, <strong>the</strong> following facts hold.<br />
Proposition 2. [i] (WH φ) |= [i] φ ∨ [i] ¬φ<br />
I.e. if an agent knows whe<strong>the</strong>r a formula is <strong>the</strong> case, he ei<strong>the</strong>r knows <strong>the</strong> formula or its<br />
negation.<br />
Proposition 3. [i] (WH φ) ⊭ φ and analogously [i] (WH φ) ⊭ ¬φ<br />
I.e. <strong>the</strong> statement that an agent knows whe<strong>the</strong>r a formula is <strong>the</strong> case does not provide <strong>the</strong><br />
information whe<strong>the</strong>r <strong>the</strong> formula or its negation is <strong>the</strong> case.<br />
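Propositions 2 and 3 can be checked by brute force on small models. The sketch below is an illustration under assumptions: it enumerates all single-agent models with three worlds and one formula, represents the agent's knowledge as a partition of the worlds, and reads [i] (WH φ) at w as "every world the agent cannot tell apart from w lies in w's cell of F φ". The names are hypothetical.<br />

```python
from itertools import product

WORLDS = (0, 1, 2)
# All five partitions of a three-element set, as indistinguishability classes.
PARTITIONS = [
    [{0}, {1}, {2}],
    [{0, 1}, {2}],
    [{0, 2}, {1}],
    [{1, 2}, {0}],
    [{0, 1, 2}],
]

def accessible(blocks, w):
    """Worlds the agent cannot tell apart from w."""
    return next(b for b in blocks if w in b)

def knows(blocks, w, prop):
    return all(prop(v) for v in accessible(blocks, w))

def knows_whether(blocks, w, prop):
    # [i](WH phi) at w: all accessible worlds agree with w on phi.
    return all(prop(v) == prop(w) for v in accessible(blocks, w))

prop2_holds = True                  # [i](WH phi) entails [i]phi or [i]not-phi
phi_witness = neg_witness = False   # witnesses for Proposition 3

for blocks, bits in product(PARTITIONS, product([False, True], repeat=3)):
    phi = lambda w, bits=bits: bits[w]
    not_phi = lambda w, bits=bits: not bits[w]
    for w in WORLDS:
        if knows_whether(blocks, w, phi):
            if not (knows(blocks, w, phi) or knows(blocks, w, not_phi)):
                prop2_holds = False
            if phi(w):
                phi_witness = True
            else:
                neg_witness = True
```

The enumeration finds no counterexample to Proposition 2, and it finds worlds where the agent knows whether φ while φ is true as well as worlds where φ is false, which is what Proposition 3 states.<br />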
4 Constituent questions<br />
Let us briefly sketch how our approach can be extended to <strong>the</strong> predicate logic case, to<br />
also account for constituent questions along <strong>the</strong> lines <strong>of</strong> Groenendijk & Stokh<strong>of</strong>’s work.<br />
For this, we start with a first order dynamic logic:<br />
t ::= c | x<br />
φ ::= ⊤ | t | P t . . . t | ¬φ | φ ∧ φ | ∃x φ | [π] φ<br />
π ::= i | π ; π | π ∪ π | π ∗ | TEST φ<br />
2 Actually announcements and questions are not parametrized with respect to an agent. We will shortly<br />
come back to that in section 5.<br />
where c ranges over constants, and x ranges over variables. To this language, public<br />
announcements, questions, and a question embedding predicate are added, as above:<br />
φ ::= ... | [!φ] φ | [?φ] φ | WH φ<br />
This language is interpreted with respect to a first order model M with fixed domain, a<br />
world w, and a variable assignment g, as usual.<br />
Like Groenendijk & Stokh<strong>of</strong>, we add ano<strong>the</strong>r operator to bind variables.<br />
Definition 3. If φ is a formula in which all and only <strong>the</strong> variables x1, . . . , xn have one or<br />
more free occurrences, <strong>the</strong>n Qx1 . . . xn φ is a formula.<br />
A partitioning process over such a formula will model constituent questions. E.g.<br />
F Qx P x corresponds to <strong>the</strong> question about which entity lies in <strong>the</strong> denotation <strong>of</strong> P . The<br />
semantics can be adopted from (Groenendijk and Stokh<strong>of</strong>, 1997):<br />
⟦Qx1 . . . xn φ⟧^{M,w,g} = {w′ | ⟨Qx1 . . . xn φ⟩^{M,w′,g} = ⟨Qx1 . . . xn φ⟩^{M,w,g}}<br />
where<br />
⟨Qx1 . . . xn φ⟩^{M,w,g} = {(g′(x1), . . . , g′(xn)) | M, w, g′ |= φ,<br />
where g′(x) = g(x) for all x ≠ x1 . . . xn}<br />
I.e. ⟦Qx1 . . . xn φ⟧^{M,w,g} is <strong>the</strong> set <strong>of</strong> all worlds in which <strong>the</strong> same entities belong to <strong>the</strong><br />
extension <strong>of</strong> φ as in w. For example, for a question like “Who is coming to <strong>the</strong> party?”<br />
we get<br />
⟦F Qx P x⟧^{M,g} = {(w, w′) | ⟦Qx P x⟧^{M,w,g} = ⟦Qx P x⟧^{M,w′,g}}<br />
This means that F Qx P x partitions <strong>the</strong> model with respect to all possible extensions <strong>of</strong><br />
<strong>the</strong> predicate P . Thus, “John is coming to <strong>the</strong> party” is indeed a true answer to <strong>the</strong><br />
question if j is in <strong>the</strong> extension <strong>of</strong> P in <strong>the</strong> actual world.<br />
Notice that in <strong>the</strong> case <strong>of</strong> closed formulas φ, <strong>the</strong> process F φ models a yes/no question<br />
as above.<br />
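The partition induced by a constituent question can be computed by grouping worlds according to the extension of the predicate. The following sketch assumes a hypothetical four-world model in which ext[w] is the extension of P at w; the function name is likewise an assumption.<br />

```python
from collections import defaultdict

# Hypothetical model: ext[w] is the extension of P ("is coming to the party") at w.
ext = {
    "w0": frozenset({"john"}),
    "w1": frozenset({"john"}),
    "w2": frozenset({"john", "mary"}),
    "w3": frozenset(),
}

def partition_by_extension(ext):
    """Cells of F Qx Px: worlds are grouped iff P has the same extension in them."""
    groups = defaultdict(set)
    for w, extension in ext.items():
        groups[extension].add(w)
    return {frozenset(group) for group in groups.values()}

cells = partition_by_extension(ext)
```

With this model the question has three cells, one for each distinct extension of P.<br />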
4.1 Answerhood<br />
In <strong>the</strong> case <strong>of</strong> constituent questions, we have to distinguish between partial and exhaustive<br />
true answers. This does not pose a problem, assuming that <strong>the</strong> result <strong>of</strong> announcing an<br />
exhaustive answer to a question φ is equal to [w]F φ, where w is <strong>the</strong> actual world, while<br />
<strong>the</strong> result <strong>of</strong> announcing a partial answer is a subset <strong>of</strong> FOCUS that contains [w]F φ.<br />
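The difference between the two kinds of answers can be illustrated on a small model. This is a sketch under assumptions: the model, the actual world w0, and the two sample answers ("only John is coming" as exhaustive, "John is coming" as partial) are hypothetical, and an announcement is simplified to the set of worlds that survive it.<br />

```python
# Hypothetical model: ext[w] is the extension of P at w; w0 is the actual world.
ext = {
    "w0": frozenset({"john"}),
    "w1": frozenset({"john"}),
    "w2": frozenset({"john", "mary"}),
    "w3": frozenset(),
}
actual = "w0"

def cell_of(w):
    """Worlds in which P has the same extension as in w."""
    return frozenset(v for v in ext if ext[v] == ext[w])

def announce(prop):
    """Worlds that survive the public announcement of prop."""
    return frozenset(w for w in ext if prop(w))

exhaustive = announce(lambda w: ext[w] == frozenset({"john"}))  # only John is coming
partial = announce(lambda w: "john" in ext[w])                  # John is coming
```

The exhaustive answer leaves exactly the actual cell, while the partial answer leaves a larger set of worlds that still contains it.<br />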
4.2 Embedded constituent questions<br />
The WH predicate introduced in section 3 for embedding yes/no questions also works for<br />
embedding constituent questions. Consider, for example, <strong>the</strong> embedding <strong>of</strong> “who is coming<br />
to <strong>the</strong> party” (as in “Ian knows who is coming to <strong>the</strong> party”): its interpretation is<br />
⟦WH Qx P x⟧^{M,w,g} = {w′ | (w, w′) ∈ [w]F Qx P x}<br />
As mentioned above, F Qx P x partitions <strong>the</strong> model with respect to all possible extensions<br />
<strong>of</strong> P . Thus [w]F Qx P x corresponds to <strong>the</strong> true exhaustive answer. Therefore, “Ian knows<br />
who is coming to <strong>the</strong> party” means that Ian knows <strong>the</strong> exhaustive answer to <strong>the</strong> question<br />
about who is coming to <strong>the</strong> party.<br />
5 Fur<strong>the</strong>r research<br />
One goal for a semantics <strong>of</strong> direct questions that we have not yet touched upon is to <strong>of</strong>fer<br />
updates YES and NO as answers to yes/no questions. Having <strong>the</strong>m correspond to <strong>the</strong><br />
actual use <strong>of</strong> natural language “yes” and “no”, however, is quite a complex matter. Leaving<br />
this discussion aside, <strong>the</strong> most straightforward way to accommodate <strong>the</strong> possibility <strong>of</strong><br />
simple yes/no answers in our system would be to add a yes/no predicate to <strong>the</strong> language,<br />
which a question whe<strong>the</strong>r φ sets to <strong>the</strong> denotation <strong>of</strong> φ. YES would <strong>the</strong>n correspond<br />
to <strong>the</strong> announcement <strong>of</strong> this predicate, whereas NO would be <strong>the</strong> announcement <strong>of</strong> its<br />
negation. Having such a predicate provides ano<strong>the</strong>r possibility to specify answerhood:<br />
a formula ψ would be an answer to <strong>the</strong> question whe<strong>the</strong>r φ if it were equivalent to <strong>the</strong><br />
predicate or its negation. This would actually suffice to also get all <strong>the</strong> partitioning effects<br />
and propositions we talked about in section 2. As long as only alternative questions are<br />
addressed, it is just a question <strong>of</strong> design whe<strong>the</strong>r to use such a predicate or a relation as we<br />
did. The advantage <strong>of</strong> our proposal, however, is that it can be extended to <strong>the</strong> predicate logic<br />
case for constituent questions. It thus allows us to use one uniform mechanism underlying<br />
both kinds <strong>of</strong> questions.<br />
Possibly <strong>the</strong> main benefit <strong>of</strong> our proposal is <strong>the</strong> general benefit <strong>of</strong> Dynamic Epistemic<br />
Logic for natural language analysis: DEL provides a powerful framework for <strong>the</strong> formalization<br />
<strong>of</strong> pragmatic concepts that play a role in communicative situations. Based on<br />
this, it can also serve to explore <strong>the</strong> interaction between <strong>the</strong>se phenomena. Let us illustrate<br />
this by briefly looking at presuppositions. An update <strong>of</strong> a question ψ that carries a<br />
presupposition φ can be expressed as <strong>the</strong> update (TEST φ ; ?ψ), which first tests whe<strong>the</strong>r<br />
<strong>the</strong> presupposition is satisfied, and, if so, updates with <strong>the</strong> actual question. If <strong>the</strong> presupposition<br />
test fails, <strong>the</strong> whole update is not successful. Such a presupposition could, for<br />
example, be that <strong>the</strong> speaker who asks does not know <strong>the</strong> answer but considers it possible that<br />
someone among <strong>the</strong> addressees knows it. For this, one would need a means to parametrize<br />
announcements (and question updates) to a certain agent, which, to our knowledge, has<br />
not been done yet. On <strong>the</strong> o<strong>the</strong>r hand, one can ask whe<strong>the</strong>r a proposition is <strong>the</strong><br />
case which itself contains a presupposition, e.g. “Did John stop smoking?”. Adopting<br />
<strong>the</strong> treatment <strong>of</strong> presuppositions in (Eijck and Unger, 2007), asking a proposition ψ that<br />
carries a presupposition φ, would set FOCUS to <strong>the</strong> denotation <strong>of</strong> F (Cφ ∧ ψ) (where Cφ<br />
abbreviates that φ is common knowledge). If Cφ is false, however, <strong>the</strong> partitioning induced<br />
by Cφ ∧ ψ consists just <strong>of</strong> one partition cell, thus fails to create a focus with which<br />
an informative answer would be possible.<br />
Fur<strong>the</strong>rmore, incorporating not only knowledge but also belief would allow for modeling,<br />
among <strong>the</strong> presuppositions, different expectations in question pairs like “Are you<br />
going to Groningen?” and “Aren’t you going to Groningen?”.<br />
Last but not least, ano<strong>the</strong>r possible line <strong>of</strong> research is to investigate how our FOCUS<br />
relation connects to Rooth’s <strong>the</strong>ory <strong>of</strong> focus interpretation (Rooth, 1992), and thus explore<br />
<strong>the</strong> connection between questions and information-structural focus.<br />
References<br />
Eijck, J. v. and Unger, C. (2007). The epistemics <strong>of</strong> presupposition projection, in<br />
M. Aloni, P. Dekker and F. Roel<strong>of</strong>sen (eds), <strong>Proceedings</strong> <strong>of</strong> <strong>the</strong> Sixteenth Amsterdam<br />
Colloquium, December 17–19, 2007, ILLC, Amsterdam, pp. 235–240.<br />
Groenendijk, J. and Stokh<strong>of</strong>, M. (1984). Studies on <strong>the</strong> Semantics <strong>of</strong> Questions and <strong>the</strong><br />
Pragmatics <strong>of</strong> Answers, PhD <strong>the</strong>sis, Universiteit van Amsterdam.<br />
Groenendijk, J. and Stokh<strong>of</strong>, M. (1997). Questions, in J. van Ben<strong>the</strong>m and A. ter Meulen<br />
(eds), Handbook <strong>of</strong> logic and language, Elsevier, chapter 19, pp. 1055–1124.<br />
Hamblin, C. (1973). Questions in Montague English, Foundations <strong>of</strong> Language 10: 41–53.<br />
Higginbotham, J. and May, R. (1981). Questions, quantifiers and crossing, Linguistic<br />
Review 1: 41–79.<br />
Karttunen, L. (1977). Syntax and semantics <strong>of</strong> questions, Linguistics and Philosophy 1: 1–44.<br />
Also published in: Portner & Partee (eds.): Formal Semantics. The Essential<br />
Readings. Blackwell, 2003, pp. 382–420.<br />
Plaza, J. (1989). Logics <strong>of</strong> public communications, in M. Emrich, M. Pfeifer,<br />
M. Hadzikadic and Z. Ras (eds), <strong>Proceedings</strong> <strong>of</strong> <strong>the</strong> 4th International Symposium<br />
on Methodologies for Intelligent Systems, pp. 201–216.<br />
Rooth, M. (1992). A <strong>the</strong>ory <strong>of</strong> focus interpretation, Natural Language Semantics<br />
1(1): 75–116.<br />
van Ben<strong>the</strong>m, J., van Eijck, J. and Kooi, B. (2006). Logics <strong>of</strong> communication and change,<br />
Information and Computation 204(11): 1620–1662.<br />
THE SEMANTIC CHANGE OF THE FRENCH -AGE-DERIVATION<br />
Melanie Uth<br />
University <strong>of</strong> Stuttgart<br />
Abstract. In this paper, I will investigate <strong>the</strong> diachrony <strong>of</strong> <strong>the</strong> French -age-derivation. I will<br />
argue that -age originally served to derive kind terms that have been reinterpreted as group<br />
terms and as true event nominalizations. The main hypo<strong>the</strong>sis will be that <strong>the</strong> reinterpretation<br />
<strong>of</strong> <strong>the</strong> -age-suffixation was enabled by <strong>the</strong> fact that <strong>the</strong> original derivatives and <strong>the</strong><br />
new derivatives share important features <strong>of</strong> <strong>the</strong>ir abstract conceptual representations. This<br />
approach to <strong>the</strong> diachrony <strong>of</strong> <strong>the</strong> -age-suffixation predicts that even nowadays, true event<br />
nominalizations in -age focus on <strong>the</strong> atelic parts <strong>of</strong> <strong>the</strong> event denoted by <strong>the</strong> base verb. As<br />
such, <strong>the</strong> proposal constitutes (fur<strong>the</strong>r) evidence in favor <strong>of</strong> <strong>the</strong> hypo<strong>the</strong>sis that -age may<br />
be differentiated from its rival -ment by means <strong>of</strong> its specific aspectual characteristics.<br />
1 Introduction<br />
Modern French -ment and -age are <strong>of</strong>ten described as competing nominalization suffixes,<br />
since <strong>the</strong>y frequently attach to <strong>the</strong> same verbal bases. For example, Lüdtke (1987) lists<br />
187 doublets, e.g. gonflement/gonflage (’inflation’), and refers to ”<strong>the</strong> numerically<br />
most important overlap in <strong>the</strong> French lexicon” (ibid.: 103). By contrast, in<br />
Old French, <strong>the</strong> -ment-suffixation was one <strong>of</strong> <strong>the</strong> standard procedures for deverbal event<br />
nominalization, while event nominalizations in -age were marginal. In our database, we<br />
attest 99.5% event nominalizations in -ment as opposed to 0.5% deverbal nominalizations in<br />
-age. In this paper, I will investigate <strong>the</strong> conditions that enabled <strong>the</strong> propagation <strong>of</strong> true<br />
event nominalizations in -age from Old to Modern French. I will argue that -age could<br />
develop into an event nominalization suffix next to -ment because <strong>the</strong> two suffixes systematically<br />
differ with respect to <strong>the</strong> abstract conceptual representation <strong>of</strong> <strong>the</strong>ir derivatives.<br />
In section 2, I will concentrate on <strong>the</strong> Latin antecedents <strong>of</strong> <strong>the</strong> French -age-derivation, <strong>the</strong><br />
relational adjectives in -aticu, as well as on <strong>the</strong>ir substantivized forms that are transferred<br />
to Old French. Section 3 focusses on <strong>the</strong> genuinely French formations, i.e. group terms<br />
and true event nominalizations. In section 4, I will argue that <strong>the</strong> genesis <strong>of</strong> group terms<br />
in -age resulted from a reinterpretation <strong>of</strong> <strong>the</strong> borrowed terms, that was enabled by <strong>the</strong><br />
fact that <strong>the</strong> original derivatives and <strong>the</strong> new derivatives share important features <strong>of</strong> <strong>the</strong>ir<br />
abstract conceptual representations. In section 5, I hypo<strong>the</strong>size that an analogous reinterpretation<br />
occurred in <strong>the</strong> deverbal domain, predicting that true event nominalizations in<br />
-age only attach to bases that are in some sense atelic. Finally, we will consider different<br />
analyses <strong>of</strong> Modern French -age-nominalizations showing that <strong>the</strong>se may indeed be<br />
characterized by <strong>the</strong> salience <strong>of</strong> atelic aspectual values.<br />
2 The antecedents <strong>of</strong> New French -age: relational -aticu-adjectives and borrowed<br />
substantivizations<br />
The French -age-suffixation developed from <strong>the</strong> Latin denominal relational adjectives in<br />
-aticu that served to sub-classify <strong>the</strong> type <strong>of</strong> object or event denoted by <strong>the</strong> head nouns<br />
(e.g. census terraticus ’tax on land’). Denominal relational adjectives establish a relation<br />
between <strong>the</strong> head noun and <strong>the</strong> base noun whose exact semantic function needs to be<br />
specified by <strong>the</strong> semantics <strong>of</strong> <strong>the</strong> derivational constituents, and possibly by fur<strong>the</strong>r contextual<br />
influences, cf. e.g. Fradin (2008: 3ff). For <strong>the</strong> present purposes, <strong>the</strong> crucial point is<br />
that relational adjectives classify types <strong>of</strong> nouns (in <strong>the</strong> sense <strong>of</strong> Vergnaud & Zubizarreta<br />
(1992)), instead <strong>of</strong> concrete tokens.<br />
In recent work on kind terms it is common to treat ”kinds” on a par with ”classes” or<br />
”types” <strong>of</strong> objects (in <strong>the</strong> relevant sense). For example, Krifka et al. (1995) as well as<br />
Chierchia (1998) assume that common nouns generally have both a kind-referring function<br />
and a predicative function, predicates and kinds being related by virtue <strong>of</strong> <strong>the</strong> realization<br />
relation R in <strong>the</strong> sense that every object y in <strong>the</strong> extension <strong>of</strong> <strong>the</strong> predicate δ is<br />
an instantiation <strong>of</strong> <strong>the</strong> kind x (ιx.∀y[δp(y) ↔ R(y,x)]). 1 Building on this approach to<br />
kinds, we may conceive <strong>of</strong> <strong>the</strong> relational -aticu-adjectives as deriving terms referring to<br />
(sub-)kinds, in a way such that census terraticus is a kind <strong>of</strong> census, just as e.g. porcus<br />
silvaticus (’wild pig’) is a kind <strong>of</strong> porcus and canis venaticus (’staghound’) is a kind <strong>of</strong><br />
canis. 2<br />
During <strong>the</strong> transition from Latin to Old French, <strong>the</strong> -aticu-adjectives were substantivized<br />
and resulted in designations <strong>of</strong> taxes, rights, status etc. that are lexicalized and<br />
entail <strong>the</strong> traditional head noun as a semantic constituent (cf. Fleischman (1990: 10ff)):<br />
(1) TAX (chevage = bounty, capitation; from chief ):<br />
la fud subjecte e rendid chevage . . .<br />
<strong>the</strong>re was-3Sg subjected and paid-3Sg capitation . . .<br />
’he subjected it and paid capitation’ (NCA:reis)<br />
(2) RIGHT (passage = right to cross a territory; from passer ):<br />
si (. . . ) disent . . . , que il queroient passage . . .<br />
prt. say-3Pl that <strong>the</strong>y ask for passing<br />
’and (<strong>the</strong>y) said that <strong>the</strong>y ask for <strong>the</strong> right to pass. . . ’ (NCA: clari)<br />
It is important to note that, whereas <strong>the</strong> base nouns <strong>of</strong> <strong>the</strong>se lexicalized substantivizations<br />
consistently retain <strong>the</strong>ir kind-referring function, <strong>the</strong> interpretation <strong>of</strong> <strong>the</strong> incorporated<br />
head nouns varies between kind-reference and object-reference, depending on <strong>the</strong><br />
context. This difference is signalled by <strong>the</strong> determiner system <strong>of</strong> Old French, where terms<br />
that do not refer to actual extensions show up as bare nouns (cf. Foulet (1998: 49)):<br />
(3) a. et cele claciele guardoit en zz escrignet k il avoit quanqu<br />
and this little key keep-3Sg in a shrine that he got-3Sg when<br />
estovoit a monniage.<br />
was-3Sg in monasticism.<br />
‘and he kept this little key in a shrine that he got when he lived in monasticism’<br />
(NCA: P. Mouskes)<br />
b. ne fait sanblant que il s en faingne le singne fait dou<br />
not do-3Sg seeming that he refl <strong>of</strong> it feign-3Sg <strong>the</strong> monkey done by <strong>the</strong><br />
moniage.<br />
monkhood.<br />
‘he did not seem to feign <strong>the</strong> monkey made by <strong>the</strong> monkhood’ (NCA: Renart)<br />
1 Contrary to Chierchia (1998), Krifka et al. (1995: 66) ”leave it open as to whe<strong>the</strong>r every predicate has a<br />
corresponding kind individual”.<br />
2 See McNally & Boleda (2004) for a similar approach to relational adjectives. However, for <strong>the</strong> time<br />
being <strong>the</strong> above analysis is meant to refer only to Latin -aticu, not to relational adjectives in general.<br />
Building on a representational format proposed by Fradin (2008: 4), <strong>the</strong> lexical semantics<br />
<strong>of</strong> <strong>the</strong> above substantivizations may be represented as in (4), where <strong>the</strong> index k is meant<br />
to indicate kind-reference, o signals object reference, and REL designates <strong>the</strong> relation that<br />
is introduced by <strong>the</strong> denominal adjective relating <strong>the</strong> denotation <strong>of</strong> <strong>the</strong> head noun (’rank’)<br />
to <strong>the</strong> one <strong>of</strong> <strong>the</strong> base noun (’monk’): 3<br />
(4) a. T (moniage) = (λxk.REL(xk, yk) ∧ rank ′ (xk) ∧ monk ′ (yk))<br />
b. T (moniage) = (λxo.REL(xo, yk) ∧ rank ′ (xo) ∧ monk ′ (yk))<br />
3 Genuinely French -age-derivatives: group terms and event nominals<br />
Next to <strong>the</strong> substantivized -age-derivatives borrowed from Latin <strong>the</strong>re are two genuinely<br />
French coinages, i.e. group terms (5) and true event nominalizations (6):<br />
(5) GROUP (porcage, ’porc’ = herd <strong>of</strong> swine):<br />
toutes mes bestes et le meilleur porc du porcage<br />
all my beasts and <strong>the</strong> best pig <strong>of</strong> <strong>the</strong> herd <strong>of</strong> pigs<br />
’all my beasts and <strong>the</strong> best pig <strong>of</strong> <strong>the</strong> herd <strong>of</strong> pigs’ (GO: B. deCaux)<br />
(6) EVENT NOMINALIZATION (mariage, ’marier’ = marriage):<br />
il firent le mariage du dit chevalier et de . . .<br />
<strong>the</strong>y make-3Pl <strong>the</strong> marriage <strong>of</strong> <strong>the</strong> said knight and <strong>of</strong> . . .<br />
’<strong>the</strong>y conducted <strong>the</strong> marriage <strong>of</strong> <strong>the</strong> mentioned knight and . . . ’ (NCA: vilhar)<br />
These forms developed from the substantivized -aticu-adjectives by means of three<br />
innovations: the replacement of the semantically incorporated head nouns by<br />
”group of” and ”event of”, respectively; the strictly extensional interpretation of these<br />
new ”head nouns”; and the strictly extensional interpretation of the derivational<br />
bases, such that the new derivatives generally refer to actual objects and events<br />
instead of kinds and event types:<br />
(7) T(porcage) = (λx_o. REL(x_o, y_o) ∧ rank′(x_o) ∧ pork′(y_o))<br />
4 Approaching <strong>the</strong> origin <strong>of</strong> <strong>the</strong> group terms<br />
As regards <strong>the</strong> (semantic) constituents <strong>of</strong> <strong>the</strong> newly coined group nouns in -age, note that<br />
<strong>the</strong> new ”head nouns” are interpreted as denoting singular individuals (one group), while<br />
<strong>the</strong> kind-denoting base nouns have been reinterpreted as denoting plural individuals (e.g.<br />
several pigs). In <strong>the</strong> following I would like to argue that this hybrid character <strong>of</strong> <strong>the</strong> new<br />
coinages may be traced back to the fact that the kind-reference of the traditional -aticu base<br />
nouns was much more dependent on the instantiations of the respective kinds than<br />
<strong>the</strong> kind-reference <strong>of</strong> <strong>the</strong> head nouns.<br />
In section 2, we already argued that a common noun may in principle denote both a<br />
kind and its instantiations. The relevant definition by Krifka et al. (1995: 66) is<br />
repeated in (8):<br />
3 This relation is specified as REL instead of R in order to distinguish it from the realization relation R<br />
mediating between a kind and its instances (cf. above).<br />
(8) ιx.∀y[δp(y) ↔ R(y,x)]<br />
Roughly: “The kind is the unique x such that every y is in the extension of the<br />
corresponding predicate δp if and only if it is a realization of x.”<br />
Fur<strong>the</strong>rmore, Krifka et al. (1995:78ff) show that, when a common noun shows up with<br />
kind-reference, <strong>the</strong> interpretation <strong>of</strong> <strong>the</strong> corresponding sentences <strong>of</strong>ten also involves <strong>the</strong><br />
instantiations <strong>of</strong> <strong>the</strong> kind. According to <strong>the</strong>se authors, ”even kind predicates [such as be<br />
extinct or be widespread (MU)] are related to properties <strong>of</strong> instances <strong>of</strong> <strong>the</strong> kind, if we<br />
engage in an analysis <strong>of</strong> <strong>the</strong> lexical meaning <strong>of</strong> such predicates (. . . ). For example, in<br />
order to show that <strong>the</strong> dodo is extinct, one has to show that <strong>the</strong>re have been realizations <strong>of</strong><br />
this kind in <strong>the</strong> past, that <strong>the</strong>re are no present realizations <strong>of</strong> this kind now, and, perhaps,<br />
that there will be no more in the future” (ibid.: 78f). In the remaining contexts triggering kind-reference<br />
that are discussed by Krifka et al. (1995), the interpretation of the common<br />
noun relies to a still greater extent on <strong>the</strong> instantiations <strong>of</strong> <strong>the</strong> relevant kind. A central<br />
example is <strong>the</strong> so-called distinguishing property interpretation as in Dutchmen are good<br />
sailors meaning that ”<strong>the</strong> Dutch distinguish <strong>the</strong>mselves from o<strong>the</strong>r comparable nations by<br />
having good sailors” (ibid.: 82f). According to this interpretation, the verbal predicate is<br />
definitely related to properties <strong>of</strong> instantiations <strong>of</strong> <strong>the</strong> kind. 4<br />
Coming back to <strong>the</strong> Latin -aticu-adjectives, I would like to argue that <strong>the</strong> interpretation<br />
<strong>of</strong> <strong>the</strong>ir base nouns also essentially relied on <strong>the</strong> close relation between <strong>the</strong> kinds and <strong>the</strong>ir<br />
instantiations. For example, <strong>the</strong> kind-reference <strong>of</strong> <strong>the</strong> base noun baron <strong>of</strong> barnage (’quality<br />
<strong>of</strong> barons’) builds on <strong>the</strong> fact that an undefined number <strong>of</strong> instantiations <strong>of</strong> <strong>the</strong> kind is<br />
said to be distinguishably noble. This analysis largely holds for all borrowed derivatives<br />
derived from bases that refer to human beings, e.g. eschevinage (’rank of jury men’),<br />
veuvage (’widowhood’), etc. By contrast, the kind interpretation of fiscal terms such as porcage<br />
(’tax on swine’) is closely related to the instantiations since these are the ones the taxes<br />
are to be paid for. Note that if we highlight the impact of the instantiations for the referential<br />
characteristics <strong>of</strong> <strong>the</strong> -aticu base nouns, this is not to say that <strong>the</strong> base nouns do<br />
not refer to kinds any longer. We are still faced with an intensional interpretation, i.e. <strong>the</strong><br />
instantiations do not need to exist in <strong>the</strong> actual world. For <strong>the</strong> sake <strong>of</strong> convenience, we<br />
will distinguish in what follows between intensionally defined instantiations and extensionally<br />
defined (actual) instances <strong>of</strong> kinds. Finally note that <strong>the</strong> instantiations <strong>of</strong> a kind<br />
are necessarily non-singular in <strong>the</strong> sense <strong>of</strong> Chierchia (1998:350) who argues that ”kinds<br />
(. . . ) will generally have a plurality <strong>of</strong> instances (even though sometimes <strong>the</strong>y may have<br />
just one or none). But something that is necessarily instantiated by just one individual (e.g.,<br />
<strong>the</strong> individual concept or transworld line associated with Gennaro Chierchia) would not<br />
qualify as a kind.”<br />
By contrast, if a given head noun like ’rank’ or ’quality’ refers to a kind, the<br />
kind as a whole is much more salient than the plurality of its instantiations. The second<br />
difference between the -aticu head nouns and the corresponding base nouns is already<br />
illustrated by example (3) above: whereas the base nouns consistently retain their kind-referring<br />
function, the head nouns refer to actual instances of the kind as soon as the<br />
derivative shows up with an adequate predicate, e.g. faire in (3b), repeated below as<br />
(9):<br />
4 Krifka et al. (1995) argue that the above sentence clearly differs from characterizing sentences such as ”Potatoes<br />
contain vitamin C” in that the former, unlike the latter, is not adequately paraphrased by an indefinite<br />
singular NP (cf. ”A Dutchman is a good sailor” vs. ”A potato contains vitamin C”).<br />
(9) ne fait sanblant que il s en faingne le singne fait<br />
not do-3Sg seeming that he refl. <strong>of</strong> it feign-3Sg <strong>the</strong> monkey done<br />
dou moniage. 5<br />
by <strong>the</strong> monkhood.<br />
‘he did not seem to feign the monkey made by the monkhood’ (cf. 4b.)<br />
Hence, we may generalize that the incorporated head nouns of the borrowed -age-derivatives<br />
are interpreted as denoting either the kind as an entirety or a concrete instance<br />
<strong>of</strong> it, whereas <strong>the</strong> morphologically established kind-reference <strong>of</strong> <strong>the</strong> corresponding base<br />
nouns essentially relies on <strong>the</strong> (non-singular) instantiations. I would like to argue that<br />
this difference between <strong>the</strong> interpretation <strong>of</strong> head nouns and base nouns is reflected at an<br />
abstract conceptual level <strong>of</strong> semantic representation, in a way such that <strong>the</strong> concepts denoted<br />
by <strong>the</strong> head nouns are interpreted as representing bounded entities without internal<br />
structure, whereas <strong>the</strong> concepts denoted by <strong>the</strong> base nouns are interpreted as representing<br />
unbounded entities composed <strong>of</strong> sub-individuals. Relying on Jackend<strong>of</strong>f (1991), we may<br />
represent <strong>the</strong> conceptual structure <strong>of</strong> e.g. le moniage (’<strong>the</strong> rank <strong>of</strong> monks’) as in (10),<br />
where <strong>the</strong> feature [± b] encodes <strong>the</strong> distinction between bounded entities like PIG and<br />
non-bounded entities like WATER, while <strong>the</strong> feature [± i] signals <strong>the</strong> distinction between<br />
non-structured individuals like BRICK and those that are composed <strong>of</strong> sub-individuals,<br />
like BUSES or CATTLE (cf. Jackend<strong>of</strong>f (1991: 20)). PL (”plural”) and REL (”relation”)<br />
denote functions that map between different values <strong>of</strong> b and i:<br />
(10) conceptual structure <strong>of</strong> le moniage (’<strong>the</strong> rank <strong>of</strong> monks’):<br />
[+b, -i RANK (REL ([-b, +i MONKS (PL ([+b, -i MONK]))]))]<br />
As regards <strong>the</strong> group terms in -age, it is interesting to note that recent analyses tend to<br />
define even non-derived group terms like English committee as being semantically hybrid<br />
in a sense reminiscent of our substantivized -aticu-adjectives. For example, Barker (1992)<br />
proposes that a group term denotes an atomic individual that is (merely) related to <strong>the</strong><br />
plural individual constituting its members by a membership function f. An argument for<br />
<strong>the</strong> difference in extension between <strong>the</strong> group term and <strong>the</strong> related plural predicate is that<br />
<strong>the</strong>re are properties common to all <strong>of</strong> <strong>the</strong> members which are never true <strong>of</strong> <strong>the</strong> group. For<br />
example, Bill can be a member of committee A, whereas committee A cannot (cf. ibid.:<br />
73). Likewise a group may have properties that <strong>the</strong> collection <strong>of</strong> its members does not<br />
have, e.g. a group has members while a plurality does not.<br />
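The asymmetry between a group atom and the plurality of its members can be made concrete in a minimal sketch. The representation below is an assumption introduced for illustration only, not Barker's (1992) formalization: atoms are plain strings, plural individuals are frozensets, and the membership function f is a dictionary; the names `is_member` and `has_members` are hypothetical.<br />

```python
# Atomic individuals (hypothetical names).
bill, sue = "bill", "sue"
committee_a = "committee_A"  # the group is itself an atom

# Membership function f: maps a group atom to the plural individual
# (modelled here as a frozenset) constituted by its members.
f = {committee_a: frozenset({bill, sue})}


def is_member(x, group):
    """x is a member of group iff x is part of the plurality f(group)."""
    return x in f.get(group, frozenset())


def has_members(x):
    """Only group atoms are in the domain of f; pluralities are not."""
    return x in f


print(is_member(bill, committee_a))         # True
print(is_member(committee_a, committee_a))  # False
print(has_members(committee_a))             # True
print(has_members(f[committee_a]))          # False
```

Under these assumptions, a property of every member (being a member of committee A) fails for the group, and a property of the group (having members) fails for the plurality.<br />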
One piece <strong>of</strong> evidence corroborating <strong>the</strong> validity <strong>of</strong> this approach for <strong>the</strong> group nouns<br />
in -age comes from predicates that directly refer to <strong>the</strong> members <strong>of</strong> <strong>the</strong> group denoted by<br />
<strong>the</strong> corresponding -age-derivative:<br />
(11) li quens (. . . ) a fait son barnage asanbler.<br />
the count has done his knights assemble<br />
‘<strong>the</strong> count assembled his knights’ (NCA: elie)<br />
Adopting this approach to group nouns, we may conclude that the abstract conceptual<br />
representation of the genuinely French group nouns in -age is very similar to that of<br />
the borrowed -aticu-substantivizations, since it likewise entails a bounded non-composed<br />
concept (in this case the group individual), followed by an unbounded concept that is<br />
composed of sub-individuals (i.e. the members). The relevant conceptual representation<br />
is given in (12), COMP representing Jackendoff’s (1991) ”composed of”-function (cf.<br />
ibid.: 23).<br />
5 We may assume that borrowed -age-derivatives as in (9) require a determiner since the verbal predicate<br />
triggers the type shifting from kind (e) to predicate (e,t), cf. e.g. Chierchia (1998: 353).<br />
(12) [+b, -i GROUP (COMP ([-b, +i BARONS (PL ([+b, -i BARON]))]))]<br />
According to this analysis, the substantivized -aticu-adjectives borrowed from Latin<br />
and the genuinely French group nouns in -age have the same ’skeleton’ (in the sense of<br />
Lieber (2004)):<br />
(13) a. [+b, -i ([-b, +i (+b, -i) ])] (borrowed substantivizations)<br />
rank monks monk<br />
b. [+b, -i ([-b, +i (+b, -i) ])] (genuinely French group terms)<br />
group barons baron<br />
This suggests that <strong>the</strong> genesis <strong>of</strong> <strong>the</strong> group nouns in -age essentially relied on <strong>the</strong><br />
abstract conceptual representation <strong>of</strong> <strong>the</strong> substantivized -aticu-adjectives.<br />
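The shared skeleton can be checked mechanically in a small sketch. The `Concept` class and the helpers `skeleton` and `build` are assumptions introduced here for illustration; they are not part of Jackendoff's (1991) or Lieber's (2004) formalism. The fragment encodes (10) and (12) as nested feature structures and strips the lexical content to compare the remaining [±b, ±i] values.<br />

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Concept:
    """A conceptual constituent with boundedness features [±b, ±i]."""
    label: str                      # lexical content, e.g. "RANK"
    bounded: bool                   # [+b] vs. [-b]
    internal: bool                  # [+i] vs. [-i]
    function: Optional[str] = None  # REL, COMP or PL, applied to the argument
    argument: Optional["Concept"] = None


def skeleton(c: Optional[Concept]) -> tuple:
    """Return the nested (b, i) values, ignoring labels and functions."""
    if c is None:
        return ()
    return ((c.bounded, c.internal),) + skeleton(c.argument)


def build(head: str, function: str, plural: str, atom: str) -> Concept:
    """Bounded head over an unbounded plurality over a bounded atom."""
    base = Concept(atom, bounded=True, internal=False)
    plurality = Concept(plural, bounded=False, internal=True,
                        function="PL", argument=base)
    return Concept(head, bounded=True, internal=False,
                   function=function, argument=plurality)


moniage = build("RANK", "REL", "MONKS", "MONK")      # cf. (10)
barnage = build("GROUP", "COMP", "BARONS", "BARON")  # cf. (12)

same_skeleton = skeleton(moniage) == skeleton(barnage)
print(same_skeleton)  # True
```

The two structures differ in their labels and linking functions, but the comparison of feature values alone succeeds, which is what (13) states.<br />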
Still, <strong>the</strong> differences between <strong>the</strong> borrowed derivatives and <strong>the</strong> new ones are remarkable,<br />
<strong>the</strong> most important development arguably being <strong>the</strong> change from <strong>the</strong> kind level to<br />
<strong>the</strong> level <strong>of</strong> actual instances, that accompanies <strong>the</strong> replacement <strong>of</strong> <strong>the</strong> incorporated head<br />
nouns. One possible approach to this change would be to assume that <strong>the</strong> incorporated<br />
head nouns were replaced by concepts whose quantitative constitution is closest to <strong>the</strong><br />
abstract conceptual representation <strong>of</strong> <strong>the</strong> traditional derivatives, hence <strong>the</strong> introduction <strong>of</strong><br />
the group-concept. According to this view, the new coinages represent the default realizations<br />
chosen by the native speakers because of their proximity to the skeleton in<br />
(13a).<br />
However, since we have no concrete evidence that could corroborate such<br />
an analysis, this reasoning remains highly speculative. Further investigation is needed<br />
to shed light on the exact diachronic development of the -age-derivatives. For our purposes,<br />
the most important conclusion from the above is that the internally plural shape<br />
of the base nouns exhibited by the substantivized -aticu-adjectives is transferred to the<br />
genuinely French -age-derivatives, such that, from a synchronic point of view on<br />
the group nouns, we may generalize that the attachment of -age necessarily involves the<br />
pluralization of the base noun. This generalization may be captured by assuming that<br />
-age introduces a plural operator *P (in the sense of Link (1983)) into the relevant representation.<br />
In the following, I will argue that the restriction to pluralized bases extends to<br />
the deverbal domain and that it is this restriction that enabled the event nominalization in<br />
-age to become more and more productive despite the existence of the related procedure<br />
in -ment.<br />
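The effect of the plural operator can be illustrated with a minimal sketch. It assumes, purely for illustration, that individuals are modelled as non-empty frozensets of atoms and that the sum operation is set union; the function name `star` and the toy extension of porc are hypothetical. Under these assumptions, *P is the closure of the extension of P under sum.<br />

```python
from itertools import combinations


def star(atoms):
    """Close a set of atomic individuals under sum (modelled as set union).

    Singleton frozensets stand for atomic individuals, larger frozensets
    for plural individuals.
    """
    atoms = sorted(atoms)
    closure = set()
    for size in range(1, len(atoms) + 1):
        for combo in combinations(atoms, size):
            closure.add(frozenset(combo))
    return closure


# Toy extension of 'porc' with three atomic pigs (hypothetical names).
porc = {"p1", "p2", "p3"}
star_porc = star(porc)

# *porc contains the atoms themselves and every sum of them.
print(len(star_porc))                        # 7
print(frozenset({"p1", "p2"}) in star_porc)  # True
```

On the analysis proposed here, a group noun such as porcage would relate its head concept to a plural individual drawn from such a closure rather than to a single atom.<br />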
5 Etymologically conditioned pluractionality and aspectual properties <strong>of</strong> New French<br />
-age<br />
Example 14 contrasts a deverbal substantivized -aticu-adjective (14a) with a true event<br />
nominalization in -age (14b). The nominalization in (14a) means ’right <strong>of</strong> crossing’, <strong>the</strong><br />
head noun as well as <strong>the</strong> base verb denoting event types. Contrary to that, <strong>the</strong> nominalization<br />
in (14b) means ’<strong>the</strong> event <strong>of</strong> passing’, <strong>the</strong> new head noun as well as <strong>the</strong> base verb<br />
referring to actual instances <strong>of</strong> events:<br />
(14) a. si disent que il queroient passage (. . . ).<br />
prt. say-3Pl that <strong>the</strong>y ask for passing<br />
’<strong>the</strong>y said that <strong>the</strong>y ask for <strong>the</strong> right to pass’ (cf.(2))<br />
b. cil m abandona le passage de la haie mout doucement.<br />
this me allow-3Sg <strong>the</strong> passing <strong>of</strong> <strong>the</strong> hedge very gently<br />
’he very gently allowed me to pass <strong>the</strong> hedge’ (NCA: rose)<br />
Evidently, the borrowed deverbal -age-derivatives exhibit the same semantics as the<br />
borrowed denominal ones. That is, the event type denoted by e.g. passage in (14a) is<br />
closely related to its instantiations, since the right is conceded for instantiations of crossing<br />
that are, furthermore, restricted to distinguished territories. Similarly, fees denoted by<br />
e.g. pressoirage or troillage (’fee for use of the village press’, cf. Fleischman (1990:74))<br />
are paid for instantiations of pressing events that display specific properties, etc. Note<br />
that the fees and rights are still estimated and conceded for events in general, i.e. for event<br />
types. Nevertheless, the derivation of the corresponding -aticu-adjectives obviously relied<br />
on (typical) instantiations.<br />
Naturally, the parallel between the denominal and the deverbal borrowed substantivized<br />
-age-derivatives also extends to the head noun. That is, due to the accidental<br />
character of their kind-reference (showing up only since they occur in the realm of an<br />
-aticu-adjective), the head nouns are largely independent of their instantiations, focussing<br />
on the type of event as a whole. Furthermore, just as with the denominal derivatives,<br />
the head nouns of the deverbal substantivizations may also be coerced to refer to<br />
actual instances by contextual means:<br />
(15) et rendi chascuns son passage a ceuls qui leur avoient presté.<br />
and gave-3Sg everyone his toll to those that them had lent<br />
’and everyone returned his toll to those who had lent it to them.’ (NCA: vilhar)<br />
This parallelism <strong>of</strong> denominal and deverbal -aticu-substantivizations suggests that <strong>the</strong><br />
extensional reinterpretation <strong>of</strong> <strong>the</strong> deverbal ones (i.e. <strong>the</strong>ir shift from kind-reference to<br />
object-reference) may be modelled along <strong>the</strong> lines <strong>of</strong> <strong>the</strong> analysis proposed above for <strong>the</strong><br />
group nouns. In order to put forward this hypothesis, I will draw on van Geenhoven (2005),<br />
who introduces the so-called pluractional operator, which corresponds to Link’s plural operator<br />
*P and which operates on verbal bases in order to ”distribute subevent times in various<br />
ways over the overall event time of an utterance”. Pluractionality and (indefinite) plurality<br />
share the property of cumulative reference, a concept that was originally introduced<br />
to define the reference of mass nouns and indefinite plurals denoting homogeneous pluralities<br />
or masses. The crucial characteristic <strong>of</strong> an entity being in <strong>the</strong> extension <strong>of</strong>, for<br />
example, a mass term, is that its parts, as well as any sum <strong>of</strong> its parts, are in <strong>the</strong> extension<br />
<strong>of</strong> <strong>the</strong> same term. As is pointed out by Quine (1960:19), ”[s]o called mass terms like ’water’,<br />
’footwear’, and ’red’ have <strong>the</strong> semantic property <strong>of</strong> referring cumulatively: any sum<br />
<strong>of</strong> parts which are water is water.” Evidently, this characteristic can easily be transferred<br />
to <strong>the</strong> domain <strong>of</strong> eventualities, in <strong>the</strong> sense that atelic expressions refer cumulatively to<br />
eventualities, whereas telic expressions refer non-cumulatively to eventualities. Accordingly,<br />
van Geenhoven (2005: 6) takes pluractionality to be ”the true source of atelicity”,<br />
covering the ”atelic nature” (ibid.) of different lexical items, e.g. simple activity verbs<br />
(to sing), imperfective aspectual markers (Engl. -ing) or frequency adverbs (occasionally).<br />
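Cumulative reference can be stated as a closure condition: a predicate refers cumulatively if the sum of any two entities in its extension is again in its extension. The sketch below tests this condition on toy extensions. Modelling entities as frozensets and sum as union, as well as the names `is_cumulative`, `portions`, `water` and `a_pig`, are assumptions made for illustration only.<br />

```python
def is_cumulative(extension):
    """True if the extension is closed under pairwise sum (set union)."""
    return all((x | y) in extension for x in extension for y in extension)


def portions(*atoms):
    """All non-empty sums that can be formed from the given atomic parts."""
    result = {frozenset([a]) for a in atoms}
    changed = True
    while changed:
        changed = False
        for x in list(result):
            for y in list(result):
                if (x | y) not in result:
                    result.add(x | y)
                    changed = True
    return result


# Mass-like predicate: any sum of parts which are water is water.
water = portions("w1", "w2", "w3")

# Singular count-like predicate: only atomic individuals are in the extension.
a_pig = {frozenset(["p1"]), frozenset(["p2"])}

print(is_cumulative(water))  # True
print(is_cumulative(a_pig))  # False
```

The same test carries over to sets of eventualities: on the view summarized above, an atelic description would pattern with `water` and a telic one with `a_pig`.<br />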
Based on this approach, we may hypo<strong>the</strong>size that, due to <strong>the</strong> specific conceptual ’skeleton’<br />
<strong>of</strong> <strong>the</strong>ir antecedents, <strong>the</strong> innovative event nominalizations in -age referring to individual<br />
events are conceptualized as being internally pluractional (PLUR), just as <strong>the</strong><br />
innovative denominal group terms are conceived of as having pluralized bases:<br />
(16) conceptual representation <strong>of</strong> passage (’event <strong>of</strong> crossing’):<br />
[+b, -i EVENT (COMP ([-b, +i CROSSING (PLUR ([+b, -i CROSSING]))]))]<br />
Unfortunately, this hypothesis can hardly be verified for Old French, since true event<br />
nominalizations are only marginally represented in Old French corpora. However, evidence<br />
in its favour comes from analyses of -age and -ment in Modern French, showing that<br />
event nominalizations in -age even nowadays exhibit aspectual characteristics related to<br />
pluractionality. One example is Bally (1965), who argues that -ment-nominalizations are<br />
generally very likely to be punctual or terminative, whereas -age-nominalizations rather<br />
realize durative and iterative aspectual values. Since aspectual values such as iterativity, continuativity,<br />
durativity etc. may all be traced back to pluractionality (cf. van Geenhoven<br />
(2005: ibid.)), Bally’s differentiation of New French -age and -ment clearly supports our<br />
analysis.<br />
Interestingly, Martin (2008) <strong>of</strong>fers a detailed analysis <strong>of</strong> various aspectual differences<br />
between -age and -ment that is largely in line with Bally’s classification. For example,<br />
<strong>the</strong> non-terminativity <strong>of</strong> -age is illustrated by <strong>the</strong> complementary distribution <strong>of</strong> -age- and<br />
-tion-nominals in contexts as in (17):<br />
(17) a. Le dénazifiage de l’Allemagne (par X) a abouti à sa dénazification (par X).<br />
’The denazifying of Germany (by X) resulted in its denazification (by X)’<br />
b. *La dénazification de l’Allemagne (par X) a abouti à son dénazifiage (par X).<br />
’The denazification of Germany (by X) resulted in its denazifying (by X).’<br />
(Martin (2008: 12))<br />
Secondly, Martin argues that -age is able to denote longer eventive chains than -ment,<br />
as is evidenced by the fact that -age-nominals derived from unergative intransitive bases exhibit<br />
an iterative interpretation, whereas the corresponding -ment-nominals are forced to<br />
show up with plural inflection in iterative contexts:<br />
(18) a. OK Une séance de miaulage. (singular)<br />
’A meowing session’<br />
b. vs. * Une séance de miaulement. (singular)<br />
c. vs. OK Une séance de miaulements. (plural) (Martin (2008: 6))<br />
Thirdly, Martin states that -age contrary to -ment prefers internal arguments that are<br />
incrementally affected by <strong>the</strong> event denoted by <strong>the</strong> relevant base verb, a feature that is also<br />
displayed by o<strong>the</strong>r atelic expressions as e.g. <strong>the</strong> English Progressive (cf. van Geenhoven<br />
(2005: 12)).<br />
In my view, <strong>the</strong> above findings may be unified by assuming that <strong>the</strong> diachronically<br />
motivated restriction <strong>of</strong> <strong>the</strong> -age-derivation to pluractional bases carries over to Modern<br />
French where it is reflected by <strong>the</strong> fact that -age-nominalizations exhibit several aspectual<br />
values related to pluractionality, e.g. iterativity, durativity or imperfectivity. A further<br />
advantage of this analysis is that we may answer the question relating to the alleged suffix<br />
rivalry by arguing that -age could develop into an event nominalization suffix since it<br />
displays specific aspectual properties that distinguish it from rival suffixes such as -ment.<br />
6 Conclusion<br />
In this paper, I investigated <strong>the</strong> semantic development <strong>of</strong> <strong>the</strong> French -age-derivation. I<br />
argued that <strong>the</strong> genesis <strong>of</strong> group terms in -age resulted from a reinterpretation <strong>of</strong> <strong>the</strong><br />
borrowed substantivized -aticu-derivatives that was enabled by <strong>the</strong> fact that <strong>the</strong> original<br />
procedure and <strong>the</strong> new procedure share <strong>the</strong> same quantitative structure at an abstract level<br />
<strong>of</strong> conceptual representation. The result <strong>of</strong> this reinterpretation is that denominal -age<br />
is restricted to pluralized bases. With reference to van Geenhoven (2005), I <strong>the</strong>n related<br />
nominal plurality to verbal pluractionality through <strong>the</strong> notion <strong>of</strong> cumulative reference and<br />
I argued that <strong>the</strong> development in <strong>the</strong> deverbal domain strictly parallels <strong>the</strong> development<br />
in the denominal domain, such that deverbal -age is restricted to pluractional<br />
bases. This analysis enables us to approach several questions concerning the change of<br />
the -age-derivation. First of all, the change turns out to affect only the more concrete levels<br />
of word formation, the basic skeleton being retained throughout the diachronic<br />
development. This common ground constitutes both the condition that enabled the change<br />
to take place and the basic frame that determines the specific characteristics of the -age-derivation<br />
to this day. Secondly, we may answer the question relating to the suffix rivalry<br />
by arguing that -age could develop into an event nominalization suffix since it displays<br />
specific aspectual properties that distinguish it from its alleged rival -ment.<br />
Acknowledgements<br />
I wish to thank Martin Becker, Steffen Heidinger, Fabienne Martin, Achim Stein and<br />
Johannes Wespel for helpful discussions. Many thanks to <strong>the</strong> reviewers for <strong>the</strong>ir helpful<br />
comments, as well as to Fabienne Martin and Dennis Spohr for <strong>the</strong>ir technical support<br />
and <strong>the</strong>ir patience.<br />
References<br />
Bally, C. (1965). Linguistique générale et linguistique française, Francke, Berne.<br />
Barker, C. (1992). Group terms in English, Journal of Semantics 9: 69–93.<br />
Blum, C. (2002). Godefroy – Le Dictionnaire de l’Ancienne Langue française du IXe au<br />
XVe siècle [GO], Université Paris-Sorbonne. Electronic edition.<br />
Chierchia, G. (1998). Reference to kinds across languages, Natural Language<br />
Semantics 6: 339–405.<br />
Fradin, B. (2008). On <strong>the</strong> semantics <strong>of</strong> denominal adjectives, On Line <strong>Proceedings</strong> <strong>of</strong> <strong>the</strong><br />
6th Mediterranean Morphology Meeting, Sept. 27-30, 2007, Vol. 2, Ithaca.<br />
Jackend<strong>of</strong>f, R. S. (1991). Parts and boundaries, Cognition 41: 9–45.<br />
Krifka, M., Pelletier, F. J., Carlson, G. N., ter Meulen, A., Chierchia, G. and Link, G.<br />
(1995). Genericity: An introduction, in G. N. Carlson and F. J. Pelletier (eds), The<br />
Generic Book, University <strong>of</strong> Chicago Press, Chicago.<br />
Lieber, R. (2004). Morphology and Lexical Semantics, Cambridge University Press, Cambridge.<br />
Link, G. (1983). The logical analysis of plurals and mass terms: A lattice-theoretical<br />
approach, in R. Bäuerle, C. Schwarze and A. von Stechow (eds), Meaning,<br />
Use and Interpretation of Language, Walter de Gruyter, Berlin, pp. 302–323.<br />
Lüdtke, J. (1978). Prädikative Nominalisierungen mit Suffixen im Katalanischen, Spanischen<br />
und Französischen, Niemeyer, Tübingen.<br />
Martin, F. (2008). The semantics of eventive suffixes in French. Paper presented to Formal<br />
Semantics in Moscow 4, 5th April 2008.<br />
McNally, L. and Boleda, G. (2004). Relational adjectives as properties <strong>of</strong> kinds, in<br />
O. Bonami and P. Cabredo H<strong>of</strong>herr (eds), Empirical Issues in Syntax and Semantics<br />
5, Papers from CSSP 2003, pp. 179–196.<br />
Quine, W. V. (1960). Word and Object, MIT Press, Cambridge, Mass.<br />
Stein, A. and Kunstmann, P. (2006). Le Nouveau Corpus d’Amsterdam [NCA], Universität<br />
Stuttgart, Institut für Linguistik/Romanistik, Stuttgart.<br />
van Geenhoven, V. (2005). Atelicity, pluractionality and adverbial quantification, in<br />
H. Verkuyl, H. de Swart and A. van Hout (eds), Perspectives on Aspect, Springer,<br />
The Ne<strong>the</strong>rlands, pp. 107–125.<br />
Vergnaud, J. and Zubizaretta, M. (1975). The definite determiner and the inalienable<br />
construction in French and English, Linguistic Inquiry pp. 595–652.<br />
ADVERSARY IMPLICATURES ∗<br />
Grégoire Winterstein<br />
Université Paris 7<br />
Abstract. The work reported in this paper deals with a certain preference that speakers show<br />
when reinforcing some conversational implicatures. We look at <strong>the</strong> apparent correlation between<br />
this class <strong>of</strong> inferences and <strong>the</strong> bi-partite classification <strong>of</strong> conversational implicatures<br />
proposed by L. Horn. We <strong>the</strong>n argue for a separation between <strong>the</strong> argumentative and inferential<br />
dimensions <strong>of</strong> an utterance and propose a brief explanation based on propositions by<br />
Ducrot.<br />
In this work we are interested in one aspect <strong>of</strong> what is classically considered as <strong>the</strong> reinforcement<br />
<strong>of</strong> conversational implicatures. In <strong>the</strong> first section we show that <strong>the</strong> felicitous<br />
reinforcement <strong>of</strong> implicatures isn’t free, as it is <strong>of</strong>ten considered to be (e.g. in<br />
(Levinson, 2000)). In some cases speakers show a preference for marking a contrast<br />
when reinforcing inferences, in o<strong>the</strong>rs a contrast can’t be used. We examine <strong>the</strong> properties<br />
<strong>of</strong> each class <strong>of</strong> implicatures defined in this manner. We <strong>the</strong>n look at <strong>the</strong> similarities<br />
between <strong>the</strong> class <strong>of</strong> inferences exhibiting this preference for contrast and <strong>the</strong> Q-based<br />
class <strong>of</strong> implicatures as defined by Horn. Ultimately, we discard <strong>the</strong> similarity as irrelevant<br />
to our purpose. More generally, we argue that a classical neo-Gricean approach can’t<br />
give an explanation for <strong>the</strong> facts at hand.<br />
The second section aims at explaining <strong>the</strong>se facts in an argumentative perspective based<br />
on <strong>the</strong> works <strong>of</strong> Anscombre and Ducrot. We claim that some implicatures are in a systematic<br />
rhetorical opposition to <strong>the</strong> utterance <strong>the</strong>y are derived from, a fact which licenses<br />
<strong>the</strong> use <strong>of</strong> a contrast for reinforcement. Besides licensing it, this opposition seemingly<br />
requires <strong>the</strong> presence <strong>of</strong> contrast. We propose two different views to explain this preference.<br />
1 Empirical Domain<br />
1.1 Core data<br />
The data presented in (1) are our primary object of study. In (1b) B’s answer is interpreted<br />
as carrying with it <strong>the</strong> implicature in (1c) 1 , a standard example <strong>of</strong> scalar implicature as<br />
presented, among o<strong>the</strong>rs, in (Horn, 1989).<br />
(1) a. A: Do you know whe<strong>the</strong>r John will come?<br />
b. B: It’s possible<br />
c. ❀It’s not sure<br />
d. It’s possible, but it’s not sure<br />
∗ I thank Pascal Amsili, Jacques Jayez, Frédéric Laurens, François Mouret and <strong>the</strong> audiences <strong>of</strong> FSIM’4<br />
and JSM’08 for <strong>the</strong>ir precious help and remarks during <strong>the</strong> preparation <strong>of</strong> this work.<br />
1 We use <strong>the</strong> notation A❀B to mean that <strong>the</strong> utterance <strong>of</strong> A implicates B<br />
The inference (1c) can be reinforced as in (1d). What interests us is that an utterance such<br />
as (2), without an adversative discourse marker, sounds degraded compared to (1d) (as an<br />
answer to (1a)).<br />
(2) B: # It’s possible and it’s not sure<br />
We believe that the preference for (1d) over (2) is somewhat unexpected. If the implicature<br />
(1c) is indeed conveyed by <strong>the</strong> utterance <strong>of</strong> (1b), one has to explain how it can be construed<br />
as “opposed” to <strong>the</strong> utterance that allowed its presence in <strong>the</strong> first place (as suggested by<br />
<strong>the</strong> adversative but). A similar fact is already noted in (Anscombre and Ducrot, 1983)<br />
with <strong>the</strong> following example:<br />
(3) Pierre s’imagine que Jacques et moi sommes de vieilles connaissances, mais pourtant<br />
on ne s’est jamais rencontrés.<br />
Pierre figures that Jacques and I are old-time friends, but we never met.<br />
Anscombre and Ducrot use (3) to illustrate <strong>the</strong> difference between <strong>the</strong>ir notions <strong>of</strong> argumentation<br />
2 and inference. Although <strong>the</strong> first part <strong>of</strong> <strong>the</strong> utterance allows an inference<br />
towards the second part, it is nevertheless argumentatively opposed to it and thus licenses<br />
a contrast. (Horn, 1991) shows that more generally any kind of content related to an<br />
utterance U (by relations of implicature, presupposition, logical entailment. . . ) can be<br />
felicitously expressed redundantly as long as it is argumentatively opposed to U. Therefore, as unexpected<br />
as the preference for a contrast might be in (1d), the situation appears common.<br />
This prompts us to look at <strong>the</strong> argumentative properties <strong>of</strong> <strong>the</strong> implicatures relative to<br />
<strong>the</strong>ir mo<strong>the</strong>r-utterances. More specifically, we’ll be checking whe<strong>the</strong>r certain subtypes <strong>of</strong><br />
implicatures are distinguished by this argumentative behaviour.<br />
On a last note about the core data, we wish to mention the case of the scale of quantifiers:<br />
〈all, some〉. Usually, scalar implicatures are exemplified with this latter scale as in<br />
(4).<br />
(4) a. A: How is your experiment going?<br />
b. B: I tested some <strong>of</strong> <strong>the</strong> subjects.<br />
c. ❀B didn’t test all <strong>the</strong> subjects.<br />
d. I tested some <strong>of</strong> <strong>the</strong> subjects, but not all.<br />
e. # I tested some <strong>of</strong> <strong>the</strong> subjects, and not all.<br />
We prefer to rely on (1) because <strong>the</strong> preference for using an adversative appears stronger<br />
in (1d) than in (4d). Nei<strong>the</strong>r (2) nor (4e) can be entirely ruled out. Both can be used as<br />
corrections of a previous statement (in those cases they would probably have specific<br />
prosodic patterns). Putting this aside, we also observe that <strong>the</strong> preference for marking<br />
a contrast is less strong for <strong>the</strong> examples with quantifiers. Simple Google searches for<br />
the French quelques-uns et pas tous or English some and not all yield several thousands<br />
<strong>of</strong> occurrences, not all <strong>of</strong> <strong>the</strong>m corrections, whereas a search for possible and not certain<br />
only provides results <strong>of</strong> <strong>the</strong> form only possible and not certain. The presence <strong>of</strong> <strong>the</strong> adverb<br />
2 The notion <strong>of</strong> argumentation is rooted in Anscombre and Ducrot’s view on discourse. According to<br />
them a speaker always talks to a point and his utterances argue for a certain conclusion, quite often the topic<br />
<strong>of</strong> <strong>the</strong> discourse, which may or may not be explicit. Merin considers that understanding <strong>the</strong> nature <strong>of</strong> this<br />
topic is what “figuring out <strong>the</strong> speaker’s apparent and real intentions” is about. Anscombre and Ducrot<br />
consider that some linguistic items or structures, such as almost, bear specific argumentative properties and<br />
thus entertain a systematic argumentative opposition or correlation with o<strong>the</strong>r propositions.<br />
only restricts <strong>the</strong> meaning <strong>of</strong> possible and <strong>the</strong>se examples aren’t conclusive compared to<br />
<strong>the</strong> some and not all ones. However, <strong>the</strong> effect <strong>of</strong> only is an interesting one and we shall<br />
return to it below.<br />
1.2 First attempt at a classification<br />
Ra<strong>the</strong>r unsurprisingly, if we look at <strong>the</strong> cancellation <strong>of</strong> <strong>the</strong> implicature (1c), we find that<br />
<strong>the</strong> use <strong>of</strong> an adversative is odd in (5a). A reformulation as in (5b) sounds better.<br />
(5) a. # It’s possible but it’s sure<br />
b. It’s possible and it’s even sure<br />
Such observations have already been made in (Benndorf and Koenig, 1998). Using data<br />
about <strong>the</strong> cancellation <strong>of</strong> implicatures, <strong>the</strong> authors argue for a treatment <strong>of</strong> <strong>the</strong> semantic<br />
contribution of the adversative but based on Horn’s distinction between Q-based and R-based<br />
implicatures. This distinction appears relevant since R-based implicatures 3 allow a<br />
contrast for <strong>the</strong>ir cancellation as shown with various examples in (6).<br />
(6) a. Gwen took <strong>of</strong>f her socks and jumped into bed, but not in that order<br />
b. Billy cut a finger, but it wasn’t his<br />
c. Sam and Max moved <strong>the</strong> piano, but not toge<strong>the</strong>r<br />
As expected, the use of an adversative to reinforce the same implicatures yields odd sentences,<br />
as in (7).<br />
(7) a. # Gwen took <strong>of</strong>f her socks and jumped into bed, but in that order<br />
b. # Billy cut a finger, but it was his<br />
c. # Sam and Max moved <strong>the</strong> piano, but toge<strong>the</strong>r<br />
It should be noted that <strong>the</strong> sentences in (7) are out only under <strong>the</strong> assumption that <strong>the</strong><br />
considered implicatures are present. It is easy to imagine contexts for which all <strong>the</strong>se<br />
sentences are correct. For example, if sentence (7b) is uttered about some mafia henchman<br />
who breaks o<strong>the</strong>r people’s fingers on a daily basis, <strong>the</strong> sentence is quite felicitous but <strong>the</strong><br />
implicature we’re interested in isn’t conveyed in <strong>the</strong> first place.<br />
In (5) we’ve seen that <strong>the</strong> cancellation <strong>of</strong> scalar implicatures doesn’t allow a contrast.<br />
The same goes for all o<strong>the</strong>r types <strong>of</strong> Q-based implicatures 4 : (8a) is a clausal implicature<br />
as first described in (Gazdar, 1979), (8b) is based on an attitude predicate, (8c) is based<br />
on Grice’s maxim <strong>of</strong> Manner ra<strong>the</strong>r than <strong>of</strong> Quantity (and belongs to Levinson’s M-based<br />
implicatures class).<br />
(8) a. Bill is in <strong>the</strong> kitchen or <strong>the</strong> living room, (?but/and in fact) I know which<br />
b. John thinks that Mary is pregnant, (?but/and in fact) she is indeed expecting a<br />
child<br />
c. Sam caused Max’s death, (?but/and in fact) he actually killed him on purpose<br />
3 R-based implicatures are enrichments <strong>of</strong> an utterance related to underspecified aspects <strong>of</strong> <strong>the</strong> propositional<br />
content (temporal ordering, causal relations etc.) They come about in a wide variety <strong>of</strong> shapes. In<br />
(Levinson, 2000) <strong>the</strong>se inferences are called I-based implicatures.<br />
4 For Horn, Q-based implicatures are essentially negative in nature: an implicated meaning is calculated<br />
by taking into account which stronger, or more informative, relevant forms <strong>the</strong> speaker could have uttered<br />
but chose not to. This notion <strong>of</strong> Q-implicatures subsumes Levinson’s Q and M implicatures.<br />
As with (1c), the reinforcement of these inferences seems better with some contrast 5 .<br />
(9) a. Bill is in <strong>the</strong> kitchen or <strong>the</strong> living room, ?(but) I don’t know which<br />
b. John thinks that Mary is pregnant, ?(but) she’s not<br />
c. Sam caused Max’s death, ?(but) he didn’t kill him on purpose<br />
Relying on these observations, Benndorf and Koenig propose to change the classical<br />
description <strong>of</strong> but, as given in (Anscombre and Ducrot, 1977) and reproduced in (10), by<br />
reducing Ducrot’s notion <strong>of</strong> argumentativity to Gricean inferences.<br />
(10) a. A sentence p but q is felicitous iff <strong>the</strong>re is a proposition H such that:<br />
b. p is an argument for H<br />
c. q is an argument for ¬H<br />
d. q argues more strongly for ¬H than p argues for H<br />
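The conditions in (10) can be sketched as a small check. This is only an illustration under a strong assumption that is not part of Anscombre and Ducrot's account: argumentative force is encoded here as a signed number (positive for H, negative for ¬H), and the function name `but_felicitous` and all numeric values are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical encoding: force[(x, H)] > 0 means x argues for H,
# force[(x, H)] < 0 means x argues for not-H. The values are illustrative only.
Force = Dict[Tuple[str, str], float]


def but_felicitous(p: str, q: str, force: Force) -> bool:
    """Return True if some H satisfies (10b)-(10d) for the sentence 'p but q'."""
    conclusions = {h for (_, h) in force}
    for h in conclusions:
        fp = force.get((p, h), 0.0)  # how strongly p argues for H
        fq = force.get((q, h), 0.0)  # how strongly q argues for H (negative: for not-H)
        if fp > 0 and fq < 0 and -fq > fp:
            return True
    return False


force = {
    ("it is possible", "John will come"): 0.4,
    ("it is not sure", "John will come"): -0.6,
    ("it is sure", "John will come"): 0.9,
}

print(but_felicitous("it is possible", "it is not sure", force))  # True, cf. (1d)
print(but_felicitous("it is possible", "it is sure", force))      # False, cf. (5a)
```

With these assumed values the check accepts the pattern of (1d) and rejects that of (5a); nothing hinges on the particular numbers beyond their sign and relative size.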
Benndorf and Koenig’s description is given in (11), where “world inference” stands for<br />
any inference deriving from world knowledge.<br />
(11) a. A sentence p but q is felicitous iff <strong>the</strong>re is a proposition H such that:<br />
b. H is an R-inference or a “world inference” derived from p<br />
c. q toge<strong>the</strong>r with <strong>the</strong> common ground entails ¬H<br />
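The description in (11) can be illustrated in the same spirit. The sketch below assumes a toy possible-worlds encoding (propositions as sets of worlds) that is not given in the source; the world labels, the `inferences` table and the function names are hypothetical. It covers only the cancellation and reinforcement patterns of (6b) and (7b).

```python
from typing import Dict, FrozenSet, List, Tuple

World = str
Prop = FrozenSet[World]


def entails_not(q: Prop, common_ground: Prop, h: Prop) -> bool:
    """q together with the common ground entails not-H: no remaining world verifies H."""
    return not (q & common_ground & h)


def but_felicitous_11(
    p: str,
    q: Prop,
    common_ground: Prop,
    inferences: Dict[str, List[Tuple[Prop, str]]],
) -> bool:
    """Check (11): some R-inference or world inference H of p is contradicted by q."""
    return any(
        kind in ("R", "world") and entails_not(q, common_ground, h)
        for (h, kind) in inferences.get(p, [])
    )


# Toy model for "Billy cut a finger": either his own finger or someone else's.
common_ground = frozenset({"own", "other"})
inferences = {"Billy cut a finger": [(frozenset({"own"}), "R")]}

q_not_his = frozenset({"other"})  # "it wasn't his"
q_his = frozenset({"own"})        # "it was his"

print(but_felicitous_11("Billy cut a finger", q_not_his, common_ground, inferences))  # True, cf. (6b)
print(but_felicitous_11("Billy cut a finger", q_his, common_ground, inferences))      # False, cf. (7b)
```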
1.3 The limits <strong>of</strong> a purely Gricean description<br />
The description of but given in (11) is attractive because it makes Ducrot’s argumentativity<br />
explicit in terms of well-studied inference mechanisms. However, this proposal raises several<br />
issues.<br />
As noted about (3), Anscombre and Ducrot are adamant about distinguishing inference<br />
and argumentation. A good illustration of the difference between the two is given<br />
in (12).<br />
(12) a. Mary almost fell.<br />
b. → Mary didn’t fall.<br />
c. Mary almost fell but she caught herself.<br />
The utterance <strong>of</strong> (12a) conventionally conveys (12b) 6 and yet a contrast is preferred in<br />
(12c) where <strong>the</strong> first sentence is connected with one entailing (12b) (we don’t use (12b) as<br />
such because <strong>the</strong> repetition <strong>of</strong> <strong>the</strong> lexical material alters <strong>the</strong> judgment on (12c)). The use <strong>of</strong><br />
an adversative shows that (12a) and (12b) are argumentatively opposed. According to <strong>the</strong><br />
description of but given in (11), this amounts to saying that on the one hand (12a) conventionally<br />
conveys (12b) and on the other hand R-implicates its opposite. Put more simply, this<br />
means that an utterance could, and should, convey two opposite inferences at <strong>the</strong> same<br />
time. If we adopt <strong>the</strong> classical Gricean view <strong>of</strong> an implicature as a part <strong>of</strong> meaning<br />
mutually recognized by both speaker and addressee, <strong>the</strong>n a speaker uttering (12c) should<br />
be contradicting himself, or at <strong>the</strong> very least sound “dissonant”.<br />
5 Actually <strong>the</strong> versions without any connector might sound acceptable with <strong>the</strong> second conjunct as an<br />
explanation <strong>of</strong> <strong>the</strong> first (especially for (9c)). We acknowledge such readings but won’t deal with <strong>the</strong>m<br />
directly. Our point lies in <strong>the</strong> fact that it’s not possible to reinforce <strong>the</strong>se inferences without enforcing a<br />
discourse relation. A Contrast relation is <strong>the</strong> most “natural” one to convey and it is <strong>the</strong> most compatible<br />
with all studied inferences.<br />
6 For a detailed study of the properties of almost see (Jayez and Tovena, 2008).<br />
Moreover, should we be able to find a sentence coordinated by but such that <strong>the</strong> second<br />
conjunct is <strong>the</strong> cancellation <strong>of</strong> a Q-based implicature, it would be a counter-example to<br />
the description in (11). We believe (13) is such an example 7 .<br />
(13) a. Mother: I hope Kevin has been polite with Granny and he has managed to eat<br />
some of her terrible cookies.<br />
b. Father: The problem is, he did eat some of them, but in fact he ate all of them<br />
and Granny said that he was greedy.<br />
The use of some in answer (13b) is such that it excludes that Kevin ate all of the cookies:<br />
an implicature restricting <strong>the</strong> meaning <strong>of</strong> some seems present. Two options are available:<br />
1. In this particular utterance <strong>the</strong> implicature from some to not all isn’t a scalar implicature<br />
but an R-based one. On one hand, this would be consistent with (11).<br />
On <strong>the</strong> o<strong>the</strong>r hand, <strong>the</strong> presence <strong>of</strong> <strong>the</strong> reformulative item in fact is similar to <strong>the</strong><br />
standard cases <strong>of</strong> scalar implicature cancellation. At this stage, it would mean that<br />
<strong>the</strong>re are two different mechanisms for producing <strong>the</strong> same inference with similar<br />
characteristics except on <strong>the</strong> argumentative side: not a very desirable situation.<br />
2. The implicature is indeed a scalar implicature: in this case <strong>the</strong> argumentative orientation<br />
<strong>of</strong> <strong>the</strong>se inferences isn’t always opposed to <strong>the</strong>ir base-utterance. A simple<br />
Gricean approach is <strong>the</strong>n unable to provide a satisfactory analysis <strong>of</strong> <strong>the</strong> core data<br />
in (1). Since <strong>the</strong> description <strong>of</strong> but is usually given in an argumentative framework,<br />
this isn’t surprising. What has now to be explained is how <strong>the</strong> argumentativity <strong>of</strong><br />
<strong>the</strong>se inferences can be accounted for.<br />
A last observation we’ll make is that explaining <strong>the</strong> core data is much simpler once we<br />
abandon implicatures. Taking the meaning of some as more than 2 and possibly all, there<br />
is a clear opposition with a not all interpretation. Things are however a bit more tricky: as<br />
shown by (13b) the argumentative relationship between the some and not all propositions<br />
can vary. What we mean to investigate is on one hand <strong>the</strong> effect that this relation has on<br />
<strong>the</strong> discourse relations one can use to connect discourse segments and on <strong>the</strong> o<strong>the</strong>r hand<br />
<strong>the</strong> effect it has, if any, on <strong>the</strong> derivation <strong>of</strong> inferences.<br />
2 The argumentative approach<br />
Based on the observations of Section 1.3, we decide not to adopt the description of but given in<br />
(11) and keep <strong>the</strong> more traditional one in (10). We now have to explore <strong>the</strong> argumentative<br />
properties <strong>of</strong> implicatures. We will start with a short account <strong>of</strong> <strong>the</strong> argumentative<br />
properties <strong>of</strong> R-based implicatures and <strong>the</strong>n have a closer look at Q-based inferences.<br />
2.1 On <strong>the</strong> reinforcement <strong>of</strong> R-based implicatures<br />
We observed that utterances contrasting <strong>the</strong> content <strong>of</strong> an R-based implicature with its<br />
mother-utterance were odd (cf. (7)) and that felicitously interpreting these utterances<br />
implied contexts such that <strong>the</strong> targeted implicature didn’t arise in <strong>the</strong> first place. For <strong>the</strong>se<br />
7 Attested examples <strong>of</strong> this sort are rare, and even scarcer if we restrict <strong>the</strong>m to <strong>the</strong> specific use <strong>of</strong> but<br />
we’re interested in (namely Anscombre and Ducrot’s but/aber/sino), but we think that <strong>the</strong>y’re possible.<br />
particular inferences, it seems that we can argue for a systematic argumentative orientation<br />
regarding <strong>the</strong>ir mo<strong>the</strong>r-utterance.<br />
Contrary to their Q-based counterparts, R-based implicatures lack a propositional content<br />
of their own (as noted for example in (Levinson, 2000)). Expressing them linguistically<br />
amounts to explicitly expressing an enriched version of the mother-utterance. Thus,<br />
expressing a contrast between an utterance B and the linguistic expression I of a hypothetical<br />
R-implicature attached to B means contrasting two identical propositions: if B<br />
indeed carries an implicature, its full interpretation is I and B but I should be interpreted<br />
as I but I. The only way to “redeem” <strong>the</strong> sentence is to reject <strong>the</strong> implicature I associated<br />
with B and interpret B literally or with ano<strong>the</strong>r implicature. The description (11) is thus<br />
accounted for as a sufficient condition for <strong>the</strong> felicitous use <strong>of</strong> but, albeit not a necessary<br />
one.<br />
In <strong>the</strong> Relevance Theory approach by Sperber and Wilson (see (Wilson and Sperber,<br />
2005) for an introduction) <strong>the</strong> inferences in (7) belong to <strong>the</strong> realm <strong>of</strong> explicatures (see<br />
(Carston, 2005) for a presentation). A tempting generalization would <strong>the</strong>n be to say that<br />
<strong>the</strong> preference for marking a contrast is limited to <strong>the</strong> sole “real” implicatures and not<br />
observed in <strong>the</strong> case <strong>of</strong> explicatures. The latter wouldn’t be argumentatively opposed<br />
to <strong>the</strong> utterance <strong>the</strong>y’re attached to because <strong>the</strong>y’re enrichments <strong>of</strong> <strong>the</strong> meaning <strong>of</strong> an<br />
utterance. But, according to (Noveck and Sperber, 2007) and (Carston, 2005), most cases<br />
<strong>of</strong> scalar implicatures are really explicatures, including <strong>the</strong> examples in (1). Fur<strong>the</strong>rmore,<br />
in Grice’s famous “garage” example, reproduced in (14), <strong>the</strong> relevant inference is an<br />
implicature, not an explicature, and yet, it is its cancellation that demands a contrast,<br />
not its reinforcement (cf. <strong>the</strong> bracketed part in (14b)).<br />
(14) a. A: I am out <strong>of</strong> petrol.<br />
b. B: There is a garage round <strong>the</strong> corner, [but it’s closed].<br />
Therefore <strong>the</strong> distinction between explicature and implicature in Relevance Theory isn’t<br />
satisfactory to explain our data.<br />
2.2 Q-based inferences<br />
Recent works in experimental pragmatics (see (Breheny, Katsos and Williams, 2005)) distinguish<br />
contexts according to <strong>the</strong>ir relation with a targeted scalar inference: <strong>the</strong>y can be<br />
upper-bounded (allowing an interpretation with <strong>the</strong> implicature), lower-bounded (blocking<br />
an interpretation with <strong>the</strong> implicature) or neutral. These cognitive studies showed that<br />
the implicature at hand is only generated in upper-bounded contexts. Our main interest<br />
will be limited to these upper-bounded contexts and, within them, to accounting for<br />
both the cases for which the preference is marked and those where it isn’t. In the<br />
future we shall try to extend our analysis to all kinds <strong>of</strong> contexts, notably with <strong>the</strong> benefit<br />
<strong>of</strong> experimental data (see (4)).<br />
We will first outline how scalar implicatures are accounted for in an argumentative perspective<br />
and <strong>the</strong>n see that <strong>the</strong> possibility <strong>of</strong> marking a contrast between an implicature and<br />
its mo<strong>the</strong>r-utterance follows directly from <strong>the</strong> described mechanism. We’ll base our presentation<br />
on <strong>the</strong> account by Anscombre and Ducrot who introduced and first formalized<br />
<strong>the</strong> concept <strong>of</strong> argumentativity in discourse; our explanations are compatible with later<br />
argumentative frameworks, such as <strong>the</strong> decision-<strong>the</strong>oretic one proposed in (Merin, 1999).<br />
2.2.1 The derivation <strong>of</strong> Q-Implicatures<br />
The derivation of Q-implicatures has undergone various refinements in the argumentative<br />
perspective. The main argument behind this approach to implicatures is <strong>the</strong> possibility to<br />
give an account <strong>of</strong> various cases where no logical entailment scale is at play although a<br />
preference over propositions is observed (for numerous examples see (Hirschberg, 1985)).<br />
Ducrot, and Merin after him, propose to replace the ordering of items based on logical relations<br />
by a relevance-based order (Merin’s relevance matches Ducrot’s argumentativity).<br />
The ordering <strong>of</strong> <strong>the</strong> items on a scale is determined by <strong>the</strong>ir argumentative force relative to<br />
<strong>the</strong> topic at hand in discourse. The apparent ordering by informativity (typically assumed<br />
in neo-Gricean approaches) is due to the fact that more informative propositions usually<br />
have greater argumentative value. In (Ducrot, 1980, p. 61) the derivation of the implicature<br />
carried by an utterance such as (1b) is as follows:<br />
(15) a. 〈sure, possible〉H is an argumentative scale, i.e. a simple utterance including<br />
sure has more argumentative power, regarding a certain conclusion H, than one<br />
relying on possible, and possible has a semantic “at least” interpretation<br />
b. <strong>the</strong> utterance <strong>of</strong> (1b) gets fur<strong>the</strong>r interpreted by an exhaustivity law, similar to<br />
standard Gricean reasoning, and yields <strong>the</strong> desired meaning: since an utterance<br />
relying on sure would have been argumentatively superior and wasn’t used, one<br />
is entitled to infer that <strong>the</strong> corresponding proposition is false<br />
The point that matters here is that <strong>the</strong> implicatures come about from <strong>the</strong> negation <strong>of</strong> propositions<br />
that are argumentatively superior (this remains valid in Merin’s framework even<br />
though <strong>the</strong> mechanism is different).<br />
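The two steps in (15) can be sketched as follows. This is a minimal illustration, assuming that the argumentative force of each scale item relative to a fixed conclusion H can be written as a number; the function name `q_implicatures` and the values are hypothetical and are not taken from Ducrot or Merin.

```python
from typing import Dict, List


def q_implicatures(uttered: str, force: Dict[str, float]) -> List[str]:
    """Exhaustivity step (15b): negate every alternative that is
    argumentatively superior to the uttered item on the scale."""
    return [
        f"not {alt}"
        for alt, f in sorted(force.items())
        if f > force[uttered]
    ]


# Argumentative scale <sure, possible>_H from (15a), with assumed force values.
scale = {"possible": 0.4, "sure": 0.9}

print(q_implicatures("possible", scale))  # ['not sure'], i.e. (1c)
print(q_implicatures("sure", scale))      # []
```

Because the ordering is by argumentative force rather than by entailment, changing the assumed values changes which alternatives get negated.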
2.2.2 Results<br />
According to the mechanism in (15), Q-based implicatures are necessarily argumentatively<br />
opposed to their mother-utterance: they come about from the negation of a proposition<br />
that is argumentatively superior to their mother-utterance and thus they argue in the<br />
opposite direction. This explains the core data straightforwardly but not examples such as<br />
(13). If the sole way to derive Q-implicatures is through a unique argumentation-driven<br />
mechanism, then the scalar implicature in (13) isn’t accounted for. As it happens, we can<br />
justify its presence on other grounds.<br />
In the context of (13) the proposition including all isn’t argumentatively superior to<br />
that containing some (i.e. to justify Kevin’s good behaviour, it’s better to say that he only<br />
ate some <strong>of</strong> <strong>the</strong> cookies). The mechanism (15) doesn’t exclude <strong>the</strong> all interpretation. On<br />
the other hand, what the speaker asserts sets a lower bound on the argumentative force<br />
of his assertion: he means to convey something at least as argumentatively strong as his<br />
utterance. Since the all-proposition is argumentatively inferior to the some-proposition, it<br />
doesn’t belong to <strong>the</strong> speaker’s commitment (in Merin’s terms <strong>the</strong> all-proposition doesn’t<br />
belong to <strong>the</strong> speaker’s upward relevance cone).<br />
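The reasoning in this paragraph can be illustrated with a minimal sketch. It assumes that argumentative force can be represented as a single number relative to the speaker's goal, which is a simplification of both the argumentative and Merin's probabilistic frameworks. The names Proposition, q_implicatures and commitment_cone, and all numeric values, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Proposition:
    label: str
    force: float  # hypothetical argumentative force toward the speaker's goal


def q_implicatures(asserted: Proposition, alternatives: List[Proposition]) -> List[str]:
    """Negate each alternative that is argumentatively superior to the assertion."""
    return [f"not({alt.label})" for alt in alternatives if alt.force > asserted.force]


def commitment_cone(asserted: Proposition, alternatives: List[Proposition]) -> List[str]:
    """Keep the alternatives that are at least as strong as the assertion."""
    return [alt.label for alt in alternatives if alt.force >= asserted.force]


# Core case (assumed values): "all" argues more strongly than "some".
core_some, core_all = Proposition("some", 1.0), Proposition("all", 2.0)
core_implicatures = q_implicatures(core_some, [core_some, core_all])

# Context of (13) (assumed values): "all" argues against Kevin's good behaviour.
kevin_some, kevin_all = Proposition("some", 1.0), Proposition("all", -1.0)
kevin_implicatures = q_implicatures(kevin_some, [kevin_some, kevin_all])
kevin_cone = commitment_cone(kevin_some, [kevin_some, kevin_all])

print(core_implicatures)   # ['not(all)']
print(kevin_implicatures)  # []
print(kevin_cone)          # ['some']
```

Under these assumptions, the negation-based mechanism yields no implicature in the context of (13), while the lower bound set by the assertion still keeps the all-proposition out of the speaker's commitment.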
The second part <strong>of</strong> (<strong>13</strong>) should thus be treated as a way <strong>of</strong> correcting <strong>the</strong> first part. Such<br />
examples, where semantic and argumentative information are clearly decoupled, could be<br />
an interesting starting point in <strong>the</strong> examination <strong>of</strong> <strong>the</strong> nature <strong>of</strong> correction as compared to<br />
reformulation.<br />
Cases including only, such as (16), can also be explained.<br />
(16) The complete extinction <strong>of</strong> mankind is only possible and not certain.<br />
In (16) only excludes <strong>the</strong> necessity <strong>of</strong> <strong>the</strong> extinction <strong>of</strong> mankind; re-asserting this exclusion<br />
can’t be argumentatively opposed to <strong>the</strong> first part <strong>of</strong> <strong>the</strong> utterance, <strong>the</strong>refore <strong>the</strong> use <strong>of</strong><br />
an adversative should be excluded. However, there is a strong tendency to interpret the<br />
second conjunct as echoic. This isn’t surprising, as the second conjunct is redundant. It’s<br />
an open question whether the use of only in those examples is limited to echoic<br />
cases (even without the second conjunct of the utterance).<br />
We can add an interesting side-observation to this. As we already remarked, <strong>the</strong> second<br />
conjunct <strong>of</strong> (<strong>13</strong>) demands a reformulative marker <strong>of</strong> some kind to mark <strong>the</strong> cancellation<br />
<strong>of</strong> <strong>the</strong> implicature. This remains valid even in <strong>the</strong> core-cases, as shown in (5b). This<br />
would mean that whereas adversatives aren’t sensitive to <strong>the</strong> presence <strong>of</strong> inferences (but<br />
only to argumentative properties <strong>of</strong> <strong>the</strong> utterances), reformulatives are (and are oblivious<br />
to argumentativity). This is a tentative hypo<strong>the</strong>sis that we shall try to pursue in future<br />
work.<br />
3 Obligatoriness <strong>of</strong> contrast<br />
We gave arguments to explain why <strong>the</strong> examples we’re interested in license a contrast.<br />
We gave no arguments as to why overtly marking this contrast is preferred. A possibility<br />
we want to examine is <strong>the</strong> application <strong>of</strong> a principle close to Sauerland’s “Maximize<br />
Redundancy”, as stated in (Sauerland, 2008). This principle can be roughly paraphrased<br />
as urging a speaker to prefer, among a set <strong>of</strong> alternatives, a sentence that presupposes<br />
an already existing proposition over a sentence that presupposes nothing (with a pragmatic<br />
approach to presupposition as a proposition that is non-controversially part <strong>of</strong> all<br />
speakers’ Common Ground). Thus, a speaker should prefer saying <strong>the</strong> fa<strong>the</strong>r <strong>of</strong> <strong>the</strong> victim<br />
ra<strong>the</strong>r than a fa<strong>the</strong>r <strong>of</strong> <strong>the</strong> victim because <strong>the</strong> former presupposes a non-controversial<br />
proposition. Uttering <strong>the</strong> latter would suggest that <strong>the</strong> presupposition doesn’t obtain, contrary<br />
to common knowledge. Applied to our case, this means that given two contextually<br />
argumentatively opposed propositions p and q, a speaker will prefer to utter p but q ra<strong>the</strong>r<br />
than p and q. Using a simple conjunction implies that a contrast doesn’t hold between p<br />
and q and thus contradicts intuition, or at least makes <strong>the</strong> speaker sound “dissonant”. At<br />
this stage we need to fur<strong>the</strong>r back up this claim on at least two counts:<br />
1. by ensuring that <strong>the</strong> non-felicitousness <strong>of</strong> (4e) is related to that <strong>of</strong> utterances such as<br />
“a fa<strong>the</strong>r <strong>of</strong> <strong>the</strong> victim”, and that <strong>the</strong> preference is <strong>of</strong> <strong>the</strong> same order <strong>of</strong> magnitude<br />
(as we already mentioned, <strong>the</strong> preference for (4d) is far from absolute)<br />
2. by ensuring that <strong>the</strong> predictions made by <strong>the</strong> Maximization principle apply to <strong>the</strong><br />
cases we study; <strong>the</strong> notion <strong>of</strong> presupposition used by Sauerland is technical and<br />
doesn’t necessarily apply to the contrast conveyed by the use of but (i.e. what is<br />
<strong>of</strong>ten called a conventional implicature ra<strong>the</strong>r than a presupposition)<br />
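The preference that the two points above are meant to back up can be sketched as a simple selection rule. This is only an illustration under a strong assumption, namely that the contrast conveyed by but can be treated like a presupposition, which is precisely what the second point questions. The function name prefer_presupposing_form and the string labels are hypothetical.

```python
from typing import List, Optional, Set, Tuple


def prefer_presupposing_form(
    alternatives: List[Tuple[str, Optional[str]]],
    common_ground: Set[str],
) -> str:
    """Pick a form whose presupposition holds in the common ground, if there is one.

    Each alternative is a pair (form, presupposition); None means the form
    presupposes nothing.
    """
    # Forms whose presupposition is not satisfied cannot be used felicitously.
    usable = [(form, p) for form, p in alternatives if p is None or p in common_ground]
    if not usable:
        raise ValueError("no alternative is usable in this context")
    # Among usable forms, prefer one that presupposes something.
    for form, presupposition in usable:
        if presupposition is not None:
            return form
    return usable[0][0]


# Assumed encoding of the two competing utterances.
forms = [("p and q", None), ("p but q", "p and q are argumentatively opposed")]

print(prefer_presupposing_form(forms, {"p and q are argumentatively opposed"}))  # p but q
print(prefer_presupposing_form(forms, set()))                                     # p and q
```

The rule is categorical, whereas the observed preference is far from absolute, so the sketch should be read as an idealization of the principle rather than as a model of speakers' behaviour.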
An alternative explanation for <strong>the</strong> preference for a marked contrast would be to consider<br />
this preference as an idiosyncratic property <strong>of</strong> <strong>the</strong> relation at hand. This would be in line<br />
with <strong>the</strong> approach <strong>of</strong> (Asher and Lascarides, 2003), where it is claimed that <strong>the</strong> semantics<br />
<strong>of</strong> <strong>the</strong> relation <strong>of</strong> Contrast (as defined in SDRT) are such that <strong>the</strong> relation requires a<br />
specific cue to be used, either an overt element such as but or intonation alone. When<br />
two connected discourse segments are such that <strong>the</strong> second denies a default consequence<br />
of the first, the relation of Contrast holds and needs to be marked. As an example, the<br />
first and second segments of (17) are opposed: that John doesn’t like hockey is a default<br />
consequence of the first. Since the relation of opposition exists, it needs to be overtly<br />
marked.<br />
(17) John hates sports but he likes hockey.<br />
The preference we observe for using an adversative would <strong>the</strong>n be a consequence <strong>of</strong> <strong>the</strong><br />
particular semantics of the relation of Contrast. In our core data, if one ignores the implicature,<br />
<strong>the</strong> needed opposition is more obvious: <strong>the</strong> implicature denies part <strong>of</strong> <strong>the</strong> denotation<br />
<strong>of</strong> its mo<strong>the</strong>r-utterance and somehow contradicts part <strong>of</strong> it in <strong>the</strong> same way that <strong>the</strong> second<br />
conjunct in (17) denies a default consequence <strong>of</strong> <strong>the</strong> first, thus triggering <strong>the</strong> need for a<br />
Contrast marker.<br />
4 Conclusion<br />
We observed what seemed to be a constraint on <strong>the</strong> felicitous reinforcement <strong>of</strong> some<br />
implicatures. Approaches in traditional Gricean terms weren’t sufficient to explain all <strong>the</strong><br />
possible data we encountered. The main conclusion we drew from this data was that,<br />
despite an apparent strong correlation, inference mechanisms couldn’t be at <strong>the</strong> source <strong>of</strong><br />
<strong>the</strong> argumentative orientation <strong>of</strong> an utterance. Therefore, <strong>the</strong> observed constraint doesn’t<br />
seem to apply to the reinforcement operation itself but is rather due to different discourse<br />
coherence mechanisms. We took an argumentative approach and showed that <strong>the</strong> standard<br />
accounts <strong>of</strong> adversatives and implicatures in this approach worked toge<strong>the</strong>r to justify <strong>the</strong><br />
possibility <strong>of</strong> marking a contrast. The actual preference for marking this available contrast<br />
could be related to <strong>the</strong> intrinsic nature <strong>of</strong> <strong>the</strong> Contrast discourse relation.<br />
We mentioned experimental pragmatics as a means to shed more light on the phenomenon<br />
we studied. Among the different points we intend to study are the following:<br />
• The preference for a marked contrast should differ according to the context. In<br />
particular we expect that in lower-bounded contexts the preference might disappear<br />
and that the use of but is odd or takes longer to process.<br />
• We made a hypothesis about reformulatives such as in fact that needs to be refined.<br />
They appear to be sensitive to informativity scales and somehow indifferent to <strong>the</strong><br />
argumentative orientation <strong>of</strong> <strong>the</strong> propositions <strong>the</strong>y connect. A test in lower-bounded<br />
contexts could prove relevant to determine <strong>the</strong> truth behind this hypo<strong>the</strong>sis.<br />
The results <strong>of</strong> <strong>the</strong>se experiments could provide support for <strong>the</strong> argumentative approach<br />
to semantics and pragmatics we presented, and thus to an explanation <strong>of</strong> <strong>the</strong> main, nontrivial,<br />
fact we observed: an utterance can convey an implicature and yet be argumentatively<br />
opposed to it.<br />
References<br />
Anscombre, J.-C. and Ducrot, O. (1977). Deux mais en français, Lingua 43: 23–40.<br />
Anscombre, J.-C. and Ducrot, O. (1983). L’argumentation dans la langue, Pierre<br />
Mardaga, Liège:Bruxelles.<br />
Asher, N. and Lascarides, A. (2003). Logics <strong>of</strong> Conversation, Cambridge: Cambridge<br />
University Press.<br />
Benndorf, B. and Koenig, J.-P. (1998). Meaning and context: German aber and sondern,<br />
in J.-P. Koenig (ed.), Discourse and cognition: bridging the gap, CSLI Publications,<br />
Stanford, pp. 365–386.<br />
Breheny, R., Katsos, N. and Williams, J. (2005). Are generalised scalar implicatures<br />
generated by default? An on-line investigation into the role of context in generating<br />
pragmatic inferences, Cognition.<br />
Carston, R. (2005). Relevance <strong>the</strong>ory and <strong>the</strong> saying/implicating distinction, in L. Horn<br />
and G. Ward (eds), The handbook <strong>of</strong> Pragmatics, Blackwell.<br />
Ducrot, O. (1980). Les échelles argumentatives, Les Éditions de Minuit.<br />
Gazdar, G. (1979). Pragmatics: Implicature, Presupposition and Logical Form, New York:<br />
Academic Press.<br />
Hirschberg, J. (1985). A <strong>the</strong>ory <strong>of</strong> scalar implicature, PhD <strong>the</strong>sis, Univ. <strong>of</strong> Pennsylvania.<br />
Horn, L. (1989). A natural history <strong>of</strong> negation, The University <strong>of</strong> Chicago Press.<br />
Horn, L. (1991). Given as new: when redundant information isn’t, Journal <strong>of</strong> Pragmatics<br />
15(4): 313–336.<br />
Jayez, J. and Tovena, L. (2008). Presque and almost: how argumentation derives from<br />
comparative meaning, in O. Bonami and P. C. H<strong>of</strong>herr (eds), Empirical Issues in<br />
Syntax and Semantics, Vol. 7, pp. 1–23.<br />
Levinson, S. C. (2000). Presumptive Meanings: The Theory <strong>of</strong> Generalized Conversational<br />
Implicature, MIT Press, Cambridge, MA, USA.<br />
Merin, A. (1999). Information, relevance and social decision-making, in L. Moss,<br />
J. Ginzburg and M. de Rijke (eds), Logic, Language, and computation, Vol. 2, CSLI<br />
Publications, Stanford:CA, pp. 179–221.<br />
Noveck, I. and Sperber, D. (2007). The why and how <strong>of</strong> experimental pragmatics: The<br />
case of ‘scalar inferences’, in N. Burton-Roberts (ed.), Advances in Pragmatics,<br />
Palgrave Macmillan, Basingstoke.<br />
Sauerland, U. (2008). Implicated presuppositions, in A. Steube (ed.), Sentence and Context,<br />
Language, Context and Cognition, Mouton de Gruyter, Berlin. To appear.<br />
Wilson, D. and Sperber, D. (2005). Relevance <strong>the</strong>ory, in L. Horn and G. Ward (eds), The<br />
handbook <strong>of</strong> pragmatics, Blackwell.<br />
List <strong>of</strong> authors<br />
Martin Avanzini<br />
Martin.Avanzini@student.uibk.ac.at<br />
Institute <strong>of</strong> Computer Science<br />
University <strong>of</strong> Innsbruck<br />
Austria<br />
Timo Baumann<br />
timo@ling.uni-potsdam.de<br />
Institut für Linguistik<br />
Universität Potsdam<br />
Germany<br />
Christopher Brumwell<br />
chrisbrumwell@gmail.com<br />
ILLC<br />
Universiteit van Amsterdam<br />
The Ne<strong>the</strong>rlands<br />
Bert Le Bruyn<br />
Bert.LeBruyn@let.uu.nl<br />
Utrecht Institute <strong>of</strong> Linguistics<br />
Universiteit Utrecht<br />
The Ne<strong>the</strong>rlands<br />
James Burton<br />
jb162@brighton.ac.uk<br />
University <strong>of</strong> Brighton<br />
United Kingdom<br />
Gemma Celestino<br />
gceles@interchange.ubc.ca<br />
Department <strong>of</strong> Philosophy<br />
University <strong>of</strong> British Columbia &<br />
LOGOS Research Group<br />
Canada<br />
Dragan Doder<br />
ddoder@mas.bg.ac.yu<br />
Faculty <strong>of</strong> Mechanical Engineering<br />
Serbian Academy <strong>of</strong> Sciences and Arts<br />
Serbia and Montenegro<br />
Michael Franke<br />
m.franke@uva.nl<br />
ILLC<br />
Universiteit van Amsterdam<br />
The Ne<strong>the</strong>rlands<br />
Gianluca Giorgolo<br />
Gianluca.Giorgolo@let.uu.nl<br />
Utrecht Institute <strong>of</strong> Linguistics<br />
Universiteit Utrecht<br />
The Ne<strong>the</strong>rlands<br />
Michael Hartwig<br />
Michael.Jua.Hartwig@gmail.com<br />
Multimedia University, Cyberjaya<br />
Malaysia<br />
Simon Hopp<br />
simon.hopp@uni-konstanz.de<br />
Fachbereich Sprachwissenschaft<br />
University <strong>of</strong> Konstanz<br />
Germany<br />
Pierre Lison<br />
pierrel@coli.uni-sb.de<br />
Language Technology Lab<br />
Research Center for Artificial Intelligence<br />
Saarbrücken<br />
Germany<br />
Petar Maksimović<br />
petarmax@mi.sanu.ac.yu<br />
Ma<strong>the</strong>matical Institute<br />
Serbian Academy <strong>of</strong> Sciences and Arts<br />
Serbia and Montenegro<br />
Bojan Marinković<br />
bojanm@mi.sanu.ac.yu<br />
Ma<strong>the</strong>matical Institute<br />
Serbian Academy <strong>of</strong> Sciences and Arts<br />
Serbia and Montenegro<br />
Scott Martin<br />
scott@ling.osu.edu<br />
The Ohio State University<br />
United States<br />
Takako Nemoto<br />
nmt0731@yahoo.co.jp<br />
Ma<strong>the</strong>matical Institute<br />
Tohoku University<br />
Japan<br />
Ivelina Nikolova<br />
iva@lml.bas.bg<br />
LMD, IPP<br />
Bulgarian Academy <strong>of</strong> Sciences<br />
Bulgaria<br />
Yves Peirsman<br />
yvespeirsman@gmail.com<br />
QLVL<br />
Katholieke Universiteit Leuven<br />
Belgium
Aleksandar Perović<br />
pera@sf.bg.ac.yu<br />
Faculty <strong>of</strong> Transport and Traffic Engineering<br />
Serbian Academy <strong>of</strong> Sciences and Arts<br />
Serbia and Montenegro<br />
Maren Schierloh<br />
schierl1@msu.edu<br />
Michigan State University<br />
U.S.A.<br />
Andreas Schnabl<br />
andreas.schnabl@uibk.ac.at<br />
Institute <strong>of</strong> Computer Science<br />
University <strong>of</strong> Innsbruck<br />
Austria<br />
Éva Szilágyi<br />
essay229@gmail.com<br />
Department <strong>of</strong> Linguistics<br />
University <strong>of</strong> Pécs<br />
Hungary<br />
Camilo Thorne<br />
camilo.thorne@gmail.com<br />
Faculty <strong>of</strong> Computer Science<br />
Free University <strong>of</strong> Bozen-Bolzano<br />
Italy<br />
Christina Unger<br />
christina.unger@let.uu.nl<br />
Utrecht Institute <strong>of</strong> Linguistics<br />
Universiteit Utrecht<br />
The Ne<strong>the</strong>rlands<br />
Melanie Uth<br />
melanie.uth@ling.uni-stuttgart.de<br />
Institut für Linguistik/Romanistik<br />
University <strong>of</strong> Stuttgart<br />
Germany<br />
Grégoire Winterstein<br />
gregoire.winterstein@linguist.jussieu.fr<br />
Laboratoire de Linguistique Formelle<br />
Université Paris 7<br />
France