
ISSN 1744-1986

Technical Report No 2009/09

Factive / non-factive predicate recognition
within Question Generation systems

B Wyse

20 September, 2009

Department of Computing
Faculty of Mathematics, Computing and Technology
The Open University
Walton Hall, Milton Keynes, MK7 6AA
United Kingdom

http://computing.open.ac.uk


Factive / non-factive predicate recognition within Question Generation systems

A dissertation submitted in partial fulfilment of the requirements for the Open University's Master of Science Degree in Computing for Software Development

Brendan Wyse
(X5348818)

9 March 2010

Word Count: 14,532


Preface

I am extremely grateful to my supervisor, Dr. Paul Piwek, for his enthusiastic guidance and his willingness to allow me to become involved in the area of Question Generation. He has introduced me to a community and a culture that I had always wanted to be a part of.

My family are unfamiliar with the technical nature of my work but were willing and able to offer the encouragement and support necessary to help me see this research through to completion. To my wife, Amanda, my utmost thanks must go. Without her support I would not have even started this work.

Special thanks go to my sons, Jason and Daniel. One day they will understand why I seemed so busy all the time. Their patience was appreciated.


Table of Contents

Preface
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Background to the research
    1.1.1 Defining the question generation task
    1.1.2 Specific areas for research
  1.2 Aims and objectives of the research project
  1.3 Overview of the dissertation
Chapter 2 Literature Review
  2.1 Introduction
  2.2 Overview of existing systems
  2.3 Techniques used and NLP tools employed
    2.3.1 Tagging and parsing
    2.3.2 Term extraction
    2.3.3 WordNet
    2.3.4 Rule-based mapping
    2.3.5 Lemmatisation
  2.4 Factivity
  2.5 Research question
  2.6 Summary
Chapter 3 Research Methods
  3.1 Introduction
  3.2 Research Techniques
    3.2.1 Prototyping
    3.2.2 Quantitative analysis of overall impact
    3.2.3 Quantitative analysis of quality increase
  3.3 Summary
Chapter 4 Software Overview
  4.1 Introduction
  4.2 Preparation of the OpenLearn data set
  4.3 Matching tagged sentences
  4.4 Transformation to questions
  4.5 Matching sentences with structure information
  4.6 Rule representation
  4.7 Factive / non-factive recognition functionality
  4.8 Summary
Chapter 5 Results
  5.1 Introduction
  5.2 Overall impact
  5.3 Quality Increase
Chapter 6 Conclusions
  6.1 Introduction
  6.2 Assessment of factive/non-factive recognition module
  6.3 Further work
References
Index
Appendix A – Extended Abstract
Appendix B – List of factive and non-factive predicates


List of Figures

Figure a.1 Predicate verbs
Figure 1.1 Fill-in-the-blanks test
Figure 1.2 An example of question generation
Figure 1.3 A rationale type question
Figure 1.4 Question Taxonomy by Nielsen et al. (2008)
Figure 1.5 Factive versus non-factive
Figure 2.1 QuALiM mark-up (Kaisser and Becker, 2004)
Figure 2.2 QG system mark-up from Cai et al. (2006)
Figure 2.3 Sub-tasks of question generation (Silveira, 2008)
Figure 2.4 Examples: John is in … plain text
Figure 2.5 Examples: John is in … POS tagged
Figure 2.6 Regular expression targeting NNP is in NNP
Figure 2.7 Examples: Targeting motion related verbs
Figure 2.8 Examples: POS tagged motion verb examples
Figure 2.9 Mitkov and Ha example sentence
Figure 2.10 Mitkov and Ha example question
Figure 2.11 Wang et al. sample template
Figure 2.12 QuALiM mark-up language
Figure 2.13 NLGML mark-up for 'somebody went to somewhere'
Figure 2.14 Factivity sentence examples
Figure 3.1 Precision and Recall: Van Rijsbergen's (1976) formal definition
Figure 3.2 Precision and Recall: Adapted for this research
Figure 4.1 Ceist system architecture
Figure 4.2 OpenLearn Study Unit processing
Figure 4.3 OpenLearn XML format
Figure 4.4 Python extraction script
Figure 4.5 POS tagged sentence
Figure 4.6 Example with proper nouns
Figure 4.7 Example of POS limitation
Figure 4.8 Group matching in Ceist
Figure 4.9 Sentence with grammatical structure
Figure 4.10 Demonstrating use of noun phrase
Figure 4.11 Sentence structure viewed as a tree in Ceist
Figure 4.12 XML representation of a rule in Ceist
Figure 4.13 Rule editing interface in Ceist
Figure 5.1 Chart showing occurrences of factive / non-factive
Figure 5.2 Occurrences of factive / non-factive directly before a new clause
Figure 5.3 Proportion of Answerable Questions for Rule 1
Figure 5.4 Proportion of Answerable Questions for Rule 2
Figure 5.5 Recall values for both rules

List of Tables

Table 2.1 Existing QG Systems reviewed
Table 2.2 Factive and non-factive categorised by Hooper (1974)
Table 3.1 Sample of YES/NO questions rejected
Table 3.2 Sample of YES/NO questions considered answerable by the input
Table 3.3 Sample of YES/NO questions not answerable by the input
Table 5.1 Occurrences of factive / non-factive
Table 5.2 Occurrences of factive / non-factive directly before a new clause
Table 5.3 Quality increase results


Abstract

The research in this paper relates to Question Generation (QG) – an area of computational and linguistic study with the goal of enabling machines to ask questions using human language. QG requires processing a sentence to generate a question or questions relating to that sentence. This research focuses on the sub-problem of generating questions where the answer can be obtained from the input sentence. One issue with generating such questions is the instance where a proposition in a declarative content clause in a sentence is taken to be true, when it might not actually be.

Two sentences are shown in Figure a.1 below with the same declarative content clause (underlined) but with different predicate verbs (bold). The certainty that the proposition in the declarative content clause is true is different for each.

Figure a.1 Predicate verbs

A QG system without the ability to understand the difference between the sentences above might generate the question 'How many people were at the conference?' Whilst this is grammatically a valid question, it cannot be definitively answered given (1) above. From (1) we are not absolutely certain how many people were at the conference, because the speaker in the sentence is not absolutely certain. In a system designed to generate only questions that can be answered by the input sentence, this is a flaw.

The verb 'know' is a factive verb. A factive verb "assigns the status of an established fact to its object" (Soanes and Stevenson, 2005a). The verb 'think' is a non-factive. A non-factive is a verb "that takes a clausal object which may or may not designate a true fact" (Soanes and Stevenson, 2005b). This research asks the question: what is the impact of enabling a QG system to recognise sentences containing these factive or non-factive verbs? Impact was regarded as both the overall impact which such a system might have on QG as a whole and the quality improvements which might be obtainable.

A QG system was written as part of this research and a sub-task was implemented in this system by writing a software algorithm to perform factive / non-factive recognition. This was done by using a list of factive and non-factive verbs produced by Hooper (1974) which was expanded using a thesaurus. The expanded list allowed me to determine the frequency of occurrence of factive/non-factive indicators and thus analyse overall impact. The same list was then used within the QG system to analyse the improvement in question quality.

The analysis of factive / non-factive recognition was carried out using the Open University's online educational resource, OpenLearn. OpenLearn was chosen as it is educational material and is available in a well marked-up XML format, which makes it easy to extract certain content.


It was found that factive and non-factive verbs are common enough in educational discourse to justify further work on factivity recognition. The effect on precision when generating questions that must be answerable from the input sentence was quite good. It was found that whilst the module was successful in removing unwanted questions, it did also remove some perfectly good questions. Previous research has concluded, however, that it is better to generate questions of higher precision, and I agree.


Chapter 1 Introduction

1.1 Background to the research

Written and spoken language is complex and this makes it difficult to process computationally. Complexity aside, the benefits of having machines capable of working with human languages are numerous. Human-machine interaction would improve as machines could understand human language commands and respond using direct communication. One particular area that would benefit greatly from improved human-machine interaction is that of educational technology, where computers are used for tutoring or training purposes.

Some of the older traditional multimedia learning applications use little more than multiple-choice selection or fill-in-the-blanks (Figure 1.1) to interact with the student. They merely take paper-based methods and transfer them to the computer. If a student has difficulty with a topic, there is no method of resolving that difficulty via the application.


Figure 1.1 Fill-in-the-blanks test

Intelligent Tutoring Systems (ITS) seek to educate through better interaction with the students. They allow students to engage with the artificial tutor and enable them to ask for assistance or explain their decisions. The tutor can figure out what the student is trying to achieve and question them about their methods. To do this directly, an artificial tutor must be capable of dialog.

Intelligent Tutoring Systems are already capable of engaging in dialog using processes such as Natural Language Understanding (NLU) and Natural Language Generation (NLG). Both of these are sub-fields of the wider research area of Natural Language Processing (NLP). An ITS can recognise its student's input and converse accordingly, either to determine what the student might be thinking, or to guide the student towards a specific goal. The Intelligent Tutoring agent needs to generate and ask the right questions in order to do this.


This research focuses on the field of Question Generation (QG), which aims to improve the technology that will allow applications such as ITS to ask appropriate and sensible questions. QG is a relatively new area of study and, in order to promote research in it, a number of communities, including those involved with Intelligent Tutoring Systems (ITS), have met with the aim of setting up a Shared Task and Evaluation Campaign (STEC) for Question Generation.

The STEC involves creating clearly defined tasks relating to QG, providing data sets relating to those tasks and asking QG system developers to run the data sets through their systems. The results are evaluated, allowing the QG community to identify promising approaches to QG or areas which may need further study.

This research was carried out with a view to contributing in some way to the Question Generation STEC. It is hoped that it will help to achieve one objective of the campaign, which is to boost "research on computational models of Question Generation, an area less studied by the natural language generation community" [1].

[1] http://www.questiongeneration.org


1.1.1 Defining the question generation task

It is hoped that, by initiating a STEC for Question Generation, the NLP community will participate and consequently advance technologies related to QG. As part of the preparation work for the STEC, researchers have already done some work to clearly define the QG task. Let's examine the QG task a little more closely.

It is clearly a computational task, although this is not explicitly stated by Rus et al. (2007b, p.2), who define QG for an input of one or more sentences as "the task of a QG approach is to generate questions related to this input". Piwek et al. (2008, p.1) describe it as "the task of automatically generating questions", thus recognising the computational aspect of QG.

A more precise definition is offered by Silveira (2008, p.1) as the ability of a system "to receive an input of free text, and to generate questions, in a language- and domain-independent manner, relevant for a target user previously profiled by the system". Silveira is describing an ideal system which would be both language and domain independent and account for a specific type of user. This definition indicates a level of capability which might well be obtainable in the future, but for now system designers generally focus on only very specific languages, and no specific domain or user type.

Figure 1.2 is a very simple example of the process of generating questions which shows one possible output. Piwek et al. (2008, p.2) define the relation between the input in this example and the generated question as being that the "input answers the output question". Piwek et al. also describe other types of QG, such as question reformulation, where an input question is rephrased in some manner. They also define the relation where the input raises the output question, i.e. the generated question should elicit further information about the given input.

Figure 1.2 An example of question generation

This research concentrated on questions which could be answered by the input sentence, such as the example in Figure 1.2. Such questions are applicable to educational technology because questions which test a student's comprehension of a subject area are typically of this type. A comprehension question will ask the student something about what they have learned in order to determine whether or not they have understood it, and the answer will generally be in the content they have studied.


Concept completion questions are quite shallow. A deeper question might be a rationale type question such as that in Figure 1.3. The rationale type question is not answerable by the input sentence and this would be the case for most open questions.

Figure 1.3 A rationale type question

This research concentrates mainly on concept completion type questions (i.e. Who?, What?, When?, Where?, etc.) as many questions which can be generated from and answered by a single input sentence are of this type. There are a variety of other question types as defined by Nielsen et al. (2008) and listed in Figure 1.4, some of which would require processing of a complete paragraph of text.


Figure 1.4 Question Taxonomy by Nielsen et al. (2008)

Focusing on single sentences simplifies the task and makes it more accomplishable for a researcher new to QG and indeed NLP. Working with multiple sentences requires processing to determine links between sentences, such as anaphora. Anaphora is where a sentence refers to something or someone using a personal pronoun (e.g. she, it) and we must determine what that personal pronoun is referring to from the surrounding text. It was decided that this would be beyond the scope of this research. Anaphora resolution is a research topic of its own.

1.1.2 Specific areas for research

Although there is no doubt the STEC will highlight many areas for potential research, it will not begin until early 2010 and will end well after the submission deadline for this dissertation has passed. For the purposes of my research, an area of possible improvement was identified by experimenting with question generation.

During my research I found many open source tools available for NLP in general. Experimenting with open source tools often highlights limitations or areas the original developer did not implement. Access to the code behind the tools allows the tool to be improved or adapted if necessary and, in addition, allows others to learn from the original developer's methods. Because QG is a relatively new area of study there are no open source systems available yet that I am aware of.

The lack of an open source QG system presented an opportunity to me. By examining source code and documentation relating to existing NLP tools I was able to develop my own QG system, which I called Ceist [2]. Through developing Ceist I gained an insight into the types of issues that QG systems must solve. The literature review (Chapter 2) outlines some of these issues and how they were addressed. Ceist then allowed me to experiment with QG in order to find some area for further research.

[2] Ceist is the word for 'question' in the Irish language and is pronounced 'KESHT'.

I began to focus on the generation of questions where the answer to the question is explicitly contained in the input sentence. My experiments with Ceist, and indeed with other QG systems (such as Michael Heilman's online question generator [3]), highlighted a particular problem area. This was the case where a question was generated from a clause in a sentence and the QG system assumed that the statement made in the clause was an established fact when it was not.

The problem is only really relevant to systems intending to generate questions where the input sentence explicitly contains the answer. Grammatically correct questions that are not answered by the input sentence are regarded as invalid for such systems. Figure 1.5 presents two very similar sentences for the purpose of demonstrating this.

[3] http://www.ark.cs.cmu.edu/mheilman/questions/

Figure 1.5 Factive versus non-factive


A QG system without factive/non-factive verb recognition will assume that there were 10 new students yesterday, given both of these sentences as input, because it will only process the declarative content clause 'there were 10 new students yesterday'. This means that although 'How many new students were there yesterday?' is a grammatically correct question, such a system assumes that its answer is '10' for both (1) and (2).

This is a problem. We cannot say that (2) contains the answer to 'How many new students were there yesterday?' because (2) does not establish the number of new students as a fact. The speaker in (2) only states that they 'think' there were 10 students. The speaker is not absolutely certain.
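
To make the idea concrete, the following minimal sketch (in Python, not the actual Ceist module) shows how a simple list-based check over a POS tagged sentence could separate these two cases. The short verb lists are illustrative stand-ins for the expanded Hooper (1974) lists used later in this research.

    # A minimal sketch of list-based factive / non-factive recognition.
    # The verb lists below are illustrative only, not the full research lists.
    FACTIVE = {"know", "realise", "regret", "notice"}       # object clause is an established fact
    NON_FACTIVE = {"think", "believe", "suppose", "guess"}  # object clause may or may not be true

    def clause_is_established(tagged_sentence):
        """tagged_sentence is a list of (word, POS) pairs from a tagger."""
        for word, pos in tagged_sentence:
            if pos.startswith("VB"):
                verb = word.lower()
                if verb in NON_FACTIVE:
                    return False   # e.g. "I think there were 10 new students yesterday"
                if verb in FACTIVE:
                    return True    # e.g. "I know there were 10 new students yesterday"
        return False  # no factive indicator found: be conservative, do not generate

    print(clause_is_established([("I", "PRP"), ("know", "VBP"), ("there", "EX"),
                                 ("were", "VBD"), ("10", "CD"), ("new", "JJ"),
                                 ("students", "NNS"), ("yesterday", "NN")]))   # True
    print(clause_is_established([("I", "PRP"), ("think", "VBP"), ("there", "EX"),
                                 ("were", "VBD"), ("10", "CD"), ("new", "JJ"),
                                 ("students", "NNS"), ("yesterday", "NN")]))   # False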

This is the problem which I have attempted to solve for QG systems. Can we use the factive or non-factive verbs to ensure that we only generate a question from an input sentence when we know the answer is contained in that sentence as an established fact? The aims and objectives of this research are directed towards solving this problem and then evaluating the solution.

1.2 Aims and objectives of the research project

Based on the identified area for research, I aimed to develop an algorithm that uses factive or non-factive verbs and phrases to determine whether declarations in input sentences are established fact. This functionality is added to the baseline QG system and its performance analysed.


The research conducted consisted of two main objectives:

1. Build a working QG system from scratch
2. Develop a software component capable of factive / non-factive recognition

The aim of the QG system development stage was to allow me to attain the knowledge and skills required to understand how QG is implemented. It also provided a QG system which could be freely adapted as necessary. This working QG system then accommodates the testing of algorithms capable of factive / non-factive recognition.

The aims and objectives of the research project will allow me to answer the following research question: "What is the impact of implementing factive/non-factive predicate recognition as a sub-task of a Question Generation system?"

There are two aspects to this research question. Firstly, we would like to know the likely benefit of factive/non-factive recognition in QG, i.e. 'What is the overall impact which such a module would have on QG as a whole?' For example, if I found that only 1 sentence in 1,000 contained a non-factive predicate, further researchers might seek other areas for improvement in QG with a broader scope.

Secondly, we wish to assess such a module and determine its effectiveness with regard to the quality of the system output, i.e. 'What improvement does such a module deliver to generated question quality?' This is done by analysing the question quality both with and without a factive/non-factive recognition module.


1.3 Overview of the dissertation

The following outline briefly describes the contents of each chapter in the dissertation.

Chapter 1 – Introduction
This chapter contains an introduction to Question Generation (QG). It describes recent efforts to drive research in QG and the potential benefits to Intelligent Tutoring Systems. A specific problem with current QG systems relating to factive or non-factive verbs or phrases is described. The aims and objectives of the research project relating to solving this problem are outlined.

Chapter 2 – Literature Review
Answering the research question laid out in the introduction (Chapter 1) required gaining some familiarity with Natural Language Processing, and in particular with techniques relating to question generation. Several resources relating to existing QG systems were reviewed and this chapter details the implementation of techniques used by these systems, many of which were then used in Ceist. Some of the current work relating to factivity in linguistics is also outlined.

Chapter 3 – Research Methods
This chapter gives a general overview of the research methods used in the project to assess current QG systems, assist with building Ceist and identify a means to implement factive / non-factive recognition functionality.

Chapter 4 – Software Overview
The software overview describes the technical detail of how a QG system such as Ceist is created. It relates closely to and follows on from the techniques described in the literature review (Chapter 2), but with more technical detail.

Chapter 5 – Results
This chapter presents the results which were obtained following the assessment of the factive / non-factive recognition module using the research methods outlined in Chapter 3.

Chapter 6 – Conclusions
The conclusions which can be drawn from the results of the assessment are discussed in this chapter. I also suggest some further work which could be undertaken relating to Ceist and this research.


Chapter 2 Literature Review

2.1 Introduction

A prerequisite to developing any system would normally be to understand how such a system works. In order to learn how existing QG systems work, some basic knowledge of NLP techniques is required. NLP is a vast area of study but the quantity and quality of the body of knowledge relating to it is excellent. There are several online resources available and some recommendable books on the topic, which I reviewed to obtain this base knowledge.

I experimented with some software toolkits in order to gain an understanding of NLP, including the Python based Natural Language Toolkit [4] and SharpNLP [6], the C# port of OpenNLP [5]. Question Answering (QA) is another area of NLP and one that has already seen several years of research dedicated to it. It shares many common problems with QG and consequently, common solutions too. For this reason I reviewed existing source code from both QA and QG applications. In this chapter I focus on the work I reviewed and detail some specific NLP techniques that are used for QG and were used in Ceist.

Ceist is a rule-based QG system as it uses expressions and templates to move from input to output. The combination of an expression and a template is called a rule. The techniques described here would be considered core to how rule-based QG systems work.

[4] http://www.nltk.org/
[5] http://opennlp.sourceforge.net/
[6] http://sharpnlp.codeplex.com/

The literature review does not delve into the technical details of the techniques. Instead, this is left to the software overview (Chapter 4), which presents a detailed description of the technical implementation of Ceist. The systems which were reviewed and/or assisted in the development of Ceist are listed in Table 2.1.

System                   Question Types Generated
Mitkov and Ha (2003)     Multiple Choice Question generator
Brown et al. (2005)      Vocabulary Assessment questions consisting of 6 question types
Cai et al. (2006)        Questions to aid Intelligent Tutoring Systems
Rus et al. (2007a)       Factual questions whose answers are specific facts: Who? What? Where? When?
Gates, D. (2008)         Reading comprehension questions for children
Wang et al. (2008)       Questions to evaluate medical article comprehension

Table 2.1 Existing QG Systems reviewed

Following the overview of techniques used in QG systems in general, a description of the current body of knowledge pertaining to factive / non-factive recognition is presented. Factivity is a sub-field of linguistics relating to the established truth of a fact as given in a sentence. Although it has been researched for a number of years now, I was not able to find any practical NLP tools available for my requirements. I briefly describe the creation of my own tool.

2.2 Overview of existing systems

Workshops relating to the Question Generation Shared Task and Evaluation Challenge have seen some recently developed QG systems presented by their creators, but some earlier work had been done too (Rus and Graesser, 2009). Early QG systems used shallow parsing and "employed various NLP techniques including term extraction" (Mitkov and Ha, 2003, p.1), and these techniques were also used in QA systems around the same time (Kaisser and Becker, 2004). Parsing allowed early systems to determine the grammatical structure of free text sentences and term extraction allowed the systems to match sentences which had a specific grammatical structure. Today's systems still employ parsing and term extraction and I expand on these two methods in the next section (2.3).

The expressions used in term extraction are related to either the question being answered, in the case of QA, or the question being generated, for QG. If a QA system seeks to answer the question 'Where was Mary McAleese born?' then it might use term extraction to find a sentence of the form 'Mary McAleese was born in <place>.' to retrieve the answer. A QG system would do the reverse, finding sentences of the form 'Mary McAleese was born in <place>.' to generate the question 'Where was Mary McAleese born?' together with its answer.


Naturally, the system designers chose to use some form of mark-up to represent the term extraction expression and link it to its related question. The QG system by Cai et al. (2006) and the QA system QuALiM by Kaisser and Becker (2004) both use XML based mark-up languages to do this. QuALiM's mark-up is shown in Figure 2.1.

Figure 2.1 QuALiM mark-up (Kaisser and Becker, 2004)

The combination of the term extraction expression and the generated question template is called a rule. Figure 2.2 below is an example of one such rule as defined by Cai et al.'s mark-up language, which they call NLGML. Although this rule is from a QG system, it does bear some similarities to the mark-up from the QA system. Both mark-ups use a sequence to match a specific grammatical structure and then use matched parts of that structure in a template to generate an output.


Figure 2.2 QG system mark-up from Cai et al. (2006)

Three of the original authors of NLGML described an improved version of their QG system just one year later which focused on flexible term extraction and its associated mark-up language (Rus et al., 2007a). Mark-up language for QA/QG systems appears to be an area where different developers are re-inventing the wheel, and a unified effort to standardise the representation would be useful (Wyse and Piwek, 2009).

Systems generally accept various discourse types, although it is not uncommon for developers to focus on very specific domains and domain-specific discourse. Donna Gates (2008) focused on news articles for young children. The language contained within such articles would be relatively simple everyday language. As sentence structure becomes more complex and contains an increasing number of elements, finding patterns within the sentences becomes increasingly difficult.

Gates used "several off the shelf language technologies to generate reading comprehension questions" based on the news articles. Like Gates' system, Ceist also focuses on educational material. This is because the application of QG in educational technology is a useful way to showcase the technology.


Testing language learning skills was also the purpose of a system developed by Brown et al. (2005, p.1). This system is designed to test the user's vocabulary knowledge by "automatically generating questions for vocabulary assessment". A system with a completely different purpose was developed by Wang et al. (2008). This system was designed to test medical students and as such worked with text containing medical terms which would not typically be part of everyday vocabulary.

As has previously been stated, the number of QG systems currently available to study is limited and the QG STEC will without doubt change this. Already, however, a lot can be gained by examining a few of the techniques used in the systems briefly described above.

2.3 Techniques used and NLP tools employed

The task of question generation can be split into several sub-tasks depending on the target question types (Silveira, 2008). Figure 2.3 is Silveira's diagram showing groups of sub-tasks and typical processing flows. This section looks at existing methods used to implement some of these sub-tasks.


Figure 2.3 Sub-tasks of question generation (Silveira, 2008)

The first group of sub-tasks in Silveira's diagram, which are performed on the free natural text input, are tokenisation, stemming, syntactic and semantic parsing, anaphora resolution and ambiguity removal. These tasks perform some initial preparation on text for use in further processing. I will briefly explain the other tasks before focusing on syntactic and semantic parsing.

Tokenisation is the task of splitting text up into words. Although it may sound simple, sometimes the use of punctuation makes this quite difficult (e.g. a period might represent the end of a sentence or a decimal point). The stem of a word is its root form and stemming is the process of finding that root form. For words such as 'running' and 'laughed' the process is quite simple; the suffix is removed, leaving 'run' and 'laugh'. Stemming is not always so straightforward, however; for example, irregular verbs cannot be stemmed using a common algorithm (e.g. 'bought', 'swam').
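
As an illustration, the following minimal sketch uses NLTK, one of the toolkits mentioned above, to tokenise and stem a short piece of text. The Porter stemmer is chosen purely for illustration; as noted, it handles regular suffixes but cannot recover the root of irregular forms such as 'bought'.

    # A minimal sketch of tokenisation and stemming with NLTK.
    # May require nltk.download('punkt') the first time it is run.
    import nltk
    from nltk.stem import PorterStemmer

    text = "John laughed while running, then bought apples."
    tokens = nltk.word_tokenize(text)          # split the text into word tokens
    stemmer = PorterStemmer()
    print([(t, stemmer.stem(t)) for t in tokens])
    # 'running' -> 'run' and 'laughed' -> 'laugh', but 'bought' stays as 'bought'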

I described anaphora in my introduction chapter (1.1) and, to briefly recap, it is the process of determining what a personal pronoun refers to within the surrounding text. Ambiguity removal is quite simply determining the meaning of something which is ambiguous in the text. Syntactic parsing provides grammatical information for text, which allows us to process it at a grammatical level rather than at word level. The next sub-section explains this further.

2.3.1 Tagging and parsing

The most basic syntactic information that can be added to plain text is part of speech (POS) tags. This process marks each word in a sentence with its corresponding part of speech (i.e. noun, adjective, verb, etc.) and is called POS tagging. Stochastic and rule-based are two categories of taggers, with two different approaches to tagging.

Stochastic taggers, such as those using a Hidden Markov Model (HMM), are based on probability. HMM taggers learn the probability that pairs (or longer sequences) of words will be specific POS types by analysing corpora of manually annotated text such as the British National Corpus [7].

[7] http://www.natcorp.ox.ac.uk/

My initial experiments with the NLTK included some work with the rule-based Brill tagger (Brill, 1992). The Brill tagger uses a machine learning technique known as supervised learning. It is trained on annotated development data to learn rules that it can apply to certain words or sequences of words.

Lexical rules in the Brill tagger check for characters within words and change the POS of matching words. An example is the rule 'NN s fhassuf 1 NNS', which changes any word tagged as a noun (NN) to a plural noun (NNS) if the word has the suffix 's'. The tagger also uses contextual rules and will change the POS of a word based on other words in the same sentence.
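
A minimal sketch of POS tagging with NLTK's default tagger (not the Brill tagger itself) is shown below, purely to illustrate the word/POS output that the term extraction step described later relies on.

    # A minimal sketch of POS tagging with NLTK's default tagger.
    # May require nltk.download('punkt') and
    # nltk.download('averaged_perceptron_tagger') on first use.
    import nltk

    sentence = "The green house was in a field"
    tokens = nltk.word_tokenize(sentence)
    print(nltk.pos_tag(tokens))
    # Roughly: [('The', 'DT'), ('green', 'JJ'), ('house', 'NN'), ('was', 'VBD'),
    #           ('in', 'IN'), ('a', 'DT'), ('field', 'NN')]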

POS tagged text is a prerequisite for the process of term extraction, as described briefly in the overview of existing systems in the previous section (2.2). My initial experiments with the Brill tagger highlighted one problem that occurs when using text which has been tagged with POS information only. It became quite apparent that using parts of speech tagging alone was inflexible. Let's examine why this is.

Consider the sentence 'The house was in a field'. Targeting this sentence with the objective of creating the question 'Where was the house?', one might use POS tags to match the determiner 'The' followed by the noun 'house'. The problem with this approach is that it eliminates other potentially valid matches such as 'The green house was in a field.'

To overcome this, one might permit an optional adjective between the determiner and the noun, but what if there are two adjectives, as in 'The big green house'? The possible permutations are endless and accounting for them using POS tag combinations alone would be impossible. One way to solve this issue would be to match the complete noun phrase, rather than just parts of speech. A noun phrase encompasses the determiner, any adjectives and the noun itself. The technical approach to this matching is described in the software overview (Chapter 4) but I will outline the parsing process which makes it possible.

Shallow parsing is an NLP process that adds syntactic structure information to a sentence. Noun phrases, verb phrases and other groupings within a sentence are marked in addition to the POS tags. Shallow parsing was required by Mitkov and Ha (2003), who wanted to identify key terms in a given text. They decided that frequently occurring nouns or noun phrases would make good key terms, and they utilised the FDG shallow parser (Tapanainen and Järvinen, 1997). There are a variety of approaches to parsing now and a range of parsers available.

QG system developers do not provide a rationale for choosing one parser over another. It is highly likely that the choice is made from the best of the state-of-the-art parsers available at the time, based on the following factors: (i) Availability – is the use of the parser restricted in any way? (ii) Performance – has the parser's performance been proven? (iii) Open source – if the source code for the parser is available in the QG system developer's chosen programming language, then this could be important as it can be adapted if needed.

The factor that was deemed to be most important when choosing a parser for Ceist was that the parser source code was available and written using the Java language. Ceist is written using the Java programming language and is modular in design. Modules exist for rule storage and conversion, some NLP functionality such as group matching, and the main QG application itself.

Other factors included the output formats available, and although performance was not measured it was considered. In the next section I look at how parsed text is used in term extraction and then I explain another factor in the choice of parser for Ceist.

2.3.2 Term extraction

Term extraction is the task of extracting targeted terms from free text. This corresponds to the term 'representation matching' in Figure 2.3. A basic requirement for QA and QG systems is the ability to identify sentences with a specific grammatical structure. Note the similarities among the sentences in Figure 2.4. Two of them refer to John being in a city, i.e. (1) and (3), but the other, (2), does not.

Figure 2.4 Examples: John is in … plain text

Term extraction will allow us to match sentences stating that John is in some city, such as (1) and (3). For the purposes of QG the matched terms can then be re-arranged to form a question (e.g. Where is John?) and, if desired, an answer (e.g. Dublin or London). Let's look at simple POS tagged versions of the example sentences as provided by the Stanford Parser online [8]. Figure 2.5 shows the sentences with the POS tag information coloured grey.

Figure 2.5 Examples: John is in … POS tagged

I use the term 'target' to indicate that I wish to match only sentences of a specific structure. From our example we wish to target sentences (1) and (3) exclusively, and so we can simply look for sentences with a proper noun (NNP), followed by the words 'is in', followed by another proper noun. Not all proper nouns are city names of course, but to keep things simple, in this example the target simply looks for a proper noun at the end of the sentence. Because a POS tagged sentence is a string, we can search it using regular expressions.

The use of regular expressions to search a POS tagged sentence is quite simple, as the POS tagged sentence is a list of word/POS pairs separated by a forward slash. The regular expression '(\w*)/NNP' can be used to match proper nouns. The regular expression shown in Figure 2.6 matches both (1) and (3) above exclusively. (2) is not matched because the last word in (2) is not a proper noun.

[8] http://nlp.stanford.edu:8080/parser/index.jsp

Figure 2.6 Regular expression targeting NNP is in NNP

Terms in parentheses can be captured as groups. This allows a system to retrieve the proper noun at the beginning of the sentence and use it to generate a question such as 'Where is John?' These techniques are used in existing QG systems and I use the same in Ceist. The software overview (Chapter 4) provides more detail on this process.
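
The following minimal sketch illustrates this style of term extraction in Python. The regular expression is written in the spirit of the 'NNP is in NNP' target above; it is not necessarily the exact expression used in Ceist.

    # A minimal sketch of regular-expression term extraction over a POS tagged string.
    import re

    tagged = [
        "John/NNP is/VBZ in/IN Dublin/NNP",    # (1) matches
        "John/NNP is/VBZ in/IN trouble/NN",    # (2) does not match
        "John/NNP is/VBZ in/IN London/NNP",    # (3) matches
    ]

    # Capture both proper nouns so they can be mapped into a question template.
    pattern = re.compile(r"^(\w+)/NNP is/VBZ in/IN (\w+)/NNP$")

    for sentence in tagged:
        match = pattern.match(sentence)
        if match:
            subject, place = match.groups()
            print(f"Where is {subject}?  (answer: {place})")
    # Where is John?  (answer: Dublin)
    # Where is John?  (answer: London)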

The previous sub-section (2.3.1) explained the problem with targeting sentences using POS tags alone. A method was needed to allow extraction based on syntactic groupings such as noun phrases or verb phrases. Shallow parsing provides this information, but this additional information is more difficult to extract terms from. This is because the representation of the parsed sentence is no longer linear; it is a tree data structure. Searching within branches of a tree using normal regular expressions is not possible as the structure is a hierarchy of nodes. Gates (2008) used an NLP tool designed for this purpose called T-Surgeon (Levy and Andrew, 2006). Ceist uses a tool from the same package, called Tregex, which is capable of performing regular expression searches within a tree data structure.

Tregex is a sub-project within the larger Stanford NLP tools package. Also within this package is the Stanford NL parser. This parser supplies parsed text in a variety of formats, including a one-line syntax parse tree which can be sent directly as input to Tregex. All of the source code for the Stanford NLP tools is open source. Based on these factors, the parser currently used with Ceist is the Stanford NL parser. Klein and Manning (2003) do report performance figures for the parser, but performance was not a key factor in its choice for Ceist.
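
As a rough illustration of searching a tree rather than a flat tagged string, the following sketch uses NLTK's Tree class on a hand-written bracketed parse. Ceist itself uses Tregex (Java) over Stanford parser output, so this is only an analogy under that assumption.

    # A minimal sketch of matching whole noun phrases in a parse tree.
    from nltk.tree import Tree

    parse = Tree.fromstring(
        "(S (NP (DT The) (JJ big) (JJ green) (NN house))"
        " (VP (VBD was) (PP (IN in) (NP (DT a) (NN field)))))")

    # Matching the complete noun phrase copes with any number of adjectives,
    # which a fixed sequence of POS tags could not.
    for np in parse.subtrees(lambda t: t.label() == "NP"):
        print(" ".join(np.leaves()))
    # The big green house
    # a field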

2.3.3 WordNet

In the previous sub-section I explained term extraction techniques and in particular how QG system developers use syntactic groupings to encompass an entire noun phrase rather than individual parts of a sentence. Term extraction can be improved further by grouping terms semantically. The sentences in Figure 2.7 below demonstrate this.

Figure 2.7 Examples: Targeting motion related verbs

Let's assume we wish to generate the question 'Where did John move towards?' from these sentences. Only (4) and (5) in Figure 2.7 are valid for this question because they concern movement, but can we rely on POS tagging to eliminate sentence (6)? The Stanford NL parser produces the POS tagged versions of these sentences shown below.

Figure 2.8 Examples: POS tagged motion verb examples

The sequence of POS tags is identical for all three sentences. POS tags alone do not allow us to target only sentences (4) and (5). What is required is the ability to distinguish between verbs that mean the subject is moving (i.e. walked and ran) and other verbs (e.g. pointed). Semantics, in linguistics, is the study of the meaning of words, phrases or sentences. NLP tools exist which allow systems to find words with similar meanings.

The vast majority of current QG systems use a lexical database called WordNet (Fellbaum, 1998). One function WordNet provides is the ability to look up words that are semantically similar to another. It can be used to solve the problem with the sentences above by querying the database for all verbs with a similar meaning to 'walk' and then using this group of verbs to target only sentences where movement has taken place.
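
A minimal sketch of such a lookup using NLTK's WordNet interface is shown below. The raw result would need filtering in practice, but it shows how a group of motion-related verbs could be gathered automatically rather than listed by hand.

    # A minimal sketch of a WordNet lookup for verbs close in meaning to 'walk'.
    # May require nltk.download('wordnet') on first use.
    from nltk.corpus import wordnet as wn

    motion_verbs = set()
    for synset in wn.synsets("walk", pos=wn.VERB):
        for lemma in synset.lemmas():
            motion_verbs.add(lemma.name())
        for hyponym in synset.hyponyms():      # more specific kinds of walking
            for lemma in hyponym.lemmas():
                motion_verbs.add(lemma.name())

    print(sorted(motion_verbs))  # e.g. includes 'walk', 'march', 'stroll', ...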

All of the systems I reviewed which used a semantic grouping lookup used WordNet. The technical details are not discussed at length, but this is because the lookup is not by any means difficult to do. Essentially, WordNet is a database and can be consulted to return words close in meaning to a target word.

The modular nature of Ceist removes any ties to specific NLP tools. Ceist employs a grouping function which allows the specification of semantic groups as a tag name (e.g. motionVerbs). The implementation of motionVerbs can be either simply a list of user-created motion verbs or something cleverer, such as a module interfaced to WordNet. I have been careful to avoid becoming tied to one particular NLP tool and as such Ceist presently only uses simple user-created lists.

2.3.4 Rule-based mapping

In the sub-section relating to term extraction (2.3.2) I briefly mentioned how rules could map matched terms from an input sentence and rearrange the terms to form a question. An example was the sentence 'John is in London'. Any time a sentence of the format '<proper noun> is in <proper noun>' was found, the first proper noun could be inserted into the template 'Where is <proper noun>?' to generate a question. This process of transforming the matched terms into a question using a template is called mapping.

The contrived example is relatively simple but this is not always the case. Some<br />

mappings might require a change in verb tense or possibly the conversion of a word, or<br />

words, to lower case letters. We can look at one example of mapping involving verb<br />

tense changes which was considered by Mitkov and Ha (2003, p.2). They state that they<br />



use “a number of simple question generation rules” for transforming sentences of “SVO<br />

or SV type”. So, what do they mean by SVO or SV type?<br />

Sentences commonly have a structure consisting of subjects, verbs and objects.

The relation between the subject and object is defined by the verb. In the sentence ‘John<br />

ate apples’, ‘John’ is the subject and ‘apples’ is the object. Sentences with a structure<br />

containing a subject followed by a verb and then an object are known as SVO (subject-<br />

verb-object). Sentences with a structure containing just a subject and verb would be<br />

categorised as SV (Huddleston and Pullum, 2005).<br />

The simple transformation Mitkov and Ha perform on such sentences is to rearrange<br />

them into the format “What do/does/did the S V?” They take the subject and the verb<br />

and append them to the interrogative phrase “What do/does/did”. They provide an<br />

example using the sentence in Figure 2.9.<br />

Figure 2.9 Mitkov and Ha example sentence<br />

The subject is underlined and the verb marked with bold type. Applying the simple<br />

transformation from input sentence to question, Mitkov and Ha present the output in<br />

Figure 2.10 as the generated question.<br />



Figure 2.10 Mitkov and Ha example question<br />

We can observe two issues with this transformation which must be addressed. The<br />

choice of the verb ‘do’, ’does’ or ‘did’ in the generated question depends on the verb<br />

tense in the original sentence, and the transformed verb could also be subject to<br />

inflection, e.g. (constitutes → constitute). There is also the minor issue that the letter ‘A’

in the subject becomes lower case in the question. Simply converting all first characters<br />

to lower case is not a valid solution to this issue because if the first word were a proper<br />

noun, it should not be converted in this way. We will see that in other systems, the<br />

methods used to define transformational templates and rules have been enhanced to<br />

address these issues.<br />
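A minimal sketch of how these two issues might be handled is shown below; the tag-to-auxiliary mapping and the proper-noun check are my own illustrative assumptions rather than Mitkov and Ha's actual rules, and the example input is hypothetical:

# Choose do/does/did from the verb's POS tag and lower-case the first word
# of the subject only when it is not a proper noun.
AUX_FOR_TAG = {'VBD': 'did', 'VBZ': 'does', 'VBP': 'do', 'VB': 'do'}

def what_question(subject_tokens, verb_tag, verb_lemma):
    aux = AUX_FOR_TAG.get(verb_tag, 'did')
    words = []
    for i, (word, tag) in enumerate(subject_tokens):
        if i == 0 and not tag.startswith('NNP'):
            word = word.lower()
        words.append(word)
    return 'What {} {} {}?'.format(aux, ' '.join(words), verb_lemma)

# Hypothetical input: subject 'A water molecule', verb 'contains' (VBZ, lemma 'contain').
print(what_question([('A', 'DT'), ('water', 'NN'), ('molecule', 'NN')], 'VBZ', 'contain'))
# -> What does a water molecule contain?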

Wang et al. provide some examples of templates used to generate questions. Their<br />

templates consisted of four components: “question, entries, keywords and answer” and<br />

they give the following sample template:<br />

Figure 2.11 Wang et al. sample template<br />



Their system looks for parts of a medical text (which has been tagged to indicate<br />

diseases and symptoms amongst other medical entities) containing the required entries<br />

and at least one of the keywords.<br />

Both Mitkov and Ha and Wang et al. used templates capable of matching a very wide<br />

range of sentences. There are an infinite number of sentences of the form Subject-Verb-<br />

Object or Subject-Verb. The sample given above from a Wang et al. template would<br />

match any sentence containing a symptom, a disease and any of the keywords ‘feel’,<br />

‘experience’ or ‘accompany’. Such sentences are probably common enough in medical<br />

texts. These systems differ from the others because they do not define very exact<br />

sentences which they wish to match. Other rule based systems target very specific<br />

sentences right down to the word level, as does Ceist.<br />

The QA system QuALiM (Kaisser and Becker, 2004 p.2) featured “strict pattern<br />

matching” using “sequences” to “classify the questions according to their linguistic<br />

(mostly syntactic) structure”. An example of the sequence definition given by Kaisser

and Becker is shown in Figure 2.12.<br />

Figure 2.12 QuALiM mark-up language<br />



QuALiM is a QA system and thus the sequence is designed to match a question. Kaisser<br />

and Becker used the sequence to identify candidate sentences which could answer this<br />

question, and searched Google for those sentences. In a QG system designed to generate<br />

questions for which the answer is contained within the text, the target sequences would mainly be declarative sentences, since declarative sentences are statements and will usually state some fact about which a valid question can be asked. Although the method used by QuALiM was employed to represent more than just declarative sentences, it can be used to represent them and so remains applicable.

The sequence given above matches a very specific question, described by Kaisser and<br />

Becker (2004, p.2) as “any question that starts with the word ‘When’, followed by the<br />

word ‘did’, followed by an NP, followed by a verb in its infinitive form, followed by an<br />

NP or a PP, followed by a question mark which has in addition to be the last element in<br />

the question”.<br />

The mark up they employed was flexible in that it permitted the building of patterns<br />

containing a mix of elements which could be matched. The reason this capability is<br />

important can be explained by a simple example. To match the sentence, ‘Mary went to<br />

Dublin’, a QG system could be asked to look for exactly those four words in sequence.<br />

Any sentence matching those exact four words will be used to generate a question.<br />

It is quite easy to determine computationally whether or not a sentence contains the<br />

exact words ‘Mary went to Dublin’ but in any given text the probability that we are<br />

given this individual sentence is negligible. This approach to finding candidate<br />



sentences obviously needs to be improved. The coverage of the pattern must be<br />

increased, and one way in which this might be done would be to replace the word 'Mary' in the pattern with any person's name. By writing a new pattern '<NAME> went to Dublin', we would increase the coverage significantly. This functionality

was seen as a necessity if patterns were ever to become fully defined and as a result it<br />

became a design aim for Ceist. It would be possible to use groups such as ‘NAME’<br />

<strong>within</strong> the patterns and the manner in which the groups were implemented was<br />

irrelevant to Ceist.<br />

The variable '<NAME>' might match a single first name, a surname, maybe both or even

a titled name such as ‘Mrs. Mary Burke’. The important point to note is that our pattern<br />

now contains two elements, i.e. individual words and a variable representing a group of<br />

words (people’s names). The pattern could also be modified to look for any verb in the<br />

past tense. The word ‘went’ would be replaced with the part-of-speech tag ‘VBD’.<br />

Kaisser’s system allowed patterns to contain different element types and his mark up<br />

was designed to suit this. Having realised the potential of such a flexible mark up, I<br />

decided to implement similar functionality in Ceist. I made personal contact with<br />

Kaisser and he was kind enough to provide me with further technical details regarding<br />

the mark up used in the sequence definitions.<br />

Following a similar approach, Cai et al. introduced NLGML, “A Markup Language for<br />

<strong>Question</strong> Generation”. The approach was “based on lexical, syntactic and semantic<br />

patterns described in a mark-up language” (Cai et al., 2006 p.1). An example of<br />



NLGML describing a sentence of the form “somebody went to somewhere” is shown in<br />

Figure 2.13.<br />

Figure 2.13 NLGML mark-up for ‘somebody went to somewhere’<br />

The language allows semantic features to be matched using the attributes specified in<br />

the phrases e.g. (person=”true”, location=”true”). NLGML also introduces<br />

functions to address some of the issues previously described. NLGML uses the function _lowerFirst, which changes the first letter of a term to lower case. The function _getLemma changes the verb 'went' to 'go', for example, and this process is known as

lemmatisation. QuALiM also used lemmatisation and I have implemented similar<br />

functionality in Ceist. The task of lemmatisation is described further in the next sub-<br />

section (2.3.5).<br />

The work by Cai et al. and Rus et al., to begin defining a unified mark up language for<br />

rules, is very important. They consider the two most important parts of a QG system to<br />

be: the transformation rules and an interpreter. If the manner in which rules are defined is standardised, and the rules are sufficiently precise, then rules can be used across systems and there may even be an effort to create a unified set of rules for QG. What would then distinguish one rule-based system from another would be its ability to perform NLP sub-tasks such as Named Entity Recognition or lemmatisation.

2.3.5 Lemmatisation<br />

The function _getLemma and the related NLP task of lemmatisation were mentioned in

the previous sub-section. Possibly because it is deemed to be a well-researched task, QG<br />

system designers do not elaborate on the methods used by their systems to lemmatise<br />

words. Indeed, only from personally communicating with the author of the QA system<br />

QuALiM, was I able to learn more about their method of lemmatisation.<br />

QuALiM uses transformation rules to change verb forms and NLGML/QG-ML uses<br />

functions. Other functions allow matched terms to be transformed in other ways, such as<br />

to be converted to lower case characters. The NLP sub-task of determining inflected word forms has also been well researched (Porter, 1980; Jurafsky and Martin, 2009). Stemming is the process of acting on a verb using simple rules, such as removing the suffix '-ed' from a verb in the past tense to provide the base form of the verb, e.g. (walked → walk). This solution is not perfect, most notably when dealing with irregular verbs (e.g. ran, hid), because irregular verbs do not follow simple rules.

QuALiM uses a morphology database that was part of the XTAG system by Doran et al.<br />

(1994). It contains a vast number of verbs and their inflected forms that can be easily<br />

queried to perform lemmatisation. The database was originally compiled by Karp et al.<br />

(1992) using a set of morphological rules for English by Karttunen and Wittenburg<br />



(1983) and the 1979 edition of the Collins Dictionary of the English Language. A drawback is that new words must be added to the database over time, but the advantage is that the lemmatisation is very accurate.

Ceist currently uses the XTAG system but there is a limitation with lemmatisation.<br />

Lemmatisation alone is not sufficient to generate some questions. Lemmatisation only<br />

provides us with the base form for a verb, but there are cases where we would like to<br />

obtain another form for a verb. An example is the input sentence ‘John has eaten all the<br />

apples.’ Generating the question ‘Who ate all the apples?’ requires the verb ‘eat’ to be<br />

morphed from the past participle ‘eaten’ to the preterite ‘ate’. A lemmatiser such as the<br />

XTAG system will only tell us that the base form of ‘eaten’ is ‘eat’. It does not provide<br />

a means to then tell us the preterite of ‘eat’. The tool which provides this functionality is<br />

a verb conjugator.<br />
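The gap can be seen with a small sketch using NLTK's WordNet-based lemmatiser (the XTAG database used by Ceist is queried differently, so this illustrates the general point rather than Ceist's code):

from nltk.stem import WordNetLemmatizer

lemmatiser = WordNetLemmatizer()
print(lemmatiser.lemmatize('eaten', pos='v'))   # -> 'eat', the base form

# Nothing here tells us that the preterite of 'eat' is 'ate'; a conjugation table
# (hypothetical entries shown) or a verb conjugator would be needed for that step.
PRETERITE = {'eat': 'ate', 'go': 'went', 'run': 'ran'}
print(PRETERITE['eat'])                         # -> 'ate'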

Verb conjugation has not yet been implemented in Ceist. This is an area where the<br />

system can be improved in the future. One potential solution is to integrate the verb<br />

conjugator provided by the American Northwestern University’s MorphAdorner 9 tools<br />

with Ceist.<br />

9 http://morphadorner.northwestern.edu/morphadorner/<br />



2.4 Factivity<br />

Factivity is relevant to question generation for single sentences in the particular case<br />

where we expect the generated question to be answered by the sentence. If we ask a<br />

question based on a declarative statement in the sentence then we must be sure that the<br />

statement is an established fact. This may not always be the case as the sentences in<br />

Figure 2.14 show (the statement is underlined).<br />

Figure 2.14 Factivity sentence examples<br />

The speaker in (1) is not sure that there were 10 people at the conference whereas the<br />

speaker in (2) is absolutely certain. The clause <strong>within</strong> the sentence containing the<br />

declarative statement is known as a declarative content clause.<br />

Predicate verbs which establish a true fact, such as 'know', are known as factive verbs. Predicates which do not establish the statement as fact are known as non-factive (e.g. 'think'). Much of the pioneering work on factivity was done by Kiparsky and Kiparsky

(1970) and Hooper (1974). Kiparsky and Kiparsky explain that in <strong>factive</strong> sentences the<br />

speaker “presupposes that the embedded clause expresses a true proposition”. The<br />

speaker in (1) above does not presuppose the truth of the statement about the number of attendees at the conference, but the speaker in (2) does. Kiparsky and Kiparsky begin to

clearly define <strong>factive</strong>s and <strong>non</strong>-<strong>factive</strong>s using rules in their paper from 1970.<br />



Hooper expanded this work by further clarifying “the differences between classes of<br />

predicates that take that clauses as subject or object complements.” Table 2.2 is the list of categorised predicates drawn up by Hooper, and it is a useful starting point for a more comprehensive list. Hooper's list can be expanded using a thesaurus to create a software module for factive/non-factive recognition. This is exactly what I did for this research, and I outline the technical details of the software

module in the software overview (Chapter 4).<br />

These researchers categorise <strong>factive</strong>s and <strong>non</strong>-<strong>factive</strong>s into groups such as strong-<br />

assertive or weak-assertive, as well as mental or verbal. Their intention is that we can estimate the certainty of what a speaker presupposes by looking up the category into which the factive or non-factive verb they have used falls. This is very difficult to do.

Even as I expanded the original list of verbs by Hooper I found that the thesaurus would<br />

present the same word for two different verbs from two different categories. I was not<br />

concerned with the categories so I simply added the new word to my list once only. For<br />

my research the degree of certainty was not important.<br />

I have considered a new approach to determining the certainty with which a speaker<br />

presupposes a fact. The certainty could be estimated by studying sentences containing<br />

<strong>factive</strong> or <strong>non</strong>-<strong>factive</strong> verbs from my comprehensive list and assigning a value for the<br />

chance of certainty for each word. If it were found, for example, that 6 times out of 10 when a speaker 'claims' to have seen something they had in fact seen it, we would assign a certainty value of 0.60 to the verb 'claim'. Using a numeric value would then

allow NLP systems to select the level of factivity they desire.<br />



Factive

Assertive (semi-factive): find out, discover, know, learn, note, notice, observe, perceive, realize, recall, remember, reveal, see

Non-assertive (true factive): regret, resent, forget, amuse, suffice, bother, make sense, care, be odd, be strange, be interesting, be relevant, be sorry, be exciting

Non-factive

Assertive, weak assertive: think, believe, suppose, expect, imagine, guess, seem, appear, figure

Assertive, strong assertive (a): acknowledge, admit, affirm, allege, answer, argue, assert, assure, certify, charge, claim, contend, declare, divulge, emphasize, explain, grant, guarantee, hint, hypothesize, imply, indicate, write

Assertive, strong assertive (b): insist, intimate, maintain, mention, point out, predict, prophesy, postulate, remark, reply, report, say, state, suggest, swear, testify, theorize, verify, vow

Non-assertive: agree, be afraid, be certain, be sure, be clear, be obvious, be evident, calculate, decide, deduce, estimate, hope, presume, surmise, suspect

Non-assertive (non-negative): be likely, be possible, be probable, be conceivable

Non-assertive (negative): be unlikely, be impossible, be improbable, be inconceivable, doubt, deny

Table 2.2 Factive and non-factive predicates categorised by Hooper (1974)



2.5 Research question<br />

To begin to answer the two specific sub-questions of the research question, “What is the overall impact of implementing factive/non-factive sentence recognition as a sub-task of a Question Generation system?” and “What increase in generated question quality does it deliver?”, there are some prerequisite steps that must be taken.

A working QG system is required which uses the techniques described in this literature review (2.3). A module is added to this system which is capable of factive/non-factive sentence recognition, using a comprehensive list of factive and non-factive verbs and phrases as outlined in the previous section (2.4).

Developing this module requires drawing again on the techniques described in section<br />

2.3 and also on the knowledge gained from the entire literature review. Using<br />

techniques which are described in Chapter 3 but which were learned during the<br />

literature review, the new module will then be evaluated in order to determine its overall<br />

impact on a QG system and the increase in generated question quality which it delivers.<br />

2.6 Summary<br />

The literature review provided an insight into the techniques used to implement current<br />

QG systems. With this knowledge it was possible to develop a new QG system from<br />

scratch with specific design aims in mind.<br />



These design aims were that the system would focus on single sentences only, and that<br />

the answer to the generated question would be contained <strong>within</strong> the sentence. In fact, the<br />

system would also allow the generation of the answer in addition to the question.<br />

Another design aim was that the match pattern used to identify candidate sentences<br />

from which questions could be generated would be extremely flexible. It would allow<br />

the pattern to identify exact words, parts-of-speech, syntactic structures such as noun<br />

phrases and in addition groupings (semantic or otherwise) that increase the coverage of<br />

the patterns.<br />

The system would allow, for example, a rule author to match 'personName' within a pattern, while the implementation of how the QG system identifies a person's name is hidden from the user. The rule author is only concerned that the pattern they have written will generate valid questions if the system correctly recognises person names.

The working QG system built with these design aims would then facilitate the testing of<br />

a module capable of <strong>factive</strong> / <strong>non</strong>-<strong>factive</strong> <strong>predicate</strong> <strong>recognition</strong>. The body of knowledge<br />

described in this chapter relating to factivity was used to create this module.<br />



Chapter 3 Research Methods<br />

3.1 Introduction<br />

The primary research techniques used in this project were prototyping of a QG system<br />

and assessment of a software module providing <strong>factive</strong> / <strong>non</strong>-<strong>factive</strong> sentence<br />

recognition. The development of the prototype was based in part on knowledge gained from a review of literature relating to existing systems (Chapter 2).

The overall impact of the software module and the quality increase which it delivered<br />

were both assessed using quantitative analysis. This chapter outlines the methods used<br />

in the analysis.<br />

3.2 Research Techniques<br />

3.2.1 Prototyping<br />

The technique of prototyping was used to develop the QG system Ceist. It was built<br />

iteratively. Each iteration incorporated additional functionality or highlighted a new<br />

problem. The work relating to existing QG systems as described in the literature review<br />

(Chapter 2) was used to address these problems.<br />

The benefits of prototyping as a research method were very clear. The process of<br />

experimenting with an application and attempting to implement various features<br />

highlights potential drawbacks for which new solutions must be found. Problems not easily

solved are identified as candidates for further research.<br />



Examples of redesign during prototyping are the original parser choice, where POS tagging alone was found to be insufficient, and the choice of a lemmatiser rather than a fully fledged verb conjugator. Both limitations only became apparent through use within the prototype. The discovery of a limitation with a particular tool during prototyping was seen not as a setback but as a valuable lesson within the research as a whole.

Prototyping as a research technique was very effective. The practical exercise of<br />

building a system considerably advanced my knowledge of NLP. First hand experience<br />

of NLP problems gave me an opportunity to contemplate solutions to those problems.<br />

3.2.2 Quantitative analysis of overall impact<br />

Overall impact measures the benefit of <strong>factive</strong>/<strong>non</strong>-<strong>factive</strong> <strong>recognition</strong> to question<br />

generation as a whole. Calculating the frequency at which <strong>factive</strong> or <strong>non</strong>-<strong>factive</strong><br />

<strong>predicate</strong>s occur in text can provide some indication of the likely benefit to be gained by<br />

recognising such <strong>predicate</strong>s. This analysis aimed to calculate those frequencies <strong>within</strong><br />

the educational discourse OpenLearn.<br />

The initial list of <strong>factive</strong> and <strong>non</strong>-<strong>factive</strong> <strong>predicate</strong>s which was drawn up by Hooper<br />

(1974) was extended using a thesaurus. For the purposes of analysing the overall<br />

impact, some phrases which indicate factivity were also used (e.g. 'got the message' as an equivalent of 'realised'). More detail about the creation of the collection is given in the software

overview (Chapter 4 – 4.7).<br />



Once the list was finalised, it was incorporated into a Ruby script. The input to this<br />

script was a sample of the entire OpenLearn online resource in text file format. Each<br />

individual sentence was read by the script, one by one. The frequency of occurrences of<br />

the terms in the list was counted.<br />
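The script itself was written in Ruby; purely as an illustration of the counting step, a minimal sketch in Python (with placeholder word lists; the full lists are in Appendix B) might look like this:

import re

FACTIVE = ['know', 'realise', 'discover', 'notice']        # placeholder entries
NON_FACTIVE = ['think', 'believe', 'claim', 'suggest']     # placeholder entries

def build_pattern(words):
    # Word boundaries so that 'know' does not match 'knowledge'; verb inflections
    # are ignored here for brevity.
    return re.compile(r'\b(?:' + '|'.join(map(re.escape, words)) + r')\b', re.IGNORECASE)

factive_re, non_factive_re = build_pattern(FACTIVE), build_pattern(NON_FACTIVE)

def count_occurrences(sentences):
    counts = {'factive': 0, 'non-factive': 0}
    for sentence in sentences:
        if factive_re.search(sentence):
            counts['factive'] += 1
        if non_factive_re.search(sentence):
            counts['non-factive'] += 1
    return counts

print(count_occurrences(['He claims that there were 10 people at the conference.']))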

The script produced a count of <strong>factive</strong> and <strong>non</strong>-<strong>factive</strong> words or phrases in a sample of<br />

the OpenLearn resource. I analysed the frequencies to determine how often <strong>factive</strong> and<br />

<strong>non</strong>-<strong>factive</strong> words and phrases would play a part in determining the certainty of an<br />

established fact in a declarative content clause.<br />

This type of analysis has been used before by QG system developers. Brown et al.<br />

(2005 p.5) “examined the percentage of words for which we could generate various<br />

question types”. Knowing the frequency of a specific set of words <strong>within</strong> a discourse<br />

allows us to deduce the overall impact which work relating to that set of words will<br />

have.<br />

3.2.3 Quantitative analysis of quality increase<br />

The measurement of question quality is an area of QG which is still in its infancy. It is<br />

generally accepted that generated questions must be complete and grammatically<br />

correct. Gates (2009) scored a generated question by assessing “whether it was<br />

syntactically grammatical and whether it made sense semantically in the context of the<br />

text”. This aspect of question quality is important but these criteria are not useful in<br />

measuring the improvement which <strong>factive</strong> / <strong>non</strong>-<strong>factive</strong> <strong>recognition</strong> delivers.<br />



For my specific focus area, the generation of questions answerable by the input sentence, I must use another criterion: is the question explicitly answerable by the input sentence?

A quantitative analysis of the increase in quality produced by the factivity module<br />

therefore needs to measure the number of questions unanswerable by the input<br />

sentences which are no longer generated when the module is added to Ceist. It should<br />

also measure the number of perfectly good questions which were no longer generated, if<br />

any.<br />

To execute such measurements, the entire parsed OpenLearn data set was input into Ceist with no factive / non-factive recognition (Ceist-Baseline), using two rules. The rules generated yes/no type questions and 'what' type questions.

A yes/no type question has simply yes or no as an answer. For analysis I determined if it<br />

was possible to answer the generated question with ‘yes’ or ‘no’ given the input<br />

sentence. The particular rule I used was aimed at generating questions with ‘yes’ as the<br />

answer. Grammatically incorrect or incomplete questions were removed from the<br />

output. Table 3.1 shows an example of the type of output which was rejected for the<br />

yes/no rule type.<br />



Input sentence: The point is that, whatever the shape of the memorial, there has to be agreement that the form is appropriate in order that the meaning, and therefore the function, is assured.
Rejected question: Is the form appropriate in order?

Input sentence: So, it is important to realize that a water molecule is quite different from the two types of atom from which it is formed.
Rejected question: Is a water molecule quite different from the two types of atom?

Input sentence: `The Mouse's Tale' is fun, but it seems to me that here meaning is rather unrelated to form, apart from the tale/tail pun.
Rejected question: Is here meaning rather unrelated?

Table 3.1 Sample of YES/NO questions rejected

For each of the remaining questions, it was decided whether the generated question<br />

could be explicitly answered from the input sentence, i.e. the answer was an established<br />

fact in the sentence. Table 3.2 lists a sample of generated questions which were considered answerable by the input sentence.

Input sentence: A second useful feature to notice is that the sum of all the deviations is equal to zero.
Generated question: Is the sum of all the deviations equal to zero?

Input sentence: We now know that the world is spherical and you won't fall off because gravity holds you to the planet's surface.
Generated question: Is the world spherical?

Input sentence: Notice that the cell constant is very small and is measured in nanometres.
Generated question: Is the cell constant very small?

Table 3.2 Sample of YES/NO questions considered answerable by the input

On the other hand, some questions which were generated were deemed unanswerable by<br />

the input sentence alone. Note from the samples in Table 3.3 that each question is unanswerable because of the uncertainty relating to the non-factive verbs.



Input sentence: Answers to these questions are beyond the scope of this unit, but they indicate that current understanding of brain functioning in autism is provisional.
Generated question: Is current understanding of brain functioning in autism provisional?

Input sentence: Methodological considerations: critics have argued that Lovaas's selection of participants is ill-defined, and that the design of the interventions can not exclude improvements due to confounding factors.
Generated question: Is Lovaas's selection of participants ill-defined?

Input sentence: However, we would suggest that the relationship between in-school and out-of-school knowledge is different in the case of language.
Generated question: Is the relationship between in-school and out-of-school knowledge different in the case of language?

Table 3.3 Sample of YES/NO questions not answerable by the input

This analysis of the generated questions produced a set of values for each rule; the<br />

number of questions which could be explicitly answered with absolute certainty by the<br />

input sentence and the number that could not. Our ideal system would produce only the<br />

former.<br />

The same rules were then run against the same data set using Ceist with factive / non-factive recognition enabled (Ceist-Factivity). The same set of values was produced, but in this run it was expected that the number of questions which cannot be explicitly answered by the input sentence would be reduced. In addition, we measure the intersection of the good questions from Ceist-Baseline with those from Ceist-Factivity to determine if there were any false positives.

I adapted the formal model for precision and recall for my analysis of quality increase.<br />

Previous QG researchers have focused on precision more so than recall but recall is<br />



applicable in the case where one has a baseline generated output and wishes to compare<br />

with a second set of generated output. Recall allows us to measure any degradation in<br />

system output.<br />

The original definitions for precision and recall were given by Van Rijsbergen (1976).<br />

Van Rijsbergen was writing with respect to information retrieval and so his definitions<br />

relate to the number of relevant documents and the number of retrieved documents. QG<br />

researchers can use similar metrics by redefining A and B in Van Rijsbergen’s original<br />

equations shown in Figure 3.1.<br />

Figure 3.1 Precision and Recall: Van Rijsbergen’s (1976) formal definition<br />

Precision for the purposes of question generation defines A as the number of good<br />

questions and B as the total number of generated questions. This calculates precision as<br />

“the proportion of good questions out of all generated questions” according to Rus et al.<br />

(2007a). Similarly, recall can be adapted to measure the number of good questions ‘lost’<br />

by the new module. Figure 3.2 shows the adapted versions of the original precision and<br />

recall definitions.<br />



Figure 3.2 Precision and Recall: Adapted for this research<br />

The adapted value for precision therefore indicates the overall quality of what was generated, i.e. the proportion of acceptable questions (explicitly answerable by the input sentence) out of all generated questions. The adapted value for recall indicates the

degradation in the system as a result of enabling <strong>factive</strong> / <strong>non</strong>-<strong>factive</strong> <strong>recognition</strong>. It tells<br />

us the proportion of acceptable questions which were retained.<br />
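In LaTeX notation, and following the descriptions above, the original and adapted definitions can be written as follows (my own rendering of what Figures 3.1 and 3.2 describe, not a reproduction of them):

% Van Rijsbergen's definitions, with A the set of relevant items and B the set retrieved:
\[ \text{precision} = \frac{|A \cap B|}{|B|}, \qquad \text{recall} = \frac{|A \cap B|}{|A|} \]

% Adapted for this research:
\[ \text{precision} = \frac{\text{number of acceptable questions generated}}{\text{total number of questions generated}} \]
\[ \text{recall} = \frac{\text{number of acceptable questions retained by Ceist-Factivity}}{\text{number of acceptable questions generated by Ceist-Baseline}} \]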

3.3 Summary<br />

The primary methods used in this research were prototyping and quantitative analysis.<br />

Prototyping facilitated the development of a QG system using an iterative approach.<br />



The system was built up in stages and based on observations it was modified and<br />

improved over successive builds. The prototype drew on an existing body of knowledge<br />

and resources which were referred to continuously.<br />

The quantitative analysis consisted of two separate assessments. The first used a script

to read sample sentences from the OpenLearn educational resource and count<br />

occurrences of <strong>factive</strong> or <strong>non</strong>-<strong>factive</strong> indicating words and phrases. These counts can be<br />

used to determine the overall impact which a module capable of recognising these terms<br />

might have.<br />

The second quantitative analysis was executed to determine the quality increase that the same module would deliver to the questions generated by the system. This method required comparing the output of a baseline QG system with that of a system capable of factive / non-factive recognition. To execute this comparison, the formal

definitions for precision and recall as used in information retrieval were adapted to suit<br />

the specific needs of this research.<br />



Chapter 4 Software Overview<br />

4.1 Introduction<br />

This chapter describes the implementation of Ceist, the question generation system<br />

which I developed for my research. Ceist was designed and built based on source code<br />

from existing NLP systems, technical documentation and papers and with the direct<br />

assistance of some of the authors of these systems. The data source used in conjunction<br />

with Ceist is the Open University’s OpenLearn online educational resource. Textual<br />

data was extracted and parsed from the entire online OpenLearn resource for use with<br />

Ceist.<br />

Chapter 2 described term extraction (2.3.2) and rule-based mapping (2.3.4). This<br />

provided a general overview of how tagged sentences provide NLP systems with the<br />

additional information needed to identify the structure of a sentence and the syntax of<br />

the words in that sentence. The technical detail of how this is accomplished <strong>within</strong> Ceist<br />

is described in this chapter.<br />

Figure 4.1 shows an overview of Ceist’s architecture and this chapter describes some of<br />

the parts of this architecture. I begin with an explanation of how simple tagged<br />

sentences are matched using match patterns and then show how a valid match is<br />

transformed to generate a question using the question template. The process for<br />

sentences tagged with grammatical structure information differs slightly and so this is<br />

explained in a separate section. The chapter ends with a description of the rules<br />

contained in the rule repository and their representation followed by a sub-section<br />

relating to the <strong>factive</strong> / <strong>non</strong>-<strong>factive</strong> <strong>recognition</strong> software module.<br />



Figure 4.1 Ceist system architecture<br />

4.2 Preparation of the OpenLearn data set<br />

The Open University online educational resource OpenLearn was used as source data<br />

for Ceist. Figure 4.2 shows the processing which

was performed on the entire OpenLearn resource. Relevant text is extracted from the<br />

units and then parsed with syntactic structure information.<br />

Figure 4.2 OpenLearn Study Unit processing<br />



The Open University provides the OpenLearn study units in a variety of formats but the<br />

XML format is particularly useful. This is because the XML tags mark specific content<br />

within the units, such as unit headers, images, tables and the actual course text. Figure 4.3 is an extract from one such XML file.

Figure 4.3 OpenLearn XML format<br />

A Python script is used to extract only the course text for use with question generation.<br />

The script targets specific nodes in the XML file and extracts their content. A portion of<br />

the script is shown in Figure 4.4. The extracted

text is cleaned to remove oddities (e.g. table references, pronunciations) and then parsed<br />

into syntax trees using the Stanford NL Parser.<br />



Figure 4.4 Python extraction script<br />
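As a rough sketch of this extraction step only, and with the element name ('Paragraph') and file name as assumptions rather than the actual OpenLearn schema, such a step could be written as:

import xml.etree.ElementTree as ET

def extract_course_text(xml_path, node_tag='Paragraph'):
    # Collect the text of every element with the given tag from an XML study unit.
    tree = ET.parse(xml_path)
    texts = []
    for node in tree.getroot().iter(node_tag):
        text = ''.join(node.itertext()).strip()
        if text:
            texts.append(text)
    return texts

# for paragraph in extract_course_text('study_unit.xml'):
#     print(paragraph)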

The parsed output from the Stanford Parser provides syntactic structure and POS<br />

information in one line. The resulting data structure is a tree and the Tregex tool which<br />

is part of the Stanford NLP tools is used to match trees with a similar structure.<br />

4.3 Matching tagged sentences<br />

In sub-section 2.3.2 (Term Extraction) the process of parsing a sentence was described.<br />

To summarise: parsers take plain text and identify its grammatical structure. They

typically output a tagged version of the input sentence that incorporates the grammatical<br />

structure in a tree data structure.<br />

Before I explain the process of searching for patterns in a tree data structure, let's examine the simpler case of matching sentences that have only been POS tagged (a plain string data type) with regular expressions. Figure 4.5 is an example of the sentence 'John went to Dublin'

after it has been tagged to include POS information.<br />



Figure 4.5 POS tagged sentence<br />

The words ‘John’ and ‘Dublin’ are both proper nouns and have been tagged as ‘NNP’.<br />

The tag ‘NNP’ represents a singular proper noun. There are different conventions for<br />

POS tags and this example is from the Penn Treebank Tagset 10 . Let’s consider finding<br />

similar sentences to (1) from which we can generate the question ‘Where did John go?’<br />

or alternatively ‘Who went to Dublin?’<br />

Regular expressions allow specific strings to be found using a formal language. The<br />

syntax of regular expressions is beyond the scope of this dissertation but I will give a<br />

brief explanation here. A regular expression contains plain text and formal expressions<br />

designed to match a specific string pattern. In fact, sentence #1 from Figure 4.5 is a<br />

valid regular expression which would only match itself.<br />

The regular expression below matches any sequence of characters non-greedily, consuming as few characters as possible before the next element of the pattern:

10 http://www.cis.upenn.edu/~treebank/<br />

.*?<br />



We can use this pattern to match a sentence with an exact sequence of POS tags,<br />

regardless of the words. For example:<br />

(a) .*?/NNP .*?/VBD .*?/TO .*?/NNP ./.*<br />

This pattern matches (1) above and (2) and (3) in Figure 4.6 below, but not (4). Because<br />

the final word in (4) is tagged as NN (a noun), the pattern above will not match it. It<br />

requires the fourth word to be a proper noun (NNP). This is good; we have ensured the<br />

pattern only matches proper nouns for the fourth word.<br />

Figure 4.6 Example with proper nouns<br />

Consider sentences (5) and (6) in Figure 4.7 and the problem which will occur if we<br />

persist with pattern (a) above. Given the aim of finding sentences which allow us to ask<br />

the question ‘Where did John go?’, (6) is invalid because it refers to a person called<br />

‘Washington’, but our pattern (a) will match it. The pattern must be refined to ensure that only sentences relating to a person's movement are matched. It could be argued that the 'Washington' in (5) refers to a person too, but like many NLP processes we base our example on probability: it is more likely that the 'Washington' following the verb 'went' refers to a city rather than a person.

Figure 4.7 Example of POS limitation<br />

The pattern is re-written to accept only the verb ‘went’ as shown in (b) below. It will<br />

work but we are discounting all other verbs relating to movement (e.g. walked, traveled,<br />

hiked, ran, etc.) We could write a separate pattern for each motion related verb but there<br />

is an easier solution.<br />

(b) .*?/NNP went/VBD .*?/TO .*?/NNP ./.*<br />

Regular expressions allow one to form a disjunction of several sub-patterns which may<br />

be acceptable in a pattern. The pattern (c) below incorporates a sample of three motion<br />

related verbs to expand the coverage of the overall pattern. Any of the three verbs is<br />

acceptable as the second word.<br />

(c) .*?/NNP (ran|went|hiked)/VBD .*?/TO .*?/NNP ./.*<br />



There are several motion related verbs in the English language. Inserting them all into a<br />

regular expression would make the resulting pattern very long and difficult to read.<br />

Furthermore, if we mistakenly omit a verb and wish to add it later, we must find every<br />

pattern using motion verbs and update each separately. A QG system could have many<br />

such patterns.<br />

Ceist allows the creation of groups. Groups are disjunctions with a simple name tag<br />

which can be used in regular expressions. It is possible to create a group called<br />

motionVerbs, for example, with the members ‘ran’, ‘went’ and ‘hiked’. At runtime<br />

Ceist will replace any occurrence of motionVerbs in a regular expression with the<br />

pattern ‘(ran|went|hiked)’. This means that the pattern (d) below is actually<br />

interpreted as (c) by the regular expression engine in Ceist.<br />

(d) .*?/NNP motionVerbs/VBD .*?/TO .*?/NNP ./.*<br />

This ability to match sentences containing specific words or POS tags is quite powerful because, for QG, a specific question may only be applicable to a very precise set of sentences. Our example has demonstrated this for the question 'Where did <NAME> go?' When a sentence matching specific patterns is found, it is transformed

in such a way as to generate a question. The next sub-section explains how Ceist uses<br />

capture groups <strong>within</strong> regular expressions to do this.<br />
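A minimal sketch (in Python, rather than Ceist's own code) of the matching and group-expansion behaviour just described, using pattern (d):

import re

GROUPS = {'motionVerbs': ['ran', 'went', 'hiked']}   # a simple user-created group

def expand_groups(pattern):
    # Replace each group name with a disjunction of its members, e.g. (ran|went|hiked).
    for name, members in GROUPS.items():
        pattern = pattern.replace(name, '(?:' + '|'.join(members) + ')')
    return pattern

# Pattern (d), written with \S+ in place of .*? for each word position.
pattern_d = expand_groups(r'\S+/NNP motionVerbs/VBD \S+/TO \S+/NNP \S/\.')
tagged = 'John/NNP went/VBD to/TO Dublin/NNP ./.'
print(bool(re.fullmatch(pattern_d, tagged)))   # -> True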



4.4 Transformation to questions<br />

One way in which a question is generated from a sentence is by rearranging or transforming the words in the sentence. A question can be generated from the example sentence 'John went to Dublin' by simply repositioning the subject 'John', changing the verb form of 'went' and asking 'Where did John go?'. The question 'Where...?' can only be asked because the sentence is of the form '[subject] went to [place]'. Finding sentences of a specific form was the objective of the pattern creation described in the previous sub-section for this reason.

The task of repositioning the subject requires finding the subject in the matched

sentence and placing it into the generated question at a specific place. Regular<br />

expressions allow certain parts of the match pattern to be marked as groups by using<br />

parentheses. Groups are numbered in sequence after group zero, which represents the<br />

entire match. The engines which provide regular expression capability allow access to<br />

these groups by the sequence number.<br />

Ceist allows access to these groups <strong>within</strong> a pattern by naming them using the syntax<br />

‘=gXX’ where XX specifies the group number. The group number can then be used to<br />

reposition matched terms to form a question as can be seen in the screenshot from Ceist<br />

in Figure 4.8.<br />



Figure 4.8 Group matching in Ceist<br />

Templates, such as the question template in Figure 4.8, use the forward slash as a meta-<br />

character to indicate that the group number matched in the pattern should be inserted at<br />

this point in the template. The screenshot shows the first proper noun is tagged as group<br />

1 (NNP=g1) and then inserted into the question template using ‘/1’ producing the<br />

desired result. Note that Ceist indicates the matched group numbers in its results display<br />

and also colours the matched terms green if they are used in the question template or<br />

black if used in the answer template.<br />
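The effect of the '/N' references can be emulated with a few lines of Python (a sketch of the behaviour described above, not Ceist's implementation):

import re

def fill_template(template, match):
    # Replace each '/N' in the template with capture group N from the matched sentence.
    return re.sub(r'/(\d+)', lambda m: match.group(int(m.group(1))), template)

match = re.match(r'(\S+)/NNP went/VBD to/TO (\S+)/NNP', 'John/NNP went/VBD to/TO Dublin/NNP ./.')
print(fill_template('Where did /1 go ?', match))   # -> Where did John go ?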

The combination of a match pattern and a question generation template is known as a<br />

rule. The rule storage format is described in sub-section 4.6 of this chapter.<br />



4.5 Matching sentences with structure information<br />

One of the main focus points of Ceist was rule flexibility. It was very important that<br />

Ceist had the ability to group vast permutations of words (be they phrases or semantic<br />

groups) and thus reduce the number of rules. The importance of this is explained in Chapter 2, sub-section 2.3.4 (Rule-based mapping). How this is achieved

technically by Ceist is the subject of this sub-section.<br />

Here is the same example sentence I have been using to this point, displayed with its<br />

grammatical structure information embedded.<br />

Figure 4.9 Sentence with grammatical structure<br />

This output was generated using the Stanford Parser (Klein and Manning, 2003), a<br />

working version of which is available online 11 .<br />

11 http://nlp.stanford.edu:8080/parser/<br />



The additional information present in this sentence provides us with the flexibility we<br />

sought. Rather than matching a sentence of the format ‘NNP’, ‘went to’, ‘NNP’, which is<br />

limited to sentences beginning with a single proper noun, we can search for sentences<br />

beginning with a noun phrase. Allowing Ceist to match ‘NP’ 12 , ‘went to’, ‘NNP’<br />

matches all the sentences in Figure 4.10 using one expression.<br />

Figure 4.10 Demonstrating use of noun phrase<br />

In order to efficiently search for a noun phrase in the sentence that includes structure<br />

information, regular expressions alone are not sufficient. This limitation was identified<br />

by Roger Levy and Galen Andrew (2006) and they wrote the Tregex tool to extend<br />

regular expressions to work with tree data structures. If we view the example sentence<br />

above as a tree structure it can be seen how this works. A sentence with only four words<br />

has a relatively complex sentence structure as can be seen in Figure 4.11. This tree<br />

structure viewer was adapted from the Tregex source code and is integrated into Ceist.<br />

12 NP is the tag used to indicate a noun phrase group in the Penn Treebank tagset<br />



Figure 4.11 Sentence structure viewed as a tree in Ceist<br />

Whereas normal regular expression engines are capable of matching only linear strings,<br />

Tregex allows regular expressions to be written to match specific nodes <strong>within</strong> tree data<br />

structures. Only by using Tregex can Ceist search for a noun phrase node followed by

the words ‘went’ and ‘to’ and the POS tag ‘NNP’. It also provides tree manipulation<br />

functions that can, for example, allow Ceist to find all the leaves of a match node. An<br />

example of a Tregex expression is shown in the match pattern in Figure 4.8.<br />

4.6 Rule representation<br />

Rule-based mapping in Chapter 2 (2.3.4) touched on representations using XML to<br />

store rules. Rules in Ceist consist of a match pattern and a template. The match pattern<br />

is used to determine whether the question generation template can be applied to a<br />



sentence. Ceist also uses XML to store rules and an example is shown in Figure 4.12,<br />

which contains one such rule.<br />

Figure 4.12 XML representation of a rule in Ceist<br />

The XML in Figure 4.12 represents one rule containing a match pattern (match-patterns), a question template (question-template) and also an answer template (answer-template). The entire rule is contained within a rule element. The name attribute of rule can be used to name the rule, or to summarise it as is done in the figure. A brief description of each section follows.

The match-patterns section contains each part of the match pattern. This bears<br />

some similarity to the approach employed by QuALiM. The parse elements <strong>within</strong> the<br />

match-pattern each represent a part of the regular expression sent to Tregex by Ceist.<br />



The attribute id is important. This attribute is used by the template to generate the<br />

question using matched parts and will be described in the next paragraph. The level<br />

attribute is simply an indentation level used by Ceist to allow nested expressions. The<br />

value for each parse element contains the actual item to be matched. The first three<br />

elements represent a noun phrase group (NP), the word 'was' and a past participle

(VBN).<br />

The final parse element shown in Figure 4.12 contains some XML formatted characters<br />

but actually represents ‘NP


Figure 4.13 Rule editing interface in Ceist<br />

When Ceist finds a sentence that matches the pattern described previously it then uses<br />

the question-template element to generate the question. The question-<br />

template element contains both word and ref elements. Each element is read in<br />

sequence and added to the output to generate the final question. A word element simply<br />

inserts the text value onto the output. The integer value given in the ref element is<br />

used to find the group of the matched sentence which should be appended to the output<br />

as was described in section 4.3.<br />
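As an illustration of this structure (the element and attribute names below follow the description above, but the exact format of Ceist's rule files may differ), a rule of this shape can be read with a few lines of Python:

import xml.etree.ElementTree as ET

# Hypothetical rule: match an NP, the word 'was' and a past participle,
# and generate a yes/no question 'Was <NP> <VBN>?'.
RULE_XML = """
<rule name="NP was VBN">
  <match-patterns>
    <parse id="1" level="0">NP</parse>
    <parse id="2" level="0">was</parse>
    <parse id="3" level="0">VBN</parse>
  </match-patterns>
  <question-template>
    <word>Was</word><ref>1</ref><ref>3</ref><word>?</word>
  </question-template>
</rule>
"""

def read_rule(xml_text):
    rule = ET.fromstring(xml_text)
    parts = [p.text for p in rule.find('match-patterns')]            # pattern parts, in order
    template = [(el.tag, el.text) for el in rule.find('question-template')]
    return rule.get('name'), parts, template

print(read_rule(RULE_XML))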

The final output consists of a question which has been formed from a template in combination with matched parts of the original input sentence. A rule-based system will consist of several rules, and it would be expected that, given a large block of text, many of these rules will match sentences in the text and consequently generate a number of questions.

4.7 <strong>Factive</strong> / <strong>non</strong>-<strong>factive</strong> <strong>recognition</strong> functionality<br />

Enhancing Ceist with factive / non-factive recognition capability was done using the list of words first drawn up by Hooper in 1974 (and presented in Table 2.2) as a starting point. The list of words was expanded using the thesaurus provided with Apple's Mac OS X operating system, a digital version of The Oxford American Writer's Thesaurus (Lindberg, 2004), which contains both American and British English phrases. I extended

the list with both words and phrases from the thesaurus.<br />

Where words were derived from the thesaurus for an existing word in Hooper’s list, the<br />

new words were inserted into the same category as Hooper had originally assigned. It<br />

was not uncommon for the same word to appear in the thesaurus for two words from<br />

two different categories in Hooper's list. This indicates that the process of categorising words in the manner which Hooper has done is quite difficult. Categorisation was not

required for this research but I do propose an approach to assigning a factivity value to<br />

words in the conclusions chapter (Chapter 6).<br />

The collection of factives and non-factives was used in two different ways to carry out both of the quantitative analysis methods described in Chapter 3. The Tregex engine had some limitations which prevented the use of phrases within Ceist, so in order to gain an accurate figure for overall impact using both words and phrases, a script was used.



The words and phrases from the list were then converted to regular expressions. A Ruby script was used to create the expressions, and the same script was used to count the frequencies within the parsed OpenLearn trees, as presented in Chapter 5 (Results).

<strong>Factive</strong> / <strong>non</strong>-<strong>factive</strong> <strong>recognition</strong> was implemented by using Ceist’s grouping feature.<br />

Two groups were created named all<strong>Factive</strong> and allNon<strong>factive</strong> containing all<br />

the words from the comprehensive list. The regular expressions for each word were<br />

written to include all verb forms. Enabling <strong>factive</strong> or <strong>non</strong>-<strong>factive</strong> <strong>recognition</strong> was then<br />

done for a rule by adding a check for all<strong>Factive</strong> or allNon<strong>factive</strong> at the<br />

desired position in that rule.<br />
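A minimal sketch of how such a group can be expanded into a regular expression (with placeholder entries; the full lists and their inflected forms are given in Appendix B):

import re

# Hypothetical fragment of the allNonfactive group, with inflected forms listed explicitly.
NON_FACTIVE_FORMS = {
    'claim':   ['claim', 'claims', 'claimed', 'claiming'],
    'believe': ['believe', 'believes', 'believed', 'believing'],
}

def group_regex(forms_by_verb):
    all_forms = [form for forms in forms_by_verb.values() for form in forms]
    return re.compile(r'\b(?:' + '|'.join(map(re.escape, all_forms)) + r')\b', re.IGNORECASE)

allNonfactive = group_regex(NON_FACTIVE_FORMS)
print(bool(allNonfactive.search('He claims that there were 10 people at the conference.')))  # -> True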

4.8 Summary<br />

Ceist uses OpenLearn as a data source. The benefit of the XML format study units<br />

available from OpenLearn online was discussed. This chapter described how an<br />

OpenLearn unit in this format is converted to a format which is searchable for the<br />

purposes of natural language processing.<br />

I also detailed the use of regular expressions to match patterns in sentences. This<br />

permits the targeting of similar sentences from which a specific question can be<br />

generated. A feature of Ceist is the grouping of semantically similar words, which makes pattern definition easier and more maintainable.



The method of generating questions I have focused on is to take some of the matched<br />

terms from input sentences and change their order and/or transform them using<br />

techniques such as lemmatisation. This chapter described how captured groups <strong>within</strong><br />

regular expression patterns facilitate this process.<br />

The chapter details how the combined regular expression pattern and a template for<br />

generating a question from this expression are stored together to form a rule. The<br />

combination of many such rules forms a QG system.<br />

Finally I described the addition of <strong>factive</strong> / <strong>non</strong>-<strong>factive</strong> <strong>predicate</strong> <strong>recognition</strong> to Ceist. I<br />

used a comprehensive list of verbs and phrases derived from other research and represented this list, both in a script and within Ceist, as regular expressions. This functionality allowed me to analyse its impact on a QG system.



Chapter 5 Results<br />

5.1 Introduction<br />

Two sets of quantitative analysis were carried out for this research. The first focused on

measuring the overall impact which <strong>factive</strong> / <strong>non</strong>-<strong>factive</strong> <strong>recognition</strong> can have <strong>within</strong><br />

QG as a whole. I measured this by counting the frequency at which <strong>factive</strong> or <strong>non</strong>-<br />

<strong>factive</strong> words and phrases occurred <strong>within</strong> a sample of the educational discourse<br />

OpenLearn. The results were obtained both at a general level and for a specific clause type, and are presented in this chapter.

The second analysis measured the benefit of <strong>factive</strong> / <strong>non</strong>-<strong>factive</strong> <strong>recognition</strong> with<br />

regard to the quality of the QG system output. A detailed description of quantitative<br />

analysis of question quality is given in Chapter 3 (3.2.3) and this analysis applied the<br />

methods described therein. This chapter presents the values for precision and recall<br />

obtained for two rules in Ceist.<br />

5.2 Overall impact

For the impact analysis the entire OpenLearn online resource was used. The parsed resource contains over 100,000 sentences [14] in over 475 study units and covers a varied range of topics in an educational format. The factive / non-factive recognition module used a list of words and phrases which indicate factivity. In the software overview (Chapter 4 - 4.7) I explained how Hooper's list of factive and non-factive predicates (Table 2.2) was expanded to produce a larger set of predicates. The complete list is available in Appendix B.

[14] This is the number of sentences within content paragraphs. Other content within questions, captions, activities, tables etc. was not used.

The first set of data shows the number of occurrences of factive / non-factive indicating phrases in a sample of our parsed study units. The sample was a random selection of 17 study units across various categories. The data are presented in tabular format in Table 5.1 and as a pie chart in Figure 5.1.

Item                                        Count    Percentage
Total sentences                              4673       100%
Containing factive phrase                     362       7.75%
Containing non-factive phrase                 655      14.01%
Containing factive or non-factive phrase     1017      21.76%

Table 5.1 Occurrences of factive / non-factive

Figure 5.1 Chart showing occurrences of factive / non-factive


This gives a good indication of the frequency of factive / non-factive usage in educational material: over 20% of sentences contain some factive or non-factive indicating verb or phrase.

For the purposes of QG and this research we would like to focus on studying only the occurrences where there is a declarative content clause immediately succeeding the factive / non-factive phrase. This allows us to ignore such sentences as ‘I know who you are.’ or ‘They saw the movie.’ and analyse instead sentences such as ‘He claims that there were 10 people at the conference.’
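A much simplified sketch of how such counts could be obtained is shown below. It is illustrative only: the toy predicate list and the use of a following ‘that’ as a stand-in for "immediately precedes a subordinate clause" are assumptions made here, whereas the actual analysis ran the full regular-expression lists of Appendix B over the parsed OpenLearn units.

    import re

    # Toy sample of predicates; Appendix B gives the full lists used in practice.
    PREDICATES = ["know", "knows", "knew", "claim", "claims", "claimed", "see", "saw"]
    ANYWHERE = re.compile(r"\b(?:" + "|".join(PREDICATES) + r")\b", re.IGNORECASE)
    # Crude approximation of "immediately followed by a declarative content clause":
    # the predicate is directly followed by the complementiser 'that'.
    BEFORE_CLAUSE = re.compile(r"\b(?:" + "|".join(PREDICATES) + r")\s+that\b", re.IGNORECASE)

    sentences = [
        "I know who you are.",
        "They saw the movie.",
        "He claims that there were 10 people at the conference.",
    ]

    contains_any = sum(1 for s in sentences if ANYWHERE.search(s))
    before_clause = sum(1 for s in sentences if BEFORE_CLAUSE.search(s))
    print(contains_any, before_clause)   # -> 3 1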

The sentences were analysed again to measure only the occurrences where the clause containing a factive / non-factive phrase immediately preceded another clause. We make the assumption that the succeeding clause is a subordinate clause. Again the data are presented in tabular format, in Table 5.2, and as a chart in Figure 5.2.

Item                                        Count    Percentage
Total sentences                              4673       100%
Containing factive phrase                     106       2.27%
Containing non-factive phrase                 206       4.41%
Containing factive or non-factive phrase      312       6.68%

Table 5.2 Occurrences of factive / non-factive directly before a new clause


Figure 5.2 Occurrences of factive / non-factive directly before a new clause

The data show that just under 7% of sentences contain a clause preceded by a factive or non-factive phrase; almost 4.5% of all sentences contain a non-factive phrase in this position. Current QG efforts focus mainly on declarative clause types because typically such clauses make a statement about which a question can be asked.

My analysis measures occurrences in all clause types; further work could evaluate declarative clauses only. In any case, if the statements made in this 4.5% of sentences are taken to be true when they are not, then questions generated from them will possibly not be answerable by the sentence.

5.3 Quality Increase

The performance analysis measures the improvement in the quality of generated questions once factive / non-factive recognition has been included. This involved running Ceist with two rules designed specifically to generate questions from statements in declarative content clauses. The method used was outlined in the research methods chapter (3.2.3). Briefly, the output from both rules was first cleaned by removing grammatically incorrect or incomplete sentences. The remaining questions were categorised as either answerable or not answerable by the input sentence.

The output from a baseline Ceist system without factive / non-factive recognition (Ceist-Baseline) was assessed, and then the output from an improved system with the extra functionality added (Ceist-Factivity) was also assessed. The results are presented in tabular format in Table 5.3.

Rule 1 - YES/NO

                   Answerable by sentence    Count    Precision    Recall
Ceist-Baseline     YES                          20        48%        N/A
                   NO                           22
Ceist-Factivity    YES                          17        71%        85%
                   NO                            7

Rule 2 - WHAT

                   Answerable by sentence    Count    Precision    Recall
Ceist-Baseline     YES                          28        54%        N/A
                   NO                           24
Ceist-Factivity    YES                          21       100%        75%
                   NO                            0

Table 5.3 Quality increase results
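As a quick arithmetic check on these figures (taking precision to be the proportion of generated questions answerable by the input sentence, and recall to be the proportion of the baseline's answerable questions retained by Ceist-Factivity), the reported percentages can be reproduced from the raw counts:

    # Raw counts from Table 5.3: (answerable, unanswerable) question counts.
    rules = {
        "Rule 1": {"baseline": (20, 22), "factivity": (17, 7)},
        "Rule 2": {"baseline": (28, 24), "factivity": (21, 0)},
    }

    for name, counts in rules.items():
        b_yes, b_no = counts["baseline"]
        f_yes, f_no = counts["factivity"]
        print(name,
              f"baseline precision {b_yes / (b_yes + b_no):.0%},",
              f"factivity precision {f_yes / (f_yes + f_no):.0%},",
              f"recall {f_yes / b_yes:.0%}")
    # Rule 1 baseline precision 48%, factivity precision 71%, recall 85%
    # Rule 2 baseline precision 54%, factivity precision 100%, recall 75%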



It can be seen that precision has increased for both rules, at some cost to recall. Notably, for rule 2 all unanswerable questions were removed, giving a precision of 100%. Rule 2 did, however, see a decrease in answerable questions of 25%. The pie charts below show the results for precision for each rule; the lighter blue represents questions deemed unanswerable by the input sentence. Chapter 6 will assess these results in more detail, but it can be seen that precision has improved in both cases.

Figure 5.3 Proportion of Answerable Questions for Rule 1

Figure 5.4 Proportion of Answerable Questions for Rule 2

Recall was added as a metric so that we could determine the degradation in the system as a result of adding factive / non-factive recognition. The results for recall show that some acceptable questions produced by Ceist-Baseline were removed by Ceist-Factivity. The bar chart in Figure 5.5 shows the recall values of 85% for rule 1 and 75% for rule 2; in other words, 15% and 25% respectively of the baseline's answerable questions were lost. This result can be interpreted as a setback, but some researchers believe we must aim for maximum precision at a cost to recall, i.e. quality over quantity. In my conclusions (Chapter 6) I refer to previous work which has taken this view.

Figure 5.5 Recall values for both rules


Chapter 6 Conclusions

6.1 Introduction

It was the intention of this project both to answer the research question and at the same time to provide some benefit to the QG community. I believe the work done has been largely successful in accomplishing both of these general aims. This chapter outlines some of the conclusions which can be drawn from the research.

6.2 Assessment of factive/non-factive recognition module

From the results of the first analysis it was found that over 20% of sentences in the educational discourse OpenLearn contained either a factive or non-factive verb or phrase. This is quite high and would justify further research in the area. I also measured the occurrence of factives or non-factives preceding another clause, and this was also high, accounting for over 6.5% of sentences.

One conclusion which Rus et al. (2007a) drew from evaluating their QG system was that it was better to generate a smaller number of good, precise questions rather than a larger number of questions containing lots of bad ones. They used precision as their metric to report performance.

There was an increase in precision for questions answerable by the input sentence when using factive / non-factive recognition. In the case of one rule, all unanswerable questions were removed. The drawback is the occurrence of false positives, i.e. questions which are actually answerable by the input sentence but were incorrectly removed by factive / non-factive recognition.

I believe that a system should be designed primarily to produce high-precision results and then improved to include valid output which has been incorrectly rejected. Based on this belief I would rate the module's performance as quite good. There is room for improvement, as I outline below, but it did perform adequately.

6.3 Further work

The work done relating to factivity to date is very black and white: researchers have been concerned with determining whether the content clause in a statement is presupposed or not. I believe an approach to measuring factivity would be a progression on the current body of knowledge. The speaker is more confident that there were 10 people at the conference in (3) below than in (4), yet current research only tells us that both ‘be sure’ and ‘be possible’ are non-factive.

(3) I am sure that there were 10 people at the conference.

(4) It was possible that there were 10 people at the conference.

It would be more useful if an NLP system had more choice in the level of confidence acceptable to it. If the phrase ‘be sure’ indicates a level of confidence higher than ‘be possible’ then this should be measurable and thus selectable by an NLP system. A stochastic approach could analyse these phrases to ascertain the probability that, given for example that the speaker is sure about a proposition, the proposition is in fact true.
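As a very rough sketch of what such a stochastic approach might look like (the annotated examples, numbers and threshold below are entirely invented for illustration), one could estimate, for each predicate, the proportion of annotated occurrences in which the embedded proposition turned out to be true, and treat that proportion as a graded factivity score:

    from collections import defaultdict

    # Hypothetical annotations: (predicate phrase, was the embedded proposition true?).
    # In practice these judgements would come from a large annotated discourse.
    annotations = [
        ("be sure", True), ("be sure", True), ("be sure", True), ("be sure", False),
        ("be possible", True), ("be possible", False), ("be possible", False),
        ("know", True), ("know", True), ("know", True),
    ]

    counts = defaultdict(lambda: [0, 0])   # predicate -> [times true, total]
    for predicate, was_true in annotations:
        counts[predicate][0] += int(was_true)
        counts[predicate][1] += 1

    # Graded factivity score: estimated P(proposition is true | predicate used).
    scores = {p: t / n for p, (t, n) in counts.items()}
    print(scores)   # e.g. {'be sure': 0.75, 'be possible': 0.33..., 'know': 1.0}

    # A QG system could then choose its own confidence threshold rather than
    # relying on a binary factive / non-factive decision.
    threshold = 0.7
    accepted = [p for p, s in scores.items() if s >= threshold]
    print(accepted)   # -> ['be sure', 'know']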



References

Brill, E. (1992) ‘A simple rule-based part-of-speech tagger’, in Proceedings of the Third Conference on Applied Natural Language Processing, Trento, Italy, pp. 152-155.

Brown, J.C., Frishkoff, G.A. and Eskenazi, M. (2005) ‘Automatic Question Generation for Vocabulary Assessment’, in Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, Vancouver, BC, Canada October 6-8 2005, Morristown, NJ, USA, Association for Computational Linguistics, pp. 819-826.

Cai, Z., Rus, V., Kim, H.J., Susarla, S.C., Karnam, P. and Graesser, A.C. (2006) ‘NLGML: A Markup Language for Question Generation’, in Proceedings of World Conference on E-Learning in Corporate, Government, Healthcare, and Higher Education, Honolulu, Hawaii, USA October 13-17 2006, Chesapeake, VA, USA, Association for the Advancement of Computing in Education, pp. 2747-2752.

Doran, C., Egedi, D., Hockey, B.A., Srinivas, B. and Zaidel, M. (1994) ‘XTAG System - A Wide Coverage Grammar for English’, in Proceedings of the fifteenth International Conference on Computational Linguistics, Kyoto, Japan August 5-9 1994, Morristown, NJ, USA, Association for Computational Linguistics, pp. 922-928.

Fellbaum, C. (1998) WordNet: An Electronic Lexical Database, Cambridge, MA, USA, MIT Press.

Gates, D. (2008) ‘Generating Look-Back Strategy Questions from Expository Texts’, Workshop on the Question Generation Shared Task and Evaluation Challenge, Arlington, VA, USA September 25-26 2008.

Gates, D. (January 6 2009) ‘Generating Questions from Text’, e-mail to B. Wyse.

Hooper, J. (1974) ‘On assertive predicates’, in Kimball, J. (ed.) Syntax and Semantics, vol. 4, pp. 91-124, New York, Academic Press.

Huddleston, R. and Pullum, G.K. (2005) A Student’s Introduction to English Grammar, New York, Cambridge University Press.

Jurafsky, D. and Martin, J.H. (2009) Speech and Language Processing: An Introduction to Natural Language Processing, Speech Recognition, and Computational Linguistics, Second Edition, Upper Saddle River, NJ, USA, Prentice-Hall.

Kaisser, M. and Becker, T. (2004) ‘Question Answering by Searching Large Corpora with Linguistic Methods’, in Proceedings of the thirteenth Text REtrieval Conference, TREC 2004, Gaithersburg, MD, USA November 16-19 2004, Gaithersburg, MD, USA, National Institute of Standards and Technology.

Karp, D., Schabes, Y., Zaidel, M. and Egedi, D. (1992) ‘A Freely Available Wide Coverage Morphological Analyzer for English’, in Proceedings of the fourteenth International Conference on Computational Linguistics, Nantes, France August 23-28 1992, Morristown, NJ, USA, Association for Computational Linguistics.

Karttunen, L. and Wittenburg, K. (1983) ‘A two-level morphological analysis of English’, Texas Linguistic Forum, vol. 22, pp. 217-228.

Kiparsky, P. and Kiparsky, C. (1970) ‘Fact’, in Bierwisch, M. and Heidolph, K.E. (eds.) Progress in Linguistics, pp. 143-173, The Hague, Mouton.

Klein, D. and Manning, C. (2003) ‘Fast Exact Inference with a Factored Model for Natural Language Parsing’, Advances in Neural Information Processing Systems, vol. 15, pp. 3-10.

Levy, R. and Andrew, G. (2006) ‘Tregex and Tsurgeon: tools for querying and manipulating tree data structures’, in Proceedings of the fifth International Conference on Language Resources and Evaluation, LREC 2006, Genoa, Italy May 24-26 2006, pp. 2231-2234.

Lindberg, C. (2004) The Oxford American Writer’s Thesaurus, New York, Oxford University Press.

Mitkov, R. and Ha, L.A. (2003) ‘Computer-Aided Generation of Multiple-Choice Tests’, in Proceedings of the HLT-NAACL 03 Workshop on Building Educational Applications Using Natural Language Processing - Volume 2, Edmonton, Canada May 31 2003, Morristown, NJ, USA, Association for Computational Linguistics, pp. 17-22.

Nielsen, R.D., Buckingham, J., Knoll, G., Marsh, B. and Palen, L. (2008) ‘A Taxonomy of Questions for Question Generation’, Workshop on the Question Generation Shared Task and Evaluation Challenge, Arlington, VA, USA September 25-26 2008.

Piwek, P., Prendinger, H., Hernault, H. and Ishizuka, M. (2008) ‘Generating Questions: An Inclusive Characterization and a Dialogue-based Application’, Workshop on the Question Generation Shared Task and Evaluation Challenge, Arlington, VA, USA September 25-26 2008.

Porter, M.F. (1980) ‘An algorithm for suffix stripping’, Program, vol. 14(3), pp. 130-137.

Rus, V., Cai, Z. and Graesser, A.C. (2007a) ‘Experiments on Generating Questions About Facts’, Computational Linguistics and Intelligent Text Processing, vol. 4394, pp. 444-455.

Rus, V., Cai, Z. and Graesser, A.C. (2007b) ‘Evaluation in Natural Language Generation: The Question Generation Task’, in Workshop on Shared Tasks and Comparative Evaluation in Natural Language Generation, Arlington, VA, USA April 20-21.

Rus, V. and Graesser, A.C. (2009) The Question Generation Shared Task and Evaluation Challenge.

Silveira, N. (2008) ‘Towards a Framework for Question Generation’, Workshop on the Question Generation Shared Task and Evaluation Challenge, Arlington, VA, USA September 25-26 2008.

Soanes, C. and Stevenson, A. (2005a) ‘factive adjective’, in The Oxford Dictionary of English, Revised Edition, Oxford Reference Online, viewed 3rd January 2010, http://www.oxfordreference.com/views/ENTRY.html?subview=Main&entry=t140.e26573

Soanes, C. and Stevenson, A. (2005b) ‘non-factive adjective’, in The Oxford Dictionary of English, Revised Edition, Oxford Reference Online, viewed 3rd January 2010, http://www.oxfordreference.com/views/ENTRY.html?subview=Main&entry=t140.e52456

Tapanainen, P. and Järvinen, T. (1997) ‘A non-projective dependency parser’, in Proceedings of the fifth Conference on Applied Natural Language Processing, Washington DC, USA March 31 - April 3, San Francisco, CA, USA, Morgan Kaufmann, pp. 64-71.

Van Rijsbergen, C.J. (1976) Information Retrieval, London, Butterworths.

Wang, W., Hao, T. and Liu, W. (2008) ‘Automatic Question Generation for Learning Evaluation in Medicine’, in Advances in Web Based Learning - ICWL 2007, Edinburgh, UK, August 15-17, 2007, New York, USA, Springer-Verlag.

Wyse, B. and Piwek, P. (2009) ‘Generating Questions from OpenLearn Study Units’, in Rus, V. and Lester, J. (eds.) Proceedings of the 2nd Workshop on Question Generation, AIED 2009 Workshop Proceedings, pp. 66-73.


Index

anaphora, 7
Ceist, 15, 52
factivity, 38, 68, 80
intelligent tutoring systems (ITS), 2
lemmatisation, 36
mapping, 29, 60
mark-up, 17, 34, 64
OpenLearn, 45, 53, 71
parsing, 23
POS tagging, 21
precision, 48, 76
prototyping, 43
QG systems, 16
QG task, 4
recall, 48, 76
regular expression, 25, 56
Shared Task and Evaluation Campaign (STEC), 3
sub-tasks, 19
Term extraction, 24
Tregex, 27, 63
WordNet, 27


Appendix A – Extended Abstract

Factive / non-factive predicate recognition within Question Generation systems

Brendan Wyse

Extended Abstract of Open University MSc Dissertation Submitted 9 March 2010

Introduction

Question Generation (QG) is a relatively new field of study in linguistics and computing. A QG system takes plain text as an input and generates questions relating to that free text, such as in the figure below.

One key area where these systems are used is in Intelligent Tutoring Systems (ITS). These systems go beyond simply delivering educational content; they also interact with the student. One manner in which they interact is by asking questions.


I focused on a particular area of QG: the generation of questions from a single sentence where the answer to the question is contained in the sentence. Although declarative content clauses in single sentences always make statements, these statements may not actually be an established fact. In the figure below, the speaker in (2) is more certain about the number of attendees than the speaker in (1).

This is important because if a QG system is to ask a question about a sentence, where the answer is in that sentence, it must know what has been established as fact in the sentence. The key to solving this problem is recognising the words which decide the factivity of the declarative content clause: factive and non-factive predicates (bold type in the figure above). I developed a QG system, Ceist, for this research and used it to assess a software module capable of recognising factive and non-factive words within input sentences.

Method

To introduce factive / non-factive recognition to Ceist, a list was drawn up of words which may indicate factivity. This list was formed by expanding work from an existing researcher with the aid of a thesaurus.

The list also allowed some impact analysis to be carried out on a new data source, a parsed version of the online educational resource OpenLearn created specifically for this research.

A second analysis sought to measure any improvement in generated question quality produced by factive / non-factive recognition. This was done by comparing a baseline Ceist system with a system incorporating the new functionality.

Results

The data set consisting of factive and non-factive indicating words and phrases is a comprehensive list. It was converted to regular expressions and matched against a subset of the parsed OpenLearn study units to determine the frequency of such words and phrases. The following table shows the frequency counts.

Item                                        Count    Percentage
Total sentences                              4673       100%
Containing factive phrase                     362       7.75%
Containing non-factive phrase                 655      14.01%
Containing factive or non-factive phrase     1017      21.76%

The tabular data above is also represented in the pie chart below. As can be seen from the pie chart, the proportion of sentences in educational discourse containing either a factive or non-factive verb or phrase is high. Over 20% of sentences contained at least one term which could be classed as factive or non-factive.


A second test examined the case where the terms immediately preceded a complement clause. This would be the case for declarative content clauses such as that-clauses (e.g. I know that there were 10 people at the conference.). The results for this test are contained in the following table.

Item                                        Count    Percentage
Total sentences                              4673       100%
Containing factive phrase                     106       2.27%
Containing non-factive phrase                 206       4.41%
Containing factive or non-factive phrase      312       6.68%

As would be expected, given the restriction on accepted matches, the number of matching sentences was reduced to just over 6.5%, with 4.4% containing non-factive terms. This could be taken to indicate that questions generated from these 4.4% of sentences would be potentially flawed if they assume the proposition in the content clause contains a fact and thus an answer.
thus an answer.


The second analysis focused on measuring the benefit in generated question quality which a factive / non-factive recognition module would deliver to a QG system. Output from a baseline QG system was compared to output from a QG system with factivity enabled. The proportion of good questions (precision) and any degradation in system performance (recall) were measured. This was done for two different question generation rules and the data are presented in the table below.

Rule 1 - YES/NO

                   Answerable by sentence    Count    Precision    Recall
Ceist-Baseline     YES                          20        48%        N/A
                   NO                           22
Ceist-Factivity    YES                          17        71%        85%
                   NO                            7

Rule 2 - WHAT

                   Answerable by sentence    Count    Precision    Recall
Ceist-Baseline     YES                          28        54%        N/A
                   NO                           24
Ceist-Factivity    YES                          21       100%        75%
                   NO                            0

Precision is the proportion of answerable questions generated and has increased in both cases. Recall was used to measure degradation in system performance: it indicates the removal by Ceist-Factivity of questions that had been acceptable in Ceist-Baseline. There were some perfectly good questions which were not accepted by the system with factive / non-factive recognition.


Analysis

The frequency at which factive and non-factive verbs and phrases are used in educational discourse is quite high: around 20% of all sentences contain at least one such factivity indicating term. Even focusing only on sentences where a non-factive immediately precedes a new clause accounted for 4.4% of all sentences.

The increase in question quality was high for both test rules. In the test using the first rule, question quality went from less than half of the generated questions being acceptable to over 70%. In the test using the second rule, precision jumped from just over 50% to 100%: no unanswerable questions were generated.

In both cases there was a negative impact where the factive / non-factive recognition module actually removed some answerable questions. This was captured by the recall values of 85% and 75% respectively.

Discussion

I believe that the results of this research show that factive / non-factive recognition is an area of NLP which has a part to play in question generation. This is definitely the case when systems need to engage in dialogue. Systems will need to move beyond simple grammatical structure and semantics and start to extract and work with some of the intricate details in language, such as factivity. This is important if systems are to begin to ask the most appropriate questions.

Over 20% of sentences in OpenLearn contain some factive or non-factive indicating verb. Despite this high usage frequency, I was not able to find any NLP tool capable of telling me that ‘I know’ expresses a lot more certainty about a subject than ‘I think’. I see this as an important part of future work on question generation.

It was expected that the comprehensive list of factives and non-factives would indeed eliminate many of the unanswerable questions generated. It was also expected that there would be some false positives. This is because some work is needed to formally establish boundaries for factive and non-factive verbs. My own list was simply an original list extended by thesaurus.

If I were to extend this work further I would attempt to assign a factivity value to each verb, based on some formal assessment of its usage in a large discourse. A verb would be assigned a numeric value indicative of the certainty that the statement following it is absolutely a fact. A tool capable of distinguishing between ‘I know’ and ‘I think’ at a software level would then be a possibility.


Appendix B – List of factive and non-factive predicates

The following is a list of the words and phrases used to evaluate the impact of factive / non-factive recognition. An asterisk indicates a word taken directly from Hooper’s list. The term following ‘<’ indicates which of Hooper’s words was used to derive the new word or phrase in the list.

Factive

Assertive (semi-factive)

find out found out * come to know came to know < found out
discover discovered * bring to light brought to light < found out
know knew * discern discerned < found out
learn learnt * unearth unearthed < found out
learned cotton on cottoned on < found out
note noted * catch on caught on < found out
notice noticed * twig twigged < found out
observe observed * be aware was aware < know
perceive perceived * be conscious was conscious < know
realize realized * be informed was informed < know
realise realised < localisation sense sensed < know
recall recalled * hear heard < learn
remember remembered * understand understood < learn
reveal revealed * establish established < learn
see saw * suss out sussed out < learn
recollect recollected < remember spot spotted < noticed
ascertain ascertained < discover make out made out < perceive
figure out figured out < discover grasp grasped < perceive
work out worked out < discover take in took in < perceive
fathom fathomed < discover find found < perceive
recognise recognised < discover register registered < realize
recognize recognised < localisation get the message got the message < realize
become aware became aware < found out tell told < reveal
detect detected < found out let slip let slip < reveal
expose exposed < found out let drop let drop < reveal
disclose disclosed < found out give away gave away < reveal
get to know got to know < found out determine determined < see


Non-factive

Strong assertive

acknowledge acknowledged * mention mentioned *
admit admitted * point out pointed out *
affirm affirmed * predict predicted *
allege alleged * prophesy prophesied *
answer answered * postulate postulated *
argue argued * remark remarked *
assert asserted * reply replied *
assure assured * report report *
certify certified * say said *
charge charged * state stated *
claim claimed * suggest suggested *
contend contended * swear swore *
declare declared * testify testified *
divulge divulged * theorize theorized
emphasize emphasized * theorise theorised < localisation
emphasise emphasised < localisation verify verified *
explain explained * vow vowed *
grant granted * write wrote *
guarantee guaranteed * accept accepted < acknowledge
hint hinted * concede conceded < acknowledge
hypothesize hypothesized ? Not in dictionary confess confessed < acknowledge
hypothesise hypothesised < localisation proclaim proclaimed < affirm
imply implied * pledge pledged < affirm
indicate indicated * give an undertaking gave an undertaking < affirm
insist insisted * respond responded < answer
intimate intimated * retort retorted < answer
maintain maintained * announce announced < assert

(Continued)


Non-factive

Strong assertive

retort retorted < answer forecast forecasted < predict
announce announced < assert foresee foresaw < predict
pronounce pronounced < assert anticipate anticipated < predict
avow avowed < assert tell in advance told in advance < predict
ensure ensured < assure envision envisioned < predict
confirm confirmed < assure foretell foretold < prophesy
promise promised < assure forewarn of forewarned of < prophesy
attest attested < certify prognosticate prognosticated < prophesy
provide evidence provided evidence < certify propose proposed < postulate
give proof gave proof < certify assume assumed < postulate
prove proved < certify presuppose presupposed < postulate
demonstrate demonstrated < certify take for granted took for granted < postulate
profess professed < claim utter uttered < say
communicate communicated < divulge recommend recommended < suggest
publish published < divulge advise advised < suggest
stress stressed < emphasized speculate speculated < theorise
highlight highlighted < emphasized justify justified < verify
press home pressed home < emphasized validate validated < verify
make clear made clear < explain authenticate authenticated < verify
describe described < explain record recorded < write
spell out spelt out < explain log logged < write
allow allowed < grant list listed < write
appreciate appreciated < grant scribble scribbled < write
insinuate insinuated < hint scrawl scrawled < write
signal signalled < hint agree agreed *
mean meant < hint be afraid was afraid *
say indirectly said indirectly < imply be certain was certain *
convey the impression conveyed the impression < imply be sure was sure *
make known made known < indicate be clear was clear *
reiterate reiterated < insist be obvious was obvious *
make public made public < intimate be evident was evident *

(Continued)


Non-factive

Strong assertive

identify identified < point out be indisputable was indisputable < be clear
decide decided * be beyond doubt was beyond doubt < be clear
deduce deduced * be beyond question was beyond question < be clear
estimate estimated * be blatant was blatant < be clear
hope hoped * be apparent was apparent < be obvious
presume presumed * be noticeable was noticeable < be evident
surmise surmised * reckon reckoned < calculate
suspect suspected * elect elected < decide
concur concurred < agree choose chose < decide
be fearful was fearful < be afraid opt opted < decide
be frightened was frightened < be afraid reason reasoned < deduce
be scared was scared < be afraid infer inferred < deduce
be alarmed was alarmed < be afraid glean gleaned < deduce
be petrified was petrified < be afraid judge judged < estimate
be terrified was terrified < be afraid gauge gauged < estimate
be confident was confident < be certain approximate approximated < estimate
be positive was positive < be certain desire desired < hope
be convinced was convinced < be certain wish wished < hope
be satisfied was satisfied < be certain fancy fancied < surmise
be in no doubt was in no doubt < be certain calculate calculated *
be crystal clear was crystal clear < be clear specify specified < point out
be unmistakable was unmistakable < be clear


Non-factive

Weak assertive Non-assertive (non-negative) Non-assertive (negative)

think thought * be likely was likely be unlikely was unlikely
believe believed * be possible was possible be impossible was impossible
suppose supposed * be probable was probable be improbable was improbable
expect expected * be conceivable was conceivable be inconceivable was inconceivable
imagine imagined * doubt doubted
guess guessed * deny denied
seem seemed *
appear appeared *
figure figured *
feel felt < think
sense sensed < suppose
trust trusted < suppose
forecast forecasted < expect
visualize visualized < imagine
visualise visualised < localisation
envisage envisaged < imagine
picture pictured < imagine
conceive conceived < imagine
conceptualize conceptualized < imagine
conceptualise conceptualised < localisation
emerge emerged < appear
surface surfaced < appear
become apparent became apparent < appear
become evident became evident < appear
gather gathered < figured
