
HELSINKI UNIVERSITY OF TECHNOLOGY
Department of Computer Science and Engineering
Software Business and Engineering Institute

Juha Rantanen

Acceptance Test-Driven Development with Keyword-Driven
Test Automation Framework in an Agile Software Project

Master's Thesis
Espoo, May 18, 2007

Supervisor: Professor Tomi Männistö
Instructor: Harri Töhönen, M.Sc.


HELSINKI UNIVERSITY OF TECHNOLOGY
Department of Computer Science and Engineering
ABSTRACT OF MASTER'S THESIS

Author: Juha Rantanen
Title of thesis: Acceptance Test-Driven Development with Keyword-Driven Test Automation Framework in an Agile Software Project
Professorship: Computer Science
Professorship Code: T-76
Supervisor: Professor Tomi Männistö
Instructor: Harri Töhönen, M.Sc.
Date: May 18, 2007
Pages: 102

Agile software development uses iterative development, allowing periodic changes and updates to the software requirements. In agile software development methods, customer-defined tests have an important role in assuring that the software fulfills the customer's needs. These tests can be defined before implementation to establish a clear goal for the development team. This is called acceptance test-driven development (ATDD).

With ATDD the acceptance tests are usually automated. Keyword-driven testing is the latest evolution in test automation approaches. In keyword-driven testing, instructions, inputs, and expected outputs are defined in separate test data. A test automation framework tests the software accordingly and reports the results.

In this thesis, the use of acceptance test-driven development with a keyword-driven test automation framework is studied in a real-world agile software development project. The study was conducted using action research during a four-month period. The main methods used were observations and interviews.

It was noticed that the keyword-driven test automation framework can be used in acceptance test-driven development. However, there were some limitations preventing the implementation of all the test cases before the software implementation started. It was also noticed that the test automation framework used to implement the acceptance test cases does not play a crucial role in acceptance test-driven development. The biggest benefits were gained from the detailed planning done before the software implementation at the beginning of the iterations.

Based on the results, acceptance test-driven development improves communication and cooperation, and gives a common understanding of the details of the software's features. These improvements help the development team to implement the wanted features. Therefore, the risk of building incomplete software decreases. The improvements also help to implement the features more efficiently, as the features are more likely to be implemented correctly the first time. Remarkable changes to the test engineers' role were also noticed, as the test engineers become more involved in the detailed planning. It seems that the biggest challenge in acceptance test-driven development is creating tests at the right test levels and in the right scope.

Keywords: acceptance test-driven development, keyword-driven testing, agile testing, test automation



HELSINKI UNIVERSITY OF TECHNOLOGY (TEKNILLINEN KORKEAKOULU)
Department of Computer Science and Engineering (Tietotekniikan osasto)
ABSTRACT OF MASTER'S THESIS (DIPLOMITYÖN TIIVISTELMÄ)

Author: Juha Rantanen
Title of thesis: Hyväksymistestauslähtöinen kehitys avainsanaohjatulla testiautomaatiokehyksellä ketterässä ohjelmistoprojektissa (Acceptance test-driven development with a keyword-driven test automation framework in an agile software project)
Professorship: Software Business and Engineering (Ohjelmistoliiketoiminta ja tuotanto)
Professorship Code: T-76
Supervisor: Professor Tomi Männistö
Instructor: Harri Töhönen, M.Sc.
Date: May 18, 2007
Pages: 102

Agile software development is based on an iterative approach. Iteration makes it possible to change and update the software requirements periodically. In agile software development processes, customer-defined tests play an important role in ensuring that the software under development fulfills the customer's needs. These tests can be defined before implementation starts in order to establish a clear goal for the development team. This is called acceptance test-driven development.

In acceptance test-driven development the acceptance tests are often automated. One of the newest test automation approaches is keyword-driven testing. In keyword-driven testing, instructions, inputs, and expected outputs are defined in separate test data. A test automation framework tests the software according to this data and reports the results.

This thesis examines the use of a keyword-driven test automation framework in acceptance test-driven development. The subject of the study was an ongoing agile software development project. The approach used was action research, with observation and interviews as the main methods. The research period lasted four months.

The study found that a keyword-driven test automation framework can be used in acceptance test-driven development. Some limitations, however, prevented the tests from being created before the software implementation started. It was also found that the test automation framework used to create the test cases does not play a decisive role in acceptance test-driven development. The biggest benefits were gained from the detailed planning carried out before software implementation at the beginning of each iteration.

Based on the results, acceptance test-driven development improves communication and cooperation between the parties involved and their shared understanding of the details of the software's features. This helps in implementing the wanted features, and so the risk of producing software that does not work or works incorrectly decreases. It also supports more efficient software development, as the right features are more likely to be produced on the first implementation attempt. Significant changes were also observed in the testers' role, owing to their increased participation in the detailed planning. It seems that the biggest challenges in acceptance test-driven development relate to creating tests at the right test levels and in the right scope.

Keywords: acceptance test-driven development, keyword-driven testing, agile testing, test automation



ACKNOWLEDGEMENTS

This master's thesis was written for Qentinel, a Finnish software testing consultancy, during the years 2006 and 2007. I would like to thank all the Qentinelians who have made this possible.

Big thanks belong to my instructor Harri Töhönen for his interest, valuable feedback, and the time he spent listening to and commenting on my ideas.

I would like to express my gratitude to my supervisor Tomi Männistö, who gave advice and comments when they were needed.

I would like to thank Petri Haapio and Pekka Laukkanen, with whom I have been working and who have given valuable ideas, comments, and feedback. The discussions with these two professionals have improved my know-how about agile software development and test automation. That know-how has been priceless during this work.

I also wish to thank all the members of the project where the research was carried out. It has been very rewarding to work with them.

My good friend Pauli Aho also deserves to be thanked. I am deeply indebted to him for using his time to check the language of this thesis.

Finally, special thanks go to my lovely wife Aino for the help and support I received during this project. I am grateful to her for being so patient.



TABLE OF CONTENTS

TERMS

1 INTRODUCTION
1.1 Motivation
1.2 Aim of the Thesis
1.3 Structure of the Thesis

2 TRADITIONAL TESTING
2.1 Purpose of Testing
2.2 Dynamic and Static Testing
2.3 Functional and Non-Functional Testing
2.4 White-Box and Black-Box Testing
2.5 Test Levels

3 AGILE AND ITERATIVE SOFTWARE DEVELOPMENT
3.1 Iterative Development Model
3.2 Agile Development
3.3 Scrum
3.4 Extreme Programming
3.5 Scrum and Extreme Programming Together
3.6 Measuring Progress in Agile Projects

4 TESTING IN AGILE SOFTWARE DEVELOPMENT
4.1 Purpose of Testing
4.2 Test Levels
4.3 Acceptance Test-Driven Development

5 TEST AUTOMATION APPROACHES
5.1 Test Automation
5.2 Evolution of Test Automation Frameworks
5.3 Keyword-Driven Testing

6 KEYWORD-DRIVEN TEST AUTOMATION FRAMEWORK
6.1 Keyword-Driven Test Automation Framework
6.2 Test Data
6.3 Test Execution
6.4 Test Reporting

7 EXAMPLE OF ACCEPTANCE TEST-DRIVEN DEVELOPMENT WITH KEYWORD-DRIVEN TEST AUTOMATION FRAMEWORK
7.1 Test Data between User Stories and System under Test
7.2 User Stories
7.3 Defining Acceptance Tests
7.4 Implementing Acceptance Tests and Application

8 ELABORATED GOALS OF THE THESIS
8.1 Scope
8.2 Research Questions

9 RESEARCH SUBJECT AND METHOD
9.1 Case Project
9.2 Research Method
9.3 Data Collection



10 ACCEPTANCE TEST-DRIVEN DEVELOPMENT WITH KEYWORD-DRIVEN TEST AUTOMATION FRAMEWORK IN THE PROJECT UNDER STUDY
10.1 Development Model and Development Practices Used in the Project
10.2 January Sprint
10.3 February Sprint
10.4 March Sprint
10.5 April Sprint
10.6 Interviews

11 ANALYSES OF OBSERVATIONS
11.1 Suitability of the Keyword-Driven Test Automation Framework with Acceptance Test-Driven Development
11.2 Use of the Keyword-Driven Test Automation Framework with Acceptance Test-Driven Development
11.3 Benefits, Challenges and Drawbacks of Acceptance Test-Driven Development with Keyword-Driven Test Automation Framework
11.4 Good Practices

12 DISCUSSION AND CONCLUSIONS
12.1 Researcher's Experience
12.2 Main Conclusions
12.3 Validity
12.4 Evaluation of the Thesis
12.5 Further Research Areas

BIBLIOGRAPHY

APPENDIX A PRINCIPLES BEHIND THE AGILE MANIFESTO
APPENDIX B INTERVIEW QUESTIONS



TERMS

Acceptance Criteria: The exit criteria that a component or system must satisfy in order to be accepted by a user, customer, or other authorized entity. (IEEE Std 610.12-1990)
Acceptance Testing: Formal testing with respect to user needs, requirements, and business processes conducted to determine whether or not a system satisfies the acceptance criteria and to enable the user, customers or other authorized entity to determine whether or not to accept the system. (IEEE Std 610.12-1990) See also component testing, integration testing and acceptance testing.
Acceptance Test-Driven Development (ATDD): A way of developing software where the acceptance test cases are developed, and often automated, before the software is developed to run those test cases. See also test-driven development.
Actual Result: The behavior produced/observed when a component or system is tested. (ISTQB 2006)
Agile Testing: Testing practice for a project using agile methodologies, such as extreme programming (XP), treating development as the customer of testing and emphasizing the test-first design paradigm. (ISTQB 2006) See also test-driven development and acceptance test-driven development.
Base Keyword: Keyword implemented in a test library of a keyword-driven test automation framework. (Laukkanen 2006) See also sentence format keyword and user keyword.
Behavior: The response of a component or system to a set of input values and preconditions. (ISTQB 2006)
Bespoke Software: Software developed specifically for a set of users or customers. The opposite is off-the-shelf software. (ISTQB 2006)
Beta Testing: Operational testing by potential and/or existing users/customers at an external site not otherwise involved with the developers, to determine whether or not a component or system satisfies the user/customer needs and fits within the business processes. Beta testing is often employed as a form of external acceptance testing for off-the-shelf software in order to acquire feedback from the market. (ISTQB 2006)
Black-Box Testing: Testing, either functional or non-functional, without reference to the internal structure of the component or system. (ISTQB 2006) See also white-box testing.
Bug: See defect.
Capture/Playback Tool: A type of test execution tool where inputs are recorded during manual testing in order to generate automated test scripts that can be executed later (i.e. replayed). These tools are often used to support automated regression testing. (ISTQB 2006)
Component: A minimal software item that can be tested in isolation. (ISTQB 2006)



Component Testing: The testing of individual software components. (IEEE Std 610.12-1990)
Context-Driven Testing: A testing methodology that underlines the importance of the context where different testing practices are used over the practices themselves. The main message is that there are good practices in a context but there are no general best practices. (Kaner et al. 2001a)
Daily Build: A development activity where a complete system is compiled and linked every day (usually overnight), so that a consistent system is available at any time including all latest changes. (ISTQB 2006)
Data-Driven Testing: A scripting technique that stores test input and expected results in a table or spreadsheet, so that a single control script can execute all of the tests in the table. Data-driven testing is often used to support the application of test execution tools such as capture/playback tools. (Fewster & Graham 1999) See also keyword-driven testing.
Defect: A flaw in a component or system that can cause the component or system to fail to perform its required function, e.g. an incorrect statement or data definition. A defect, if encountered during execution, may cause a failure of the component or system. (ISTQB 2006)
Defined Process: In a defined process every piece of work is well understood. With well defined input, the defined process can be started and allowed to run until completion, ending with the same results every time. (Schwaber & Beedle 2002) See also empirical process.
Dynamic Testing: Testing that involves the execution of the software of a component or system. (ISTQB 2006) See also static testing.
Empirical Process: In an empirical process the unexpected is expected. An empirical process provides and exercises control through frequent inspection and adaptation in imperfectly defined environments where unpredictable and unrepeatable outputs are generated. (Schwaber & Beedle 2002) See also defined process.
Expected Outcome: See expected result.
Expected Result: The behavior predicted by the specification, or another source, of the component or system under specified conditions. (ISTQB 2006)
Exploratory Testing: An informal test design technique where the tester actively controls the design of the tests as those tests are performed and uses information gained while testing to design new and better tests. (Bach 2003b)
Fail: A test is deemed to fail if its actual result does not match its expected result. (ISTQB 2006)
Failure: Deviation of the component or system from its expected delivery, service or result. (Fenton 1996)
Fault: See defect.



Feature: An attribute of a component or system specified or implied by requirements documentation (for example reliability, usability or design constraints). (IEEE Std 1008-1987)
Feature Creep: On-going requirements increase without corresponding adjustment of approved cost and schedule allowances. As some projects progress, especially through the definition and development phases, requirements tend to change incrementally, causing the project manager to add to the project's mission or objectives without getting a corresponding increase in the time and budget allowances. (Wideman 2002)
Functional Testing: Testing based on an analysis of the specification of the functionality of a component or system. (ISTQB 2006) See also black-box testing.
Functionality: The capability of the software product to provide functions which meet stated and implied needs when the software is used under specified conditions. (ISO/IEC Std 9126-1:2001)
High Level Test Case: A test case without concrete (implementation level) values for input data and expected results. Logical operators are used; instances of the actual values are not yet defined and/or available. (ISTQB 2006) See also low level test case.
Input: A variable (whether stored within a component or outside) that is read by a component. (ISTQB 2006)
Input Value: An instance of an input. (ISTQB 2006) See also input.
Information Radiator: A large display of critical team information that is continuously updated and located in a spot where the team can see it constantly. (Agile Advice 2005)
Integration Testing: Testing performed to expose defects in the interfaces and in the interactions between integrated components or systems. (ISTQB 2006) See also component testing, system testing and acceptance testing.
Iterative Development Model: A development life cycle where a project is broken into a usually large number of iterations. An iteration is a complete development loop resulting in a release (internal or external) of an executable product, a subset of the final product under development, which grows from iteration to iteration to become the final product. (ISTQB 2006)
Keyword: A directive representing a single action in keyword-driven testing. (Laukkanen 2006)
Keyword-Driven Test Automation Framework: A test automation framework using the keyword-driven testing technique.
Keyword-Driven Testing: A scripting technique that uses data files to contain not only test data and expected results, but also keywords related to the application being tested. The keywords are interpreted by special supporting scripts that are called by the control script for the test. (ISTQB 2006) See also data-driven testing.



Low Level Test Case: A test case with concrete (implementation level) values for input data and expected results. Logical operators from high level test cases are replaced by actual values that correspond to the objectives of the logical operators. (ISTQB 2006) See also high level test case.
Negative Testing: Tests aimed at showing that a component or system does not work. Negative testing is related to the testers' attitude rather than a specific test approach or test design technique, e.g. testing with invalid input values or exceptions. (Beizer 1990)
Non-Functional Testing: Testing the attributes of a component or system that do not relate to functionality, e.g. reliability, efficiency, usability, maintainability and portability. (ISTQB 2006)
Off-the-Shelf Software: A software product that is developed for the general market, i.e. for a large number of customers, and that is delivered to many customers in identical format. (ISTQB 2006)
Output: A variable (whether stored within a component or outside) that is written by a component. (ISTQB 2006)
Output Value: An instance of an output. (ISTQB 2006) See also output.
Pass: A test is deemed to pass if its actual result matches its expected result. (ISTQB 2006)
Postcondition: Environmental and state conditions that must be fulfilled after the execution of a test or test procedure. (ISTQB 2006)
Precondition: Environmental and state conditions that must be fulfilled before the component or system can be executed with a particular test or test procedure. (ISTQB 2006)
Problem: See defect.
Quality: The degree to which a component, system or process meets specified requirements and/or user/customer needs and expectations. (IEEE Std 610.12-1990)
Quality Assurance: Part of quality management focused on providing confidence that quality requirements will be fulfilled. (ISO Std 9000-2005)
Regression Testing: Testing of a previously tested program following modification to ensure that defects have not been introduced or uncovered in unchanged areas of the software, as a result of the changes made. It is performed when the software or its environment is changed. (ISTQB 2006)
Requirement: A condition or capability needed by a user to solve a problem or achieve an objective that must be met or possessed by a system or system component to satisfy a contract, standard, specification, or other formally imposed document. (IEEE Std 610.12-1990)
Result: The consequence/outcome of the execution of a test. It includes outputs to screens, changes to data, reports, and communication messages sent out. See also actual result, expected result. (ISTQB 2006)



Running Tested Features (RTF): A metric to measure the progress of an agile team. (Jeffries 2004)
Sentence Format Keyword: A term defined in this thesis for a keyword whose name is a sentence and which does not take any arguments. See also base keyword and user keyword.
Software: Computer programs, procedures, and possibly associated documentation and data pertaining to the operation of a computer system. (IEEE Std 610.12-1990)
Software Quality: The totality of functionality and features of a software product that bear on its ability to satisfy stated or implied needs. (ISO/IEC Std 9126-1:2001)
Static Code Analysis: Analysis of source code carried out without execution of that software. (ISTQB 2006)
Static Testing: Testing of a component or system at specification or implementation level without execution of that software, e.g. reviews or static code analysis. (ISTQB 2006) See also dynamic testing.
System: A collection of components organized to accomplish a specific function or set of functions. (IEEE Std 610.12-1990)
System Testing: The process of testing an integrated system to verify that it meets specified requirements. (Burnstein 2003) See also component testing, integration testing and acceptance testing.
System Under Test (SUT): The entire system or product to be tested. (Craig and Jaskiel 2002)
Test: A set of one or more test cases. (IEEE Std 829-1983)
Test Automation: The use of software to perform or support test activities, e.g. test management, test design, test execution and results checking. (ISTQB 2006)
Test Automation Framework: A framework used for test automation. Provides some core functionality (e.g. logging and reporting) and allows its testing capabilities to be extended by adding new test libraries. (Laukkanen 2006)
Test Case: A set of input values, execution preconditions, expected results and execution postconditions, developed for a particular objective or test condition, such as to exercise a particular program path or to verify compliance with a specific requirement. (IEEE Std 610.12-1990)
Test Data: Data that exists (for example, in a database) before a test is executed, and that affects or is affected by the component or system under test. (ISTQB 2006)
Test-Driven Development (TDD): A way of developing software where the test cases are developed, and often automated, before the software is developed to run those test cases. (ISTQB 2006)
Test Execution: The process of running a test on the component or system under test, producing actual result(s). (ISTQB 2006)



Test Execution Automation: The use of software, e.g. capture/playback tools, to control the execution of tests, the comparison of actual results to expected results, the setting up of test preconditions, and other test control and reporting functions. (ISTQB 2006)
Test Engineer: See tester.
Test Input: The data received from an external source by the test object during test execution. The external source can be hardware, software or human. (ISTQB 2006)
Test Level: A group of test activities that are organized and managed together. A test level is linked to the responsibilities in a project. Examples of test levels are component test, integration test, system test and acceptance test. (Pol 2002)
Test Log: A chronological record of relevant details about the execution of tests. (IEEE Std 829-1983)
Test Logging: The process of recording information about tests executed into a test log. (ISTQB 2006)
Test Report: A document summarizing testing activities and results. (IEEE Std 829-1983)
Test Run: Execution of a test on a specific version of the test object. (ISTQB 2006)
Test Runner: A generic driver script capable of executing different kinds of test cases, not only variations with slightly different test data. (Laukkanen 2006)
Test Result: See result.
Test Script: Commonly used to refer to a test procedure specification, especially an automated one. (ISTQB 2006)
Test Set: See test suite.
Test Suite: A set of several test cases for a component or system under test, where the postcondition of one test is often used as the precondition for the next one. (ISTQB 2006)
Testability: The capability of the software product to enable modified software to be tested. (ISO/IEC Std 9126-1:2001)
Tester: A skilled professional who is involved in the testing of a component or system. (ISTQB 2006)
Testing: The process consisting of all life cycle activities, both static and dynamic, concerned with planning, preparation and evaluation of software products and related work products to determine that they satisfy specified requirements, to demonstrate that they are fit for purpose and to detect defects. (ISTQB 2006)
User Keyword: Keyword constructed from base keywords and other user keywords in a test design system. User keywords can be created easily even without programming skills. (Laukkanen 2006) See also base keyword and sentence format keyword.



Unit Testing: See component testing.
Variable: An element of storage in a computer that is accessible by a software program by referring to it by a name. (ISTQB 2006)
White-Box Testing: Testing based on an analysis of the internal structure of the component or system. (ISTQB 2006) See also black-box testing.



1 INTRODUCTION

1.1 Motivation

Quality is one of the most important aspects of software products. If software does not work, it is worth little. The drawbacks caused by faulty software can be much greater than the advantages gained from using it. Malfunctioning or difficult-to-use software can complicate daily life. In life-critical systems, faults may even cause the loss of human lives. In highly competitive markets, quality may determine which software product is going to succeed and which ones are going to fail. Low-quality software products have a negative impact on a firm's reputation and unquestionably also on sales. Unhappy customers are also more willing to change to other software suppliers. For these reasons, organizations have to invest in the quality of their software products.

Even high-quality software can fail in the market if it does not meet the customers' needs. At the beginning of a software project it is common that the customers' exact needs are unknown. This may lead to guessing the wanted features and to developing useless features, and in the worst case useless software. This should obviously be avoided.

New feature ideas usually arise when the customer understands the problem domain more thoroughly. This can be quite problematic if strict contractual agreements on the developed features exist. Even when it is contractually possible to add new features to the software, a lot of rework may be needed before the features are ready for use.

Iterative and especially agile software processes have been introduced as a solution to changing requirements. The basic idea in iterative processes is to create the software in small steps. When software is developed in this way, the customers can try out the developed software, and based on the customer's feedback the development team can create features that are valuable for the customer. The most valuable features are developed first, allowing the customer to start using the software earlier than with software developed in a non-iterative development process.

Iterative software development adds new challenges to software testing. In traditional software projects the main part of the testing is conducted at the end of the development project. With iterative and agile processes, however, the software should be tested in every iteration. If the customer uses the result of the iteration, at least all the major problems should be solved before the product can be delivered. In an ideal situation the outcome of each iteration would be high-quality software.



In the agile methods the need for testing is understood, and there are development practices that are used to assure the quality of the software. Many of these practices are targeted at developers and used to test that the code works as the developers have thought it should. To also test that the features fulfill the customer's requirements, higher-level testing is needed. This higher-level testing is often called acceptance testing or customer testing. Customer input is needed to define these higher-level test cases to make sure that the customer's requirements are met.

Because the software is developed in an iterative manner and there is continuous change, it would be beneficial to test all the features at least once during the iteration. Repeated testing is needed because the changes may have caused defects. Testing all functionality manually after every change is not possible. It may be possible at the beginning, but when the number of features rises, manual regression testing becomes harder and eventually impossible. This leads to a situation in which changes made late in the iteration may have caused faults that cannot be noticed in testing. And even if the faults could be noticed, developers may not be able to fix them during the iteration.

Test automation can be used to help the testing effort. Test automation means testing software with other software. When software and computers are used for testing, the test execution can be conducted much faster than manually. If the automated tests can be executed daily or even more often, the status of the developed software is continuously known. Therefore the problems can be found faster and the changes causing the problems can be pinpointed. That is why test automation is an integral part of agile software development.

By automating the customer-defined acceptance tests, the test cases defining how the system should work from the customer's point of view can be executed often. This makes it possible to know the status of the software at any point of the development. In acceptance test-driven development this approach is taken even further, and the acceptance tests are not only used for verifying that the system works but also for driving the system development. The customer-defined test cases are created before the implementation starts. The goal of the implementation is then to develop software that passes all the acceptance test cases.
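To make the idea concrete, the following sketch shows roughly what an automatable, customer-defined acceptance test can look like when expressed as keyword-driven test data, and how a very small interpreter could drive an application with it. The example is illustrative only: the keyword names, the LoginApp class, and the tabular format are invented for this sketch and do not describe the framework studied in this thesis, which is introduced in Chapters 6 and 7.

    # A hypothetical keyword-driven acceptance test: each row is a keyword
    # (an action or a check) followed by its arguments. Rows like these can
    # be written and reviewed with the customer before implementation starts.
    ACCEPTANCE_TEST = [
        ("Open Login Page",),
        ("Input Username", "alice"),
        ("Input Password", "secret"),
        ("Submit Login",),
        ("Page Should Contain", "Welcome, alice"),
    ]

    class LoginApp:
        """Stand-in for the system under test (invented for this sketch)."""

        def __init__(self):
            self.page = ""
            self.username = ""
            self.password = ""

        def open_login_page(self):
            self.page = "Login"

        def input_username(self, name):
            self.username = name

        def input_password(self, password):
            self.password = password

        def submit_login(self):
            # The implementation goal is to make this behave so the test passes.
            self.page = f"Welcome, {self.username}" if self.password else "Error"

    def check_page_contains(app, text):
        assert text in app.page, f"Expected page to contain {text!r}, got {app.page!r}"

    def run_test(app, rows):
        """Minimal keyword interpreter: map each keyword name to an action."""
        keywords = {
            "Open Login Page": app.open_login_page,
            "Input Username": app.input_username,
            "Input Password": app.input_password,
            "Submit Login": app.submit_login,
            "Page Should Contain": lambda text: check_page_contains(app, text),
        }
        for keyword, *args in rows:
            keywords[keyword](*args)
        return "PASS"

    if __name__ == "__main__":
        print(run_test(LoginApp(), ACCEPTANCE_TEST))

Before the feature is implemented, the test data already exists and the test fails; once the login behavior is in place, the same rows pass and remain available as automated regression tests.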



1.2 Aim of the Thesis

The aim of this thesis is to investigate whether acceptance test-driven development can be used with an in-house-built keyword-driven test automation framework. The research is conducted in a real-life agile software development project, and the suitability of the approach is evaluated in this case project. The pros and cons of the approach are also evaluated. More detailed research questions follow in Chapter 8, after the concepts of acceptance test-driven development and keyword-driven test automation have been clarified. One purpose is to present the framework usage at a level of detail that can help others to try the approach with similar kinds of tools.

1.3 Structure of the Thesis

The structure of this thesis is the following: in Chapter 2, traditional software testing is described to introduce the basic concepts needed in the following chapters. In Chapter 3, the basis of agile and iterative software development is described. Testing in agile software development is introduced in Chapter 4. Chapter 4 also covers acceptance test-driven development, which is the main topic of this thesis. Chapter 5 covers test automation approaches in general and the keyword-driven test automation approach in particular. After the keyword-driven approach is introduced, the keyword-driven test automation framework used in this thesis is explained in Chapter 6 at the level needed to understand the coming chapters. Chapter 7 contains a simple, fictitious example of the usage of the presented keyword-driven test automation framework with acceptance test-driven development.

The research questions are defined in Chapter 8. The case project and the product developed in it are described in Chapter 9. The research method used to conduct this research is also explained in Chapter 9. Chapter 10 contains all the results from the project. First the development model used in the case project is described. Then the use of acceptance test-driven development with the keyword-driven test automation framework is presented. Chapter 10 also contains results from the interviews which were conducted at the end of the research. In Chapter 11 the observations gained from the case project are analyzed. Chapter 12 contains the conclusions and the discussion about the results and the meaning of the analysis in a wider perspective. Further research areas are presented at the end of Chapter 12.



2 TRADITIONAL TESTING

This chapter describes traditional testing terminology and the divisions between different testing aspects. The purpose is to give an overall view of the testing field and to make it possible in the following chapters to compare agile testing to traditional testing and to place the research area in a wider context.

2.1 Purpose of Testing

Testing is an integral part of software development. The goal of software testing is to find faults in the developed software and to make sure they get fixed (Kaner et al. 1999, Patton 2000). It is important to find the faults as early as possible, because fixing them is more expensive in the later phases of development (Kaner et al. 1999, Patton 2000). The purpose of testing is also to provide information about the current state of the developed software from the quality perspective (Burnstein 2003). One might argue that software testing should make sure that the software works correctly. This is, however, impossible, because even a simple piece of software has millions of paths that would all have to be tested to make sure that it works correctly (Kaner et al. 1999).

2.2 Dynamic and Static Testing

On a high level, software testing can be divided into dynamic and static testing. The division into these two categories can be made based on whether the software is executed or not. Static testing means testing without executing the code. This can be done with different kinds of reviews; reviewed items can be documents or code. Other static testing methods are static code analysis methods, for example syntax correctness and code complexity analysis. With static testing, faults can be found in an early phase of software development because the testing can be started before any code is written. (IEEE Std 610.12-1990; Burnstein 2003)

Dynamic testing is the opposite of static testing. The system under test is tested by executing it or parts of it. Dynamic testing can be divided into functional testing and non-functional testing, which are presented below. (Burnstein 2003)

2.3 Functional and Non-Functional Testing

The purpose of functional testing is to verify that the software corresponds to the requirements defined for the system. The focus in functional testing is to enter inputs to the system under test and verify the proper output and state. The concept of functional testing is quite similar for all systems, even though the inputs and outputs differ from system to system.



Non-functional testing means testing the quality aspects of software. Examples of non-functional testing are performance, security, usability, portability, reliability, and memory management testing. Each kind of non-functional testing needs different approaches and different kinds of know-how and resources. The needed non-functional testing is always decided based on the quality attributes of the system and is therefore selected case by case. (Burnstein 2003)

2.4 White-Box and Black-Box Testing

There are two basic testing strategies, white-box testing and black-box testing. When the white-box strategy is used, the internal structure of the system under test is known. The purpose is to verify the correct behavior of internal structural elements. This can be done, for example, by exercising all the statements or all conditional branches. Because white-box testing is quite time consuming, it is usually done for small parts of the system at a time. White-box testing methods are useful in finding design, code-based control, logic and sequence defects, initialization defects, and data flow defects. (Burnstein 2003)

In black-box testing the system under test is seen as an opaque box. There is no knowledge of the inner structure of the software; the only knowledge is of how the software should work. The intention of black-box testing is to provide inputs to the system under test and verify that the system works as defined in the specifications. Because the black-box approach considers only the behavior and functionality of the system under test, it is also called functional testing. With the black-box strategy, requirement and specification defects are revealed. The black-box testing strategy can be used at all test levels defined in the following section. (Burnstein 2003)
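As a small, generic illustration (not taken from the thesis or the case project), consider a function that grants a discount to orders above a threshold. A white-box test is chosen with the branch structure of the code in mind, while a black-box test only checks the specified input/output behavior; the function, values and threshold below are invented for this sketch.

    def discounted_price(total):
        """Apply a 10 % discount to orders of 100 or more (example specification)."""
        if total >= 100:
            return total * 0.9
        return total

    # White-box view: we know the code has two branches, so we pick inputs
    # that exercise both of them, including the boundary value.
    def test_both_branches():
        assert discounted_price(100) == 90.0   # discount branch, boundary value
        assert discounted_price(99) == 99      # no-discount branch

    # Black-box view: we only use the specification "orders of 100 or more
    # get 10 % off" and compare inputs to expected outputs, ignoring the code.
    def test_against_specification():
        for total, expected in [(50, 50), (100, 90.0), (200, 180.0)]:
            assert discounted_price(total) == expected

    if __name__ == "__main__":
        test_both_branches()
        test_against_specification()
        print("All tests passed")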

2.5 Test Levels

Testing can be performed on multiple levels. Usually software testing is divided into unit testing, integration testing, system testing, and acceptance testing (Dustin et al. 1999; Craig & Jaskiel 2002; Burnstein 2003). The purpose of these different test levels is to investigate and test the software from different perspectives and to find different types of defects (Burnstein 2003). If the division into levels is done from the test automation perspective, the levels can be unit testing, component testing and system testing (Meszaros 2003; Laukkanen 2006). In this thesis, whenever traditional test levels are used, the division into unit, integration, system, and acceptance testing is meant. Figure 1 shows these test levels and their relative order.



Figure 1: Test levels (Burnstein 2003)

UNIT TESTING

The smallest part of software is a unit. A unit is traditionally viewed as a function or a procedure in an (imperative) programming language. In object-oriented systems, methods and classes/objects can be seen as units. A unit can also be a small-sized component or a programming library. The principal goal of unit testing is to detect functional and structural defects in the unit. Sometimes the name component is used instead of unit; in that case the name of this phase is component testing. (Burnstein 2003)

There are different opinions about who should create unit tests. Unit testing is in most cases best handled by developers, who know the code under test and the techniques needed (Dustin et al. 1999; Craig & Jaskiel 2002; Mosley & Posey 2002). On the other hand, Burnstein (2003) thinks that an independent tester should plan and execute the unit tests. The latter is the more traditional point of view, following the principle that nobody should evaluate their own work.

Unit testing can be started in an early phase of software development, as soon as the unit is created. The failures revealed by unit tests are usually easy to locate and repair, since only one unit is under consideration (Burnstein 2003). For these reasons, finding and fixing defects is cheapest at the unit test level.
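As a minimal, generic example (not taken from the case project), a unit test exercises a single function in isolation; the sketch below uses Python's standard unittest module, and the word_count function is invented for illustration.

    import unittest

    def word_count(text):
        """The unit under test: count whitespace-separated words."""
        return len(text.split())

    class WordCountTest(unittest.TestCase):
        def test_counts_words(self):
            self.assertEqual(word_count("acceptance test driven development"), 4)

        def test_empty_string_has_no_words(self):
            self.assertEqual(word_count(""), 0)

    if __name__ == "__main__":
        unittest.main()

Because such tests touch only one unit and run in milliseconds, they can be executed after every change, which is what makes defects found at this level the cheapest to locate and fix.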



INTEGRATION TESTING

When units are combined, the resulting group of units is called a subsystem or, sometimes in object-oriented software systems, a cluster. The goal of integration testing is to verify that the component/class interfaces are working correctly and that the control and data flows work correctly between the components. (Burnstein 2003)

SYSTEM TESTING

When ready and tested subsystems are combined into the final system, system test execution can be started. System tests evaluate both the functional behavior and the non-functional qualities of the system. The goal is to ensure that the system performs according to its requirements when tested as a whole. After system testing and corrections based on the found faults are done, the system is ready for the customer's acceptance testing, alpha testing or beta testing (see next paragraph). If the customer has defined the acceptance tests, those can be used in the system testing phase to assure the quality of the system from the customer's point of view. (Burnstein 2003)

ACCEPTANCE TESTING

When a software product is custom-made, the customer wants to verify that the developed software meets her requirements. This verification is done in the acceptance testing phase. The acceptance tests are developed in co-operation between the customer and test planners and executed after the system testing phase. The purpose is to evaluate the software in terms of the customer's expectations and goals. When the acceptance testing phase is passed, the product is ready for production. If the product is targeted for the mass market, it is often not possible to arrange customer-specific acceptance testing. In these cases the acceptance testing is conducted in two phases called alpha and beta testing. In alpha testing, possible customers and members of the development organization test the product on the development organization's premises. After the defects found in alpha testing are fixed, beta testing can be started. The product is sent to a cross-section of users who use it in a real-world environment and report the defects they find. (Burnstein 2003)
report the found defects. (Burnstein 2003)<br />

7


REGRESSION TESTING<br />

The purpose of regression testing is to ensure that old characteristics are working after changes made<br />

to the software and verify that the changes have not introduced new defects. Regression testing is not a<br />

test level as such and it can be performed in all test levels. The importance of the regression testing<br />

increases when the system is released multiple times. The functionality provided in the previous version<br />

should still work <strong>with</strong> all the new functionality and verifying this is very time consuming. Therefore<br />

it is recommended to use automated testing tools to support this task (Burnstein 2003). Also Kaner<br />

et al. (1999) have noticed that it is a common way to automate acceptance and regression tests to<br />

quickly verify the status of the latest build.<br />



3 AGILE AND ITERATIVE SOFTWARE DEVELOPMENT

The purpose of this chapter is to explain the iterative development model and agile methods in general, and to illustrate the development models Scrum and Extreme Programming (XP) on a more detailed level because of their relevance to this thesis.

3.1 Iterative Development Model

In the iterative development model, software is built in multiple sequential iterations during the whole lifecycle of the software. An iteration can be seen as a mini-project containing requirement analysis, design, development, and testing. The goal of the iteration is to build an iteration release. An iteration release is a partially completed system which is stable, integrated, and tested. Usually most of the iteration releases are internal and not released to external customers. The final iteration release is the complete product, and it is released to the customer or to the market. (Larman 2004)

Usually the partial system grows incrementally with new features, iteration by iteration. This is called incremental development. The concept of a system growing via iterations has been called iterative and incremental development, although iterative development is the more common term. The features to be implemented in an iteration are decided at the beginning of the iteration. The customer selects the most valuable features at that time, so there is no strict predefined plan. This is called adaptive planning. (Larman 2004)

In modern iterative methods, the recommended length of an iteration is between one and six weeks. In most iterative and incremental development methods the length of the iteration is timeboxed. Timeboxing is a practice which sets a fixed end date for the iteration. A fixed end date means that if the iteration scope cannot be met, the features with the lowest priority are dropped from the scope of the iteration. This way the growing software is always in a stable and tested state at the end of the iteration. (Larman 2004)

Evolutionary iterative development implies that requirements, plans, and solutions evolve and are refined during the iterations, instead of following predefined specifications. There is also the term adaptive development. The difference between these two terms is that adaptive development implies that the received feedback is guiding the development. (Larman 2004)

9


Iterative and incremental development makes it possible to repeatedly deliver an enhanced product to the markets. This is also called incremental delivery. Usually the incremental deliveries are done every three to twelve months. Evolutionary delivery is a refinement of incremental delivery. In evolutionary delivery the goal is to collect feedback and to plan the content of the next delivery based on it. In incremental delivery the feedback does not drive the delivery plan. However, in practice there is always both predefined and feedback-based planning, and therefore these two terms are used interchangeably. (Larman 2004)

3.2 Agile Development

Iterative and incremental development is the core of all agile methods, including Scrum and XP. Agile methods cannot be captured in a single definition, but all of them apply timeboxed iterative and evolutionary delivery as well as adaptive planning. There are also values and practices in agile methods that support agility, meaning rapid and flexible response to change. Agile methods also promote practices and principles like simplicity, lightness, communication, self-directed teams, and programming over documentation. The values and principles that guide the agile methods were written down by a group interested in iterative and agile methods in 2001 (Larman 2004). Those values are stated in the Agile Manifesto (Figure 2). The agile software development principles are listed in Appendix A.

Figure 2: Agile Manifesto (Beck et al. 2001a)


3.3 Scrum

Scrum is an agile, lightweight process that can be used to manage and control software and product development, and it uses iterative and incremental development methods. Scrum emphasizes an empirical process rather than a defined process. Scrum consists of four phases: planning, staging, development, and release. In the planning phase, items like the vision, funding, and initial requirements are created. In the staging phase, requirements are defined and prioritized so that there is enough content for the first iteration. In the development phase, the development is done in iterations. The release phase contains product tasks like documentation, training, and deployment. (Schwaber & Beedle 2002; Larman 2004; Schwaber 2004)

When using Scrum, the people involved in software development are divided into three different roles: the product owner, the scrum master, and the team. The product owner’s task is to get the funding, collect the project’s initial requirements, and manage the requirements (see the product backlog below). The team is responsible for developing the functionality. The teams are self-managing, self-organizing, and cross-functional, and their task is to figure out how to convert the items in the product backlog into functionality in iterations. Team members are collectively responsible for the success of the iterations and of the project as a whole, and this is one of the core principles of Scrum. The maximum size of the team is seven members. The scrum master is responsible for the Scrum process and for teaching Scrum to everyone in the project. The scrum master also makes sure that everyone follows the rules and practices of Scrum. (Schwaber 2004)

Scrum consists of several practices, which are the Product Backlog, Daily Scrum Meetings, the Sprint, Sprint Planning, the Sprint Backlog, the Sprint Review, and the Sprint Retrospective. Figure 3 shows an overview of Scrum.


Figure 3: Overview of Scrum (Control Chaos 2006a)

PRODUCT BACKLOG

The Product Backlog is a list of all the features, functions, technologies, enhancements, and bug fixes that constitute the changes to be made to the product for future releases. The items in the product backlog form a prioritized list which evolves all the time. The idea is to add new items to it whenever there are new features or improvement ideas. (Schwaber & Beedle 2002)

SPRINT

Sprint is the name of the timeboxed iteration in Scrum. The length of a sprint is usually 30 calendar days. Sprint planning takes place at the beginning of the sprint and consists of two meetings. In the first meeting the product owner and the team select the content for the coming sprint from the product backlog. Usually the items with the highest priority and risks are selected. In the second meeting, the team and the product owner consider how to develop the selected features and create the sprint backlog, which contains all the tasks that are needed to meet the goals of the sprint. The durations of the tasks are estimated in the meeting and updated during the sprint. (Schwaber & Beedle 2002; Larman 2004; Schwaber 2004)


DAILY SCRUM

The development progress is monitored with daily scrum meetings. The daily scrum is held in a specified form every work day at the same time and place. The meeting should not last more than 15 minutes. The team stands in a circle, and the scrum master asks all the team members the following questions:

1. What have you done since the last daily scrum?

2. What are you going to do between now and the next daily scrum?

3. What is preventing you from doing your work?

If any problems are raised during the daily scrum meeting, it is the responsibility of the team to solve them. If the team cannot deal with the problems, it becomes the responsibility of the scrum master. If there is a need for a decision, the scrum master has to decide the matter within an hour. If there are some other problems, the scrum master should solve them within one day, before the next daily scrum. (Schwaber & Beedle 2002; Schwaber 2004)

SPRINT REVIEW

At the end of the sprint, the results are shown in the sprint review hosted by the scrum master. The purpose of the sprint review is to demonstrate the done functionality to the product owner and the stakeholders. After every presentation, all the participants are allowed to voice any comments, observations, improvement ideas, changes, or missing features regarding the presented functionality. All these items are noted down. At the end of the meeting all the items are checked and placed into the product backlog for prioritization. (Schwaber & Beedle 2002; Schwaber 2004)

DEFINITION OF DONE

Because only done functionality can be shown in the sprint review, there is a need to define what that means. Otherwise one person might think that functionality is done when a feature is implemented, while another thinks that it is done when it is properly tested, documented, and ready to be deployed to production. Schwaber (2004) recommends having a definition of done that is written down and agreed on by all members of the team. This way all stakeholders know the condition of the demonstrated functionalities.


SPRINT RETROSPECTIVE

The sprint retrospective meeting is used to improve the performance of the scrum team. The sprint retrospective takes place at the end of the sprint, and the participants are the scrum master and the team. All the team members are asked two questions: “What went well during the last sprint?” and “What could be improved in the next sprint?” The improvement ideas are prioritized, and the ideas that should be taken into the next sprint are added as high-priority non-functional items to the product backlog. (Schwaber 2004)

RULES IN SCRUM

In addition to the aspects mentioned earlier, there are a few more rules in Scrum. It is forbidden to add any new tasks to the sprint backlog during the sprint, and the scrum master must ensure this. However, if the proposed new tasks are more important than the ones in the sprint backlog, the scrum master can abnormally terminate the sprint. After the termination, a new sprint can be started with a sprint backlog containing the new tasks. (Schwaber & Beedle 2002; Schwaber 2004)

DAILY BUILD

As mentioned earlier, Scrum is used to manage and control product development, and therefore there are no strict rules about which development practices should be used. However, there is a need to know the status of the project on a daily basis, and therefore a daily build practice is needed. The daily build practice means that every day the developed source code is checked into the version control system, built, and tested. This means that integration problems can be noticed on a daily basis rather than at the end of the sprint. The daily build practice can be implemented with continuous integration. Because the daily build is the only development practice that has to be used in Scrum, the team is responsible for selecting the other development practices to be used. This means that many practices from other agile methods can be used by the team. (Schwaber & Beedle 2002)


SCALING SCRUM

It was mentioned above that the maximum size of a scrum team is seven members. When Scrum is used in a larger project, the project members can be divided into multiple teams (Schwaber 2004; Larman 2006). When multiple teams are used, the cooperation between the teams can be handled with the scrum of scrums. The scrum of scrums is a daily scrum in which at least one member from every scrum team participates. This mechanism is used to remove obstacles that concern more than one team (Schwaber 2004). In a larger project it is also possible to divide the product owner’s responsibilities. Cohn (2007) suggests using a group of product owners with one chief product owner. The product owners work in the teams while the chief product owner manages the product as a whole. Larman (2006) calls the product owners working with the scrum teams feature champions.

3.4 Extreme Programming

Extreme Programming (XP) is a disciplined yet very agile software development method for small teams of two to twelve members. The purpose of XP is to minimize the risk and the cost of change in software development. XP is based on the experiences gained and the practices successfully used by the father of the method, Kent Beck. Communication, simplicity, feedback, and courage are the values that XP is based on. Simplicity means code that is as simple as possible: no extra functionality is implemented beforehand, even if there might be a need for a more complex solution in the future. Communication means continuous communication between the customer and the developers and also among the developers. Some of the XP practices also force communication, which enhances the spread of important information inside the project. Continuous testing and communication provide feedback on the state of the system and the development velocity. Courage is needed to make hard decisions, like changing the system heavily when seeking simplicity and better design. Another form of courage is deleting code that is not working at the end of the day. To concretize these values, there are twelve development practices on which XP heavily relies. The practices are listed below:

• The Planning Game: Quickly determine the scope of the next release by combining business priorities and technical estimates. As reality takes over the plan, update the plan.

• Small Releases: Put a simple system into production quickly, and then release new versions on a very short cycle.

• Metaphor: Guide all development with a simple shared story of how the whole system works.

• Simple Design: The system should be designed as simply as possible at any given moment. Extra complexity is removed as soon as it is discovered.

• Testing: Programmers continually write unit tests, which must run flawlessly for development to continue. Customers write tests demonstrating that features are finished.

• Refactoring: Programmers restructure the system without changing its behavior to remove duplication, improve communication, simplify, or add flexibility.

• Pair Programming: All production code is written with two programmers at one machine.

• Collective ownership: Anyone can change any code anywhere in the system at any time.

• Continuous integration: Integrate and build the system many times a day, every time a task is completed.

• 40-hour week: Work no more than 40 hours a week as a rule. Never work overtime a second week in a row.

• On-site customer: Include a real, live user on the team, available full-time to answer questions.

• Coding standards: Programmers write all code in accordance with rules emphasizing communication through the code.

None of the practices are unique or original. However, the idea in XP is to use all the practices together, because when they are used together they complement each other (Figure 4). (Beck 2000)

Figure 4: The practices support each other (Beck 2000)


3.5 Scrum and Extreme Programming Together

It is possible to combine the agile management mechanisms of Scrum with the engineering practices of XP (Control Chaos 2006b). Figure 5 illustrates this approach. Mar and Schwaber (2002) have experienced that these two approaches are complementary; when used together, they can have a significant impact on both the productivity of a team and the quality of its outputs.

Figure 5: XP@Scrum (Control Chaos 2006b)

3.6 Measuring Progress in Agile Projects

Ron Jeffries (2004) recommends using the Running Tested Features (RTF) metric for measuring the team’s agility and productivity. He defines the RTF in the following way:

1. The desired software is broken down into named features (requirements, stories) which are part of the system to be delivered.

2. For each named feature, there are one or more automated acceptance tests which, when they work, will show that the feature in question is implemented.

3. The RTF metric shows, at every moment in the project, how many features are passing all their acceptance tests.

The RTF is a simple metric, and it measures the most important aspect of the software well: the number of working features. The RTF value should start to increase at the beginning of the project and keep increasing until the end of the project. If the curve is not rising, there must be some problems in the project. Figure 6 shows what the RTF curve could look like if the project is doing well. (Jeffries 2004)

Figure 6: RTF curve for an agile project (Jeffries 2004)
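As a small illustration of point 3 above, the following Python sketch computes the RTF value from a set of automated acceptance test results. The data structure and the feature names are assumptions made only for this example.

    def running_tested_features(features):
        """features maps a feature name to the list of pass/fail results
        (True = passing) of its automated acceptance tests."""
        return sum(1 for results in features.values() if results and all(results))

    acceptance_results = {
        "add registration": [True, True],
        "delete registration": [True, False],  # one acceptance test still fails
        "count registrations": [],             # no automated tests written yet
    }
    print(running_tested_features(acceptance_results))  # -> 1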

4 TESTING IN AGILE SOFTWARE DEVELOPMENT

Agile testing is guided by the Agile Manifesto presented in Figure 2. Marick (2001) sees working code and conversing people as the most important guides for agile testing. Communication between the project and the test engineers should not be based on written requirements and design specifications handed over the wall to the testing department, and on specifications and defect reports handed back. Instead, Marick (2001) emphasizes face-to-face conversations and informal discussions as the main channel for getting testing ideas and creating the test plan. Test engineers should work with the developers and help test even unfinished features. Marick is one of the people agreeing with the principles of the context-driven testing school (Kaner et al. 2001a), and therefore the principles of agile testing and context-driven testing overlap.

4.1 Purpose of Testing

The purpose of agile testing is to build confidence in the developed software. In Extreme Programming the confidence is built on two test levels. The unit tests created with test-driven development increase the developers’ confidence, and the customer’s confidence is founded on the acceptance tests (Beck 2000). Unit tests verify that the code works correctly, and acceptance tests make sure that the correct code has been implemented. In Scrum the integration and acceptance tests are not described (Abrahamsson et al. 2002), and therefore it is up to the team to define the testing-related issues. Itkonen et al. (2005) state that in agile testing the focus is on constructive quality assurance practices. This is the opposite of destructive quality assurance practices, such as the negative testing used in traditional testing. Itkonen et al. (2005) have doubts about the sufficiency of the constructive quality assurance practices, but admit that more research in that area is needed.

4.2 Test Levels

In agile development the different testing activities overlap. This is mainly because the purpose is to deliver working software repeatedly. The levels of agile testing cannot be distinguished from the development phases in the same way as the traditional test levels can. The contents of the different levels also differ between agile and traditional testing. As was mentioned in the previous chapter, in XP the confidence is built with the unit and acceptance tests, whereas Scrum does not contain guidelines on how testing should be conducted. There are also other opinions in the agile community on how the testing could be divided, and therefore there is no coherent definition of the test levels in agile testing. However, the test levels in XP and some other categorizations are presented below.


UNIT TESTING

Unit testing, sometimes also called developer testing, is very similar to traditional unit testing. However, unit tests are usually written using test-driven development (TDD). As the name test-driven indicates, the unit tests are written before the code (Beck 2003; Astels 2003). When TDD is used, it is obvious that a developer writes the unit tests. Even though TDD is used to create the unit tests, its purpose is not just testing: TDD is an approach to writing and designing maintainable code, and as a nice side effect, a suite of unit tests is produced (Astels 2003).

ACCEPTANCE TESTING IN XP

Acceptance testing in XP has a wider meaning than traditional acceptance testing. Acceptance tests can contain functional, system, end-to-end, performance, load, stress, security, and usability testing, among others (Crispin 2005). Acceptance tests are also called customer and functional tests in the XP literature, but in this thesis the term acceptance test is used.

The acceptance tests are written by the customer or by a tester with the customer’s help (Beck 2000). In some projects defining the acceptance tests has been a joint effort of the team (Crispin et al. 2002). The aim of acceptance testing is to show that the product is working as the customer wants and to increase her confidence (Beck 2000; Jeffries 1999). The acceptance tests should contain only tests for features that the customer wants. Jeffries (1999) advises investing wisely and picking tests that are meaningful when they pass or fail. Crispin et al. (2002) also mention that the purpose of the acceptance tests is not to go through all the paths in the system, because the unit tests take care of that. However, Crispin (2005) has noticed that teams doing TDD test only the “happy paths”, especially when trying TDD for the first time. Misunderstood requirements and hard-to-find defects may go undetected, and therefore the acceptance tests keep the teams on track.

The acceptance tests should always be automated, and the automated tests should be simple and created incrementally (Jeffries et al. 2001; Crispin & House 2005). However, in practice, automating all the tests is extremely hard and some trade-offs have to be made (Crispin et al. 2002). Kaner (2003) thinks that automating all acceptance tests is a serious error and that the amount of automated tests should be decided based on the context. Jeffries (2006) admits that automating all the tests is impossible but still states that “if we want to be excellent at automated testing, we should set out to automate all tests”. When automating the tests, the entire development team should be responsible for the automation tasks (Crispin et al. 2002). The test-first approach can also be used with the acceptance tests. The acceptance test-driven development concept is introduced in Chapter 4.3.


OTHER TESTING PRACTICES IN XP

While unit and acceptance testing are the heart of XP, Beck (2000) admits that there are also other testing practices that make sense from time to time. He lists parallel tests, stress tests, and monkey tests as examples of these kinds of helpful testing approaches.

OTHER TEST LEVELS IN AGILE TESTING

There are also other test level divisions in the agile testing community in addition to the division in XP. Marick (2004) divides testing into four categories: technology-facing programmer support, business-facing team support, business-facing product critiques, and technology-facing product critiques. In Marick’s division, unit testing can be seen as technology-facing programmer support and acceptance testing as business-facing team support. Business-facing product critiques mean testing for forgotten, wrongly defined, or otherwise false requirements. Marick (2004) believes that different kinds of exploratory testing practices can be used in this phase. Technology-facing product critiques correspond to non-functional testing.

Hendrickson (2006) divides the agile testing practices into automated acceptance or story tests, automated unit tests, and manual exploratory testing (Figure 7). She thinks that exploratory testing provides additional feedback and covers gaps in automation, and states that exploratory testing is necessary to augment the automated tests. From the functional testing point of view this division is quite similar to Marick’s (2004) division.

Figure 7: Agile testing practices (Hendrickson 2006)

4.3 Acceptance Test-Driven Development

The idea of acceptance test-driven development (ATDD) was first introduced by Beck (2003) under the name application test-driven development. However, he had some doubts about how well the acceptance tests could be written before the development. Before this, acceptance test-driven development had already been used, although under the name acceptance testing (Miller & Collins 2001). Since then, there have been several projects using acceptance test-driven development (Andersson et al. 2003; Reppert 2004; Crispin 2005; Sauvé et al. 2006). The ATDD concept has also been called story test-driven development (Mudridge & Cunningham 2005; Reppert 2004) and customer test-driven development (Crispin 2005).


PROCESS

On a high level, the acceptance test-driven development process contains three steps. The first step is to define the requirements for the coming iteration. In agile projects the requirements are usually written in the format of user stories. User stories are short descriptions representing the customer requirements, used for planning and as a reminder (Cohn 2004). When the user stories are defined, the acceptance tests for those requirements can be written. As the name acceptance test indicates, the purpose of these tests is to define the acceptable functionality of the system. Therefore, the customer has to take part in defining the acceptance tests. The acceptance tests have to be written in a format the customer understands (Miller & Collins 2001; Mudridge & Cunningham 2005). When the tests have been defined, the development can be started. While the concept is quite simple on a high level, there are multiple possible approaches to by whom, when, and to what extent the acceptance tests are written and automated.

WHO WRITES THE TESTS

As was mentioned above, the customer or some other person with proper knowledge of the domain is needed when writing the tests (Reppert 2004; Crispin 2005). Usually the customer needs some help in writing the tests (Crispin 2005). Crispin (2005) describes a process where the test engineer writes the acceptance tests with the customer. On the other hand, it is also possible for the developers and the customer to define the tests together (Andersson et al. 2003). It is also possible that the customer, the developers, and the test engineers write the tests in collaboration (Reppert 2004). As can be seen, there are several alternative ways of writing the acceptance tests, and the choice evidently depends on the available people and their skills.

WHEN TESTS ARE WRITTEN AND AUTOMATED

When ATDD is used, the tests are written before the development. This can mean writing the test cases before the iteration planning or after it. Mudridge and Cunningham (2005) describe an example of how to use the acceptance tests to define the user stories on a more detailed level and in this way ease the task estimation in the iteration planning session. Watt and Leigh-Fellows (2004) have also used acceptance tests to clarify the user stories before the planning sessions. On the other hand, Crispin (2005) and Sauvé et al. (2006) describe a process where the acceptance tests are developed after the stories have been selected for the iteration.


While working in one software development project, Crispin (2005) noticed that writing too many detailed test cases at the beginning can make it difficult for the developers to understand the big picture. Therefore, in that project the high-level test cases were written at the beginning of the iteration and the more detailed low-level test cases were developed in parallel with the developers writing the code. This way the risk of having to rework a lot of test cases is lowered. A similar approach has also been used by Andersson et al. (2003) and Miller and Collins (2001). However, Crispin (2005) states that this is not “pure” ATDD because all the tests are not written before the code.

HOW ACCEPTANCE TESTS ARE AUTOMATED

As was mentioned in Chapter 4.2, the goal in agile testing is to automate as many tests as possible. The actual work varies depending on the tool used to automate the test cases. In general, there are two tasks. The test cases have to be written in a format that can be processed by the test automation framework. In addition to these test cases, some code is needed to move the instructions from the test cases into the system under test. Often this code bypasses the graphical user interface and calls the business logic directly (Reppert 2004; Crispin 2005).

There are several open source tools used to automate the test cases. The best known of these tools is FIT (Framework for Integrated Test) (Sauvé et al. 2006). When FIT is used, the test cases consist of steps which are presented in a tabular format. The developers have to implement test code for every different kind of step, which Sauvé et al. (2006) see as the weakness of FIT. Other tools and approaches used to automate the acceptance test cases are not presented here.


PROMISES AND CHALLENGES

Table 1 and Table 2 show the promises and challenges of acceptance test-driven development collected from the different references mentioned in the previous chapters.

PROMISES

The risk of building incorrect software is decreased: The communication gap is reduced because the tests are an effective medium of communication between the customer and the development (Sauvé et al. 2006). When the collaboration takes place just before the development, there is a clear context for having a conversation and removing misunderstandings (Reppert 2004). Crispin (2005) even thinks that the most important function of the tests is to force the customer, the developers and the test engineers to communicate and create a common understanding before the development.

The development status is known at any point: When the acceptance tests created in collaboration are passing, the feature is done. The readiness of the product can be evaluated based on the results of the suite of automated tests executed daily (Miller and Collins 2001). Knowing which features are ready also makes project tracking easier and better (Reppert 2004).

A clear quality agreement is created: The tests made in collaboration with the customer and the development team serve as a quality agreement between the customer and the development (Sauvé et al. 2006).

Requirements can be defined more cost-effectively: The requirements are described as executable artifacts that can be used to automatically test the software. Misunderstandings are less likely than with requirements defined in textual descriptions or diagrams. (Sauvé et al. 2006)

The requirements and tests are in synchronization: Requirement changes become test updates, and therefore they are always in synchronization (Sauvé et al. 2006).

The quality of tests can be improved: The errors in the tests are corrected and approved by the customer, and therefore the quality of the tests is improved (Sauvé et al. 2006).

Confidence in the developed software is increased: Without tests the customers cannot have confidence in the software (Miller and Collins 2001). The customers get confidence because they do not need to just hope that the developers have understood the requirements (Reppert 2004).

A clear goal for the developers: The developers have a clear goal in making the customer-defined acceptance tests pass, and that can prevent feature creep (Reppert 2004; Sauvé et al. 2006).

The test engineers are not seen as “bad guys”: Because the developers and the test engineers have the same well-defined goal, the developers do not see the test engineers as “bad guys” (Reppert 2004).

Problems can be found earlier: The customer’s domain knowledge helps to create meaningful tests. This helps to find problems already in an early phase of the project (Reppert 2004).

The design of the developed system is improved: Joshua Kerievsky has been amazed at how much simpler the code is when ATDD is used (Reppert 2004).

The correctness of refactoring can be verified: The acceptance tests do not rely on the internal design of the software, and therefore they can be used to reliably verify that the refactoring has not broken anything (Andersson et al. 2003).

Table 1: Promises of ATDD

CHALLENGES

Automating tests: Crispin (2005) has noticed that defining and automating tests can be a huge challenge even with light tools like FIT.

Writing the tests before development: It might be hard to find time for writing the tests in advance (Crispin 2005).

The right level of test cases: Crispin (2005) has noticed that when many test cases are written beforehand, the test cases can cause more confusion than help in understanding the requirements. This causes a lot of rework because some of the test cases have to be refactored. Therefore the team Crispin (2005) worked with started with a few high-level test cases and added more test cases during the iteration.

Table 2: Challenges of ATDD

The promises and challenges are revisited at the end of the thesis when the observations are analyzed.


5 TEST AUTOMATION APPROACHES

The purpose of this chapter is to briefly describe the field of test automation and the evolution of test automation frameworks. In addition, the keyword-driven testing approach is explained in more detail.

5.1 Test Automation

The term test automation usually means test execution automation. However, test automation is a much wider term, and it can also mean activities like test generation, reporting the test execution results, and test management (Bach 2003a). All these test automation activities can take place at all the different test levels described in Chapter 2.5. The extent of test automation can also vary. Small-scale test automation can mean tool-aided testing, like using a small collection of testing tools to ease different kinds of testing tasks (Bach 2003a). On the other hand, large-scale test automation frameworks are used for setting up the environment, executing test cases, and reporting the results (Zallar 2001).

Automating the testing is not an easy task, and there are several issues that have to be taken into account. Fewster and Graham (1999) list the common test automation problems as unrealistic expectations, poor testing practice, an expectation that automated tests will find a lot of new defects, a false sense of security, maintenance, technical problems, and organizational issues. As can be noticed, the list is quite long, and therefore all these issues have to be taken into account when planning the use of test automation. Laukkanen (2006) also lists some other test automation issues, like when to automate, what to automate, what can be automated, and how much to automate.


5.2 Evolution of Test Automation Frameworks

Test automation frameworks have evolved over time (Laukkanen 2006). Kit (1999) divides the evolution into three generations. The first generation frameworks are unstructured: test cases are separate scripts which also contain the test data, and they are therefore almost non-maintainable. In the second generation frameworks the test scripts are well designed, modular, and documented, which makes them maintainable. The third generation frameworks are based on the second generation, with the difference that the test data is taken out of the scripts. This makes varying the test data easy, and similar test cases can be created quickly and without coding skills. This concept is called data-driven testing. The limitation of data-driven testing is that one script is needed for every logically different test case (Fewster & Graham 1999; Laukkanen 2006), which can easily increase the number of needed scripts dramatically. Keyword-driven testing is a logical extension of data-driven testing (Fewster & Graham 1999), and it is described in the following chapter.
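As a minimal illustration of the data-driven idea described above, the following Python sketch keeps the test logic in one script and takes only the inputs and expected outputs from external test data. The add() function and the data layout are assumptions made for this example.

    # In practice the test data would live in an external file (e.g. a CSV);
    # an in-memory list of rows keeps this sketch self-contained.
    ADDITION_DATA = [
        {"first": "1", "second": "2", "expected": "3"},
        {"first": "2", "second": "5", "expected": "7"},
    ]

    def add(a, b):  # stands in for the system under test
        return a + b

    def run_addition_tests(rows):
        for row in rows:
            result = add(int(row["first"]), int(row["second"]))
            assert result == int(row["expected"]), row

    run_addition_tests(ADDITION_DATA)
    # A logically different test case, such as multiplication, would still
    # need a script of its own, which is the limitation that keyword-driven
    # testing removes.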

5.3 Keyword-Driven Testing

In keyword-driven testing, also the keywords controlling the test execution are taken out of the scripts and into the test data (Fewster & Graham 1999; Laukkanen 2006). This makes it possible to create new test cases in the test data without writing a script for every different test case, which also allows test engineers without coding skills to add new test cases (Fewster & Graham 1999; Kaner et al. 2001b). This removes the biggest limitation of the data-driven testing approach. Figure 8 is an example of keyword-driven test data containing two simple test cases for testing a calculator application. The test cases consist of the keywords Input, Push and Check, and of arguments which are the inputs and expected outputs of the test cases. As can be seen, it is easy to add logically different test cases without implementing new keywords.


Figure 8: Keyword-driven test data file (Laukkanen 2006)

To be able to execute the tabular format test cases shown in Figure 8, there has to be a mapping from the keywords to the code interacting with the system under test (SUT). The scripts or code implementing the keywords are called handlers by Laukkanen (2006). Figure 9 shows the handlers for the keywords used in the test data of Figure 8. In addition to the handlers, the test execution needs a driver script which parses the test data and calls the keyword handlers according to the parsed data.

Figure 9: Handlers for keywords in Figure 8 (Laukkanen 2006)
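The following Python sketch illustrates the same idea of handlers and a driver script for the calculator example. The Calculator class, the handler implementations and the row format are assumptions made for this illustration and not the actual code shown in Figures 8 and 9.

    class Calculator:
        """A stand-in for the system under test."""

        def __init__(self):
            self.display = "0"
            self.pending = None
            self.operator = None

        def input_number(self, value):
            self.display = value

        def push(self, button):
            if button in ("add", "multiply"):
                self.pending, self.operator = int(self.display), button
            elif button == "equals":
                current = int(self.display)
                if self.operator == "add":
                    self.display = str(self.pending + current)
                elif self.operator == "multiply":
                    self.display = str(self.pending * current)

    SUT = Calculator()

    # Keyword handlers: one function per base keyword.
    def input_handler(value):
        SUT.input_number(value)

    def push_handler(button):
        SUT.push(button)

    def check_handler(expected):
        assert SUT.display == expected, "expected %s, got %s" % (expected, SUT.display)

    HANDLERS = {"Input": input_handler, "Push": push_handler, "Check": check_handler}

    # One test case as rows of (keyword, argument), as in a tabular test data file.
    TEST_DATA = [
        ("Input", "1"),
        ("Push", "add"),
        ("Input", "2"),
        ("Push", "equals"),
        ("Check", "3"),
    ]

    def driver(rows):
        """Parses the test data rows and calls the matching keyword handlers."""
        for keyword, argument in rows:
            HANDLERS[keyword](argument)

    driver(TEST_DATA)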

If there is a need to create both high-level and low-level test cases, keywords of different levels are needed. Simple keywords like Input are not enough for high-level test cases. According to Laukkanen (2006), there are both simpler and more flexible solutions for this. Higher-level keywords can be created inside the framework by combining the lower-level keywords. The limitation of this approach is the need for coding skills whenever new higher-level keywords are needed. A more flexible solution, proposed by Buwalda et al. (2002), Laukkanen (2006) and Nagle (2007), is to include in the keyword-driven test automation framework a possibility to combine existing keywords. This makes it possible to create higher-level keywords by combining existing keywords inside the test data. Laukkanen (2006) calls these combined keywords user keywords, and this term is also used in this thesis.
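Continuing the Python sketch above, a driver could support such user keywords by letting the test data define a keyword as a sequence of existing keywords and by expanding it recursively at execution time. The data structures below are assumptions made for this illustration.

    # User keywords combined from existing keywords inside the test data.
    USER_KEYWORDS = {
        "Add One And Two": [
            ("Input", "1"),
            ("Push", "add"),
            ("Input", "2"),
            ("Push", "equals"),
        ],
    }

    def run_keyword(keyword, argument=None):
        if keyword in USER_KEYWORDS:
            # A user keyword expands into its steps, which may themselves
            # be user keywords.
            for sub_keyword, sub_argument in USER_KEYWORDS[keyword]:
                run_keyword(sub_keyword, sub_argument)
        else:
            HANDLERS[keyword](argument)  # a base keyword handled by a library

    run_keyword("Add One And Two")
    run_keyword("Check", "3")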

6 KEYWORD-DRIVEN TEST AUTOMATION FRAMEWORK

The keyword-driven test automation framework used in this research was developed inside the company where the study took place, and it was called Robot. The ideas and the basic concept of Robot were based on the master’s thesis of Laukkanen (2006). In the following chapters, those functionalities of Robot that are interesting from this thesis’s point of view are briefly explained.

6.1 Keyword-Driven Test Automation Framework

In the keyword-driven test automation framework there are three logical parts: the test data, the test automation framework, and the test libraries. The test data contains directives telling what to do with the associated inputs and expected outputs. The test automation framework contains the functionality to read the test data, run the handlers in the libraries based on the directives in the test data, and handle errors during the test execution. The test automation framework also contains test logging and test reporting functionality. The test libraries are the interface between the framework and the system under test. The libraries can use existing test tools to access the interfaces of the system under test or connect directly to the interfaces. Figure 10 presents the logical structure of Robot.

Figure 10: Logical structure of Robot

6.2 Test Data

In Robot, the test data is in a tabular format, and it can be stored in HTML or TSV files. The test data is divided into four different categories: test cases, keywords, variables, and settings. Each of these data types is defined in its own table in the test data file. Robot recognizes the different tables by the name of the data type in the table’s first header cell.

KEYWORDS AND TEST CASES

In Robot, keywords can be divided into base keywords and user keywords. Base keywords are keywords implemented in the libraries. User keywords are keywords that are defined in the test data by combining base keywords or other user keywords. The ability to create new user keywords in the test data decreases the number of needed base keywords and therefore the amount of programming. User keywords also make it possible to increase the abstraction level of the test cases. In Figure 11, the test cases shown in Figure 8 are modified to use the user keywords Add, Equals and Multiply. The test cases are composed of keywords defined in the second column of the test case table and of arguments defined in the following columns. User keywords are defined in a similar way. In the test case and keyword tables the second column is named Action. This column name can be defined by the user, as it is not used by Robot. The same applies to the rest of the headers.

Figure 11: Test cases and user keywords (Laukkanen 2006)
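As a rough plain-text illustration of the tabular format described above, a test case table could look like the following sketch, with the first header cell naming the table type and the second column holding the keyword to execute. The test case names and argument values are invented for this sketch, and it is not a reproduction of Figure 11.

    Test Case          Action     Argument   Argument
    Simple Addition    Add        1          2
                       Equals     3
    Multiplication     Multiply   2          5
                       Equals     10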

VARIABLES AND SETTINGS

It is possible to define variables in the Robot framework. Variables increase the maintainability of the test data, because some changes only require updating the variable values. In some cases variables can contain test environment specific data, like hostnames. In these cases variables make it easier to use the same test cases in different environments with minimal extra effort. There are two types of variables in Robot. A scalar variable contains one value, which can be anything from a simple string to an object. A list variable contains multiple items. Figure 12 contains a scalar variable ${GREETING} and a list variable @{ITEMS}.

Figure 12: Variable table containing scalar and list variables

The settings table is similar to the variable table. The name of the setting is defined in the first column and the value or values in the following columns. The settings are predefined in Robot. Examples of settings are Library and Resource. The Library setting is used to import a library which contains the needed base keywords. The Resource setting is used to import resource files. Resource files are used to define user keywords and variables in one place.
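In the same illustrative plain-text style, a variable table and a settings table could look like the sketch below. The variable values and the imported library name are invented for this sketch.

    Variable       Value         Value
    ${GREETING}    Hello world
    @{ITEMS}       first item    second item

    Setting        Value
    Library        CalculatorLibrary
    Resource       resource.html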

GROUPING TEST CASES

There are two ways of grouping test cases in Robot. First of all, test cases are grouped hierarchically. A file containing test cases (e.g. Figure 11) is called a test case file, and it forms a test suite. A directory containing one or more test case files, or directories with test case files, also creates a test suite. In other words, the hierarchical grouping is the same as the test data structure in the file system.

The other way to group test cases is based on project-specific agreements. In Robot, it is possible to give the test cases words that are used for grouping them. These words are called tags. Tags can be used to define, for example, which part of the system the test case tests, who has created the test case, whether the test case belongs to the regression tests, and whether it takes a long time to execute.


6.3 Test Execution

In Robot, the test execution is started from the command line. The scope of the test execution is defined by giving test suite directories or test case files as inputs. Without parameters, all the test cases in the given test suites are executed. A single test suite or test case can be executed with command line options. It is also possible to include or exclude test cases from the test run based on the tags (see the previous chapter). Command line execution makes it possible to start the test execution at some predefined time. It also enables starting the test execution from continuous integration systems like Cruise Control (Cruise Control 2006).

The test execution result can be pass or fail. By default, if even a single test case fails, the test execution result is a failure. To allow a successful test execution even with failing test cases, Robot contains a feature called critical tests. The test execution result is a failure only if any of the critical test cases fails; in other words, the test execution is considered successful even though non-critical test cases fail. The critical test cases are defined when starting the execution from the command line. For example, regression can be defined as a critical tag, and then all the test cases that contain the tag regression are handled as critical tests. This functionality allows adding test cases to the test execution even when they are still failing, without the overall result becoming a failure. This is needed if the test case or the feature is not yet ready; such test cases are simply not marked as critical.
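The tag-based selection and the critical test logic described above could be sketched in Python roughly as follows. The data structures and the function are assumptions made for this illustration, not Robot’s actual implementation.

    def execution_result(test_cases, include=None, exclude=None, critical_tags=None):
        """test_cases is a list of dicts with 'name', 'tags' and 'passed' keys."""
        selected = [t for t in test_cases
                    if (not include or set(include) & set(t["tags"]))
                    and not (exclude and set(exclude) & set(t["tags"]))]
        # By default every selected test case is critical; otherwise only the
        # test cases carrying one of the critical tags are.
        critical = [t for t in selected
                    if not critical_tags or set(critical_tags) & set(t["tags"])]
        status = "PASS" if all(t["passed"] for t in critical) else "FAIL"
        return status, selected

    tests = [
        {"name": "Old feature", "tags": ["regression"], "passed": True},
        {"name": "New feature", "tags": ["sprint-5"], "passed": False},
    ]
    print(execution_result(tests, critical_tags=["regression"])[0])  # PASS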

6.4 Test Reporting

Robot produces a report, a log, and an output from the test execution. The report contains statistics and information based on the executed test suites and tags. It can be used as an information radiator, since its background color shows whether the test execution status was pass or fail. The test log contains more detailed information about the executed keywords, as well as information that can be used to solve problems. The output contains the test execution results presented in an XML format. The report and the log are generated from the output.


7 EXAMPLE OF ACCEPTANCE TEST-DRIVEN DEVELOPMENT WITH KEYWORD-DRIVEN TEST AUTOMATION FRAMEWORK

In this chapter, a simple fictitious example of acceptance test-driven development with the Robot framework is shown. The purpose of this chapter is to help to understand the concept before it is shown in practice, and it also serves as a simple theoretical example of how the concept could work. However, first the relations between user stories, test cases, and keywords are briefly explained.

7.1 Test Data between User Stories and System under Test

As was described in Chapter 4.3, user stories are short descriptions representing the customer requirements and used for planning. Different levels of test data are needed to map the user stories to the actual code interacting with the system under test. These levels and their interdependence are shown in Figure 13. First of all, a user story is mapped to one or multiple test cases. Every test case contains one or more sentence format keywords. A sentence format keyword is a user keyword written in plain text, possibly containing some input or expected output values but no separate arguments. When the test cases contain only sentence format keywords, they can be understood without technical skills. Every sentence format keyword consists of one or more base or user keywords, and a user keyword in turn includes one or more base or user keywords. Finally, the base keywords contain the code which controls the system under test. The examples in the following chapters clarify the use of the different types of keywords presented above.

Figure 13: Mapping from user story to the system under test


7.2 User Stories

The customer in this example is a person who handles registrations to different kinds of events. People usually enroll in the events by email or by phone, and therefore the customer needs an application in which to save the registrations. The customer has requested a desktop application that has a graphical user interface. The customer has defined the following user stories:

1. As a registration handler I want to add registrations and see all the registrations so that I can keep count of the registrations and later contact the registered people.

2. As a registration handler I want to delete one or multiple registrations so that I can remove the canceled registration(s).

3. As a registration handler I want to have the count of the registrations so that I can notice when there is no longer room for new registrations.

4. As a registration handler I want to save registrations persistently so that I do not lose the registrations even if my computer crashes.

7.3 Defining Acceptance Tests

Before the stories can be implemented, there is a need to discuss and clarify the hidden assumptions behind the stories. The details arising from this collaboration can be captured as acceptance tests. As was mentioned in Chapter 4.3, it varies when this collaboration takes place and who participates in it. Because those issues are more a matter of the process and the people available than of the tool used, they are not taken into account in this example.

The discussion about the user stories between the customer and the development team can lead to the acceptance tests shown in Figure 14. The test cases are in a format that can be used as input for Robot. Test cases can be written directly in this format using empty templates. However, it might be easier to discuss the user stories and write drafts of the test cases on a flip chart during the conversation. After the sketches of the test cases have been made, they can easily be converted to the digital format.


Figure 14: Some acceptance test cases for the registration application

While discussing the details of the user stories and the test cases, an outline of the user interface can be drawn. The outline in Figure 15 could be the result of the session where the test cases were created, and it can be used as a starting point for the implementation. In the picture, names for the user interface elements are also defined. These are implementation details that have to be agreed on if different persons are writing the test cases and implementing the application.


Figure 15: Sketch of the registration application

7.4 Implementing Acceptance Tests and Application

After the acceptance tests are defined, it should be clear to all the stakeholders what is going to be implemented. If pure acceptance test-driven development is used, the test cases are implemented on a detailed level before the implementation of the application can be started. In this example, the implementation of the test case User Can Add Registrations is described on a detailed level.

CREATING THE TEST CASE “USER CAN ADD REGISTRATIONS”

The test case User Can Add Registrations contains three sentence format keywords, as can be seen in Figure 16. The creation of the test case starts with defining those sentence format keywords. To keep the actual test case file as simple as possible, the sentence format keywords are defined in a separate resource file. The keywords defined in the resource file have to be taken into use by importing the resource file in the setting table. Because the test case starts with a sentence format keyword which launches the application, the application has to be closed at the end of the test case. This can be done in the test case or with a Test post condition setting. These two settings are shown in Figure 17.

Figure 16: Test case “User Can Add Registrations”

Figure 17: Settings for all test cases
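Based on the description above and in the following paragraphs, the test case table and the setting table could look roughly like the plain-text sketch below. The layout and exact wording are approximations made for illustration, not a reproduction of Figures 16 and 17.

    Test Case                   Action
    User Can Add Registrations  Application is started and there are no registrations in the database
                                User adds three people
                                All three people should be shown in the application and should exist in the database

    Setting              Value
    Resource             atdd_keyword.html
    Test post condition  User closes registration application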

Figure 18 shows variables and user keywords defined in the atdd_keyword.html resource file. List<br />

variables @{person1}, @{person2} and @{person3} are described in the variable table. The comments<br />

Name and Email are used to clarify the meaning of the different columns. These variables are used in<br />

the sentence format keywords created in the keyword table. Application is started and there are no<br />

registrations in the database keyword contains two user keywords. The first keyword Clear database<br />

makes sure there are not users in the database when the application is started. The second keyword<br />

User launches registration application launches the registration application. The next two user keywords<br />

User adds three people and all three people should be shown in the application and should exist<br />

in the database repeat the same user keyword <strong>with</strong> the different person variables described in the variable<br />

table. These user keywords are not using base keywords from the libraries, and therefore the test<br />

case is not accessing the system under test at this level. The user keywords used to create the sentence<br />

format keywords can be defined in the same resource file or in other resource files. The missing user<br />

keywords are defined in resource file resource.html.<br />

Figure 18: Variables and user keywords for test case “User Can Add Registrations”
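A corresponding sketch of the atdd_keyword.html resource file is given below, again in the plain-text notation. The keyword structure follows the description above; the person names and email addresses are invented placeholders, since the actual values are visible only in the figure.

    *** Variables ***
    # Columns: Name, Email (placeholder values)
    @{person1}    Anna Example     anna.example@example.com
    @{person2}    Bob Example      bob.example@example.com
    @{person3}    Carol Example    carol.example@example.com

    *** Keywords ***
    Application is started and there are no registrations in the database
        Clear database
        User launches registration application

    User adds three people
        User adds registration    @{person1}
        User adds registration    @{person2}
        User adds registration    @{person3}

    All three people should be shown in the application and should exist in the database
        Registration should be shown in the application and should exist in the database    @{person1}
        Registration should be shown in the application and should exist in the database    @{person2}
        Registration should be shown in the application and should exist in the database    @{person3}

Passing a list variable such as @{person1} to a keyword expands it into its items, so User adds registration receives the name and the email as two separate arguments.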

Figure 19 describes the user keywords in the atdd_resource.html resource file. The base keywords needed by these user keywords are imported from the SwingLibrary and OperatingSystem test libraries in the settings table. The SwingLibrary contains base keywords for handling the graphical user interface of applications made with Java Swing technology. The OperatingSystem library is a part of Robot, and it contains base keywords, for example, for handling files (like Get file) and environment variables, and for running system commands. If there are no existing libraries for the technologies the system under test is implemented with, or some needed base keywords are missing from an existing library, the missing keywords must naturally be implemented.

Figure 19: User keywords using the base keywords

User launches registration application uses the Launch base keyword with two arguments: the main method of the application and the title of the application to be opened. Both of these arguments have been defined in the variables table as scalar variables. User Closes Registration Application uses the Close base keyword, which simply closes the launched application. Clear Database consists of the base keyword Remove file, which removes the database file from the file system. The ${DATABASE} variable contains the path to the database.txt file, which is used as a database by the registration application. The ${CURDIR} and ${/} variables are Robot's built-in variables: ${CURDIR} is the directory where the resource file is located, and ${/} is a path separator character which is resolved based on the operating system.



The User adds registration keyword takes two arguments, ${name} and ${email}, and it consists of the Clear text field, Insert into text field and Push button base keywords. All these keywords take the identifier of the element as their first argument. These identifiers were agreed on in the discussion and can be seen in Figure 15. The ${name} and ${email} arguments are entered into the corresponding text fields with the Insert into text field keyword. In the Registration should be shown in the application and should exist in the database user keyword, the List value should exist base keyword is used to check that the name and email are in the list shown in the application. The Get file base keyword is used to read the data from the database into the ${data} variable, and the Contains base keyword is used to check that the database contains the name and email pair.
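The lower-level resource file could then look roughly like the sketch below. The library and base keyword names follow the thesis text, but the widget identifiers (name_field, email_field, add_button, registration_list), the placeholder values of the scalar variables, and the exact argument formats of the base keywords are invented for illustration; the real identifiers were the ones agreed on in Figure 15.

    *** Settings ***
    Library           SwingLibrary
    Library           OperatingSystem

    *** Variables ***
    ${DATABASE}       ${CURDIR}${/}database.txt
    ${MAIN CLASS}     registration.RegistrationApplication    # placeholder
    ${APP TITLE}      Registration Application                # placeholder

    *** Keywords ***
    User launches registration application
        Launch    ${MAIN CLASS}    ${APP TITLE}

    User Closes Registration Application
        Close

    Clear database
        Remove File    ${DATABASE}

    User adds registration
        [Arguments]    ${name}    ${email}
        Clear Text Field          name_field
        Insert Into Text Field    name_field     ${name}
        Clear Text Field          email_field
        Insert Into Text Field    email_field    ${email}
        Push Button               add_button

    Registration should be shown in the application and should exist in the database
        [Arguments]    ${name}    ${email}
        List Value Should Exist    registration_list    ${name} ${email}
        ${data}=    Get File    ${DATABASE}
        Contains    ${data}    ${name} ${email}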

EXECUTING THE TESTS

The team has made an agreement that all test cases that should pass will be tagged with a regression tag. When the first version of the application is available, the created test cases can be executed. At this stage none of the test cases are tagged with the regression tag. The result of this first test execution can be seen in Figure 20. Four of the eleven acceptance test cases passed. The passing test cases can now be tagged as regression tests. Figure 21 shows one of the passing test cases tagged with the tag regression. When the test cases are executed the next time, there will be four critical test cases. If any of those test cases fail, the test execution result will be a failure and the report will turn red.
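The tagging could look like the following sketch, where the [Tags] setting marks a test case that is already passing as part of the regression set.

    *** Test Cases ***
    User Can Add Registrations
        [Tags]    regression
        Application is started and there are no registrations in the database
        User adds three people
        All three people should be shown in the application and should exist in the database

At execution time, only the regression-tagged test cases are then treated as critical. In the later open-source Robot Framework this kind of tag-based selection is done with command-line options such as --include and, in versions before 4.0, --critical; the exact mechanism available in the tool used in the Project is described in Chapter 6.2.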

Figure 20: First test execution

Figure 21: Acceptance test case tagged with tag regression

The next time the application is updated, the test cases are executed. Again, all passing test cases can be tagged with the regression tag. At some point all the test cases will pass, the features are ready, and the following items can be taken into development. New acceptance test cases are defined, and the development can start. In case old functionality is changed, the test cases have to be updated and the regression tags have to be removed.

8 ELABORATED GOALS OF THE THESIS

In this chapter the aim of this thesis is described on a more detailed level. First the scope is defined. Then the actual research questions are presented.

8.1 Scope

As was seen in the previous chapters, the field of software testing is very wide. In this thesis the focus is on acceptance test-driven development. It is important to distinguish between the traditional acceptance test level and the agile acceptance test level; in the context of this thesis the term acceptance testing refers to the latter. Other testing areas are excluded from the scope of this master's thesis. The excluded testing areas are non-functional testing, static testing, unit testing, and integration testing. Manual acceptance testing as such is also out of scope, although it may be mentioned in some cases.

The different aspects and generations of test automation were explained in Chapter 6. This thesis concentrates on the large-scale keyword-driven test automation framework called Robot. The following aspects of test automation are included in the scope of this thesis: creating the automated acceptance test cases, executing the automated acceptance test cases, and reporting the test execution results.

8.2 Research Questions

The main aim of this thesis is to study how the keyword-driven test automation technique can be used in acceptance test-driven development. The study is done in a real-life software development project, and therefore another aim is to give an example of how a keyword-driven test automation framework was used in this specific case and to describe all the noticed benefits and drawbacks. The research questions can be stated as:

1. Can the keyword-driven test automation framework be used in acceptance test-driven development?

2. How is the keyword-driven test automation framework used in acceptance test-driven development in the project under study?

3. Does acceptance test-driven development with the keyword-driven test automation framework provide any benefits? What are the challenges and drawbacks?



The first question can be divided into the following more detailed questions:

1. Is it possible to write the acceptance tests before the implementation with the keyword-driven test automation framework?

2. Is it possible to write the acceptance tests in a format that can be understood without technical competence with the keyword-driven test automation framework?

The second question can be divided into the following parts:

1. How, when, and by whom are the acceptance test cases planned?

2. How, when, and by whom are the acceptance test cases implemented?

3. How, when, and by whom are the acceptance test cases executed?

4. How and by whom are the acceptance test results reported?

The third research question can be evaluated against the promises and challenges of acceptance test-driven development shown in Table 1 and Table 2 in Chapter 4.3.

9 RESEARCH SUBJECT AND METHOD

The purpose of this chapter is to explain where and how this research was done. First, the case project and the product developed in it are described at the level needed to understand the context where the research took place. Then the research method and the data collection methods used are described.

9.1 Case Project

This research was conducted in a software project at Nokia Siemens Networks, referred to as the Project from now on. The Project was located in Espoo. The Project consisted of two scrum teams, each with approximately ten persons. In addition to the teams, the Project had a product owner, a project manager, a software architect and half a dozen specialists working as feature owners. Feature owner meant the same as feature champion (see Chapter 3.3). There were also several supporting functions, such as a test laboratory team. Several nationalities were represented in the Project.

The software product developed in the Project was a network optimization tool, referred to as the Product from now on. The Product and its predecessors had been developed for almost five years. The Product is bespoke software aimed at mobile network operators. The Project was started in June 2006, and the planned end was December 2007. The Product was a desktop application used through a graphical user interface developed with Java Swing technology.

9.2 Research Method

The Project under study had been decided on before the actual research method was chosen. When the role of the researcher became clear, there were two qualitative approaches to select from: case study and action research. It was clear from the beginning that the researcher would be highly involved with the Project under research. This high involvement prevented choosing a case study as the research method; action research was more suitable. Unlike other research methods where the researcher seeks to study organizational phenomena but not to change them, the action researcher is concerned with creating organizational changes and simultaneously studying the process (Babüroglu & Ravn 1992). This describes the situation of this research quite well: the researcher was participating, giving training and helping to define the actions that would change the existing process.

When the research method was chosen, it was also kept in mind that one purpose of the research was to try out acceptance test-driven development in practice. There was a demand for a method that would enable a practical approach to the problem. Avison et al. (1999) define that action research combines theory and practice (and researchers and practitioners) through change and reflection in an immediate problematic situation within a mutually acceptable ethical framework. This was another reason why action research was chosen as the method for this research.
reason why action research was chosen to be the method for this research.<br />

According to Avison et al. (1999), action research is an iterative process involving researchers and practitioners acting together on a particular cycle of activities, including problem diagnosis, action intervention, and reflective learning. The iterative process of action research fit well with the iterative process of Scrum. The research iteration length was chosen to be the same as the length of the Scrum iterations. Figure 22 shows how these two processes were synchronized. With this arrangement the research cycle was quite short, but it also helped to concentrate on small steps in changing the process and to prioritize the most important steps.

Figure 22: Action research activities and the Scrum process

A management decision to increase the amount of automated testing was made before the research project started. This decision was also a trigger for starting this research. Stringer (1996) mentions that programs and projects begun on the basis of the decisions and definitions of authority figures have a high probability of failure. This was taken into account at the beginning of the research and led to a different starting phase than the one defined by Stringer (1996), in which the problems are defined first and the scope and actions are derived from that problem definition. Because the goal was already defined, the research started with collecting data about the environment and implementing the new acceptance test-driven development process. Otherwise, the action research method defined in Stringer (1996) was followed.

9.3 Data Collection

There were two purposes for the data collection. The first purpose was to collect data about problems and benefits that individual project members encountered and noticed during the Project. The other purpose was to record the agreed implementation of acceptance test-driven development and to observe how this agreement was actually implemented. The latter was even more important, as Avison et al. (1999) mention that in action research the emphasis is more on what practitioners do than on what they say they do.

The data was collected through observations, informal conversations, semi-formal interviews, and by collecting meaningful emails and documents. The data was collected during a four-month period from the beginning of January 2007 to the end of April 2007. The researcher worked in the Project as a test automation engineer. The observations and the informal conversations were conducted while working in the Project. One continuous method to collect relevant issues was recording the issues raised in the daily scrum meetings.

The initial information collection was based mainly on informal discussions, but a few informal interviews were also used. The main purpose of the initial information collection was to build an overall understanding of the Project and a deep understanding of the testing in the Project. This was done by asking questions about the software processes used, the software development and testing practices, and the problems encountered with them. Some interviews also contained questions about the Project's history.

The final interviews were semi-formal interviews, meaning that the main questions were pre-defined but questions derived from the discussion were also asked. Nine persons were interviewed. The interviewees consisted of two developers, two test engineers, two feature owners/usability specialists, one feature owner, one scrum master and one specification engineer. All these persons had participated more or less in developing features with ATDD. The final interviews at the end of the research focused more on the influences of acceptance test-driven development on different aspects of software development. Appendix B contains the questions asked in the final interviews. The interview questions were asked in the order presented in the appendix, and the objective was to lead the respondents' answers as little as possible. Clarifying questions were asked to get the reasoning behind the answers. The interviews were both noted down and tape-recorded.

10 ACCEPTANCE TEST-DRIVEN DEVELOPMENT WITH KEYWORD-DRIVEN TEST AUTOMATION FRAMEWORK IN THE PROJECT UNDER STUDY

This chapter describes what was done in the Project where acceptance test-driven development was tried out. The emphasis is on issues that are relevant from the acceptance test-driven development point of view. First the development model and practices used in the Project are described; the case project itself was described in Chapter 9.1. Then it is illustrated how the keyword-driven test automation framework was used in the Project, with emphasis on the four areas mentioned in the second research question in Chapter 8.2. At the end of this chapter the results of the final interviews are presented.

10.1 Development Model and Development Practices Used in the Project

The development process used in the Project was Scrum. Scrum was introduced and taken into use at the beginning of the Project, which meant that the adjustment to the process was still ongoing at the time of the research. There were also some differences compared to Scrum as presented in Chapter 3.3. The biggest difference was the format of the product backlog. The main requirement types in the Project were the requirements defined in the requirement specifications and workflows. A workflow contained all the steps that a user could perform with the functionality. The workflow was a high-level use case containing multiple steps that were related to each other. These steps were divided into mandatory and optional steps. Every step in the workflow could be seen as a substitute for an item in the product backlog.

As was mentioned in Chapter 3.3, Scrum does not define development practices other than the daily build. In the Project continuous integration was used. There were no rules defining which development practices should be used during the Project. Extreme programming practices like refactoring were used from time to time by the development team. The developers created unit tests, and there were targets for the unit testing coverage. However, the unit tests were not done using test-driven development. The main details of the features were written down in feature descriptions, which were short verbal descriptions of the features. During the research project, the division of testing into automated acceptance testing, automated unit testing and manual exploratory testing was taken into use (see Chapter 4.2).

The test automation with Robot was started in September 2006. At the beginning of the Project the automated test cases were created for the already existing functionality. This automation task was done by a separate test automation team. At the time the research was started, automated test cases covered most of the basic functionality. This meant that the library for accessing the graphical user interface of the Product had already been developed for some time, and it included base keywords for most of the Java Swing components. At this stage there was a desire to create the automated test cases for features during the same sprint. To make this possible, acceptance test-driven development was taken into use.

10.2 January Sprint

In the first research sprint the goal was to start acceptance test-driven development with a few new features. At first it was problematic to find features to be developed with acceptance test-driven development, as part of the implementation was a follow-on to the implementation of the previous sprints. Such features were seen as problematic to start with. Some of the new features needed internal models, and while being developed, they could not be tested through the user interface. Finally, one new feature was chosen as the starting point: map layer handling. Map layer handling is used to load backgrounds into the map view of the Product. Network elements and information about the network are shown on the map view.

As mentioned, there was a separate team for test automation when the research started. To be able to work better with the scrum teams, the test automation team members started working as members of the scrum teams. This was done at the beginning of the sprint.

PLANNING

The test planning meeting for the map layer handling feature was arranged by a test engineer. It took place in the middle of the sprint, before the developer started implementing the feature. The participants of the meeting were a usability expert/feature owner, a developer, a test engineer and a test automation engineer.

The meeting started with a general discussion about the feature to be implemented, and the developer drew a sketch of the user interface he had in mind. After the initial sketch, the group started to think about the test cases: how the user could use the feature, and what kind of error situations should be handled. The sketch was updated based on the noticed needs. The test engineers wrote down test cases whenever they were agreed on. During the discussions some important decisions were made about supported file formats and supported graphic types. At the end of the meeting, the agreed test cases were gone through to make sure that all of them had been written down. At this phase the test cases were not written in any formal format.

IMPLEMENTATION

The test case implementation started by writing the test cases agreed on in the planning meeting into the tabular format. At the same time, the developer started the development. Figure 23 contains some of the initial test cases. The highest level of abstraction was not used in these test cases, and therefore they consist of lower-level user keywords with short names and variables. These test cases resemble the test cases the test automation team had implemented earlier more than the test cases shown in the example in Chapter 7.2 and Figure 13.

Figure 23: Some of the initial acceptance test cases for map layer handling
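To illustrate the difference in style, a lower-level test case of this kind might look roughly like the following sketch; every keyword name, variable and value below is an invented placeholder, and the actual test cases are the ones shown in Figure 23.

    *** Test Cases ***
    Add Raster Layer
        Open Layer Dialog
        Add Layer                  ${RASTER FILE}
        Layer Should Be Visible    ${LAYER NAME}
        Close Layer Dialog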

The implementation of the user keywords started after the test cases were written. There was a need to implement multiple base keywords even though the library had been under development for some months. Fortunately, the test automation engineer had time to create these needed keywords. At this stage the identifiers needed to select the correct widgets from the user interface were replaced with variables. The variable values were set when the developer had written the identifiers into the code and emailed them to the test engineer.

From the beginning it was clear that verifying that the map layers are drawn and shown correctly on the map would be hard to automate. It was not seen sensible to create a library for verifying the correctness of the map, and therefore a substitute solution was created. The solution was to take screenshots and combine them with instructions defining what should be checked in the picture. This led to manual verification, but doing that from time to time was not seen as a problem.
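Such a semi-automated check could be expressed along the lines of the sketch below. The keyword name and the instruction text are hypothetical, and the sketch assumes that a screenshot keyword is available (the open-source Robot Framework ships a Screenshot library with a Take Screenshot keyword; the thesis does not state which mechanism the Project actually used).

    *** Settings ***
    Library    Screenshot

    *** Keywords ***
    Map Should Show Layer Correctly
        [Arguments]    ${layer}
        # The screenshot ends up in the test report; a human checks it afterwards.
        Take Screenshot
        Log    MANUAL CHECK: verify from the screenshot that layer '${layer}' is drawn correctly on the map.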

One of the tested features was changing the colors of the map layers. A base keyword for changing this color was created, but when it was tried out, it did not work. After the problem was investigated by the developer and the test automation engineer, the base keyword implementation was found to be incorrect. However, the changes made to the base keyword did not correct the problem, and one more problem was noticed in the application. These problems were technical test automation problems. The investigations took some time, and the color changing functionality could not be tested by automation in this sprint. Some parts of the feature were also not fully implemented, and those were moved to the next sprint.

TEST EXECUTION

The test cases were executed on the test engineer's and the test automation engineer's workstations during the test case implementation phase. There were problems getting a working build during the sprint, which slowed down verifying that the test cases, and especially the implemented base keywords, were working. During this phase the problems in the test cases were corrected and defects were reported to the developer.

REPORTING

The Project had one dedicated workstation for executing the automated acceptance tests after the continuous integration system had successfully built the application. The web page showing a report of the latest acceptance tests was visible on a monitor situated in the project area. The test cases created during the sprint were added to an automatic regression test set at the end of the sprint. Tests that were passing at the end of the sprint were marked as regression tests with Robot's tagging functionality (see Chapter 6.2). The ability to define the critical test cases based on tags made it possible to execute all the tests even when some test cases and features were not working.

10.3 February Sprint

In the second sprint the goal was to finalize the test cases for the map layer functionality and to start ATDD with further functionality. The functionality selected was the visualization of the Abis configuration. The purpose of the feature was to collect data from multiple network elements and show the Abis configuration based on the collected data.

PLANNING

Immediately after the sprint planning, the people involved in developing the visualization of the Abis configuration held a meeting about the details of the feature. The participants were a feature owner, a specification person, two developers, a test engineer, a test automation engineer, a usability expert and a scrum master. The usability specialist had developed a prototype showing how the functionality should look. Using this prototype as a starting point, the team discussed different aspects of the feature and asked clarifying questions. The test automation engineer made notes during the meeting.

IMPLEMENTATION

Based on the issues agreed on in the meeting, the initial test cases were created and sent by email to all the participants. The test cases were created on a high level to make them more understandable; they can be seen in Figure 24.

Figure 24: Initial acceptance test cases for the Abis configuration

After the test cases were described, the needed keywords were implemented. Figure 25 contains the implementation of the sentence format keywords. The variables used in the keywords were defined in the same file as the keywords. As can be seen, the User opens and closes Abis dialog from navigator user keyword was used by multiple keywords; its implementation is shown in Figure 26. The user keywords used to implement the User opens and closes Abis dialog from navigator user keyword consist of both user keywords and base keywords.

Figure 25: The highest level user keywords used to map the sentences to user keywords and variables

Figure 26: Lower level user keywords “User opens and closes Abis dialog from navigator” implementation

Again, more base keywords were needed. However, these base keywords were not implemented in the SwingLibrary. Instead, there was a need to implement a helper library to handle the data that was checked from the configuration table. The configuration table contained 128 cells, and the content of every cell needed to be verified. The tabular test data format made it possible to describe the expected output in almost the same format as it was seen in the application. However, the expected outcome could not be defined beforehand. The input for the feature was configuration data from a mobile network, and in this context it was hard to create all the needed network data in a way that the expected outcome would be known and the data would be correct. In the test cases existing test data was used, and the configuration view to be tested automatically was selected from the available alternatives in the existing test data.
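The idea of writing the expected table content into the test data in roughly the same layout as it appears in the application could look like the sketch below. Both keywords and all cell values are purely illustrative placeholders; in the Project the actual comparison was implemented in the separate helper library.

    *** Test Cases ***
    Abis Configuration Is Shown Correctly
        # Both keywords and all cell values below are illustrative placeholders.
        User opens Abis configuration view
        Configuration Table Should Contain
        ...    row1-col1    row1-col2    row1-col3
        ...    row2-col1    row2-col2    row2-col3
        ...    row3-col1    row3-col2    row3-col3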

Soon after the midpoint of the sprint there was a meeting where the status of the visualization of the Abis configuration feature was checked. The feature was used by a few specialists while the scrum team responded to the questions raised and wrote down observations. Based on this meeting and some other informal discussions, some more details were agreed to be done in the sprint. Figure 27 contains some of the test cases which were added and updated after the meeting. The changes were marked with bold text to highlight them.

Figure 27: Some of the added and updated test cases

As was mentioned earlier, there was no easy way to test the map component automatically. However, one of the acceptance test cases was supposed to test that the Abis view can be opened from the map. It was not seen possible to automate this test case with a reasonable effort. The test case was still written down and tagged with the tag manual. The manual tag made it possible to see all the acceptance test cases that had to be executed manually. Another challenge was keeping the test cases in sync with the implementation, because the details were changed a few times.

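Such a manually executed acceptance test case could be recorded along the lines of the following sketch, using the [Tags] and [Documentation] settings; the test name and steps are invented for illustration.

    *** Test Cases ***
    User Can Open Abis View From Map
        [Tags]             manual
        [Documentation]    Executed manually: open the map view, select a network
        ...                element and verify that the Abis view opens from it.
        No Operation    # placeholder step, the actual checks are done by hand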
The test case User Can See The Relation Between TRX And DAP, visible in Figure 27, was one of the test cases added in the middle of the sprint. The implementation of the test case could not be finished during the sprint. The exact implementation of the feature was changed a few times, and the test case was not implemented before the implementation details were final. The feature was ready just before the sprint ended, and there was no time to finalize the test case. This was because the final details were decided so late and because different people were implementing the feature and the test case.

The problems in the map layer handling feature and its base keywords were discussed during the sprint. Some changes to the map layer handling functionality were agreed on. These changes were mainly functional changes to solve the problems with the feature itself. The acceptance test cases needed updates due to these changes. When the test cases were updated, they were changed to include only sentence format keywords. Some of the new test cases can be seen in Figure 28. The change was quite easy because most of the keywords were already ready and the mapping from sentence format keywords to user keywords and variables was very straightforward.

Figure 28: Some of the updated acceptance test cases for the map layer handling functionality

The problem with implementing the base keyword in the previous sprint was solved soon after the test cases were updated. There was also a need to implement one base keyword, and again there were some small technical problems. The problem was again with a custom component, but this time it was solved quite quickly. Some challenges in the implementation, together with implementing functionality with a higher priority, took so much time that the map layer handling functionality was not ready at the end of the sprint, and a few nasty defects remained open.

TEST EXECUTION

The test cases were executed by the test automation engineer while developing them, similarly to the previous sprint. During the sprint there were still problems with the builds. This made it harder to check whether the test cases were working and when some of the features were ready. During this sprint the Abis configuration test cases found a defect in a feature which had already worked.

REPORTING

The reporting was done in the same way as in the previous sprint.

10.4 March Sprint

During the previous two sprints it was seen that too much of the responsibility for test automation lay with the test automation team. It was decided to spread the knowledge more widely across the whole team, which meant arranging training during the sprint. The purpose was to continue the ATDD research with other new functionality. However, some of the team had to participate in a maintenance project during the sprint, and the sprint content was heavily reduced.

PLANNING

The team had agreed that the details of the new functionality should be agreed on at a more detailed level in the sprint planning. Therefore the team and the feature owner discussed in detail what should be implemented in the sprint. All the details could not be covered in the first planning meeting, and thus a second meeting was arranged. The participants of the second meeting were the feature owner, two developers, a usability expert/feature owner, a test engineer and a test automation engineer. The functionalities were gone through, and the details were discussed. Agreements about the implementation details were made and noted down.

IMPLEMENTATION

The test automation engineer was responsible for arranging the training, and therefore the test cases were not implemented at the beginning of the sprint. After the training, a developer and the test automation engineer implemented the test cases which had not been finished during the previous sprint. At this point the contents of the current sprint were reduced, and all the functionality planned in the second planning meeting was moved to the following sprint. The initial test cases were still created before the sprint ended, and some of them can be seen in Figure 29.

Figure 29: Initial test cases for the Abis rule

TEST EXECUTION

The test cases that were implemented by the developer and test automation engineer were added to the automated test execution system immediately after they were ready. All test cases created during the previous sprints were already there.

10.5 April Sprint

The goal in the April sprint was to continue ATDD with the Abis analysis functionality. There were some big changes at the beginning of the sprint. The Abis analysis workflow was supposed to be ready at the end of the sprint, which led to combining the two teams into one big sprint team. The team that had not worked with the Abis analysis earlier needed an introduction to the functionality. The big team made it impossible to go into enough detail for the acceptance tests to be updated during the sprint planning.

PLANNING

As mentioned in the previous chapter, the initial acceptance test cases had been created during the earlier sprint. After the sprint planning, the feature owner, the specification engineer and the test automation engineer went through the initial test cases and updated them. Some of the details still remained open, as the feature owner worked them out later in the sprint. After the test cases were updated, they were sent to the whole team.

IMPLEMENTATION

The implementation started immediately after the acceptance test cases were updated. The test automation engineer wrote the test cases. After some of the sentence format keywords were implemented, one step needed clarification. The test automation engineer invited two usability specialists/feature owners and a specification/test engineer to a meeting where the different options to solve the usability problem were discussed. After all the options were evaluated, the test automation engineer discussed possible solutions with the developer and the software architect. The changes were agreed to be implemented, and three developers, the usability expert/feature owner and the test automation engineer planned and agreed on the details of the feature. Based on the agreed details the test automation engineer created the acceptance tests for the new feature. Technically the test cases were created in a similar manner as in the previous sprints.

The acceptance test cases were dependent on each other because every test case was a step in the Abis analysis workflow. This caused some problems, as the first step, getting the needed data into the application, was ready only on the last day of the sprint. Part of the test cases could not be finalized before this data was available, and it was seen as too laborious to calculate all the needed inputs beforehand. One part of the feature could also not be finished during the sprint. Therefore a few test cases were not ready when the sprint ended.
when the sprint ended.<br />

At the end of the sprint, the test engineer and the test automation engineer created some more detailed test cases to test the Abis rule. These test cases tested different variations and checked that the rule result was correct. However, the rule was not working as intended. The developer, the feature owner and the test automation engineer had understood the details differently, which led to a more detailed discussion between these parties. It was even noticed that some of the special cases were not handled correctly. Based on the discussion, the developer and the test automation engineer wrote down all the different situations and mailed them to the feature owner. It was agreed that these kinds of details need acceptance test cases in the coming sprints.

TEST EXECUTION

Some of the test cases were verified in the developers' development environments. One test case was failing, and it was noticed that the feature implementation had to be improved to fulfill the requirements. The developers continued the implementation, and after they thought it was ready, the acceptance test cases were executed again and passed. The feature was then seen as ready. Some other test cases were executed on the test automation engineer's workstation. Some problems and misunderstandings were found, and they were reported to the developers.

REPORTING

The test cases were added to the acceptance test execution environment after they were updated at the beginning of the sprint. The idea was to make the development status visible to all via the acceptance test report. However, all the test cases were failing for most of the sprint, and only a few days before the sprint ended did some of them pass. Even at the end of the sprint, not all of them were passing.

It was also planned to create a running tested features (RTF) diagram from the acceptance test results. However, this idea was discarded because it was seen that it would not give a correct picture of the project's status. Some of the test cases were not acceptance test cases in the sense that they were defined by the test engineers, not by the feature owners. This limitation could have been avoided by using an acceptance tag and including only test cases with this tag in the RTF diagram. An even more important reason for dropping the idea was the fact that the whole project's development was not done in the ATDD manner.

10.6 Interviews

This chapter collects the experiences of the project members involved in the team which developed features with acceptance test-driven development. The interview methods are described in more detail in Chapter 9.3. Altogether nine persons were interviewed, and in this chapter the results are briefly described. The results of the interviews are analyzed in more detail in Chapter 11.

CHANGES IN THE SOFTWARE DEVELOPMENT

The interviewees thought that the biggest change due to the use of ATDD had been the increased understanding of the details and workflow in the whole team. One developer thought that ATDD had forced the team to communicate and co-operate. Another developer mentioned that due to ATDD, feedback about the features is obtained faster. The test engineers saw that they were able to influence the developed software more than before.

BENEFITS

The biggest benefit mentioned in the interviews was a better common understanding of the details due to the increased communication, cooperation, and detailed planning. Four interviewees saw that requirements and feature descriptions are more accurate than before. One feature owner had noticed missing details in the requirements while defining the acceptance tests. The developers thought that they knew better what was expected from them, and three other interviewees agreed. Four interviewees saw that the increased understanding of the details had led to doing the right things the first time. Two interviewees thought the acceptance test cases had increased the overall understanding of the workflow. One respondent had noticed improvements in teamwork.

The test engineers thought their early involvement was beneficial because they were able to influence the developed software, ask hard questions and create better test cases due to the increased understanding. One test engineer thought that being in the same cycle with the development is very efficient, because then people remember what they have done and problems can therefore be solved with a smaller effort. One feature owner was of the opinion that the test engineers and developers understand better what to test and how to test it. She also mentioned that the testing now covers a full use case.

Three interviewees mentioned that feedback was obtained much faster than earlier. The early involvement of the test engineers and test automation helped to shorten the feedback loop. One developer saw that the automated user interface testing had improved. One interviewee thought the automated acceptance tests keep the quality at a certain level but do not increase it. Another interviewee was of the opinion that test automation helps to reduce manual regression testing, so that test engineers can concentrate more on complex scenarios and make more use of their domain knowledge.

DRAWBACKS

There were not many drawbacks according to the interviewees. Two interviewees thought that the initial investment in test automation is the biggest disadvantage, and they wondered whether the costs will be covered in the long run. Two interviewees were of the opinion that the extra work needed to rewrite the test cases after possible changes is a problem. One feature owner thought that the time needed to write the initial test cases is also a kind of drawback. Two interviewees speculated that some developers may not like others coming onto their territory. Four interviewees could not find any weaknesses of the same magnitude as the benefits.

CHALLENGES

Test data was seen as the biggest challenge; five respondents mentioned it. Flexible creation of test data and its use in acceptance test cases were considered challenging. Reliable automated testing of algorithms was also seen as problematic. One developer mentioned that testing the map component and other visual issues with automated test cases would be troublesome. Three interviewees thought that there may be challenges with change resistance. The test engineers found that it was difficult to find the right working methods. The increased cooperation increases the need to ask the right questions, and that can also be challenging.

INFLUENCE ON THE RISK OF BUILDING INCORRECT SOFTWARE

There were varying views on how ATDD influences the risk of building incorrect software. Some interviewees saw two risks: the first was building software that does not fulfill the end customer's expectations, and the second was building software that does not fulfill the requirements or the feature owner's expectations. Two persons saw that ATDD does not affect the risk of building incorrect software from the end user's point of view. On the other hand, one test engineer thought that the early involvement of testing may even decrease that risk. Seven interviewees saw that the second risk, not creating the software that has been specified and wanted by the internal customer, had decreased compared to earlier. Increased communication, discussion about the details and an increased common understanding before the implementation were seen as the main reasons. One interviewee thought that if the test cases are incorrect and they are followed too narrowly, the risk may increase. Another response was that if the application is developed too much from the test automation's point of view, the actual application development could suffer.

VISIBILITY OF THE DEVELOPMENT STATUS

The visibility of the development status was not seen to have changed much with the use of ATDD. One individual view was that the automated tests will increase it in the future. Another comment was that breaking the tests into smaller parts and arranging a sprint-specific information radiator could help. The developers thought that merging the acceptance test reports into the build reports would improve the situation.

QUALITY AGREEMENT BETWEEN THE DEVELOPMENT TEAM AND FEATURE OWNERS

Seven interviewees saw the acceptance test cases as an agreement between the development team and the feature owners because the test cases were done in cooperation. However, four of them saw that the agreement is a functional agreement and not a quality agreement; quality was seen as a bigger entity than correct functionality. Two interviewees saw that the agreement had not yet formed.

CONFIDENCE IN THE APPLICATION

In general, the confidence in the application had increased. One developer saw that ATDD had enhanced his confidence in the software because he knew that he was developing the right features. Three other persons also saw that confidence had grown because there was a common understanding of what should be done. Three other interviewees were of the opinion that test automation had built the confidence, mainly because passing automated test cases indicated that the application was working at a certain level. One interviewee saw that the automated test cases increase confidence because she could trust that something was working after it had been shown to be working in the demo. One test engineer saw that the possibility to influence the implementation details had enhanced his confidence in the software.

WHEN PROBLEMS ARE FOUND

Five interviewees thought that problems can be found earlier than without ATDD, and three of them had already experienced that. However, four of them were of the opinion that manual testing and the test engineers' early involvement were the key issues. Two of them also mentioned that co-operation in the early phase can prevent problems from occurring. Four interviewees had not experienced changes, even though one of them hoped that problems could be found faster in the future.

REQUIREMENTS UP-TO-DATENESS

According to the interviewees, the requirements were more up-to-date than before. Seven of the interviewees had seen improvement in the way the requirement specifications and feature descriptions were updated. One feature owner and a specification engineer mentioned that some missing requirements were noticed while creating the test cases. Increased communication between the different roles was also seen to have helped in updating the specifications. One developer and a test engineer thought that if some of the agreed functionality has to be changed during the development, it may not get updated. Two interviewees had not seen any change compared to earlier.

CORRESPONDENCE BETWEEN TEST CASES AND REQUIREMENTS

Seven of the interviewees saw that the test cases and requirements are more in sync than before. The reasons mentioned were cooperation in the test case creation, increased communication, better understanding of the feature, and agreement about the details. Two persons thought that the test cases correspond better to the requirements at the beginning, when the details are agreed on. On the other hand, they thought that changes during the implementation phase may lead to differences between the test cases and requirements. One feature owner/usability expert saw that ATDD does not assure that the test cases and requirements are in sync. He also thought that the test cases cannot replace other specifications; in his opinion, there is not even a need for that.

DEVELOPERS' GOAL

Both developers thought that ATDD had made it easier to focus on the essential issues. One of them thought the acceptance test cases had also increased his understanding of where his code fits into the bigger context. Five persons other than the developers thought that the developers' focus is more on the right features. One interviewee hoped the developers' goal had shifted toward a feature being implemented, tested and documented, not only implemented.

DESIGN OF THE SYSTEM

One developer thought that ATDD had helped in finding the design faster than before. The other developer had not noticed any changes in the design.

REFACTORING CORRECTNESS

The developers found that ATDD had not yet affected the evaluation of refactoring correctness. However, they thought that automated acceptance tests could be used for that later on.

QUALITY OF THE TEST CASES

Most of the interviewees were of the opinion that the quality of the test cases had increased. The following justifications were presented: test cases are created in co-operation, they correspond better to the requirements, they cover the whole workflow, and they are more detailed and executed more often. Some interviewees could not tell if there had been any changes. One developer thought that the acceptance tests done through the graphical user interface had been a huge improvement to the user interface testing. He explained that it had been very troublesome to unit test the user interfaces extensively.

TEST ENGINEERS' ROLE

In general, it was seen that the test engineers' role had broadened due to the use of ATDD. Most of the interviewees mentioned that being a part of the detailed planning had been the biggest change. Other mentioned changes were an increased need to communicate and an increased role in information sharing. The test engineers thought the change had been huge: the ability to influence the details makes the work more rewarding, and the improved knowledge about the expected details makes it possible to test what should be done instead of testing what has been done. One feature owner thought that ATDD had eased the test engineers' tasks because the test cases were defined together.

Four interviewees had noticed the old confrontation between the developers and test engineers starting to decrease due to the increased cooperation. One developer had come to understand better the difficulties in testing, which in turn had changed his view of the test engineers. One developer said he was happy that the communication does not happen only through defect reports.

FORMAT OF THE TEST CASES

All the interviewees thought the test cases are currently in a format which is very easy to understand. The sentence format was seen as very descriptive. However, one developer had noticed some inconsistency between the terminology in the test cases and the requirements specification. A few persons thought that some domain knowledge is still needed to understand the test cases. One test engineer thought the format is much more understandable than the test cases created with traditional test automation tools.

LEVEL OF THE ACCEPTANCE TESTS

The interviewees found it difficult to define at which level the acceptance test cases should be written. One test engineer thought that discussion at the beginning of the sprint may help to write proper acceptance test cases and to avoid duplicating the same tests at the unit testing and acceptance testing levels. Two persons thought that more detailed test cases would need better test data. One of them also mentioned that it will not be possible to test all the combinations, and he doubted the profitability of detailed automated test cases due to the increasing maintenance costs. One specification engineer thought that the acceptance test cases have probably been detailed enough, but more experience is needed to become convinced. The other interviewees did not have any views on this issue.

EASE OF TEST AUTOMATION

Most of the interviewees did not know if ATDD had affected the ease of test automation. One test engineer thought that ATDD helps to plan which test cases to automate and which not to.

IMPROVEMENT IDEAS

The interviewees did not have any common opinion on improvement areas. One interviewee thought that increasing the routine is the most important thing to concentrate on, because the method had been used only for a short time. One feature owner saw that in some areas there is a need for more detailed acceptance tests. She also mentioned that there could be a checkpoint during the sprint where the acceptance test cases are reviewed.

Both developers thought that reporting could be improved to shorten the feedback loop even more. Adding the acceptance test reports to the build reports was seen as a solution. One of the developers thought that the written acceptance test cases could be communicated so that everyone really knows that those test cases exist. One feature owner/usability specialist was of the opinion that splitting the acceptance test cases into smaller parts would help in following the progress within the sprint. He felt that smaller acceptance tests with sprint-specific reporting could be used to improve the visibility for all project members.

One test engineer saw that there is room for improvement in defining and communicating what is tested with manual exploratory tests, automated acceptance tests, and automated unit tests. Two respondents thought that a more specific process description should be created to ease the process adoption if ATDD were taken into wider use. It was also seen that the support of the whole organization is needed for the change.

71


11 ANALYSES OF OBSERVATIONS

In this chapter, the observations made during the study, including the interviews, are analyzed against the research questions presented in Chapter 8.

11.1 Suitability of the Keyword-Driven Test Automation Framework with Acceptance Test-Driven Development

The first research question was: Can the keyword-driven test automation framework be used in acceptance test-driven development? This question was divided into two more specific questions, which are analyzed first. After the specific questions have been covered, the analysis of the actual research question is presented.

IS IT POSSIBLE TO WRITE THE ACCEPTANCE TESTS BEFORE THE IMPLEMENTATION WITH THE KEYWORD-DRIVEN TEST AUTOMATION FRAMEWORK?

In the Project the test cases were written in two phases. The initial test cases were written based on the information gathered from the planning meetings. Writing the initial test cases took place after the planning, and they were usually ready before the developers started implementing the features. Therefore, it can be said that the initial test cases were written before the implementation started. However, it has to be taken into account that the initial test cases were on a high level and there were only between 10 and 25 test cases per sprint. If there had been more test cases, or if the test cases had been on a more detailed level, the result might have been different.

The second phase, implementing the keywords that were needed to map the initial test cases to the system under test, was conducted in parallel with the application development. With some test cases, it was not possible to implement all the keywords before the actual implementation details were decided. There were also difficulties in implementing test cases whose inputs and outputs depended on the features under development, as well as problems in implementing the base keywords. This prevented finalizing some of the test cases during the sprint. Therefore, only some of the acceptance test cases were fully ready before the corresponding feature. It was noticed that the test cases could not always be implemented before the development started, or even before the features were ready. However, the test cases were mainly ready soon after the features.

The reasons behind the test case implementation problems had to be analyzed. The first problem was that the interface between the test cases and the application was changing. It was obviously not possible to implement the test cases before the interface was defined. However, the test cases were not implemented even immediately after the interface was clear. This was because different persons were implementing the test cases and the features. If the same person had implemented both, the test cases could have been created on time. This problem is also partly related to the tool and approach used to automate the test cases. If the interface had been a programmatic interface, the developers would have been forced to create the code needed to map the test cases to the application. In that case, changes in the interface would have been just one person’s responsibility. Therefore, it can be said that the selected interface made this problem possible. To avoid this problem, it is possible to move the test case implementation to the developer or to improve the communication between the person implementing the test cases and the person developing the features.

The second problem was defining the inputs and outputs beforehand. The interviewed project members mentioned that the test data is the biggest challenge in the domain. In the Project, some expected results were calculated for verification purposes. However, in some test cases more data was needed, and it was not considered sensible to calculate all this data only for the sake of a few test cases. These problems can obviously make it hard or even impossible to implement the test cases before developing the features. On the other hand, these problems were not tool specific. It is even possible that in some other context these kinds of problems do not exist or are at least easier to solve. However, if such problems exist, it has to be decided case by case whether the extra effort of implementing the test cases in a test-first manner is worthwhile.

The problems with creating the base keywords were technical. These kinds of problems occur every now and then. It was also noticed that it might be hard to implement the system-specific base keywords without trying them out. There was no specific reason for the problems, and as the knowledge about the library increased, the number of problems decreased. More importantly, all of the problems were eventually solved.

IS IT POSSIBLE TO WRITE THE ACCEPTANCE TESTS IN A FORMAT THAT CAN BE UNDERSTOOD WITHOUT TECHNICAL COMPETENCE WITH THE KEYWORD-DRIVEN TEST AUTOMATION FRAMEWORK?

The acceptance tests were easy for all the project members to understand. The main reason for this was that the acceptance test cases were written using plain-text sentences, in other words sentence format keywords. However, using the sentence format keywords caused an extra cost, as one additional abstraction layer was needed for the test cases. Whenever some inputs were defined in the test cases, they were given as arguments to the keyword implementing the sentence format keyword, which in some cases led to duplicate data. The sentence format keyword was first converted to a user keyword and its argument or arguments, and then the user keyword was mapped to other keywords. Implementing the sentence format keywords usually took only seconds, so the cost was not significant. This was because the keyword-driven test automation framework supported a flexible way of defining user keywords in the test data. Without this functionality, it may be harder to use the sentence format keywords and the cost may be higher. Overall, the clarity gained with the sentence format keywords in the Project was worth the extra effort.
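To illustrate the abstraction layer described above, the following Python sketch shows how a sentence format keyword could be resolved into a user keyword and its arguments, which in turn calls lower-level keywords. It is only an illustration of the idea: the Project's framework defined these mappings in its own test data tables, and the function names and patterns here are hypothetical.

```python
import re

# Lower-level keywords: in the Project these came from the GUI test library;
# here they are simple stand-ins that only print what would be done.
def open_application():
    print("opening application")

def import_network(name):
    print(f"importing network {name!r}")

# A user keyword composed of lower-level keywords. Its argument is supplied
# by the sentence format keyword that maps to it.
def prepare_network(name):
    open_application()
    import_network(name)

# Sentence format keywords: a plain-text sentence is matched against a
# pattern, and the captured words become arguments of the user keyword.
SENTENCE_KEYWORDS = [
    (re.compile(r"The network (\w+) is prepared"), prepare_network),
]

def run_step(sentence):
    """Resolve one sentence format test step and execute it."""
    for pattern, user_keyword in SENTENCE_KEYWORDS:
        match = pattern.fullmatch(sentence)
        if match:
            user_keyword(*match.groups())
            return
    raise ValueError(f"no keyword matches step: {sentence!r}")

run_step("The network Espoo is prepared")
```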

However, there are some doubts about the suitability of sentence format keywords for lower-level test cases, especially if the test cases are created in a data-driven manner and only the inputs and expected outputs vary. In such cases the overhead caused by the extra abstraction layer may become a burden, and it would probably be better to use descriptive keyword names and add comments and column names to increase the readability of the test cases. This is something that needs further research, because the acceptance test cases created in the Project were mainly on a high level.

CAN THE KEYWORD-DRIVEN TEST AUTOMATION FRAMEWORK BE USED IN ACCEPTANCE TEST-DRIVEN DEVELOPMENT?

The answer to this question is ambiguous, and it depends on how strictly acceptance test-driven development is defined. It is clear that the acceptance test cases were not implemented before the development. In the Project it would have been very unprofitable, and probably also impossible, to implement all the test cases in a test-first manner. The strict test-first approach with acceptance test cases may be hard in any environment, and Crispin (2005) has also noticed more problems than benefits with the strict test-first approach. On the other hand, the initial test cases were mainly ready before the development, as was mentioned earlier. Therefore, the acceptance test cases were driving the development by giving a direction and a goal for the sprints. One developer’s comment, “The acceptance test cases really drove the development!”, supports this statement.

However, the test cases created with the keyword-driven test automation framework can be on a very high level due to the ability to create abstraction layers in the test cases. This may lead to a situation where a high-level use case is converted to high-level test cases, and therefore the details are not agreed on and the benefits of ATDD are lost. In the Project some of the test cases were created on such a high level that the problems were noticed only when the test cases were implemented. At least one usability problem was noticed while implementing the test cases. This could have been noticed already in the planning phase with more detailed test cases. On the other hand, the usability problem was solved during the sprint, and without ATDD this problem would have been noticed and corrected much later. Also some misunderstandings noticed at the end of the April sprint could have been avoided with more detailed test cases.

It was also observed that some of the agreed acceptance test cases were not driving the developers’ work as well as they could have. With some features the test automation engineer found problems that could have been avoided if the developers had followed the test cases more strictly. These problems were not major, but some extra implementation was needed to fix them. These situations were possible because the test automation engineer implemented the test cases instead of the developers. There were two reasons why the test automation engineer was implementing the test cases. First, the keyword-driven test automation framework made it possible to implement the test cases with keywords, without programming. The other reason was the interface used to access the Product from the acceptance test cases. Because there was a test library for accessing the graphical user interface of the Product, it was possible to write the test cases without the developers’ continuous involvement. With tools like FIT (Framework for Integrated Test) there is usually a need to implement some feature-specific code between the test cases and the application, so developers are forced to work closely with the test cases. However, with the keyword-driven test automation framework this involvement is not forced by the tool.

Overall, it seems that the keyword-driven test automation framework can be used in acceptance test-driven development if the strict test-first approach is not required. However, there are a few things that are good to keep in mind if the keyword-driven test automation framework is used with ATDD. Creating only high-level test cases should be avoided, because they will not drive the discussion towards the details, which was mentioned as the biggest benefit of ATDD. If different persons are creating the test cases and implementing the application, the communication between these two parties has to be ensured.

11.2 Use of the Keyword-Driven Test Automation Framework with Acceptance Test-Driven Development

The second research question was: How is the keyword-driven test automation framework used in acceptance test-driven development in the project under study? This question was divided into acceptance test case planning, implementation, execution, and reporting. Chapter 10 already answers these questions, but in this chapter the sprints are summarized and analyzed.

HOW, WHEN AND BY WHOM THE ACCEPTANCE TEST CASES WERE PLANNED

There was no formal procedure for defining the acceptance test cases; rather, the test cases were defined on a case-by-case basis. However, in all the cases the implementation details were discussed in a group containing at least a developer, a feature owner, a usability specialist, and a test engineer, and the discussion was noted down in various sketches and notepads. These discussions usually took place soon after the sprint planning and always before the implementation. After the meetings it was mainly the test automation engineer’s task to convert the acceptance test cases to the tabular format used with Robot. In the April sprint the acceptance test cases were updated by a group including a feature owner, a specification engineer, and a test automation engineer.

Quickly noting down the test cases and details in the planning meetings proved to be a good choice. The discussion was not hindered by someone writing out the test cases, and all the participants were really taking part in the conversation. However, there was one drawback with this approach. In a few meetings, some of the details needed to implement the test cases were not discussed, because the issues were not handled systematically. Because these details were later clarified with individual persons, they were not fully understood by the whole team. It was noticed that emailing and having the test cases in the version control system was not enough. Therefore, it would have been beneficial to have some kind of meeting after the test cases were written to check and clarify all the details for all the team members. This was also mentioned by two team members in the final interviews. A similar problem was noticed in the April sprint, when the details were updated without the developers.

HOW, WHEN AND BY WHOM THE ACCEPTANCE TEST CASES WERE IMPLEMENTED

The acceptance test cases were implemented using the sentence format keywords from the February sprint onwards, in a manner similar to the example explained in Chapter 7. The test case implementation took place in parallel with the feature implementation. The test cases were implemented mainly by the test automation engineer, but a test engineer and a developer also implemented some of the test cases.

In addition to the challenges presented earlier in this chapter, there were challenges in keeping the test cases up to date in the February sprint. This problem could have been avoided if the details had been agreed on a more detailed level in the planning meeting. On the other hand, some of the changes were made based on the feedback gained from the meeting arranged with the specialist. These changes would have been very hard to foresee. However, updating the test cases was quite easy because the test cases were created with keywords.

The biggest challenge compared to the simple example presented in Chapter 7 was the increase in test execution time. Starting the application and importing the network data took a considerable amount of time, and it was not desirable to execute those actions in every test case, as the total test execution time would have been multiplied by the number of test cases. It was important to keep the execution time short, as it affected both the duration of the test case implementation and the feedback time in the acceptance test execution system.
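The general pattern for avoiding this kind of repeated cost is to perform the expensive actions once per suite rather than once per test case. The sketch below illustrates the idea with a pytest session-scoped fixture; the Application class, fixture, and test names are invented stand-ins, not the Project's test library or keywords.

```python
# test_acceptance.py -- illustrative only; the Application class and the
# test names are hypothetical stand-ins for the Project's test library.
import pytest

class Application:
    def __init__(self):
        print("starting application (slow)")

    def import_network(self, name):
        print(f"importing network {name!r} (slow)")
        self.nodes = 42

    def node_count(self):
        return self.nodes

@pytest.fixture(scope="session")
def app():
    # Runs once for the whole test run instead of once per test case, so the
    # total execution time is not multiplied by the number of test cases.
    application = Application()
    application.import_network("example-network")
    return application

def test_network_is_loaded(app):
    assert app.node_count() > 0

def test_node_count_is_reported_as_integer(app):
    assert isinstance(app.node_count(), int)
```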

HOW, WHEN AND BY WHOM THE ACCEPTANCE TEST CASES WERE EXECUTED

The acceptance test cases were executed in three ways. During the test case implementation the test automation engineer executed the test cases on his workstation. The purpose was to verify that the test cases were implemented correctly. With some test cases this meant that the features were already implemented at this stage. Some of the test cases were executed on the developers’ workstations during the development by a test automation engineer and the developers. All the test cases were added to the acceptance test execution environment. At the beginning, the test cases were added to the environment at the end of each sprint. However, in the last sprint the test cases were added to the acceptance test execution environment immediately after the initial versions were created. In the acceptance test execution environment the test cases were automatically executed whenever new builds were available.
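As a rough sketch of how such build-triggered execution could look, the following Python loop polls a build directory and runs an acceptance suite for each new build; the directory layout, polling interval, and the pytest runner are assumptions, not a description of the Project's actual execution environment.

```python
# watch_builds.py -- illustrative sketch of an acceptance test execution
# environment; paths and the command-line runner are assumptions.
import subprocess
import time
from pathlib import Path

BUILD_DIR = Path("builds")       # new builds appear here
RESULT_DIR = Path("results")
RESULT_DIR.mkdir(exist_ok=True)
seen = set()

def run_acceptance_tests(build):
    # Any command-line runner could stand in for the keyword-driven framework.
    subprocess.run(
        ["pytest", "acceptance/", f"--junitxml={RESULT_DIR / build.name}.xml"],
        check=False,
    )

while True:
    for build in sorted(BUILD_DIR.glob("build-*")):
        if build.name not in seen:
            seen.add(build.name)
            run_acceptance_tests(build)
    time.sleep(60)   # poll once a minute
```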

As was already mentioned, the problems in the acceptance test implementation prevented the developers from evaluating whether their work was ready by running the acceptance test cases. There were also two other reasons which made it hard for the developers to evaluate the readiness of their work with automated acceptance test cases. First of all, some of the test cases tested a workflow, and those test cases were therefore dependent on each other. That is why the test cases covering a late phase of the workflow could not be executed before the features preceding them were working. Another reason was that single test cases tested multiple developers’ work, and therefore a test case did not pass until all the parts it was testing were ready.

Many of the mentioned problems derive from the level of the test cases. When the acceptance test cases are on a high level, it is inevitable that they test multiple features, which in turn leads to the problems mentioned earlier. Avoiding the dependency between steps is hard in the workflow test cases. Even though these problems exist, it is obvious that the end-to-end acceptance test cases are needed. One possible solution to this problem is to divide the acceptance test cases more strictly into two categories. Higher-level test cases could be traditional system-level test cases containing end-to-end test cases. The feature-specific test cases could then be integration- and system-level test cases concentrating on only one feature. The feature-specific test cases could be executed by the developers to evaluate the features’ readiness. Of course, this does not remove the problem that some features cannot be tested before the features they depend on are ready. This division would also make it easier for the developers to implement the acceptance test cases. The higher-level test cases could then still be the testers’ responsibility, as was the case in the Project.
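One way to make such a division concrete, sketched here with pytest markers, is to tag each test case as either a workflow test or a feature-specific test and let developers run only the latter; the marker names and test names are illustrative, not the Project's.

```python
# Illustrative split of acceptance tests into two categories using pytest
# markers ("workflow" and "feature" would be registered in pytest.ini).
import pytest

@pytest.mark.workflow
def test_plan_and_optimize_whole_network():
    # End-to-end workflow test spanning several features and several
    # developers' work; expected to pass only late in the sprint.
    ...

@pytest.mark.feature
def test_network_import_reports_node_count():
    # Feature-specific test that a single developer can run to judge
    # whether that one feature is ready.
    ...

# Developers could run only their feature-level subset with:
#   pytest -m feature
# while the execution environment runs the full suite, including:
#   pytest -m workflow
```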

HOW AND BY WHOM THE ACCEPTANCE TEST RESULTS WERE REPORTED

The problems found during the test case implementation were reported to the developers. The results of the test case execution in the acceptance test execution environment were visible to all the project members through an information radiator. The problems found in the automated test execution were passed on to the developers by the test automation team members after they had investigated the problems. However, this investigation lengthened the feedback loop, as the testers were not always available. If the test cases had been implemented by the developers, the feedback loop could have been shortened. The developers thought that the feedback loop should be shortened further, even though in their experience it had already been cut radically.

11.3 Benefits, Challenges and Drawbacks of Acceptance Test-Driven Development with Keyword-Driven Test Automation Framework

The third research question was: Does acceptance test-driven development with the keyword-driven test automation framework provide any benefits? What are the challenges and drawbacks? Based on the experiences presented in Chapter 10.6 and the expected benefits and challenges presented in Chapter 4.3, the answers to these questions are analyzed.

BENEFITS

The project members noticed many benefits in the use of ATDD. This was notable because the research period lasted only four months. The people who worked closely with the acceptance test cases noticed many more benefits than those who were less involved in the use of ATDD. The role a person represented had much less influence on the experienced benefits than the degree of involvement. Of course, there were role-based differences in viewpoints on some of the issues, but the main benefits were perceived similarly in the different roles. The same benefits were also noticed by the researcher while working in the Project.

While the research was conducted, there were some changes in the Project, as was mentioned in Chapter 10.1. Not all of them were related to taking ATDD into use. The changes can be categorized into three main changes: taking test automation into use, a change towards agile testing, and of course taking ATDD into use. The relations and effects of these changes on the experienced benefits had to be analyzed. The analysis is presented next.

The main relations between the changes, the benefits, and the reasoning behind them are presented in Figure 30. As can be seen in the figure, many relations between the benefits can be found. The figure is only a simplified view of the benefits and their relations, but it is used as the basis of this analysis.

Figure 30: The relations between the changes and benefits

One of the perceived changes was the increased communication. As was mentioned in Chapter 4.1, agile testing emphasizes face-to-face communication. When ATDD is in use, the work needed to create the test cases forces people to communicate. The perceived increase in communication can also depend on the tester, as some people communicate more actively than others. Therefore, it is impossible to say how much of the increased communication was due to the use of ATDD and how much due to the other changes. The test engineers’ early involvement can be seen as a consequence of taking agile testing into use. On the other hand, the use of ATDD forced the testers to take part in an earlier phase of the development, as the testers participated in the detailed planning. Therefore, most of the benefits gained from the testers’ earlier participation were obtained because of the use of ATDD (see Figure 30). Co-operation in acceptance test case creation is also a part of agile testing. However, in the Project it was due to the use of ATDD that the acceptance test cases were created with the feature owners. Therefore, it is hard to say whether the benefits could have been gained without the use of ATDD. In any case, the use of ATDD ensures that the acceptance test cases are created in co-operation, and therefore the related benefits are gained.

The only practice that was taken into use purely due to the use of ATDD was the detailed planning done by the feature owners, developers, and testers (bolded in Figure 30). This was one of the biggest reasons leading to an improved common understanding about the details, which was seen in the Project as the biggest benefit of the use of ATDD. Crispin (2005) also stated that the cooperation between the groups before development was the biggest benefit of ATDD. The need to create the test cases forces discussion. Of course, the detailed planning could be done without ATDD, and some of the mentioned benefits could still be gained. However, as can be seen in Figure 30, the benefits are sums of multiple factors, and it is hard to say which benefits would be gained if only the detailed planning were used. As mentioned earlier, an increased common understanding, and the benefits following from it, can be missed if the test cases are on too high a level and the planning is not detailed enough.

The test automation affected only a few of the observed benefits, as can be seen in Figure 30. This suggests that the tool used in ATDD is not crucial, as most of the benefits were gained from well-timed planning done by people working in different roles. However, the role of test automation in providing feedback and helping regression testing should not be undervalued. The benefits of automated regression testing were probably not broadly highlighted in the research because of the short research period. With a longer follow-up period, this benefit could have been greater.

The increased common understanding, the biggest benefit of the use of ATDD, does not provide additional value as such; rather, the increased understanding leads to the “real” benefits. The most valuable benefits of the use of ATDD are therefore the decreased risk of building incorrect software and the increased development efficiency, as problems can be solved with less effort and the features are done right the first time. The change in the tester’s role is also quite remarkable.

The use of ATDD also affects software quality. As the risk of building incorrect software is decreased, it is more likely that the created features will satisfy the end user’s needs. A better understanding, improved test cases, and the fact that problems are found earlier should also improve the chances of finding the defects with a significant impact. However, this remains to be seen. Test automation as a part of ATDD ensures a certain level of quality. As the regression testing is done automatically, the testers hopefully have more time to explore the system and find defects. In the Project, non-functional testing was not taken into account when the acceptance test cases were created. However, it was discussed as one area to which the use of ATDD could be expanded. Therefore, the non-functional qualities were not improved by the use of ATDD.

The benefits mentioned were at least partially gained because of the use of ATDD. If agile testing, test automation, and increased communication are removed from the relations, none of the real benefits disappear, although this may influence the magnitude of the benefits.

BENEFITS NOT PERCEIVED

There were also areas where benefits were not noticed, even though those areas were mentioned as possible benefit areas in the literature (see Chapter 4.3). Possible reasons why these benefits were not gained are analyzed here.

Development Status Was Not More Visible

There were no changes in the visibility of the development status, even though the acceptance test report was available to everyone through the information radiator and the web page. At the beginning of the research the test cases were added to the acceptance test execution environment at the end of each sprint. Therefore, it was clear that the development status could not be followed inside the sprints. In the last sprint of the research period the acceptance test cases were added to the acceptance test execution environment at the beginning of the sprint. However, this did not help, as the test cases were failing for most of the sprint. There were three reasons for that. First, the test cases were high-level test cases testing multiple parts of the Product in one test case. Therefore, even though the development team was able to finish some individual features, the test cases were still failing. Another reason was that the features were ready at a very late phase of the sprint, if even then. Therefore, the test cases were actually describing the development status, even though people did not see failing tests as progress indicators. The third reason was that not all of the acceptance test cases were ready at the same time as the features. The reasons behind this problem were analyzed in Chapter 11.1.

The visibility of the development status could be improved by dividing the development status follow-up into project-level and sprint-level progress. The division into higher-level and feature-level test cases presented in Chapter 11.2 could be exploited. The higher-level test cases could be used to indicate which workflows are working, and therefore they could provide the project-level status. The feature-level test cases could be used to follow the progress inside the sprints.

Requirements Were Not Defined More Cost-effectively

The test cases did not substitute for the requirement specifications in the Project. Therefore, the requirements and test cases were not created more cost-effectively. One clear reason was that the Project had been started before ATDD was tried out, and a requirement specification had already been created. Even if ATDD had been started at the beginning of the Project, the requirement specification would probably still have been created. One interviewed person also mentioned that there is no need to replace the requirements with the test cases. On the other hand, keeping duplicate data up to date can be seen as a burden.

No Remarkable Changes to System Design

ATDD did not cause remarkable changes to the system design, even though one developer thought that he had found the design faster in some cases. The relatively short research period may be one reason why no changes were noticed. However, there might be other reasons as well. Reppert (2004) reported that remarkable improvements in system design were seen when ATDD was used in one project. It may be that this improvement could not be noticed because the interface used to access the system from the test cases was different. As was mentioned in Chapter 4.3, the acceptance test cases usually bypass the graphical user interface and use the internal structures directly. This was not the case here, as the test cases used the graphical user interface to access the system under test. Therefore, in the Project, there was no need to create test code that would interact directly with the internal structures. This may have been the reason why the developers did not notice a significant change. So it seems that the interface used to access the system under test affects whether the system design is improved or not.

Acceptance Tests Were Not Used To Verify Refactoring Correctness

Developers in the Project thought that the acceptance test cases created with ATDD could be used to evaluate refactoring correctness, even though they had not done that yet. A longer research period is needed to properly assess the usefulness of the acceptance test cases in evaluating refactoring correctness. However, it is hard to see any reason why the acceptance test cases created with the keyword-driven test automation framework could not be used to verify refactoring correctness. The coverage and level of the acceptance test cases probably have a bigger influence than the tool used to create them.

CHALLENGES

As was mentioned in Chapter 10.6, the main challenge in the Project’s environment was proper test data. This, however, was a domain-specific testing problem, although it was seen to affect the creation of automated tests more than manual testing. There were also other challenges in automating the test cases. The base keyword creation problems were described in Chapter 11.1. There were also components in the application which could not be accessed from the automated test cases, as was mentioned in Chapters 10.2 and 10.3. As was already mentioned in Chapter 5.1, automating testing is not an easy task, and test automation was also seen as one of the biggest challenges in the use of ATDD by Crispin (2005) (Chapter 4.3). The presented test automation challenges were mainly general test automation challenges. Some of these challenges are related to the interface selected for accessing the application. However, none of them were specific to keyword-driven test automation. The use of ATDD and agile testing helped to solve some of the problems more easily than could have been done in a more traditional environment: it was easier to add the needed testability hooks to the Product because the implementations were done in parallel.

As was mentioned, test automation is a part of ATDD, but the biggest benefits can be achieved even if not all of the test cases can be automated. However, this creates a need to handle manual regression testing. Therefore, it is not advisable to settle immediately for manual tests; the importance of automated regression tests in iterative software development should not be forgotten. Of course, the scale of test automation has to be decided based on the context.

The second challenge mentioned in Chapter 4.3 was writing the tests before development. That was also noticed in the Project, as was presented in Chapter 11.1. Crispin (2005) mentioned that the problem was the lack of time to write the test cases before development. In the Project, however, the problems were more related to the test data and the context. Time could have become a problem if the number of detailed-level test cases had been higher.

The third challenge was the right level of test cases. Crispin (2005) noticed that when many test cases are written beforehand, the test cases can cause more confusion than help in understanding the requirements. It was noticed in the Project that there would have been a need for test cases on multiple levels, as was mentioned in Chapter 11.2. Two interviewees also saw it as beneficial to include non-functional testing as a part of the acceptance test cases in the future, which would widen the scope of the acceptance test cases even further. This challenge with the right level of test cases probably derives from the wide definition of acceptance testing and the possibility to create test cases on multiple test levels simultaneously.

One more challenge was noticed in the use of the keyword-driven test automation framework. As there was no intelligent development environment for editing the test case files and resource files, the test data management took some time. Some developers also found it difficult to find all the keywords that were used in the test cases and user keywords, because they were defined in multiple files. These problems with the test data management can be even bigger if more people are implementing the test cases.
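A lightweight way to mitigate the problem of keywords being scattered over multiple files would be a small script that indexes keyword definitions. The sketch below assumes a plain-text test data format in which definitions follow a "*** Keywords ***" section header; this format is an assumption for illustration, not a description of the Project's actual test data files, which would need a different parser.

```python
# find_keywords.py -- lists where user keywords are defined across test data
# files; assumes a plain-text format with a "*** Keywords ***" section header,
# which is an assumption and not necessarily the Project's file format.
import sys
from pathlib import Path

def keyword_definitions(path):
    in_keywords = False
    for lineno, line in enumerate(path.read_text().splitlines(), start=1):
        if line.strip().startswith("***"):
            in_keywords = "keyword" in line.lower()
            continue
        # Definitions start at column 0; indented lines are the keyword's steps.
        if in_keywords and line and not line[0].isspace():
            yield line.strip(), lineno

if __name__ == "__main__":
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    for path in sorted(root.rglob("*.txt")):
        for name, lineno in keyword_definitions(path):
            print(f"{path}:{lineno}: {name}")
```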

DRAWBACKS

The interviewees mentioned only a few drawbacks. One interviewee mentioned that writing the test cases took time, which was a drawback. As more people were involved in defining the test cases, more resources were consumed. On the other hand, the first versions of the test cases were written by a test automation engineer, and therefore only the definitions were done with a bigger group. Two interviewees thought that updating the test cases can be seen as rework and therefore as a drawback. This drawback was also noticed in the February sprint. The reason was mainly that the details were not agreed on well enough. However, the time used for the changes was not significant. All in all, it seems that the benefits gained from the use of ATDD clearly exceed the drawbacks.

11.4 Good Practices

Good practices are summarized based on the literature, the observations, and the analysis of the observations, and they are shown in Table 3. These practices can be applied when acceptance test-driven development is used.

PRACTICE: Acceptance test cases are created also on a detailed level.
EXPLANATION: If the acceptance test cases are created on too high a level, there is no need to clarify the details, and they remain unclear. However, creating too many detailed test cases at the beginning of the sprint may be confusing.

PRACTICE: Use case/workflow test cases are discussed with the whole team at the beginning of the sprint.
EXPLANATION: It is important that all team members understand the big picture, and high-level test cases can be used to clarify that.

PRACTICE: Detailed level test cases are discussed in small groups.
EXPLANATION: It is obviously not productive to plan all the details with the whole team. Therefore, the detailed test cases are created in small groups where different roles are represented.

PRACTICE: Test cases are written to the formal format after the planning meetings.
EXPLANATION: During the planning meetings, the test cases can be quickly noted down. The purpose of the meetings is to find the needed details and create a common understanding about those details. The test cases can be written in a proper format after the meeting.

PRACTICE: Test cases are checked by the team.
EXPLANATION: Because the test cases are created based on the notes, it is good to check the test cases with the people who planned them. This helps to find ambiguities and to verify that all the people have understood the details similarly.

PRACTICE: The test-first approach is not mandatory.
EXPLANATION: There can be situations where it is not profitable to implement the test cases in the test-first manner. However, the test cases should be planned and implemented on some level before implementing the feature. Even the test case planning can help to understand the wanted features.

PRACTICE: Initial test cases are added to the test execution environment.
EXPLANATION: When the test cases are executed often and there are detailed-level test cases, the development progress can be followed during the sprints. With the high-level test cases the development progress can be followed on the project level.

PRACTICE: Different kinds of acceptance test cases are created.
EXPLANATION: The acceptance test cases should cover the functional and non-functional requirements. Therefore, there is a need to create different types of test cases. Functional test cases can even be on different testing levels.

Table 3: Good practices

12 DISCUSSION AND CONCLUSIONS

This research was conducted through a comprehensive literature review, action-research-based observations of the use of acceptance test-driven development with the keyword-driven test automation framework in one software development project, and interviews with members of the project in question. The results of the research were analyzed by reflecting them against the relevant literature and earlier studies. The conclusions based on the analysis are covered in this chapter.

12.1 Researcher’s Experience

The researcher’s background and experience in the field of software testing are described briefly here, so that the reader can assess the researcher’s competence. The researcher had four years of experience in software testing and test automation when the research was started. The researcher was a part of the team that had developed the keyword-driven test automation framework used in the Project, called Robot. The development of Robot had lasted over a year when the research started. The researcher had gained a lot of experience with Robot by using it for testing Robot itself.

12.2 Main Conclusions

It can be said that ATDD can provide many benefits, and it is a radical change compared to traditional acceptance testing. ATDD together with agile testing brings testing to the core of development, as opposed to the traditional way, where most of the testing takes place, insufficiently, at the end of the software development. This is a positive feature which also improves the meaningfulness of the work, as all team members can take part in planning quality software.

According to the results gained from the study, ATDD also helps to develop software that corresponds better to the requirements, and to do so more efficiently. This is mainly due to the improved common understanding within the team about the details of the software’s features. So it seems that the use of ATDD really is profitable.

It can be seen that the tool used to automate the test cases in ATDD does not play a crucial role, as the biggest benefits noticed in the interviews were gained from the process. However, the level on which the acceptance test cases are created has an influence on the gained benefits, and if the test cases are done on too high a level, the noticed benefits are lost.

Of course, ATDD is not a silver bullet, and challenges exist. As acceptance testing should cover both non-functional and functional testing, excluding only unit testing, there is a wide area to test. Finding the right level of tests is unquestionably hard. However, the cooperation between the team and the customer can ease that journey.

It was acknowledged that the benefits were gained even though the acceptance test cases were not created before the development, as pure ATDD requires. This leads to the question of whether ATDD should be defined so that there is no strict requirement for the test cases to be created in a test-first manner. The discussion about the test cases drives the development in any case, as the goal of the team is to get the acceptance test cases passing.

Based on this work, it appears that ATDD can provide a clear process for arranging the testing inside the iterations of iterative development and, in consequence, establish a prerequisite for successful testing. This can be seen as very beneficial, because clear process guidance for agile testing, especially in Scrum, is missing. The importance of this process is emphasized in environments where a transition from traditional software development to agile software development is taking place.

12.3 Validity

There is no single clear definition of what validity means in qualitative research (Golafshani 2003, Trochim 2006, Flick 2006). However, Flick (2006) summarizes that validity answers the question of whether the researchers see what they think they see. Flick (2006) also suggests using triangulation as a method for evaluating qualitative research. Based on that suggestion, the validity of this research is evaluated using data and investigator triangulation. Theory and methodological triangulation are not used because of the practical nature and predefined scope of the research. Other matters affecting the results of this thesis are also considered.

The validity of the data was ensured by collecting data with the different data collection methods listed in Chapter 9.3. Data was also collected throughout the research, which increases its validity. To prevent an unbalanced view, the researcher interviewed and observed people in different roles. Investigator triangulation means using more than one researcher to detect and prevent biases resulting from the researcher as a person. It was not possible to use any other interviewer or observer in this research, and from this point of view the validity of the research is questionable.

The researcher’s high involvement in the Project, and especially the help the Project gained from the researcher during the study, affect the validity of this research. Kock (2003) mentions that in action research the researcher’s actions may strongly bias the results. The researcher became aware of this possibility at the beginning of the research, and it was kept in mind during the Project and especially during the analysis phase.

In addition, the background, know-how, and opinions of the researcher are possible sources of error. This is mainly because this was a qualitative study in which, for example, interviews were used as a research method. Therefore, the content and form of the interview questions can reflect the researcher’s own background, knowledge, and views. As a part of the project team, the researcher cannot be completely objective. However, it can be debated whether this subjectivity has a negative impact on the research or not.

Interpreting results is not a completely objective activity. It might therefore be that another researcher with a different background would have interpreted the results in a slightly different way. It must thus be kept in mind that the conclusions, for example, are always a somewhat subjective view of reality. However, it can be argued that the results gained from the research would have been similar even if the research had been carried out by another researcher.

The fact that there were other changes in the Project, such as the change towards agile testing, may also have made it harder to understand what actually caused the perceived benefits. However, as was noticed in the analysis of the research results, some of the changes and benefits originate directly from the use of ATDD. To be sure about the benefits, the subject should be studied for a longer period of time than was done in this research. However, the main conclusions could also be drawn based on the period of time used in the research. The results of earlier studies and the relevant literature support the research results, as they were mainly in line with each other.

It must be kept in mind that the results presented in this thesis are based on only one software development project, and more specifically on one team’s work. Every project has its own context-specific features. These facts, and of course the structure of the team, have an influence on how ATDD is used and how it is adapted as a part of the development process. Therefore, the results can vary to some extent according to the project in question, but it should be possible to gain the main benefits noticed here in other projects as well.

The test automation framework Robot used in this research is not open sourced, which makes it harder to introduce the test automation concept used in this research to other projects. However, there is a possibility that the keyword-driven test automation framework used in the study will be open sourced.

12.4 Evaluation of the Thesis

The first goal of this thesis was to investigate whether the keyword-driven test automation framework could be used with acceptance test-driven development. It can be said that this goal was achieved. The suitability of keyword-driven test automation was analyzed extensively, and based on the analysis the outcome was that it is possible to use the keyword-driven test automation framework with ATDD. It was also noticed that some limitations exist, which may prevent finalizing the test cases prior to feature implementation.

One aim was to describe the use of the keyword-driven test automation framework with ATDD in a way that enables other projects to experiment with the approach using similar tools. How well this goal is met remains to be seen if the results of this thesis are used in other real-world software development projects. However, the aim was to describe both the fictional example (Chapter 7) and the case study (Chapter 10) in such a way that they would be widely understood.

The last goal was to study the pros and cons of acceptance test-driven development when it is used with the keyword-driven test automation framework. Even though the research lasted only four months, plenty of results were collected. Based on these results it was possible to see clear benefits, some challenges, and a few drawbacks. In this sense, the study was successful.

12.5 Further Research Areas

Because this thesis is one of the first studies focusing on acceptance test-driven development with the keyword-driven test automation framework, there is a need for more extensive study of this kind of approach in other projects, including projects that use different kinds of iterative processes. A longer research period would also be beneficial, as the changes due to the use of ATDD are wide-ranging, and adapting and adjusting the process takes time. Full-scale use of ATDD would make it possible to better study the effects of the test automation framework, and the suitability of the running tested features metric with ATDD.

As was noticed, the level of the acceptance tests affects the benefits of ATDD. It was also noticed that there is a need for acceptance test cases on different levels and that it is difficult to create test cases on the right level. At least the following areas need more study to understand which kinds of acceptance tests would be beneficial to create:

• How do the different levels of test cases affect the different aspects of ATDD?
• How do the different levels of acceptance tests affect measuring the project, and how do they affect the use of the running tested features metric?
• How could the lower-level acceptance tests created with the keyword-driven test automation framework be defined in a format that can be easily understood?
• What is the relationship between unit testing and lower-level acceptance testing?

Further research is also needed to clarify which of the benefits mentioned in this research are actually direct results of ATDD. Therefore, the relationships between the benefits and the sources of each benefit should be studied.

One issue that was not studied in this research was the possibility of replacing the requirement specifications with the acceptance test cases. As one interviewee mentioned, there is no need to replace the requirements with the acceptance test cases. However, some of the details in the requirement specifications could be defined with test cases to avoid maintaining duplicate data. This could lead to linking the high-level requirements to the acceptance test cases. This would be an interesting area for further study.

Altogether, it can be said that this thesis is a good opening for discussion in this field of software testing.

BIBLIOGRAPHY<br />

Abrahamsson, Pekka, Outi Salo, Jussi Ronkainen & Juhani Warsta (2002). Agile Software<br />

<strong>Development</strong> Methods: Review and Analysis. VTT Publications 478, VTT, Finland.<br />

<br />

Agile Advice (2005). Information Radiators, May 10, 2005.<br />

May 14th, 2007<br />

Andersson, Johan, Geoff Bache & Peter Sutton (2003). XP <strong>with</strong> <strong>Acceptance</strong> <strong>Test</strong>-<strong>Driven</strong><br />

<strong>Development</strong>: A Rewrite Project for a Resource Optimization System. Lecture Notes in Computer<br />

Science, Volume 2675/2003, Extreme Programming and Agile Processes in Software Engineering,<br />

180-188, Springer Berlin/Heidelberg.<br />

<br />

Astels, David (2003). <strong>Test</strong>-<strong>Driven</strong> <strong>Development</strong>: A Practical Guide. 562, Prentice Hall PTR, United<br />

States of America.<br />

Avison, David, Francis Lau, Michael Myers & Peter Axel Nielsen (1999). Action Research: To make<br />

academic research relevant, researchers should try out their theories <strong>with</strong> practitioners in real situations<br />

and real organizations. COMMUNICATIONS OF THE ACM, January 1999/Vol. 42, No. 1, 94-97.<br />

Babüroglu, Oguz N. & Ib Ravn. Normative Action Research (1992). Organization Studies Vol. 13, No.<br />

1, 1992, 19-34.<br />

Bach, James (2003a). Agile test automation. <br />

March 31st, 2007<br />

Bach, James (2003b). Exploratory <strong>Test</strong>ing Explained v.1.3 4/16/03.<br />

March 31st, 2007<br />

Beck, Kent (2000). Extreme Programming Explained: Embrace Change. Third Print, 190, Addison-<br />

Wesley, Reading (MA).<br />

94


Beck, Kent, Mike Beedle, Arie van Bennekum, Alistair Cockburn, Ward Cunningham, Martin Fowler,<br />

James Grenning, Jim Highsmith, Andrew Hunt, Ron Jeffries, Jon Kern, Brian Marick, Robert C.<br />

Martin, Steve Mellor, Ken Schwaber, Jeff Sutherland & Dave Thomas (2001a). Manifesto for Agile<br />

Software <strong>Development</strong>. December 5th, 2006<br />

Beck, Kent, Mike Beedle, Arie van Bennekum, Alistair Cockburn, Ward Cunningham, Martin Fowler,<br />

James Grenning, Jim Highsmith, Andrew Hunt, Ron Jeffries, Jon Kern, Brian Marick, Robert C.<br />

Martin, Steve Mellor, Ken Schwaber, Jeff Sutherland & Dave Thomas (2001b). Principles behind the<br />

Agile Manifesto. March 31st, 2007<br />

Beck, Kent (2003). <strong>Test</strong>-<strong>Driven</strong> <strong>Development</strong> By Example. 240, Addison-Wesley.<br />

Beizer, Boris (1990). Software testing techniques. Second Edition, 550, Van Nostrand Reinhold, New<br />

York.<br />

Burnstein, Ilene (2003). Practical Software <strong>Test</strong>ing: a process-oriented approach. 709, Springer, New<br />

York.<br />

Buwalda, Hans, Dennis Janssen & Iris Pinkster (2002). Integrated <strong>Test</strong> Design and Automation: Using<br />

the <strong>Test</strong>Frame Method. 242, Addison Wesley, Bibbles Ltd, Guildford and King’s Lynn, Great Britain.<br />

Cohn, Mike (2004). User Stories Applied: For Agile Software <strong>Development</strong>. 268, Addison-Wesley.<br />

Cohn, Mike (2007). User Stories, Agile Planning and Estimating. Internal Seminar, March 24th, 2007.<br />

Control Chaos (2006a). What is Scrum September 26th, 2006<br />

Control Chaos (2006b). XP@Scrum. September 26th,<br />

2006<br />

Craig, Rick D. & Stefan P. Jaskiel (2002). Systematic Software <strong>Test</strong>ing. 536, Artech House Publishers,<br />

Boston.<br />

Crispin, Lisa, Tip House & Wade Carol (2002). The Need for Speed: Automating <strong>Acceptance</strong> <strong>Test</strong>ing<br />

in an eXtreme Programming Environment. Upgrade, The European Online Magazine for the IT<br />

Professional Vol III, No. 2, April 2002, 11-17. <br />

95


Crispin, Lisa & Tip House (2005). Testing Extreme Programming. Second Print, 306, Addison-Wesley.

Crispin, Lisa (2005). Using Customer Tests to Drive Development. Methods & Tools, Global knowledge source for software development professionals, Summer 2005, Vol. 13, No. 2, 12-17.

Cruise Control (2006). Cruise Control, Continuous Integration Toolkit. September 23rd, 2006

Dustin, Elfriede, Jeff Rashka & John Paul (1999). Automated Software Testing: introduction, management, and performance. 575, Addison-Wesley.

Fenton, Norman E. (1996). Software metrics: a rigorous and practical approach. Second Edition, 638, International Thomson Computer Press, London.

Fewster, Mark & Dorothy Graham (1999). Software Test Automation: Effective use of test execution tools. 574, Addison-Wesley.

Flick, Uwe (2006). An Introduction to Qualitative Research. Third Edition, 443, SAGE, London.

Golafshani, Nahid (2003). Understanding Reliability and Validity in Qualitative Research. The Qualitative Report, Vol. 8, No. 4, December 2003, 597-607.

Hendrickson, Elisabeth (2006). Agile QA/Testing. April 10th, 2007

IEEE Std 829-1983. IEEE Standard for Software Test Documentation. Institute of Electrical and Electronics Engineers, Inc., 1983.

IEEE Std 1008-1987. IEEE Standard for Software Unit Testing. Institute of Electrical and Electronics Engineers, Inc., 1987.

IEEE Std 610.12-1990. IEEE Standard Glossary of Software Engineering Terminology. Institute of Electrical and Electronics Engineers, Inc., 1990.

ISO Std 9000-2005. Quality management systems - Fundamentals and vocabulary. ISO Properties, Inc., 2005.

ISO/IEC Std 9126-1:2001. Software engineering -- Product quality -- Part 1: Quality model. ISO Properties, Inc., 2001.

ISTQB (2006). Standard glossary of terms used in Software Testing, Version 1.2 (dd. June 4th, 2006). April 9th, 2007

Itkonen, Juha, Kristian Rautiainen & Casper Lassenius (2005). Toward an Understanding of Quality Assurance in Agile Software Development. International Journal of Agile Manufacturing, Vol. 8, No. 2, 39-49.

Jeffries, Ronald E. (1999). Extreme Testing: Why aggressive software development calls for radical testing efforts. Software Testing & Quality Engineering, March/April 1999, 23-26.

Jeffries, Ron, Ann Anderson & Chet Hendrickson (2001). Extreme Programming Installed. 265, Addison-Wesley, Boston.

Jeffries, Ron (2004). A Metric Leading to Agility, 06/14/2004. November 18th, 2006

Jeffries, Ron (2006). Automating “All” Tests, 05/25/2006. April 14th, 2007

Kaner, Cem, Jack Falk & Quoc Nguyen (1999). Testing Computer Software. Second Edition, 480, Wiley, New York.

Kaner, Cem, James Bach, Bret Pettichord, Brian Marick, Alan Myrvold, Ross Collard, Johanna Rothman, Christopher Denardis, Marge Farrell, Noel Nyman, Karen Johnson, Jane Stepak, Erick Griffin, Patricia A. McQuaid, Stale Amland, Sam Guckenheimer, Paul Szymkowiak, Andy Tinkham, Pat McGee & Alan A. Jorgensen (2001a). The Seven Basic Principles of the Context-Driven School. December 19th, 2006

Kaner, Cem, James Bach & Bret Pettichord (2001b). Lessons Learned in Software Testing: A Context-Driven Approach. 286, John Wiley & Sons, Inc., New York.

Kaner, Cem (2003). The Role of Testers in XP. November 18th, 2006

Kit, Edward (1999). Integrated, effective test design and automation. Software Development, February 1999, 27–41.

Kock, Ned (2003). Action Research: Lessons Learned From a Multi-Iteration Study of Computer-Mediated Communication in Groups. IEEE Transactions on Professional Communication, Vol. 46, No. 2, June 2003, 105-128.

Larman, Craig (2004). Agile & Iterative Development: A Manager’s Guide. 342, Addison-Wesley.

Larman, Craig (2006). Introduction to Agile & Iterative Development. Internal Seminar, December 14th, 2006.

Laukkanen, Pekka (2006). Data-Driven and Keyword-Driven Test Automation Frameworks. 98, Master’s Thesis, Software Business and Engineering Institute, Department of Computer Science and Engineering, Helsinki University of Technology.

Mar, Kane & Ken Schwaber (2002). Scrum with XP. October 4th, 2006

Marick, Brian (2001). Agile Methods and Agile Testing. November 15th, 2006

Marick, Brian (2004). Agile Testing Directions. November 15th, 2006

Meszaros, Gerard (2003). Agile regression testing using record & playback. Conference on Object Oriented Programming Systems Languages and Applications, Companion of the 18th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications, 353–360, ACM Press, New York.

Miller, Roy W. & Christopher T. Collins (2001). Acceptance testing. XP Universe, 2001. April 10th, 2007

Mosley, Daniel J. & Bruce A. Posey (2002). Just Enough Software Test Automation. 260, Prentice Hall PTR, Upper Saddle River, New Jersey, USA.

Mugridge, Rick & Ward Cunningham (2005). Fit for Developing Software: Framework for Integrated Tests. 355, Prentice Hall PTR, Westford, Massachusetts.

Nagle, Carl J. (2007). Test Automation Frameworks. April 14th, 2007

Patton, Ron (2000). Software Testing. 389, SAMS, United States of America.

Pol, Martin (2002). Software testing: a guide to the TMap approach. 564, Addison-Wesley, Harlow.

Reppert, Tracy (2004). Don’t Just Break Software, Make Software: How storytest-driven development is changing the way QA, customers, and developers work. Better Software, July/August 2004, 18-23.

Sauvé, Jacques Philippe, Osório Lopes Abath Neto & Walfredo Cirne (2006). EasyAccept: a tool to easily create, run and drive development with automated acceptance tests. International Conference on Software Engineering, Proceedings of the 2006 international workshop on Automation of software test, 111-117, ACM Press, New York.

Schwaber, Ken & Mike Beedle (2002). Agile software development with Scrum. 158, Prentice-Hall, Upper Saddle River (NJ).

Schwaber, Ken (2004). Agile Project Management with Scrum. 163, Microsoft Press, Redmond, Washington.

Stringer, Ernest T. (1996). Action Research: A Handbook for Practitioners. 169, SAGE, United States of America.

Trochim, William M.K. (2006). Qualitative Validity. October 4th, 2006

Watt, Richard J. & David Leigh-Fellows (2004). Acceptance Test-Driven Planning. Lecture Notes in Computer Science, Volume 3134/2004, Extreme Programming and Agile Methods - XP/Agile Universe 2004, 43-49, Springer, Berlin/Heidelberg.

Wideman, Max R. (2002). Wideman Comparative Glossary of Project Management Terms, March 2002. May 14th, 2007

Zallar, Kerry (2001). Are you ready for the test automation game? Software Testing & Quality Engineering, November/December 2001, 22–26.

APPENDIX A

PRINCIPLES BEHIND THE AGILE MANIFESTO

We follow these principles:

Our highest priority is to satisfy the customer through early and continuous delivery of valuable software.

Welcome changing requirements, even late in development. Agile processes harness change for the customer's competitive advantage.

Deliver working software frequently, from a couple of weeks to a couple of months, with a preference to the shorter timescale.

Business people and developers must work together daily throughout the project.

Build projects around motivated individuals. Give them the environment and support they need, and trust them to get the job done.

The most efficient and effective method of conveying information to and within a development team is face-to-face conversation.

Working software is the primary measure of progress.

Agile processes promote sustainable development. The sponsors, developers, and users should be able to maintain a constant pace indefinitely.

Continuous attention to technical excellence and good design enhances agility.

Simplicity--the art of maximizing the amount of work not done--is essential.

The best architectures, requirements, and designs emerge from self-organizing teams.

At regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behavior accordingly. (Beck et al. 2001b)

APPENDIX B

INTERVIEW QUESTIONS

Interview questions asked in the final interviews.

1. How has ATDD affected the software development? Why?

2. What have been the benefits of ATDD? Why?

3. What have been the drawbacks of ATDD? Why?

4. What have been the challenges in ATDD? Why?

5. Has ATDD affected the risk of building incorrect software? How? Why?

6. Has ATDD affected the visibility of the development status? How? Why?

7. Has ATDD established a quality agreement between the development team and the feature owners? How? Why?

8. Has ATDD changed your confidence in the software? How? Why?

9. Has ATDD affected when problems are found? How? Why?

10. Has ATDD affected the way requirements are kept up to date? How? Why?

11. Has ATDD affected the way requirements and tests are kept in sync? How? Why?

12. Are the acceptance tests in a format that is easy to understand? Why or why not?

13. Is it easy to write the acceptance tests at the right level? Why or why not?

14. Has ATDD affected the developers’ goal? How? Why?

15. Has ATDD affected the design of the developed system? How? Why?

16. Has ATDD affected verifying the correctness of refactorings? How? Why?

17. Has ATDD affected the quality of the test cases? How? Why?

18. Has ATDD had an influence on the way people see test engineers? How? Why?

19. Has ATDD had an influence on the test engineer's role? How? Why?

20. Has ATDD affected how hard or easy the tests are to automate? How? Why?

21. What could be improved in the current way of doing ATDD? Which changes could give the biggest benefits?

22. Sum up the biggest benefit and the biggest drawback based on the issues asked in this interview, and state the reasons.
