Technische Universität<br />

München<br />

Fakultät für Informatik<br />

Diploma Thesis<br />

Supervisor: Univ.-Prof. Gudrun Klinker, Ph.D.<br />

Advisor: Dr. Martin Wagner<br />

Submission date: May 11, 2005

Declaration<br />

I hereby declare that I have written this diploma thesis independently and have used only<br />
the sources and aids indicated.<br />

München, May 11, 2005 Felix Löw

Summary<br />

Augmented Reality applications enrich the real world by overlaying it with virtual objects.<br />
To view the fusion of real and virtual environments, graphical output hardware such as a<br />
Head Mounted Display or Tablet PC is used, together with tracking technologies that determine<br />
the position and orientation of tracked objects. Frequently used image-based tracking<br />
methods such as Natural Feature Tracking are sensitive to camera movements. Certain<br />
features are tracked over several image sequences. The basic idea of this thesis is to adapt<br />
the search area for finding these features to the change in motion of the interaction hardware.<br />
This thesis is a first step towards solving this problem for a special class of Augmented Reality<br />
applications, Table-Top Augmented Reality. It proposes a hybrid tracking approach that takes<br />
both the tracking and the user's motion context into account. The orientation measured by an<br />
additional tracker is processed for a dynamic runtime adaptation of the image-based tracking<br />
method, the tracking of textures. A software architecture that makes this possible is proposed.<br />

After an introduction to Table-Top Augmented Reality, we discuss the design and<br />
evaluation of a user study. The goal is to determine an approximation for a linear mapping<br />
between user motion and the search window of the texture tracking. Statistical analysis<br />
methods are used to find this mapping. The mapping can be expressed as a simple linear<br />
function with the change in orientation as its input parameter. In addition, the relationship<br />
between user behavior and the task performed is examined. Furthermore, tasks in Table-Top<br />
applications are identified and consequences for the image-based tracking method are discussed.<br />


Abstract<br />

Augmented Reality (AR) applications enrich the real world by overlaying it with virtual objects.<br />
To view this fusion of real environment and virtual content, Augmented Reality setups<br />
use common graphical output hardware such as Head Mounted Displays or Tablet PCs, together<br />
with tracking technologies that estimate the position and orientation of tracking targets.<br />
Frequently used vision-based techniques such as Natural Feature Tracking are sensitive to<br />
camera movements. Features have to be found again in subsequent video frames. The basic idea<br />
of this work is to adapt the search area for features to the change in orientation of the user<br />
interface hardware. This work is a first step towards solving this problem for a special class<br />
of Augmented Reality applications, Table Top Augmented Reality. It provides a hybrid tracking<br />
approach that brings tracking and the user’s movement context together. Orientation information<br />
from an additional tracker is used to configure the vision-based tracking routine, a texture<br />
tracking algorithm, dynamically at runtime. To accomplish this, a dedicated software<br />
architecture is proposed.<br />

After introducing the basic ideas of Table Top Augmented Reality, we present the design,<br />
execution and evaluation of a user study. The goal is to find an approximation for a linear<br />
mapping between user motion and the search window of the texture tracking routine. Applying<br />
statistical techniques, we show that it is possible to derive such a mapping. This mapping<br />
can be expressed by a simple linear function with the change of orientation as input<br />
parameter. We also show that user behavior is related to the performed tasks. We<br />
identify tasks for Table Top AR and discuss implications for the tracking routine.

Purpose of This Document<br />

Preface<br />

This work was written as a diploma thesis, which is equivalent to a Master's thesis, at the<br />
Technische Universität München in Prof. Gudrun Klinker's Augmented Reality Research Group<br />
(Chair for Computer Aided Medical Procedures and Augmented Reality). The work was<br />
advised by Dr. Martin Wagner. The ideas for this thesis evolved from fruitful discussions<br />
with him.<br />

The thesis was carried out in cooperation with the Human Interface Technology Laboratory<br />
New Zealand (HIT Lab New Zealand¹) in Christchurch. From September 2004 until<br />
February 2005 I was at the HIT Lab in New Zealand, developing and conducting the<br />
main parts of the thesis. This part was supervised by Prof. Mark Billinghurst.<br />

My stay in New Zealand for this thesis was financially supported by a scholarship<br />
for “Kurzfristige Studienaufenthalt für Abschlussarbeiten” from the Deutscher Akademischer<br />
Austauschdienst (DAAD)².<br />

In this thesis I would like to present the ideas behind my approach, document my results<br />
and draw conclusions about the future implications of my work.<br />

Target Audience<br />

General Readers who are interested in Augmented Reality should read chapters 1 and 2. These<br />
chapters give an overview of the basic terms and technologies and introduce the ideas<br />
of my thesis.<br />

Readers interested in Hybrid Tracking should read chapter 3, where my approach is explained<br />
and placed in the context of related work. Subsequent steps and the evaluation of my work are<br />
described in chapters 3, 5 and 6.<br />

Human Computer Interaction Researchers should read chapter 3, where our user-study-based<br />
approach is explained. The design, execution and evaluation of the study are shown in<br />
chapters 5 and 6.<br />

Acknowledgments<br />

First of all I would like to thank my buddy Michael ”Siggä” Siggelkow for struggling together<br />

through the whole studies and through New Zealand. Thanks for helping, motivating,<br />

partying and being a friend.<br />

I would like to thank Martin Wagner who advised this thesis. Although it is a hard task to<br />

advise a thesis on the other side of the world I think you did a very good job. Thanks a lot!<br />

Heaps of thanks to Mark Billinghurst for giving me the opportunity to come to the HIT Lab in<br />

New Zealand and for his kind hospitality. Thanks to Gudrun Klinker as well for supporting<br />

and enabling the stay in New Zealand.<br />

Here is my big respect for the HIT Lab crew. Most of all I would like to thank Raphaël<br />

”le docteur” Grasset and Phil Lamb for helping me so much with my user study. Special<br />

thanks to Anna-Lee Mason and Nathan Gardiner, the good souls of the HIT Lab. Thanks to<br />

all people working or being involved at the HIT Lab.<br />

Thanks to all the people I met and I can call my friends now. Special thanks to my great<br />

Swedish flatmates Johan Karlsson and Mikael Selegård. Ett stor tack! Thanks a lot to Claudia<br />

Nelles for heaps of things. Also thanks to the rest of the gang in our room: Thomas ”The<br />

German” Zurbrügg and Michael ”Intern of the month” Herchel. Also a huge thank you to<br />

Jonno Hill, Raphaël Grasset, Sofi Crosley, Jörg Hauber, Anna-Lee Mason, Phil Lamb, Marcel<br />

Lancelle, Matt Keir, Tobi Gefken, Oakley Buchmann and Nathan Gardiner. Thanks to all the<br />

people from the Canterbury University Tramping Club and the Wednesday soccer team. I<br />

had a great time with all of you.<br />

I would like to thank all my friends here in Munich. Special thanks to the crowd that came<br />

all the way to New Zealand to look after us.<br />

Last I would like to thank my family for everything and making me feel at home again.<br />

Special thanks to my brother Martin for showing me the secrets of statistics.<br />


Figure 0.1: New Zealand, Mount Cook National Park<br />


Garching, May 2005<br />

Felix Löw

Contents<br />

1 Introduction<br />
1.1 Overview of Augmented Reality<br />
1.2 Tracking in Augmented Reality<br />
1.2.1 Representation of spatial information<br />
1.2.2 Examples for Tracking Technologies<br />
1.2.3 Hybrid Tracking<br />
1.3 Human Computer Interaction (HCI)<br />
1.4 Goals and Outline of this Thesis<br />
1.4.1 Goals<br />
1.4.2 Outline of the thesis<br />

2 Table Top Augmented Reality<br />
2.1 Motivation for Table Top AR<br />
2.2 The Magic Book<br />
2.3 Requirements<br />
2.4 Tracking<br />
2.4.1 Marker-Based Tracking<br />
2.4.2 Texture Tracking of a 2D plane<br />
2.4.3 Tracking in the Magic Book<br />
2.5 User Interaction<br />
2.5.1 Graphical Output Hardware<br />
2.5.2 Input Interfaces<br />
2.6 Summary<br />

3 A Hybrid Tracking Approach<br />
3.1 Motivation<br />
3.2 An Inertial - Optical Tracker based Runtime Setup<br />
3.3 Configuration of the setup<br />
3.4 Motivation for a User Study<br />
3.5 Related Work<br />
3.5.1 Natural Feature Tracking<br />
3.5.2 Hybrid Tracking<br />
3.5.3 Head motion prediction<br />
3.5.4 Table-Top Augmented Reality<br />
3.5.5 Our approach<br />
3.6 Summary<br />

4 A Software Architecture based on DWARF<br />
4.1 DWARF<br />
4.1.1 Services<br />
4.1.2 Service Manager<br />
4.1.3 An example<br />
4.2 Software Architecture for a Dynamic Configuration during Runtime<br />
4.2.1 Existing Architecture<br />
4.2.2 Requirements for new architecture<br />
4.2.3 System Design<br />
4.2.4 Implementation<br />
4.2.5 Summary<br />

5 User Study<br />
5.1 Goals of the User Study<br />
5.2 User Study design<br />
5.2.1 Movement Tracking of the Hand-Held Device<br />
5.2.2 Tracking of 2d feature point<br />
5.2.3 Logging Infrastructure<br />
5.2.4 Task Design<br />
5.2.5 Setup<br />
5.2.6 Questionnaire<br />
5.3 Execution of the Study<br />
5.4 Summary<br />

6 Evaluation of the User Study<br />
6.1 Evaluation of the User Study<br />
6.1.1 Feature Point Tracking<br />
6.1.2 Feature Point Tracking and Tracking of the Handheld<br />
6.1.3 Feature Point Tracking and Tasks<br />
6.1.4 Further Evaluations<br />

7 Conclusions<br />
7.1 Results<br />
7.1.1 Results of the User Study<br />
7.1.2 Natural Feature Tracking<br />
7.1.3 Table Top Augmented Reality<br />
7.1.4 Assessment of our Approach<br />
7.2 Future Work<br />
7.2.1 Factors for User Behavior<br />
7.2.2 Visual Cues<br />
7.2.3 Next Steps<br />

Glossary<br />

A User Study<br />
A.1 Conduction of the User Study<br />
A.1.1 Questionnaire<br />
A.1.2 Instructions and Guideline<br />
A.2 Statistical Tools<br />
A.2.1 Correlation<br />
A.2.2 Linear Regression<br />

B Complete Results<br />
B.1 Questionnaire<br />
B.2 Cases<br />
B.2.1 Case 1<br />
B.2.2 Case 2<br />
B.2.3 Case 3<br />
B.2.4 Case 4<br />

Bibliography<br />


CHAPTER 1<br />

Introduction<br />

Computers have continually changed their appearance and will keep doing so.<br />
Over the last 40 years, huge central computers available to only a small group of researchers<br />
or experts have given way to desktop personal computers (PCs) available to everyone.<br />
The trend towards new small, cheap and mobile computers like mobile phones or palmtops<br />
is continuing. Computers are becoming more and more involved in our everyday life. New<br />
ways of interacting with these new computers have to be researched and evaluated.<br />

Augmented Reality (AR) is one such approach, bringing the real world and the virtual computer<br />
world together. AR allows Human-Computer Interaction (HCI) in a new way.<br />

1.1 Overview of <strong>Augmented</strong> <strong>Reality</strong><br />

In his survey of Augmented Reality [5][2], Azuma defines Augmented Reality as follows:<br />

“Augmented Reality (AR) is a variation of Virtual Environments (VE), or Virtual<br />
Reality (VR) as it is more commonly called. VE technologies completely immerse a user<br />
inside a synthetic environment. While immersed, the user cannot see the real<br />
world around him. In contrast, AR allows the user to see the real world, with<br />
virtual objects superimposed upon or composited with the real world.”<br />

In other words, AR tries to enrich the real environment with virtual information. AR brings<br />
the real world and the computer world together. In contrast, Virtual Environments leave the<br />
real world outside. The vision of VR is that the user is not aware of the “outside world” at<br />
all and cannot interact with real objects. In AR, on the other hand, the user is able to<br />
interact with virtual objects as well as real objects. The virtual information is overlaid on<br />
the user’s view using special graphical output devices such as the Head Mounted Display<br />
(HMD) of a classical AR setup. The system immediately responds to the<br />
user’s actions and gives feedback. Realtime feedback is one of the key requirements for AR<br />
applications. If a user wearing an HMD turns his head, the new viewpoint has to be calculated<br />
and the 3D objects have to be registered with the real objects in realtime. Otherwise the<br />
user will have the feeling that the registration of the 3D objects lags behind his head<br />
movements. The virtual information is adapted according to the user’s actions or<br />
when the state of the environment changes. As early as 1968, Ivan Sutherland presented the<br />
first Augmented Reality system, introducing the first HMD [54]. Interestingly, the basic<br />
concepts proposed by Sutherland are still valid for current AR applications.<br />

Figure 1.1 shows an example of the AR application “Augmented Furniture Client”, which<br />
allows users to place virtual furniture in a real living room [19]. A user wearing an HMD can<br />
walk through his own real living room, and the selected pieces of furniture are displayed in<br />
the environment according to the user’s viewpoint.<br />

Figure 1.1: Placing virtual furniture in a real environment. A virtual sofa and chair are augmented in<br />
the living room<br />

To realize this we have to find new ways of interacting with computers. The paradigm<br />
of a desktop PC with a keyboard and mouse as the only way to interact with computers is<br />
no longer suited to this kind of application. Often a mobile user has to be enabled to<br />
change the behavior of a system through more intuitive ways of interacting, or even without any<br />
explicit interaction at all. Mark Weiser describes this new way of thinking in “The Computer for<br />
the 21st Century” [64]. The terms Context-Aware Computing and Mobile Computing are also<br />
important when we talk about AR.<br />

Context-Aware Computing. According to Schilit and Theimer, “Context-Aware software adapts<br />
according to the location of use, the collection of nearby people, hosts, and accessible<br />
devices, as well as to changes to such things over time”. Any information about the<br />
environment that is important for the computer system has to be collected, evaluated<br />
and responded to by the system. For example, a system changes its internal behavior<br />
according to the light conditions at the user’s position [63]. A huge research area in AR<br />
is therefore how to collect this information about the environment, such as the location of<br />
the user. These questions will be discussed in more detail in the tracking introduction.<br />

Mobile Computing. Many AR applications enable the user to move around freely in the<br />
environment. For this purpose a mobile setup is attached to the mobile user. Possible scenarios<br />
for this class of AR applications are maintenance tasks, such as repairing a car.<br />
The user has his hands free to fix the car and is shown virtual information in his HMD<br />
indicating which step to perform next. Another example would be a navigation system<br />
displaying information about the environment in the user’s view.<br />

In summary, the key challenges in AR are:<br />

• Registration of virtual information in the real environment<br />

• Finding new ways of interacting with systems that respond in realtime<br />

For 3D registration in the real environment, tracking technologies are needed. Tracking is<br />
a difficult problem which will be discussed in more detail. For new user interaction possibilities,<br />
innovative user interfaces (UI) are needed, and it has to be investigated how users actually<br />
use a system and whether they accept new ways of interaction or refuse them.<br />

1.2 Tracking in <strong>Augmented</strong> <strong>Reality</strong><br />

As noted in the previous section, tracking is one of the main and most difficult issues in<br />
AR research.<br />

Virtual and real objects have to be aligned as well as possible. This process is called registration.<br />
Sensors that gather information about the environment and collect 3-dimensional<br />
(3D) spatial information are called trackers. To register virtual objects in 3D, AR applications<br />
work in 3-dimensional space. In order to calculate the viewpoint of a user and to display<br />
the virtual information at the right position, the pose, which consists of the<br />
position as well as the orientation, has to be tracked by the underlying tracking technology.<br />

This section gives an introduction to the fundamentals of tracking and an overview of<br />
the basic tracking technologies used in this thesis. A good introduction and more detailed<br />
descriptions of all the terms and technologies introduced can be found in [21].<br />

1.2.1 Representation of spatial information<br />

• Position<br />

Position is a 3D vector estimated by the tracking technology. It contains the coordinates<br />
of the specified point in the tracker coordinate system in the current tracking<br />
frame. The tracker coordinate system is a Cartesian coordinate system consisting of three<br />
perpendicular axes. These axes intersect in one point, the origin of the coordinate system.<br />
In homogeneous coordinates, a position is represented by a 4-component vector:<br />

$\bar{v} = (x, y, z, w)^T$, where typically $w = 1$<br />

• Orientation<br />

Orientation describes how an object is rotated relative to the axes of<br />
the tracker coordinate system. Unlike position, orientation can be represented in several<br />
ways, each with advantages and disadvantages. Which method to choose is always a<br />
tradeoff.<br />

1. Rotation Matrix<br />

A common way to represent transformations of points in a coordinate system is<br />
the 3x3 rotation matrix. Rotation and scaling of a set of points can be combined<br />
and performed by a single matrix multiplication. If a 4x4 homogeneous matrix<br />
is used, the rotation matrix is its upper-left 3x3 submatrix. Translations can then be<br />
applied as well, since the translation is stored in the 4th column. The columns of the<br />
rotation matrix can be regarded as the directions of the transformed coordinate axes<br />
projected on the source coordinate axes.<br />

2. Euler Angles<br />

Euler angles are the simplest and most intuitive representation of rotations. Every<br />
rotation can be considered as three successive single rotations around the three<br />
coordinate axes. In 3-dimensional space there is a rotation matrix for every<br />
axis:<br />

rotation $\phi$ about the x axis:
$$R_x = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\phi & \sin\phi \\ 0 & -\sin\phi & \cos\phi \end{pmatrix}$$

rotation $\theta$ about the y axis:
$$R_y = \begin{pmatrix} \cos\theta & 0 & -\sin\theta \\ 0 & 1 & 0 \\ \sin\theta & 0 & \cos\theta \end{pmatrix}$$

rotation $\psi$ about the z axis:
$$R_z = \begin{pmatrix} \cos\psi & \sin\psi & 0 \\ -\sin\psi & \cos\psi & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

Any rotation in 3D space can be calculated by multiplying the three rotation<br />
matrices:<br />

$$R = R_x \cdot R_y \cdot R_z$$

Note that matrix multiplication is not commutative, so the order of the three rotations<br />
matters.<br />

3. Quaternions<br />

Quaternions extend the complex numbers to hyper-complex numbers of<br />
rank 4. At first glance quaternions might look difficult and confusing, but once<br />
you are familiar with them, calculations can be very easy. They can be represented<br />
by a 4-dimensional vector and consist of a real scalar and an imaginary vector:<br />

$$q = w + xi + yj + zk, \quad w, x, y, z \in \mathbb{R}$$
$$q = (w, x, y, z), \quad w, x, y, z \in \mathbb{R}$$
$$q = (s, \bar{v}), \quad s \in \mathbb{R},\ \bar{v} \in \mathbb{R}^3$$

Note that $i, j, k$ are imaginary units with $i^2 = j^2 = k^2 = ijk = -1$, the imaginary<br />
vector is $\bar{v} = (x, y, z)^T$, and the scalar part is $s = w$.<br />

Mukundan provides an introduction to basic quaternion algebra, including<br />
operations like multiplication and addition, which we will not discuss here in<br />
detail [36].<br />

Rotations with quaternions: The vector part containing the imaginary components<br />
specifies the rotation axis; the scalar part is the cosine of half the rotation angle.<br />
Only unit quaternions, $|q| = 1$, are used to describe rotations. A quaternion<br />
$q = (s, \bar{v})$ specifies a rotation of $2\arccos(s)$ around the axis $\bar{v}$ [21]. So if we<br />
want to construct a rotation of angle $\theta$ around a unit axis $\bar{v}$, we can use the<br />
following quaternion:<br />

$$q_{\theta,\bar{v}} = \left(\cos\left(\tfrac{1}{2}\theta\right),\ \sin\left(\tfrac{1}{2}\theta\right)\bar{v}\right)$$

Here are two simple examples with $q = (s, \bar{v})$:<br />

– The identity rotation is specified by a rotation of 0 degrees around an unspecified<br />
axis:<br />

$$s = 1, \quad \bar{v} = (0, 0, 0)^T$$

– A rotation of 90 degrees about the y-axis is specified the following way:<br />

$$s = \frac{1}{\sqrt{2}}, \quad \bar{v} = \left(0, \frac{1}{\sqrt{2}}, 0\right)^T$$

Consecutive rotations can be expressed as the product of the corresponding<br />
quaternions. A rotation $q$ can be applied to a vector $\bar{p} = (x, y, z)$, treated as the<br />
quaternion $(0, \bar{p})$, in the following way:<br />

$$\bar{p}' = q \circ \bar{p} \circ q^*$$

with the conjugated quaternion $q^* = (s, -\bar{v})$.<br />

A discussion of the advantages and disadvantages of the different representations<br />
of orientation can be found in [49].<br />
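The representations above can be tried out with a short sketch. The following Python code is only an illustration under stated assumptions: all function names are hypothetical and not taken from the thesis software, the per-axis matrices use the sign convention printed above, and the quaternion product is the standard Hamilton product. With that product, $q \circ \bar{p} \circ q^*$ rotates the vector itself, so for the 90 degree example it agrees with the transpose of $R_y$ as printed above; this is an observation about the sketch, not a statement about which convention the tracking software uses.

```python
import math

def rot_x(phi):
    """Rotation about the x axis, same sign convention as the matrices above."""
    c, s = math.cos(phi), math.sin(phi)
    return [[1, 0, 0], [0, c, s], [0, -s, c]]

def rot_y(theta):
    """Rotation about the y axis, same sign convention as the matrices above."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, 0, -s], [0, 1, 0], [s, 0, c]]

def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(m, v):
    """Product of a 3x3 matrix with a 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def quat_from_axis_angle(axis, angle):
    """Unit quaternion (s, x, y, z) for a rotation of `angle` (radians) about `axis`."""
    norm = math.sqrt(sum(a * a for a in axis))
    k = math.sin(0.5 * angle) / norm  # normalises the axis to unit length
    return (math.cos(0.5 * angle), axis[0] * k, axis[1] * k, axis[2] * k)

def quat_mul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)

def quat_conj(q):
    """Conjugate q* = (s, -v)."""
    return (q[0], -q[1], -q[2], -q[3])

def rotate(q, p):
    """Apply p' = q * (0, p) * q^* and return the vector part."""
    r = quat_mul(quat_mul(q, (0.0, p[0], p[1], p[2])), quat_conj(q))
    return list(r[1:])

# The 90 degree rotation about the y axis from the example above.
q_y90 = quat_from_axis_angle((0, 1, 0), math.pi / 2)
p_rot = rotate(q_y90, (1.0, 0.0, 0.0))            # approximately (0, 0, -1)

# The same rotation via the transposed matrix R_y.
p_mat = matvec(transpose(rot_y(math.pi / 2)), (1.0, 0.0, 0.0))

# Matrix multiplication is not commutative: the order of rotations matters.
a = math.pi / 2
order_matters = matmul(rot_x(a), rot_y(a)) != matmul(rot_y(a), rot_x(a))

print(p_rot, p_mat, order_matters)
```

The quaternion needs four numbers and one normalisation, while the matrix form needs nine entries; which one is more convenient depends on the operations the application performs most often.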

1.2.2 Examples for Tracking Technologies<br />

In order to select an appropriate tracking system for an AR application, certain criteria have<br />
to be evaluated. The most important criteria are latency, accuracy, update rate, working area<br />
and mobility [46].<br />

Tracking can be compared to how a human being collects information about the environment:<br />
by seeing, by sensing (hearing, feeling, recognizing certain influences) and by the<br />
sense of equilibrium. Tracking technologies can therefore be grouped into similar categories.<br />
The following is only a small selection, covering the tracking technologies used in this thesis.<br />


Vision-Based Tracking (Seeing)<br />

Vision-based tracking applies image recognition techniques in order to detect certain features<br />
in images grabbed by optical cameras. An optical tracker is thus a combination<br />
of hardware that grabs video frames and software that analyzes the frames. The terms<br />
vision-based tracking and optical tracking are used interchangeably in this thesis.<br />

Tracked features can be artificial or natural. They are used to calculate the position of the<br />
target in the reference coordinate system. While simple markers are often used as artificial<br />
features [28], natural features can either be preprocessed points in a 2D plane [11] or any<br />
features in the environment [42]. This method provides the full 6 degrees of freedom (DOF), which<br />
means that vision-based tracking provides both the position and the orientation.<br />

A common software package for marker-based tracking is the Augmented Reality Toolkit<br />
(ARToolkit) [28]. Recently a new version of this toolkit has been developed which allows<br />
texture tracking of a 2D plane instead of fiducials [11]. This software has been developed by<br />
the HIT Lab (Human Interface Technology Laboratory) USA and New Zealand¹. A cheap<br />
web cam on an average desktop computer is sufficient to run this software, which makes it<br />
possible to set up a small AR system even at home. AR applications based on vision-based tracking could<br />
become very important in the future in order to reach a broad audience.<br />

Optical tracking is often further categorized into Outside-In and Inside-Out tracking. In an<br />
Outside-In setup the camera is mounted at a fixed position; in an Inside-Out setup the camera<br />
is attached to the moving target itself (on the HMD of a user, for example).<br />

One disadvantage is high latency, caused by the huge amount of video data grabbed by<br />
the camera and the long processing time of the image recognition algorithms. The main drawback<br />
for the user is occlusion. If artificial features are occluded, the tracking<br />
fails (marker-based tracking); if natural features are occluded, fewer features are available<br />
and the results of the tracking routine become inaccurate (natural feature tracking).<br />

Inertial Tracking (Equilibrium sense)<br />

In order to measure the position of an object, inertial trackers use accelerometers, which<br />

measure the linear acceleration of an object. Gyroscopes instead measure angular velocity,<br />

classically by exploiting the conservation of angular momentum, and are therefore able<br />

to provide the orientation of an object. The orientation is delivered as yaw (y-axis), pitch<br />

(x-axis) and roll (z-axis).<br />
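As an illustration, the three angles can be combined into a single rotation matrix. The following is a minimal sketch in Python and is not taken from any tracker SDK.<br />

The helper names and the composition order (yaw, then pitch, then roll) are assumptions made only for this example; the order used by a concrete device has to be taken from its documentation.<br />

```python
import math


def rot_x(angle):
    """Rotation about the x-axis (pitch), angle in radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(angle):
    """Rotation about the y-axis (yaw), angle in radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_z(angle):
    """Rotation about the z-axis (roll), angle in radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def orientation_matrix(yaw, pitch, roll):
    """Assumed composition order: R = R_y(yaw) * R_x(pitch) * R_z(roll)."""
    return matmul(rot_y(yaw), matmul(rot_x(pitch), rot_z(roll)))


def rotate(matrix, vector):
    """Apply a 3x3 rotation matrix to a 3D vector."""
    return tuple(sum(matrix[i][k] * vector[k] for k in range(3))
                 for i in range(3))
```

With this convention a yaw of 90 degrees turns the viewing direction from the z-axis to the x-axis, while pitch and roll leave the y-axis rotation untouched.<br />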

Current gyroscope technologies play an important role because the devices are small, cheap<br />

and easy to integrate into other devices such as laptops or even mobile phones. Historically,<br />

gyroscopes were used for navigation in airplanes and ships. These inertial devices<br />

were heavy, expensive and, due to their navigation task, very accurate. New technologies<br />

have enabled the development of smaller and cheaper devices. Here is a short overview of<br />

the common techniques used for gyroscope devices [23]. A special focus is set on vibrating<br />

gyroscopes.<br />

1 www.hitlabnz.org<br />


• Spinning Mass Gyroscopes<br />

These classical gyroscopes are also called gimbaled gyroscopes. They use the properties<br />

of a spinning wheel and can only measure the rate of rotation about one axis. Thus<br />

three gyroscopes have to be combined if the rotation about three orthogonal axes has<br />

to be sensed. Because these gyroscopes are heavy and large, they are now only used in ships and<br />

aircraft.<br />

• Optical Gyroscopes<br />

These gyroscopes apply the time of flight (TOF) principle. The time of flight until a<br />

signal is sensed by a receiver is measured. For a gyroscope it means that rotation<br />

influences the time of flight for light. The time is measured and the rate of turn can be<br />

estimated.<br />

• Vibrating Gyroscopes<br />

Vibrating gyroscopes are commonly used in recent applications. The reasons are<br />

that they are small, consume less power and require no bearings or motors. A vibrating<br />

element is rotated. Evaluations have shown that a ring-shaped vibrating resonator<br />

is best suited for measuring the rate of turn. For the measurement of the<br />

angular rate a phenomenon known from the aviation domain is utilized, the Coriolis<br />

effect. In the northern hemisphere, an airplane heading east will drift towards the south, although it does not<br />

accelerate in the south direction. Heading west leads to a drift towards the north.<br />

The ”force” responsible for the acceleration in the north-south direction is called the<br />

Coriolis force $F_C$. This effect occurs when an object moves within a rotating reference<br />

frame. In this special case an aircraft moves in the reference frame of the rotating earth.<br />

Figure 1.2 shows this effect. An object moves around a rotation axis and the Coriolis<br />

force acts on the object perpendicular to its direction of movement. On a large scale this effect can be<br />

observed in the rotation of storms and other weather systems. The often-cited example of water draining from a<br />

sink in the opposite direction in the southern hemisphere is not a reliable one, because the effect is negligible at that scale. A good demonstration of the<br />

effect can be seen in [16].<br />

The Coriolis force $F_C$ can also be expressed by the following equation with the object's<br />

mass $m$, its velocity in the rotating frame $v_r$, the angular velocity of the rotating frame<br />

of reference $\omega$ and the vector cross product $\times$.<br />

$$F_C = -2m\,(\omega \times v_r)$$<br />

A good introduction to the Coriolis force can be found in [25]. This effect is applied<br />

in vibrating gyroscopes. Figure 1.3 shows the reference coordinate frame with the vibrating<br />

ring. First let us only consider rotations around the Z-axis. The ring is vibrated<br />

with a constant amplitude ΩZ around the Z-axis. This is called the primary mode. If<br />

the gyroscope is now turned around the Z-axis, the Coriolis effect leads to an acceleration<br />

perpendicular to the motion of the ring (in w-direction), the so-called secondary<br />

mode. This secondary mode can be measured and the rate of turn can be calculated from it.<br />

If we now vibrate the ring around all axes of the reference frame, Ω = (ΩX, ΩY, ΩZ), we<br />

can measure the rate of turn for all directions [22] [14].<br />
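To make the Coriolis equation given earlier in this section more concrete, the following minimal Python sketch evaluates it for one example.<br />

The function names and the numbers (a frame rotating at 1 rad/s about the z-axis, a mass of 1 kg moving at 1 m/s along the x-axis) are chosen only for illustration.<br />

```python
def cross(a, b):
    """Cross product of two 3D vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def coriolis_force(mass, omega, v_r):
    """Coriolis force -2 * m * (omega x v_r) in the rotating frame."""
    c = cross(omega, v_r)
    return tuple(-2.0 * mass * component for component in c)


# Illustrative values: rotation about z, motion along x.
omega = (0.0, 0.0, 1.0)   # angular velocity of the frame in rad/s
v_r = (1.0, 0.0, 0.0)     # velocity in the rotating frame in m/s
force = coriolis_force(1.0, omega, v_r)

# The force has no component along the direction of motion.
along_motion = sum(f * v for f, v in zip(force, v_r))
```

In this example the force points along the negative y-axis, that is, perpendicular to both the rotation axis and the velocity. A vibrating gyroscope exploits exactly this: the magnitude of the sideways acceleration is proportional to the rate of turn.<br />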


Figure 1.2: An object moves in a rotating frame. The Coriolis force results in an acceleration perpendicular<br />

to the movement direction<br />

Figure 1.3: The reference frame of a vibrating gyroscope. The ring is vibrated around all axes<br />

(primary mode) and the rate of turn can be measured by the secondary mode caused by the Coriolis<br />

effect<br />


Figure 1.4: Intersense products: InertiaCube2 (left) and InterTrax2 (right)<br />

Figure 1.5: Example for a Magnetic Tracker: Ascension Flock of Birds<br />

Accelerometers and gyroscopes are often combined to get full 6 DOF. Because<br />

inertial trackers provide relative measurements, they are often combined with other tracking<br />

technologies to obtain absolute measurements as well. One drawback of this technology is<br />

that small measurement errors accumulate and cause drift, which leads to incorrect tracking<br />

results after a certain period. Widely used inertial tracking products are the trackers by Intersense<br />

2 (see figure 1.4).<br />

In practice the accumulated drift is the most difficult drawback. Either only relative orientation<br />

is used, or the setup has to integrate other tracking or filtering technologies in order to correct<br />

the measurements delivered by the inertial tracker.<br />
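The accumulation of drift can be illustrated with a few lines of Python. This is a simplified sketch and not the processing of any real device.<br />

It assumes a single axis, a fixed sampling interval and a constant measurement bias; the function name and the numerical values are hypothetical.<br />

```python
def integrate_gyro(rates, dt, bias=0.0):
    """Integrate angular rates (rad/s) sampled every dt seconds into angles (rad).

    bias is a constant measurement error added to every sample.
    """
    angle = 0.0
    angles = []
    for rate in rates:
        angle += (rate + bias) * dt
        angles.append(angle)
    return angles


# A device at rest for 60 s, sampled at 100 Hz, with an assumed bias of 0.01 rad/s.
true_rates = [0.0] * 6000
drifted = integrate_gyro(true_rates, dt=0.01, bias=0.01)
drift_after_one_minute = drifted[-1]  # about 0.6 rad although the device never moved
```

Even a small constant error therefore grows linearly with time once the rates are integrated, which is why an absolute reference such as a vision-based tracker is needed for correction.<br />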

Magnetic Tracking<br />

A special hardware setup generates a magnetic field in a certain working area. These magnetic<br />

fields are either low-frequency AC or DC fields. Three orthogonal coils in the sender<br />

as well as in the receiver are used to produce measurements of position and orientation (6<br />

DOF). The physical principles applied are described in [21]. Tracking measurements can be<br />

distorted, for example, by ferromagnetic objects or by CRT monitors, which produce their own magnetic<br />

field. An example of a magnetic tracking system is the Ascension 3 Flock of<br />

Birds (see figure 1.5).<br />

The main drawback for users is the limited tracking range. The best results are achieved near the<br />

base station. Interference from external magnetic fields or metal objects also has to be avoided.<br />

2 www.intersense.com<br />

3 http://www.ascension-tech.com<br />

Other examples<br />

Other tracking technologies are acoustic trackers, mechanical trackers and the Global Positioning System (GPS), which<br />

is important for outdoor applications.<br />

1.2.3 Hybrid Tracking<br />

All tracking technologies have their advantages and drawbacks concerning, for example, latency, frame rate,<br />

mobility and precision. Since tracking is one of the key issues for AR, these<br />

weaknesses have to be eliminated. One approach is to combine several tracking technologies<br />

to compensate for the drawbacks of a single technology. For example, inertial trackers can only<br />

give relative pose information. If an inertial tracker is combined with a GPS system, the resulting tracking<br />

system can deliver absolute pose information (with the world coordinates as a reference frame). In<br />

such applications user movement heavily affects the 3D registration and has to be stabilized<br />

by inertial trackers [1].<br />

As mentioned above, inertial tracking accumulates small measurement errors that cause drift. To<br />

compensate for this, vision-based tracking techniques are integrated to filter and even predict<br />

orientation estimates. An important filtering technique in this context is<br />

Kalman filtering.<br />

Kalman Filter: Kalman filtering is a powerful mathematical tool that uses a prediction and correction<br />

loop to filter and stabilize error-prone data. If several trackers are combined,<br />

this tool can be used to let their measurements correct each other. Welch and Bishop<br />

provide a good introduction to the Kalman Filter [65].<br />
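The prediction and correction loop can be sketched for the simplest possible case, a single scalar value that is assumed to stay roughly constant. This is a minimal illustration in Python and not the filter used in any tracking system discussed here.<br />

The function name and the parameters `q` (process noise variance), `r` (measurement noise variance), `x0` and `p0` (initial estimate and its variance) are assumptions of this example.<br />

```python
def kalman_1d(measurements, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a value modelled as (nearly) constant.

    q: process noise variance, r: measurement noise variance,
    x0, p0: initial state estimate and its variance.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Prediction: the state is assumed unchanged, its uncertainty grows by q.
        p = p + q
        # Correction: blend prediction and measurement using the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates


# Repeated measurements of an orientation angle of 5 degrees.
estimates = kalman_1d([5.0] * 100, q=1e-4, r=0.1)
```

The gain `k` decides how much each new measurement is trusted: a large `r` (noisy sensor) pulls the estimate only slowly towards the measurement, whereas a large `q` makes the filter follow changes quickly. In a hybrid setup the same loop is run with a state vector and with measurements from several trackers.<br />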

In the context of Context-Aware and Ubiquitous Computing multiple sensors in the environment<br />

have to be combined and evaluated. In Ubiquitous Tracking these sensors can be<br />

dynamically integrated during runtime [39].<br />

1.3 Human Computer Interaction (HCI)<br />

As mentioned above, AR requires not only a satisfactory registration in 3D; suitable<br />

ways of interacting with AR systems also have to be researched and evaluated. According to [24],<br />

research in this area focuses on the following tasks:<br />

• Design<br />

• Evaluation<br />

• Implementation<br />

of interactive computing systems for human use. During these steps, factors influencing<br />

user behavior, such as psychological aspects, are studied as well. All these aspects might vary<br />

between different user groups or with a changing application domain. Therefore understanding<br />

the user is one of the key requirements when developing user interfaces. Based on these<br />

observations, suitable input and output hardware has to be selected. It is important<br />

to note that this is a new way of thinking, because over the last decades the user has always<br />

been forced to learn how to interact with a computer system. Input and output devices<br />

such as keyboard, mouse and monitor were fixed. The vision of HCI is that a user does not<br />

have to adapt to a system at all; the system adapts to the user.<br />

1.4 Goals and Outline of this <strong>Thesis</strong><br />

My overall vision is to bring tracking and user behavior together. In almost every AR application<br />

the user has to learn how tracking works in order to improve her or his performance.<br />

The user has to learn which actions are allowed and which ones result in unexpected<br />

feedback from the system, or even in no feedback at all. Every class of AR applications has<br />

special requirements concerning tracking, user interaction and mobility. As already said, the<br />

user interaction changes from application to application, and therefore the tracking requirements<br />

that depend on the user behavior differ as well. User behavior thus has to be studied<br />

for a specific application domain or even for a single task.<br />

1.4.1 Goals<br />

This thesis focuses on a special class of AR applications: AR <strong>Table</strong> <strong>Top</strong> Applications. <strong>Table</strong> top<br />

applications can be set up at a desk or a table with a user standing or sitting in front of it. All<br />

the studies have been done with a table top application called The Magic Book, developed at<br />

the HIT Lab New Zealand.<br />

The goal of this thesis is to evaluate whether it is possible to adjust the behavior of the vision-based<br />

tracking algorithm according to the user’s movements. This should be realized with a hybrid<br />

tracking approach combining the vision-based tracking of the Magic Book on the one hand<br />

and a gyroscope giving information about the orientation of the user interface on the other<br />

hand. The gyroscope serves as a measure of the occurring user movements.<br />

The following issues will be discussed in order to get a little bit closer to this vision:<br />

• Properties of <strong>Table</strong> <strong>Top</strong> AR<br />

The special properties concerning tracking and user interaction in <strong>Table</strong> <strong>Top</strong> AR will<br />

be evaluated.<br />

• User study-based approach to find a mapping between behavior of the tracking algorithm<br />

and user movement<br />

If it is possible to find such a mapping, we can adjust the algorithm very easily. We<br />

can simply apply a function with the movement as input parameter and the configuration<br />

settings for the vision-based tracking as output parameters. To achieve this, an approach<br />

based on a user study is introduced.<br />

• A software architecture allowing a dynamic configuration of the tracking algorithm<br />

based on DWARF (Distributed Wearable <strong>Augmented</strong> <strong>Reality</strong> Framework).<br />

The information given by the gyroscope has to be integrated into the tracking routine<br />

at runtime. Thus a software architecture accomplishing this requirement is needed.<br />

DWARF [6] is a component-based AR framework. Some of its components have to<br />

be extended in order to integrate the texture tracking version of the ARToolkit.<br />
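The idea of a mapping from movement to tracker configuration can be sketched as a simple function. The following Python fragment is purely hypothetical and does not anticipate the result of the user study.<br />

The class `TrackerConfig`, the parameter `search_window_px`, the linear form of the mapping and all numerical values are assumptions made only to show the shape of such a function.<br />

```python
from dataclasses import dataclass


@dataclass
class TrackerConfig:
    """Hypothetical configuration of the vision-based tracker."""
    search_window_px: int  # half-size of the search area around each feature


def config_for_movement(angular_speed_deg_s, base_px=10,
                        px_per_deg_s=0.5, max_px=60):
    """Map the measured angular speed to a tracker configuration.

    Assumed linear rule: faster movement leads to a larger search window,
    limited by max_px to bound the processing time.
    """
    window = base_px + px_per_deg_s * abs(angular_speed_deg_s)
    return TrackerConfig(search_window_px=int(min(max_px, round(window))))


slow = config_for_movement(0.0)
fast = config_for_movement(40.0)
```

At runtime such a function would be called with every new gyroscope measurement, and its result would be passed to the tracking component before the next video frame is processed. Which input quantity and which output parameters are actually suitable is what the user study is meant to find out.<br />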

1.4.2 Outline of the thesis<br />

Here is a brief overview of the chapters of this thesis.<br />

Chapter 2: In this chapter we will give a motivation for <strong>Table</strong> <strong>Top</strong> AR and introduce the<br />

Magic Book. All the research done in this thesis is based on this application. Requirements<br />

on tracking and user interaction for <strong>Table</strong> <strong>Top</strong> AR will be discussed in this<br />

chapter and the fundamental techniques used in the Magic Book will be explained.<br />

Chapter 3: This chapter describes the approach to bring tracking and user behavior together<br />

and provides an idea for a user-study-based evaluation. The approach will be placed<br />

in the context of other research projects and existing related work.<br />

Chapter 4: A DWARF-based architecture will be introduced that allows a dynamic configuration<br />

of the tracking algorithm.<br />

Chapter 5: This is the main part of the work. A user study was performed with the goal to<br />

find a dependency between the tracking algorithm and user movement. The chapter<br />

describes the design and execution of the user study.<br />

Chapter 6: The user study will be evaluated. We will try to measure the correlation between<br />

the recorded tracking data of the vision-based tracker and the movement information<br />

given by the gyroscope in order to find a mapping. I will also present further ideas<br />

on how the gathered data can be evaluated and analyzed.<br />

Chapter 7: Results will be summarized, conclusions will be presented and ideas for future work<br />

will be discussed.<br />


CHAPTER 2<br />

<strong>Table</strong> <strong>Top</strong> <strong>Augmented</strong> <strong>Reality</strong><br />

<strong>Table</strong> <strong>Top</strong> <strong>Augmented</strong> <strong>Reality</strong> is a specific class of AR applications. As the name says the<br />

application is set up on a table or desk. The user stands, sits or moves in front of the table<br />

setup. This leads to special requirements on user interaction, tracking and mobility of the<br />

whole setup. As an example application this work deals with the Magic Book, which was<br />

developed at the HIT Lab New Zealand.<br />

The concepts and technologies described in this chapter are generally valid for all AR<br />

applications, but we will always consider them in the context of table top AR applications.<br />

First we will give a motivation for table top AR in various application domains.<br />

The Magic Book and some other applications will be introduced briefly and possibilities<br />

for interaction and tracking will be evaluated. As we will see, vision-based tracking has<br />

significant advantages for those applications. As the Magic Book uses the ARToolkit texture<br />

tracking technology, we will give a short introduction to the algorithm. Several user interfaces<br />

for graphical output, especially an HMD, a special handheld device and a tablet PC, will be<br />

introduced. A short overview of input devices will be provided as well.<br />

2.1 Motivation for <strong>Table</strong> <strong>Top</strong> AR<br />

The reason for setting up applications in a table environment is very simple. People work,<br />

read, discuss, interact and play games at tables. Table top AR applications therefore try<br />

to enhance the experience of a current task or a social event with techniques used in AR.<br />

Such a setup is also called a horizontal setup, which is characteristic of table top environments. We<br />

have already discussed that one important issue of AR is the alignment of virtual and real<br />

objects. In table top applications these virtual objects are displayed in the work space, the<br />

table itself. People can sit, stand or even walk around the table, and applications allow<br />

interaction with the virtual environment and even communication and interaction between<br />

the participating users themselves.<br />

Here are some very different application domains for table top AR, each with<br />

examples of related work:<br />


• Exhibitions and Education<br />

A lot of research has been done to bring new ways of multimedia interaction to museums<br />

and exhibitions. AR is a new way of interactive multimedia presentation. With<br />

a graphical output device like an HMD a user can walk through a museum. While the user<br />

looks at cultural objects, audio and virtual information can, for example, augment his or her<br />

perception. The user is standing in front of the exhibition piece, which<br />

is placed on a desk or on a table. The HIT Lab New Zealand developed a special AR<br />

kiosk for applications like this (figure 2.1). This kiosk can be used for a variety of applications<br />

such as education, science presentations and entertainment. [66] presents<br />

some of those applications used for educational purposes, like the AR Volcano (figure<br />

2.1), an interactive tutorial about volcanoes, and S.O.L.A.R (Solar-System and<br />

Orbit Learning in <strong>Augmented</strong> <strong>Reality</strong>), where a user succeeds if he is able to arrange<br />

augmented planets around the sun in the right way.<br />

Figure 2.1: The <strong>Augmented</strong> <strong>Reality</strong> Kiosk (left) and the AR Volcano application developed by the<br />

HIT Lab New Zealand (right)<br />

Mark Billinghurst writes about the potential of AR in education in his internet essay<br />

”New Horizons for Learning”[7].<br />

• Gaming<br />

Gaming is an interesting application domain for AR and especially for table top AR. Players<br />

sit or stand around the table and see the results of their interaction augmented in<br />

their view. The immersion in the game experience, and therefore the fun factor, increases<br />

[56]. Immersion is a measure of the degree to which a player is affected by a<br />

virtual or augmented experience. For example, the classic PC game ”Worms” was<br />

ported to an AR application. The Studierstube also developed a collaborative game in which players<br />

steer a virtual train on a real network of wooden toy rails. The trains can only be seen<br />

and manipulated through a see-through PDA device [61] (see figure 2.2).<br />

Figure 2.2: The Invisible Train: A <strong>Table</strong> <strong>Top</strong> AR game developed by the Studierstube, Vienna<br />

• Interactive Storytelling<br />

New ways of telling stories are being explored. Just as the museum<br />

experience is enhanced by augmented audio and virtual information, AR is used for storytelling as<br />

well. With easy-to-use authoring tools even children are able to create their own content<br />

and their own virtual worlds [34]. The Magic Book is such an application.<br />

• Collaboration<br />

New ways of collaboration are being researched as well. Billinghurst evaluated the potential<br />

of AR applications for collaboration [8]. Michael Siggelkow developed an application<br />

for remote collaboration that can be set up easily on any desktop. In his thesis<br />

he explores in what way AR enhances the awareness of the participants in comparison<br />

to other technologies [50].<br />

The potential of such applications to catch the attention of a broad<br />

audience makes them interesting for the future. At the moment AR is still mainly used in<br />

industry, and the door for non-experts has yet to be opened.<br />

2.2 The Magic Book<br />

This thesis will focus on a special application: an interactive fairy tale book. The Magic<br />

Book itself is just a framework for a variety of applications based on<br />

• The book paradigm<br />

A book is used to change the content of the virtual scenes. As when reading a real book, the<br />

user can turn pages, and the content is augmented on the book page. Thus for every<br />

page a corresponding virtual model exists. The book is a tangible device for interacting<br />

with the application. The paradigm of tangible devices was coined by Ishii [43].<br />

• A handheld visor<br />

With this visor the virtual objects are augmented in the user’s view. It will be<br />

described in more detail in the user interaction section.<br />

The application content we are working with is the fairy tale ”Giant Jimmy Jones”,<br />

which was written especially for this purpose. On every page another part of the story is<br />

augmented on the book. The story continues when a user turns the page. Audio output<br />

is supported as well: a storyteller explains the scenes and a soundtrack has been written (see figure 2.3).<br />

A user is equipped with the handheld visor, the so-called handheld device. He can look<br />

through this visor, which is a specially prepared HMD, and gaze at the 2D book pages. While<br />

standing in front of the AR kiosk he can zoom into the scene in case he wants to<br />

focus on a certain detail of the 3D animated fairy tale. Zooming into the scene is done<br />

by moving the handheld device closer to the book surface. It is possible to walk around<br />

and watch the scene from different points of view. The book is attached to a rotatable plate.<br />

Hence the scene can also be watched from another point of view by simply turning the plate,<br />

which can also be considered a tangible interaction device.<br />

Figure 2.3: Giant Jimmy Jones, an interactive fairy tale<br />

The different aspects concerning tracking and user interaction will be discussed in the<br />

following corresponding sections.<br />

2.3 Requirements<br />

In order to fully understand the term table top AR we will discuss certain properties<br />

and requirements of AR. This gives a rationale for why certain user interaction<br />

hardware and tracking technologies evolved for the class of table top AR applications. We will<br />

evaluate these requirements in the context of table top AR, which shares the common<br />

properties of AR but also has additional requirements.<br />

• Alignment in realtime<br />

In the introduction I already made clear that this is a key issue for AR. Alignment<br />

means that the virtual information is registered in the real environment at the<br />

exact position and with the exact orientation according to the information sensed by<br />

the underlying tracking technology. Thus we need a tracker or a combination of trackers<br />

providing 6 DOF. If the realtime requirement is not met, the user will always<br />

perceive a lag between the actual movement and the display of the<br />

virtual information. This requirement has to be met by the tracking infrastructure.<br />

• Usability<br />

Usability Engineering is a new discipline in software engineering that addresses the<br />

question of how to make a computer system usable. Because visitors of a<br />

museum or participants of a game, for example, are not experienced AR users, the<br />

interaction with table top applications has to be very easy and intuitive. This requirement<br />

has to be considered while designing the user interaction. The right choice of<br />

user interface hardware and software design has to be made.<br />

• Mobility of the setup<br />

It should be possible to set up the system on any table without difficulty.<br />

Exhibitions move or have to be rearranged, and a fixed installation is not suited for playing a game<br />

either. This places strong requirements on the chosen tracking technology as well,<br />

because a stationary setup is not suitable.<br />

• Price<br />

Of course a low price is always a requirement for software systems. But in order<br />

to address the general public or art galleries with limited budgets with this new technology, a<br />

tracking setup for several thousand Euros would not make sense, even if it provides better<br />

latency, accuracy and update rate. Thus an<br />

affordable user interface and tracking infrastructure is needed.<br />

In the next section we will discuss which tracking technology and which user interfaces<br />

are best suited to meet these requirements for table top AR.<br />

2.4 Tracking<br />

The technology that best meets all the requirements from the previous section is vision-based<br />

tracking, especially Inside-Out tracking. To recapitulate, an optical tracker<br />

consists of a camera and image recognition software. The camera grabs video images and<br />

the software algorithm searches for features to calculate the exact position and orientation<br />

at which to display the virtual object. Here is a short discussion of why this technology is best suited for<br />

table top environments:<br />

• Perfect alignment in realtime<br />

First, vision-based trackers deliver position as well as orientation. The measurements<br />

given by the tracker are accurate enough for this kind of application. If the<br />

tracking fails for several frames, this is acceptable because the consequences are not<br />

serious, although the usability decreases. The bottleneck is the high latency, because<br />

the video data first has to be transmitted from the camera into main memory, and then<br />

the computationally expensive image recognition has to detect the features. Hence the<br />

quality of the tracking depends on the update rate of the camera and the speed of<br />

the image processing software.<br />

• Usability<br />

A camera needs to be integrated into the user interface (Inside-Out tracking), which is<br />

an additional requirement for the UI. As mentioned above, the tracking might<br />

fail due to fast movements, changing light conditions or incorrect usage, for example.<br />

Occlusion is another main drawback. It has to be considered that a user might occlude<br />

trackable features during usage. Suitable and easy-to-learn interaction techniques have<br />

to be applied.<br />

• Mobility of the setup<br />

This is one of the big advantages, because no extensive hardware setup is needed. A<br />

web camera, usually connected via USB or FireWire, is enough. It can be attached and<br />

detached almost without any effort.<br />

• Price<br />

This is the decisive argument for the selection of optical tracking. On the one<br />

hand, good webcams are already available for less than 100 Euros. On the other hand,<br />

free image recognition toolkits such as the ARToolkit are available to programmers for designing the software.<br />

Next we will give a short introduction to marker-based tracking and then take a closer look<br />

at the algorithm of the texture tracking version of the ARToolkit. Later in the thesis<br />

we will describe our approach to adapt parameters of this algorithm to movement information.<br />

2.4.1 Marker-Based Tracking<br />

A basic and freely available software package for marker-based tracking is the ARToolkit [28][27].<br />

Although several marker-based tracking algorithms are available, we will focus on the ARToolkit<br />

here because it is used in this thesis. The environment has to be prepared with<br />

square patterns, the so-called markers, which are used to calculate the position of the camera relative to them<br />

(see figure 2.4). These markers are black and white squares with a black border<br />

and a configurable pattern inside. This pattern can be created individually and<br />

has to be preprocessed first. Owen describes the criteria for a ”good” fiducial [41]. The AR<br />

Volcano uses this vision-based tracking toolkit (see figure 2.1). The advantage is that the computation<br />

of the homography matrix, which describes the relation between the camera and the marker<br />

plane, is faster than the texture tracking introduced next. The big disadvantage is that it<br />

is very sensitive to occlusion. If the marker or even only a part of the marker disappears from the<br />

video image, the tracking fails. This restricts the usage, because the complete marker always<br />

has to be visible in the video stream.<br />
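How a homography relates the marker plane to the camera image can be shown with a short Python sketch. It only applies a given 3x3 matrix to a point on the plane and does not estimate the matrix; it is also not the ARToolkit implementation.<br />

The function name and the example matrix are assumptions made for this illustration.<br />

```python
def apply_homography(h, point):
    """Map a 2D point on the marker plane to image coordinates.

    h is a 3x3 homography given as nested lists. The point is extended to
    homogeneous coordinates (x, y, 1) and the result is divided by its
    third component.
    """
    x, y = point
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if w == 0:
        raise ValueError("point is mapped to infinity")
    return (u / w, v / w)


# Illustrative homography: scale the plane by 2 and shift it by (10, 20) pixels.
example_h = [[2.0, 0.0, 10.0],
             [0.0, 2.0, 20.0],
             [0.0, 0.0, 1.0]]
corner_in_image = apply_homography(example_h, (1.0, 1.0))
```

Since the four corners of a square marker are known in the marker plane and detected in the image, these correspondences are sufficient to determine such a matrix, which explains why the whole marker has to be visible.<br />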

The texture tracking introduced next is an extension of the ARToolkit; it still uses marker<br />

recognition to obtain an initial position.<br />


Figure 2.4: Examples of an ARToolkit Marker and a 2D textured plane used in the Texture Tracking<br />

version of the ARToolkit (Note that this version is still using markers for initialization as well)<br />

2.4.2 Texture Tracking of a 2D plane<br />

The ARToolkit was introduced as a simple toolkit for writing small AR applications<br />

based on vision-based marker tracking. The texture tracking version of the<br />

ARToolkit, however, makes it possible to track two-dimensional textures instead of black and white markers 1 [11].<br />

These images (see figure 2.4) have to be preprocessed in order to calculate a set of feature points that are<br />

used by the tracking algorithm. Note that the marker detection is still used for obtaining<br />

the initial position and orientation. Thus the image to be preprocessed has to contain an ARToolkit<br />

marker. Once the initial pose is calculated, the algorithm continues with the tracking of point<br />

features. The texture tracking toolkit provides both the preprocessing tools and the tracking<br />

algorithms.<br />

In this context we want to distinguish between texture tracking and Natural Feature Tracking;<br />

note that this is our own definition of terms used in this thesis. Texture tracking works with<br />

preprocessed images (preprocessed textures), so the environment still has to be prepared. Like<br />

markers, the textures have to be placed on the table, the wall or the floor. Natural features<br />

are features that are not artificially placed in the environment, like edges, lines or other properties.<br />

In the related work section 3.5 we will discuss ideas for natural feature tracking as<br />

well. The next subsection is intended as a small tutorial on the texture tracking algorithm<br />

of the ARToolkit. The ideas for this work evolved while analyzing the algorithm. A<br />

deeper analysis of the algorithm can be found in Vials master thesis [59].<br />

Algorithm of the Texture Tracking ARToolkit<br />

Every single step of the algorithm is shown in figure 2.6:<br />

1. The data structures (ARToolkit handles) for the texture tracking and for the<br />

marker tracking are created and initial parameters are set. As mentioned above,<br />

the initial orientation is given by the marker position in the image frame. It is obtained<br />

by a simple call of the marker detection method of the ARToolkit (steps 1 and 2).<br />

1 This version is not available under license yet. Please contact Hirokazu Kato for further information:<br />

kato@sys.es.osaka-u.ac.jp<br />


2 <strong>Table</strong> <strong>Top</strong> <strong>Augmented</strong> <strong>Reality</strong><br />

2. Based on the initial pose, four feature points of the preprocessed image that are visible<br />
in the current video frame are selected to update the pose. Later on, the selected<br />
feature points have to be found again in the video frame. Out of all feature point candidates,<br />
four points are selected according to the following rules. The number in<br />
brackets gives the corresponding step of the algorithm in figure 2.6.<br />
• FP1: the point that is furthest away from the video frame center (5)<br />
• FP2: the point that is furthest away from FP1 (11)<br />
• FP3: the point that maximizes the area of the triangle formed by FP1, FP2 and<br />
the new FP3 (12)<br />
• FP4: the point that maximizes the area of the quadrilateral formed by FP1, FP2, FP3<br />
and the new FP4 (12)<br />

3. Once a feature point is selected, a template is created for it. This template is used by<br />
the Normalized Cross Correlation (NCC) method to deliver a measure of similarity<br />
between the template and an area around a pixel in the video frame. The reason is<br />
that a single pixel is unlikely to be found again reliably in the next frame; thus whole windows are<br />
compared. Figure 2.5 shows a template for a selected feature point FP. The following<br />
NCC parameters are now calculated for the template (6):<br />
• The average pixel value $\text{average}_{\text{template}}$.<br />
• A vector with the normalized (mean-subtracted) pixel values. The advantage of the normalization is<br />
that different light conditions between frames do not result in different correlation<br />
values:<br />
$$\forall (x, y) \in \text{template}: \quad \text{vector}_{\text{template}}(x, y) = \text{value}_{\text{pixel}}(x, y) - \text{average}_{\text{template}} \tag{2.1}$$<br />
• The vector length of the normalized template:<br />
$$\text{length}_{\text{template}} = \sqrt{\sum_{(x, y) \in \text{template}} \text{vector}_{\text{template}}(x, y)^2} \tag{2.2}$$<br />

4. The algorithm now estimates the location of the selected FP in the video frame based<br />
on its previous locations. A simple approach is used here.<br />
Three different estimation methods are provided. These methods take previous pose<br />
calculations at different levels into account. For each frame every method is evaluated<br />
to find the feature point and the best result is taken. Each method represents a<br />
different movement model of the camera.<br />

• The first method assumes that no movement occurs between two frames. Under<br />
this assumption the algorithm simply takes the position $p_i^{k-1}$ of $FP_i$ in the last<br />
frame and searches for the position $p_i^{k}$ of $FP_i$ in the new frame within a certain<br />
search window size. This assumption is not realistic, however, because it<br />
does not consider movements at all, although movements definitely occur between<br />
two frames and cause displacements (7).<br />
$$p_i^{k} = p_i^{k-1} \tag{2.3}$$<br />
Figure 2.5: Texture Tracking Template: Not only single feature points but whole areas are compared<br />

• The second method takes the last two tracking frames into account. Here it is<br />
assumed that the displacement between two frames is constant.<br />
$$v = p_i^{k-1} - p_i^{k-2} \tag{2.4}$$<br />
Now the position of the feature point in the current frame can be calculated. The<br />
equation<br />
$$p_i^{k} = p_i^{k-1} + v \tag{2.5}$$<br />
results in<br />
$$p_i^{k} = 2p_i^{k-1} - p_i^{k-2}. \tag{2.6}$$<br />
• The third method uses the last three positions of the feature point in a similar way,<br />
assuming that the change of the displacement between frames is constant.<br />
$$p_i^{k} = 3p_i^{k-1} - 3p_i^{k-2} + p_i^{k-3} \tag{2.7}$$<br />

A common and more sophisticated method to predict the position of a feature in the<br />

next frame is a Kalman filter which is not applied here. Examples will be shown in the<br />

related work.<br />

5. Around the estimated position of the tracked feature the algorithm searches for the<br />
best match in order to update the actual position of the feature point. This is done for all three<br />
position estimates described in the previous step. Within a fixed search window the correlation<br />
value between the template and the area around every pixel is calculated. Figure<br />
2.7 shows the parameters used for the calculation of the NCC value.<br />



Figure 2.6: The ARToolkit Texture Tracking Algorithm<br />



Figure 2.7: Template Matching<br />

To obtain the correlation value, which measures the similarity between the<br />
template and the pixel area at position (i, j), several calculations are made for<br />
every pixel within the search area.<br />
• Similar to the creation of the template, every pixel within the pixel area is normalized<br />
by subtracting the average pixel value of the area. Note that the size of the pixel area is<br />
exactly the template size.<br />
$$\forall (x, y) \in \text{pixelarea}: \quad \text{vector}_{\text{pixelarea}}(x, y) = \text{value}_{\text{pixel}}(x, y) - \text{average}_{\text{pixelarea}} \tag{2.8}$$<br />
• The length of this vector is calculated as well.<br />
$$\text{length}_{\text{pixelarea}} = \sqrt{\sum_{(x, y) \in \text{pixelarea}} \text{vector}_{\text{pixelarea}}(x, y)^2} \tag{2.9}$$<br />
• To calculate the correlation between the pixel area and the template, every value in<br />
$\text{vector}_{\text{pixelarea}}$ is multiplied with the corresponding value (at the same position)<br />
in $\text{vector}_{\text{template}}$ and the products are summed up. Here the index i enumerates the pixels of the window.<br />
$$\text{corr} = \sum_{i=0}^{\text{templateSize}^2 - 1} \text{vector}_{\text{pixelarea}}(i) \cdot \text{vector}_{\text{template}}(i) \tag{2.10}$$<br />



• Finally the similarity can be calculated.<br />
$$\text{sim} = \frac{\text{corr}}{\text{length}_{\text{pixelarea}} \cdot \text{length}_{\text{template}}} \tag{2.11}$$<br />
• The result is a measure of the similarity between the area around pixel (i, j) and<br />
the template. It ranges from -1 to 1: a value of 1 indicates a perfect match, a value<br />
around 0 indicates no similarity, and -1 indicates an inverted pattern. Here we will just mention that later on we use another<br />
correlation method as a statistical analysis tool for the evaluation of the user<br />
study. This similarity must be calculated for every pixel within the search area.<br />
The pixel with the highest similarity value is most likely to be the feature point in<br />
the current frame.<br />

6. Since four feature points have to be tracked, these steps are carried out<br />
four times to obtain the locations of the four feature points in the current frame. Note<br />
that the three different methods described above are used to estimate the<br />
positions of the feature points.<br />
7. For every method the position of the 2D plane is now calculated, and the one producing<br />
the smallest tracking error is taken. Finally the algorithm provides the resulting position<br />
of the textured plane as seen from the camera viewpoint.<br />
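The selection rules of step 2 and the three position estimates of step 4 can be illustrated with a short Python sketch.<br />
It is not the ARToolkit implementation: the function names are hypothetical, and the way the quadrilateral area is<br />
computed (as the convex hull area of the four points) is an assumption, since the text does not specify it.<br />

```python
import numpy as np


def triangle_area(a, b, c):
    """Area of the triangle a-b-c, computed from the 2D cross product."""
    return 0.5 * abs((b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0]))


def quad_area(a, b, c, d):
    """Convex hull area of four points (assumption for the FP4 rule).

    Half the sum of the four triangle areas equals the hull area, both when
    the points form a convex quadrilateral and when one lies inside the others.
    """
    return 0.5 * (triangle_area(a, b, c) + triangle_area(a, b, d)
                  + triangle_area(a, c, d) + triangle_area(b, c, d))


def select_feature_points(candidates, frame_center):
    """Return the indices of FP1..FP4 chosen greedily by the rules of step 2."""
    pts = [np.asarray(p, dtype=float) for p in candidates]
    center = np.asarray(frame_center, dtype=float)
    remaining = list(range(len(pts)))

    # FP1: furthest away from the video frame center
    i1 = max(remaining, key=lambda i: np.linalg.norm(pts[i] - center))
    remaining.remove(i1)
    # FP2: furthest away from FP1
    i2 = max(remaining, key=lambda i: np.linalg.norm(pts[i] - pts[i1]))
    remaining.remove(i2)
    # FP3: maximizes the area of the triangle FP1, FP2, FP3
    i3 = max(remaining, key=lambda i: triangle_area(pts[i1], pts[i2], pts[i]))
    remaining.remove(i3)
    # FP4: maximizes the area spanned by FP1, FP2, FP3, FP4
    i4 = max(remaining, key=lambda i: quad_area(pts[i1], pts[i2], pts[i3], pts[i]))
    return [i1, i2, i3, i4]


def predict_positions(history):
    """Return the three estimates of step 4 for the current frame k.

    history[-1], history[-2], history[-3] hold p^(k-1), p^(k-2), p^(k-3).
    """
    p1, p2, p3 = (np.asarray(history[-n], dtype=float) for n in (1, 2, 3))
    return [
        p1,                    # (2.3) no movement
        2 * p1 - p2,           # (2.6) constant displacement
        3 * p1 - 3 * p2 + p3,  # (2.7) last three positions
    ]
```

Each of the three estimates would then serve as the center of its own search window in step 5.<br />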

Complexity of the template matching algorithm<br />

Considering the single steps described above, there are two main drivers for the complexity and the<br />
robustness of the texture tracking algorithm. For every pixel within the search area,<br />
that is for $O(\text{searchSize}^2)$ pixels, the area around the pixel<br />
is compared with the template of the current feature point, which costs $O(\text{templateSize}^2)$ per comparison. This leads to a complexity of<br />
$$O(\text{searchSize}^2 \cdot \text{templateSize}^2) \tag{2.12}$$<br />
• Search Size<br />

Due to movements of the camera, the feature point position estimated with one<br />
of the three methods is not exact, and therefore several points around the estimated<br />
position are considered. These points lie within a certain search window. A<br />
large search window provides a higher probability of finding the point with the highest<br />
correlation value. But it also means that the correlation<br />
has to be calculated for every point within the search window, which increases the computation time. A small<br />
search size, on the other hand, speeds up the computation but decreases the tracking<br />
robustness.<br />
This parameter is set to a constant value during runtime. According to Billinghurst the<br />
minimal error results from a search area of $48^2$ pixels [11]. Thus the algorithm<br />
is configured with a constant search window length of 48. This is where the main idea of this<br />
thesis evolves: Does the search window size have to be constant? Is it<br />
possible to adjust this parameter during runtime using movement information?<br />


• Template Size<br />
A larger template yields a more reliable correlation value, at the price of more<br />
computation per comparison. In this thesis the template size is considered constant during runtime.<br />
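The search whose cost is described above can be sketched in Python with NumPy. This is a minimal illustration of<br />
equations 2.1 to 2.11, not the ARToolkit code: the function names, the (row, column) position convention and the handling of<br />
flat windows and image borders are assumptions.<br />

```python
import numpy as np


def ncc(template, patch):
    """Similarity (2.11) between a template and an equally sized pixel area."""
    t = template.astype(float) - template.mean()   # (2.1)
    p = patch.astype(float) - patch.mean()         # (2.8)
    denom = np.sqrt((t ** 2).sum()) * np.sqrt((p ** 2).sum())  # (2.2) * (2.9)
    if denom == 0.0:
        return 0.0  # assumption: a flat window carries no information
    return float((t * p).sum() / denom)            # (2.10) divided as in (2.11)


def search_feature(frame, template, estimate, search_size):
    """Evaluate the NCC for every pixel of a search_size x search_size window
    around the estimated (row, col) position.

    Returns the best position, its similarity and the number of windows compared.
    """
    th, tw = template.shape
    row0 = estimate[0] - search_size // 2
    col0 = estimate[1] - search_size // 2
    best_pos, best_sim, evaluated = None, -2.0, 0
    for r in range(row0, row0 + search_size):
        for c in range(col0, col0 + search_size):
            top, left = r - th // 2, c - tw // 2
            if (top < 0 or left < 0
                    or top + th > frame.shape[0] or left + tw > frame.shape[1]):
                continue  # the pixel area would leave the image
            sim = ncc(template, frame[top:top + th, left:left + tw])
            evaluated += 1
            if sim > best_sim:
                best_pos, best_sim = (r, c), sim
    return best_pos, best_sim, evaluated
```

With a search window length of 48 the inner comparison runs up to 48 · 48 = 2304 times per feature point and estimate;<br />
halving the length to 24 leaves 576 comparisons, each of which still costs templateSize² multiplications.<br />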

2.4.3 Tracking in the Magic Book<br />

The tracking technology used in the Magic Book ”Giant Jimmy Jones” is the texture tracking<br />
of a 2D plane. This has several advantages over previous applications based on marker tracking:<br />

• The marker does not have to be in the view<br />
As mentioned, occlusion is a serious problem for marker-based tracking. Since the algorithm<br />
uses features of the image itself, the marker is only needed for initialization.<br />
Of course, if the tracking fails, the marker has to be brought into the view of the user again. But<br />
after initialization the user can move in such a way that the marker leaves the video<br />
frame.<br />
• It is possible to zoom in and zoom out<br />
The user is now able to zoom into a scene. If the user gets closer, the tracking still<br />
works; it simply selects feature points that are better suited for the current tracking frame.<br />
With marker-based tracking, in contrast, zooming in fails because the<br />
marker is likely to be no longer completely visible in the video image.<br />
• Any preprocessed image can be used<br />
Any image can be used, with the restriction that it still has to contain a small marker.<br />
No additional artificial markers have to be placed in the environment. The user can also<br />
use the Magic Book as a simple textbook, because the pages consist of colorful images<br />
rather than markers.<br />

2.5 User Interaction<br />

In user interaction we can distinguish between interfaces providing feedback to the user, mainly<br />
graphical output hardware, and interfaces enabling the user to provide input to the system.<br />

2.5.1 Graphical Output Hardware<br />

We now have a look at the different options for graphical output user interfaces.<br />
Again, choosing a suitable user interface is an important issue for table top AR.<br />

Desktop PC<br />

For users who are not familiar with new interaction methods,<br />
the common desktop PC can still be used. This setup is better suited for Outside-In tracking: a<br />
camera installed at a fixed location delivers images of the Table Top workspace. On a<br />
monitor the users can see the feedback to their actions, for example the rearrangement of markers.<br />
Graphical output through a monitor is not suited for Inside-Out tracking, because<br />
the user moving the camera is probably not able to watch the visual feedback at the same moment.<br />
In any case, this is not the kind of user interaction we want, because, as discussed, Inside-Out<br />
tracking is more suitable for our purposes.<br />

Head Mounted Display<br />

In a classical AR environment the user is equipped with an HMD. It can be compared to data<br />
glasses and is attached to the user's head. Its first advantage is that the user's hands<br />
remain free. Since the device moves continuously with the user, it is used for Inside-Out tracking<br />
and is thus suited for our purposes. There are two different kinds of this graphical output<br />
device:<br />

Optical see-through. An optical see-through HMD is based on a semi-transparent mirror<br />
that makes it possible to look through the display. The virtual objects are overlaid in<br />
the mirror.<br />
Video see-through. A video see-through HMD shows a video stream captured by a camera<br />
attached to the HMD and augmented with the virtual information. Note that<br />
such a setup has to be calibrated in such a way that the camera represents the user's<br />
view.<br />
The delay of vision-based tracking is always a problem for graphical output. With optical<br />
see-through displays the virtual world lags behind, and with video see-through displays the real world<br />
lags behind [45]. Predicting the head motion of a user to compensate for<br />
the tracking delay is therefore also a research issue (see the related work in section 3.5).<br />

Hand-Held device<br />

This is a variation of the HMD developed at the HIT Lab. The HMD is mounted on an iron stick<br />
so that it can be used like a lens or a visor (see figure 2.1). The user holds it in front of his eyes<br />
and it has the same effect as a usual HMD. A video see-through device is used, and therefore<br />
a camera is attached to the handheld such that it exactly matches the viewpoint of the user.<br />
A disadvantage is that the user needs one hand to steer the device. The rationale behind<br />
this device is that during exhibitions a lot of people want to use the application, and it is much<br />
easier to hand over the device than to adjust an HMD for the next user. Besides,<br />
an HMD might get damaged after a certain period. The handheld consists of a Sony Glasstron video<br />
see-through HMD (see footnote 2) and a Logitech Quick Camera 4000 (see footnote 3).<br />

Tablet PC, PDA, Mobile Phone<br />

The idea of this interaction device is that the user holds a video see-through display, usually a<br />
small and flat computer, in his hands. A camera attached to the computer produces the<br />
see-through effect. An example is a tablet PC, whose display can be rotated and folded onto<br />
the computer so that it can be used as a tablet. As computers are becoming smaller and smaller and the<br />
processing speed increases (Moore's Law [35]), table top applications could also be ported<br />
to devices like a small mobile phone or a Personal Digital Assistant (PDA). The<br />
Studierstube uses a PDA as a user interface for the Invisible Train application (see figure<br />
2.2). Efforts are also being made to port the ARToolkit to mobile phones, which would enable AR<br />
applications even on mobile phones.<br />
2 www.sony.com<br />
3 www.logitec.com<br />

Projectors<br />

If a lot of people are interacting, everyone has to be equipped with the necessary devices. An<br />
alternative is to use a projector to display the virtual information on the table. Users<br />
can then interact by means of interaction devices, like markers or even tangible user interfaces, and see<br />
the immediate result in the projection. An example is the Sheep application,<br />
a sheepherding game allowing multimodal interaction [47]. This game is also based on<br />
the DWARF framework, which we will describe later. Another application<br />
using projectors is an intelligent kitchen [12]: with projectors, virtual information is overlaid<br />
on a kitchen, helping the user to prepare a dinner, for example.<br />

2.5.2 Input Interfaces<br />

An important issue is how a user can manipulate virtual objects. Thus additional user interfaces<br />
for collecting user input have to be provided. Traditional interfaces like a mouse<br />
or keyboard are still used, but as discussed earlier, new and more suitable interfaces<br />
have to be found. Billinghurst proposes to build tangible user interfaces based on<br />
marker-based optical tracking [10]. Markers are attached<br />
to real objects, allowing users to interact with them. Moving, rotating and occluding<br />
the tangible objects results in feedback from the application. User input can also be provided by<br />
special components like a glove [58]. Again markers are attached to the glove itself and an<br />
optical tracking routine calculates the position and orientation of the hand. This technology<br />
even makes it possible to interact with virtual objects by ”touching” them. This leads to<br />
the question of how to provide feedback caused by a collision of real and virtual objects. One<br />
topic in this area of research is force-feedback devices, which provide mechanical feedback<br />
to the user. One example is joysticks that adapt to a situation in a computer game and give<br />
feedback by making it harder to move in a certain direction. Another example of a force-feedback<br />
device is the Phantom by Sensable (see footnote 4), a mechanical 6 DOF input device (see<br />
figure 2.8).<br />

2.6 Summary<br />

In this chapter we have discussed the context of table top AR. This class of applications<br />
has special requirements regarding tracking and user interaction. Applications for exhibitions and<br />
education, which are used not only for research purposes, have to be cheap, easy to install,<br />
usable and high-performing with respect to the quality of tracking.<br />
All further evaluations will be based on the Magic Book application. Here is a short summary of<br />
the key properties of the Magic Book.<br />

• Tracking<br />

4 http://www.sensable.com<br />



Figure 2.8: The force-feedback phantom device by Sensable on the left and a special prepared glove<br />

for user interaction (Studierstube) on the right<br />

The vision-based tracking technique is based on texture tracking of a 2D plane.<br />
Preprocessed images can be used to calculate the viewpoint of the user.<br />
• User Interaction<br />
The user interface chosen is a handheld device, an HMD attached to an iron stick.<br />
Because even children are familiar with the book paradigm, the application is based on<br />
a tangible book. Just by turning a page the content changes. The scene can be rotated<br />
simply by rotating the plate the book is lying on.<br />

As described, movement information is not considered in the tracking routine. We have<br />
seen that the search window size is a fixed parameter of the texture tracking routine. If we<br />
can use movement information to change this parameter during runtime, we may achieve<br />
better tracking results in terms of computation speed and robustness. The next chapter<br />
introduces this idea to improve the tracking in the Magic Book application.<br />


CHAPTER 3<br />

A Hybrid Tracking Approach<br />

So far we have discussed the basic requirements for table top AR, and the fundamentals of the<br />
underlying technologies used by the Magic Book have been introduced. Now we have to<br />
bring the texture tracking technology and the user behavior together. We have seen that<br />
feature points have to be ”found” again in the next video frame and that the size of the<br />
search window is an important configuration parameter of the tracking routine. On the one hand,<br />
a large search window increases the robustness of the tracking. On<br />
the other hand, it increases the computation time of the tracking routine, which leads to a lower<br />
update rate.<br />

If we can establish a relationship between the search window and the occurring movements<br />

of the handheld device used in the Magic Book, we can configure the texture tracking<br />

algorithm during runtime. Thus a hybrid tracking approach combining vision-based tracking<br />

and inertial tracking is introduced.<br />

3.1 Motivation<br />

In the previous chapter we had a look at the complexity of the template matching algorithm<br />

used by the texture tracking:<br />

$$O(\text{searchSize}^2 \cdot \text{templateSize}^2) \tag{3.1}$$<br />

We will consider the template size as a fixed constant of the tracking routine. So far<br />
the search size parameter has also been constant during runtime. If we<br />
lowered its value, the computation of the pose information would speed up<br />
quadratically. If we knew that almost no movement of the camera<br />
had happened between two tracking frames, the estimate that the feature point will be at<br />
the same position in the next frame would be almost correct. There is no need for a large<br />
window if a measurement shows that position and orientation have hardly changed. In contrast, if we<br />
can derive the information that movement has occurred, we can adjust the search window<br />
to a large size. This would of course increase the computation time, but it is more likely that<br />


3 A Hybrid Tracking Approach<br />

the feature point will be found in the next frame again, because more potential points within<br />

the search area are considered.<br />

Our approach is to use additional information about the movement of the camera to alter<br />

the search size parameter during runtime according to a simple rule:<br />

movement ↓⇒ searchsize ↓<br />

movement ↑⇒ searchsize ↑<br />

Thus the first step is to evaluate the relationship between movements and an adequate search window size.<br />
If possible, we want to derive a linear mapping.<br />
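As a minimal Python sketch of what such a linear mapping could look like: the function name, the gain and the lower bound<br />
are purely hypothetical placeholders, and only the upper bound of 48 is taken from the constant search window length mentioned in chapter 2.<br />
Whether a linear relation holds at all is exactly what still has to be evaluated.<br />

```python
def search_size_for(delta_deg, gain=4.0, min_size=8, max_size=48):
    """Map the orientation change between two frames (in degrees) to a
    search window length in pixels.

    gain and min_size are hypothetical values; they would have to be
    derived from measured data. The result is clamped to [min_size, max_size].
    """
    size = min_size + gain * abs(delta_deg)
    return int(round(min(max(size, min_size), max_size)))
```

With these placeholder values no movement gives the smallest window, and large movements saturate at the previously constant window length.<br />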

3.2 An Inertial - Optical Tracker based Runtime Setup<br />

To obtain movement information about the camera we have to track<br />
the handheld device. Since the camera is mounted on the device, the movement<br />
of the handheld device is equivalent to the movement of the camera in the following. For tracking the<br />
movements of the handheld device, another tracking technology has to be integrated that meets<br />
the following requirements.<br />

• Integration in the User Interface<br />
To obtain the pose measurements of the handheld device, a small tracking device has<br />
to be integrated into the user interface. It has to be mounted in such a way that it does not<br />
influence the user's behavior at all.<br />
• Update Rate<br />
In order to estimate a realistic search window size for the next tracking frame of the<br />
optical tracker, the update rate of the movement tracker has to be higher than the<br />
frame rate of the vision-based tracking system.<br />
• Measurement state space<br />
A criterion for the measurement state space is the number of degrees of freedom described<br />
in the tracking introduction. If we used a 6 DOF tracker with a higher<br />
update rate than the optical tracking, there would be no need for the optical tracking at<br />
all; it could be replaced by the other tracking system, provided that system is accurate enough.<br />
Therefore we will use a tracker providing 3 DOF relative orientation. Such a tracker<br />
delivers no position. It still has to be evaluated whether a tracker with these properties<br />
is sufficient for our purposes.<br />
• Price<br />
Price is of course a key requirement as well. If this technology is to be integrated into a<br />
mobile phone, for example, an expensive tracking device is out of the question.<br />



For our purpose an inertial tracker, namely a gyroscope, is suited best. It provides relative<br />
orientation measurements. When we introduced inertial<br />
trackers we also discussed that small measurement errors accumulate and cause drift<br />
after a short period. But since we are only interested<br />
in the relative change of orientation between two tracking frames, drift does not<br />
affect our measurements.<br />
The big question, however, is whether the relative measurement of orientation is suited for the configuration<br />
of the search window size. To show whether this is possible, we first have to discover a relationship<br />
between the movement measurements and the feature point tracking routine. If we are able<br />
to find such a relationship, we can integrate the mapping into our software design.<br />
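The change of orientation between two tracking frames can be reduced to a single rotation angle. The following Python sketch<br />
assumes that the tracker reports unit quaternions in (w, x, y, z) order; this representation and the function names are assumptions<br />
for illustration and do not describe the interface of a particular gyroscope.<br />

```python
import math


def quat_mul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)


def rotation_angle_between(q_prev, q_curr):
    """Angle in radians of the rotation that takes q_prev to q_curr.

    Both arguments must be unit quaternions. The absolute value handles
    the fact that q and -q describe the same orientation.
    """
    dot = abs(sum(p * c for p, c in zip(q_prev, q_curr)))
    return 2.0 * math.acos(min(1.0, dot))  # clamp guards against rounding
```

An orientation offset that is common to both samples, such as drift accumulated before the previous frame, leaves this angle<br />
unchanged, which is why only the drift occurring between two consecutive frames could influence the result.<br />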

3.3 Configuration of the setup<br />

As described, the first question is whether there is a relationship at all; the second question is how to find<br />
a mapping between the change in orientation and the search window size:<br />
$$f_{\text{texturetracking}}(\Delta\,\text{orientation}) = \text{search window size} \tag{3.2}$$<br />
This mapping has to be integrated into the software of the hybrid tracking setup. Thus<br />
one requirement for the software design is to allow a dynamic configuration of the texture<br />
tracking routine. For every tracking frame the proper value for the search window size has to<br />
be set according to the mapping. This leads to the question of how to determine this relationship<br />
and how to evaluate whether this approach is feasible at all.<br />
The idea is to conduct a user study. The study should give hints on how people actually use<br />
the Magic Book. While performing the study we want to record the movement<br />
of the handheld device on the one hand, and take a closer look at certain<br />
properties of a feature point on the other hand. Both data sets have to be explored and their degree<br />
of correlation has to be measured. Correlation is a measure of the degree to which two data sets<br />
are related.<br />
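One common way to quantify such a relation is the Pearson correlation coefficient. The following Python sketch assumes this<br />
particular measure for illustration; the text above does not state which correlation measure is applied in the evaluation, and the function name is hypothetical.<br />

```python
import numpy as np


def pearson(a, b):
    """Pearson correlation coefficient of two equally long data series.

    Returns a value in [-1, 1]; values near 1 or -1 indicate a strong linear
    relation, values near 0 indicate none.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    da, db = a - a.mean(), b - b.mean()
    return float((da * db).sum() / np.sqrt((da ** 2).sum() * (db ** 2).sum()))
```

Applied to the recorded data, the first series would hold the change in orientation per frame and the second the corresponding<br />
displacement of a feature point. Note that the formula has the same structure as the NCC similarity of equation 2.11.<br />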

3.4 Motivation for a User Study<br />

As discussed, we have to record data on the movement of the handheld device and try to<br />
relate this data to properties of the feature point tracking, which also have to be logged.<br />
• Recording the pose information of the handheld device<br />
In order to record the full absolute 6DOF pose we need the absolute position $p_{\text{Handheld}}$<br />
as well as the absolute orientation $q_{\text{Handheld}}$. To record the pose information a 6DOF<br />
magnetic tracker will be used. One might ask why we also consider the position,<br />
since only the orientation is needed for the runtime setup. The magnetic<br />
tracker is used for the analysis of movement: if we find that people mainly use<br />
the Magic Book by changing the position of the handheld device, our idea does not<br />
work. Please note that the 6DOF tracker serves only the purpose of the user study, not<br />
the runtime environment. The runtime setup still consists of the handheld device<br />
and the gyroscope.<br />



Figure 3.1: The 2D coordinates of a feature point are tracked over several frames. The feature point<br />

moves through the 2D video plane<br />

• Recording 2D coordinates of feature points in the video image<br />
One possibility to relate the obtained pose information to the feature point tracking is to<br />
record the 2D video frame coordinates of the feature points in every frame. If a feature point is tracked<br />
over a period of several frames, this yields a change in its 2D position, $\Delta p_{FP}$.<br />
This change in position can be annotated with the<br />
corresponding change in orientation given by the magnetic tracker. Deriving these<br />
”chains”, in which the same feature point is tracked over several frames, depends on<br />
the algorithm's selection of the best suited feature points. In figure 3.1 a feature<br />
point is tracked for several frames and ”moves” through the 2D video plane. This<br />
movement is caused by camera movement.<br />
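How such chains could be derived from a log is sketched below in Python. The log format, one entry per frame holding a<br />
feature point identifier and its 2D position, is a hypothetical simplification for a single tracked slot and not the format<br />
used in the study.<br />

```python
def displacement_chains(track):
    """Split a per-frame log into chains of 2D displacements.

    track is a list of (feature_id, (x, y)) entries in frame order. A chain
    holds the displacements between consecutive frames in which the same
    feature point was tracked; a change of the identifier ends the chain.
    """
    chains, current = [], []
    for (prev_id, prev_p), (cur_id, cur_p) in zip(track, track[1:]):
        if cur_id == prev_id:
            current.append((cur_p[0] - prev_p[0], cur_p[1] - prev_p[1]))
        else:
            if current:
                chains.append(current)
            current = []
    if current:
        chains.append(current)
    return chains
```

Each displacement in a chain could then be paired with the change in orientation measured over the same two frames.<br />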

3.5 Related Work<br />

This section gives an overview of the current research related to this thesis. The main issue<br />
of this work is to characterize the movement of the user interface in order to improve the computation speed<br />
and the robustness of the system. But first we want to have a look at current algorithms<br />
for natural feature tracking.<br />

3.5.1 Natural Feature Tracking<br />

According to [55] an image sequence can be represented as a function of three variables<br />
I(x, y, t), with x and y as discrete spatial variables and t as a discrete variable for time. As<br />
patterns move from frame to frame in an image stream, I satisfies the following equation:<br />
$$I(x, y, t + \tau) = I(x - \xi, y - \eta, t) \tag{3.3}$$<br />
In other words, an image of a scene taken at a later point in time<br />
can be obtained from the earlier image by moving every point p = (x, y) by a displacement d = (ξ, η).<br />
If we want to track certain features of a scene over several frames, algorithms have to cope with<br />
this displacement. In our case camera movement occurs, and therefore in general $I(x, y, t_{i+1}) \neq I(x, y, t_i)$.<br />



In his ”State of the Art Report of Natural Feature Tracking” Vial [60] gives an overview<br />
of the principal features that can be found in an image (see table 3.1). He also provides an overview of<br />
common methods to extract these features. Generally we can distinguish between model-based<br />
and move-matching methods [51] for feature tracking. Model-based tracking requires<br />
a model definition of the object to be tracked. Marker detection methods like the ARToolkit,<br />
but also the texture tracking of the ARToolkit, fall into this category, because<br />
all images used with the toolkit have to be preprocessed first and the preprocessed<br />
data is applied during runtime. Another possibility for a model-based approach<br />
is to use a CAD model of the environment in the tracking routine [29]. Such<br />
methods are therefore not suited for unprepared environments. In contrast, move-matching methods<br />
estimate a correspondence between 2D image movements and 3D position and orientation without<br />
any underlying model.<br />

0D | 1D | 2D | Motion<br />
Corners, Points | Contours, Edges, Chains, Lines, Circles, Ellipses | Uniform Regions, Textured Areas, Surface patches | Regions with similar motion<br />
Table 3.1: Overview of features in an image [60]<br />

Thus the first step is to select ”good features in regions with rich enough texture” [55] and<br />
then apply tracking techniques to find the corresponding point $p_{i+1} = (x - \xi, y - \eta)$ of point<br />
$p_i = (x, y)$ in the following frame. This is often applied in closed-loop architectures [51]:<br />
1. Detect N interest points in frame i + 1, resulting in the set $\{x_j^{i+1}\}_{j=1}^{N}$<br />
2. Match interest points from frame i to i + 1 and find the correspondences $x_j^{i} \longleftrightarrow x_k^{i+1}$<br />
3. Use these correspondences to compute the pose<br />

The problem is to find the correspondences of features between tracking frames. And<br />

that is where our approach comes in. For tracking feature points, Lucas and Kanade<br />

[33] proposed measuring the similarity between fixed search windows of two consecutive<br />

frames. This is based on the assumption that the displacements d from frame to frame<br />

are small. The similarity of the windows is measured by the sum of squared intensity differences.<br />

This method is applied by the texture tracking of the ARToolkit. Every algorithm using this<br />

method can be extended with our hybrid tracking approach.<br />
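The window comparison described above can be illustrated with a minimal sketch. It is not the ARToolkit implementation; the function names, the synthetic frames and the parameters `half` (half the template size) and `search` (half the search window size) are assumptions chosen for illustration only.<br />

```python
def ssd(img_a, img_b, ax, ay, bx, by, half):
    """Sum of squared intensity differences between two square windows."""
    total = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            diff = img_a[ay + dy][ax + dx] - img_b[by + dy][bx + dx]
            total += diff * diff
    return total


def track_point(prev, curr, x, y, half=1, search=2):
    """Find the position in `curr` whose window best matches the one around (x, y) in `prev`.

    Only candidates within +/- `search` pixels are compared, which encodes
    the small-displacement assumption.
    """
    height, width = len(curr), len(curr[0])
    best_pos, best_cost = None, None
    for cy in range(y - search, y + search + 1):
        for cx in range(x - search, x + search + 1):
            # Skip candidates whose window would leave the image.
            if cx - half < 0 or cy - half < 0 or cx + half >= width or cy + half >= height:
                continue
            cost = ssd(prev, curr, x, y, cx, cy, half)
            if best_cost is None or cost < best_cost:
                best_pos, best_cost = (cx, cy), cost
    return best_pos, best_cost


def make_frame(width, height, cx, cy):
    """Synthetic frame: dark background with a distinctive 3x3 patch centred at (cx, cy)."""
    frame = [[0] * width for _ in range(height)]
    value = 10
    for dy in range(-1, 2):
        for dx in range(-1, 2):
            frame[cy + dy][cx + dx] = value
            value += 10
    return frame


prev_frame = make_frame(12, 12, 5, 5)
curr_frame = make_frame(12, 12, 7, 4)  # the patch moved by (+2, -1)
position, cost = track_point(prev_frame, curr_frame, 5, 5, half=1, search=2)
print(position, cost)
```

With `search=1` the true position lies outside the search window and the match is lost, while a larger window finds it at the price of more comparisons. This trade-off is what the hybrid approach tries to control.<br />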

Neumann and You use a similar closed-loop approach. They use the concept of optical flow,<br />

which observes the motion of the image pixels as a whole. They combine region tracking and<br />

feature point tracking [38][37]. First, regions with similar movements are extracted and the<br />

tracking is refined by 0D point tracking. Region motion tracking is based on optical flow<br />

and relies on the spatial-temporal gradients of an image. Using region tracking, a movement<br />

model is derived. Because we know where a feature is located within a region, we<br />

can refine the region tracking by matching the corresponding points by applying correlation<br />

methods as well. This work also proposes a verification and evaluation mechanism. For<br />


3 A Hybrid Tracking Approach<br />

every estimation the confidence is assessed. If the confidence is poor the result is refined.<br />

This approach allows larger movements as well. Generally region tracking allows larger<br />

camera movements while point feature tracking itself is only suited for small displacements.<br />

In contrast, our approach assumes small inter-frame displacements, which results in a rather<br />

simple movement model. It has to be evaluated whether our idea can be applied here as well for<br />

the refinement of the tracking.<br />

All these algorithms face the displacement problem of finding the tracked point of interest<br />

again in the next frame. In summary, our idea is suited for all feature tracking<br />

algorithms assuming small displacements between frames. To find corresponding feature<br />

points, windows are compared, not the pixels themselves. These windows correspond to the templates<br />

in the texture tracking. Our approach tries to influence the number of comparisons<br />

of these windows by applying orientation information. With this information the search<br />

window is adjusted.<br />

3.5.2 Hybrid Tracking<br />

As mentioned, hybrid tracking combines different tracking technologies to compensate for the drawbacks<br />

of a single tracker. We will focus on related work on combinations of vision-based<br />

and inertial tracking. As discussed, the drawbacks of vision-based tracking are the occlusion<br />

of tracked features and computationally expensive algorithms causing additional delay. An<br />

inertial tracker accumulates small measurement errors that cause drift [46].<br />

Azuma motivates the usage of hybrid tracking systems for unprepared environments [4].<br />

In prepared environments the user or developer is in control of the tracked objects. A user<br />

can place fiducial markers on a table, for example. This is more difficult in outdoor AR<br />

applications. Lighting conditions change and visual landmarks used for feature tracking may be<br />

occluded. Integrating a gyroscope provides a good estimate of orientation and a reasonable<br />

guess to reduce the search space of the optical tracking algorithm. This idea is picked up in<br />

our work. Azuma uses the setup in the following way: If a user stops moving, the video tracking<br />

locks onto trackable features and the accumulated drift in the inertial tracker is corrected.<br />

He distinguishes two methods for the fusion of inputs from inertial and optical tracker:<br />

1. Use the gyroscope orientation as an estimate for the orientation of the vision tracker.<br />

This compensates for inaccurate measurements of the vision-based tracker, but the inertial<br />

tracker will drift and cause wrong results after a certain period.<br />

2. Use the vision-based tracker to compensate for drift.<br />

Every frame of the vision-based tracker corrects the measurements of the inertial tracker.<br />

Thus drift does not occur, but inaccurate measurements of the vision-based tracker will<br />

be propagated.<br />

Handling unprepared environments is an important issue for outdoor applications. It is<br />

not realistic to place fiducial markers or preprocessed textures in an outdoor environment.<br />

Since our environment is prepared, we can expect accurate measurements from the optical tracker.<br />

Drift does not affect our setup either, because we only use relative measurements of the<br />

gyroscope tracker. In [68] the vision-based algorithm introduced earlier [37] is applied for<br />



a hybrid setup. The frame-to-frame prediction of camera orientation by the inertial tracker<br />

and the correction of the accumulated drift exploit the nature of both trackers. The inertial<br />

tracker predicts the motion of image features, and the estimated positions are refined by searching for<br />

local matches for the feature points. This work also addresses the importance of calibration<br />

issues of the setup.<br />

In an indoor, mobile path finding setup of the Studierstube project a user is guided through<br />

an unfamiliar building to a destination room [26]. This application combines marker and<br />

inertial tracking as well. A camera mounted on a helmet worn by the user grabs video<br />

images containing square markers attached to walls. Additionally, an inertial tracker<br />

attached to the HMD provides the head orientation. This setup tries to compensate for the low<br />

update rates of vision-based tracking with a gyroscope. In between measurements from the<br />

optical tracker the gyroscope gives the user’s viewing direction. The drift of the<br />

gyro is corrected with the orientation given by the next frame of the optical tracker:<br />

$q_{\mathrm{Hybridview}} = q_{\mathrm{vision}} = q_{\mathrm{correction}} \circ q_{\mathrm{inertial}}$ with $q_{\mathrm{correction}} = q_{\mathrm{vision}} \circ q^{*}_{\mathrm{inertial}}$<br />

This method has been adopted from [40]. In that setup an active ultrasonic tracking system<br />

called the Bat system is combined with an inertial tracker. In contrast to Azuma’s work, these are indoor<br />

setups in a prepared environment, thus the vision-based tracking is very reliable.<br />
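The drift correction with quaternions can be traced in a small sketch. It only illustrates the formula above under the assumption of unit quaternions in (w, x, y, z) order. The helper names and the example angles are hypothetical and not taken from the cited systems.<br />

```python
import math


def q_mul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def q_conj(q):
    """Conjugate, which is the inverse for unit quaternions."""
    w, x, y, z = q
    return (w, -x, -y, -z)


def q_about_z(degrees):
    """Unit quaternion for a rotation about the z axis."""
    half = math.radians(degrees) / 2.0
    return (math.cos(half), 0.0, 0.0, math.sin(half))


# At a vision frame the optical tracker reports 30 degrees,
# while the drifting gyroscope reports 33 degrees.
q_vision = q_about_z(30.0)
q_inertial_at_frame = q_about_z(33.0)
q_correction = q_mul(q_vision, q_conj(q_inertial_at_frame))

# Between vision frames only the gyroscope delivers data (now 43 degrees).
q_inertial_now = q_about_z(43.0)
q_hybrid = q_mul(q_correction, q_inertial_now)
print(q_hybrid)
```

In this example the corrected orientation corresponds to 40 degrees, i.e. the 3 degrees of accumulated drift are removed until the next vision frame updates the correction term.<br />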

In chapter 1 we have already introduced a method to predict and correct measurements,<br />

the Kalman filter. Figure 3.2 shows the basic mechanism of the filter loop. This filter has<br />

several benefits. It can predict measurements even if actual measurements<br />

are not yet available due to the low update rate of a tracking system. The predicted estimates<br />

can be used prior to the availability of the actual measurements (state estimation). In the correction<br />

phase the parameters for the prediction are recalculated (state update). Although<br />

prediction is not necessarily a hybrid tracking issue, sensor fusion approaches can be used to<br />

predict new measurements. Klein and Drummond propose a filter architecture for a model-based<br />

hybrid tracking approach [29]. Model-based in this case means that a CAD<br />

model of the tracked environment is available. Again the idea is that the<br />

new camera pose is predicted by an inertial tracker. With this information the visual<br />

tracking system is able to start in the right place. With the results of the visual tracking frame<br />

a new system state is calculated, which is used for further prediction and for the correction of<br />

the accumulated tracking error of the inertial tracker. Again a major focus of this work is the<br />

calibration of the trackers used. Neumann’s and You’s filter design [67] even<br />

tolerates a failure of the vision-based tracker, due to occlusion for example, when updating the current<br />

state of the system. This work focuses on fiducial marker tracking in the vision-based<br />

tracking routine. The predicted pose can be utilized in this approach as well. For the vision-based<br />

and the gyroscope measurements independent correction channels are provided. For<br />

his outdoor augmented reality system Azuma fuses the output of gyroscopes and of a compass [1]. Thus<br />

the user’s head movement can be predicted and the noise in the compass measurements can<br />

be filtered. He found that the compass noise makes outdoor registration tasks difficult.<br />

Applying a filter that uses additional gyroscopes stabilized the system.<br />
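The predict and correct loop can be sketched with a strongly simplified scalar Kalman filter. It assumes a constant state model and hypothetical noise values `q` (process noise) and `r` (measurement noise). It is not the filter design of any of the cited works, which estimate full poses.<br />

```python
def kalman_step(x, p, z, q=0.01, r=0.5):
    """One loop iteration for a scalar state x with variance p.

    z is the new measurement, or None if the tracker has not delivered one yet.
    """
    # Prediction (state estimation): the state is assumed constant,
    # only the uncertainty grows by the process noise q.
    x_pred = x
    p_pred = p + q
    if z is None:
        # No measurement available: the prediction is used as the estimate.
        return x_pred, p_pred
    # Correction (state update): blend prediction and measurement.
    k = p_pred / (p_pred + r)          # Kalman gain
    x_new = x_pred + k * (z - x_pred)
    p_new = (1.0 - k) * p_pred
    return x_new, p_new


x, p = 0.0, 1000.0                     # vague initial guess
history = []
for z in [10.2, None, 9.8, 10.1]:      # the second measurement is missing
    x, p = kalman_step(x, p, z)
    history.append((x, p))

for estimate, variance in history:
    print(round(estimate, 3), round(variance, 3))
```

When the measurement is missing, the estimate is carried over and only the variance grows. This is the property that allows a filter to bridge frames in which a tracker delivers no data.<br />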

We are combining gyroscopes and vision-based tracking, which is a hybrid tracking approach.<br />

But we do not fuse the outputs. We only apply the relative orientation given<br />

by the gyroscopes. Thus a difficult calibration step is not necessary, because we consider and<br />

evaluate the orientation independently.<br />

Figure 3.2: The prediction and correction loop of the Kalman filter<br />

3.5.3 Head motion prediction<br />

As already introduced, a very common setup for AR applications is HMD-based.<br />

An important problem is the end-to-end system delay. The user has the impression<br />

that the virtual content lags behind his actual movements. Hence the head movement has<br />

to be predicted. We already introduced the Kalman filter as a prediction method, but<br />

other filtering techniques are available as well. Azuma compares two classes of head motion<br />

predictors [3]. Both methods are analyzed in the frequency domain in order to obtain<br />

characteristics of the predicted signal as a function of system delay and input motion. Shaw<br />

and Liang address the problem of head motion as well. In a first experiment they try to<br />

characterize head motion [48]. Changes in head orientation are especially important because<br />

a change of viewing direction often causes more changes in the scene. This<br />

knowledge should be used for the design of a predictive filter. The experiment consists of<br />

a user study in which participants have to fulfill several navigation tasks. The test person sits<br />

on a chair and has to look at markers on a wall in a certain sequence. The head position and<br />

orientation are tracked during the study. They found that the user’s head moves along a<br />

great-circle arc and that the velocity of the orientation change seems to be symmetric while accelerating<br />

and slowing down. The second step was to design the filter [32]. They applied the knowledge<br />

that the perceived delay is mainly caused by delay in orientation and that jitter is mostly<br />

caused by noise in the position data. They also recognized that the noise in position was<br />

higher than the noise in orientation. As a consequence they designed a prediction filter to<br />

address the orientation delay and an anisotropic low-pass filter to filter the noise in the position<br />

measurements. They also evaluated an adequate prediction length for the filter. As one<br />

conclusion they noticed that the prediction of hand movement is a more difficult process. In<br />

our approach the movement of the handheld device is more comparable to hand movement<br />

than to head motion.<br />

3.5.4 Table-Top Augmented Reality<br />

The basic motivations for table top AR have already been discussed in chapter 2.1, where examples<br />

and references to related projects can be found. Here<br />

is just a short summary of application domains for table top AR, although there are obviously still<br />

unexplored domains as well.<br />

• Exhibitions<br />

• Education<br />

• Gaming<br />

• Interactive Storytelling<br />

• Conferencing<br />


Every table top AR application has to face the problem of 3D registration. Mark Billinghurst<br />

discusses this problem in several publications. The intention of the shared space technology<br />

is to enable interaction with virtual and physical objects, but also collaborative<br />

interaction with other users [9] [8]. The shared space could be used for a variety of applications.<br />

He mainly uses the ARToolkit as optical tracking system [27]. A lot of table top<br />

applications use the ARToolkit as tracking component, either to display the virtual content<br />

[61] or to interact with the system [57][10]. Reasons for that may be that it is freely available<br />

and easy to integrate into an application. A deeper knowledge of tracking and image recognition<br />

algorithms is not necessary. Therefore the ideas provided in this thesis are suited for<br />

all vision-based tracking applications in horizontal table top environments.<br />

3.5.5 Our approach<br />

A cheap gyroscope giving only relative spatial information is sufficient even if drift occurs,<br />

because only the relative change of orientation between two frames is considered. Our application<br />

is set up in a prepared environment, which means that we are using a preprocessed<br />

image for tracking. There is no need to filter the gyro orientation, because it is not used for<br />

the registration of the fairy tale scene. And although head mounted displays are also suited<br />

for table top AR, the input device used in the Magic Book is the handheld visor. The field of<br />

interest of the user will be the horizontal setup on the table. The range of possible movements<br />

is rather restricted to the table area, and small inter-frame displacements are likely.<br />

An interesting question would also be how user behavior changes with different<br />

input and output devices. We have seen that our approach is suited for vision-based<br />

trackers assuming small inter-frame displacements. Future work has to evaluate to what<br />

degree our ideas can be applied to other tracking algorithms. As mentioned, the motion of<br />

the handheld device will differ from head motion. Hence our approach might motivate the<br />

characterization of other user interfaces.<br />

In my opinion table top AR applications are a way to reach a broad audience with<br />

the new technology. Just imagine augmenting an exhibition piece with virtual information<br />

using a cellphone display with an attached camera. Fast, robust and accurate tracking<br />

will seriously influence the user’s acceptance of the technology. Our approach will help to<br />

improve this.<br />


3.6 Summary<br />


According to our idea we now do not consider the search window size as a fixed parameter<br />

inside the texture tracking algorithm anymore. Our approach is to adjust this size during<br />

runtime according to relative orientation information given by a cheap and self-contained<br />

gyroscope tracker. To find a relationship between the feature point tracking routine and<br />

changes in orientation of the handheld device ideas for a user study have been discussed.<br />

From now on we distinguish between a runtime setup and a user study setup.<br />

Runtime setup. The runtime setup consists of the handheld device with an integrated gyroscope,<br />

a camera as input device and an altered HMD as output device. A requirement for this<br />

setup is the dynamic configuration of the search window size during runtime.<br />

User study setup. The user study setup consists of a magnetic tracker recording the movements<br />

of the handheld. The ARToolkit tracking has to be extended to log the feature<br />

point coordinates in every video frame.<br />

The next chapter will focus on the software architecture of the runtime setup. After that<br />

the main part of the thesis will be introduced: the design, execution and evaluation of the<br />

user study.<br />


CHAPTER 4<br />

A Software Architecture based on DWARF<br />

To enable a dynamic configuration of the optical-inertial setup during runtime, a software<br />

architecture has to be provided. The information given by the gyroscope tracker has to be<br />

processed and used to set the search window size of the texture tracking routine. The Distributed<br />

Wearable Augmented Reality Framework (DWARF) is a component-based framework to<br />

build AR applications [17][6]. A CORBA 1 -based infrastructure allows communication between<br />

these components, so distribution, which is important for mobile setups, is provided.<br />

Reusable components for tracking, rendering and user interaction enable rapid prototyping.<br />

First I will give a short overview of the basic principles of DWARF. Then the requirements<br />

for my architecture will be discussed and I will show the structure of the resulting system.<br />

Only the necessary terms relevant for this thesis will be described. More information and<br />

tutorials about DWARF can be found in the corresponding references.<br />

4.1 DWARF<br />

DWARF has been developed at the Technische Universität München (TUM) by the AR<br />

research group of Prof. Klinker. The basic concept of DWARF is that applications<br />

consist of interdependent and distributed components that can be reused for a variety of<br />

different applications, even across several application domains.<br />

4.1.1 Services<br />

In DWARF, components are called services and can be distributed within the network infrastructure.<br />

Every service runs as a single process. To build an application, these<br />

services have to be combined in order to fulfill a certain task. To accomplish this, interdependent<br />

services have to exchange data via mechanisms provided by the framework. Every<br />

service has its own service description. This XML description specifies the input data<br />

needed and the output data provided by the service. This configuration is responsible for<br />

the connection with other services that deliver exactly what the service demands, the so-called<br />

needs, or that demand what the service provides, the abilities. The configuration of a service can also<br />

be changed during runtime.<br />

1 http://www.corba.org<br />

Needs and Abilities<br />

• Needs<br />

A need is a property of a service that requests a functionality from another<br />

service running in the network. The middleware connects these two services and they<br />

continue to work on a peer-to-peer basis. The services can communicate via different<br />

communication protocols specified by the need description.<br />

• Abilities<br />

Abilities are the counterpart of needs. They specify a certain functionality<br />

provided to other services. The ability description also sets the communication protocol<br />

used for communication (similar to the need description).<br />

Further restrictions can be made with attributes and predicates. Attributes can be set for<br />

abilities. They specify additional properties of the ability. For example, two services may<br />

provide similar abilities, as is the case when two cameras are attached to a user. To distinguish between<br />

them, an additional attribute is set for every ability. Now the need for video data can specify<br />

a predicate in order to connect with the right ability. In the example in section 4.1.3 this means that the<br />

service only wants information from the video data ability with the attribute "head".<br />

If the middleware recognizes that the service descriptions of two services match according<br />

to their needs and abilities (with predicates and attributes), they get connected and can start<br />

to exchange data via several communication protocols.<br />
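The matching of needs and abilities can be sketched as follows. This is only an illustration of the idea and not the DWARF middleware API. The class names, the attribute key `location`, the ability name `provideHandImage` and the protocol string are hypothetical, and predicates are simplified to plain key/value pairs.<br />

```python
from dataclasses import dataclass, field


@dataclass
class Ability:
    name: str
    type: str                 # kind of data offered, e.g. "VideoData"
    protocol: str             # communication mechanism
    attributes: dict = field(default_factory=dict)


@dataclass
class Need:
    name: str
    type: str                 # kind of data demanded
    protocol: str
    predicate: dict = field(default_factory=dict)


def matches(need, ability):
    """A need matches an ability if type, protocol and all predicate entries agree."""
    if need.type != ability.type or need.protocol != ability.protocol:
        return False
    return all(ability.attributes.get(key) == value
               for key, value in need.predicate.items())


def find_partner(need, abilities):
    """Return the first matching ability, or None if no service offers one."""
    for ability in abilities:
        if matches(need, ability):
            return ability
    return None


# Two cameras offer the same data type and are told apart by an attribute.
head_camera = Ability("provideImage", "VideoData", "shared_memory", {"location": "head"})
hand_camera = Ability("provideHandImage", "VideoData", "shared_memory", {"location": "hand"})
display_need = Need("showImage", "VideoData", "shared_memory", {"location": "head"})

partner = find_partner(display_need, [hand_camera, head_camera])
print(partner.name)
```

Without the predicate the need would match both abilities, so the attribute is what makes the connection unambiguous.<br />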

Communication<br />

Here I will give a short description of the main communication protocols used in DWARF.<br />

As mentioned, the way of communication is specified in the need and ability descriptions,<br />

which have to match each other. Some of them are CORBA-based mechanisms, thus an<br />

Interface Description Language (IDL) interface has to be provided.<br />

Method Calls: A service exports a method provided to other services. The corresponding<br />

partner has to import this method. The partner is then able to call the method on the remote<br />

object, which is similar to a method call on a local object. This is realized by<br />

CORBA Remote Procedure Calls (RPCs).<br />

Events: A service can send events and another service is able to subscribe to these events. This<br />

is realized by the CORBA Notification Service.<br />

Shared memory: A service can also write data into a local shared memory. Another service<br />

is able to read from the shared memory to obtain the data. Note that both services<br />

have to run on the same machine, which restricts the distribution of the components.<br />



Figure 4.1: The example shows two services: the Videograbber with the ability for VideoData demanded<br />

by the need of the VideoDisplay service<br />

Depending on the requirements of the application, the appropriate communication protocol<br />

has to be selected. In summary, a need and an ability each consist of a name, used<br />

as an identifier, a type, describing the kind of data offered or demanded, and the connector<br />

protocol, specifying the communication mechanism.<br />

4.1.2 Service Manager<br />

The DWARF service manager is responsible for connecting the components. It is the "heart"<br />

of the framework. A service manager is running at each network node. Every time a service<br />

is started it registers with the service manager, which collects all the information about local<br />

services. Once a service is registered, the service manager looks for a suitable connection partner.<br />

It fulfills the task of a broker. It tries to find the corresponding ability for a need. If two<br />

matching services have been found, the service manager establishes a connection between the<br />

services. Now the matching services can communicate via a direct communication channel<br />

on a peer-to-peer basis. The service manager is not needed for communication anymore.<br />

4.1.3 An example<br />

This little example illustrates the terms described above (see figure 4.1).<br />

The notation used for DWARF architectures is a Unified Modeling Language (UML [20])<br />

extension for component diagrams providing mechanisms for the need and ability relationships.<br />

Needs are represented by half-circles and abilities by full circles.<br />

The example shows two services, the VideoGrabber with the ability for "type=VideoData" and<br />

the VideoDisplay with the need for "type=VideoData". Figure 4.2 shows the corresponding<br />

XML service descriptions. At first glance it is obvious that the types of need and ability<br />

match. The VideoDisplay service sets the predicate "type=head", which means that this service<br />

only wants to connect to an ability with the attribute "type=head". As we can see, the<br />

ability "provideImage" fulfills this. Both services are connected by the service manager and<br />

they communicate via shared memory.<br />

The services can be reused for different purposes. As we will see later on, the video stream<br />

provided by the VideoGrabber can also be used for optical tracking. The VideoData is forwarded<br />

to another service performing the texture tracking.<br />



Figure 4.2: An example: XML description of the VideoGrabber and the VideoDisplay services<br />

connected via the shared memory communication mechanism. The need and the ability for<br />

"type=VideoData" and the predicate for "type=head" match.<br />

4.2 Software Architecture for a Dynamic Configuration during Runtime<br />

DWARF is well suited for the software design of our runtime setup, which uses<br />

the gyroscope information for the dynamic configuration of the texture tracking. First we<br />

will have a look at the requirements for such a software design. One task was to integrate<br />

and reuse existing components in the setup. DWARF already provides an architecture for<br />

optical tracking based on the ARToolkit. This architecture was proposed in Wagner’s PhD<br />

thesis [62].<br />

4.2.1 Existing Architecture<br />

Figure 4.3 shows the existing architecture in the UML syntax described above. Note that<br />

PoseData is a data structure containing information about position and orientation (6DOFPoseData)<br />

or orientation only (3DOFPoseData). DWARF does not distinguish between them.<br />

If a tracker only provides orientation, the values for position are not set in the pose<br />

data structure.<br />
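The idea of one pose structure for both cases can be sketched as follows. The field names and the quaternion representation are assumptions for illustration. The actual layout of the DWARF PoseData structure is not given in this chapter.<br />

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PoseData:
    """Hypothetical pose container: orientation is always set, position is optional."""
    orientation: Tuple[float, float, float, float]          # quaternion (w, x, y, z)
    position: Optional[Tuple[float, float, float]] = None   # unset for 3DOF trackers

    @property
    def degrees_of_freedom(self) -> int:
        return 6 if self.position is not None else 3


gyro_pose = PoseData(orientation=(1.0, 0.0, 0.0, 0.0))
optical_pose = PoseData(orientation=(1.0, 0.0, 0.0, 0.0), position=(0.1, 0.0, 0.5))
print(gyro_pose.degrees_of_freedom, optical_pose.degrees_of_freedom)
```

A consumer that needs the position has to check whether it is set, since the type alone does not tell a 3DOF tracker from a 6DOF tracker.<br />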

The ARToolkit is split into several single components. The VideoGrabber service grabs the<br />

video stream and provides it via shared memory to the ARTkMarkerDetection service. The<br />

marker detection component is the core of the ARToolkit. It searches for marker features in<br />

the video frame. In order to keep the optical tracker flexible and reusable, the ARTkMarkerConfiguration<br />

is responsible for configuring the ARTkMarkerDetection with marker data. So<br />

marker data can be loaded and unloaded during runtime. The ARTkMarkerDetection service<br />

does not provide pose data directly. It sends an ARTkFrameMarkers structure which contains<br />

all the information about the detected markers in the video frame. To extract the PoseData<br />

out of the marker structure the ARTkPoseReconstruct service is needed. This has the advantage<br />

that the marker detection can also be used for other purposes if the pose information is<br />

not relevant for the application. In [63] the ARToolkit is used for wide area tracking, for example.<br />

The ARTkPoseReconstruct provides 6DOFPoseData and can be used by other services<br />

for several goals, such as displaying a 3D model. A complete rationale behind this<br />

architecture can be found in Wagner’s thesis.<br />

Figure 4.3: The existing architecture based on DWARF Services<br />

4.2.2 Requirements for new architecture<br />

First the functionality of the architecture has to be extended with the texture tracking version<br />

of the ARToolkit. So similar to the ARTkMarkerDetection a component has to be written that<br />

performs the texture tracking routine, the ARTkNFTDetection. In terms of reusing the existing<br />

components the following considerations have to be made:<br />

• VideoGrabber service<br />

The VideoGrabber can be reused without any restrictions. The CameraData can be read<br />

directly from the shared memory, like the ARTkMarkerDetection does. Again the only<br />

restriction is that both services (VideoGrabber and ARTkNFTDetection) have to run on<br />

the same machine.<br />

• ARTkPoseReconstruct service<br />

To reuse this component the interfaces (Needs and Abilities) have to be redesigned. The<br />

ARTkNFTDetection does not provide a data structure with the detected markers. It only<br />

provides the homography matrix used for the extraction of the 6DOFPoseData. The<br />

ARTkPoseReconstruct service calculates this matrix out of the marker structure and then<br />

extracts the pose information. Thus another need of the ARTkPoseReconstruct service<br />

has to be integrated, allowing the estimation of pose with matrix data given by the<br />

ARTkNFTDetection service.<br />

• Configuration of the search window<br />

manually: A simple graphical UI should make it possible to alter the search window<br />

size of the texture tracking. Similar to the ARTkNFTDetection service a configuration<br />

component enabling this is needed: the ARTkNFTConfiguration service. If<br />

no configuration of the search window is needed at all, the ARTkNFTDetection<br />

service is able to run without the configuration service.<br />

dynamically: This is the requirement for our runtime setup described in the previous<br />

chapter. Information given by a gyroscope should be used to estimate a new<br />

search window size. This estimation is based on a mapping between the movement<br />

of the handheld device and the 2D coordinates of the feature points, which still has to be<br />

found (see chapter 3). A gyroscope tracking unit can connect to this configuration<br />

component and deliver 3DOF orientation information (gyroscope service).<br />

The ARTkNFTConfiguration allows both: a manual and a dynamic configuration of the<br />

search window size.<br />
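How such a configuration component could combine both modes is sketched below. The mapping from orientation change to search window size is still an open question in this thesis, so the linear mapping, the class name and all parameter values are purely hypothetical placeholders.<br />

```python
import math


class SearchWindowConfigurator:
    """Hypothetical stand-in for the logic of a configuration service."""

    def __init__(self, base_size=10, min_size=5, max_size=50, pixels_per_degree=4.0):
        self.base_size = base_size
        self.min_size = min_size
        self.max_size = max_size
        self.pixels_per_degree = pixels_per_degree   # placeholder mapping, not measured
        self.size = base_size
        self._last_orientation = None

    def _clamp(self, size):
        return max(self.min_size, min(self.max_size, size))

    def set_manual(self, size):
        """Manual configuration, e.g. triggered by a slider in a GUI."""
        self.size = self._clamp(size)
        return self.size

    def on_orientation(self, q):
        """Dynamic configuration from a gyroscope sample (unit quaternion w, x, y, z)."""
        if self._last_orientation is None:
            angle = 0.0
        else:
            # Angle of the relative rotation between two consecutive samples.
            dot = abs(sum(a * b for a, b in zip(self._last_orientation, q)))
            angle = math.degrees(2.0 * math.acos(min(1.0, dot)))
        self._last_orientation = q
        self.size = self._clamp(round(self.base_size + self.pixels_per_degree * angle))
        return self.size


def about_z(degrees):
    half = math.radians(degrees) / 2.0
    return (math.cos(half), 0.0, 0.0, math.sin(half))


config = SearchWindowConfigurator()
sizes = [config.on_orientation(about_z(a)) for a in (0.0, 5.0, 5.0, 45.0)]
print(sizes)
```

Only the difference between two consecutive samples enters the calculation, so a constant drift offset of the gyroscope has no influence on the resulting window size.<br />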

4.2.3 System Design<br />

First we will have a look at the new components and the redesign of the interface of the<br />

ARTkPoseReconstruct service necessary to realize the architecture.<br />

New services<br />

Looking at the small requirements elicitation for the new software design the following new<br />

components can be identified. For a deeper look into requirements analysis and software<br />

engineering in general have a look at Bruegge’s ”Object-Oriented Software Engineering”<br />

book [13].<br />

• ARTkNFTDetection<br />

This service implements the loop for the optical tracking. It has an ability for<br />

ARTkNFTPoseMatrix. This data contains the matrix describing the position and orientation of<br />

the 2D plane. The pose data can be extracted from this data.<br />

• ARTkNFTConfiguration<br />

Interfacing with the ARTkNFTDetection service, this service is able to set the search window<br />

size of the texture tracking. Thus this service provides the ability NFTSearchWindowSize.<br />

Adaptation of the ARTkPoseReconstruct interface<br />

A new need for ARTkNFTPoseMatrix has to be provided. This need matches the ability<br />

of the ARTkNFTDetection, so these services are able to connect. In contrast to the marker<br />

detection, where the marker structure is first used to calculate the pose matrix, the matrix<br />

is provided directly by the texture tracking. The reason is that the tracking routine<br />

and the calculation of the matrix out of the feature points cannot be separated easily. The<br />

calculation of the 6DOF pose data can be performed in the same way now.<br />


Resulting architecture<br />

Figure 4.4 shows the resulting architecture with the new services. Old components are<br />

drawn in gray.<br />

Figure 4.4: The resulting architecture integrating the new components<br />

The following dependencies between the connected services are established by the service<br />

manager during runtime. The information is given in the XML service description for every<br />

service.<br />

• VideoGrabber ←→ ARTkMarkerDetection<br />

Type: CameraData<br />

Communication method: Shared Memory<br />

Description: The ARTkMarkerDetection reads the video stream out of the shared memory.<br />

• ARTkNFTDetection ←→ ARTkNFTConfiguration<br />

Type: NFTSearchWindowSize<br />

Communication method: Method Call<br />

Description: The ARTkNFTConfiguration calls an exported method of the ARTkNFTDetection<br />

to set the search window size of the texture tracking.<br />

• ARTkNFTConfiguration ←→ Gyroscope<br />

Type: 3DOFPoseData<br />

Communication method: Event<br />

Description: Every frame the Gyroscope service sends the new pose data to the<br />

ARTkNFTConfiguration service. This information is then evaluated and the appropriate<br />

search window size is set.<br />

• ARTkNFTDetection ←→ ARTkPoseReconstruct<br />

Type: ARTkNFTPoseMatrix<br />

Communication method: Event<br />

Description: As described, the pose matrix is sent to the new need interface of the<br />

ARTkPoseReconstruct.<br />

Figure 4.5 shows the resulting UML sequence diagram. The communication of the services<br />

is shown over time. We can see that the update rate of the gyroscope is higher<br />

than the frame rate of the optical tracker and that the ARTkNFTConfiguration sets the search<br />

window size according to the received pose data events.<br />

Figure 4.5: UML Sequence diagram: Interaction of the DWARF services<br />

This architecture meets the requirements described above. On the one hand, the texture<br />

tracking can be used as an independent component without any configuration of the search<br />

window; the parameter is then set to a constant value. On the other hand, the parameter<br />

can also be set dynamically if a gyroscope tracking service and the ARTkNFTConfiguration<br />

service are active.<br />
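
The optional coupling described above can be sketched as follows. This is a minimal illustration in Python,<br />

not the DWARF services themselves (those were written in C++). The class names, method names, window bounds<br />

and the linear mapping from orientation change to window size are assumptions made for this sketch;<br />

the thesis leaves the actual mapping open.<br />

```python
import math

DEFAULT_WINDOW = 20  # hypothetical constant search window size in pixels


def angle_between(q_prev, q_curr):
    """Rotation angle in radians between two unit quaternions (x, y, z, w)."""
    dot = abs(sum(a * b for a, b in zip(q_prev, q_curr)))
    return 2.0 * math.acos(min(1.0, dot))


class NFTDetectionStub:
    """Stand-in for the texture tracking service; holds the search window size."""

    def __init__(self):
        # Constant value that is used as long as nobody configures the window.
        self.search_window = DEFAULT_WINDOW

    def set_search_window_size(self, size):
        self.search_window = size


class NFTConfigurationStub:
    """Stand-in for the configuration service; reacts to orientation events."""

    def __init__(self, detection, min_size=10, max_size=60, gain=200.0):
        self.detection = detection
        self.min_size, self.max_size, self.gain = min_size, max_size, gain
        self.last_q = None

    def on_orientation_event(self, q):
        if self.last_q is not None:
            delta = angle_between(self.last_q, q)
            # Placeholder mapping (assumption): window grows linearly with delta
            # and is clamped to [min_size, max_size].
            size = self.min_size + self.gain * delta
            size = max(self.min_size, min(self.max_size, size))
            self.detection.set_search_window_size(int(round(size)))
        self.last_q = q
```

Without a configuration object the detection stub keeps its constant default; once orientation events arrive,<br />

the window size follows the measured change in orientation.<br />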


4.2.4 Implementation<br />

Linux is the main platform for DWARF. A Fedora Linux distribution 2 (version 2) was used<br />

to run DWARF, although SUSE 3 distributions are better suited: core components have been<br />

developed and tested under SUSE only. Fedora was used because a SUSE distribution<br />

was not available in New Zealand due to high internet costs. DWARF provides<br />

support for several programming languages like C++, Java and Python. The existing architecture<br />

was written in C++, thus the new services have also been developed in C++. The<br />

graphical user interface for setting the search window size manually (in the ARTkNFTConfiguration<br />

service) was implemented with the Qt toolkit. Figure 4.6 shows a screenshot of<br />

the runtime setup. It shows the preprocessed 2D image plane and a virtual plane registered<br />

on top of it. The search window size can be set with a slider in the graphical user interface.<br />

It is still not possible to set the search window dynamically, because it is not clear whether a proper<br />

mapping can be found. But even if no such mapping can be found, the texture<br />

tracking can now be used in the DWARF framework.<br />

Figure 4.6: Runtime environment: The search window size can be set manually or by the orientation<br />

information given by a gyroscope (3 DOF Intersense Tracker)<br />

2 http://fedora.redhat.com/<br />

3 http://www.suse.com/<br />


4.2.5 Summary<br />

We have introduced a software architecture based on the DWARF framework that meets<br />

the discussed requirements of a dynamic configuration during runtime. The texture tracking<br />

ARToolkit is now integrated in the DWARF framework, which was one of the core requirements<br />

for this thesis. A mechanism for deriving a search window size from the gyroscope<br />

orientation still has to be implemented.<br />

The next chapters will show that it is not an easy task to express this relationship. Next<br />

we will describe the design and execution of the user study motivated in chapter 3.<br />


CHAPTER 5<br />

User Study<br />

As explained in the previous chapter, we want to find a mapping between the feature<br />

point tracking and the change in orientation of the handheld device. To obtain data for the evaluation,<br />

a user observation has to be carried out. A logging infrastructure records the data during the<br />

study.<br />

This chapter describes the goals and the design of the user study. As stated, the overall<br />

goal is to collect data from a number of people in order to analyze it. The texture tracking<br />

ARToolkit is altered so that we can retrieve the 2D feature point coordinates in every<br />

video frame. The tracking setup is extended with an Ascension Flock of Birds magnetic tracker<br />

to track the user movements. To obtain comparable sets of data, special tasks have been<br />

designed in which all test persons have to answer questions about the current scene.<br />

This is done to make the users act in a certain way.<br />

5.1 Goals of the User Study<br />

The motivations for user studies and evaluations can be very different. According to [44],<br />

four reasons for doing evaluations can be identified:<br />

• Understanding the world<br />

How do future users use a technology? How do they employ the new system in their<br />

workplace? The main motivation for that kind of evaluations is understanding the<br />

user and his behavior.<br />

• Comparing designs<br />

Often system designers have to decide which input method to choose. Therefore evaluations<br />

of these different methods have to be made. An evaluation should give hints as to<br />

which method is better accepted by the user and leads to a better performance. An<br />

example can be found in Kulas' master's thesis [30], in which he focuses on usability<br />

aspects of ubiquitous systems and performs a sample user study to compare<br />

two menu designs.<br />


• Engineering towards a target<br />

5 User Study<br />

Studies are made in order to evaluate whether the system accomplishes certain goals, for<br />

example a better performance than a competitor's product.<br />

• Checking conformance to a standard<br />

These studies are mainly testing procedures to evaluate whether a system meets required<br />

standards.<br />

In our evaluation we want to “understand the world” better. In particular, we want to take a<br />

closer look at the following aspect: How is the tracking related to the input provided by the<br />

user through movements of the handheld device? This user study is meant to collect<br />

enough data to describe a relationship and, if possible, a mapping between the two data<br />

sources. The following is an overview of the expected outcomes of our user study. Of course not<br />

all of these goals could be addressed within the limited time of this thesis, but the potential for<br />

further investigations in this research area is shown.<br />

• Collect user data<br />

Data has to be collected. As described in the previous chapters, there is a lot of potential<br />

to derive conclusions from the data. Of course finding a mapping is the<br />

prime goal, but collecting the data is a first big step and took most of the time during<br />

this work.<br />

• Find a mapping between user movement and search window size<br />

This is the idea described in chapter 3. The mapping is expressed in the function introduced:<br />

$$f_{\text{texturetracking}}(\Delta\,\text{orientation}) = \text{search window size} \tag{5.1}$$<br />

• Find a relationship between user tasks and the related movements<br />

If we know which actions and movements are connected with certain tasks, we can<br />

adapt our tracking not only to a general mapping, but also to the designated task. Therefore<br />

we first have to identify possible tasks in table top AR and try to discover a dependency<br />

between these tasks and the tracking results.<br />

• Characterize movement of the handheld device<br />

We are interested in whether the movement can be characterized and whether these results are valid for<br />

every user. Therefore user tasks are needed that let the users perform similar actions.<br />

• Evaluation of a suited task design for table top AR<br />

As Shaw did in his experiment to characterize head motion [48], certain user<br />

tasks have to be designed. We will introduce and abstract tasks that might be suited<br />

for a variety of table top applications.<br />

• Collect feedback from potential users<br />

A questionnaire was designed to collect additional feedback on the Magic Book and<br />

on the tasks.<br />


• Observe anything else that might be interesting<br />

Still it is important to keep the eyes open for any interesting observation during the<br />

execution of the user study and the analysis of the collected data.<br />

5.2 User Study design<br />

This section discusses all the relevant aspects of the design of this user study. First we will<br />

have a look at the recording of the separate data sources: the pose data given by the magnetic<br />

tracking device that tracks the handheld device, and the 2D positions of the tracked feature<br />

points. The Magic Book has been implemented on Windows using Microsoft Visual Studio<br />

6 1 . Therefore the logging infrastructure first had to be integrated into the existing system.<br />

After that we will introduce our task design for table top AR.<br />

5.2.1 Movement Tracking of the Hand-Held Device<br />

The task is to track the position and orientation of the handheld device. Therefore a tracking<br />

device providing 6DOF is needed. As introduced at the beginning of this thesis,<br />

a magnetic tracker can measure position and orientation. The Flock of Birds system has an<br />

update rate of about 90 frames per second; in comparison, the vision-based texture tracking<br />

runs at 30 frames per second. To recapitulate, a base station establishes a magnetic field<br />

and the pose data of several sensors can be tracked within the range of the magnetic field.<br />

Hence it is possible to track more than one object at a time. A drawback of this tracker is<br />

that the data might be disturbed by artificial magnetic fields, produced for example by a CRT monitor,<br />

and it is almost impossible to find a setup in a room without any interference. Additionally<br />

the sensors have to be close to the base station to obtain accurate tracking results.<br />

We will also see that we have to be careful with attaching a sensor directly to the handheld<br />

device, because of its iron stick. Other tracking devices such as an infrared optical tracking device,<br />

for example the A.R.T. tracking system 2 , might be an alternative. A huge advantage of<br />

the magnetic tracker is that there is no need for a line of sight between the sender and the<br />

receiver. The setup therefore does not have to prevent the user from occluding the<br />

line of sight. The setup for this user study would not be well suited for registering 3D virtual<br />

objects, because interference would lead to jittering.<br />

Software support for Flock of Birds<br />

For using and developing applications with the Ascension Flock of Birds tracker a commercial<br />

library is available. The “Eden Library” provides a threaded query mechanism to get<br />

the measurements. To connect to the host system, communication links via TCP/IP or serial<br />

port (RS232) are supported 3 . For spatial data representation it supports an OpenGL pose<br />

matrix, a position vector and Euler angles. As we will see later, quaternions can be derived<br />

by a simple algorithm with the OpenGL matrix as input parameter.<br />

1 http://msdn.microsoft.com/vstudio/<br />

2 http://www.ar-tracking.de/<br />

3 For further information on The Eden Library please contact Phillip Lamb, phil@eden.net.nz<br />


Figure 5.1: The Ascension Flock of Birds with the sender (black cube) and the host system in the<br />

background<br />

Calibration<br />

The pose data provided by the Flock of Birds is always given in the coordinate system of the<br />

sending station. We get the absolute position and orientation relative to the origin of the Flock of<br />

Birds coordinate system. The origin of this coordinate system lies in the center of the black<br />

cube (see figure 5.1). In order to relate this coordinate system to the Magic Book,<br />

we have to perform a calibration step. We have the possibility to track several targets with the<br />

Flock of Birds, thus we will track the tangible book as well. Figure 5.2 shows two sensors:<br />

the first one is attached to the upper left corner of the book. This is supposed to be the<br />

origin of the Magic Book coordinate system. The other sensor is attached to the handheld<br />

device. Our aim is to record the pose data of the second “bird” calibrated to the Magic Book<br />

coordinate system. Next we will explain the steps to calculate this. All the methods used<br />

in the following steps were taken from the DWARF utility package. This toolbox provides<br />

all the basic transformations and calculations for spatial data. The mathematical basics for<br />

these methods can be found in the corresponding literature [49].<br />

The Eden Library provides an OpenGL matrix for the Magic Book sensor and for the handheld<br />

sensor, $M_{Book}$ and $M_{Handheld}$.<br />

Position: Calculating the position of the handheld is an easy task: the<br />

position vector of the handheld is simply subtracted from the origin of the coordinate system. The position<br />

vector is the last column of the OpenGL matrix.<br />

$$\bar{p}^{Book}_{Handheld} = \bar{p}^{Flock}_{Book} - \bar{p}^{Flock}_{Handheld} \tag{5.2}$$<br />

The notation $\bar{p}^{Book}_{Handheld}$ means the position vector of the handheld device in book coordinates.<br />

Orientation: We want to get the orientation of the handheld in book coordinates, $q^{Book}_{Handheld}$,<br />

in the quaternion representation. The representation of the matrices is important for<br />


Figure 5.2: Magic Book coordinate system with its origin in the upper left corner and the handheld<br />

device with a styrofoam buffer to avoid ferromagnetic distortion<br />

further calculations. Because OpenGL uses a column major order and DWARF a<br />

row major order, we first have to transpose both matrices.<br />

$$M_{Book} = M^{T}_{Book}, \qquad M_{Handheld} = M^{T}_{Handheld} \tag{5.3}$$<br />

Both matrices contain pose information in Flock coordinates. Now the corresponding<br />

quaternions can be derived by a simple method call.<br />

$$q^{Flock}_{Book} = \mathrm{matrix2quaternion}(M^{T}_{Book}), \qquad q^{Flock}_{Handheld} = \mathrm{matrix2quaternion}(M^{T}_{Handheld}) \tag{5.4}$$<br />

To obtain the resulting quaternion, the quaternion representing the orientation of the<br />

source coordinate system has to be inverted and multiplied with the quaternion of the<br />

handheld device.<br />

$$q^{Book}_{Handheld} = (q^{Flock}_{Book})^{*} \cdot q^{Flock}_{Handheld} \tag{5.5}$$<br />

The resulting pose data consists of $\bar{p}^{Book}_{Handheld}$ and $q^{Book}_{Handheld}$. This data has to be recorded by<br />

the logging infrastructure.<br />
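
The calibration steps of equations 5.2 to 5.5 can be sketched as follows. This is a minimal, self-contained<br />

Python illustration and not the DWARF utility code: the helper names are hypothetical, `matrix_to_quaternion`<br />

is a standard conversion standing in for the matrix2quaternion method, and the matrices are assumed to arrive<br />

as flat, column-major arrays of 16 values, the usual OpenGL layout. Quaternions are stored as (x, y, z, w).<br />

```python
import math


def to_row_major(gl):
    """Read a flat, column-major OpenGL matrix (16 values) into 4 rows.

    This re-indexing corresponds to the transposition step of equation 5.3.
    """
    return [[gl[col * 4 + row] for col in range(4)] for row in range(4)]


def matrix_to_quaternion(m):
    """Quaternion (x, y, z, w) of the rotation part of a row-major matrix."""
    trace = m[0][0] + m[1][1] + m[2][2]
    if trace > 0.0:
        s = 2.0 * math.sqrt(trace + 1.0)
        return ((m[2][1] - m[1][2]) / s, (m[0][2] - m[2][0]) / s,
                (m[1][0] - m[0][1]) / s, 0.25 * s)
    if m[0][0] > m[1][1] and m[0][0] > m[2][2]:
        s = 2.0 * math.sqrt(1.0 + m[0][0] - m[1][1] - m[2][2])
        return (0.25 * s, (m[0][1] + m[1][0]) / s,
                (m[0][2] + m[2][0]) / s, (m[2][1] - m[1][2]) / s)
    if m[1][1] > m[2][2]:
        s = 2.0 * math.sqrt(1.0 + m[1][1] - m[0][0] - m[2][2])
        return ((m[0][1] + m[1][0]) / s, 0.25 * s,
                (m[1][2] + m[2][1]) / s, (m[0][2] - m[2][0]) / s)
    s = 2.0 * math.sqrt(1.0 + m[2][2] - m[0][0] - m[1][1])
    return ((m[0][2] + m[2][0]) / s, (m[1][2] + m[2][1]) / s,
            0.25 * s, (m[1][0] - m[0][1]) / s)


def conjugate(q):
    """Conjugate of a quaternion; for unit quaternions this is the inverse."""
    x, y, z, w = q
    return (-x, -y, -z, w)


def multiply(a, b):
    """Hamilton product a * b of two quaternions in (x, y, z, w) order."""
    ax, ay, az, aw = a
    bx, by, bz, bw = b
    return (aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw,
            aw * bw - ax * bx - ay * by - az * bz)


def handheld_in_book(gl_book, gl_handheld):
    """Return (position, orientation) of the handheld relative to the book."""
    book, hand = to_row_major(gl_book), to_row_major(gl_handheld)
    # Equation 5.2 as written: difference of the two translation columns,
    # still expressed along the axes of the Flock coordinate system.
    position = tuple(book[i][3] - hand[i][3] for i in range(3))
    # Equations 5.4 and 5.5: conjugated book quaternion times handheld quaternion.
    orientation = multiply(conjugate(matrix_to_quaternion(book)),
                           matrix_to_quaternion(hand))
    return position, orientation
```

If both sensors report the same matrix, the sketch yields a zero position difference and a quaternion close to<br />

the identity (0, 0, 0, 1), which is a quick sanity check for the setup.<br />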

Interference and noise<br />

As mentioned, ferromagnetic objects may seriously distort the measurements, so it is not possible<br />

to attach the sensor directly to the handheld device. Therefore the handheld device<br />

was prepared in a special way. To make sure that the handheld device does not influence the<br />

measurements, the sensor had to be attached at a certain distance from the handheld device. To figure<br />

out the proper distance, a straight edge was put on top of the handheld, perpendicular to the<br />

iron stick. Then the sensor was moved steadily along the straight edge from one side to<br />

the other. In the middle, on top of the handheld device, a distortion was noticed.<br />

To visualize this distortion, a virtual cube was displayed with the pose information given by<br />

the Flock of Birds tracker. In the next step the straight edge was attached further away by<br />

using styrofoam blocks. This step was repeated until no obvious distortion could be noticed<br />

anymore. A styrofoam block with the proper thickness was then attached on top of<br />

the device (see figure 5.2). As discussed, this is a huge drawback of using a magnetic tracker in this<br />

setup.<br />

In summary, we found that it is necessary to track both the handheld and the Magic<br />

Book. We have to calibrate the setup, because we consider the book coordinate system<br />

to be our world coordinate system. This is especially important if we want to consider position<br />

information in future evaluations.<br />

5.2.2 Tracking of 2D Feature Points<br />

The information about which feature points are tracked in the texture tracking ARToolkit is transparent<br />

to the programmer, which means that the tracking routine itself is hidden from the application<br />

developer. As a consequence, the tracking method has to be extended<br />

to provide the feature point information as well. According to the texture tracking algorithm described<br />

in chapter 2, at least four feature points that are suited best are chosen in every frame. For<br />

every feature point the 2D coordinates are tracked: $\bar{p}_{FP} = (p_x, p_y)$. Note that this is only<br />

a 2-dimensional vector. Every feature point has a unique identity. This is important for<br />

the user study, because it lets us recognize that a feature point is continuously tracked over several<br />

frames and observe its path through the 2D video plane.<br />
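
Because every feature point keeps its identity, its motion between two frames can be computed directly.<br />

The following Python sketch illustrates this under the assumption that each frame is available as a mapping<br />

from feature point id to 2D coordinates; the function name and the data layout are hypothetical and not part<br />

of ARToolkit.<br />

```python
import math


def feature_displacements(prev_frame, curr_frame):
    """Per-feature 2D displacement (in pixels) between two frames.

    Each frame maps a feature point id to its (px, py) image coordinates.
    Only ids present in both frames count as continuously tracked.
    """
    result = {}
    for fid, (x1, y1) in curr_frame.items():
        if fid in prev_frame:
            x0, y0 = prev_frame[fid]
            result[fid] = math.hypot(x1 - x0, y1 - y0)
    return result
```

The largest displacement observed between consecutive frames gives a lower bound on how far a search window<br />

would have had to reach to find the feature again.<br />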

5.2.3 Logging Infrastructure<br />

To set up a logging infrastructure, the existing Magic Book application had to be extended. A<br />

Flock of Birds tracking component had to be integrated and the method call of the<br />

tracking routine had to be altered. The information of both trackers has to be recorded by a logging<br />

component. Later, in the evaluation, it must be possible to synchronize the<br />

data. The logging can be started and stopped through the GLUT callback functionality (see figure<br />

5.4). At the end of each tracking frame of the tracking components, the data is passed to the<br />

Logger via a method call and written into a file.<br />

Figure 5.3 shows the classes participating in the logging steps. The central component is the<br />

Logger. It records the pose information given by the texture tracking and the Flock of Birds.<br />

Both tracking components attach a timestamp to the pose data.<br />


Figure 5.3: Static structure of the logging environment<br />

Figure 5.4: Sequence diagram describing the logging steps<br />


The data is written to two files. For the 2D feature point coordinates the following data<br />

is recorded. As mentioned, at least four feature points are needed to calculate the viewpoint of the<br />

user. All four of these feature points are logged (see equation 5.6).<br />

$$\mathit{logFeaturepoints} = (\mathit{timestamp},\ id_1, x_1, y_1,\ id_2, x_2, y_2,\ id_3, x_3, y_3,\ id_4, x_4, y_4) \tag{5.6}$$<br />

If the tracking fails for one or several frames, $id_1$ is set to -1. This makes it possible to<br />

count tracking failures.<br />
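
A record of this form can be read back and checked for tracking failures as sketched below. The thesis does<br />

not specify the file syntax, so the sketch assumes one whitespace-separated record per line and zeros in the<br />

remaining fields of a failed frame; both are assumptions, as are the function names.<br />

```python
def parse_feature_record(line):
    """Parse 'timestamp id1 x1 y1 ... id4 x4 y4' into (timestamp, points).

    The whitespace-separated layout is an assumption of this sketch.
    """
    fields = line.split()
    timestamp = float(fields[0])
    points = []
    for i in range(1, 13, 3):  # four (id, x, y) triples follow the timestamp
        points.append((int(fields[i]), float(fields[i + 1]), float(fields[i + 2])))
    return timestamp, points


def count_tracking_failures(lines):
    """Count records whose first feature id is -1 (tracking failed)."""
    return sum(1 for line in lines
               if parse_feature_record(line)[1][0][0] == -1)
```

Dividing the failure count by the number of records gives a simple failure rate per test case.<br />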

The pose data given by the Flock of Birds is logged in the following way (see equation 5.7). Each<br />

component of the quaternion $q = (x, y, z, w) = (q_0, q_1, q_2, q_3)$ is logged, where $\bar{v} = (x, y, z)$<br />

is the imaginary vector and $w$ the real scalar. Note that we changed the order of the scalar<br />

and the imaginary vector, because the calculation methods provide the<br />

quaternions in this order. The position $\bar{p} = (p_x, p_y, p_z)$ is logged as well, although we do not<br />

know yet whether we need it for further evaluation.<br />

$$\mathit{logHandheld} = (\mathit{timestamp},\ q_0, q_1, q_2, q_3,\ p_x, p_y, p_z) \tag{5.7}$$<br />

One thing we have not considered at all in this logging environment is the delay of both<br />

trackers. Knowing this delay would be necessary to determine the exact state of the setup at one point in time.<br />

Consequences and reasons will be discussed later in the thesis.<br />
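
Since both logs carry timestamps, one simple way to synchronize them for the evaluation is to pair every<br />

feature point record with the handheld record closest in time. The following Python sketch shows this<br />

nearest-timestamp matching; it is not the method used in the thesis, the function names are hypothetical,<br />

and the `offset` parameter is only a placeholder for a tracker delay that would first have to be measured.<br />

```python
from bisect import bisect_left


def nearest_sample(timestamps, t):
    """Index of the entry in the sorted, non-empty list `timestamps` closest to t."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Pick the closer of the two neighbours; ties go to the earlier sample.
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1


def synchronize(feature_times, handheld_times, offset=0.0):
    """Pair every feature record index with the closest handheld record index.

    `offset` is a hypothetical correction for the relative tracker delay.
    """
    return [(i, nearest_sample(handheld_times, t + offset))
            for i, t in enumerate(feature_times)]
```

Because the magnetic tracker delivers roughly three samples per video frame, each feature point record finds<br />

a handheld sample that is at most a few milliseconds away, apart from the unknown delay.<br />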

5.2.4 Task Design<br />

We want to make the participants behave in a certain way so that we can compare the data<br />

between different participants. Letting them explore the virtual objects randomly might<br />

not lead to satisfying results, since every user might focus on other objects and animations. But<br />

this is still an assumption which has to be proved. The idea is that we might have more<br />

success in comparing the data of different participants if we give similar tasks to the test persons.<br />

Task-centered user interaction design is one approach to developing specific user interfaces [31].<br />

Future users have to be observed in order to learn about their behavior, to gather information about<br />

how they handle things and to evaluate special requirements for tasks. So this user study<br />

should be understood as a task observation rather than a usability study, although we will<br />

possibly also collect information that makes it possible to assess the usability of the Magic<br />

Book. We present a categorization of tasks in table top AR applications. These tasks are<br />

also related to expected actions or behavior of the participant. The tasks are applied in<br />

the user study, but later on we will discuss whether it is possible to abstract them for a variety<br />

of table top AR applications too. The primary goal is to design user tasks that are easy, so that even<br />

inexperienced participants are able to perform them without further training. First we<br />

introduce the tasks in general and then we give an example of how these tasks have been<br />

applied in the user study.<br />


Different tasks in table top AR<br />

The following user tasks have been identified as relevant for the Magic Book. As mentioned,<br />

we will later examine whether they can be used for other table top applications as well.<br />

Overview task: The user has to get an overview of the scene with its virtual, but also real<br />

objects. It is expected that the user will move to a position from which he is able<br />

to see the whole scene without moving around, in order to get an impression. He might<br />

focus on special features of interest and move around slightly. In the Magic<br />

Book this can be achieved by simply asking a question about the content of the scene.<br />

A feature in the content of the scene could be a virtual character or an object.<br />

Focus task: In this task the user is made to focus on a specific feature of the augmented environment.<br />

The location of the feature should be obvious to the participant. The<br />

expected behavior is that the user will move closer to the scene and try to hold still to observe<br />

the feature. In the Magic Book this is again achieved by posing a question about a specific virtual<br />

object.<br />

Detail task: This task is a combination of the overview and the focus task. While the focus<br />

task only concentrates on single features, the detail task will force the user to move<br />

around in the scene to get an overview and move closer to focus on features as well.<br />

This can be achieved by asking the participants to count objects in the scene, that occur<br />

at several locations, for example.<br />

Additionally a free task will be introduced. This should give the opportunity to observe the<br />

user when he is able to move around without any restrictions. The task can be described<br />

as an “understanding the scene” task. The following example demonstrates these tasks<br />

by applying them to a fairy tale scene of the Magic Book.<br />

Example application of the tasks in the user study<br />

The questions posed to the participant are related to the scene in figure 5.5.<br />

In this scene the participant is confronted with the tasks described above. This is done<br />

by asking questions about different features of the scene. It is not relevant whether the answer of<br />

the participant is right. The focus is on what movements of the handheld<br />

device are made to explore the features. The following questions are posed.<br />

• How many people do you see in the scene?<br />

This is an overview task. All the virtual characters are distributed throughout the scene.<br />

Thus a position is needed where the user gets an overview of the whole scene.<br />

• What is the haircolor of the woman with the white skirt and the yellow jersey?<br />

The participant should focus on a specific feature of the scene. The feature “woman<br />

with the white skirt” is almost obvious to the user. This is a focus task.<br />

• How many people wear a hat or a hairdress?<br />

This is a detail task. The features are spread over the scene and a closer look is necessary<br />

to answer the question. But it is obvious where the features are located.<br />


Figure 5.5: Magic Book: Questions are posed to make the participants perform tasks<br />

In the free task the test person is able to observe the scene without further questions or<br />

constraints. The user study itself consists of 4 cases. Every case considers one scene displayed<br />

on one page. Two of these cases are free tasks. In the other two cases questions are<br />

posed to the user in order to make him perform the desired tasks. A full description of the cases<br />

can be found in appendix B.2.<br />

5.2.5 Setup<br />

The environment for the user study is set up in the HIT Lab demo room. The Magic Book<br />

application with the logging extension runs on a fast Shuttle PC (P4 3.2 GHz processor with<br />

1 GB DDR400 RAM). The Shuttle PC 4 is plugged into the network and is able to connect to<br />

the Flock of Birds host system via TCP/IP. The tangible Magic Book is placed on a table at<br />

a similar height as the AR kiosk, and one sensor is placed in the upper left corner of the<br />

book for the calibration. The Flock of Birds tracker does not need a line of sight<br />

between the sender and the receiver, therefore we do not have to ensure that the participants<br />

do not cross the line of sight. A participant equipped with the tracked handheld device is<br />

able to use the application as in the usual Magic Book setup. The Shuttle PC itself was<br />

placed on a desk near the Magic Book table, and I was sitting at this desk in order to pose<br />

the task questions and to start and stop the logging (see figure 5.6). Figure 5.7 shows the top<br />

view of the user study.<br />

The best tracking results are achieved if the receivers are close to the sending base station.<br />

Therefore the distance between the table and the base station was about 1 1/2 meters. This<br />

still left enough space for the user to move around freely.<br />

4 www.shuttle.com<br />


Figure 5.6: User study setup: The Magic Book is placed on a plate at a certain height. The computer<br />

system in the back controls the logging and is connected to the Flock of Birds tracker<br />

Figure 5.7: Top view of the user study setup. The base station is placed as close to the sensors as<br />

possible.<br />


5.2.6 Questionnaire<br />

In addition to the recorded data, a questionnaire was given to the test persons. It should not<br />

take longer than 5 minutes for the test person to answer the questions. This questionnaire<br />

should give further information about the following issues:<br />

• Background of participants<br />

Data about age, occupation and background on AR and the Magic Book is collected. It<br />

is desirable to have a wide variety of test persons. AR experts would probably<br />

behave differently than new users, but this is also only an assumption. Data about age<br />

and occupation was voluntary and not important for the study.<br />

• Feedback on tracking<br />

This feedback focuses on the delay and the jittering of the feature tracking technology.<br />

Both factors affect the immersion of the user in the virtual world. This data was also<br />

only secondary. The delay issue is interesting, because our approach aims to speed<br />

up the computation. Our idea is to reduce the delay caused by the image processing<br />

routine.<br />

• Feedback on tasks<br />

These questions should give feedback on the difficulty of the tasks. We distinguished<br />

between the test cases where a user is able to move around without restrictions<br />

and the test cases where the user has to fulfill certain tasks.<br />

• Feedback on user interface and usability<br />

The Magic Book works with a handheld visor as a graphical user interface. In<br />

chapter 2 on table top AR we discussed the rationale for this choice. Still, the question remains<br />

whether another user interface would be better suited for the Magic Book. This is also connected<br />

to the question of whether the Magic Book is “easy to use” for inexperienced users.<br />

• Further comments and feedback<br />

During user studies the comments that participants express are valuable as well. These<br />

comments can be used to draw further conclusions for our user study goals, although<br />

they cannot be expressed as empirical data.<br />

All questions were to be answered on a scale from 1 to 5. The collection of the<br />

tracking data is still the main goal of the user study, thus the questionnaire should only give<br />

additional feedback and information about the user. The feedback on the tasks was the most<br />

important part because, as mentioned, we wanted to have “easy” tasks. If the test persons<br />

mainly agreed that the tasks were difficult, we could hardly compare the data sets of expert users<br />

and participants who are confronted with the Magic Book for the first time. The complete<br />

questionnaire can be found in appendix A.1.1.<br />


5.3 Execution of the Study<br />

The execution of the user study was mainly done in two sets. This section gives an overview<br />

of the concrete execution of the user study, concerning the selection of participants, the sequence<br />

of tasks during the study, time and place and a discussion about the difficulties and<br />

problems during the study.<br />

Participants<br />

The method for recruiting test persons for the user study was mainly “hallway testing”, in order to<br />

save time. First I asked students and interns at the HIT Lab to join my study. Unfortunately<br />

most of the students were experts in developing AR applications themselves. But I<br />

also asked the test persons to spread the word, and so I was able to test inexperienced users<br />

as well. In total, 20 test persons took part in the study and the level of expertise varied,<br />

which is satisfactory for my purposes (see figure 5.8). I noticed that experienced<br />

AR developers are more critical of the tracking and UI technologies. On the other<br />

hand, test persons who are confronted with AR for the first time are very fascinated. This was<br />

my experience while talking with the participants.<br />

Figure 5.8: Overview of the expertise of the user study participants on AR and the Magic Book. The<br />

scale ranged from 1 (“never heard of it”) to 5 (“experienced developer”). Overall 20 participants joined<br />

the study<br />

Place and time<br />

Due to the shared resources of the demo room the study had to be split into two sets. Thus<br />

the setup had to be built up several times, including a pilot run. In the pilot run mainly<br />

the questions on the virtual scene were tested, with a satisfying result. Setting up several<br />

times is a huge disadvantage, because it is hard to reproduce the same conditions for the<br />

test persons. Even the light conditions, which influence the optical tracking, depend on the<br />

time of day. We also had to be very careful with the Flock of Birds tracker. The demo room<br />

was equipped with some CRT monitor setups and computers which might interfere with the<br />


magnetic field of the Flock of Birds. Thus a lot of testing was required before conducting<br />

the user study. But in the end the results achieved with this setup were satisfying.<br />

Steps in the user study<br />

As an introduction for the participant I prepared a guideline for the study (see A.1.2).<br />

The participant should know about the execution and the purpose of the study. Details<br />

that would probably influence the behavior are hidden from the user. The usage of the Magic Book<br />

is also introduced to inexperienced users. It was also important to let the user know that he cannot<br />

do anything wrong or give a wrong answer to the task questions. The participant was<br />

allowed to ask questions during the study as well. For myself I prepared a schedule with<br />

the single steps of the study (see A.1.2).<br />

1. Practice<br />

The first step should give the test person the opportunity to get used to<br />

the Magic Book. The participant should figure out which movements of the handheld<br />

device are allowed by the setup, especially by the tracking routine. It was also explained<br />

how to reinitialize the tracking after a failure by looking at the marker.<br />

2. Case 1: Free Task<br />

The movements of the handheld and the feature point tracking information were recorded<br />

for 30 seconds during this task. There were no restrictions for the user, except for the<br />

task to understand the scene.<br />

3. Case 2<br />

Now specific questions about features of the current scene were posed to the user. The<br />

categorization of tasks introduced earlier in this thesis was applied here. Prior to the<br />

study, special scenes (pages in the Magic Book) were chosen that were best suited for<br />

that purpose. One property of those scenes was that they had more features than<br />

the other scenes. The user was asked to answer those questions as quickly as possible. There<br />

was no time limit. The user had to accomplish 5 tasks in this case.<br />

4. Case 3: Free Task<br />

This case is similar to the first case, except that another scene was chosen for it (again<br />

30 seconds).<br />

5. Case 4<br />

This case is again similar to case 2. A suitable scene was selected for it. In this case the<br />

participant had to complete 4 tasks.<br />

6. Questionnaire<br />

After the 4 cases the questionnaire was handed out to the user.<br />

7. Gather further comments and feedback<br />

In addition to the questionnaire, the participant had the possibility to make further<br />

comments and suggestions. With most of the participants it was possible to chat<br />

about the tracking technologies and the Magic Book. Some people were also interested<br />

in the results of the study.<br />


For the user study only 4 scenes were needed, but most of the participants wanted to<br />

enjoy the whole fairy tale consisting of 8 scenes. A complete description of the scenes used<br />

for the cases and of the individual tasks, with the corresponding results of the evaluation, can be<br />

found in the appendix (see B.2). Figure 5.9 shows a participant during the user study.<br />

Problems and difficulties<br />

Figure 5.9: A participant during the user study with the handheld visor<br />

As already mentioned, the availability of the demo room restricted the preparation<br />

and execution of the study. I therefore had to set up the study environment several times and<br />

had to ensure almost the same conditions for every run. Another important issue<br />

that was not considered in the study setup is the different delays of the trackers. In<br />

order to synchronize both tracking data sets in the evaluation of the study, the states of both<br />

tracked objects at one point in time have to match. Estimating the<br />

tracking delays is a very difficult task. The delay of the texture tracking is mainly caused by the transport of video<br />

data from the camera into main memory and by the image processing routine. The Flock<br />

of Birds tracker first has to transfer the data from the receiver to the host system. Then the<br />

data is transferred via network to the Shuttle PC. To account for both latencies correctly, delay<br />

measurements have to be made. This is done by using a reference tracking device whose<br />

delay is known. Shaw proposes an experiment to measure this delay [32]. Performing it is<br />

hard and time-consuming. Thus we have to accept a small time shift in our measurement data.<br />

Future setups should, however, make the effort to measure the delay offset of both trackers.<br />

5.4 Summary<br />

In this chapter we described the design and execution of the user study and discussed the<br />

difficulties encountered. In the end we recorded two sets of data: the 2D feature point<br />

coordinates given by the texture tracking algorithm and the pose information given by the<br />


Flock of Birds tracker. In addition, we obtained feedback from the user study participants.<br />

The further evaluation will focus on three different aspects. First we will<br />

look at the feature point tracking itself. As we said in chapter 3, a feature point moves<br />

along a path if it is tracked for several frames (see figure 3.1). We will try to discover patterns<br />

in the paths of the tracked feature points and draw some conclusions. These patterns can be<br />

used to explore the relationship with the handheld orientation (see the intersection between<br />

feature point coordinates and handheld orientation in figure 5.10).<br />

We gave a categorization of tasks for table-top AR that is especially suited for the Magic Book.<br />

We will also try to evaluate whether properties of these feature point patterns give hints about the<br />

performed task (intersection between feature point coordinates and task).<br />

Figure 5.10: Overview of further evaluations<br />

Only a few aspects of this evaluation can be discussed in this thesis. However, some further<br />

ideas on how to continue this research will be provided and discussed in the following<br />

chapters.<br />


CHAPTER 6<br />

Evaluation of the User Study<br />

Now that we have performed the user study and collected the desired data, we<br />

have to discuss how to evaluate and analyze the resulting data sets. The<br />

feature point coordinates on the one hand and the pose information of the handheld device<br />

on the other hand have been recorded. This chapter evaluates the recorded data with respect to different aspects. We<br />

can evaluate each data set alone or look for dependencies between them (see figure 5.10).<br />

The results will be presented and conclusions for table-top AR will be drawn. In order to<br />

get a full understanding of the relationship between tracking, user interface and user,<br />

additional ideas for future work will also be discussed.<br />

6.1 Evaluation of the User Study<br />

In the last chapter we introduced the evaluation scheme, which highlights three main<br />

aspects of the user study:<br />

1. Analyzing the feature point data sets<br />

We will show that the tracking of feature points results in certain patterns in the logging<br />

data. We will use these patterns for our further evaluations.<br />

2. Finding a linear mapping between 2D feature point coordinates and the orientation of<br />

the handheld device<br />

This part describes an approach to find a correlation between the data sets of 2D feature<br />

point coordinates and the change of orientation given by the magnetic tracker. To do<br />

this we have to refine the data in order to compare it, because now we only have<br />

heterogeneous data: 2D coordinates and 3D orientation. The method used to find a<br />

linear mapping is called linear regression.<br />

3. Finding a relationship between the performed task and properties of the tracked feature<br />

points.<br />


We will see that certain properties of the tracked feature points change with the task.<br />

This leads to a task-based approach, which means that the search window size could also be<br />

adapted to the performed task.<br />

The results from the questionnaire will be used as additional input when we evaluate a<br />

particular topic. In addition to this approach, further ideas for evaluations will be discussed.<br />

6.1.1 Feature Point Tracking<br />

In every feature point tracking frame four feature points are tracked. Thus the logging data<br />

contains these four feature point coordinates (see 5.6) or '-1' if the tracking fails in a frame.<br />

First we want to have a look at the tracking failures.<br />

Tracking Failures<br />

Tracking failures are caused by fast or unsuitable movements, such that feature points<br />

cannot be found again in the next frame. The user then has to reinitialize the tracking by looking at<br />

the square marker again. Table 6.1 compares the two free tasks (case 1 and case 3), where the<br />

user is allowed to use the Magic Book without any restrictions. Case 1 is performed prior to<br />

case 3. Each case took 30 seconds. Case 2 was performed in between. Hence, while<br />

performing case 2, the participants got additional “training” with the Magic Book.<br />

         Mean Tracking Failure   Max   Min   Variance<br />

Case 1   10.60                   35    0     103.62<br />

Case 3   2.25                    13    0     8.30<br />

Table 6.1: Tracking failures during case 1 and case 3: the number of tracking failures in case 3 is<br />

lower and varies less than in case 1<br />

At first glance it is remarkable that the mean number of tracking failures in case 1 is almost five<br />

times higher than in case 3. The variance of the tracking failures also leads to the conclusion<br />

that the usage of the Magic Book in case 1 is more heterogeneous than in case 3. This first<br />

observation could have two different causes:<br />

• Different content leads to these values<br />

This aspect could be a reason for the differences in these values. But our experience was<br />

that the Magic Book content does not lead to large differences in user behavior. The<br />

compositions of the virtual scenes are similar throughout the Magic Book fairy tale.<br />

• Learning the usage of the Magic Book leads to these values<br />

As we claimed in the introduction, users adapt to the underlying tracking technology.<br />

The tracking failure measurements give evidence for this assumption. The usage in<br />

case 1 is very heterogeneous, but users learn which movements lead to tracking failures<br />

and apply this knowledge in case 3.<br />


This leads to our first insight:<br />

Result: Tracking Failure<br />

Users adapt to tracking. The comparison of tracking failures shows that these failures<br />

decrease with more practice. This important aspect could be considered in our<br />

gyroscope runtime setup: in the beginning, an offset is always added to the search<br />

window size. This takes the learning aspect into account.<br />

One possibility to find out whether the Magic Book content influences the user behavior to a<br />

high degree would have been to perform another free task with different content. This case<br />

could then be compared with case 3. However, we assume that there would not be a large difference, due<br />

to the fact that the arrangement of the 3D scenes is very similar.<br />

It is also interesting whether test persons who considered themselves very familiar with the<br />

Magic Book cause fewer tracking failures than inexperienced users. We split the participants<br />

into two groups: the non-experienced group, who rated themselves from 1 to 3 on the<br />

questionnaire scale, and the experienced group (4-5). The following table shows the results:<br />

         Experienced   Non-Experienced<br />

Case 1   8.70          12.50<br />

Case 3   2.60          2.50<br />

Table 6.2: Mean Tracking Failure of the experienced and non-experienced group<br />

The figures show that there is no serious difference between these groups. However, the data<br />

have to be treated with caution, because every group has outliers with up to 28 failures,<br />

and more persons would have to be tested in order to obtain a reliable result. My observation<br />

was that experienced test persons also had to get used to the Magic Book again first.<br />

Connected Chains Pattern<br />

As already discussed in chapter 3, we want to analyze the resulting path of a feature<br />

point tracked over several frames. This is easily done by comparing the logged IDs of the<br />

feature points. If the IDs of feature points match in two consecutive tracking frames, we<br />

can calculate the offset between their 2D coordinates. If a feature point is<br />

tracked over more than one frame, we call the resulting pattern a connected chain. The feature<br />

point in a connected chain is not necessarily logged at the same feature point position<br />

(1-4) in the logged data. For example, the feature point with ID x may be logged at<br />

position FP1 in frame i and at position FP4 in frame i + 1. Therefore it is necessary to sort<br />

the logged data so that each connected chain ends up at one position. Figure 6.1 shows a 3D plot of<br />

the sorted data. The purpose of this plot is to show what the connected chains look<br />

like. Each feature point position FP1 to FP4 is drawn in a different color. For further evaluation<br />

we will only consider the feature points at position FP1 (red). The reason<br />

is that the sorting is done in such a way that the longest matches end up at position FP1. In this<br />

particular case the logged data consists of only one chain (see figure 6.1). In figure 6.2 the data<br />


set at FP1 consists of 27 connected chains. Both plots were taken from the same task with<br />

different test persons.<br />

Figure 6.1: Example 1: 3D plot of the logged feature point coordinates: It shows the (x, y) coordinates<br />

in the video plane in the corresponding tracking frame. This data set of FP1 consists of only one<br />

connected chain (red chain). The time is measured in milliseconds<br />

The corresponding 2D plots of both data sets can be seen in figure 6.3.<br />

It is obvious that information about the movement of the<br />

handheld device can be derived from these connected chains. The first plot (figure 6.1) leads to the conclusion<br />

that no heavy movement occurred, because the tracking results in only one connected chain.<br />

In the second plot (figure 6.2), 27 connected chains occur. The 2D view of the video plane (figure 6.3)<br />

underlines this, because the area covered by the tracked feature points in the second plot is<br />

obviously larger than in the first one.<br />

An interesting quantity derived from these connected chains is the shift of one feature point<br />

between two tracking frames. This shift is the length $l_{i,i+1}$ of the vector from the 2D position<br />

$p_i = (x_i, y_i)$ in frame $i$ to the position $p_{i+1} = (x_{i+1}, y_{i+1})$ in frame $i + 1$ (see figure 6.4). The<br />

length of the shift vector can be calculated easily:<br />

$l_{i,i+1} = \sqrt{(x_{i+1} - x_i)^2 + (y_{i+1} - y_i)^2}$ (6.1)<br />

Thus we have two values now if we want to analyze the connected chains: the lengths of<br />

the shift vectors and the number of connected chains in a data set. Note that the number of<br />

connected chains is a specific property of the ARToolkit feature point tracking algorithm.<br />
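The extraction of connected chains and the calculation of the shift vector lengths can be sketched as follows. This is only a minimal illustration under assumptions: the log format (one dictionary per frame that maps a feature point ID to its 2D coordinates, empty in case of a tracking failure) and the function names `connected_chains` and `shift_lengths` are hypothetical and do not reproduce the actual logging format of the study.<br />

```python
import math


def connected_chains(frames):
    """Group per-frame observations into connected chains.

    frames: list of dicts mapping feature point id -> (x, y); an empty
    dict stands for a frame with a tracking failure (assumed format).
    Returns a list of chains, each a list of (frame_index, x, y) for one
    id that was seen in consecutive frames.
    """
    open_chains = {}  # id -> chain that is still growing
    finished = []
    for i, frame in enumerate(frames):
        # Close every chain whose id is missing in the current frame.
        for fid in list(open_chains):
            if fid not in frame:
                finished.append(open_chains.pop(fid))
        for fid, (x, y) in frame.items():
            open_chains.setdefault(fid, []).append((i, x, y))
    finished.extend(open_chains.values())
    # Only points tracked over more than one frame form a connected chain.
    return [chain for chain in finished if len(chain) > 1]


def shift_lengths(chain):
    """Shift vector lengths l_{i,i+1} (equation 6.1) along one chain."""
    return [math.hypot(x1 - x0, y1 - y0)
            for (_, x0, y0), (_, x1, y1) in zip(chain, chain[1:])]
```

The number of chains returned for one data set and the lengths returned for each chain correspond to the two quantities named above.<br />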


Figure 6.2: Example 2: 3D plot of the connected chains. The data set at position FP1 (red) consists of<br />

27 connected chains<br />

Figure 6.3: 2D plots of the data sets are shown: on the left side the corresponding plot with one chain<br />

(example 1) and on the right side the corresponding plot with 27 chains (example 2)<br />


Figure 6.4: Vector of feature point position from frame i to frame i + 1<br />

Result: Connected chains<br />

During the analysis of feature point tracking data we discovered a pattern that we<br />

call connected chains. For further evaluations we will consider the length of shift<br />

vectors.<br />

The next step will take the logged orientation of the handheld display into account. This<br />

corresponds to the intersection of feature point tracking and handheld orientation in<br />

figure 5.10.<br />

6.1.2 Feature Point Tracking and Tracking of the Handheld<br />

First we have to refine the logged data in a way that allows us to compare it. The idea is to<br />

use the shift of the feature points on the one hand and the change in orientation on the<br />

other hand. The shift can be measured by calculating the vector length from the coordinate<br />

offsets as described in equation 6.1. Note that we can only calculate the vector length within connected<br />

chains, because there the same feature point is tracked over several frames. In order to synchronize<br />

the data sets, the start and the end of the vector have to be annotated with the orientation<br />

valid at the corresponding time (see figure 6.5). The timestamps are used to perform this<br />

synchronization. The update rates of both trackers are constant. The magnetic tracker has<br />

an update rate of 90 fps (frames per second) and the vision-based tracker works with an<br />

update rate of 30 fps. Thus the magnetic tracker is three times faster than the optical<br />

tracker.<br />
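A possible way to perform this timestamp-based annotation is sketched below. It is an assumption that the orientation "valid at the corresponding time" is the magnetic sample with the closest timestamp; the function names `nearest_sample` and `annotate` and the list-based data layout are hypothetical as well.<br />

```python
from bisect import bisect_left


def nearest_sample(mag_times, t):
    """Index of the magnetic sample whose timestamp is closest to t.

    mag_times must be sorted in ascending order.
    """
    j = bisect_left(mag_times, t)
    if j == 0:
        return 0
    if j == len(mag_times):
        return len(mag_times) - 1
    # Pick the closer one of the two neighbouring samples.
    return j if mag_times[j] - t < t - mag_times[j - 1] else j - 1


def annotate(optical_times, mag_times, mag_quats):
    """Pair every optical frame timestamp with the nearest orientation."""
    return [mag_quats[nearest_sample(mag_times, t)] for t in optical_times]
```

With a magnetic update rate three times the optical one, roughly every third magnetic sample is selected; the unknown delay offset between the trackers is not compensated by this sketch.<br />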

As we discussed during the design of the user study, this step is not done in an optimal way,<br />

because the delay offset between the vision-based and the magnetic tracker is not considered.<br />

Now we have to calculate the change in orientation between the beginning<br />

and the end of the vector. Because we logged quaternions, we have to calculate<br />

the difference of the quaternions to get the angular offset. For every tracking frame $i + 1$ we<br />

can derive data pairs $d_{i+1}$ with<br />

$d_{i+1} = d(x_{i+1}, y_{i+1}) = (\Delta(q_k, q_{k+n}),\ l_{i,i+1})$ (6.2)<br />


Figure 6.5: The beginning and the end of the shift vector are annotated with the orientation using the<br />

timestamps<br />

Difference between two quaternion rotations<br />

The difference between two quaternion rotations is calculated similarly to Strasser’s<br />

diploma thesis [53], which should be consulted for a deeper look. As a reminder, a unit quaternion<br />

$q = (s, \sin(\tfrac{1}{2}\theta)\,\bar{v})$ with $s = \cos(\tfrac{1}{2}\theta)$ specifies a rotation of $\theta = 2\arccos(s)$ around the axis $\bar{v}$:<br />

$q = (s, \sin(\tfrac{1}{2}\theta)\,\bar{v}) \Rightarrow \theta = 2\arccos(s)$ (6.3)<br />

This means that we only have to consider the scalar value $s$ of a quaternion<br />

if we want to derive the angle $\theta$. The difference $d$ between two quaternions $p = (w, \bar{v})$ and $q = (w', \bar{v}')$ is<br />

calculated by multiplying the conjugate of the first quaternion, $p^* = (w, -\bar{v})$, with the second<br />

quaternion $q$. This is also called the derivation of a quaternion:<br />

$d = p^* q = (ww' - (-\bar{v}) \cdot \bar{v}',\ (-\bar{v}) \times \bar{v}' + w\bar{v}' + w'(-\bar{v}))$ (6.4)<br />

Note that $\cdot$ is the scalar (dot) product and $\times$ the vector product in $\mathbb{R}^3$. As we have seen in<br />

equation 6.3, we only need to consider the scalar part of equation 6.4:<br />

$\Delta w = ww' + xx' + yy' + zz'$ (6.5)<br />

with $\bar{v} = (x, y, z)$ and $\bar{v}' = (x', y', z')$. Now we can compute $\Delta\Theta$:<br />

$\Delta\Theta = 2\arccos(\Delta w) = 2\arccos(ww' + xx' + yy' + zz')$ (6.6)<br />

Because $q$ and $-q$ describe the same rotation, the absolute value of $\Delta w$ must be taken<br />

so that the arccos function returns the angle in the right quadrant.<br />

Now the angle between the two quaternions $q_k$ and $q_{k+n}$ can be computed by using equation<br />

6.6:<br />

$\Delta\Theta_{k,k+n} = 2\arccos(|w_k w_{k+n} + x_k x_{k+n} + y_k y_{k+n} + z_k z_{k+n}|)$ (6.7)<br />
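Equation 6.7 translates almost directly into code. The following sketch assumes unit quaternions stored as tuples in the order (w, x, y, z); the function name `angle_between` is hypothetical, and the clamping of the dot product is an addition that only guards against rounding errors.<br />

```python
import math


def angle_between(q1, q2):
    """Rotation angle in radians between two unit quaternions (w, x, y, z),
    following equation 6.7."""
    dot = sum(a * b for a, b in zip(q1, q2))
    # The absolute value treats q and -q as the same rotation; the clamp
    # protects acos against values marginally above 1 due to rounding.
    dot = min(1.0, abs(dot))
    return 2.0 * math.acos(dot)
```

For example, the identity quaternion and a rotation of 90 degrees around the z-axis yield an angle of π/2.<br />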


For all connected chains we will compute the corresponding data pairs. A data pair consists<br />

of the angular offset of the handheld device between the start and the end of an optical<br />

tracking frame and the length of the shift vector.<br />

$d_{i+1} = d(x_{i+1}, y_{i+1}) = (\Delta\Theta_{k,k+n},\ l_{i,i+1})$ (6.8)<br />

The next step is to characterize the relationship between the vector length and angle measurements.<br />

Correlation and Regression of the Data Pairs<br />

As already discussed, we now want a measure of how the 3D orientation<br />

and the 2D feature point coordinates are related to each other. In our preliminary considerations<br />

above we refined our data to obtain comparable data pairs $d_i$ consisting of the angle between<br />

the quaternions and the length of the shift vectors at the corresponding timestamps. The<br />

first question is how the data sets are related to each other. Then we want to characterize<br />

this relationship and develop a linear mapping. We will use two statistical techniques to<br />

evaluate the data pairs:<br />

Correlation. The correlation measures to what degree two sets of data are related<br />

to each other. The correlation coefficient r can be calculated by every standard statistical<br />

tool and takes a value between -1 and 1. The idea is very simple: if we plotted all the<br />

data pairs d(x, y) with x on the x-axis and y on the y-axis and all the points fell on<br />

one straight line, the correlation coefficient would be |r| = 1. This is an indicator<br />

of a very strong relationship. Conversely, the correlation coefficient tends to 0 if<br />

the points are randomly spread. If the value of y increases with higher values of<br />

x, the coefficient is positive. Otherwise the coefficient has a negative sign. These plots<br />

are also called scatter plots.<br />

Regression. The regression characterizes the relationship between two measurements. We will concentrate<br />

on linear regression only. The linear regression computes a linear model of the<br />

measurements in order to predict the dependent variable Y using the predictor variable<br />

X. The resulting function looks like this:<br />

$Y = a + bX$ (6.9)<br />

The linear regression estimates the values for a and b. In the plot, this<br />

function is represented by the straight line that best fits the data, i.e. that minimizes the differences from the<br />

actual measurements. These differences are called residuals. If the relationship cannot<br />

be described by a linear model, a different regression model (for example a non-linear or multiple regression) has to be applied, which will not<br />

be discussed here.<br />

An introduction to correlation and linear regression can be found in the appendix of this<br />

thesis (A.2).<br />
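To make the two techniques concrete, the following sketch computes the correlation coefficient r and the least-squares estimates of a and b using only the Python standard library. It is an illustration of the standard textbook formulas and not the statistical tool that was used for the evaluation; the function names `pearson_r` and `linear_fit` are hypothetical.<br />

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient r of the data pairs (x, y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def linear_fit(xs, ys):
    """Least-squares estimates (a, b) of the linear model Y = a + bX."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx          # slope of the regression line
    a = my - b * mx        # intercept
    return a, b
```

For points that lie exactly on the line y = 1 + 2x, `pearson_r` returns 1 and `linear_fit` returns a = 1 and b = 2 (up to rounding).<br />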

We want to derive a linear model for our measurements. We want to predict the length<br />

of the shift vector (dependent variable L) using the angular offset (independent variable A).<br />

The linear regression calculates the values for a and b in this function:<br />


L = a + bA (6.10)<br />

In order to draw an overall conclusion about the correlation and regression, we concatenated<br />

the data sets of all the test persons in every test case. This increases the number of<br />

measurements and leads to a more accurate estimation of the linear model. The following<br />

example is taken from test case 2. It shows the measurements derived from the data of task 1.<br />

Figure 6.6 shows the measurements in a scatter plot with the angle A on the x-axis and the<br />

vector length L on the y-axis.<br />

Figure 6.6: Case 2 / Task 1: The scatter plot with regression line<br />

The correlation analysis for this task yields a correlation coefficient of 0.6340. This<br />

indicates that the measurements are related. If we look at the plot we see that the<br />

length of the shift vector increases with higher angles. Thus the coefficient is positive.<br />

The regression analysis estimates a = 1.26631 and b = 194.822. This leads to the following<br />

linear model:<br />

$L = f(A) = 1.26631 + 194.822 \cdot A$<br />

Of course this regression function is only an estimate. The output value is the length of<br />

the shift vector, which is exactly the search window size needed in order not to produce a<br />

tracking failure. Thus the feature point lies somewhere on the circle with radius f(A) around<br />

the position of the feature point in the previous frame (figure 6.7).<br />

If all the points for every input angle fell on the circle, the correlation coefficient<br />

would be r = 1 and we would have a perfect correlation. Now we look at the<br />

residuals. As mentioned briefly, the residuals are the differences between the linear model and<br />


Figure 6.7: The feature point falls on a circle with radius r = f(A): Points inside the circle are residuals<br />

below the regression line, points outside the circle are residuals above the regression line<br />

the actual measurements; they would cause errors in our search window configuration.<br />

If we look at figures 6.7 and 6.6 we see that residuals below the regression line will not<br />

result in tracking failures, because the search window with side length $l_{Window} = 2f(A)$ is larger<br />

than actually needed in the current frame. The residuals above the regression line would<br />

cause failures, because the feature points are located outside the search window. A possible solution to this<br />

problem is to add a specific offset to f(A). The regression line is thereby shifted along the y-axis<br />

and more points fall below the line. If all the points should fall below the regression<br />

line we can choose the maximum positive residual as offset:<br />

$\text{length}_i = f(\text{angle}_i) + \max\{L_{measured} - f(A) \mid L_{measured} - f(A) > 0\}$<br />

But the adequate offset still has to be evaluated, which is not part of this thesis. The<br />

maximum residual could be an outlier, in which case the search window would be far too<br />

large with respect to all the other measurements.<br />
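The maximum positive residual can be determined with a few lines of code. The sketch below assumes that the measured angles and shift vector lengths are available as two lists of equal length and that a and b come from a previous regression; the function name `max_positive_residual` is hypothetical.<br />

```python
def max_positive_residual(angles, lengths, a, b):
    """Largest residual above the regression line f(A) = a + b * A.

    Returns 0.0 if no measurement lies above the line.
    """
    residuals = [length - (a + b * angle)
                 for angle, length in zip(angles, lengths)]
    return max((r for r in residuals if r > 0), default=0.0)
```

Because a single outlier determines this value, it should be seen as an upper bound for the offset rather than as a recommended setting.<br />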

Analyzing the different Tasks<br />

As a brief recap, the following steps have to be carried out in order to analyze all the data sets<br />

for the different tasks:<br />

1. Extract the connected chains for each data set<br />

2. Calculate the shift vector lengths of the connected chains (equation 6.1)<br />

3. Synchronize each start and end of the shift vector with the corresponding quaternions<br />

(figure 6.5)<br />

4. Compute the difference between the quaternions in order to obtain data pairs d = (y, x)<br />

(equations 6.7 and 6.8)<br />


5. Concatenate all the data pairs for one test person in one task, because every task will<br />

result in several connected chains. In example 2, for instance, we retrieved 27<br />

connected chains. Each chain has to be refined according to the previous steps and all<br />

the data pairs have to be united.<br />

6. Concatenate the data pairs of all the test persons for one task. As we have said, we<br />

want to have an overview of all the data of a single task. Therefore the data pairs of<br />

every test person are united.<br />

7. Finally, the linear model (equation 6.10) is calculated for every task with a statistical tool.<br />

The following table shows the estimates of the correlation and the linear regression for<br />

every test case and the corresponding tasks. The following numbers are shown in the table:<br />

the correlation coefficient r, the parameters a and b of the linear model, and the value h,<br />

which gives the angular offset that causes a change of 1 in the search window<br />

size. It is simply the solution of the equation $\Delta y_i = b\Delta x_i$ for $\Delta y_i = 1$, i.e. $h = 1/b$.<br />

                    r        a         b         h<br />

Case 1  Free Task   0.3239   1.56895   144.624   0.0069<br />

Case 2  Task 1      0.6340   1.26631   194.822   0.0051<br />

Case 2  Task 2      0.5511   1.45847   177.196   0.0056<br />

Case 2  Task 3      0.5296   1.78150   193.941   0.0051<br />

Case 2  Task 4      0.5949   1.76134   220.166   0.0045<br />

Case 2  Task 5      0.6254   1.34886   219.083   0.0046<br />

Case 3  Free Task   0.4642   1.55991   127.487   0.0078<br />

Case 4  Task 1      0.5682   1.16379   190.290   0.0052<br />

Case 4  Task 2      0.4934   1.48687   131.274   0.0076<br />

Case 4  Task 3      0.4366   1.74120   130.608   0.0076<br />

Case 4  Task 4      0.5216   1.49365   197.812   0.0051<br />

Table 6.3: Results of the linear regression with the correlation coefficient r, the parameters a and b of<br />

the linear model, and the value h, which gives the angular offset that causes a change of 1 in<br />

the search window size.<br />

The statistical tool also computes a p-value. Loosely speaking, this p-value is the probability that we make<br />

a mistake if we reject the assumption that the angle and the shift vector are not related at all<br />

(the null hypothesis H0). In our linear model this assumption means that the b-parameter is<br />

zero (H0 : b = 0), so that the angle does not influence the length of the shift vector at all. The<br />

p-value for all the tasks is 0.00001. Thus with a probability of 0.00001 we would make a mistake<br />

if we rejected the hypothesis that these data sets are not related. In other words, it is highly<br />

significant that our data sets are related: the angle influences the vector length. This matches<br />

our expectations exactly. Please see the appendix for further explanations (A.2).<br />


If we look at the data now, we see that we have a positive correlation r > 0 in every case.<br />

The coefficient in case 1, r = 0.3239, is very low compared to the other cases. A reason for that<br />

could be that the usage of the Magic Book in the beginning is not homogeneous. Our first<br />

result, that users adapt to tracking, supports this. If we exclude case 1, r ranges from<br />

0.44 to 0.63, which indicates a relation. Possible factors influencing the correlation coefficient<br />

should be taken into account:<br />

• Delay of Trackers<br />

In our setup we have not considered the tracking delay of both tracking technologies.<br />

Calculating this delay requires considerable effort. We do not have exact numbers for<br />

the delay of each single tracker, and the effort for a measurement would be too high<br />

for this thesis. An idea to compensate for this is to shift the data sets relative to each other in time and to find the shift that maximizes<br />

r.<br />

• Change in position as well<br />

It still has to be analyzed to what degree the change of position has to be taken into<br />

account, because movements in position occur as well. Thus the position data of the handheld<br />

device has to be explored as well in further evaluations.<br />
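The idea of shifting the data sets to compensate for the unknown tracker delay could be sketched as follows. This was not carried out in this thesis; the sketch assumes that both series have already been resampled to the same rate, so that the shift can be expressed as a whole number of samples, and the function names `pearson_r` and `best_lag` are hypothetical.<br />

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient r; returns None if r is undefined."""
    n = len(xs)
    if n < 3:
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def best_lag(angles, lengths, max_lag):
    """Try every integer shift in [-max_lag, max_lag] and return the
    (lag, r) pair with the largest r. A positive lag pairs lengths[i]
    with angles[i + lag], i.e. the angle series is delayed."""
    n = min(len(angles), len(lengths))
    best = None
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = angles[lag:n], lengths[:n - lag]
        else:
            xs, ys = angles[:n + lag], lengths[-lag:n]
        r = pearson_r(xs, ys)
        if r is not None and (best is None or r > best[1]):
            best = (lag, r)
    return best
```

The lag found in this way is only an estimate of the relative delay between the trackers; it does not replace a delay measurement with a reference device.<br />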

Summarizing all these aspects leads to the following result:<br />

Result: Correlation<br />

The change in orientation and the tracking of feature points are related significantly.<br />

The correlation coefficient r is positive and, if case 1 is excluded, ranges from 0.44 to 0.63.<br />

Let us now look at the values computed for the linear models. The value for parameter a<br />

ranges from 1.16 to 1.78. If we round it up to the next integer we obtain:<br />

$\lceil a \rceil = 2$ (6.11)<br />

This value for a is valid for all cases. The parameter b, on the other hand, ranges from 127.487 to 220.166 and<br />

determines the slope of the regression line. If we want to derive a global configuration<br />

for all the cases and the corresponding tasks, we have to take the maximum value for b. In<br />

the next section we will discuss whether we can find a relationship between the tracking data and<br />

the performed task, so that the linear model could be adapted to the task. If we take the<br />

maximum value for a global configuration, the linear model looks like this:<br />

$y = 2 + 220.166x$ (6.12)<br />

To calculate the search area size $l^2$ we still have to add an offset $k$ to address the<br />

problem of positive residuals. This offset has to be evaluated in further research. Therefore<br />

the following equation is only a rough estimate:<br />

$l^2 = (2 \lceil 2 + 220.166x + k \rceil)^2$<br />
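As an illustration, the rough estimate above can be written as a small function. The default values for a and b are taken from equation 6.12; the function name `search_area`, the default offset k = 0 and the unit of the angular difference (the same unit as used for the regression) are assumptions of this sketch.<br />

```python
import math


def search_area(angle, a=2.0, b=220.166, k=0.0):
    """Search area size l^2 for a given angular difference.

    The predicted shift a + b * angle plus the offset k is rounded up and
    doubled to obtain the side length of the search window.
    """
    side = 2 * math.ceil(a + b * angle + k)
    return side * side
```

Without any change in orientation the function returns an area of 16, i.e. a window with a side length of 4; a larger offset k enlarges the window for all angles.<br />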


Result: Regression<br />

We can derive a linear mapping f(x) with the angular difference as input parameter.<br />

Thus we can predict the shift of the tracked feature point and therefore the<br />

necessary search window size. This mapping is only an approximation and therefore<br />

an additional offset has to be added to the search window size.<br />
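How such a mapping could be estimated from recorded data is sketched below with an ordinary least-squares fit. The helper names are hypothetical, and using the largest positive residual as a candidate for the additional offset is only one possible choice, not a result of this thesis.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for the linear model y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx  # requires at least two distinct x values
    a = mean_y - b * mean_x
    return a, b


def max_positive_residual(xs, ys, a, b):
    """Largest amount by which an observed shift exceeds the prediction."""
    residuals = (y - (a + b * x) for x, y in zip(xs, ys))
    return max(0.0, max(residuals))
```

Here `xs` would hold the angular differences and `ys` the measured shift vector lengths of one test case.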

The scatter plots and the output data for each test case can be found in<br />

appendix section A.2. In the following section we try to draw conclusions about<br />

the relationship between tracking and tasks.<br />

6.1.3 Feature Point Tracking and Tasks<br />

First let us have another look at the overview of our evaluation (figure 6.8).<br />

Figure 6.8: Overview of further evaluations<br />

In the last section we described the relationship between the feature point tracking and the handheld<br />

orientation. We now want to obtain additional information about the relationship between<br />

feature point tracking and tasks (the intersection of feature point coordinates and tasks in figure 6.8). Additional<br />

feedback on the tasks was collected with the questionnaire we handed out to the<br />

test persons. We asked the test persons whether they thought the tasks were easy to accomplish. This<br />

was done only to get an impression of the tasks, because we wanted to give the participants<br />

easy tasks. We collected feedback for the free tasks as well as for the other “navigation” tasks<br />

(see figure 6.9). The figure shows that most of the test persons agreed that all the tasks were<br />

easy. The free task was rated easier than the navigation tasks, presumably because it imposed<br />

no restrictions.<br />

We will use information that we have already derived (see section 6.1.2). Table 6.4<br />

shows the correlation coefficient, the mean shift vector length and the mean number of<br />

connected chains for every task. With this information we will try to draw conclusions about<br />

the performed tasks.<br />


Figure 6.9: Results of the feedback to the question “The task was easy to perform” (1=strongly disagree,<br />

5=strongly agree)

| Case | Task | r | $\bar{s}$ | var($\bar{s}$) | $\bar{t}$ | $\bar{c}$/time |
|---|---|---|---|---|---|---|
| Case 1 | Free Task | 0.3230 | 2.34 | 0.60 | 30 | 1.55 |
| Case 2 | Task 1 | 0.6340 | 1.91 | 0.72 | 10.36 | 1.47 |
| Case 2 | Task 2 | 0.5511 | 2.33 | 0.34 | 10.97 | 2.26 |
| Case 2 | Task 3 | 0.5296 | 2.79 | 1.26 | 7.89 | 2.99 |
| Case 2 | Task 4 | 0.5949 | 2.93 | 1.16 | 12.77 | 2.68 |
| Case 2 | Task 5 | 0.6254 | 2.13 | 1.11 | 12.40 | 1.80 |
| Case 3 | Free Task | 0.4642 | 2.33 | 0.61 | 30 | 2.05 |
| Case 4 | Task 1 | 0.5682 | 1.97 | 0.80 | 9.25 | 1.76 |
| Case 4 | Task 2 | 0.4934 | 2.07 | 0.45 | 18.39 | 2.15 |
| Case 4 | Task 3 | 0.4366 | 2.51 | 0.70 | 20.91 | 2.37 |
| Case 4 | Task 4 | 0.5216 | 2.39 | 0.72 | 12.33 | 1.83 |

Table 6.4: Results of the feature point tracking evaluation: r is the correlation coefficient, $\bar{s}$ is the mean<br />

length of the shift vector, var($\bar{s}$) is the variance of the shift vector length, $\bar{t}$ is the mean time needed to<br />

perform the task and $\bar{c}$/time is the number of connected chains in relation to the needed time


In the further evaluation we will not consider the free tasks, because it is hard to infer<br />

a common behavior in this case. At first glance we can only see that the mean shift vector<br />

length $\bar{s}$ is almost equal in case 1 and case 3. The idea of the tasks was to make the user perform<br />

specific actions. We will now look at the values in the table and draw conclusions about the<br />

performed actions. A full overview of the tasks can be found in appendix B.2.<br />

First we try to separate the tasks into three groups: a group g1 with a rather low shift vector<br />

length $\bar{s} < 2.20$ and a group g2 with a high value $\bar{s} > 2.50$. The third group g3 is<br />

formed by the rest. We will use an abbreviation for every task: Case 2 Task 1 will be C2T1,<br />

for example.<br />

g1 = {C2T1, C2T5, C4T1, C4T2}<br />

g2 = {C2T3, C2T4, C4T3}<br />

g3 = {C2T2, C4T4} (6.14)<br />

• Group g1<br />

The question for the tasks in g1 was to count features in the scene, like people or items.<br />

These objects were obvious to the user and distributed all over the scene. Thus the user<br />

can view the scene from a certain distance, with the whole scene in view, to<br />

accomplish the tasks. These are the overview tasks. Because the user wants to see<br />

the whole scene, movements of the handheld visor result in small pixel offsets.<br />

Another indicator underlining this is the number of chains. The shift vector length is<br />

small and therefore it is more likely that a feature point is tracked again in the next<br />

frame. This results in fewer connected chains. Except for C4T2 these tasks have very<br />

good values for the correlation coefficient r, so the linear models for those tasks are<br />

more reliable. The time to perform C4T2 is higher than for the other tasks in g1, which might<br />

hint that this task is more “difficult” than the other tasks in this group. In general<br />

it took more time to perform C4T2 and C4T3, and their coefficients r are low compared<br />

to the other tasks. Nevertheless, the numbers in the table show clear characteristics of<br />

overview tasks.<br />

• Group g2<br />

The values for $\bar{s}$, the mean length of the shift vector, are high. As we have said, if the<br />

camera is close to the 2D surface, small movements cause a large pixel offset. And<br />

indeed all of the tasks in this group were designed to let the user focus on a specific<br />

feature. Questions like “What is the eye color of the man?” or “What is the color of her<br />

shoes?” were asked. Thus the user has to “zoom” into the scene, which brings<br />

the camera closer to the surface. We can characterize the focus task by this<br />

behavior. The question for task C2T4 was “How many people wear a headdress?”.<br />

By our definition of tasks it is a detail task rather than a focus task, that is, a<br />

combination of overview and focus task. But during the study I<br />

observed that the test persons had to focus on one person at a time to make out the<br />

headdress. The number of chains is also relatively high in these tasks.<br />

• Group g3<br />


According to our statements about the other groups, we can expect this group to<br />

have an in-between property, because the values for $\bar{s}$ lie in the middle range.<br />

This property is again the distance of the camera to the 2D plane: the<br />

distance between the camera and the surface for this group of tasks<br />

lies between that of the overview task (large distance) and that of the focus task (short distance). Let us have<br />

a look at the questions: both asked about certain features again. In<br />

contrast to the features in the focus tasks, it is not necessary to zoom far into the scene.<br />

On the other hand, these questions could not be answered with<br />

an overview of the scene alone. The number of chains is in the middle range as well.<br />
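The grouping by mean shift vector length can be reproduced with a few lines of code. The sketch below uses the values of Table 6.4 (free tasks excluded) and the thresholds 2.20 and 2.50 stated above; the function names and the group labels are chosen for this example only.

```python
# Mean shift vector length per task, taken from Table 6.4 (free tasks excluded).
MEAN_SHIFT = {
    "C2T1": 1.91, "C2T2": 2.33, "C2T3": 2.79, "C2T4": 2.93, "C2T5": 2.13,
    "C4T1": 1.97, "C4T2": 2.07, "C4T3": 2.51, "C4T4": 2.39,
}


def task_group(mean_shift, low=2.20, high=2.50):
    """Assign a task to g1 (overview), g2 (focus) or g3 (in between)."""
    if mean_shift < low:
        return "g1"
    if mean_shift > high:
        return "g2"
    return "g3"


def group_tasks(mean_shifts):
    """Return a dict mapping each group name to its sorted task labels."""
    groups = {"g1": [], "g2": [], "g3": []}
    for task, mean_shift in mean_shifts.items():
        groups[task_group(mean_shift)].append(task)
    return {name: sorted(tasks) for name, tasks in groups.items()}
```

Because C4T3 lies only slightly above the upper threshold, its assignment to g2 is sensitive to the chosen boundary.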

Because the number of chains is a specific property of the ARToolkit feature point tracking,<br />

we use the shift vector length as the indicator. Using this number we can distinguish between<br />

our tasks. Here is a summary of the characteristics of each task:<br />

• Overview Task<br />

The user places the camera at a distance that allows him to have the whole 3D<br />

scene, or everything else important for the table top application, in view. Small<br />

movements of the camera result in small pixel offsets (the length of the shift vector).<br />

The linear model achieves its best results for this task.<br />

• Detail Task<br />

We have to refine our definition of detail tasks. We said that it is a combination of<br />

overview and focus task. Thus the user has to zoom into the scene from time to time to<br />

explore certain features, but does not need to zoom very close to<br />

perform the task. Being closer to the plane causes a larger pixel offset compared to the<br />

overview task.<br />

• Focus Task<br />

To accomplish this task the user has to zoom close to the 2D plane. Thus<br />

even small movements will result in large pixel offsets.<br />

Result: Categorization of tasks<br />

We were able to find a categorization for the proposed tasks and underline this with<br />

the corresponding evaluations of the user study.<br />

For all these results the position data of the handheld device could be taken into account as<br />

well. Due to time limitations I was not able to perform a deeper evaluation of the recorded<br />

data.<br />

6.1.4 Further Evaluations<br />

The questionnaire also collected feedback on the participants' experience with the tracking<br />

and the user interface. This feedback is not used for additional evaluations<br />

here, because the focus was on analyzing the recorded data, but it was a good opportunity<br />

to gather this information as well. The questions were grouped into usability of the<br />


Magic Book, the awareness of jitter and delay of the vision-based tracker, and the handheld<br />

device user interface. One interesting observation from the tracking<br />

questions was that the participants were more aware of jitter than of tracking delay. The<br />

test persons were basically satisfied with the usability of the Magic Book, but the<br />

answers diverged on the question whether the handheld device is a “very suited device<br />

for interacting with the Magic Book”. It would be interesting to evaluate this question with<br />

other graphical output devices. All data from the questionnaire can be found in appendix<br />

B.1.<br />

In the next chapter we summarize all results of the user study. Furthermore, conclusions<br />

from our user-study-based evaluation are drawn, and implications for future work are<br />

discussed.<br />


CHAPTER 7<br />

Conclusions<br />

This chapter summarizes the experiences and results of the thesis concerning table top AR.<br />

It also reviews the approach of finding a mapping between user movement and tracking<br />

parameters. Ideas and implications for future work will be shown.<br />

7.1 Results<br />

In this section we will summarize the results obtained from the evaluation of the user study.<br />

First let us have a look at our motivation for the user study again. Our idea was to characterize<br />

the relationship between feature point tracking and user behavior. The user operates the<br />

system by moving the video camera, which is attached to the handheld device.<br />

This information should be used to configure our runtime setup, which consists of a combination<br />

of a gyroscope and an optical tracker. The orientation measured by the gyroscope is mapped to the<br />

search window size of the tracking routine. In previous setups the search window size is a<br />

constant parameter. We want to alter it during runtime according to the change in relative<br />

orientation reported by the gyroscope, which runs at a higher update rate than the optical tracker.<br />

Because the texture tracking of the ARToolkit is a special natural feature tracking<br />

technique, we will briefly discuss the relevance of our results for other natural feature tracking algorithms.<br />

Furthermore, we will try to generalize our results and ideas for table top AR applications.<br />

7.1.1 Results of the User Study<br />

The question was: “To what degree do changes in orientation influence the pixel offset from<br />

frame to frame in the tracking routine?” To answer it, we collected data on the handheld movement<br />

(magnetic tracker) and the feature point coordinates in the 2D video plane. The evaluation<br />

of the user study led to the following results:<br />

1. Relative orientation and feature point offset between two frames are related<br />

We applied a statistical technique called correlation analysis. The correlation coefficient<br />

r gives information to what degree two measurements are related. Analyzing the tasks<br />


we derived values for r from 0.44 up to 0.63, which indicates a moderate relationship<br />

between the two sets. This result allows us to consider the relative orientation for our<br />

runtime setup.<br />

2. We can derive a linear model for this dependency<br />

Using a regression analysis we derived approximations for linear relationships between<br />

both measurements X and Y of the form:<br />

Y = a + bX + offset (7.1)<br />

The linear regression estimates values for a and b. The values differ from case to case,<br />

but we can obtain a rough approximation for a global setup. Future work is to optimize<br />

this model for each case. A suitable value for the constant offset also has to be estimated.<br />

The output of the regression analysis also indicates a high significance for the<br />

dependency between the two data sets.<br />

3. Tasks<br />

We introduced a categorization for tasks and indeed the tracking results differ in the<br />

mean pixel offset in each task group. We proposed that the categorization is dependent<br />

on the usage of the handheld and thus the camera. If the user holds the camera far<br />

away from the 2D surface, he wants to have an overview. If he zooms further into the<br />

scene but is still not concentrating on a single feature, we speak of a detail task.<br />

If the user zooms into the scene to examine a single feature, we speak of a focus task.<br />

4. Tracking Failures<br />

We showed that the number of tracking failures is significantly higher when a user starts<br />

to use the application. Users thus adapt to the tracking technology. This can be considered<br />

in the runtime configuration as well: we enlarge the search window for a specific time<br />

and reduce it stepwise during usage. More evaluations are necessary to explore<br />

this behavior.<br />
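The stepwise reduction mentioned in the last item could look like the following sketch. All numbers (initial enlargement, step size, step interval) are placeholders chosen for illustration; suitable values would have to come from further evaluations.

```python
def warmup_enlargement(elapsed_s, initial_extra=8, step=2, step_interval_s=10.0):
    """Extra search window size in pixels during the first phase of usage.

    The enlargement starts at initial_extra and is reduced by step pixels
    after every step_interval_s seconds until it reaches zero.
    """
    steps_done = int(elapsed_s // step_interval_s)
    return max(0, initial_extra - step * steps_done)
```

The returned value would simply be added to the window size derived from the linear model.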

The results suggest concrete steps toward a good configuration for our runtime setup.<br />

We proposed a DWARF-based architecture for a dynamic configuration of the search window<br />

during runtime. The software architecture presented in chapter 4 has to be extended with the<br />

linear mapping. A test environment should enable the developer to test the parameters of<br />

the linear model, so that the offset of the linear model can be estimated through testing as<br />

well.<br />

Potentials for further evaluation<br />

If we look at our evaluation overview again (figure 6.8), we can see that we have not yet discussed<br />

the intersection of the tasks and the movement information of the handheld device. This<br />

aspect can be evaluated as well. Since we said that the tasks can be distinguished by the distance<br />

from the 2D plane, the recorded position data could be used to support this. We have also<br />

not compared the data of different test persons, or of a single test person across all tasks. Our<br />

first concern was a global configuration. If the usage differs from user to<br />

user, a possible idea would be to estimate user profiles for the runtime configuration.<br />


Another important issue is that we still do not predict the search window size. We only<br />

found a relationship between the orientation and feature point tracking. Thus an adequate<br />

mechanism to predict the window size has to be evaluated. The property that the gyroscope<br />

has higher update rates should be utilized and the window size has to be set before the next<br />

tracking frame of the optical tracker starts.<br />
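One conceivable mechanism is sketched below: the relative orientation changes of the gyroscope are accumulated between two video frames, and the window size is computed just before the optical tracker processes the next frame. The class and method names, the single rotation axis, and the use of the global model y = 2 + 220.166x are simplifying assumptions of this example and not part of the proposed architecture.

```python
from math import ceil


class WindowPredictor:
    """Accumulates gyroscope samples and predicts the search window side."""

    def __init__(self, a=2.0, b=220.166, offset_k=0.0):
        self.a = a                # intercept of the linear model
        self.b = b                # slope of the linear model
        self.offset_k = offset_k  # additional offset, still to be evaluated
        self._accumulated = 0.0   # net rotation since the last video frame

    def on_gyro_sample(self, delta_angle):
        """Called at the (higher) gyroscope rate with a signed angle change."""
        self._accumulated += delta_angle

    def on_frame_start(self):
        """Called before the optical tracker starts on the next frame."""
        shift = self.a + self.b * abs(self._accumulated) + self.offset_k
        self._accumulated = 0.0   # start collecting for the following frame
        return 2 * ceil(shift)
```

Summing signed changes means that rotations in opposite directions within one frame interval cancel out; whether this is desirable would have to be evaluated.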

Next we will look at how we can use the information gathered in the user study for<br />

natural feature tracking and table top AR in general.<br />

7.1.2 Natural Feature Tracking<br />

One thing that all natural feature tracking algorithms have in common is that certain<br />

feature points are tracked. The texture tracking of the ARToolkit is a special case, because it<br />

only enables the tracking of preprocessed 2D textures. In the related work section 3.5<br />

we considered other algorithms as well. The basic assumption for all algorithms to which our<br />

idea can be applied is that the inter-frame displacement is small. Our movement model is<br />

very simple: it expects the feature point to be almost at the same position. This assumption<br />

is realistic in table top AR, because the field of interest is restricted to the horizontal table top<br />

setup. We also discussed other techniques, like the tracking of whole regions, to address<br />

heavier motions. It still has to be evaluated whether our approach is suited for these algorithms as<br />

well. The better alternative for allowing larger movements with a hybrid tracking setup would<br />

be to predict the camera movement from the gyroscope data. We also showed examples<br />

of such setups.<br />

7.1.3 Table Top Augmented Reality<br />

In our special case we used the Magic Book, which is a table top application where the user<br />

does not have to solve a certain task. As far as the setup allows, the user can move around freely<br />

and explore the 3D virtual scenes without any restrictions. In the user study we restricted<br />

the usage by asking the test participants to fulfill certain tasks, which forced them to use<br />

the Magic Book in a certain way. With this method we achieved a categorization of tasks<br />

suited for the Magic Book. As we found out, these tasks are related to different movements<br />

that influence the optical tracking. We still have to validate whether these tasks are also suited for<br />

other applications. We have also discussed the huge potential of table top AR, which has to<br />

be addressed by future research.<br />

Let us look at another table top application. An ISMAR 2004 demo introduced a chess<br />

game combining Augmented and Virtual Reality [15]. One player sits in front of his<br />

chessboard using a graphical output device. He moves his own tangible chessmen,<br />

while the chessmen of the opponent are virtual objects. If a natural feature tracking<br />

technique were used to align the virtual chessmen on the chessboard, for example<br />

by tracking the edges of the chessboard, we could apply the results of our user study. Since<br />

the user probably wants an overview of the whole scene most of the time, we can apply the<br />

overview task configuration. If the chessmen are nicely animated, the user might have a closer<br />

look at the pawns in the game. We can then switch to the detail task configuration. We can<br />

assume that every application has a typical usage pattern that is known to the developer. This information<br />

can be derived from studies and used for a configuration. Of course it is more difficult<br />


if a user does not have to accomplish a measurable goal, as in the Magic Book. If we consider<br />

an augmented exhibition, for example, it might not be obvious where a user looks first or whether<br />

he will zoom closer to the object. Later on we will also discuss how virtual content might<br />

enforce actions by the user.<br />

7.1.4 Assessment of our Approach<br />

In comparison to the related work described in 3.5 our approach was not to improve the<br />

tracking itself by using hybrid tracking technologies, but to improve a specific class of applications<br />

by considering the user context. In our case the user context was the usage of<br />

the handheld device. This usage is estimated by interpreting relative orientation given by<br />

an additional gyroscope tracker. The main part of the thesis was to design, conduct and<br />

evaluate a user study to explore a relationship between both tracking measurements. During<br />

the evaluation phase we figured out that the movement context and the behavior of the<br />

tracking routine are related. It makes sense to consider this movement context in a runtime<br />

environment provided by the proposed architecture. As discussed, the delay of<br />

the underlying tracking techniques has to be measured and taken into account in further studies. This<br />

will lead to better results. In the research area of human-computer interaction many further<br />

studies have to be made to understand users better. We provided some suitable ideas<br />

which can be taken into account for future evaluations.<br />

In my opinion the issue of how humans interact with computer systems will become more<br />

and more important with the evolution of new interaction methods used by AR applications.<br />

On the other hand, humans have always adapted to new technologies and are used to learning how<br />

new technologies work. Thus it is also a question to what degree we can expect people to<br />

learn new interaction techniques.<br />

7.2 Future Work<br />

In this work we tried to characterize the user behavior for a specific application class. Still,<br />

many factors influencing the behavior have not been considered. Furthermore, we will have a<br />

look at a technique to make the user perform certain actions.<br />

7.2.1 Factors for User Behavior<br />

First we introduce the most important factors influencing the user behavior:<br />

• Hardware for graphical output<br />

For our studies this factor was fixed to the handheld display. Other hardware for user<br />

interfaces might be used in a totally different way. One opportunity for future research is to<br />

repeat the study with a head mounted display and a tablet PC as well. The results<br />

have to be compared and the linear models have to be adapted to the corresponding<br />

user interface hardware.<br />

• Tasks<br />


In our work we tried to categorize tasks for table top AR. We varied these tasks and<br />

found that the logged data varies between them. These tasks have to be<br />

validated, or even rejected, for other applications as well.<br />

• Virtual Content<br />

This parameter was also almost fixed during the study. Of course the scenes change<br />

from page to page, but the content was comparable. The content could be varied<br />

as well. Shapes, animations and textures could influence the user behavior by catching his<br />

attention. Experiments could therefore be made with altered properties of the virtual<br />

scene in each test condition.<br />

• User<br />

These are factors that depend on the individual user, like psychological factors or the knowledge<br />

of a user. It is rather difficult to measure these factors. Performing such<br />

studies is an interdisciplinary task for computer scientists and psychologists. Our<br />

study led to the result that a user learns about the tracking during usage, for example.<br />

These factors can be varied and will probably lead to different results.<br />

7.2.2 Visual Cues<br />

The tasks used in the Magic Book user study were constructed artificially to enforce certain<br />

actions by the user. It is not realistic that a user of the Magic Book performs<br />

these tasks unless he is asked to accomplish them. Thus we do not know in advance what the user will<br />

do, which we would need to know in order to provide a configuration for the runtime setup. One idea is to trigger<br />

actions by changing the virtual content. This leads to another interesting question: “How can<br />

virtual content enforce actions by the user?” If it is possible to use such so-called visual cues,<br />

we can predict actions of the user. Usually visual cues are used to give the user additional<br />

graphical feedback or hints. For example, an AR kitchen project developed at MIT provides<br />

additional visual cues to the user in order to improve performance. It highlights<br />

the drawer containing a needed ingredient, for example [12]. In contrast, we want to use visual cues<br />

as a mechanism to trigger actions by the user. In the chess application mentioned earlier, the<br />

system highlights a chessman if the user has to take it out of the game [15]. Thus we expect<br />

that the user will take the chessman out of the game. Another idea is to animate a chessman<br />

when it is moved. It is more likely that the user will focus on this animation than on other<br />

features of the scene. Again this will be more difficult with museum exhibition applications,<br />

but there, too, chains of actions can be enforced by changing the content.<br />

All of these ideas have to be supported by corresponding studies.<br />

7.2.3 Next Steps<br />

The following steps give a rough guideline for continuing this research.<br />

1. User Study<br />

The user study should be repeated with the following modifications. First, the tracking<br />

delay of both trackers must be considered. Then we can vary some of the factors<br />


described above. In a first test condition the study is conducted with the handheld device<br />

again, in a second test condition with a tablet PC. The resulting evaluation data should be<br />

compared. The content could also be changed in a second study. The first condition<br />

would evaluate the fairy tale Magic Book, while the second condition would use<br />

different content. Experiments with visual cues could be made as well. The evaluation<br />

methods described in this work could be applied.<br />

2. Extend DWARF-Architecture<br />

The DWARF component still has to be supplied with a configuration. At this stage the architecture<br />

only provides the communication mechanisms; the configuration for the<br />

linear model could easily be integrated. A good idea would be to provide a test environment<br />

in which the parameters of the linear model (7.1) can be altered. Another<br />

interesting idea is to let the linear model adapt to the occurring tracking failures.<br />

3. Validate Results<br />

Finally it has to be shown that the dynamically configured Magic Book really leads to<br />

better performance in terms of robustness and computation. A possible method would be<br />

another user study with two test conditions, one without dynamic configuration and<br />

one with the gyroscope setup.<br />
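The idea from step 2, a linear model that adapts to the occurring tracking failures, could be prototyped in such a test environment along the following lines. The update rule (grow the offset after a failure, shrink it slowly after a success) and all constants are assumptions made for this sketch; they are not derived from the user study.

```python
class AdaptiveOffset:
    """Adapts the constant offset of the linear model to tracking failures."""

    def __init__(self, k=0.0, grow=1.0, shrink=0.1, k_max=10.0):
        self.k = k            # current offset in pixels
        self.grow = grow      # increase after a tracking failure
        self.shrink = shrink  # decrease after a successfully tracked frame
        self.k_max = k_max    # upper bound to limit the computation cost

    def update(self, tracking_failed):
        """Update and return the offset after one frame of the optical tracker."""
        if tracking_failed:
            self.k = min(self.k_max, self.k + self.grow)
        else:
            self.k = max(0.0, self.k - self.shrink)
        return self.k
```

Comparing such a rule against a fixed offset would fit the two-condition study proposed in step 3.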

Finally, I hope that this work provides ideas and encouragement to continue parts of<br />

it.<br />


Glossary<br />

Augmented Reality. The goal of Augmented Reality is to enrich the real world by overlaying<br />

it with virtual information.<br />

Calibration. Calibration is the task of defining and configuring parameters that stay constant<br />

during a → tracking task. Especially the integration of different coordinate systems<br />

into a world coordinate system is meant by this term in this thesis.<br />

Correlation. The correlation coefficient of two data sets indicates the relationship between<br />

both measurements. The correlation coefficient r ranges from -1 to 1. |r| ≈ 1 indicates<br />

a strong relationship, |r| ≈ 0 indicates a weak relationship.<br />

DOF. The degrees of freedom describe how much of an object's state a tracking technology<br />

can measure in a three-dimensional environment. 3DOF determines position or orientation of an<br />

object, while 6DOF determines both (see → Pose).<br />

DWARF. The Distributed Wearable Augmented Reality Framework is a component-based<br />

framework enabling fast prototyping for AR applications. These components are<br />

reusable and distributed. A CORBA-based infrastructure provides communication<br />

mechanisms for the components.<br />

Human Computer Interaction. Human Computer Interaction is a discipline in computer research<br />

putting a focus on the design, evaluation and implementation of interactive<br />

computer systems. Thus the interaction between humans and computers is an important<br />

issue.<br />

Immersion. Immersion is a measurement to what degree a user is affected by a virtual or augmented<br />

experience. Psychological factors as well as properties of the setup influence<br />

the immersion.<br />

Inertial Tracking. Based on the law of inertia accelerometers estimate the position of an object.<br />

Important for many applications are gyroscopes measuring the relative orientation<br />

of objects. Relative orientation means that we estimate the rate of turn, not<br />

absolute angles in a world coordinate frame.<br />


Linear Regression. Linear Regression is the estimation of a linear model for two variables X<br />

and Y, where X is the independent (predictor) variable and Y the dependent variable. The<br />

linear model has the form Y = a + bX and values for a and b are approximated.<br />

Natural Feature Tracking. Natural Feature Tracking is an optical or → vision-based tracking<br />

method. Features from the environment are extracted from the video image. In the<br />

next video frame these features have to be found again. While computing the correspondences<br />

between the 2D feature points in the video frame and the 3D objects the<br />

camera pose can be estimated.<br />

Pose. Pose is a data structure containing spatial information of an object. The spatial information<br />

of an object consists of position and orientation.<br />

Registration. Registration is the problem of aligning virtual objects in the real environment.<br />

Two tasks are important for registration: the → calibration of the setup and the tracking.<br />

This is an important issue for the quality of an → Augmented Reality application.<br />

Table Top. Table Top Augmented Reality is a special class of → Augmented Reality applications.<br />

It can be characterized by a horizontal setup with a restricted interaction area.<br />

Main application domains are exhibitions, education, gaming and collaboration.<br />

Tasks. A task is the purpose for the use of a computer application. A user has to accomplish<br />

a certain goal usually motivated by the user himself. In our case we describe the task<br />

to the user.<br />

Texture Tracking. In our context texture tracking is a special case of natural feature tracking.<br />

Two-dimensional textures can be used for the tracking of feature points. In the ARToolkit<br />

version these textures have to be preprocessed first.<br />

Tracking. Tracking is used for the estimation of an object's state. Tracking is a loop process<br />

consisting of the estimation and the update of the target’s state. A tracker is the underlying<br />

technology responsible for the → pose estimation.<br />

Vision-based Tracking. The term optical → tracking is also used for this tracking technology,<br />

which consists of hardware grabbing video frames and image recognition software<br />

analyzing the image and providing → pose information.<br />


A.1 Conduction of the User Study<br />

A.1.1 Questionnaire<br />


User Study<br />

Here is a short overview of the rationale behind the questions asked in the questionnaire<br />

(see figure A.1). All questions except those in part 'A' could be answered on a scale of 1 to 5. In<br />

part 'B' the scale ranged from 1='never heard of it' to 5='experienced developer', in parts<br />

'C' to 'E' from 1='strongly disagree' to 5='strongly agree'. The five-point scale<br />

seemed adequate to us, because the focus of the work was not on the questionnaire and the scale<br />

provides suitable feedback from the participants. The results can be found in section B.1.<br />

A Personal details of participants<br />

These questions concerning age, occupation, gender and handedness were posed to<br />

have an overview of the participating test persons. Thus it can be evaluated whether the<br />

group of participants is suited for the study. For future research the test persons could<br />

be split into groups and the data sets could be compared.<br />

B Background on <strong>Augmented</strong> <strong>Reality</strong><br />

This part of the questionnaire collects information about the previous knowledge<br />

of the test persons. Questions were posed about knowledge of <strong>Augmented</strong> <strong>Reality</strong><br />

in general and knowledge of the Magic Book.<br />

C Behavior of the Magic Book<br />

The intention of these questions was to collect feedback on the tracking and the usability<br />

of the Magic Book. Question C1 addressed usability, C2 the awareness of<br />

jitter in the tracking and C3 the tracking delay. This part<br />

only served to collect the test persons' impressions of these issues. It was not used for<br />

further evaluation.<br />

D Tasks<br />



The question ”The task was easy to perform” indicates whether the test persons<br />

had any problems performing the different tasks.<br />

E Handheld device<br />

Here feedback on the handheld device was gathered. This information can be used for<br />

future evaluation with other user interfaces.<br />

F Comments and encouragements<br />

A.1.2 Instructions and Guideline<br />

The instructions to the user can be seen in figure A.2. Figure A.3 shows the guideline for the<br />

study.<br />




A. Personal Details<br />

Age: ______________ female O lefthanded O<br />

Occupation: ______________ male O righthanded O<br />

B. Background on AR<br />

How familiar am I with <strong>Augmented</strong> <strong>Reality</strong>.<br />

1 (never heard of it) 2 3 4 5 (experienced developer)<br />

How familiar am I with the Magic Book.<br />

1 (never used it) 2 3 4 5 (familiar with technologies used by the Magic<br />

Book)<br />

C. Behaviour of the Magic Book<br />

It is easy to use the Magic Book.<br />

1 (strongly disagree) 2 3 4 5 (strongly agree)<br />

The scenes on the page were always clear and stable<br />

1 (strongly disagree) 2 3 4 5 (strongly agree)<br />

The scenes in the book responded to my movements immediately.<br />

1 (strongly disagree) 2 3 4 5 (strongly agree)<br />

D. Tasks<br />

The free task was easy to perform.<br />

1 (strongly disagree) 2 3 4 5 (strongly agree)<br />

The navigation task was easy to perform.<br />

1 (strongly disagree) 2 3 4 5 (strongly agree)<br />

E. Handheld device<br />

The Handheld device is a very suitable device for interacting with the Magic Book.<br />

1 (strongly disagree) 2 3 4 5 (strongly agree)<br />

F. Comments on the Magic Book, UI, Tasks, etc. (voluntary)<br />

Date: ID: Thank you very much! ☺<br />

Figure A.1: Questionnaire<br />




1. The study will take 10 minutes<br />

2. Records are kept anonymous<br />

3. In this user study we try to evaluate which actions occur when people<br />

are using the Magic Book.<br />

4. Usage:<br />

a. Just use the Handheld device and look at the marker first to obtain<br />

an initial orientation<br />

b. Now you can move freely through the scene<br />

c. If you lose the scene just look on the marker again<br />

5. Steps:<br />

a. You first have a chance to do some training with the Magic Book<br />

b. In the free task you can move around as you like. The goal is to<br />

understand the scene (30 seconds).<br />

c. In the navigation task I will ask some questions concerning the 3D<br />

scene, like “how many trees do you see?” (until finished)<br />

d. 2 free tasks and 2 navigation tasks<br />

e. The other scenes are for your pleasure<br />

6. There is no wrong answer to my questions<br />

7. Data will only be used for the analysis of your movements of the<br />

handheld device<br />

8. Feel free to ask questions any time<br />

9. Comments are very welcome<br />

10. Thank you!<br />

Figure A.2: Instructions to the user study participant<br />




1. Welcome participant<br />

2. Guide him through the study (see instructions)<br />

3. Perform tasks (see below)<br />

4. Questionnaire<br />

5. Ask for comments<br />

6. Give contact details in case of further questions<br />

Scenes (Pages):<br />

1. Practice<br />

2. CASE 1: Free Task<br />

a. 30 seconds<br />

3. for your pleasure (voluntary)<br />

4. CASE 2:<br />


a. Task 1: How many people do you see in this scene (7)?<br />

b. Task 2: What is the haircolour of the woman with the white skirt<br />

c. Task 3: What is the colour of her shoes (blond/blue)?<br />

d. Task 4: How many people wear a headdress (6)?<br />

e. Task 5: How many pieces of wood do you see (4, 5-6)?<br />

5. for your pleasure (voluntary)<br />

6. for your pleasure (voluntary)<br />

7. CASE 3: Free Task<br />

a. 30 seconds<br />

8. CASE 4:<br />


a. Task 1: How many people do you see in this scene (7)?<br />

b. Task 2: How many women and how many men do you see (2/5)?<br />

c. Task 3: What is the eye colour of the man with the purple coat (blue)?<br />

d. Task 4: How many windows and how many doors do you see on the<br />

front of the church (4/2)?<br />

Figure A.3: Guideline through the user study<br />


A.2 Statistical Tools<br />


This section gives an overview of the statistical evaluation methods used in this<br />

thesis. First, correlation is introduced, which measures the degree of a<br />

linear relationship between two sets of data. Regression analysis then provides an estimation model<br />

for this linear relationship. Detailed discussions of these methods can be found<br />

in the corresponding references [18][52]. In my evaluation I mainly used two tools, an open<br />

source statistical analysis tool called GRETL 1 and Matlab 2 . All plots were made with<br />

Matlab.<br />

A.2.1 Correlation<br />

The basic question of a correlation analysis is how two sets of data (X, Y ) are<br />

related to each other. If increasing values of measurement X go along with increasing values of<br />

measurement Y , we can expect a positive relationship between the two data sets. The<br />

correlation method provides a measure of the degree of relationship between two<br />

measurements, the correlation coefficient.<br />

Correlation Coefficient. The correlation coefficient r gives the strength of the linear relationship<br />

between two measurements, with<br />

$$ -1 \le r \le 1 \qquad \text{(A.1)} $$<br />

It is also called the Bravais-Pearson correlation coefficient.<br />

|r| < 0.5 expresses a weak relationship, while |r| ≥ 0.8 expresses a very strong dependency.<br />

The closer the measurements lie to one straight line, the higher |r| will be. If this<br />

straight line has a positive slope (positive correlation), r ≈ 1, and with a negative slope (negative<br />

correlation), r ≈ −1 (see figure A.4). If the measurements show no linear trend, r ≈ 0 and there is<br />

no linear correlation at all.<br />

Figure A.4: The Bravais-Pearson correlation coefficient expresses the degree of linear relationship<br />

between two data sets. The right figure shows a positive correlation, the figure in the middle shows<br />

no correlation and the left figure shows a negative correlation<br />

The correlation coefficient r for the measurement sets (xi, yi), with i = 1, .., n is calculated<br />

as follows:<br />

1 http://gretl.sourceforge.net/<br />

2 http://www.mathworks.com/<br />


$$ r = r_{XY} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}} = \frac{\tilde{s}_{XY}}{\tilde{s}_X \tilde{s}_Y} \qquad \text{(A.2)} $$<br />

with $\bar{x}$ and $\bar{y}$ the mean values of x and y. $\tilde{s}_X$ and $\tilde{s}_Y$ are the standard deviations of the measurements<br />

X and Y , and $\tilde{s}_{XY}$ is their covariance.<br />

$$ \tilde{s}_X = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2} \qquad \text{(A.3)} $$<br />

$$ \tilde{s}_Y = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \bar{y})^2} \qquad \text{(A.4)} $$<br />

$$ \tilde{s}_{XY} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \qquad \text{(A.5)} $$<br />

The calculation of r is provided by most of the statistical analysis tools. After applying<br />

this function we can characterize the degree of linear relationship of our measurements. It is<br />

important to note that the coefficient only gives information on a linear correlation, not on<br />

other forms of dependency. If we want to estimate the linear model now we have to apply a<br />

further technique called linear regression.<br />
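The formulas (A.2) to (A.5) can be followed step by step in a few lines of code. The sketch below is an illustration only: it uses Python instead of the GRETL and Matlab tools used in this thesis, the function name `pearson_r` is chosen here, and the data are the example measurements of table A.1.<br />

```python
import math

# Example measurements from Table A.1 of this appendix.
x = [1, 5, 3, 8, 2, 2, 10, 8, 7, 4]
y = [1, 6, 1, 6, 3, 2, 8, 5, 6, 2]


def pearson_r(xs, ys):
    """Bravais-Pearson coefficient r = s_XY / (s_X * s_Y), see (A.2)-(A.5)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance (A.5) and standard deviations (A.3), (A.4), all with 1/n.
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)) / n
    s_x = math.sqrt(sum((a - mean_x) ** 2 for a in xs) / n)
    s_y = math.sqrt(sum((b - mean_y) ** 2 for b in ys) / n)
    return s_xy / (s_x * s_y)


r = pearson_r(x, y)
print(f"r = {r:.4f}")
```

For these data the result agrees with the value corr(X, Y) reported by GRETL in figure A.7.<br />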

A.2.2 Linear Regression<br />

The goal of the linear regression is to derive a linear model to estimate the value of the<br />

predicted variable Y . This is done by performing a linear transform of the predictor variable<br />

X. The transformation looks like this:<br />

Y = f(X) = a + bX (A.6)<br />

Of course this function is only an approximation, because not all of the measurements<br />

will fall on one straight line (see also figure A.4). Therefore for every pair of data (xi, yi) the<br />

following relationship is applied:<br />

$$ y_i = a + b x_i + e_i \qquad \text{(A.7)} $$<br />

ei is the error that results from fitting the linear relationship, for i = 1, .., n. a<br />

and b are the parameters of the linear regression model. One assumption of this model is<br />

that the error ei cannot be predicted from xi, i.e. the two are independent. The sum of all<br />

errors has to be minimized in order to obtain a ”best” straight line and therefore a ”best”<br />

regression model. The error ei for every measurement yi can be calculated as:<br />

$$ e_i = y_i - \hat{y}_i \qquad \text{(A.8)} $$<br />

where $\hat{y}_i$ is the value predicted by the linear model and yi is the actual measured value.<br />

This error, also called the residual, has to be summed over all data pairs. Since negative<br />

and positive values should not cancel each other, we square the errors, which gives the so-called<br />

squared residuals. Their average Q has to be minimized.<br />


$$ Q(a, b) = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \frac{1}{n} \sum_{i=1}^{n} (y_i - (a + b x_i))^2 \qquad \text{(A.9)} $$<br />

We now have to estimate the values of a and b that result in a minimal Q(a, b). This method<br />

is also called the method of least squares.<br />

The estimates $(\hat{a}, \hat{b})$ of (a, b) can be calculated by setting the partial derivatives of Q with respect to a and b to<br />

0. Both resulting equations have to be solved.<br />

$$ \frac{\partial Q(a, b)}{\partial a} = -\frac{2}{n} \sum_{i=1}^{n} (y_i - (a + b x_i)) = 0 \qquad \text{(A.10)} $$<br />

$$ \frac{\partial Q(a, b)}{\partial b} = -\frac{2}{n} \sum_{i=1}^{n} (y_i - (a + b x_i)) x_i = 0 \qquad \text{(A.11)} $$<br />

Further calculations lead to these two equations:<br />

$$ \frac{1}{n} \sum_{i=1}^{n} y_i - \hat{a} - \hat{b} \, \frac{1}{n} \sum_{i=1}^{n} x_i = 0 \qquad \text{(A.12)} $$<br />

$$ \frac{1}{n} \sum_{i=1}^{n} x_i y_i - \hat{a} \, \frac{1}{n} \sum_{i=1}^{n} x_i - \hat{b} \, \frac{1}{n} \sum_{i=1}^{n} x_i^2 = 0 \qquad \text{(A.13)} $$<br />

Now $\hat{a}$ and $\hat{b}$ can be calculated from these equations. Equation A.12 gives:<br />

$$ \hat{a} = \bar{y} - \hat{b} \bar{x} \qquad \text{(A.14)} $$<br />

If we put this in equation A.13 we can derive $\hat{b}$.<br />

$$ \hat{b} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\tilde{s}_{XY}}{\tilde{s}_X^2} \qquad \text{(A.15)} $$<br />
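The closed-form estimates (A.14) and (A.15) translate directly into code. The following is a minimal sketch under stated assumptions: Python is used for illustration (the thesis itself relied on GRETL), `fit_line` is a name chosen here, and the data are the measurements of table A.1. The sketch also checks numerically that the two conditions (A.10) and (A.11) hold for the estimated parameters.<br />

```python
# Example measurements from Table A.1 of this appendix.
x = [1, 5, 3, 8, 2, 2, 10, 8, 7, 4]
y = [1, 6, 1, 6, 3, 2, 8, 5, 6, 2]


def fit_line(xs, ys):
    """Least-squares estimates (a_hat, b_hat) following (A.14) and (A.15)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys)) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in xs) / n
    b_hat = s_xy / s_xx                # (A.15)
    a_hat = mean_y - b_hat * mean_x    # (A.14)
    return a_hat, b_hat


a_hat, b_hat = fit_line(x, y)

# Residuals e_i = y_i - (a_hat + b_hat * x_i), see (A.8).
residuals = [yi - (a_hat + b_hat * xi) for xi, yi in zip(x, y)]

# Both sums vanish at the minimum of Q, up to rounding error.
sum_e = sum(residuals)                                # condition (A.10)
sum_ex = sum(e * xi for e, xi in zip(residuals, x))   # condition (A.11)

print(f"a_hat = {a_hat:.6f}, b_hat = {b_hat:.6f}")
print(f"sum(e) = {sum_e:.2e}, sum(e*x) = {sum_ex:.2e}")
```

The two parameters agree with the COEFFICIENT values of the GRETL output in figure A.7.<br />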

Now we have calculated both parameters of the linear regression. But we still need an<br />

indicator for the quality of this estimation, because if the error e increases with higher x<br />

values a linear model might not be best suited to model the real world. The idea is to<br />

split the Sum of Squares Total (SQT), which measures the total variance, into two<br />

components.<br />

Sum of Squares Explained (SQE): This is the variance explained by our linear model.<br />

$$ SQE = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 \qquad \text{(A.16)} $$<br />

Sum of Squares Residuals (SQR): This is the remaining dispersion of the yi values around the model.<br />

$$ SQR = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \qquad \text{(A.17)} $$<br />



Sum of Squares Total (SQT): Thus the total variance is the variance described by our linear<br />

model plus the variance we cannot explain.<br />

SQT = SQE + SQR (A.18)<br />

With these values we can calculate the coefficient of determination $R^2$.<br />

$$ R^2 = \frac{SQE}{SQT} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \qquad \text{(A.19)} $$<br />

$R^2$ indicates the part of the total dispersion that can be explained by our linear<br />

model. This coefficient ranges from 0 to 1. If all of our measurements fell on one<br />

straight line, the coefficient would be $R^2 = 1$, because our derived model would match the<br />

measurements to 100%. If the measurements are widely scattered around the regression line, the<br />

model explains little of the dispersion, which results in a low coefficient of determination.<br />

The coefficient of determination $R^2$ is also related to the correlation coefficient $r_{XY}$ for the<br />

measurements X and Y. The proof can be found in [18].<br />

$$ R^2 = r_{XY}^2 \qquad \text{(A.20)} $$<br />

Another way to get a feeling for the quality of the linear model is a graphical plot of the<br />

residuals. If the residuals are close to 0 and vary without any system around the horizontal<br />

axis we can assume a suited model.<br />
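The decomposition (A.18) and the coefficient of determination (A.19) can be checked numerically. The sketch below is illustrative only: it is written in Python rather than with the tools used in this thesis, the variable names are chosen here, and the data are the measurements of table A.1.<br />

```python
# Example measurements from Table A.1 of this appendix.
x = [1, 5, 3, 8, 2, 2, 10, 8, 7, 4]
y = [1, 6, 1, 6, 3, 2, 8, 5, 6, 2]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares parameters, see (A.14) and (A.15).
b_hat = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
a_hat = mean_y - b_hat * mean_x
y_hat = [a_hat + b_hat * xi for xi in x]

sqe = sum((yh - mean_y) ** 2 for yh in y_hat)           # explained part
sqr = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # residual part
sqt = sum((yi - mean_y) ** 2 for yi in y)               # total

r_squared = sqe / sqt                                   # (A.19)

print(f"SQE = {sqe:.4f}, SQR = {sqr:.4f}, SQT = {sqt:.4f}")
print(f"SQE + SQR = {sqe + sqr:.4f}")
print(f"R^2 = {r_squared:.6f}")
```

SQR corresponds to the "Sum of squared residuals" line and $R^2$ to the "Unadjusted R-squared" line of the GRETL output in figure A.7.<br />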

Example<br />

This small example taken from [52] demonstrates the conclusions derived from the<br />

correlation coefficient and the regression model. The following measurements have to be<br />

analyzed concerning a linear dependency (table A.1). Calculations were made with GRETL, the<br />

plots were made with Matlab.<br />

i 1 2 3 4 5 6 7 8 9 10<br />

xi 1 5 3 8 2 2 10 8 7 4<br />

yi 1 6 1 6 3 2 8 5 6 2<br />

<strong>Table</strong> A.1: Example data measurements: 10 test samples were made for variable X and Y<br />

Figure A.5 shows the resulting plot with the linear regression model. Figure A.6<br />

shows the residuals. These residuals are distributed randomly, there is no obvious<br />

dependency between x and the values of the residuals. Putting the data into GRETL results<br />

in the output shown in figure A.7.<br />

The correlation coefficient is r = 0.8934, which is an indicator of a strong relationship between<br />

both variables. We can also extract the linear model for the prediction of variable Y from<br />

this output (COEFFICIENT-column in figure A.7).<br />

y = f(x) = 0.395349 + 0.720930x<br />



Figure A.5: The plotted data with the corresponding linear regression model<br />

Figure A.6: The plot shows the values of the residuals for every predicted y value<br />



Model 4: OLS estimates using the 10 observations 1-10<br />

Dependent variable: Y<br />


0) const 0,395349 0,742950 0,532 0,609090<br />

1) X 0,720930 0,128171 5,625 0,000496 ***<br />

Mean of dependent variable = 4<br />

Standard deviation of dep. var. = 2,49444<br />

Sum of squared residuals = 11,3023<br />

Standard error of residuals = 1,18861<br />

Unadjusted R-squared = 0,798173<br />

Adjusted R-squared = 0,772944<br />

Degrees of freedom = 8<br />

corr(X, Y) = 0,8934<br />

Figure A.7: The output from the GRETL tool: All the important figures concerning the linear regression<br />

model are shown<br />

The coefficient of determination results in $R^2 = r^2 = 0.798173$, so the linear model explains about 80% of the<br />

total dispersion. This lets us conclude that we found a well-suited model.<br />

The GRETL output also reports a p-value p. This p-value is the probability of obtaining<br />

a test statistic at least as extreme as the observed one if the null hypothesis of the t-test holds. The null hypothesis H0 in our<br />

case means that the measurements are not related. In the linear model Y = a + bX it means<br />

that b = 0, i.e. Y cannot be explained by the predictor X:<br />

H0 : b = 0<br />

H1 : b ≠ 0<br />

(A.21)<br />

Thus the p-value indicates whether we should retain or reject H0. GRETL automatically<br />

performs this t-test. In our example the p-value for X explaining Y is 0.000496. If this p-value<br />

is below a chosen significance level α we can reject H0. GRETL evaluates the t-test<br />

for three significance levels α = {0.1, 0.05, 0.01}. The significance level is the probability we<br />

accept of wrongly rejecting H0. If p < α, H0 is rejected. In our example p is lower<br />

than all the values for α. This is indicated by the three ∗ behind the p-value.<br />

Hence we can conclude that X and Y are related and that we can accept H1. The lower<br />

p, the more confidence can be put in the assumption that the measurements are linearly<br />

related. For the calculation of this value and for hypothesis testing in general, please refer to<br />

the corresponding literature.<br />
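As a rough illustration of how the t-test figures in figure A.7 arise, the sketch below recomputes the standard error of the slope, the t statistic and a two-sided p-value for the data of table A.1. It is an assumption-laden sketch, not the procedure GRETL implements internally: the function names `t_pdf` and `two_sided_p` are chosen here, and the p-value is obtained by numerically integrating the density of Student's t distribution with n − 2 degrees of freedom.<br />

```python
import math

# Example measurements from Table A.1 of this appendix.
x = [1, 5, 3, 8, 2, 2, 10, 8, 7, 4]
y = [1, 6, 1, 6, 3, 2, 8, 5, 6, 2]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
b_hat = s_xy / s_xx
a_hat = mean_y - b_hat * mean_x

df = n - 2  # two estimated parameters: a and b
sqr = sum((yi - (a_hat + b_hat * xi)) ** 2 for xi, yi in zip(x, y))
se_b = math.sqrt(sqr / df / s_xx)   # standard error of the slope
t_stat = b_hat / se_b               # test statistic for H0: b = 0


def t_pdf(t, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + t * t / nu) ** (-(nu + 1) / 2)


def two_sided_p(t, nu, steps=20000):
    """P(|T| > |t|), using Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, nu) + t_pdf(upper, nu)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, nu)
    return 1.0 - 2.0 * (total * h / 3.0)


p_value = two_sided_p(t_stat, df)
print(f"se(b) = {se_b:.6f}, t = {t_stat:.3f}, p = {p_value:.6f}")
```

Since the computed p is below all three significance levels α = {0.1, 0.05, 0.01}, H0 would be rejected at each of them, in line with the three ∗ in the GRETL output.<br />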



Complete Results<br />

In this section of the appendix the complete results of the questionnaire and the analysis of<br />

the linear regression of the data sets can be examined.<br />

B.1 Questionnaire<br />

Here is the data extracted from the questionnaires of the user study.<br />

A Personal Details of participants<br />

1. Gender<br />

male female<br />

80% 20 %<br />

2. Age<br />

20-22 23-25 26-28 29-31 32+<br />

10% 40% 25% 10 % 15 %<br />

3. Hand<br />

left right<br />

95% 5 %<br />

4. Occupation<br />

Student Employee Researcher Else<br />

70 % 10% 10% 10%<br />

B Background on <strong>Augmented</strong> <strong>Reality</strong><br />

1. How familiar am I with <strong>Augmented</strong> <strong>Reality</strong>?<br />

1 2 3 4 5<br />

10% 25% 15% 5% 45%<br />

(1=never heard of it, 5=experienced developer)<br />



2. How familiar am I with the Magic Book?<br />

1 2 3 4 5<br />

5% 30% 15% 15% 35%<br />

(1=never heard of it, 5=experienced developer)<br />

C Behavior of the Magic Book<br />

1. It was easy to use the Magic Book<br />

1 2 3 4 5<br />

0% 0% 45% 40% 15%<br />

(1=strongly disagree, 5=strongly agree)<br />

2. The scenes on the page were always clear and stable<br />

1 2 3 4 5<br />

5% 35% 45% 15% 0%<br />

(1=strongly disagree, 5=strongly agree)<br />

3. The scenes in the book responded to my movements immediately<br />

1 2 3 4 5<br />

0% 5% 10% 55% 30%<br />

(1=strongly disagree, 5=strongly agree)<br />

D Tasks<br />

1. The free task was easy to perform<br />

1 2 3 4 5<br />

0% 0% 10% 30% 60%<br />

(1=strongly disagree, 5=strongly agree)<br />

2. The navigation task was easy to perform<br />

1 2 3 4 5<br />

0% 0% 15% 65% 20%<br />

(1=strongly disagree, 5=strongly agree)<br />

E Handheld device<br />

1. The handheld device is a very suitable device for interacting with the Magic Book<br />

1 2 3 4 5<br />

5% 35% 45% 15% 0%<br />

(1=strongly disagree, 5=strongly agree)<br />

F Comments and encouragements<br />

Here are some comments made on the Magic Book, the handheld device and the user<br />

study itself:<br />

cables were disturbing, use of a head mounted display as comparison, cables limit the<br />

range of movement, handheld device is too heavy, tracking fails too often, cool book,<br />

some colors do not look good in the handheld display, screen was a little blurry, difficult<br />

to see the whole scene, giant was not completely visible, image disappears, not stable,<br />

jittering and tracking failure is annoying, heavy device is hard to keep stable, would<br />

be good not to hold anything, :-), tasks were good<br />


B.2 Cases<br />


In this section all the cases with the corresponding tasks are described. The results of the<br />

linear regression are shown as well.<br />

B.2.1 Case 1<br />

Case 1 was the first free task without any questions about the virtual scene (figure B.1). Figure B.2<br />

shows the scattered plot with the regression line and a plot of the residuals. The GRETL<br />

output is shown in figure B.3.<br />

Figure B.1: Case 1: Virtual Scene<br />

Figure B.2: Case 1: The scattered plot with regression line and the plot of the residuals<br />



Model 1: OLS estimates using the 16327 observations 1-16327<br />

Dependent variable: Vector<br />


0) const 1,56895 0,0274115 57,237 < 0,00001 ***<br />

2) Angle 144,624 3,30650 43,739 < 0,00001 ***<br />

Mean of dependent variable = 2,31546<br />

Standard deviation of dep. var. = 2,89688<br />

Sum of squared residuals = 122635<br />

Standard error of residuals = 2,74082<br />

Unadjusted R-squared = 0,104897<br />

Adjusted R-squared = 0,104843<br />

Degrees of freedom = 16325<br />

Pairwise correlation coefficients:<br />

corr(Vector, Angle) = 0,3239<br />

Figure B.3: The output from the GRETL tool: Case 1<br />
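Several figures in a GRETL output of this kind are linked by simple relations, which helps when reading the listings in this section. The sketch below recomputes three of them for Case 1 from the other values reported in figure B.3. It is a consistency check added for illustration, not part of the original evaluation; the variable names are chosen here and the decimal commas of the output are written as decimal points.<br />

```python
import math

# Values copied from the GRETL output for Case 1 (figure B.3).
n_obs = 16327
coef_angle = 144.624        # coefficient of the predictor "Angle"
se_angle = 3.30650          # its standard error
sum_sq_resid = 122635.0     # sum of squared residuals
r_squared = 0.104897        # unadjusted R-squared
corr = 0.3239               # corr(Vector, Angle)

df = n_obs - 2              # degrees of freedom: two estimated parameters

# t statistic = coefficient / standard error
t_ratio = coef_angle / se_angle

# standard error of residuals = sqrt(sum of squared residuals / df)
se_resid = math.sqrt(sum_sq_resid / df)

# adjusted R-squared for a model with one predictor
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df

print(f"t ratio        = {t_ratio:.3f}")
print(f"std. err. res. = {se_resid:.5f}")
print(f"adjusted R^2   = {adj_r_squared:.6f}")
print(f"corr^2         = {corr ** 2:.6f}")
```

Up to rounding of the printed inputs, the recomputed values match the reported t statistic, standard error of residuals and adjusted R-squared, and the squared correlation matches the unadjusted R-squared as stated in (A.20).<br />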


B.2.2 Case 2<br />


Figure B.4 shows the virtual scene for Case 2. With this scene the test persons had to<br />

perform several tasks. Questions were posed to the test person.<br />

Figure B.4: Case 2: Virtual Scene<br />

• Task 1<br />

Question: ”How many people do you see in the scene ?” (plot: B.5, output: B.6)<br />

Figure B.5: Case 2 / Task 1: The scattered plot with regression line and the plot of the residuals<br />



Model 1: OLS estimates using the 5773 observations 1-5773<br />

Dependent variable: Vector<br />


0) const 1,26631 0,0238207 53,160 < 0,00001 ***<br />

2) Angle 194,822 3,12817 62,280 < 0,00001 ***<br />

Mean of dependent variable = 2,11212<br />

Standard deviation of dep. var. = 1,9226<br />

Sum of squared residuals = 12759,7<br />

Standard error of residuals = 1,48694<br />

Unadjusted R-squared = 0,401955<br />

Adjusted R-squared = 0,401851<br />

Degrees of freedom = 5771<br />

Pairwise correlation coefficients:<br />

corr(Vector, Angle) = 0,6340<br />

Figure B.6: The output from the GRETL tool: Case 2 / Task 1<br />


• Task 2<br />


Question: ”What is the haircolor of the woman with the white skirt ?” (plot: B.7, output:<br />

B.8)<br />

Figure B.7: Case 2 / Task 2: The scattered plot with regression line and the plot of the residuals<br />

Model 1: OLS estimates using the 5757 observations 1-5757<br />

Dependent variable: Vector<br />


0) const 1,45847 0,0269841 54,049 < 0,00001 ***<br />

2) Angle 177,196 3,53711 50,096 < 0,00001 ***<br />

Mean of dependent variable = 2,24338<br />

Standard deviation of dep. var. = 1,99741<br />

Sum of squared residuals = 15991,1<br />

Standard error of residuals = 1,66692<br />

Unadjusted R-squared = 0,303659<br />

Adjusted R-squared = 0,303538<br />

Degrees of freedom = 5755<br />

Pairwise correlation coefficients:<br />

corr(Vector, Angle) = 0,5511<br />

Figure B.8: The output from the GRETL tool: Case 2 / Task 2<br />


• Task 3<br />


Question: ”What is the color of the shoes of the woman with the white skirt ?” (plot:<br />

B.9, output: B.10)<br />

Figure B.9: Case 2 / Task 3: The scattered plot with regression line and the plot of the residuals<br />

Model 1: OLS estimates using the 3927 observations 1-3927<br />

Dependent variable: Vector<br />


0) const 1,78150 0,0423130 42,103 < 0,00001 ***<br />

2) Angle 193,941 4,95880 39,110 < 0,00001 ***<br />

Mean of dependent variable = 2,79864<br />

Standard deviation of dep. var. = 2,46541<br />

Sum of squared residuals = 17171,3<br />

Standard error of residuals = 2,09161<br />

Unadjusted R-squared = 0,280428<br />

Adjusted R-squared = 0,280245<br />

Degrees of freedom = 3925<br />

Pairwise correlation coefficients:<br />

corr(Vector, Angle) = 0,5296<br />

Figure B.10: The output from the GRETL tool: Case 2 / Task 3<br />


• Task 4<br />


Question: ”How many people wear a hat or a headdress ?” (plot: B.11, output: B.12)<br />

Figure B.11: Case 2 / Task 4: The scattered plot with regression line and the plot of the residuals<br />

Model 1: OLS estimates using the 6456 observations 1-6456<br />

Dependent variable: Vector<br />


0) const 1,76134 0,0320220 55,004 < 0,00001 ***<br />

2) Angle 220,166 3,70264 59,462 < 0,00001 ***<br />

Mean of dependent variable = 2,89062<br />

Standard deviation of dep. var. = 2,57711<br />

Sum of squared residuals = 27697,3<br />

Standard error of residuals = 2,07159<br />

Unadjusted R-squared = 0,353936<br />

Adjusted R-squared = 0,353836<br />

Degrees of freedom = 6454<br />

Pairwise correlation coefficients:<br />

corr(Vector, Angle) = 0,5949<br />

Figure B.12: The output from the GRETL tool: Case 2 / Task 4<br />


• Task 5<br />


Question: ”How many pieces of wood do you see ?” (plot: B.13, output: B.14)<br />

Figure B.13: Case 2 / Task 5: The scattered plot with regression line and the plot of the residuals<br />

Model 1: OLS estimates using the 6641 observations 1-6641<br />

Dependent variable: Vector<br />


0) const 1,34886 0,0262689 51,348 < 0,00001 ***<br />

2) Angle 210,083 3,21658 65,312 < 0,00001 ***<br />

Mean of dependent variable = 2,19743<br />

Standard deviation of dep. var. = 2,38431<br />

Sum of squared residuals = 22981,7<br />

Standard error of residuals = 1,86054<br />

Unadjusted R-squared = 0,391181<br />

Adjusted R-squared = 0,391089<br />

Degrees of freedom = 6639<br />

Pairwise correlation coefficients:<br />

corr(Vector, Angle) = 0,6254<br />

Figure B.14: The output from the GRETL tool: Case 2 / Task 5<br />


B.2.3 Case 3<br />


Case 3 (figure B.15) was again a free task without any questions posed to the test person (plot: B.16,<br />

output: B.17).<br />

Figure B.15: Case 3: Virtual Scene<br />

Figure B.16: Case 3: The scattered plot with regression line and the plot of the residuals<br />



Model 1: OLS estimates using the 14743 observations 1-14743<br />

Dependent variable: Vector<br />


0) const 1,55991 0,0190527 81,873 < 0,00001 ***<br />

2) Angle 127,487 2,00371 63,625 < 0,00001 ***<br />

Mean of dependent variable = 2,24612<br />

Standard deviation of dep. var. = 2,15298<br />

Sum of squared residuals = 53611,2<br />

Standard error of residuals = 1,90706<br />

Unadjusted R-squared = 0,215452<br />

Adjusted R-squared = 0,215399<br />

Degrees of freedom = 14741<br />

Pairwise correlation coefficients:<br />

corr(Vector, Angle) = 0,4642<br />

Figure B.17: The output from the GRETL tool: Case 3<br />


B.2.4 Case 4<br />


The scene for case 4 can be seen in figure B.18. Several tasks had to be performed again in this<br />

case and the corresponding questions were posed.<br />

Figure B.18: Case 4: Virtual Scene<br />

• Task 1<br />

Question: ”How many people do you see in the scene ?” (plot: B.19, output: B.20)<br />

Figure B.19: Case 4 / Task 1: The scattered plot with regression line and the plot of the residuals<br />



Model 1: OLS estimates using the 5284 observations 1-5284<br />

Dependent variable: Vector<br />


0) const 1,16379 0,0269233 43,226 < 0,00001 ***<br />

2) Angle 190,290 3,79175 50,185 < 0,00001 ***<br />

Mean of dependent variable = 1,93473<br />

Standard deviation of dep. var. = 1,95301<br />

Sum of squared residuals = 13644,7<br />

Standard error of residuals = 1,60725<br />

Unadjusted R-squared = 0,322868<br />

Adjusted R-squared = 0,32274<br />

Degrees of freedom = 5282<br />

Pairwise correlation coefficients:<br />

corr(Vector, Angle) = 0,5682<br />

Figure B.20: The output from the GRETL tool: Case 4 / Task 1<br />


• Task 2<br />


Question: ”How many women and how many men do you see?” (plot: B.21, output:<br />

B.22)<br />

Figure B.21: Case 4 / Task 2: The scattered plot with regression line and the plot of the residuals<br />

Model 1: OLS estimates using the 9356 observations 1-9356<br />

Dependent variable: Vector<br />


0) const 1,48687 0,0198384 74,949 < 0,00001 ***<br />

2) Angle 131,274 2,39249 54,869 < 0,00001 ***<br />

Mean of dependent variable = 2,04073<br />

Standard deviation of dep. var. = 1,89916<br />

Sum of squared residuals = 25525,9<br />

Standard error of residuals = 1,65193<br />

Unadjusted R-squared = 0,243489<br />

Adjusted R-squared = 0,243408<br />

Degrees of freedom = 9354<br />

Pairwise correlation coefficients:<br />

corr(Vector, Angle) = 0,4934<br />

Figure B.22: The output from the GRETL tool: Case 4 / Task 2<br />


• Task 3<br />


Question: ”What is the eye color of the man with the purple coat ?” (plot: B.23, output:<br />

B.24)<br />

Figure B.23: Case 4 / Task 3: The scattered plot with regression line and the plot of the residuals<br />

Model 1: OLS estimates using the 9535 observations 1-9535<br />

Dependent variable: Vector<br />


0) const 1,74120 0,0276468 62,980 < 0,00001 ***<br />

2) Angle 130,608 2,75606 47,390 < 0,00001 ***<br />

Mean of dependent variable = 2,44667<br />

Standard deviation of dep. var. = 2,52852<br />

Sum of squared residuals = 49333,1<br />

Standard error of residuals = 2,27486<br />

Unadjusted R-squared = 0,190662<br />

Adjusted R-squared = 0,190577<br />

Degrees of freedom = 9533<br />

Pairwise correlation coefficients:<br />

corr(Vector, Angle) = 0,4366<br />

Figure B.24: The output from the GRETL tool: Case 4 / Task 3<br />


• Task 4<br />


Question: ”How many windows and how many doors do you see on the front of the<br />

church ?” (plot: B.25, output: B.26)<br />

Figure B.25: Case 4 / Task 4: The scattered plot with regression line and the plot of the residuals<br />

Model 1: OLS estimates using the 5140 observations 1-5140<br />

Dependent variable: Vector<br />


0) const 1,49365 0,0361831 41,280 < 0,00001 ***<br />

2) Angle 197,812 4,51415 43,820 < 0,00001 ***<br />

Mean of dependent variable = 2,30959<br />

Standard deviation of dep. var. = 2,60671<br />

Sum of squared residuals = 25419,1<br />

Standard error of residuals = 2,22425<br />

Unadjusted R-squared = 0,272055<br />

Adjusted R-squared = 0,271914<br />

Degrees of freedom = 5138<br />

Pairwise correlation coefficients:<br />

corr(Vector, Angle) = 0,5216<br />

Figure B.26: The output from the GRETL tool: Case 4 / Task 4<br />


