Diploma Thesis: Improving Augmented Reality Table Top ... - TUM

Technische Universität 

München 

Fakultät für Informatik 

Diplomarbeit 

❞ ❞ ❞ ❞ 

❞ ❞ 

❞ ❞ 

❞ 

❞ 

❞ 

❞ ❞ ❞ 

❞ 

❞ 

❞ 

❞ ❞ 

Improving Augmented Reality Table 

Top Applications with Hybrid 

Tracking 

Felix Löw

Technische Universität 

München 

Fakultät für Informatik 

Diplomarbeit 

❞ ❞ ❞ ❞ 

❞ ❞ 

❞ ❞ 

❞ 

❞ 

❞ 

❞ ❞ ❞ 

❞ 

❞ 

❞ 

❞ ❞ 

Improving Augmented Reality Table 

Top Applications with Hybrid 

Tracking 

Felix Löw 

Aufgabensteller: Univ-Prof. Gudrun Klinker, Ph.D. 

Betreuer: Dr. Martin Wagner 

Abgabedatum: 11. Mai 2005

Erklärung 

Ich versichere, dass ich diese Ausarbeitung der Diplomarbeit selbstständig verfasst und nur 

die angegebenen Quellen und Hilfsmittel verwendet habe. 

München, den 11. Mai 2005 Felix Löw

Zusammenfassung 

Anwendungen der Erweiterten Realität (Augmented Reality) bereichern die reale Welt durch 

die Überlagerung mit virtuellen Objekten. Um die Verschmelzung von realer und virtueller 

Umgebung betrachten zu können, werden grafische Ausgabehardware wie Head Mounted 

Display oder Tablet PC, sowie Tracking Technologien zur Bestimmung von Position und 

Orientierung von verfolgten Objekten verwendet. Häufig verwendete bild-basierte Tracking 

Verfahren wie Natural Feature Tracking sind fehleranfällig für Kamerabewegungen. Bestimmte 

Merkmale (Features) werden über mehrere Bildsequenzen verfolgt. Die Grundidee 

dieser Arbeit ist, den Suchbereich für das Finden dieser Merkmale an die Bewegungsveränderung 

der Interaktionshardware anzupassen. Diese Arbeit ist ein erster Schritt dieses 

Problem für eine spezielle Anwendungsklasse für Augmented Reality zu lösen, Table- 

Top Augmented Reality. Diese Arbeit schlägt einen hybriden Trackingansatz vor, um beides, 

das Tracking und den Bewegungskontext des Benutzers zu berücksichtigen. Die gemessene 

Orientierung eines zusätzlichen Trackers wird für eine dynamische Laufzeitanpassung des 

bild-basierten Trackingverfahrens, das Tracking von Texturen, verarbeitet. Hierzu wird eine 

Software Architektur vorgeschlagen, die dies ermöglicht. 

Nach einer Einführung in Table-Top Augmented Reality erörtern wir den Aufbau und 

Auswertung einer Benutzerstudie. Ziel dabei ist es eine Annäherung für eine lineare Abbildung 

von Benutzerbewegung und Suchfenster des Texturentrackings zu bestimmen. Dabei 

werden statistische Analysemethoden verwendet um diese Abbildung zu finden. Diese 

Abbildung kann in einer einfachen linearen Funktion mit der Orientierungsänderung als 

Eingabeparameter ausgedrückt werden. Zusätzlich wird die Beziehung zwischen dem Benutzerverhalten 

und ausgeführten Aufgabe untersuchen. Ferner werden Aufgaben in Table- 

Top Anwendungen identifiziert und Konsequenzen für das bild-basierte Trackingverfahren 

gefolgert.

Abstract 

Augmented Reality (AR) applications enrich the real world by augmenting virtual objects. In 

order to gaze this fusion of real environment and virtual content Augmented Reality setups 

utilize common graphical output hardware like Head Mounted Displays or Tablet PC and 

tracking technologies to estimate the position and orientation of tracking targets. Frequently 

used vision-based techniques like Natural Feature Tracking are error-prone to camera movements. 

Features have to be found in subsequent video frames again. Basic idea of this work 

is to adopt the search area for features to the change in orientation of the user interface hardware. 

This work is a first step to solve this problem for a special class of Augmented Reality 

applications, Table Top Augmented Reality. The work provides a hybrid tracking approach 

to bring tracking and the user’s movement context together. Orientation information given 

by an additional tracker is used and applied for a dynamic configuration during runtime of 

the vision-based tracking routine, a texture tracking algorithm. To accomplish this a special 

software architecture is proposed. 

After we introduced the basic ideas of table top Augmented Reality we show the design, 

the execution and evaluation of a user study. Goal is to find an approximation for a linear 

mapping between user motion and search window of the texture tracking routine. Applying 

statistical techniques we will show that it is possible to derive such a mapping. This mapping 

can be expressed by a simple linear function with the change of orientation as input 

parameter. We will also evaluate that the user behavior is related to the performed tasks. We 

will identify tasks for Table Top AR and discuss implications for the tracking routine.

Purpose of This Document 

Preface 

This work was written as diploma thesis, which is adequate to a Masters Thesis, at the Technische 

Universität München at Prof. Gudrun Klinkers Augmented Reality Research Group 

(Chair for Computer Aided Medical Procedures and Augmented Reality). The work was 

advised by Dr. Martin Wagner. The ideas for this thesis evolved from fruitful discussions 

with him. 

The thesis was accomplished in cooperation with the Human Interface Technology Laboratory 

New Zealand (HIT Lab New Zealand 1 ) in Christchurch. From September 2004 until 

February 2005 I have been at the HIT Lab in New Zealand developing and conducting the 

main parts of thesis. This part was supervised by Prof. Mark Billinghurst. 

This abidance in New Zealand for this thesis was financially supported by a scholarship 

for ”Kurzfristige Studienaufenthalt für Abschlussarbeiten” by the Deutscher Akademischer 

Austauschdienst (DAAD) 2 . 

In this thesis I would like to show my ideas behind my approach, document my results 

and draw conclusions for future implications of my work. 

1 www.hitlabnz.org 

2 www.daad.de 

i

Target Audience 

General Readers who are interested in Augmented Reality should read chapter 1 and 2. These 

sections give an overview of the basic terms and technologies and introduce the ideas 

of my thesis 

Readers interested in Hybrid Tracking should read chapter 3 where my approach is explained 

and categorized in the related work. Succeeding steps and evaluation of my work are 

described in chapters 3, 5 and 6. 

Human Computer Interaction Researchers should read chapter 3, where our user study based 

approach is explained. The design, execution and evaluation of the study is shown in 

chapters 5 and 6. 

Acknowledgments 

First of all I would like to thank my buddy Michael ”Siggä” Siggelkow for struggling together 

through the whole studies and through New Zealand. Thanks for helping, motivating, 

partying and being a friend. 

I would like to thank Martin Wagner who advised this thesis. Although it is a hard task to 

advise a thesis on the other side of the world I think you did a very good job. Thanks a lot! 

Heaps of thanks to Mark Billinghurst for giving me the opportunity to come the HIT Lab in 

New Zealand and for his kind hospitality. Thanks to Gudrun Klinker as well for supporting 

and enabling the stay in New Zealand. 

Here is my big respect for the HIT Lab crew. Most of all I would like to thank Raphaël 

”le docteur” Grasset and Phil Lamb for helping me so much with my user study. Special 

thanks to Anna-Lee Mason and Nathan Gardiner, the good souls of the HIT Lab. Thanks to 

all people working or being involved at the HIT Lab. 

Thanks to all the people I met and I can call my friends now. Special thanks to my great 

swedish flatmates Johan Karlsson and Mikael Seleg˚ard. Ett stor tack! Thanks a lot to Claudia 

Nelles for heaps of things. Also thanks to the rest of the gang in our room: Thomas ”The 

German” Zurbrügg and Michael ”Intern of the month” Herchel. Also a huge thank you to 

Jonno Hill, Raphaël Grasset, Sofi Crosley, Jörg Hauber, Anna-Lee Mason, Phil Lamb, Marcel 

Lancelle, Matt Keir, Tobi Gefken, Oakley Buchmann and Nathan Gardiner. Thanks to all the 

people from the Canterbury University Tramping Club and the Wednesday soccer team. I 

had a great time with all of you. 

I would like to thank all my friends here in Munich. Special thanks to the crowd that came 

all the way to New Zealand to look after us. 

Last I would like to thank my family for everything and making me feel at home again. 

Special thanks to my brother Martin for showing me the secrets of statistics. 

ii

Figure 0.1: New Zealand, Mount Cook National Park 

iii 

Garching, May 2005 

Felix Löw

Contents 

1 Introduction 1 

1.1 Overview of Augmented Reality . . . . . . . . . . . . . . . . . . . . . . . . . . 1 

1.2 Tracking in Augmented Reality . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 

1.2.1 Representation of spatial information . . . . . . . . . . . . . . . . . . . 3 

1.2.2 Examples for Tracking Technologies . . . . . . . . . . . . . . . . . . . . 5 

1.2.3 Hybrid Tracking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 

1.3 Human Computer Interaction (HCI) . . . . . . . . . . . . . . . . . . . . . . . . 10 

1.4 Goals and Outline of this Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 

1.4.1 Goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 

1.4.2 Outline of the thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 

2 Table Top Augmented Reality 13 

2.1 Motivation for Table Top AR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 

2.2 The Magic Book . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 

2.3 Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 

2.4 Tracking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 

2.4.1 Marker-Based Tracking . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 

2.4.2 Texture Tracking of a 2D plane . . . . . . . . . . . . . . . . . . . . . . . 19 

2.4.3 Tracking in the Magic Book . . . . . . . . . . . . . . . . . . . . . . . . . 25 

2.5 User Interaction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 

2.5.1 Graphical Output Hardware . . . . . . . . . . . . . . . . . . . . . . . . 25 

2.5.2 Input Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 

2.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 

3 A Hybrid Tracking Approach 29 

3.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 

3.2 An Inertial - Optical Tracker based Runtime Setup . . . . . . . . . . . . . . . . 30 

3.3 Configuration of the setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 

3.4 Motivation for a User Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 

3.5 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 

3.5.1 Natural Feature Tracking . . . . . . . . . . . . . . . . . . . . . . . . . . 32 

iv

Contents 

3.5.2 Hybrid Tracking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 

3.5.3 Head motion prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 

3.5.4 Table-Top Augmented Reality . . . . . . . . . . . . . . . . . . . . . . . 36 

3.5.5 Our approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 

3.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 

4 A Software Architecture based on DWARF 39 

4.1 DWARF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 

4.1.1 Services . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 

4.1.2 Service Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 

4.1.3 An example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 

4.2 Software Architecture for a Dynamic Configuration during Runtime . . . . . 42 

4.2.1 Existing Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 

4.2.2 Requirements for new architecture . . . . . . . . . . . . . . . . . . . . . 43 

4.2.3 System Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 

4.2.4 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 

4.2.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 

5 User Study 49 

5.1 Goals of the User Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 

5.2 User Study design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 

5.2.1 Movement Tracking of the Hand-Held Device . . . . . . . . . . . . . . 51 

5.2.2 Tracking of 2d feature point . . . . . . . . . . . . . . . . . . . . . . . . . 54 

5.2.3 Logging Infrastructure . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 

5.2.4 Task Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 

5.2.5 Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 

5.2.6 Questionnaire . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 

5.3 Execution of the Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 

5.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 

6 Evaluation of the User Study 65 

6.1 Evaluation of the User Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 

6.1.1 Feature Point Tracking . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 

6.1.2 Feature Point Tracking and Tracking of the Handheld . . . . . . . . . . 70 

6.1.3 Feature Point Tracking and Tasks . . . . . . . . . . . . . . . . . . . . . . 77 

6.1.4 Further Evaluations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80 

7 Conclusions 82 

7.1 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82 

7.1.1 Results of the User Study . . . . . . . . . . . . . . . . . . . . . . . . . . 82 

7.1.2 Natural Feature Tracking . . . . . . . . . . . . . . . . . . . . . . . . . . 84 

7.1.3 Table Top Augmented Reality . . . . . . . . . . . . . . . . . . . . . . . . 84 

7.1.4 Assessment of our Approach . . . . . . . . . . . . . . . . . . . . . . . . 85 

7.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 

7.2.1 Factors for User Behavior . . . . . . . . . . . . . . . . . . . . . . . . . . 85 

7.2.2 Visual Cues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 

7.2.3 Next Steps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 

v

Contents 

Glossary 88 

A User Study 90 

A.1 Conduction of the User Study . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 

A.1.1 Questionnaire . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 

A.1.2 Instructions and Guideline . . . . . . . . . . . . . . . . . . . . . . . . . 91 

A.2 Statistical Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 

A.2.1 Correlation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 

A.2.2 Linear Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96 

B Complete Results 101 

B.1 Questionnaire . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 

B.2 Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 

B.2.1 Case 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 

B.2.2 Case 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105 

B.2.3 Case 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111 

B.2.4 Case 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113 

Bibliography 118 

vi

CHAPTER 1 

Introduction 

Computers have always changed their appearance in the past and still will in the future. 

Huge central computers available for only a small group of researchers or experts have 

changed to desktop personal computers (PCs) available for everyone in the last 40 years. 

And the trend towards new small, cheap and mobile computers like mobile phones or palm 

desktops is continuing. Computers get more and more involved into our everyday life. New 

ways of interacting with these new computers have to be researched and evaluated. 

Augmented Reality (AR) is such an approach to bring the real world and the virtual computer 

world together. AR allows Human-Computer Interaction (HCI) in a new way. 

1.1 Overview of Augmented Reality 

In his survey of Augmented Reality [5][2] Azuma defines Augmented Reality as follows: 

”Augmented Reality (AR) is a variation of Virtual Environments (VE), or Virtual 

Reality (VR) as it is more called. VE technologies completely immerse a user 

inside a synthetic environment. While immersed, the user cannot see the real 

world around him. In contrast, AR allows the user to see the real world, with 

virtual objects superimposed upon or composited with the real world.” 

In other words AR tries to enrich the real environment with virtual information. AR brings 

the real world and the computer world together. In contrast Virtual Environments leave the 

real world outside. The vision of VR is that the user is not aware of the ”outside world”at 

all and he can not interact with real objects. In AR on the other hand the user is able to 

interact with virtual objects as well as real objects. This virtual information is augmented in 

the user’s point of view, using special graphical output devices like a Head Mounted Display 

(HMD), utilized in a classical AR setup for example. The system immediately responds to the 

user’s actions and gives feedback. Realtime feedback is one of the key requirements for AR 

applications. If a user wearing a HMD turns his head, the new viewpoint has to be calculated 

and the 3D objects have to be registered with the real objects in realtime. Otherwise the 

1

1 Introduction 

user will have the feeling that the registration of the 3D objects will lag behind his head 

movements. The virtual information is adopted according to the user’s performance or even 

if the state of the environment changes. Already in 1968 Ivan Sutherland presented the 

first Augmented Reality system introducing the first HMD [54]. Interesting is that the basic 

concepts proposed by Sutherland are still valid for current AR applications. 

Figure (1.1) shows an example of the AR application ”Augmented Furniture Client” that 

allows to put virtual furniture into the real living room [19]. A user wearing a HMD can 

walk through his own real living room and the selected pieces of furniture are displayed in 

the environment according to the user’s viewpoint. 

Figure 1.1: Placing virtual furniture in a real environment. A virtual sofa and chair are augmented in 

the livingroom 

To realize this we have to find new ways of interacting with computers. The paradigm 

of a desktop PC with a keyboard and mouse as the only way to interact with computers is 

not suited for this kind of applications anymore. Often a mobile user has to be enabled to 

change the behavior of a system with more intuitive ways of interacting or even without any 

interaction at all. Marc Weiser describes this new way of thinking in his ”The computer of 

the 21st century”[64]. Also the terms of Context-Aware Computing and Mobile Computing are 

important if we talk of AR. 

Context-Aware Computing. According to Schilit and Theimer ”Context-Aware software adopts 

according to the location of use, the collection of nearby people, hosts, and accessible 

devices, as well as to changes to such things over time”. Any information about the 

environment that is important for the computer system has to be collected, evaluated 

and responded to by the system. For example a system changes its internal behavior 

according to the light conditions of the user’s position [63]. A huge research area in AR 

is therefore how to collect these informations about the environment like the location of 

the user. These questions will be discussed more in detail in the tracking introduction. 

2


Mobile Computing. A lot of AR applications enable the user to move around freely in the 

environment. Therefore a mobile setup is attached to a mobile user. Possible scenarios 

for this class of AR applications are maintenance tasks, like repairing a car for example. 

The user has his hands available to fix a car and gets virtual information into his HMD 

showing which step to perform next. Another example would be a navigation system 

displaying information about the environment in the user’s view. 

As a short summary key challenges in AR are: 

• Registration of virtual information in the real environment 

• Find new ways of interactions with systems that respond in realtime 

For 3D registration in the real environment tracking technologies are needed. Tracking is 

a difficult problem which will be discussed more in detail. For new user interaction possibilities 

innovative user interfaces (UI) are needed and it has to be discovered how user actually 

use a system and accept new ways of interaction or even refuse them. 

1.2 Tracking in Augmented Reality 

As denoted in the previous section tracking is one of the main and most difficult issues in 

AR research. 

Virtual and real objects have to be aligned as good as possible. This process is called registration. 

Sensors that gather information about the environment and collect 3 dimensional 

(3D) spatial information are called trackers To register virtual objects in 3D, AR applications 

work in the 3 dimensional space. In order to calculate the viewpoint of a user and to display 

the the virtual information at the right position the pose information, which consists of the 

position as well as the orientation has to be tracked by the underlying tracking technology. 

This section gives an introduction to the fundamentals of tracking and an overview of 

the basic tracking technologies used in this thesis. A good introduction and more detailed 

descriptions of all the terms and technologies introduced can be found in [21]. 

1.2.1 Representation of spatial information 

• Position 

Position is a 3D vector estimated by the tracking technology. It contains the coordinates 

of the specified point in the tracker coordinate system in the current tracking 

frame. The tracker coordinate system is a cartesian coordinate system consisting of three 

perpendicular axes. These axes intersect in one point, the origin of the coordinate system. 

In homogeneous coordinates position is represented by 4-component vector: 

¯v = (x, y, z, w) T , where is typically w = 0 

3

• Orientation 


Orientation gives the information how an object is rotated according to the axis of 

the tracker coordinate system. Unlike representing positions, there are more way of 

representing orientation that have advantages and disadvantages. It is always a tradeoff 

which method to choose. 

1. Rotation Matrix 

A common way to represent transformations of points in a coordinate system is 

the 3x3 rotation matrix. Rotation and scaling on a set of points can be combined 

and performed by a simple matrix multiplication. If a 4x4 homogeneous matrix 

is used the rotation matrix is the upper 3x3 matrix. Also translations can be computed, 

thus it is represented in the 4th column. The columns of the rotations matrix 

can be regarded as the direction of the transformed coordinate axes projected 

on the source coordinate axes. 

2. Euler Angles 

Euler angles are the simplest and most intuitive representation of rotations. Every 

rotation can be considered as three successive single rotations around the three 

coordinate axes. In the 3 dimensional space there are rotation matrices for every 

axis: 

⎛ 

rotation φ about x axis: Rx = ⎝ 

⎛ 

rotation θ about y axis: Ry = ⎝ 

⎛ 

rotation ψ about z axis: Rz = ⎝ 

1 0 0 

0 cos φ sin φ 

0 −sin φ cos φ 

cos θ 0 −sin θ 

0 1 0 

sin θ 0 cos θ 

cos ψ sin ψ 0 

−sin ψ cos ψ 0 

0 0 1 

Any rotation in the 3D space can be calculated by multiplying the three rotation 

matrices: 

R = Rx · Ry · Rz 

Note that matrix multiplication is not commutative, the order of the three rotations 

matters. 

3. Quaternions 

Quaternions are extensions of complex numbers to hyper-complex numbers of 

rank 4. On a first glance quaternions might look difficult and confusing, but once 

you are familiar with them calculations can be very easy. They can be represented 

by a 4 dimensional vector. They consist of a real scalar and an imaginary vector: 

q = w + xi + yj + zk, w, x, y, z ∈ R 

q = (w, x, y, z) , w, x, y, z ∈ R 

q = (s, ¯v) , s ∈ R, ¯v ∈ R 3 

4 

⎞ 

⎠ 

⎞ 

⎠ 

⎞ 

⎠


Note that i, j, k are imaginary units with i 2 = j 2 = k 2 = ijk = −1, the imaginary 

vector is ¯v = (x, y, z) T , and the scalar part is s = w. 

Mukundan provides an introduction into the basic quaternion algebra providing 

operations like multiplication and addition, which we will not discuss here in 

detail [36]. 

Rotations with quaternions: The vector part containing the imaginary components 

specifies the rotation axis, the scalar part is the cosine half of the rotation angle. 

Only unit quaternions are used to describe rotations q = 1. A quaternion 

q = (s, ¯v) specifies a rotation of 2arcos(s) around the axis ¯v [21]. So if we 

want to construct a rotation around an axis ¯v of angle θ we can express the 

following quaternion: 

qθ,v = (cos( 1 

θ), sin(1 

2 2 θ)¯v) 

Here are two simple examples with q = (s, ¯v): 

– The identity rotation is specified by rotation of 0 degrees around an unspecified 

axis 

s = 1 ¯v = (0, 0, 0) T 

– A rotation of 90 degrees about the y-axis is specified the following way 

s = 1 

√ 2 

¯v = 

 

0, 1 

T √ , 0 

2 

Consecutive rotations can be expressed with the product of the corresponding 

quaternions. A rotation q can be applied to a vector ¯p = (x, y, z) the following 

way: 

¯p ′ = q ◦ ¯p ◦ q ∗ 

with the conjugated quaternion q ∗ = (s, −¯v). 

A discussion about the advantages and disadvantages of the different representations 

of orientation can be found here [49]. 

1.2.2 Examples for Tracking Technologies 

In order to select a appropriate tracking system for an AR application certain criteria have 

to be evaluated. The most important criteria are latency, accuracy, update rate, working area 

and mobility [46]. 

Tracking could be compared to how a human being collects information about the environment: 

by seeing, by sensing (hearing, feeling, recognizing certain influences) and by 

equilibrium sense. Therefore tracking technologies can be categorized in almost similar categories. 

This is only a small selection of different tracking technologies used in this thesis. 

5

Vision-Based Tracking (Seeing) 


Vision-based tracking apply image recognition techniques in order to detect certain features 

in images grabbed by optical cameras. Thus speaking of optical trackers means a combination 

of hardware to grab video frames and software to analyze the frames. The terms 

vision-based tracking and optical tracking will be used in the same way during this thesis. 

Tracked features can be artificial or natural. They are used to calculate the position of the 

target in the reference coordinate system. While often simple markers are used as artificial 

features [28], natural features can either be preprocessed points in a 2D plane [11] or any 

features in the environment [42]. This method provides full 6 degrees of freedom (DOF). This 

means that vision based tracking provide position and the orientation as well. 

A common software used for marker-based tracking is the Augmented Reality Toolkit 

(ARToolkit) [28]. Recently a new version of this toolkit has been developed which allows the 

texture tracking of a 2D plane instead of fiducials [11]. This software has been developed by 

the HIT Lab (Human Interface Technology Laboratories) USA and New Zealand 1 . A cheap 

web cam on an average desktop computer can be used with this software and allows to set 

up a small AR system even at home. AR applications based on vision-based tracking could 

become very important in the future in order to address a broad mass of people. 

Optical Tracking is often also categorized in Outside-In or Inside-Out Tracking. In an 

Outside-In setup the camera is attached to a fixed position, in an Inside-Out setup the camera 

is attached to the moving target itself (on the HMD of a user for example). 

One disadvantage is a high latency because of the huge amount of video data grabbed by 

the camera and the high processing time of the image recognition algorithms. Drawback for 

the user is mainly occlusion. If either artificial or natural features are occluded the tracking 

will fail (marker based tracking) or will lead to inaccurate results of the tracking routine if 

less features are available for tracking (natural feature tracking). 

Inertial Tracking (Equilibrium sense) 

In order to measure the position of an object, inertial trackers use accelerometers which 

estimate the linear acceleration of an object. Gyroscopes instead measure angular velocity 

applying the laws of conservation of angular momentum and therefore gyroscopes are able 

to provide the orientation of an object. The orientation is delivered as yaw (y-axis), pitch 

(x-axis) and roll (z-axis). 

Current technologies for gyroscopes play an important role, because they are small, cheap 

and easy to integrate into other devices like laptops or even mobile phones. But historically 

gyroscopes were used for navigation in airplanes and ships. These inertial devices 

were heavy, expensive and due to their navigation task very accurate. But new technologies 

enabled the development of smaller and cheaper devices. Here is a short overview of 

the common techniques used for gyroscope devices [23]. A special focus is set on vibrating 

gyroscopes. 

1 www.hitlabnz.org 

6

• Spinning Mass Gyroscopes 


These classical gyroscopes are also called gimbaled gyroscopes. They use the properties 

of a spinning wheel and can only measure the rate of rotation about one axis. Thus 

three gyroscopes have to be combined if the rotation about three orthogonal axes has 

to be sensed. These gyroscopes are heavy and large they are only applied in ships and 

aircrafts anymore. 

• Optical Gyroscopes 

These gyroscopes apply the time of flight (TOF) principle. The time of flight until a 

signal is sensed by a receiver is measured. For a gyroscope it means that rotation 

influences the time of flight for light. The time is measured and the rate of turn can be 

estimated. 

• Vibrating Gyroscopes 

Vibrating gyroscopes are commonly used in recent application. Reasons for that are 

that they are small, consume less power and no bearing or motors are required. A vibrating 

element is rotated. Evaluations have shown a ring-shaped vibrating resonator 

is suited best for the purpose of measuring rate or turns. For the measurement of the 

angular rate a phenomenon known from the aviation domain is utilized, the Coriolis 

Effect. If an airplane is heading east it will drift towards south, although it does not 

accelerate in the south direction. Heading west leads to a drift in the north direction. 

This ”force” responsible for the acceleration in the north-south direction is called the 

Coriolis force FC. This effect occurs when an object moves within a rotating reference 

frame. In this special case an aircraft moves in the reference frame of the rotating earth. 

Figure 1.2 shows this effect. An object moves around a rotation axis and the Coriolis 

force affects the object perpendicular to the movement direction. This effect can also be 

recognized in thunderstorms, weather developments or the water flushing down the 

sink in the other direction on the southern hemisphere. A good demonstration of the 

effect can be seen in [16]. 

The Coriolis force FC can also be expressed in the following equation with the objects 

mass m, its velocity in the rotating frame vr, the angular velocity of the rotating frame 

of reference ω and × the vector cross-product. 

FC = −2m(ω × vr) 

A good introduction to the Coriolis force can be found here [25]. This effect is applied 

in vibrating gyroscopes. Figure 1.3 shows the reference coordinate frame with the vibrating 

ring. First let us only consider rotations around the axis Z. The ring is vibrated 

with a constant amplitude ΩZ around the Z-axis. This is called the primary mode. If 

the gyroscope is turned around the Z-axis now, the Coriolis effect leads to a acceleration 

perpendicular to the motion of the ring (in w-direction), the also called secondary 

mode. This secondary mode can be measured and the rate of turn can be calculated. 

If we now vibrate the ring around all axis of the reference frame Ω = (ΩX, ΩY , ΩZ) we 

can measure the rate of turn for all directions [22] [14]. 

7


Figure 1.2: An object moves in a rotating frame. The Coriolis force results in an acceleration perpendicular 

to the movement direction 

Figure 1.3: The reference frame of a vibrating gyroscope. The ring is vibrated around the all the axis 

(primary mode) and the rate of turn can be measured by the secondary mode cause by the Coriolis 

effect 

8


Figure 1.4: Intersense Products: Inertial Cube2 (left) and Intertrax2 (right) 

Figure 1.5: Example for a Magnetic Tracker: Ascension Flock of Birds 

Accelerometers and gyroscopes are often combined to get full 6 DOF. Due to the fact that 

inertial trackers provide relative measurements they are often combined with other tracking 

technologies to obtain absolute measurements as well. One drawback of this technology is 

that small measurement errors accumulate and cause drift, which leads to incorrect tracking 

results after a certain period. Widespread inertial tracking products are tracker by Intersense 

2 (see figure 1.4). 

A difficult drawback for the usage is the accumulation of drift. Either relative orientation 

is used or the setup has to integrate other tracking or filtering technologies in order to correct 

the measurements delivered by the inertial tracker. 

Magnetic Tracking 

A special hardware setup generates a magnetic field in a certain working area. These magnetic 

fields are either low frequency AC or DC fields. Three orthogonal coils in the sender 

as well as the receiver are used to produce measurements for position and orientation (6 

DOF). The physical principles applied are described in [21]. Tracking measurements can be 

distorted by ferromagnetic objects or CRT monitors which produce an artificial magnetic 

field for example. An example for a magnetic tracking system is the Ascension 3 Flock of 

Birds(1.5). 

Main drawback for users is the limited range of tracking. Best results are achieved near the 

2 www.intersense.com 

3 http://www.ascension-tech.com 

9


base station. Also interferences with artificial magnetic fields or metals have to be avoided. 

Other examples 

Other tracking technologies are acoustic trackers, the Global Positioning System (GPS), which 

is important for outdoor applications and mechanical trackers. 

1.2.3 Hybrid Tracking 

All the tracking technologies have their advantages and drawbacks, like latency, frame rates, 

mobility and precision. But as we said that tracking is one of the key issues for AR this 

weaknesses have to be eliminated. One approach is to combine several tracking technologies 

to compensate the drawbacks of a single technology. For example inertial trackers can only 

give relative pose information. If it is combined with a GPS system the resulting tracking 

system can deliver absolute pose information (the world coordinates as a reference frame). In 

such applications user movement heavily affects the 3D registration and has to be stabilized 

by inertial trackers[1]. 

As we said inertial tracking accumulates small measurement errors that cause drift. To 

compensate this vision-based tracking techniques are integrated to filter and even predict 

orientation estimations. An important filtering technique in this context is called Kalman 

Filtering. 

Kalman Filter: Kalman Filtering is a powerful mathematical tool using a prediction and correction 

loop to filter and stabilize error-prone data. If several trackers are combined 

this tool can be used to correct the measurements of each other. Bishop and Welch 

provide a good introduction to the Kalman Filter [65]. 

In the context of Context-Aware and Ubiquitous Computing multiple sensors in the environment 

have to be combined and evaluated. In Ubiquitous Tracking these sensors can be 

dynamically integrated during runtime [39]. 

1.3 Human Computer Interaction (HCI) 

As mentioned above not only a satisfying registration in 3D is a must in AR, but also a suited 

way of interaction with AR systems has to be researched and evaluated. According to [24] 

research in this area focuses on the following tasks: 

• Design 

• Evaluation 

• Implementation 

of interactive computing systems for human use. During this steps also factors influencing 

the user behavior, like psychological aspects are studied. All these aspects might vary 

within different user groups or with a changing application domain. Therefore understanding 

the user is one of the key requirements when developing user interfaces. Based on that 

10


observations also the suited input and output hardware has to be selected. It is important 

to notice that this is a new way of thinking, because the last decades the user has always 

been forced to learn how to interact with a computer system. Input- and Output devices, 

like keyboard, mouse and a monitor were fixed. The vision of HCI is that a user does not 

have to adopt to a system at all, the system adopts to the user. 

1.4 Goals and Outline of this Thesis 

My overall vision is to bring tracking and user behavior together. In almost every AR application 

the user has to learn how tracking works in order to improve her or his performance. 

The user has to learn which actions are allowed and which ones result in an unexpected 

feedback by the system, or even in no feedback at all. Every class of AR applications has 

special requirements concerning tracking, user interaction and mobility. As already said the 

user interaction will change from application to application and therefore tracking requirements 

dependent on the user behavior will be different. So user behavior has to be studied 

even for a specific application domain or even just a single task. 

1.4.1 Goals 

This thesis focuses on special class of AR applications: AR Table Top Applications. Table top 

applications can be set up at a desk or a table with a user standing or sitting in front of it. All 

the studies have been done with a table top application called The Magic Book developed at 

the HIT Lab New Zealand. 

Goal of this thesis is to evaluate if it is possible to adjust the behavior of the vision-based 

tracking algorithm according to the user’s movements. This should be realized with a hybrid 

tracking approach combining the vision-based tracking of the Magic Book on the one hand 

and a gyroscope giving information about the orientation of the user interface on the other 

hand. The gyroscope will be a measurement of the occurring user movements. 

The following issues will be discussed in order to get a little bit closer to this vision: 

• Properties of Table Top AR 

The special properties concerning tracking and user interaction in Table Top AR will 

be evaluated. 

• User study-based approach to find a mapping between behavior of the tracking algorithm 

and user movement 

If it is possible to find such a mapping we can adjust the algorithm very easy. We 

can simply apply a function with the movement as input parameter and configuration 

setting for the vision-based tracking as output parameters. To achieve this an approach 

based on a user study is introduced. 

• A software architecture allowing a dynamic configuration of the tracking algorithm 

based on the DWARF (Distributed Wearable Augmented Reality Framework) framework. 

11


The information given by the gyroscope has to be integrated in the tracking routine 

during runtime. Thus a software architecture accomplishing this requirement is needed. 

DWARF [6] is a component based AR framework. Some of these components have to 

be extended in order to integrate the texture tracking version of the ARToolkit. 

1.4.2 Outline of the thesis 

Heres a brief overview of the chapters of this thesis. 

Chapter 2: In this chapter we will give a motivation for Table Top AR and introduce the 

Magic Book. All the research done in this thesis is based on this application. Requirements 

on tracking and user interaction for Table Top AR will be discussed in this 

chapter and the fundamental techniques used in the Magic Book will be explained. 

Chapter 3: This chapter describes the approach to bring tracking and user behavior together 

and provides an idea for a user study based evaluation. The approach will be classified 

into other research projects and existing related work. 

Chapter 4: A DWARF based architecture will be introduced that allows a dynamic configuration 

of the tracking algorithm. 

Chapter 5: This was the main part of the work. A user study was performed with the goal to 

find a dependency between the tracking algorithm and user movement. The chapter 

describes the design and execution of the user study. 

Chapter 6: The user study will be evaluated. We will try to measure the correlation between 

the recorded tracking data of the vision-based tracker and the movement information 

given by the gyroscope. A mapping has to be found. Also I will present further ideas 

on how the gathered data can be evaluated and analyzed. 

Chapter 7: Results will be evaluated, conclusions will be presented and ideas for future work 

will be discussed. 

12

CHAPTER 2 

Table Top Augmented Reality 

Table Top Augmented Reality is a specific class of AR applications. As the name says the 

application is set up on a table or desk. The user stands, sits or moves in front of the table 

setup. This leads to special requirements on user interaction, tracking and mobility of the 

whole setup. As an example application this work deals with the Magic Book which was 

developed at the HIT Lab New Zealand. 

The concepts and technologies described in this chapter are generally valid for all AR 

applications as well. But we will always consider them in the context of table top AR applications. 

First we will give a motivation for table top AR for various application domains. 

The Magic Book and some other applications will be introduced shortly and possibilities 

for interactions and tracking will be evaluated. As we will see vision-based tracking has 

serious advantages for those applications. As the Magic Book uses the ARToolkit texture 

tracking technology we will give a short introduction to the algorithm. Several user interfaces 

for graphical output, especially HMD, a special handheld device and a tablet PC will be 

introduced. Also a short overview of input devices will be provided. 

2.1 Motivation for Table Top AR 

The reason for setting up applications on a table environment is very simple. People work, 

read, discuss, interact and play games on tables. Therefore table top AR applications try 

to enhance the experience of a current task or a social event with techniques used in AR. 

It is also called a horizontal setup which is a characteristic of table top environments. We 

have already discussed that one important issue of AR is the alignment of virtual and real 

objects. In table top applications these virtual objects are displayed in the work space, the 

table itself. People can sit, stand or even walk around the table and applications allow the 

interaction with the virtual environment and even communication and interactions between 

the participating users itself. 

Here are examples for very different application domains for table top AR with some 

examples for related work: 

13

• Exhibitions and Education 

2 Table Top Augmented Reality 

A lot of research has been done to bring new ways of multimedia interactions in museums 

and exhibitions. AR is a new way for interactive multimedia presentation. With 

a graphical output device like a HMD a user can walk through a museum. While 

regarding cultural objects audio and virtual information is augmented to the users 

recognition for example. The user is standing in front of the exhibition piece which 

is placed on a desk or on a table. The HIT Lab New Zealand developed a special AR 

kiosk for applications like this (figure 2.1). This kiosk can be used for a variety of applications 

like for education, science presentations and entertainment. [66] presents 

some of those applications used for educational purposes, like the AR Volcano (figure 

2.1), an interactive tutorial about volcanoes and the S.O.L.A.R (Solar-System and 

Orbit Learning in Augmented Reality) where a user succeeds if he is able to arrange 

augmented planets around the sun in the right way. 

Figure 2.1: The Augmented Reality Kiosk (left) and the AR Volcano application developed by the 

HIT Lab New Zealand (right) 

Mark Billinghurst writes about the potential of AR in education in his internet essay 

”New Horizons for Learning”[7]. 

• Gaming 

Gaming is an interesting application domain for AR and especially table top AR. Players 

sit or stand around the table and get the results of their interaction augmented in 

the viewpoint. The immersion in a game experience and therefore the fun factor increases 

[56]. Immersion is a measurement to what degree a player is affected by a 

virtual or augmented experience. For example the classical PC game ”Worms” was 

ported to an AR application. The Studierstube also developed a collaborative game to 

steer a virtual train on a real network of wooden play rails. The trains can only be seen 

and manipulated by a see-through PDA device [61] (see figure 2.2). 

• Interactive Storytelling 

14


Figure 2.2: The Invisible Train: An Table Top AR game developed by the Studierstube, Vienna 

New ways of telling stories are explored. As well as with enhancing the museum 

experience by augmented audio and virtual information, AR is used for storytelling as 

well. Using easy authoring tools even children are enabled to create their own content 

and create their own virtual worlds [34]. The Magic Book is such an application. 

• Collaboration 

Even new ways of collaboration are researched. Billinghurst also evaluated the potentials 

of AR applications for collaboration [8]. Michael Siggelkow developed an application 

for remote collaboration that could be set up easily on any desktop. In his thesis 

he explores in what way AR enhances the awareness of the participants in comparison 

to other technologies [50]. 

The potential that such applications are also suited to catch the attention of a broad mass 

of people makes them interesting in the future. At the moment AR is used mainly in the 

industry yet and the door for non-experts has still to be opened. 

2.2 The Magic Book 

This thesis will focus on a special application: An interactive fairy tale book. The Magic 

Book itself is just a framework for a variety of applications based on 

• The book paradigm 

To change the content of the virtual scenes a book is used. Like reading a real book the 

user can turn pages and the content is augmented on the book page. Thus for every 

page a corresponding virtual model exists. The book is a tangible device for interacting 

with the application. The paradigm of tangible devices is embossed by Ishi [43]. 

• A handheld visor 

With this visor the virtual objects are augmented in the user’s viewpoint. It will be 

described more in detail in user interaction section. 

The application content we are working with deals with the fairy tale ”Giant Jimmy Jones”, 

which was especially written for this purpose. On every page another part of the story is 

15


augmented on the book. The story continues when a user turns the page. Also audio output 

is supported, a storyteller explains the scenes and a soundtrack has been written 2.3. 

A user is equipped with the hand held visor, the so-called handheld device. He can look 

through this visor, which is a specially prepared HMD and gaze the 2D book pages. While 

standing in front of the AR kiosk he is enabled to zoom into the scene in case he wants to 

focus on a certain property of the 3D animated fairy tale. Zooming into the scene is done 

by getting closer to the book surface with the handheld device. It is possible to walk around 

and watch the scene from different points of view. The book is attached to a rotatable plate. 

Hence the scene can also be watched from another point of view by simply turning the plate, 

which could also be considered as a tangible interaction device. 

Figure 2.3: Giant Jimmy Jones, an interactive fairy tale 

The different aspects concerning tracking and user interaction will be discussed in the 

following corresponding sections. 

2.3 Requirements 

In order to have a full understanding of the term table top AR we will discuss certain properties 

and requirements for AR. This is done to give a rationale why certain user interaction 

hardware or tracking technologies evolved for the class of table top AR applications. We will 

evaluate these requirements in the context of table top AR. Table top AR has the common 

properties of AR but also additional requirements. 

• Alignment in realtime 

In the introduction I already made clear that this is a key issue for a AR. Alignment 

again means that the virtual information is registered in the real environment at the 

16


exact position and with the exact orientation according to the information sensed by 

the underlying tracking technology. Thus we need a tracker or a combination of trackers 

providing 6 DOF. If the realtime requirement would not be met, the user always 

would have the feeling of a lag between the actual movement and the display of the 

virtual information. This requirement has to be met by the tracking infrastructure. 

• Usability 

Usability Engineering is a new disciple in software engineering trying to solve the 

question how to make a computer system usable. Due to the fact that visitors of a 

museum or participants of a game, for example, are not experienced AR users the 

interaction with table top applications has to be very easy and intuitive. This requirement 

has to be considered while designing the user interaction. The right choice of 

user interface hardware and software design has to be made. 

• Mobility of the setup 

It should be possible to set up the system on every table without further difficulties. 

Exhibitions will move or have to be rearranged and a fixed setup for playing a game 

is not suited. This has serious requirements as well for the chosen tracking technology, 

because a stationary setup is not suited. 

• Price 

Of course a cheap price is always a requirements for software systems, but in order 

to address the public audience or budgeted art galleries with this new technology a 

tracking setup for several thousand Euros would not make sense, even if the measurements 

concerning latency, accuracy and update rate provide better results. Thus an 

affordable user interface and tracking infrastructure is needed. 

In the next section will will discuss which tracking technology and which user interfaces 

are suited best to meet these requirements for table top AR. 

2.4 Tracking 

The best technology to meet all the requirements evaluated in the previous section are vision 

based tracking techniques, especially Inside-Out tracking. As a repetition an optical tracker 

consists of a camera and an image recognition software. The camera grabs video images and 

the software algorithm searches for features to calculated the exact position and orientation 

to display the virtual object. Here is a short discussion why this technology is suited best for 

table top environments: 

• Perfect alignment in realtime 

First vision-based tracker deliver position as well as orientation. The measurements 

given by the tracker is accurate and good enough for this kind of applications. If the 

tracking fails in several frames it is accepted, because the consequences are not that 

serious, although the usability decreases. The bottleneck is the high latency, because 

the video data first has to be transmitted from the camera in the main memory, then 

the computational expensive image recognition has to detect the features. Hence the 

17


quality of the tracking is dependent on the update rate of the camera and the speed of 

the image processing software. 

• Usability 

A camera needs to be integrated in the user interface (Inside-Out tracking), which is 

an additional requirement for the UI now. As mentioned above the tracking might 

fail due to fast movements, changing light conditions or wrong usage for example. 

Also occlusion is a main drawback. It has to considered that a user might occlude 

trackable features during usage. Suited and easy to learn interaction techniques have 

to be applied. 

• Mobility of the setup 

This is one of the big advantages, because no huge hardware setup is needed. Just a 

web camera, usually connected via USB or firewire, is enough. It can be attached and 

detached almost without any effort. 

• Price 

This is definitely the killer argument for the selection of optical tracking. On the one 

hand good webcams are already available for less than 100 Euros. On the other hand 

free image recognition toolkits are offered for programmers to design the software, like 

the ARToolkit. 

Next we will give a short introduction on marker-based tracking and then a deeper view 

into the algorithm of the texture tracking version of the ARToolkit. Further on in the thesis 

we will describe our approach to adopt parameters of this algorithm to movement information. 

2.4.1 Marker-Based Tracking 

A basic and freely available software for marker based tracking is the ARToolkit [28][27]. 

Although several marker-based tracking algorithms are available we will focus on the AR- 

Toolkit here, because it is used in this thesis. The environment has to be prepared with 

quadratic pattern used for the calculation of the relative position of the camera to the so 

called markers (see figure 2.4). These markers are black and white squares with a black border 

and a configurable pattern in the inside. This pattern can be created individually and 

has to be preprocessed first. Owen describes the criteria for a ”good” fiducial [41]. The AR 

volcano uses this vision-based tracking toolkit (see 2.1). Advantage is that the computation 

of the homographie-matrix, which describes the relation between the camera and the marker 

plane works faster than the texture tracking introduced next. The big disadvantage is that it 

is totally error-prone to occlusion. If the marker or only parts of the marker disappear in the 

video image the tracking fails. This restricts the usage, because the complete marker always 

has to be in the video stream. 

The texture tracking introduced next is an extension of the ARToolkit, it still uses marker 

recognition for obtaining an initial position. 

18


Figure 2.4: Examples of an ARToolkit Marker and a 2D textured plane used in the Texture Tracking 

version of the ARToolkit (Note that this version is still using markers for initialization as well) 

2.4.2 Texture Tracking of a 2D plane 

The ARToolkit has been introduced to have a quite simple toolkit to write small AR applications 

based on vision-based marker tracking. However the texture tracking version of the 

ARToolkit allows to track two dimensional textures instead of black and white markers 1 [11]. 

These images 2.4 have to be preprocessed in order to calculate a set of feature points that are 

used for the tracking algorithm. Note that the marker detection is still used for obtaining 

the initial position and orientation. Thus the image to preprocess has to contain a ARToolkit 

marker. Once the initial pose information is calculated it continues with the tracking of point 

features. The texture tracking toolkit provides both the preprocessing tools and the tracking 

algorithms. 

In this context we want to distinguish between texture tracking and Natural Feature Tracking 

but this is our own definition of terms used in the thesis. Texture tracking is used with 

preprocessed images, preprocessed textures. Still the environment has to be prepared. Like 

markers the textures have to be placed on the table, the wall or the floor. Natural Features 

are features that are not artificially placed in the environment, like edges, lines or other properties. 

In the related work section 3.5 we will discuss ideas for natural feature tracking as 

well. The next subsection is thought as a small tutorial for the texture tracking algorithm 

of the ARToolkit. While analyzing the algorithm the ideas for this work have evolved. A 

deeper analysis of the algorithm can be found in Vials master thesis [59]. 

Algorithm of the Texture Tracking ARToolkit 

Every single step of the algorithm is shown in figure 2.6: 

1. The data structures for the ARToolkit-Handles for the texture tracking and for the 

marker tracking as well are created and initial parameters are set. As mentioned above 

the initial orientation is given by the marker position in the image frame. This is done 

by a simple call of the marker detection method of the ARToolkit (step 1 and 2). 

1 This version is not available under license yet. Please contact Hirokazu Kato for further information: 

kato@sys.es.osaka-u.ac.jp 

19


2. Due to the initial positions four feature points of the preprocessed image that are visible 

in the current video frame are selected to update the position. Later on the selected 

feature points have to be found in the video frame. Out of all the feature point candidates 

four points are selected according to the following rules. The number in the 

brackets shows the step in the algorithm. 

• F P1 This point has to be furthest away from the video frame center (5) 

• F P2 This point has to be furthest away from F P1 (11) 

• F P3 This point has to maximize the surface of the triangle between F P1, F P2 and 

the new F P3 (12) 

• F P4 This point has to maximize the surface of the square between F P1, F P2, F P3 

and the new F P4 (12) 

3. Once a feature point is selected a template is created for it. This template is used by 

the Normalized Cross Correlation (NCC) method to deliver a measurement for similarity 

between the template and an area around a pixel in the video frame. Reason for that 

is that it is unlikely to find a feature point in the next frame again. Thus windows are 

compared. Figure 2.5 shows a template for a selected feature point F P . Now following 

NCC parameters for the template are calculated (6): 

• The average pixel value: 

averagetemplate 

• A vector with the normalized pixel values. The advantage of the normalization is 

that different light conditions between frames do not result in different correlation 

values: 

∀(x, y) ∈ template : 

vectortemplate(x, y) = valuepixel(x, y) − averagetemplate 

• The normalized vector length of the template: 

 

lengthtemplate = 

∀(x,y)∈template 

(2.1) 

(vectortemplate(x, y) 2 ) (2.2) 

4. The algorithm now estimates the location of the selected FP in the video frame based 

on the previous location. Here simple approach is used. 

Three different estimation methods are provided. These methods take previous pose 

calculations at different levels into account. For each frame every method is evaluated 

to find the feature point and the best result is taken. Every method represents a 

different movement model of the camera. 

• The first method assumes that no movement occur between two frames. Using 

this assumption the algorithm simply takes the position pk i of the F Pi in the last 

frame and searches the position p k−1 

i for the F Pi in the new frame within a certain 

20


Figure 2.5: Texture Tracking Template: Not only single feature points but whole areas are compared 

search window size. This assumption however is not really realistic because it 

does not consider movements at all. But movements definitely occur between 

two frames and cause displacements (7). 

p k i = p k−1 

i 

(2.3) 

• The second method takes the last two tracking frames into account. Here it is 

assumed that the displacement between two frames is constant. 

v = p k−1 

i 

− pk−2 

i 

(2.4) 

Now the position of the feature point in the current frame can be calculated. The 

equation 

results to 

p k i = p k−1 

i 

p k i = 2p k−1 

i 

+ v (2.5) 

− pk−2 

i . (2.6) 

• The third method uses the last three positions of the feature point in a similar way. 

p k i = 3p k−1 

i 

− 3pk−2 

i 

+ pk−3 

i 

(2.7) 

A common and more sophisticated method to predict the position of a feature in the 

next frame is a Kalman filter which is not applied here. Examples will be shown in the 

related work. 

5. Around the estimated position of the tracked feature the algorithm searches for the 

best matches to update the real position of the feature point. This is done for all three 

position estimates described in the step before. Within a fixed search window the correlation 

value for every area around the pixel with the template is calculated. In figure 

2.7 the parameters used for the calculation of the NCC value are shown. 

21


Figure 2.6: The ARToolkit Texture Tracking Algorithm 

22


Figure 2.7: Template Matching 

To obtain the correlation value which gives a measurement for similarity between the 

template and the pixel area at position (i, j) area several calculations are made for 

every pixel within the search area. 

• Similar to the creation of the template every pixel within the pixel area is normalized 

by subtracting the average pixel value. Note that the size of the pixel area is 

exactly the template size. 

∀(x, y) ∈ pixelarea : 

vectorpixelarea(x, y) = valuepixel(x, y) − averagepixelarea 

• Also the length of this vector is calculated. 

 

lengthpixelarea = 

∀(x,y)∈pixelarea 

(2.8) 

(vectorpixelarea(x, y) 2 ) (2.9) 

• To calculate the correlation between the pixelarea and the template every value in 

vectorpixelarea is multiplied with the corresponding (at the same position) value 

in vectortemplate. 

corr = 

templateSize2 −1 

(vectorpixelarea(i) · vectortemplate(i)) (2.10) 

i=0 

23


• Finally the similarity can be calculated. 

sim = 

corr 

lengthpixelarea · lengthtemplate 

(2.11) 

• The result is a measurement for the similarity of the area around pixel (i, j) and 

the template. It has a range from -1, which indicates no similarity at all, to 1 

indicating a high correlation. Here we will just mention that later on we use another 

correlation method as a statistical analysis tool for the evaluation of the user 

study. So this similarity must be calculated for every pixel within the search area. 

The one with the highest similarity value is most likely to be the feature point in 

the current frame. 

6. According to the requirement to track four feature points these steps have to be done 

four times to obtain the locations of the four feature points in the current frame. Note 

that we are using three different methods, which are described above, to estimate the 

positions of the feature points. 

7. Now for every method the position of the 2D plane is calculated and the one producing 

the smallest tracking error is taken. Finally the algorithm provides the correct position 

of the textured plane in the camera viewpoint. 

Complexity of the template matching algorithm 

Considering the single steps described there are two main drivers for the complexity and the 

robustness of the texture tracking algorithm. For every pixel within the search size area 

the area around the pixel 

O(searchSize 2 ) 

O(templateSize 2 ) 

is compared with the template of the current feature point. This leads to a complexity of 

• Search Size 

O(searchSize 2 · templateSize 2 ) (2.12) 

Due to movements of the camera the estimation of the feature point position with one 

of the three methods is not correct and therefore several points around the estimated 

feature point are considered. These points are within a certain search window. So a 

large search window provides a higher probability to find the point with the highest 

correlation value. But this also means that for every point within the search the correlation 

has to be calculated. This will increase the computation time. On the other 

hand a small search size will speed up the computation but the tracking robustness 

will decrease. 

This parameter is set to a constant value during runtime. According to Billinghurst the 

minimal error will result with a pixel search area of 48 2 pixels [11]. Thus the algorithm 

is configured with a constant search window length of 48. Here the main idea of this 

thesis evolves. Does the parameter of the search window size has to be constant? Is it 

possible to adjust this parameter during runtime with movement information? 

24

• Template Size 


If the template size is large the algorithm will provide a higher quality of the correlation 

value. In this thesis the template size will be considered as constant during runtime. 

2.4.3 Tracking in the Magic Book 

The tracking technology used in the Magic Book ”Giant Jimmy Jones” is the texture tracking 

of a 2D plane. This has several advantages to previous marker-tracking based applications: 

• The marker does not have to be in the view 

As we said with marker-based tracking occlusion is a serious problem. But as the algorithm 

uses certain features of the image, the marker is only needed for initialization. 

Of course if the tracking fails, the marker has to be in the view of the user again. But 

after initialization the user can move in a way that the marker disappears in the video 

frame. 

• It is possible to zoom in and zoom out 

The user is now enabled to zoom in a scene. If the user gets closer the tracking still 

works, it just selects feature point that are more suited for the current tracking frame. 

In contrast with marker-based tracking zooming in will fail because it is likely that the 

marker is not completely in the video image. 

• Any preprocessed image can be used 

Any image can be used, with the restriction that there still has to be a small marker in 

it. No more artificial markers have to be placed in the environment. The user can also 

use the Magic Book as a simple textbook, because the pages consist of colorful images 

and not of markers anymore. 

2.5 User Interaction 

In user interaction we can differ between interfaces providing feedback to the user, mainly 

graphical output hardware and interfaces enabling the user to provide input to the system. 

2.5.1 Graphical Output Hardware 

Now we will have a look at the different possibilities for graphical output user interfaces. 

Again the right choice of a suited user interface is an important issue for table top AR. 

Desktop PC 

Due to the fact that users who are not familiar with new ways of user interaction methods, 

still the common desktop PC can be used. This is more suited for Outside-In tracking. A 

camera is installed at a fixed location delivering images of the Table Top workspace. On a 

monitor the users can see the feedback to their actions, like the rearrangement of markers for 

example. Graphical output through a monitor is not suited for Inside-Out tracking because 

the user moving the camera is probably not able to see visual feedback at the same moment. 

25


Anyway this is not the kind of user interaction we want, because as we discussed Inside-Out 

tracking is more suited. 

Head Mounted Display 

In a classical AR environment a user is equipped with a HMD. It could be compared to data 

glasses and it is attached to the users head. The first advantage is that the user is still able 

to use his hands. As the target is moving steadily this device is used for Inside-Out tracking 

and thus suited for our purposes. There are two different kinds for this graphical output 

device: 

Optical see-through. An optical see-through HMD is based on a semi-transparent mirror 

making it possible to look through the display. The virtual objects are augmented in 

the mirror as well. 

Video see-through. A video see-through HMD shows a video stream tracked by a camera 

attached to the HMD augmented with the virtual information. Note that with this 

method such a setup has to be calibrated in a way that the camera represents the user’s 

view. 

Delay of the vision-based tracking is always a problem with graphical output. With optical 

see-through displays the virtual world and with video see-through the real world will lag 

behind [45]. Thus it also a research issue to predict the head motion of a user to compensate 

the tracking delay (see the related work chapter 3.5). 

Hand-Held device 

This is a variation of the HMD developed at the HIT Lab. The HMD is put on an iron stick 

in order to use it like lenses or a visor (see figure 2.1). The user holds it in front of his eyes 

and it has the same effect as a usual HMD. A video-see through device is used and therefore 

a camera is attached to the hand held that it exactly matches the viewpoint of the user. 

Disadvantage is that the user has to use one hand to steer the device. The rational behind 

this device is that during exhibitions a lot of people want to use the application. It is a lot 

easier to hand over the device than to adjust the HMD for the next user. And despite that 

the HMD might get damaged after a certain period. For the HMD a Sony Glasstron Video 

see-through HMD 2 and a Logitec Quick Camera 4000 3 is used. 

Tablet PC, PDA, Mobile Phone 

The idea of this interaction device is that the user has a video-see through display, usually a 

small and flat computer in his hands. A camera is attached to the computer producing the 

see-through effect. An example is a tablet PC. It is possible to rotate the display on top of 

the computer and use it as a tablet. As computers are becoming smaller and smaller and the 

processing speed increases (Moores Law [35]), table top applications could also be ported 

to devices like a small mobile phone or a Personal Digital Assistant (PDA) handheld. The 

2 www.sony.com 

3 www.logitec.com 

26


Studierstube uses a PDA as an user interface for the Invisible Train application (see figure 

2.2). Efforts are also being made to port the ARToolkit on a mobile phone. This enables AR 

applications even for mobile phones. 

Projectors 

If a lot of people are interacting everyone has to be equipped with the necessary devices. An 

alternative is to use a projector to display the virtual information on the table. Thus users 

can interact with interaction devices, like markers or even tangible user interfaces and see 

the immediate result on the projection. A example for this is the Sheep application, which 

is a sheepherding game allowing multimodal interaction [47]. This game is also based on 

the DWARF framework we will describe later by the way. Another application application 

applying projectors is an intelligent kitchen [12]. With projectors virtual information is augmented 

to a kitchen helping the user to prepare a dinner for example. 

2.5.2 Input Interfaces 

An important issue is how a user can manipulate virtual objects. Thus additional user interfaces 

for collecting user input have to be provided. Still traditional interfaces like a mouse 

or keyboard are used, but as we have discussed earlier new and more suitable interfaces 

have to be found. Billinghust proposes to build tangible user interfaces based on marker 

tracking as well [10]. Marker-based optical tracking is used in his work. Markers are attached 

to real objects allowing users to interact with them. Moving, rotating and occluding 

the tangible objects results in a feedback by the application. User input can be provided by 

special components like a glove [58]. Again markers are attached to the glove itself and an 

optical tracking routine calculates the position and orientation of the hand. This technology 

makes it even possible to interact with virtual objects by ”touching” them. This leads to 

the question how to provide feedback caused by a collision of real and virtual objects. One 

issue in this kind of research are force-feedback devices, that provide a mechanical feedback 

to the user. One example are joysticks that adopt to a situation in a computer game and give 

feedback by making it harder to move in a certain direction. Another example for a forcefeedback 

device is the phantom by Sensable 4 which is a mechanical 6 DOF input device (see 

figure 2.8). 

2.6 Summary 

In this chapter we have discussed the context of table top AR. This class of applications 

has special requirements on tracking and user interaction. Applications for exhibitions and 

education being applied not only for research purposes have to be cheap, easy to install, 

usable and high-performance concerning the quality of tracking. 

All further evaluations will be on the Magic Book application. Here is a short summary of 

the key properties of the Magic Book. 

• Tracking 

4 http://www.sensable.com 

27


Figure 2.8: The force-feedback phantom device by Sensable on the left and a special prepared glove 

for user interaction (Studierstube) on the right 

The vision-based tracking technique is grounded on texture tracking of a 2D plane. 

Preprocessed images can be used to calculate the viewpoint of the user. 

• User Interaction 

The user interface chosen is a hand held device, a HMD attached to an iron stick. 

Because even children are familiar with the book paradigm the application is based on 

a tangible book. Just by turning a page the content changes. The scene can be rotated 

just by rotating a plate the book lying on. 

As described movement information is not considered in the tracking routine. We have 

seen that the search window size is a fixed parameter in the texture tracking routine. If we 

can apply movement information to change this parameter during runtime we can achieve 

better tracking results in terms of computation speed and robustness. The next chapter will 

introduce this idea to improve the tracking in the Magic Book application. 

28

CHAPTER 3 

A Hybrid Tracking Approach 

Now we have discussed the basic requirements for table top AR. The fundamentals of the 

underlying technologies used by the Magic Book have been introduced. Now we have to 

bring the texture tracking technology and the user behavior together. We have seen that 

feature points have to be ”found” again in the next video frame and that the size of the 

search window an important configuration parameter of the tracking routine. Again, with a 

large search size window the robustness of the tracking will increase on the one hand. But on 

the other hand the computation time of the tracking routine will rise. This leads to a lower 

update rate. 

If we can establish a relationship between the search window and the occurring movements 

of the handheld device used in the Magic Book, we can configure the texture tracking 

algorithm during runtime. Thus a hybrid tracking approach combining vision-based tracking 

and inertial tracking is introduced. 

3.1 Motivation 

In the previous chapter we had a look at the complexity of the template matching algorithm 

used by the texture tracking: 

O(searchSize 2 · templateSize 2 ) (3.1) 

We will consider the template size as a fixed constant of the tracking routine. In former 

considerations the search size parameter was constant during runtime. Again if we would 

lower the value the computation of the pose information the computation will speed up in 

a quadratic way. If we could get the information that almost no movement of the camera 

has happened between two tracking frames, the estimation that the feature point will be at 

the same position in the next frame would be almost correct. There is no need for a large 

window if we have a measurement for a change in position or orientation. In contrast if it is 

possible to derive the information that movement occurred we can adjust the search window 

to a large size. This would of course increase the computation time, but it is more likely that 

29

3 A Hybrid Tracking Approach 

the feature point will be found in the next frame again, because more potential points within 

the search area are considered. 

Our approach is to use additional information about the movement of the camera to alter 

the search size parameter during runtime according to a simple rule: 

movement ↓⇒ searchsize ↓ 

movement ↑⇒ searchsize ↑ 

Thus the first step is to evaluate the relationship of movements and adequate search window. 

And if possible we want to derive a linear mapping. 

3.2 An Inertial - Optical Tracker based Runtime Setup 

A requirement in order to get movement information of the camera is that we have to track 

the handheld device. So if we talk now of the movement of the handheld device the movement 

of the camera is meant, because the camera is mounted to the device. For tracking the 

handheld device movements another tracking technology with the following requirements 

needs to be integrated. 

• Integration in the User Interface 

To obtain the pose measurements of the handheld device a small tracking device has 

to be integrated in the user interface. It has to be assembled in a way that it does not 

influence the user behavior at all. 

• Update Rate 

In order to estimate a realistic search window size for the next tracking frame of the 

optical tracker, the update rates of the movement tracker has to be higher than the 

frame rate of the vision based tracking system. 

• Measurement state space 

A criteria for the measurement state space is the number of degrees of freedom described 

in the tracking introduction. If we would use a 6 DOF tracker with higher 

update rates than the optical tracking, then there is no need for the optical tracking at 

all. It could be replaced by the other tracking system if accurate enough. 

Therefore we will use a tracker providing 3 DOF relative orientation. No position is 

delivered by such a tracker. It still has to be evaluated if a tracker with this properties 

is sufficient enough for our purposes. 

• Price 

A of course price is also a key requirement. If this technology has to be integrated in a 

mobile phones, for example, an expensive tracking device will not be considered. 

30


For our purpose an inertial tracker, a gyroscope is suited best. When we introduced inertial 

trackers we also discussed that small measurement errors accumulate and cause drift 

after a short period. It provides relative orientation measurements. But as we are interested 

only in the relative change of orientation between two tracking frames drift does not matter 

for our setup. Drift does not affect our measurements. 

But a big question is if the relative measurement of orientation suited for the configuration 

of the search window size. To prove if this is possible we first have to discover a relationship 

between the movement measurements and the feature point tracking routine. If we are able 

to find such a relationship we can integrate the mapping in our software design. 

3.3 Configuration of the setup 

As described the first question is if there is a relationship, then the second question is to find 

a mapping between change in orientation and search window size: 

ftexturetracking(∆ orientation) = search window size (3.2) 

This mapping has to be integrated into the software of the hybrid tracking setup. Thus 

one requirement for the software design is to allow a dynamic configuration of the texture 

tracking routine. For every tracking frame the proper value for the search window size has to 

be set according to a mapping. This leads to the question how to determine this relationship 

and how to evaluate if this approach is possible at all. 

The idea is to do an user study. The study should give hints on how people actually use 

the Magic Book. While performing the study we want to retrieve data of the movement 

of the handheld device on the one hand, but we also want to have a deeper look at certain 

properties of a feature point on the other hand. Both data sets have to be explored the degree 

of correlation has to be measured. Correlation is a measurement to what degree two data sets 

are related. 

3.4 Motivation for a User Study 

As discussed we have to record data of the movement of the handheld device and try to 

relate this data to properties of feature point tracking, which also have to be logged. 

• Recording the pose information of the handheld device 

In order to record absolute full 6DOF we have to get the absolute position pHandheld 

as well as the absolute orientation qHandheld. To record the pose information a 6DOF 

magnetic tracker will be used. You might ask why we are also considering the position 

as well, because for the runtime setup only the orientation is needed. The magnetic 

tracker is used for the analysis of movement. If we recognize that people mainly use 

the Magic Book by changing the position of the handheld device our idea does not 

work. Please note that the 6DOF tracker is only for the purpose of the user study, not 

for the runtime environment. The runtime setup still consists of the handheld device 

and the gyroscope. 

31


Figure 3.1: The 2D coordinates of a feature point are tracked over several frames. The feature point 

moves through the 2D video plane 

• Recording 2D coordinates of feature points in the video image 

One possibility to relate the obtained pose information to feature point tracking is to 

record the 2D video frame coordinates of the feature points in every frame. This leads 

to a change in the 2D position for the feature point ∆pF P , if the feature point is tracked 

over a period of several frames. This change in position can be annotated with the 

corresponding change in orientation given by the magnetic tracker. Deriving these 

”chains”, where the same feature point is tracked over several frames is dependent on 

the selection of the best suited feature points by the algorithm. In figure 3.1 a feature 

point is tracked for several frames and ”moves” through the 2D video plane. This 

movement is obviously caused by camera movement. 

3.5 Related Work 

This section should give an overview of the current research related to this thesis. Main issue 

of my work is to characterize the movement of the user interface to improve the computation 

and the robustness of the system. But we also want to have a look at the current algorithms 

for natural feature tracking first. 

3.5.1 Natural Feature Tracking 

According to [55] an image sequence can be represented as any function of three variables 

I(x, y, t), with x, y a spatial and discrete variables and t as discrete variable for time. As 

patterns move from frame to frame in an image stream I satisfies the following equation: 

I(x, y, t + τ) = I(x − ξ, y − η, t) (3.3) 

In other words this equation says that we can take a picture of a scene at later point in time 

and we can obtain the image by moving every point p = (x, y) by a displacement d = (ξ, η). 

If we want to track certain features of a scene over several frames algorithms have to face 

this displacement. In our case camera movement occurs and I(x, y, ti+1) = I(x, y, ti). 

32


In his ”State of the Art Report of Natural Feature Tracking” Vial [60] gives an overview 

of principle features that can be found in an image 3.1. He also provides an overview of 

common methods to extract the features. Generally we can distinguish between modelbased 

and move-matching methods [51] for feature tracking. Model-based tracking requires 

a model definition of the object to be tracked. Marker detection methods like the ARToolkit, 

but also the texture tracking of the ARToolkit can be categorized in this method. Reason 

for that is that all the images used with the toolkit have to be preprocessed first. During 

runtime the preprocessed data is applied. Another possibility for a model-based approach 

is to consider a CAD model of the environment in the tracking routine [29]. Thus such 

methods are not suited for unprepared environments. In contrast move-matching methods 

estimate a correspondence of 2D image movements to 3D position and orientation without 

any underlying model. 

0D 1D 2D Motion 

Corners, Points Contours, Edges, 

Chains, Lines, 

Circles, Ellipses 

Uniform Regions, 

Textured 

Areas, Surface 

patches 

Table 3.1: Overview of features in an image [60] 

Regions with 

similar motion 

Thus the first step is to select ”good features in regions with rich enough texture [55]” and 

then apply tracking techniques to find the corresponding point pi+1 = (x − ξ, y − η) of point 

pi = (x, y) in the following frame. This is often applied in closed-loop architectures [51]: 

 

1. Detect N interest points in frame i + 1, resulting in the set 

x i+1 

j 

2. Match interest point from frame i to i + 1 and find the correspondences x i j 

3. Use these correspondences to compute pose 

N 

j=1 

←→ xi+1 

k 

The problem is to find the correspondences of features between tracking frames. And 

that is were our approach makes sense. While tracking feature points Lucas and Kanade 

[33] proposed the measurement of similarity between fixed search windows of two consecutive 

frames. This is based on the assumption that the displacements d from frame to frame 

are small. The correlation of the windows is defined as sum of squared intensity differences. 

This method is applied by the texture tracking ARToolkit. Every algorithm using this 

method can be extended with our hybrid tracking approach. 

Neumann and You use similar closed-loop approach. They use the concept of optical flow, 

which observes the motion of the image pixels as a whole. They combine region tracking and 

feature point tracking [38][37]. First regions with similar movements are extracted and the 

tracking is refined by 0D point tracking. Region motion tracking is based on optical flow 

and relies on the spatial-temporal gradients of an image. Using region tracking a movement 

model is derived. Because we know where a feature is located within a region we 

can refine the region tracking by matching the corresponding points by applying correlation 

methods as well. This work also proposes a verification and evaluation mechanism. For 

33


every estimation the confidence is assessed. If the confidence is poor the result is refined. 

This approach allows larger movements as well. Generally region tracking allows larger 

camera movements while point feature tracking itself is only suited for small displacements. 

In contrast our approach assumes small inter-frame displacements which results in a rather 

simple movement model. It has to be evaluated if our idea can be applied here as well for 

the refinement of the tracking. 

All these algorithms have the displacement problems for finding the tracked point of interest 

in the next frame again. As a summary our idea is suited for all feature tracking 

algorithms assuming small displacements between frames. To find corresponding features 

points windows are compared, not pixels itself. These windows correspond to the term templates 

in the texture tracking. Our approach tries to influence the number of comparisons 

of these windows by applying orientation information. With this information the search 

window is adjusted. 

3.5.2 Hybrid Tracking 

As we said hybrid tracking combines different tracking technologies to compensate drawbacks 

of a single tracker. We will focus on related work on combinations of vision-based 

and inertial tracking. As discussed drawbacks of vision-based tracking are the occlusion 

of tracked features and computational expensive algorithms causing additional delay. An 

inertial tracker accumulates small measurement errors that cause drift [46]. 

Azuma motivates the usage of hybrid tracking systems for unprepared environments [4]. 

In prepared environments the user or developer is in control of tracked objects. A user 

can place fiducial markers on a table for example. This is more difficult in outdoor AR 

applications. Light conditions change and visual landmarks used for feature tracking may be 

occluded. Integrating a gyroscope provides a good estimate of orientation and a reasonable 

guess to reduce the search space of the optical tracking algorithm. This idea is picked up in 

our work. He uses the setup the following way: If a user stops moving, the video tracking 

is locked on traceable features and the accumulated drift in the inertial tracker is corrected. 

He distinguishes two methods for the fusion of inputs from inertial and optical tracker: 

1. Use the gyroscope orientation as an estimate for orientation of the vision tracker. 

This compensates inaccurate measurements of the vision-based tracker, but the inertial 

tracker will drift and cause wrong result after a certain period. 

2. Use vision-based tracker to compensate drift 

Every frame of the vision-based tracker corrects the measurements of the inertial tracker. 

Thus drift does not occur, but inaccurate measurements of the vision-based tracker will 

be propagated. 

Handling unprepared environments is an important issue for outdoor applications. It is 

not realistic placing fiducial markers or preprocessed textures in the outdoor environment. 

Our environment is prepared, we can expect accurate measurements of the optical tracker. 

Drift does not affect our setup either because we only use relative measurements of the 

gyroscope tracker. In [68] the vision-based algorithm introduced earlier [37] is applied for 

34


a hybrid setup. The frame to frame prediction of camera orientation by the inertial tracker 

and the correction of the accumulated drift exploits the nature of both trackers. The inertial 

predicts the motion of image features and the estimated positions are refined by searching for 

local matches for the feature points. This work also addresses the importance of calibration 

issues of the setup. 

In an indoor and mobile path finding setup by the Studierstube a user is guided through 

an unfamiliar building to a destination room [26]. This application combines marker and 

inertial tracking as well. A camera mounted on a helmet worn by the user grabs video 

images containing square markers attached on walls. Additionally an inertial tracker also 

attached to the HMD provided head orientation. This setup tries to compensate the low 

update rates of vision-based tracking with a gyroscope. In between measurements from the 

optical tracker the gyroscope gives the user’s viewing direction. The drift drawback of the 

gyro is corrected with the orientation given by the next frame of the optical tracker: 

qHybridview = qvision = qcorrection ◦ qinertial with qcorrection = qvision ◦ q ∗ inertial 

This method has been applied from [40]. In this setup an active ultrasonic tracking system 

called Bat system is combined with an inertial tracker. In contrast to Azuma these are indoor 

setups in a prepared environment, thus the vision-based tracking is very reliable. 

In chapter 1 we have already introduced a method to predict and correct measurements, 

the Kalman Filter. Figure 3.2 shows the basic mechanism of the filer loop. This filter has 

several benefits. On the one hand it can predict measurements even if actual measurements 

are not available yet due to low updated rates of a tracking system. The prediction estimates 

can be used prior the availability of the actual measurements (State estimation). In the correction 

phase the parameters for the prediction are recalculated (State update). Although 

prediction is not necessary a hybrid tracking issue sensor fusion approaches can be used to 

predict new measurements. Klein and Drummond propose a filter-architecture for a modelbased 

hybrid tracking approach [29]. Model-based approach in this case means that a CAD 

model of the tracked environment is available. Again the idea is that the prediction of the 

new camera pose is estimated by an inertial tracker. Thus with this information the visual 

tracking system is able to start in the right place. With the results of the visual tracking frame 

a new system state is calculated which is used for further prediction and the correction of 

the accumulated tracking error of the inertial tracker. Again a huge focus of this work is the 

issue of calibration of the used trackers. Neumann’s and You’s design of the filter [67] even 

allows a failure of the vision-based tracker, due to occlusion for example, to update the current 

state of the system. This work focuses on fiducial marker tracking in the vision-based 

tracking routine. The predicted pose can be utilized in this approach as well. For the visionbased 

and the gyroscope measurements independent correction channels are provided. For 

his outdoor reality system Azuma fuses the output of gyroscopes and of a compass [1]. Thus 

the user’s head movement can be predicted and the noise in the compass measurements can 

be filtered. He evaluated that the compass noise makes it hard for outdoor registration tasks. 

Applying a filter using additional gyroscopes the system was stabilized. 

We are combining gyroscopes and vision-based tracking which is a hybrid tracking approach. 

But we do not fuse the output together. We only apply the relative orientation given 

35


Figure 3.2: The prediction and correction loop of the Kalman filter 

by the gyroscopes. Thus a difficult calibration step is not necessary, because we consider and 

evaluate the orientation independently. 

3.5.3 Head motion prediction 

As we already introduced a very common setup for AR applications is a HMD-based setup. 

An important problem is the end-to-end system delay. The user always has the impression 

that the virtual content lags behind his actual movements. Hence the head movement has 

to be predicted. We already introduced the Kalman filter as a prediction method, but still 

other filtering techniques are available. Azuma tries to compare two classes of head motion 

predictors [3]. Both methods are analyzed in the frequency domain in order to obtain 

characteristics of the predicted signal as a function of system delay and input motion. Shaw 

and Liang address the problem of head motion as well. In a first experiment they try to 

characterize head motion [48]. Especially changes in head orientation are important because 

the change of viewing direction often causes more changes in the scene. The benefits of this 

knowledge should be used for the design of a predictive filter. The experiments consist of 

a user study where participants have to fulfill several navigation tasks. The test person sits 

on a chair and has to look at markers on a wall in a certain sequence. The head position and 

orientation is tracked during the study. They found out that the user’s head moves along a 

great-circle arc and that the velocity of orientation seems to be symmetric while accelerating 

and slowing down. The second step was to design the filter [32]. They applied the knowledge 

that the felt delay was mainly caused by delay in orientation and jittering is mostly 

caused by noise in the position data. They also recognized that the noise in position was 

higher that the noise in orientation. As a consequence they designed a prediction filter to 

address the orientation delay and an anisotropic low pass filter to filter the noise in the position 

measurements. They also evaluated an adequate prediction length of the filter. As one 

conclusion they noticed that the prediction of hand movement is a more difficult process. In 

our approach the movement of the handheld device is rather comparable to hand movement 

than to head motion. 

3.5.4 Table-Top Augmented Reality 

The basic motivations for table top AR have already been discussed in chapter 2.1 and examples 

have been shown. For references for related projects please have a look there. Here 

is just a short summary of application domains for table top AR. But obviously there are still 

36

unexplored domains for table top AR as well. 

• Exhibitions 

• Education 

• Gaming 

• Interactive Storytelling 

• Conferencing 


Every table top AR application has to face the problem of 3D registration. Mark Billinghurst 

discusses this problem in several publications. The intention is of the shared space technology 

is to enable interaction with virtual and physical objects, but also the collaborative 

interaction with other users [9] [8]. The shared space could be used for a variety of applications. 

He mainly uses the ARToolkit as optical tracking system [27]. A lot of table top 

applications use the ARToolkit as tracking component either to display the virtual content 

[61] or to interact with the system [57][10]. Reasons for that may be that it is free available 

and easy to integrate in the application. A deeper knowledge of tracking and image recognition 

algorithms is not necessary. Therefore the ideas provided in this thesis are suited for 

all vision-based tracking application in horizontal table top environments. 

3.5.5 Our approach 

A cheap gyroscope only giving relative spacial information in sufficient even if drift occurs, 

because only the relative change of orientation between two frames is considered. Our application 

is set up in a prepared environment. It means that we are using a preprocessed 

image for tracking. There is no need to filter the gyro orientation, because it is not used for 

the registration of the fairy tale scene. And although head mounted displays are also suited 

for table top AR, the input device used in the Magic Book is the handheld visor. The field of 

interest of the user will be the horizontal setup on the table. The the range of possible movements 

is rather restricted to the table area and small inter-frame displacements are likely. 

But of course an interesting question would be as well how user behavior changes with different 

input and output devices. We have seen that out approach is suited for vision-based 

trackers assuming small inter-frame displacements. Future work has to evaluated to what 

degree our ideas can be applied for other tracking algorithms. As we said the motion of 

the handheld device will differ from head motion. Hence out approach might motivate the 

characterization of other user interfaces. 

In my opinion table top AR applications are a way to address a broad mass of people with 

the new technology. Just image augmenting an exhibition piece with virtual information 

using a cellphone display with a attached camera. Thus fast, robust and accurate tracking 

will seriously influence the user’s acceptance of the technology. Our approach will help to 

improve this. 

37

3.6 Summary 


According to our idea we now do not consider the search window size as a fixed parameter 

inside the texture tracking algorithm anymore. Our approach is to adjust this size during 

runtime according to relative orientation information given by a cheap and self-contained 

gyroscope tracker. To find a relationship between the feature point tracking routine and 

changes in orientation of the handheld device ideas for a user study have been discussed. 

From now on we distinguish between a runtime setup and a user study setup. 

Runtime setup. The runtime setup consists of the handheld device with an integrated gyroscope, 

a camera as input- and a altered HMD as output device. A requirement for this 

setup is the dynamic configuration of the search size window during runtime. 

User Study setup. The user study setup consist of a magnetic tracker recording the movements 

of the handheld. The ARToolkit tracking has to be extended to log the feature 

point coordinates in every video frame. 

The next chapter will focus on the software architecture of the runtime setup. After that 

the main part of the thesis will be introduced: the design, execution and evaluation of the 

user study. 

38

CHAPTER 4 

A Software Architecture based on DWARF 

To enable a dynamic configuration of the optical-inertial setup during runtime a software 

architecture has to be provided. The information given by the gyroscope tracker has to be 

processed and used to set the search window size of the texture tracking routine. The Distributed 

Wearable Augmented Reality Framework DWARF is a component based framework to 

build AR applications [17][6]. A CORBA 1 -based infrastructure allows communication between 

these components, so distribution which is important for mobile setups is provided. 

Reusable components for tracking, rendering and user interaction enable rapid prototyping. 

First I will give a short overview of the basic principles of DWARF. Then the requirements 

for my architecture will be discussed and I will show the structure of the resulting system. 

Only the necessary terms relevant for this thesis will be described. More information and 

tutorials about DWARF can be found in the corresponding references. 

4.1 DWARF 

DWARF has been developed at the Technische Universität München (TUM) from the AR 

research group of Prof. Klinker. The basic concept of DWARF is that various applications 

consist of interdependent and distributed components that can be reused for a variety of 

different applications even for several application domains. 

4.1.1 Services 

In DWARF components are called services and can be distributed within the network infrastructure. 

Every service is running as a single process. To build an application these 

services have to be combined in order to fulfill a certain task. To accomplish this, interdependent 

services have to exchange data via mechanisms provided by the framework. Every 

service has its service description. The XML-notated description describes the input data 

needed and the output data provided by the service. This configuration is responsible for 

1 http://www.corba.org 

39

4 A Software Architecture based on DWARF 

the connection with other services delivering or demanding exactly what is needed, the so 

called needs, or provided, the abilities, by the service. The configuration of a service can also 

change or be reconfigured during runtime. 

Needs and Abilities 

• Needs 

A need is a certain property of a service in order to request a functionality from another 

service running in the network. The middleware connects these two services and they 

continue to work on a peer-to-peer basis. The services can communicate via different 

communication protocols described by the need description. 

• Abilities 

Abilities are the correspondence to Needs. They are specifying a certain functionality 

provided to other services. The ability description also sets the communication protocol 

used for communication (similar to the need description) 

Further restrictions can be made with attributes and predicates. Attributes can be set for 

abilities. It can specify certain additional properties of the ability. For example if two services 

provide similar abilities, like two cameras are attached to a user. To distinguish between 

them for every ability an additional attribute is set. Now the need for video data can specify 

a predicate in order to connect with the right ability. In the example 4.1.3 this means that the 

service only wants information from the video data ability with the attribute ”head”. 

If the middleware recognizes that the service descriptions of two services match according 

to their needs and abilities (with predicates and attributes), they get connected and can start 

to exchange data via several communication protocols. 

Communication 

Here I will give a short description of the main communication protocols used in DWARF. 

As mentioned the way of communication is specified in the need- and ability description 

and has to match with each other. Some of them are CORBA-based mechanisms, thus an 

Interface Description Language (IDL) interface has to be provided. 

Method Calls: A service exports a method provided to other services. The corresponding 

partner has to import this method. The service is able to call the method on the remote 

object now, which is similar to a method call on a local object. This is realized by 

CORBA Remote Procedure Calls RPCs. 

Events: A service can send events and another service is able to subscribe these events. This 

is realized by the CORBA Notification Service. 

Shared memory: A service could also write data in a local shared memory. Another service 

is able to read out of the shared memory to obtain the data. Note that both services 

have to run on the same machine, which is a restriction to the distribution of the components. 

40


Figure 4.1: The example shows two services: the Videograbber with the ability for VideoData demanded 

by the need of the VideoDisplay service 

Depending on the requirements of the application the right choice of communication protocols 

has to be selected. As a short summary a need and an ability consist of a name, used 

as an identifier, a type, describing the kind of data offered or demanded and the connector 

protocol specifying the communication mechanism. 

4.1.2 Service Manager 

The DWARF service manager is responsible for connecting the components. It is the ”heart” component 

of the framework. At each network node a service manager is running. Every time a service 

is started it registers at the service manager which collects all the information about local 

services. Once a service is registered the service manager looks for a suited connection partner. 

It fulfills the task of a broker. It tries to find the corresponding ability for a need. If two 

matching service have been found the service manager establishes a connection between the 

services. Now the matching services can communicate via a direct communication channel 

on a peer-to-peer basis, the service manager is not needed anymore for communication. 

4.1.3 An example 

This little example will show the terms described above (see figure 4.1). 

The notation used for DWARF architectures is an Unified Modeling Language (UML [20]) 

extension for component diagrams providing mechanisms for the need- and ability relationships. 

Needs are represented by half-circles and abilities by full-circles. 

The example shows two services, the VideoGrabber with the ability for ”type=VideoData” and 

the VideoDisplay with the need for type=”VideoData”. Figure (4.2) shows the corresponding 

XML service descriptions. On a first look it is obvious that the type of need and ability 

matches. The VideoDisplay service sets the predicate ”type=head” which means that this service 

only wants to connect to a ability with the attribute ”type=head”. As we can see the 

ability ”provideImage” fulfills this. Both services are connected by the service manager and 

they communicate via shared memory. 

The services can be reused for different purposes. As we will see later on the video stream 

provided by the VideoGrabber can also be used for optical tracking. The VideoData is forwarded 

to another service performing the texture tracking. 

41


 

 

 

 

 

 

 

 

 

 

 

Figure 4.2: An example: XML description of the VideoGrabber and the VideoDisplay services 

connected via the shared memory communication mechanism. The need and the ability for 

”type=VideoData”and the predicate for ”type=head”matches. 

4.2 Software Architecture for a Dynamic Configuration during 

Runtime 

DWARF has the capabilities for a suited software design for our runtime setup, that uses 

the gyroscope information for the dynamic configuration of the texture tracking. First we 

will have a look at the requirements for such a software design. A task was to integrate 

and reuse existing components into the setup. DWARF already provides an architecture for 

optical tracking based on the ARToolkit. This architecture was proposed in Wagners PhD 

thesis [62]. 

4.2.1 Existing Architecture 

In figure 4.3 the existing architecture is shown in the UML syntax described above. Note that 

PoseData is a data struct containing information about position and orientation (6DOFPose- 

Data) or orientation (3DOFPoseData) only. In DWARF it is not distinguished between them. 

If a tracker only provides orientation then the values for position will not be set in the pose 

data structure. 

The ARToolkit is split in to several single components. The Videograbber service grabs the 

video stream and provides it via shared memory to the ARTkMarkerDetection service. The 

marker detection component is the core of the ARToolkit. It searches for marker features in 

the video frame. In order to keep the optical tracker as flexible and reusable the ARTkMarkerConfiguration 

is responsible to configure the ARTkMarkerDetection with marker data. So 

marker data can be loaded and unloaded during runtime. The ARTkMarkerDetection service 

does not provide pose data directly. It sends a ARTkFrameMarkers structure which contains 

all the information about the detected markers in the video frame. To extract the PoseData 

out of the marker structure the ARTkPoseReconstruct service is needed. This has the advantage 

that the marker detection can also be used for other purposes, if the pose information is 

not relevant for the application. In [63] the ARToolkit is used for wide area tracking, for example. 

The ARTkPoseReconstruct provides 6DOFPoseData and can be used by other services 

42


Figure 4.3: The existing architecture based on DWARF Services 

for several goals like displaying a 3D model for example. A complete rationale behind this 

architecture can be found in Wagners thesis. 

4.2.2 Requirements for new architecture 

First the functionality of the architecture has to be extended with the texture tracking version 

of the ARToolkit. So similar to the ARTkMarkerDetection a component has to be written that 

performs the texture tracking routine, the ARTkNFTDetection. In terms of reusing the existing 

components the following considerations have to be made: 

• VideoGrabber service 

The VideoGrabber can be reused without any restrictions. The CameraData can be read 

directly from the shared memory, like the ARTkMarkerDetection does. Again the only 

restriction is that both services (VideoGrabber and ARTkNFTDetection) have to run on 

the same machine. 

• ARTkPoseReconstruct service 

To reuse this component the interfaces (Needs and Abilities) have to redesigned. The 

ARTkNFTDetection does not provide a data structure with the detected markers. It only 

provides the homographie-matrix used for the extraction of the 6DOFPoseData. The 

ARTkPoseReconstruct service calculates this matrix out of the marker structure and the 

extracts the pose information. Thus another need of the ARTkPoseReconstruct service 

has to be integrated allowing the estimation of pose with matrix data given by the 

ARTkNFTDetection service. 

• Configuration of the search window 

43


manually: A simple graphical UI should make it possible to alter the search window 

size of the texture tracking. Similar to the ARTkNFTDetection service a configuration 

component enabling this is needed: the ARTkNFTConfiguration service. If 

no configuration of the search window is needed at all the ARTkMarkerDetection 

service is able to run with the configuration service. 

dynamically: This is the requirement for our runtime setup described in the previous 

chapter. Information given by a gyroscope should be used to estimate a new 

search size window. This estimation is based on the mapping of the movement 

of the handheld device and the feature point 2D coordinates that still has to be 

found (see chapter 3). A gyroscope tracking unit can connect to this configuration 

component and deliver 3DOF orientation information (gyroscope service). 

The ARTkNFTConfiguration allows both: a manual and a dynamic configuration of the 

search window size. 

4.2.3 System Design 

First we will have a look at the new components and the redesign of the interface of the 

ARTkPoseReconstruct service necessary to realize the architecture. 

New services 

Looking at the small requirements elicitation for the new software design the following new 

components can be identified. For a deeper look into requirements analysis and software 

engineering in general have a look at Bruegge’s ”Object-Oriented Software Engineering” 

book [13]. 

• ARTkNFTDetection 

This service implements the loop for the optical tracking. It has an ability for ARTkN- 

FTPoseMatrix. This data contains the matrix describing the position and orientation of 

the 2D plane. The pose data can be extracted from this data. 

• ARTkNFTConfiguration 

Interfacing the ARTkNFTDetection service this service is able to set the search window 

size of the texture tracking. Thus this service provides the ability NFTSearchSizeWindow. 

Adaption of the ARTkPoseReconstruct interface 

A new need for NFTSearchWindowSize has to be provided. This need matches the ability 

of the ARTkNFTDetection and these services are able to connect. In contrast to the marker 

detection, where first the marker structure is used to calculated the pose matrix, the matrix 

is provided directly by the texture tracking. Reason for that is that the tracking routine 

and the calculation of the matrix out of the feature points cannot be separated easily. The 

calculation of the 6DOF pose data can be performed in the same way now. 

44

Resulting architecture 


In figure 4.4 shows the resulting architecture with the new services. Old components are 

drawn in gray. 

Figure 4.4: The resulting architecture integrating the new components 

The following dependencies between the connected services are established by the service 

manager during runtime. The information is given in the XML service description for every 

service. 

• VideoGrabber ←→ ARTkMarkerDetection 

Type CameraData 

Communication method Shared Memory 

Description The ARTkMarkerDetection reads the video stream out of the shared memory 

• ARTkNFTDetection ←→ ARTkNFTConfiguration 

Type: NFTSearchWindowSize 

Communication method: Method Call 

Description: The ARTkMarkerConfiguration calls an exported method of the ARTkNFT- 

Detection to set the search size of the texture tracking. 

• ARTkNFTConfiguration ←→ Gyroscope 

45

Type: 3DOFPoseData 

Communication method: Event 


Description: Every frame the Gyroscope service sends the new pose data to the ARTkN- 

FTConfiguration service. This information then is evaluated and the appropriate 

search size is set. 

• ARTkNFTDetection ←→ ARTkPoseReconstruct 

Type: ARTkNFTPoseMatrix 

Communication method: Event 

Description: As described the pose matrix is sent to the new need interface of the ARTk- 

PoseReconstruct. 

Figure 4.5 shows the resulting UML sequence diagram. The communication of the services 

is described in relation of time. We can see that the frame rate of the gyroscope is higher 

than the frame rate of the optical tracker and that the ARTkNFTConfiguration sets the search 

window size according to the received pose data events. 

Figure 4.5: UML Sequence diagram: Interaction of the DWARF services 

This architecture meets the requirements described above. On the one hand the texture 

tracking can be used as an independent component without any configuration of the search 

window. Then the parameter is set to a constant value. But on the other hand the parameter 

could also be set dynamically if a gyroscope tracking service and the ARTkNFTConfiguration 

service are active. 

46

4.2.4 Implementation 


Linux is the main platform for DWARF. A Fedora Linux distribution 2 (version 2) was used 

to run DWARF, although SUSE 3 distributions are suited better. Core components have been 

developed and tested under SUSE only. The reason for using Fedora was that a SUSE distribution 

was not available in New Zealand due to high internet costs. DWARF provides 

support for several programming languages like C++, Java and Python. The existing architecture 

was written in C++, thus the new services have also been developed in C++. The 

graphical user interface for setting the search window size manually (in the ARTkNFTConfiguration 

service) was implemented with the QT Toolkit. Figure 4.6 shows a screenshot of 

the runtime setup. It shows the preprocessed 2D image plane and a virtual plane registered 

on top of it. The search window size could be set by a slider in the graphical user interface. 

Still is it not possible to set the search window dynamically, because it is not clear if a proper 

mapping can be found. But even if it is not possible to find such a mapping the texture 

tracking can be use in the DWARF framework now. 

Figure 4.6: Runtime environment: The search window size can be set manually or by the orientation 

information given by a gyroscope (3 DOF Intersense Tracker) 

2 http://fedora.redhat.com/ 

3 http://www.suse.com/ 

47

4.2.5 Summary 


We have introduced a software architecture based on the DWARF framework accomplishing 

the discussed requirements of a dynamic configuration during runtime. The texture tracking 

ARToolkit is now integrated in the DWARF framework, which was one of the core requirement 

for this thesis. A mechanism for deriving a search window size out of the gyroscope 

orientation still has to be implemented. 

The next chapters will show that it is not an easy task to express this relationship. Next 

we will describe the design and performance of the user study motivated in chapter 3. 

48

CHAPTER 5 

User Study 

As we explained in the previous chapter we want to find a mapping between the feature 

point tracking and the change in orientation of the handheld device. To obtain data to evaluate 

a user observation has to be made. A logging infrastructure records the data during the 

study. 

This chapter will describe the goals and the design of the user study. As said the overall 

goal is to get data from a certain amount of people in order to analyze it. The texture tracking 

AR Toolkit is altered in a way that we can retrieve the 2D feature point coordinates in every 

video frame. The tracking setup is extended with a Ascension Flock of Bird magnetic tracker 

to track the user movements. To obtain comparable sets of data special tasks have been 

designed, in which all the test persons have to answer questions about the current scenes. 

This is done to force the user to act a certain way. 

5.1 Goals of the User Study 

The motivations for user studies and evaluations can be very different. According to [44] 

four reasons for doing evaluations could be identified: 

• Understanding the world 

How do future users use a technology? How do they employ the new system in their 

workplace? The main motivation for that kind of evaluations is understanding the 

user and his behavior. 

• Comparing designs 

Ofter system designers have to decide which input method to chose. Therefore evaluations 

of these different methods have to be made. A evaluation should give hints 

which method is more accepted by the user and leads to a better performance. An 

example for this can be found in Kulas master thesis [30], in which he focuses on usability 

aspects of ubiquitous systems and performs a sample user study to compare 

two menu designs. 

49

• Engineering towards a target 

5 User Study 

Studies are made in order to evaluated if the system accomplishes certain goals, for 

example a better performance than a competitors product. 

• Checking conformance to a standard 

These studies are mainly testing procedures to evaluates if a systems meets required 

standards. 

In our evaluation we want to ”understand the world” better. Especially we want to have a 

deeper look at the following aspect: How is the tracking related to the input provided by the 

user through movements of the handheld device. This user study is meant for collecting 

enough data to describe a relationship and if possible a mapping between the two data 

sources. Here is an overview of the expected outcomes of our user study. Of course not 

all of these goals could be highlighted within the limited time of this thesis. But potential for 

further investigations in that research area are shown. 

• Collect user data 

Data has to be collected. As we described in the previous chapters there is a lot potential 

to derive possible conclusions using the data. Of course finding a mapping is the 

prime goal but collecting the data is a first big step and took most of the time during 

this work. 

• Find a mapping between user movement and search window size 

This is the idea described in chapter 3. The mapping is expressed in the function introduced: 

ftexturetracking(∆ orientation) = search window size (5.1) 

• Find a relationship between user tasks and the related movements 

If we know which actions and movements are connected with certain tasks, we can 

adopt our tracking not only to general mapping, but also to the designated task. Therefore 

we first have to detect possible tasks in table top AR and try to discover a dependency 

between these tasks and the tracking results. 

• Characterize movement of the handheld device 

We are interested if the movement can be characterized and are these results valid for 

every user. Therefore user tasks are needed to let the user perform similar actions. 

• Evaluation of a suited task design for table top AR 

Like Shaw introduced in his experiment to characterize head motion [48] certain user 

tasks have to be designed. We will introduce and abstract tasks that might be suited 

for a variety of table top applications. 

• Collect feedback from potential users 

A questionnaire was designed to collect additional feedback on the Magic Book and 

on the tasks. 

50

5 User Study 

• Observe anything else that might be interesting 

Still it is important to keep the eyes open for any interesting observation during the 

execution of the user study and the analysis of the collected data. 

5.2 User Study design 

This section discusses all the relevant aspects of the design of this user study. First we will 

have a look at the recording of the separate data sources: the pose data given by the magnetic 

tracking device to track the hand held device and the 2D positions of the tracked feature 

points. The Magic Book has been implemented on Windows using Microsoft Visual Study 

6 1 . Therefore first the logging infrastructure had to be integrated into the existing system. 

After that we will introduce our task design for table top AR. 

5.2.1 Movement Tracking of the Hand-Held Device 

The issue is to track the position and orientation of the handheld device. Therefore a tracking 

device providing 6DOF in needed. Like we have introduced in the beginning of this thesis 

a magnetic tracker can measure position and orientation. The Flock Of Bird system has a 

update rate of about 90 frames per second, in comparison the vision-based texture tracking 

runs with 30 frames per second. As a repetition a base station establishes a magnetic field 

and the pose data of several sensors could be tracked within the range of the magnetic field. 

Hence it is possible to track more than one object at one time. Drawback of this tracker is 

that the data might be disturbed by artificial magnetic fields produced by a CRT monitor for 

example. And it is almost impossible to fix a setup in a room without any interference. Additionally 

the sensors have to be close to the base station to obtain accurate tracking results. 

We will also see that we have to be careful with attaching a sensor directly to the handheld 

device, because of its iron stick. Other tracking devices like an infrared optical tracking device, 

the A.R.T. tracking system 2 for example, might be an alternative. A huge advantage of 

the magnetic tracker is that there is no need for a line of sight between the sender and the 

receiver. So it is no requirement for the setup that the user is restricted not to occlude the 

line of sight. The setup for this user study would not be suited well for registering 3D virtual 

objects, because interference will lead to jittering. 

Software support for Flock of Birds 

For using and developing applications with the Ascension Flock Of Bird tracker a commercial 

library is available. The ”Eden Library” provides a threaded query mechanism to get 

the measurements. To connect to the host system communication links via TCP/IP or serial 

port (RS232) are supported 3 . For spatial data representation it supports an OpenGL pose 

matrix, a position vector and euler angles. As we can see later quaternions can be derived 

by a simple algorithm with the OpenGL matrix as input parameter. 

1 http://msdn.microsoft.com/vstudio/ 

2 http://www.ar-tracking.de/ 

3 For further information on The Eden Library please contact Phillip Lamb, phil@eden.net.nz 

51

5 User Study 

Figure 5.1: The Ascension Flock of Birds with the sender (black cube) and the host system in the 

background 

Calibration 

The pose data given by the Flock of Bird is always given in the coordinate system of the 

sending station. We get the absolute position and orientation to the origin of the Flock of 

Birds coordinate system. The origin of this coordinate system lies in the center of the black 

cube (see figure 5.1). In order to bring this coordinate system in relation to the magic book 

we have to to a calibration step. We have the possibility to track several targets with the 

Flock of Birds, thus we will track the tangible book as well. Figure 5.2 shows two sensors: 

the first one is attached to the upper left corner of the book. This is supposed to be the 

origin of the Magic Book coordinate system. The other sensor is attached to the hand held 

device. Our aim is to record the pose data of the second ”bird” calibrated to the Magic Book 

coordinate system. Next we will explain the steps to calculate this. All the methods used 

in the following steps were taken from the DWARF utility package. This toolbox provides 

all the basic transformations and calculations for spatial data. The mathematical basics for 

these methods can be found in the corresponding literature [49]. 

The Eden Library provides a OpenGL matrix for the Magic Book sensor and for the hand 

held sensor, Mbook and Mhandheld. 

Position: Calculation the position of the handheld is an easy task by simply subtracting the 

position vector of the handheld from the origin of the coordinate system. The position 

vector is the last column of OpenGL matrix. 

The notation ¯p Book 

dinates. 

Handheld 

¯p Book 

lock lock 

Handheld = ¯pF Book − ¯pF Handheld 

(5.2) 

means the position vector of the handheld device in book coor- 

Orientation: We want to get the orientation of the handheld in book coordinates q Book 

Handheld 

in the quaternion representation. The representation of the matrices is important for 

52

5 User Study 

Figure 5.2: Magic Book coordinate system with its origin in the upper left corner and the handheld 

device with a styrofoam puffer due to ferromagnetic distortion 

further calculations. Because OpenGL is using a column major order and DWARF a 

row major order we first have to transpose both matrices. 

MBook = M T Book 

MHandheld = M T Handheld 

(5.3) 

Both matrices contain pose information in Flock coordinates. Now the corresponding 

quaternions can be derived by a simple method call. 

F lock 

qBook = matrix2quaternion(M T Book ) 

F lock 

qHandheld = matrix2quaternion(M T Handheld ) (5.4) 

To obtain the resulting quaternion, the quaternion representing the orientation of the 

source coordinate system has to be inverted and multiplied with the quaternion of the 

handheld device. 

The resulting pose data consists of ¯p book 

the logging infrastructure. 

q Book 

lock 

Handheld = (qFBook )∗ · q 

handheld 

53 

F lock 

Handheld 

(5.5) 

and qbook 

handheld . This data has to be recorded by

Interference and noise 

5 User Study 

As we said, because ferromagnetic objects may seriously distort measurements it is not possible 

to attach the sensor directly to the handheld device. Therefore the handheld device 

was prepared in a special way. To make sure that the handheld device does not influence the 

measurements it had to be attached in certain distance from the handheld device. To figure 

out the proper distance a straight edge was put on to of the handheld perpendicular to the 

iron stick. Now the sensor was moved constantly along the straight edge from one side to 

the other side. In the middle, on top of the handheld device, a distortion was recognized. 

To visualize this distortion a virtual cube was displayed with the pose information given by 

the Flock of Birds tracker. In the next step the straight edge was attached further away by 

using styrofoam blocks. This step was repeated until no obvious distortion could be recognized 

anymore. A styrofoam block with the proper thickness has been attached on top of 

the device (5.2). But as discussed this is a huge drawback of using a magnetic tracker in this 

setup. 

As a summary we realized that it is necessary to track both, the handheld and the Magic 

Book. We have to calibrate the setup, because we are considering the book coordinate system 

as our world coordinate system. Especially this is important if we want to consider position 

information for future evaluations. 

5.2.2 Tracking of 2d feature point 

The information which feature points are tracked in the texture tracking ARToolkit is transparent 

for the programmer. This means that the tracking routine itself is hidden for the application 

developer. This has the consequence that the tracking method has to be extended 

to obtain the feature point information as well. Due to the texture tracking algorithm, described 

in chapter 2, in every frame at least four feature points suited best are chosen. For 

every feature point the 2D coordinates are tracked: ¯pF P = (px, py). Note that we only have 

a 2 dimensional vector here. Every feature point has a unique identity. This is important for 

the user study so we can recognize that a feature point is continuously tracked over several 

frames and observe its path through the 2D video plane. 

5.2.3 Logging Infrastructure 

To setup a logging infrastructure the existing Magic Book application had to be extended. A 

Flock Of Birds tracking component has to be integrated and and the the method call of the 

tracking routine has to be altered. Both tracking information has to be recorded by a component 

logging this information. Later in the evaluation it must be possible to synchronize the 

data. The logging can be started and stopped by the glut callback functionality (see figure 

5.4). At the end of each tracking frame of the tracking components the data is given to the 

Logger via method call and written into a file. 

Figure 5.3 shows the classes participated in the logging steps. Central component is the 

Logger. It records the pose information given by the texture tracking and the Flock Of Birds. 

Both tracking components attach a timestamp to the pose data. 

54

5 User Study 

Figure 5.3: Static structure of the logging environment 

Figure 5.4: Sequence diagram describing the logging steps 

55

5 User Study 

The data is written to two files. For the 2D feature points coordinates the following data 

is recorded. As we said at least for feature points are need to calculate the viewpoint of the 

user. All these four feature points are considered (see equation 5.6). 

logF eaturepoints = (timestamp, id1, x1, y1, 

id2, x2, y2, 

id3, x3, y3, 

id4, x4, y4) (5.6) 

If the tracking fails for one or several frames id1 is set to ’-1’. This makes it possible to 

count tracking failures. 

The pose data given from the Flock of Birds is logged in the following way (5.7). Each 

component for the quaternion q = (x, y, z, w) = (q0, q1, q2, q3) is considered. ¯v = (x, y, z) 

is the imaginary vector and w the real scalar. Note that we changed the order of the scalar 

and the imaginary vector. This has the reason that the calculation methods provide the 

quaternions in this order. The position ¯p = (px, py, pz) is logged as well, although we do not 

know if we need it for further evaluation. 

logHandheld = (timestamp, q0, q1, q2, q3, px, py, pz) (5.7) 

One thing we have not considered at all in this logging environment is the delay of both 

trackers. This is necessary to determine the exact state of the setup at one point of time. 

Consequences and reasons will be discussed later on in the thesis. 

5.2.4 Task Design 

Somehow we want to force the participants to behave in a certain way to compare the data 

between different participants. Letting them explore the virtual objects randomly might 

not lead to satisfying results. Every user might focus on other objects and animations. But 

this is still an assumption which has to be proved. The idea is that we might have more 

success to compare data of different participants if we give similar tasks to the test persons. 

Task centered user interaction design is one approach to develop specific user interfaces [31]. 

Future user have to be observed in order to know their behavior, to gather information about 

how they handle things and to evaluate special requirements for tasks. So this user study 

could rather be understood as a task observation, not a usability study. Although we will 

possibly collect also information that makes it possible to appraise the usability of the Magic 

Book. We present a categorization for tasks in table top AR applications. These tasks are 

also related to expected actions or behavior of the participant. These tasks are applied in 

the user study, but further on we will discuss if it is possible to abstract them for a variety 

of table top AR applications too. Primary goal is to design user tasks that are easy, so even 

unexperienced participants are able to perform them without further training. First we want 

to introduce the tasks generally and then we will give an example how these tasks have been 

applied in the user study. 

56

Different tasks in table top AR 

5 User Study 

The following user tasks have been identified, relevant for the Magic Book. But as we said, 

later on we will have a look if we can use them for other table top applications as well. 

Overview task: The user has to get an overview of the scene with its virtual, but also real 

objects. It is expected that the user will bring himself in a position where he is able 

to see the whole scene without moving around in order to get an impression. Special 

features of interests might be focused and he might move around slightly. In the Magic 

Book this can be achieved by simply asking a question about the content of the scene. 

A feature in the content of the scene could be a virtual character or an object. 

Focus task: In this task the user is pushed to focus a specific feature of the augmented environment. 

The location of the feature should be obvious to the participant. As an 

expected behavior the user will move closer to the scene and try to hold still to observe 

the feature. Again in the Magic Book through posting a question on a specific virtual 

object. 

Detail task: This task is a combination of the overview and the focus task. While the focus 

task only concentrates on single features, the detail task will force the user to move 

around in the scene to get an overview and move closer to focus on features as well. 

This can be achieved by asking the participants to count objects in the scene, that occur 

at several locations, for example. 

Additionally a free task will be introduced. This should give the opportunity to observe the 

user when he is able to move around without any restrictions. The task can be circumscribed 

as ”understanding the scene”- task. The following example should demonstrate these tasks 

by applying them for a fairy tale scene of the Magic Book. 

Example application of the tasks in the user study 

The questions posted to the participant are related to the scene in figure 5.5. 

In this scene the participant is confronted with the tasks described above. This is done 

with asking questions about different features of the scene. It is not relevant if the answer of 

the participant is right. The focus is on what efforts concerning movements of the handheld 

device are made to explore the features. The following questions are posted. 

• How many people do you see in the scene? 

This is an overview task. All the virtual characters are distributed throughout the scene. 

Thus a position is needed where the user gets an overview of the whole scene. 

• What is the haircolor of the woman with the white skirt and the yellow jersey? 

The participant should focus on a specific feature of the scene. The feature ”women 

with the white skirt” is almost obvious to the user. This is a focus task. 

• How many people wear a hat or a hairdress? 

This is a detail task. The features are spread over the scene and a closer look is necessary 

to answer the question. But it is obvious where the features are located. 

57

5 User Study 

Figure 5.5: Magic Book: In order to let participant perform tasks questions are posted 

In the free task the test person is able to observe the scene without further questions or 

constraints. The user study itself consists of 4 cases. Every case considers one scene displayed 

on one page. Two of these cases are free tasks. In the other two cases questions are 

posted to the user in order to accomplish the demanded task. A full description of the cases 

can be found in the appendix B.2. 

5.2.5 Setup 

The environment for the user study is set up in the HIT Lab demo room. The Magic Book 

application with the logging extension runs on a fast Shuttle PC (P4 3.2Ghz processor with 

1GB DDR400 RAM). The Shuttle PC 4 is plugged to the network and is able to connect to 

the Flock of Bird host system via TCP/IP. The tangible magic book is placed on a table in 

a similar height as the AR kiosk and one sensor is placed in the upper left corner of the 

book, because of the calibration issue. The Flock of Bird tracker does not need a line of sight 

between the sender and the receiver, therefore we do not have to ensure that the participants 

do not cross the line of sight. A participant equipped with the tracked handheld device is 

able to use the application similar to the usual Magic Book setup. The Shuttle PC itself was 

placed on a desk nearby the Magic Book table and I was sitting on this desk in order to post 

the task questions and to start and stop the logging (see figure 5.6). Figure 5.7 shows the top 

view of the user study. 

The best tracking results are achieved if the receivers are close to the sending base station. 

Therefore the distance of the table to the base station was about 1 1 

2 meters. Thus there was 

still enough space for the user to move around freely. 

4 www.shuttle.com 

58

5 User Study 

Figure 5.6: User study setup: The magic book is place on a plate in a certain height. The computer 

system in the back controls the logging and is connected to the Flock of Birds tracker 

Figure 5.7: Top view of the user study setup. The base station is put near the sensors as close as 

possible. 

59

5.2.6 Questionnaire 

5 User Study 

Additionally to the recorded data a questionnaire was given to the test persons. It should not 

take longer than 5 minutes for the test person to answer the questions. This questionnaire 

should give further information about the following issues: 

• Background of participants 

Data about age, occupation and background on AR and the Magic Book is collected. It 

is desirable to have a widespread variety of test persons. AR experts would probably 

behave different that new users, but this is also only an assumption. Data about age 

and occupation were only voluntary, they were not important for the study. 

• Feedback on tracking 

This feedback focuses on the delay and the jittering of the feature tracking technology. 

Both factors affect the immersion of the user in the virtual world. This data was also 

only secondary. The delay issue is interesting, because our approach wants to speed 

up the computation. Our ideas is to reduce the delay caused by the image processing 

routine. 

• Feedback on tasks 

These questions should give feedback on the difficulty of the task. We made a difference 

between the test cases where a user is able to move around without restrictions 

and the test cases where the user has to fulfill certain tasks. 

• Feedback on user interface and usability 

The Magic Book works with a handheld visor a graphical user interface. In the table 

top AR chapter 2 we discussed the rationale of this choice. But still there is the question 

if another user interface is more suited for the Magic Book. This is as well connected 

to the question if the Magic Book is ”easy to use”for unexperienced users. 

• Further comments and feedback 

During user studies the comments that participants express are valuable as well. These 

comments can be used to draw further conclusions for our user study goals, although 

they can not be put in empirical data. 

All of the questions were posted with scalar values from 1 to 5. The collection of the 

tracking data is still the main goal of the user study, thus the questionnaire should only give 

additional feedback and information about the user. Mainly the feedback for the tasks were 

important because as we said we wanted to have ”easy” tasks. If the test person would 

mainly agree that the tasks were difficult, we could hardly compare data sets of expert users 

and participants who are confronted with the Magic Book for the first time. The complete 

questionnaire can be found in the appendix A.1.1. 

60

5.3 Execution of the Study 

5 User Study 

The execution of the user study was mainly done in two sets. This section gives an overview 

of the concrete execution of the user study, concerning the selection of participants, the sequence 

of tasks during the study, time and place and a discussion about the difficulties and 

problems during the study. 

Participants 

The method for getting test persons for the user study was mainly ”hallway testing” for 

saving time. First I asked students and interns at the HIT Lab to join my study. Unfortunately 

most of the students were experts in developing AR applications themselves. But I 

still asked the test persons to spread the word and so I was able to test unexperienced users 

as well. At a whole 20 test persons were joining the study and the level of expertise was distributed, 

which is satisfying for my purposes (see figure 5.8). I recognized that experienced 

AR developers are more critical concerning the tracking and UI technologies. On the other 

hand test persons who are confronted with AR the first time are very fascinated. This was 

my experience while talking with the participants. 

Figure 5.8: Overview of the expertise of the user study participants on AR and the Magic Book. The 

scale was from 1 (”never heard of it”) to 5 (”experienced developer”). Overall 20 participants joined 

the study 

Place and time 

Due to the shared resources of the demo room the study had to be split in two sets. Thus 

the setup had to be built up for several times including a pilot run. In the pilot run mainly 

the questions on the virtual scene were tested with a satisfying result. Setting up several 

times has a huge disadvantage, because it is hard to set up with the same conditions for the 

test persons twice. Even the light conditions influencing the optical tracking depend on the 

time of day. Also we had to be very careful with the Flock Of Bird tracker. The demo room 

was equipped with some CRT monitor setups and computers which might interfere with the 

61

5 User Study 

magnetic field of the Flock of Bird. Thus a lot of testing was required prior to the conduction 

of the user study. But in the end the results achieved with this setup were satisfying. 

Steps in the user study 

As an introduction for the participant I figured out a guideline for the study (see A.1.2). 

The participant should know about the execution and the purpose of the study. Details 

probably influencing the behavior are hidden to the user. Also the usage of the Magic Book 

is introduced to unexperienced users. Important was also to let the user know that he can 

not do anything wrong or give a wrong answer to the task questions. The participant was 

allowed to ask questions during the study as well. For myself I figured out a schedule with 

the single steps during the study (see A.1.2). 

1. Practice 

The first step should give the test person the possibility to get used to the usage of 

the Magic Book. The participant should figure out which movements of the handheld 

device are allowed by the setup, especially by the tracking routine. If the tracking fails 

the reinitialization of the tracking by looking on the marker was explained. 

2. Case 1: Free Task 

The movements of the handheld and the feature point tracking information was recorded 

for 30 seconds during this task. There were no restrictions for the user, expect for the 

task to understand the scene. 

3. Case 2 

Now special questions were posted to the user about features of the current scene. The 

categorization of tasks introduced earlier in this thesis was applied here. Prior to the 

study special scenes, pages in the Magic Book, were chosen that were suited best for 

that purpose. One property of those scenes was that they had more features relative to 

the other scenes. The user should answer those questions as soon as possible. There 

was no time restriction. The user had to accomplish 5 tasks in this case. 

4. Case 3: Free task 

This case is similar to the first case, except that another scene was chosen for it (again 

30 seconds). 

5. Case 4 

This case again is similar to case 2. A suited scene was selected for it. In this case the 

participant had to succeed 4 tasks. 

6. Questionnaire 

After the 4 cases the questionnaire was handed out for the user. 

7. Gather further comments and feedback 

Additionally to the questionnaire the participant had the possibility to make further 

comments and encouragements. With most the participants it was possible to chat 

about the tracking technologies and the Magic Book. Also some people were interested 

in the results of the study. 

62

5 User Study 

For the user study only 4 scenes were needed. But most of the participants wanted to 

enjoy the whole fairy tale consisting of 8 scenes. A complete description of the scenes used 

for the cases and the single tasks with the corresponding results of the evaluation can be 

found in the appendix (see B.2). In figure 5.9 a participant during the user study can be seen. 

Problems and difficulties 

Figure 5.9: A participant during the user study with the handheld visor 

As already mentioned the availability of the demo room was one restriction to the execution 

and preparation of the study. So I had to set up the study environment several times and 

I had to ensure to have almost the same conditions for every run. Another important issue 

which was not considered in the study setup were the different delays of the trackers. In 

order to synchronize both tracking data sets in the evaluation of the study the states of both 

tracked objects at one point of time have to match. This is a very difficult tasks to estimate the 

tracking delays. The delay of the texture tracking is mainly caused by the transport of video 

data from the camera in the main memory and by the image processing routine. The Flock 

of Bird tracker first has to transfer the data from the receiver to the host system. Then the 

data is transfered via network to the Shuttle PC. To consider both latencies correctly delays 

measurements have to be made. This is done by using a reference tracking device where the 

delays is know. Shaw proposes an experiment to measure this delay [32]. Performing this is 

hard and time intensive. Thus we have to consider a small shift in our data measurements. 

But future setups should put the effort to measure the delay offset of both trackers. 

5.4 Summary 

In this chapter we described the design and conduction of the user study. Also the difficulties 

have been discussed. But finally we recorded two sets of data: the 2D feature point 

coordinates given by the texture tracking algorithm and the pose information given by the 

63

5 User Study 

Flock Of Bird tracker. In addition to this we obtained feedback by the user study participants. 

The further evaluation will focus on three different aspects now. First we will have 

a look at the feature point tracking itself. As we said in chapter 3 the feature point moves 

on a path if it it tracked for several frames (see figure 3.1). We will try to discover patterns 

in the paths of the tracked feature points and draw some conclusions. These patterns can be 

used to explore the relationship with the handheld orientation (see the intersection between 

feature point coordinate and handheld orientation in figure 5.10). 

We gave a categorization of tasks for table top AR, especially suited for the Magic Book. 

We will also try to evaluate if properties of these feature point patterns give hints on the 

performed task (intersection between feature point coordinates and task). 

Figure 5.10: Overview of further evaluations 

Still only a view aspects of this evaluation can be discussed in this thesis. But some further 

ideas on how to continue this research area will be provided and discussed in the further 

chapters. 

64

CHAPTER 6 

Evaluation of the User Study 

Now we have performed the user study and we have collected the desired data. Now we 

have to discuss ideas how to evaluate and analyze the derived data sets. On the one hand the 

feature point coordinates and on the other hand the pose information of the handheld device 

have been recorded. This chapter tries to evaluate the retrieved data on different aspects. We 

can consider an evaluation of each data set alone or find dependencies (see figure 5.10). 

The results will be presented and conclusions for table top AR will be drawn. In order to 

get a full understanding of the relationship between tracking, user interface and user also 

additional ideas for future work will be discussed. 

6.1 Evaluation of the User Study 

In the last chapter we introduced the evaluation scheme to highlight mainly three different 

aspects of the user study: 

1. Analyzing the feature point data sets 

We will show that the tracking of feature points results in certain pattern in the logging 

data. We will use these patterns for our further evaluations. 

2. Finding a linear mapping between 2D feature point coordinates and the orientation of 

the handheld device 

This part describes an approach to find a correlation between the data sets of 2D feature 

point coordinates and the change of orientation given by the magnetic tracker. To do 

this we have to refine the data in order to compare it, because now we only have 

heterogeneous data: 2D coordinates and 3D orientation. The method used to find a 

linear mapping is called linear regression. 

3. Finding a relationship between the performed task and properties of the tracked feature 

points. 

65

6 Evaluation of the User Study 

We will see that certain properties of the tracked feature points change with the task. 

This leads to a task-based approach. This means that the search size could also be 

adopted to the performed task. 

The results from the questionnaire will be used to have additional ideas if we evaluate a 

certain topic. In addition to this approach further ideas for evaluations will be discussed. 

6.1.1 Feature Point Tracking 

In every feature point tracking frame four feature point are tracked. Thus the logging data 

contains these four feature point coordinates (see 5.6) or ’-1’ if the tracking fails in one frame. 

First we want to have a look at the tracking failures. 

Tracking Failures 

Tracking failures are caused by fast or unsuited movements in a way that feature points can 

not be found again in the next frame. The user has to reinitialize the tracking by looking on 

the square marker again. Figure 6.1 compares the two free tasks (case 1 and case 3) where the 

user is allowed to use the Magic Book without any restrictions. Case 1 is performed prior to 

case 3. Each case took 30 seconds. In the time between case 2 was performed. Hence while 

performing case 2 the participants got additional ”training” with the Magic Book. 

Case 1 Mean Tracking Failure Max Min Variance 

10.60 35 0 103.62 

Case 3 Mean Tracking Failure Max Min Variance 

2.25 13 0 8.30 

Table 6.1: Tracking Failures during case 1 and 3: The chart shows that the tracking failure in case 3 is 

lower and does not vary like in case 1 

On a first view it is remarkable that the mean tracking failure in case 1 is almost five 

times higher than in case 3. Also the variance of the tracking failures leads to the conclusion 

than the usage of the Magic Book in case 1 is more heterogeneous than in case 3. This first 

interesting consideration could be caused by two different aspects: 

• Different content leads to these values 

This aspect could be a reason for the differences is these values. But our experience was 

that the Magic Book content does not lead to a huge mismatch of the user behavior. The 

compositions of the virtual scenes are similar throughout the Magic Book fairy tale. 

• Learning the usage of the Magic Book leads to these values 

As we claimed in the introduction users adopt to the underlying tracking technology. 

The tracking failure measurements give evidence for this assumption. The usage in 

case 1 is very heterogeneous but users learn which movements lead to tracking failures 

and apply this knowledge in Case 3. 

66

This lead to our first insight: 


Result: Tracking Failure 

Users adopt to tracking. The comparison of tracking failures show that these failures 

decrease with more practice. This important aspect could be considered in our 

gyroscope runtime setup. In the beginning always an offset is added to the search 

window size. This takes the learning aspect into account. 

One possibility to figure out if the Magic Book content influences the user behavior to a 

high degree would have been to perform another free task with different content. This case 

could be compared with case 3. But it is assumed that there will not be a huge difference due 

to the fact that the arrangement of the 3D scenes is very similar. 

Interesting is also if test persons who considered themselves as very familiar with the 

Magic Book cause less tracking failures than non-experienced users. We split the participants 

into two groups. The non-experienced group, assessing themselves from 1 to 3 on the 

questionnaire scale and the experienced (4-5). The following chart shows the results: 

Experienced Non-Experienced 

Case 1 8.70 12.50 

Case 3 2.60 2.50 

Table 6.2: Mean Tracking Failure of the experienced and non-experienced group 

The figures show that there is no serious difference between these groups. But the data 

have to be considered with caution because every group has outliers with failures up to 28 

and more persons have to be tested in order to have a reliable result. But my observations 

were that also experienced test persons had to get used to the Magic Book first again. 

Connected Chains Pattern 

As we have already discussed in chapter 3 we want to analyze resulting paths of a feature 

point tracked over several frames. This is easily done by comparing the logged IDs of the 

feature points. If the IDs of feature points match in two following tracking frames we know 

that we can calculate the coordinate offset between the 2D coordinates. If a feature point is 

tracked over more than one frame we will call the resulting pattern connected chain. The feature 

point in a connected chain is not necessarily logged on the same feature point position 

(1-4) in the logged data. For example it is possible that feature point with id x is logged on 

position F P1 in frame i and on position F P4 in frame i + 1. Therefore it is necessary to sort 

the logged data to retrieve these connected chains at one position. In figure 6.1 a 3D plot of 

the sorted data is shown. The purpose of this plot is to see how the connected chains look 

like. Each feature point position F P1 − F P4 is colored in a different way. For further evaluation 

we will only consider the feature points at position F P1 (red color). Reason for that 

is that the sorting is done in a way that the longest matches will be at position F P1. In this 

special case the logged data exists only of one chain (see figure 6.1). In figure 6.2 the data 

67


set at F P1 consists of 27 connected chains. Both plots were taken from the same task with 

different test persons. 

Figure 6.1: Example 1: 3D Plot of the logged feature point coordinates: It shows the (x, y) coordinates 

in the video plane in the corresponding tracking frame. This data set of F P1 consists of only one 

connected chain (red chain). The time is measures in milliseconds 

The corresponding 2D plot of these both data sets can be seen in figure 6.3. 

It is obvious that it is possible to derive information about the occurring movement of the 

handheld device from these connected chains. The first plot (6.1) leads to the conclusion 

that no heavy movement occurred, because the tracking results in only one connected chain. 

In the second figure (6.2) 27 connected chains occur. The 2D view of the video plane (6.3) 

underlines this because the area covered by the tracked feature points in the second plot is 

obviously larger than in the first one. 

A interesting number derived from these connected chains is the shift of one feature point 

between two tracking frames. This shift is the length li,i+1 of the vector from the 2D position 

pi = (xi, yi) in frame i to the position pi+1 = (xi+1, yi+1) in frame i + 1 (see figure 6.4). The 

length of the shift vector can be calculated easily: 

li,i+1 = (xi+1 − xi) 2 + (yi+1 − yi) 2 (6.1) 

Thus we have two values now if we want to analyze the connected chains: the lengths of 

the shift vectors and the number of connected chains in a data set. Note that the number of 

connected chains is a specific property of the ARToolkit feature point tracking algorithm. 

68


Figure 6.2: Example 2: 3D Plot of the connected chains. The data set at position F P1 (red) consists of 

27 connected chains 

Figure 6.3: 2D plots of the data sets are shown: on the left side the corresponding plot with one chain 

(example 1) and on the right side the corresponding plot with 27 chains (example 2) 

69


Figure 6.4: Vector of feature point position from frame i to frame i + 1 

Result: Connected chains 

During the analysis of feature point tracking data we discovered a pattern that we 

call connected chains. For further evaluations we will consider the length of shift 

vectors. 

The next step will take the logged orientation of the handheld display into account. This 

would correspond to the intersection of feature point tracking and handheld orientation in 

figure 5.10. 

6.1.2 Feature Point Tracking and Tracking of the Handheld 

First we have to refine the logged data in a way that we can compare it. The idea is that 

we use the shift of the feature points on the one hand and the change in orientation on the 

other hand. The shift can be measured by calculating the vector length using the coordinates 

offsets as described in 6.1. Note that we can only calculate the vector length at connected 

chains, because the same feature point is tracked over several frames. In order to synchronize 

the data sets, the start and the end of the vector has to be annotated with the orientation 

valid at the corresponding time (see figure 6.5). The timestamps are used to perform this 

synchronization. The update rates of both trackers are constant. The Magnetic tracker has 

an update rate of 90 fps (frames per second) and the vision based tracker works with an 

update rate of 30 fps. Thus the magnetic tracker is about 3 times faster than the optical 

tracker. 

As we discussed during the design of the user study this step is not done in an optimal way 

because the delay offset between the vision based and the magnetic tracker is not considered. 

Now we have to calculate the change in orientation between the orientation at the beginning 

and at the end of the vector. Due to the fact that we logged quaternions we have to calculate 

the difference of the quaternions to get the anglular offset. For every tracking frame i + 1 we 

can derive data pairs di+1 with 

di+1 = d(xi+1, yi+1) = (∆(qk, qk+n), l (i,i+1)) (6.2) 

70


Figure 6.5: The beginning and the end of the shift vector is annotated with the orientation using the 

timestamps 

Difference between a quaternion rotation 

The calculation of the difference between a quaternion rotation is applied similar to Strasser’s 

diploma thesis [53] which should be considered for a deeper look. As a repetition a quater- 

θ)¯v) specifies a rotation of θ = 2arccos(s) around the axis ¯v. 

nion q = (s, sin( 1 

2 

q = (s, sin( 1 

θ)¯v) ⇒ θ = 2arccos(s) (6.3) 

2 

This leads to the result that we only have to consider the scalar value s of a quaternion 

if we want to derive the angle θ. Now the difference d between two quaternions p and q is 

calculated by multiplying the conjugate of the first quaternions p∗ = (w, −¯v) with the second 

quaternion q = (w ′ , ¯v ′ ). This is also called the derivation of a quaternion: 

d = p ∗ q = (ww ′ − ¯v · (−¯v ′ ), ¯v × (−¯v ′ ) + w(−¯v ′ ) + w ′ ¯v) (6.4) 

Note that · is the scalar multiplication and × the vector product in R 3 . As we have seen in 

6.3 we only need to consider the scalar part of 6.4 now: 

∆w = ww ′ + xx ′ + yy ′ + zz ′ (6.5) 

with ¯v = (x, y, z) and ¯v ′ = (x ′ , y ′ , z ′ ). Now we can compute ∆Θ: 

∆Θ = 2arccos(∆w) = 2arccos( ww ′ + xx ′ + yy ′ + zz ′ ) (6.6) 

In order to calculate the angle in the right quadrant by the arccos function the absolute 

value of ∆w must be taken. 

Now the angle between the two quaternions qk and qk+n can be computed by using equation 

6.6: 

∆Θk,k+n = 2arccos(|wkwk+n + xkxk+n + ykyk+n + zkzk+n|) (6.7) 

71


For all connected chains we will compute the corresponding data pairs. A data pair consists 

of the angular offset of the handheld device between the start and the end of an optical 

tracking frame the length of the shift vector. 

di+1 = d(xi+1, yi+1) = (∆Θk,k+n), l (i,i+1)) (6.8) 

The next step is to characterize the relationship between the vector length and angle measurements. 

Correlation and Regression of the Data Pairs 

As we have already discussed we now want to have a measurement how the 3D orientation 

and the 2D feature point coordinates are related to each other. In our pre-considerations 

above we refined our data to have comparable data pairs di consisting of the angle between 

the quaternions and the length of the shift vectors at the corresponding timestamps. The 

first question is how the data sets are related to each other. Then we want to characterize 

this relationship and develop a linear mapping. We will use two statistical techniques to 

evaluate the data pairs: 

Correlation. The correlation gives a measurement to what degree two sets of data are related 

to each other. The Correlation Coefficient r can be calculated by every standard statistical 

tool and gives a value between -1 and 1. The idea is very easy: If we would plot all the 

data pairs d(x, y) with x on the x-axis and y on the y-axis and all the point would fall on 

one straight line the correlation coefficient would become |r| = 1. This is an indicator 

for a very strong relationship. On the opposite the correlation coefficient tends to 0 if 

the points would be randomly spread. If the value of y increases with higher values of 

x the coefficient is positive. Otherwise the coefficient has a negative sign. These plots 

are also called scattered plots. 

Regression. The regression characterizes a relationship of two measurements. We will concentrate 

on linear regression only. The linear regression computes a linear model of the 

measurements in order to predict the dependent variable Y using the predictor variable 

X. The resulting function look like this: 

Y = a + bX (6.9) 

The linear regression estimates the values for a and b. If we would look at the plot this 

function is represented by best suited straight line to minimize the difference from the 

actual measurements. These differences are called residuals. If the relationship can not 

be described by a linear model a multiple regression has to be applied which will not 

be discussed here. 

An introduction to correlation and linear regression can be found in the appendix of this 

thesis A.2. 

We want to derive a linear model for our measurements. We want to predict the length 

of the shift vector (dependent variable L) using the angular offset (independent variable A). 

The linear regression calculates the values for a and b in this function: 

72


L = a + bA (6.10) 

In order to have an overall conclusion about the correlation and regression we concatenated 

the data sets of all the test persons in every test case. This increases the number of 

measurements and leads to a more accurate estimation of the linear model. The following 

example is taken from test case 2. It shows the measurements derived by the data of task 2. 

Figure 6.6 shows the measurements in a scattered plot with the angle A on the x-axis and the 

vector length L on the y-axis. 

Figure 6.6: Case 2 / Task 1: The scattered plot with regression line 

The correlation analysis with this task leads to a correlation coefficient of 0.6340. This gives 

an indicator that the measurements are related. If we have a look at the plot we see that the 

the length of the shift vector increases with higher angles. Thus the coefficient is positive. 

The regression analysis estimates a = 1.26631 and b = 194.822. This leads to the following 

linear model: 

L = f(A) = 1.26631 + 194.822 · A 

Of course this regression function is only an estimation. The output value is the length of 

the shift vector which is exactly the search window size needed in order not to produce a 

tracking failure. Thus the feature point lies somewhere on the circle with radius f(A) around 

the position of the feature point in the previous frame (figure 6.7). 

If all the points for every input angle would fall on the circle the correlation coefficient 

would be r = 1 and we would have a perfect correlation. Now we have a look at the 

residuals. As we said briefly the residuals are the differences between the linear model and 

73


Figure 6.7: The feature point falls on a circle with radius r = f(A): Point inside the circle are residuals 

below the regression line, points outside the circle are residuals above the regression line 

the actual measurements, they would cause errors for our search window configuration. 

If we look at figures 6.7 and 6.6 we see that residuals below the regression line will not 

result in tracking failures, because the search window with length lW indow = 2f(A) is larger 

than actually needed in the current frame. The residuals above the regression line would 

cause failures. They are located outside the search window. A possible solution to solve this 

problem is to add a specific offset to f(A). Thus the regression line would shift on the y-axis 

and more points would fall below the line. If all the points should fall below the regression 

line we can chose the maximum positive residual as offset: 

lengthi = f(anglei) + max(A measured − f(A) | (A measured − f(A)) > 0) 

But the adequate offset still has to be evaluated, which is not part of this thesis. The 

maximum residual as offset could be an outlier and the search window would be far too 

large in consideration of all the other measurements. 

Analyzing the different Tasks 

As a short repetition the following steps have to be made in order to analyze all the data sets 

for the different tasks: 

1. Extract the connected chains for each data set 

2. Calculate the shift vector lengths of the connected chains (6.1) 

3. Synchronize each start and end of the shift vector with the corresponding quaternions 

(6.5) 

4. Compute the difference between the quaternions in order to obtain data pairs d = (y, x) 

(6.7, 6.8) 

74


5. Concatenate all the data pairs for one test person in one task, because every task will 

result in several connected chains. As we have seen in example 2 we have retrieved 27 

connected chains. Each chains has to be refined according to the previous step and all 

the data pairs have to be united. 

6. Concatenate the data pairs of all the test persons for one task. As we have said we 

want to have an overview of all the data of a single task. Therefore the data pairs of 

every test person are united. 

7. Now for every task the linear model is calculated (6.10) with a statistical tool. 

The following chart show the estimations of the correlation and the linear regression for 

every test case and the corresponding tasks. The following numbers are shown in the tabular: 

the correlation coefficient r, the parameters a and b of the linear model. The value h 

which gives a number for the angular rate of turn causing an offset of the search window 

size of 1. It is just a solution to the equation ∆yi = b∆xi ⇒ 1 = b∆xi. 

r a b h 

Case 1 

Free Task 

Case 2 

0.3239 1.56895 144.624 0.0069 

Task 1 0.6340 1.26631 194.822 0.0051 

Task 2 0.5511 1.45847 177.196 0.0056 

Task 3 0.5296 1.78150 193.941 0.0051 

Task 4 0.5949 1.76134 220,166 0.0045 

Task 5 

Case 3 

0.6254 1.34886 219,083 0.0046 

Free Task 

Case 4 

0.4642 1.55991 127.487 0.0078 

Task 1 0.5682 1.16379 190.290 0.0052 

Task 2 0.4934 1.48687 131.274 0.0076 

Task 3 0.4366 1.74120 130.608 0.0076 

Task 4 0.5216 1.49365 197,812 0.0051 

Table 6.3: Results of the linear regression with the correlation coefficient r, the parameters a and b of 

the linear model. The value h which gives a number for the angular rate of turn causing an offset of 

the search window size of 1. 

The statistical tool also computes a p-value. This p-value is the probability that we make 

a mistake if we deny the assumption that the angle and the shift vector are not related at all 

(the null-hypothesis H0). In our linear model this assumption means that the b-parameter is 

zero (H0 : b = 0), thus the angle does not influence the length of the shift vector at all. The 

p-value for all the tasks is 0.00001. With a probability of 0.00001 we would make a mistake 

if we deny the hypothesis that these data sets are not related. In other words it is highly 

significant that our data sets are related, the angle influences the vector length. Please have a 

look in the appendix for further explanations (A.2). This underlines exactly our expectations. 

75


If we look at the data now we see that we have a positive correlation r > 0 in every case. 

The coefficient in case 1 r = 0.3239 is very low compared to the other cases. Reasons for that 

could be that the usage of the Magic Book in the beginning is not homogeneous. Our first 

result that users adopt to tracking underlines this. If we exclude case 1 the r has a range from 

0.44 to 0.63 which indicates a relation. Possible factors influencing the correlation coefficient 

should be taken into account: 

• Delay of Trackers 

In our setup we have not considered the tracking delay of both tracking technologies. 

To calculate this delay huge efforts have to be made. We do not have exact numbers for 

the delay of each single tracker and the efforts for a measurement would be too high 

for this thesis. An idea to compensate this is to shift the data sets. The shift maximizing 

r has to be found. 

• Change in position as well 

It still has to be analyzed to what degree the change of position has to be taken into 

account, because movements in position occur. Thus the position data of the handheld 

device has to be explored as well for further evaluations. 

Summarizing all these aspects lead to the following result: 

Result: Correlation 

The change in orientation and the tracking of feature points are related significantly. 

The correlation coefficient r is positive and ranging from 0.44 to 0.63. 

Let us now look at the values computed for the linear models. The value for parameter a 

ranges from 1.27 to 1.78. If we round it to the next possible number: 

⌈a⌉ = 2 (6.11) 

This value for a is valid for all cases. b on the other hand ranges from 127.49 to 220.166 and 

is responsible for the slope of the regression line. If we want to derive a global configuration 

for all the cases and the corresponding tasks we have to take the maximum value for b. In 

the next section we will discuss if we can find a relationship between the tracking data and 

the performed task. Thus the linear model could be altered to the task. If we would take the 

maximum value for a global configuration the linear model looks like this: 

y = 2 + 220.166x (6.12) 

To calculate the search area size l 2 now we still have to add an offset k now to address the 

problem of positive residuals. This offset has to be evaluated in further research. Therefore 

the following equation is only a raw estimation: 

l 2 = (2 ⌈(2 + 220.166x + k)⌉) 2 

76 

(6.13)


Result: Regression 

We can derive a linear mapping f(x) with the angular difference as input parameter. 

Thus we can predict the shift of the tracked feature point and therefore the 

necessary search window size. This mapping is only an approximation and therefore 

an additional offset has to be added to the search window size. 

The scattered plots and the output data for each test case can be examined in the corresponding 

appendix section A.2. In the following section we try to derive conclusions about 

the relationship between tracking and tasks. 

6.1.3 Feature Point Tracking and Tasks 

First lets have a look at the overview of our evaluation again (6.8). 

Figure 6.8: Overview of further evaluations 

In the last section we described the relationship of the feature point tracking and the handheld 

orientation. We want to obtain additional information about the relationship between 

feature point tracking and tasks (intersection feature point coordinates and tasks in 6.8). Additional 

feedback for the tasks was collected from the questionnaire we handed out to the 

test persons. We asked the test person if they think the tasks were easy to accomplish. This 

was done just to have a feeling about the tasks, because we wanted to give easy tasks to the 

participants. We collected feedback for the free tasks as well as the other ”navigation” tasks 

(see figure 6.9). The figure shows that most of the test persons agree that all the tasks were 

easy. The free task was obviously easier than the navigation tasks, due to the fact that no 

restrictions were made. 

We will use information that we have already derived (6.1.2, 6.1.2). The following chart 

(6.4) shows the correlation coefficient, mean shift vector length and the mean number of 

connected chains for every task. With this information we will try to derive conclusions of 

the performed task. 

77


Figure 6.9: Results of the feedback to the question ”The task was easy to perform” (1=strongly disagree, 

5=strongly agree) 

r ¯s var(¯s) ¯t ¯c/time 

Case 1 

Free Task 

Case 2 

0.3230 2.34 0.60 30 1.55 

Task 1 0.6340 1.91 0.72 10.36 1.47 

Task 2 0.5511 2.33 0.34 10.97 2.26 

Task 3 0.5296 2.79 1.26 7.89 2.99 

Task 4 0.5949 2.93 1.16 12.77 2.68 

Task 5 

Case 3 

0.6254 2.13 1.11 12.40 1.80 

Free Task 

Case 4 

0.4642 2.33 0.61 30 2.05 

Task 1 0.5682 1.97 0.80 9.25 1.76 

Task 2 0.4934 2.07 0.45 18.39 2.15 

Task 3 0.4366 2.51 0.70 20.91 2.37 

Task 4 0.5216 2.39 0.72 12.33 1.83 

Table 6.4: Results of Feature Point Tracking Evaluation: r is the correlation coefficient, ¯s is the mean 

length of the shift vector, var(¯s) is the variance of the shift vector length, ¯t is the mean time needed to 

perform the task and ¯c/time is the number of connected chains in relation the needed time 

78


In the further evaluation we will not consider the free tasks because it is hard to conclude 

to a common behavior in this case. Just on a first glimpse we can see that the mean vector 

shift ¯s is equal in case 1 and case 3. The idea of the tasks was to enforce the user to perform 

specific actions. We will now have a look at the values in the chart and conclude to the 

performed actions. A full overview of the tasks can be found in the appendix B.2. 

First we try to separate the task into three groups. One group g1 with a rather low vector 

shift ¯s < 2.20 and another group g2 with a high value for ¯s > 2.50. The third group g3 is 

formed by the rest. We will use an abbreviation for every task: Case 2 Task 1 will be C2T 1 

for example. 

• Group g1 

g1 = {C2T 1, C2T 5, C4T 1, C4T 2, } 

g2 = {C2T 3, C2T 4, C4T 3} 

g3 = {C2T 2, C2T 4} (6.14) 

The question for the tasks in g1 was to count features in the scene, like people or items. 

These object were obvious to the user and distributed all over the scene. Thus the user 

can view the scene from a certain distance with the whole scene in his viewpoint to 

accomplish the tasks. These are the overview tasks. Because the user wants to see 

the whole scene movements of the handheld visor will result in small pixel offsets. 

Another indicator underlining this is the number of chains. The shift vector length is 

not big and therefore it is more likely that a feature point is tracked in the next frame 

again. This results in less connected chains. Except for C4T 2 these tasks have very 

good values for the correlation coefficient r, thus the linear model for those tasks are 

more reliable. The time to perform C4T 2 is higher than the other tasks in g1, this might 

give hints that the task is more ”difficult” than the other tasks in this group. General 

it took more time to perform C4T 2 and C4T 3 and the coefficients r are low compared 

to the other tasks. But we definitely found characteristics for overview tasks using the 

numbers in the chart. 

• Group g2 

The values for ¯s, the mean length of the shift vector, are high. As we have said if the 

camera is close to the 2D surface small movements will cause a large pixel offset. And 

indeed all of the tasks in this group were to done to let the user focus on a specific 

feature. Questions like ”What is the eye color of the man?” or ”What is the color of her 

shoes?”were posted. Thus the user has to ”zoom” into the scene which causes that 

the camera will get closer to the surface. We can characterize the focus task with this 

behavior. The question for task C2T 4 was ”How many people wear a headdress?”. 

With our definition of tasks it is rather a detail task than a focus task, it is more a 

combination of overview and focus task. But with my observations during the study i 

could recognize that the test persons had to focus on a certain person to figure out the 

headdress. Also the number of chains is relatively high in these tasks. 

• Group g3 

79


According to our statements made for the other groups we can expect this group to 

have a in-between property, because of the values for ¯s being in the middle range. 

This property again is the distance of the camera to the 2D plane. This leads to the 

conclusion that the distance between the camera and the surface for this group of tasks 

are between the overview (large distance) and focus (short distance) task. Lets have 

a look at the questions: Both questions were asking about certain features again. In 

contrast to the features in the focus task, it is not necessary to zoom into the scene to a 

high degree. But on the other hand these questions could not be answered only with 

having an overview of the scene. The number of chains is in the middle range as well. 

Because the number of chains is a specific property of the ARToolkit feature point tracking 

we use the vector shift length as an indicator. Using this number we can distinguish between 

our tasks. Here is a summary of the characteristics of each task: 

• Overview Task 

The user places the camera in a certain distance that allows to have the whole 3D 

scene or everything else important for the table top application in his viewpoint. Small 

movements of the camera result in small pixel offsets (the length of the shift vector). 

The best results for the linear model can be achieved. 

• Detail Task 

We have to refine our definition of detail tasks. We said that it is a combination of 

overview and focus task. Thus the user has to zoom in the scene from time to time to 

explore certain features. But the user does not need to zoom real close in the scene to 

perform the task. Being closer to the plane causes a larger pixel offset compare to the 

overview task. 

• Focus Task To accomplish this task the user has to zoom close to the 2D plane. Thus 

even small movements will result in large pixel offsets. 

Result: Categorization of tasks 

We were able to find a categorization for the proposed tasks and underline this with 

the corresponding evaluations of the user study. 

For all this results the position data of the handheld device could be taken into account as 

well. Due to time limitations I was not able to perform a deeper evaluation of the recorded 

data. 

6.1.4 Further Evaluations 

In the questionnaire also feedback to the experience of the participant concerning tracking 

and user interface was collected. But this feedback is not used for additional evaluations 

here, because the focus was on analyzing the recorded data. But it was a good possibility 

to retrieve this information as well. The questions were structured into usability of the 

80


Magic Book, the awareness of jittering and delay of the vision based tracker and the handheld 

device user interface. One interesting thing I recognized while looking on the tracking 

questions was that the participants were more aware of jittering than tracking delay. The 

test person were basically satisfied with the usability of the Magic Book, but there was a 

divergence in the answers to the question if the handheld device is a ”very suited device 

for interacting with the Magic Book”. It would be interesting to evaluate this question with 

other graphical output devices. All see data from the questionnaire can be seen in the appendix 

B.1. 

In the next chapter we will summarize all the results of the user study. Further on conclusions 

on our user study based evaluation are drawn. Implication for future work will be 

discussed as well. 

81

CHAPTER 7 

Conclusions 

This chapter summarizes the experiences and results of the thesis concerning table top AR. 

It also reviews the approach of finding a mapping between user movement and tracking 

parameters. Ideas and implications for future work will be shown. 

7.1 Results 

In this section we will summarize the results obtained from the evaluation of the user study. 

First let us have a look at our motivation for the user study again. Our idea was to characterize 

the relationship between feature point tracking and user behavior. The usage of the 

system was done by moving the video camera which is attached to the handheld device. 

This information should be used to configure our runtime setup consisting of a combination 

of a gyroscope and an optical tracker. The orientation of the gyroscope is mapped on the 

search window size of the tracking routine. In previous setups the search window size is a 

constant parameter. We want to alter it during runtime according to the change of relative 

orientation of the gyroscope running with a higher frame rate than the optical tracker. 

Due to the fact that the texture tracking ARToolkit is a special technology for natural feature 

tracking we will shortly discuss the importance for other natural feature tracking algorithms. 

Further on we will try to abstract our results and ideas for table top AR applications. 

7.1.1 Results of the User Study 

The question was: ”To what degree do changes in orientation influence the pixel offset from 

frame to frame in the tracking routine?”. Therefore we collected data of the handheld movement 

(magnetic tracker) and the feature point coordinates in the 2D video plane. The evaluation 

of the user study led to the following results: 

1. Relative orientation and feature point offset between two frames are related 

We applied a statistical technique called correlation analysis. The correlation coefficient 

r gives information to what degree two measurements are related. Analyzing the tasks 

82

7 Conclusions 

we derived values for r from 0.45 up to 0.63 which indicates a moderate relationship 

between the two sets. This result allows us to consider the relative orientation for our 

runtime setup. 

2. We can derive a linear model for this dependency 

Using a regression analysis we derived approximations for linear relationships between 

both measurements X and Y of the form: 

Y = a + bX + offset (7.1) 

The linear regression estimates values for a and b. The values differ from case to case. 

But we can obtain a raw approximation for a global setup. Future work is to optimize 

this model for each case. Also a suited value for the constant offset has to be estimated. 

The output of the regression analysis also indicates a high significance for the 

dependency between the two data sets. 

3. Tasks 

We introduced a categorization for tasks and indeed the tracking results differ in the 

mean pixel offset in each task group. We proposed that the categorization is dependent 

on the usage of the handheld and thus the camera. If the user holds the camera far 

away from the 2D surface he wants to have an overview. If he zooms more into the 

scene, but he is still not concentrating on a single feature we can speak of a detail task. 

If the user zooms into the scene to examine a single feature we explored the focus task. 

4. Tracking Failures 

We showed that the tracking failures when a user starts to use the application are significantly 

high. Thus users adopt to the tracking technology. This can be considered 

in the runtime configuration as well. We enlarge the search window for a specific time 

and reduce it stepwise during usage. But more evaluations are necessary to explore 

this behavior. 

The results give concrete steps in order to find a good configuration for our runtime setup. 

We proposed a DWARF based architecture for a dynamic configuration of the search window 

during runtime. The software architecture provided in chapter 4 has to be extended with the 

linear mapping. A test environment should enable the developer to test the parameters for 

the linear model. Thus the offset of the linear model can be estimated through testing as 

well. 

Potentials for further evaluation 

If we have a look at our evaluation overview again 6.8 we can see that we have not discussed 

the intersection of the tasks and the movement information of the handheld device yet. This 

aspect can be evaluated as well. As we said that the tasks can be distinguished by distance 

from the 2D plane, the recorded position data could be used to underline this. Also we have 

not compared the data of the test persons or of only one test person during all the tasks. Our 

first issue was to have a look at a global configuration. If the usage is different from user to 

user a possible idea would be to estimate user profiles for the runtime configuration. 

83

7 Conclusions 

Another important issue is that we still do not predict the search window size. We only 

found a relationship between the orientation and feature point tracking. Thus an adequate 

mechanism to predict the window size has to be evaluated. The property that the gyroscope 

has higher update rates should be utilized and the window size has to be set before the next 

tracking frame of the optical tracker starts. 

Next we will have a look how we can use the information gathered in the user study for 

natural feature tracking and table top AR in general. 

7.1.2 Natural Feature Tracking 

One thing that all the natural feature tracking algorithms have in common is that certain 

feature points are tracked. The texture tracking of the ARTookit is a special case, because it 

only enables the tracking of preprocessed 2D textures only. In the related work section 3.5 

we considered other algorithms as well. The basic assumption for all algorithms where our 

idea can be applied is that the inter-frame displacement is small. Our movement model is 

very simple. It expects the feature point to be almost at the same position. This assumption 

is realistic in table top AR cause the field of interest is restricted to the horizontal table top 

setup. We also discussed other techniques like the tracking of whole regions to address 

heavier motions. But it has to be evaluated if out approach is suited for these algorithms as 

well. The better alternative to allow larger movements with a hybrid tracking setup would 

be to predict the camera movement applying the gyroscope data. We also showed examples 

for such setups. 

7.1.3 Table Top Augmented Reality 

In our special case we used the Magic Book which is a table top application where the user 

does not have to solve a certain task. As the setup allows the user can move around freely 

and explore the 3D virtual scenes without any restrictions. In the user study we restricted 

the usage by asking the test participants to fulfill certain tasks and force the user to use 

the Magic Book in a certain way. With this method we achieved a categorization of tasks 

suited for the Magic Book. As we figured out these tasks are related to different movements 

influencing the optical tracking. Still we have to validate if these tasks are also suited for 

other applications. We have also discussed the huge potential for table top AR which has to 

be addressed by future research. 

Let us look at another table top application. An ISMAR 2004 demo introduced a chess 

game combining Augmented and Virtual Reality [15]. One player sits in front of his chess 

board wearing using a graphical output device. He is able to move his own tangible chessman 

while the chessmen of the component are virtual objects. If now a natural feature tracking 

technique would be used to align the virtual chessman on the chessboard, for example 

by tracking the edges of the chessboard, we can apply our results from our user study. Thus 

the user probably wants an overview of the whole scene most of the time we can apply the 

overview task configuration. If the chessman is nicely animated the user might have a closer 

look at the pawns in the game. Thus we can switch to the detail task configuration. We can 

assume that every application has a nature of usage known by the developer. This information 

can be derived by studies and be used for a configuration. Of course it is more difficult 

84

7 Conclusions 

if a user does not have to accomplish a measured goal, like in the Magic Book. If we consider 

an augmented exhibition for example it might not be obvious where a user looks first or if 

he will zoom closer to the object. Later on we will discuss as well how virtual content might 

enforce actions by the user. 

7.1.4 Assessment of our Approach 

In comparison to the related work described in 3.5 our approach was not to improve the 

tracking itself by using hybrid tracking technologies, but to improve a specific class of applications 

by considering the user context. In our case the user context was the usage of 

the handheld device. This usage is estimated by interpreting relative orientation given by 

an additional gyroscope tracker. The main part of the thesis was to design, conduct and 

evaluate a user study to explore a relationship between both tracking measurements. During 

the evaluation phase we figured out that the movement context and the behavior of the 

tracking routine are related. It makes sense to consider this movement context in a runtime 

environment provided by the proposed architecture. As we have also discussed a delay of 

the underlying tracking techniques has to be measured and applied for further studies. This 

will lead to better results. In the research area of human-computer interaction a lot of further 

studies have to be made to understand users in a better way. We provided some suited ideas 

which can be taken into account for future evaluations. 

In my opinion the issue of how humans interact with computer systems will become more 

and more important with the evolution of new interaction methods used by AR applications. 

On the other hand humans always adopted to new technologies and are used to learn how 

new technologies work. Thus it is also a question to what degree we can expect people to 

learn new interaction techniques. 

7.2 Future Work 

In this work we tried to characterize the user behavior for a specific application class. Still 

a lot of factors influencing the behavior have been unconsidered. Further on we will have a 

look at a technique to enforce the user to perform actions. 

7.2.1 Factors for User Behavior 

First we will introduced the most important factors influencing the user behavior: 

• Hardware for graphical output 

For our studies this factor was fixed to the handheld display. Other hardware for user 

interfaces might be used in a totally different way. Potential for future research is to 

repeat the study with a head mounted display and a tablet-PC as well. The results 

have to be compared and the linear models have to be adopted to the corresponding 

user interface hardware. 

• Tasks 

85

7 Conclusions 

In our work we tried to categorize tasks for table top AR. We varied these tasks and 

found out that the logging data varies between these tasks. These tasks have to be 

validated or even declined for other applications as well. 

• Virtual Content 

This parameter was also almost fixed during the study. Of course the scenes change 

from page to page but the content was comparable. But this content could be varied 

as well. Shapes, animations, textures could influence the user behavior by catching his 

attention as well. So experiments could be made with altered properties of the virtual 

scene in each test condition. 

• User 

These are factor depending on each single user, like psychological factors or the knowledge 

of a user. It is a rather difficult task to measure these factors. Performing these 

studies is an interdisciplinary task between computer scientists and psychologists. Our 

study lead to the result that a user learns about the tracking during usage for example. 

These factors can be varied and will probably lead to different results. 

7.2.2 Visual Cues 

The tasks used in the Magic Book user study were constructed artificially to enforce a certain 

feedback by the user. It will not be realistic that a user uses the Magic Book and performs 

these tasks if he is not forced to accomplish them. Thus we do not know what the user will 

do in order to provide a configuration for the runtime setup. An idea would be enforced actions 

by changing the virtual content. This leads to another interesting question: ”How can 

virtual content enforce actions by the user?”. If it is possible to use these so called visual cues 

we can predict actions by the user. Usually visual clues are used to give the user additional 

graphical feedback or hints. For example an AR kitchen project developed at the MIT provides 

additional visual cues to the user in order to improve the performance. It highlights 

a the drawer containing a needed ingredient for example [12]. In contrast we want to use it 

as a mechanism to start actions by the user. In the chess application mentioned earlier the 

system is highlighting a chessman if the user has to put out of the game [15]. Thus we expect 

that the user will take the chessman out of the game. Another idea is to animate a chessman 

when it is moved. It is more likely that the user will focus on this animation than on other 

features of the scene. Again it will be more difficult with museum exhibition applications, 

but also by changing the content chains of actions can be enforced. 

All of these ideas have to be underlined by corresponding studies. 

7.2.3 Next Steps 

The following steps should give a rough guideline for further steps on this research. 

1. User Study 

The user study should be repeated with the following modifications. First the tracking 

delay for both trackers must be considered. Now we can vary some of the factors 

86

7 Conclusions 

described above. In a first test condition the study is made with the handheld device 

again, the second test case with a tablet PC. The resulting evaluation data should be 

compared. Also the content could be changed in a second study. The first condition 

will evaluate the fairy tale Magic Book while the second condition will be made with 

different content. Experiments with visual cues could be made as well. The evaluation 

methods described in this work could be applied. 

2. Extend DWARF-Architecture 

Still the DWARF component has to be fed with configuration. At this stage the architecture 

only provides the communication mechanisms and the configuration for the 

linear model could be easily integrated. A good idea would be to provide a test environment 

where the parameters of the linear model (7.1) could be altered. Another 

interesting idea is that the linear model adopts to the occurring tracking failures. 

3. Validate Results 

Finally it has to be shown that the dynamically configured Magic Book really leads to a 

better performance in term of robustness and computation. Possible method would be 

another user study with two test conditions, one without dynamic configuration and 

one with the gyroscope setup. 

Finally I hope that I could provide ideas and encouragements to continue parts of this 

work. 

87

Glossary 

Augmented Reality. The goal of Augmented Reality is to enrich the real world by overlaying 

it with virtual information. 

Calibration. Calibration is the task of defining and configuring parameters that stay constant 

during a −→ tracking task. Especially the integration of different coordinate systems 

into a world coordinate system is meant by this term in this thesis. 

Correlation. The correlation coefficient of two data sets indicates the relationship between 

both measurements. The correlation coefficient r ranges from -1 to 1. |r| ≈ 1 indicates 

a strong relationship, |r| ≈ 0 indicates a weak relationship. 

DOF. Degrees of freedom determines the state measurement ability of a tracking technology 

in a three dimensional environment. 3DOF determines position or orientation of an 

object, while 6DOF determines both (see −→ Pose). 

DWARF. The Distributed Wearable Augmented Reality Framework is a component-based 

framework enabling fast prototyping for AR applications. These components are 

reusable and distributed. A Corba-based infrastructure provides communication 

mechanisms for the components. 

Human Computer Interaction. Human Computer Interaction is a discipline in computer research 

putting a focus on the design, evaluation and implementation of interactive 

computer systems. Thus the interaction between humans and computers is an important 

issue. 

Immersion. Immersion is a measurement to what degree a user is affected by a virtual or augmented 

experience. Psychological factors as well as properties of the setup influence 

the immersion. 

Inertial Tracking. Based on the law of inertia accelerometers estimate the position of an object. 

Important for many applications are gyroscopes measuring the relative orientation 

of objects. Relative orientation means that we estimate the rate of turn, not 

absolute angles in a world coordinate frame. 

88

Glossary 

Linear Regression. Linear Regression is the estimation of a linear model for two variables X 

and Y , while X is the independent predictor and Y the dependent prediction. The 

linear model has the form Y = a + bX and values for a and b are approximated. 

Natural Feature Tracking. Natural Feature Tracking is a optical or −→ vision-based tracking 

method. Features from the environment are extracted from the video image. In the 

next video frame these features have to be found again. While computing the correspondences 

between the 2D feature points in the video frame and the 3D objects the 

camera pose can be estimated. 

Pose. Pose is a data structure containing spatial information of an object. The spatial information 

of an object consists of position and orientation. 

Registration. Registration is the problem of aligning virtual objects in the real environment. 

Two task are important for registration: the −→ calibration of the setup and the tracking. 

This is an important issue for the quality of an −→ Augmented Reality application 

Table Top. Table Top Augmented Reality is a special class of Augmented Reality −→ applications. 

It could be characterized by a horizontal setup with a restricted interaction area. 

Main application domains are exhibitions, education, gaming and collaboration. 

Tasks. A task is the purpose for the use of a computer application. A user has to accomplish 

a certain goal usually motivated by the user himself. In our case we describe the task 

to the user. 

Texture Tracking. In our context texture tracking is a special case of natural feature tracking. 

2 dimensional textures can used for the tracking of feature points. In the ARToolkit 

version these textures have to be preprocessed first. 

Tracking. Tracking is used for the estimation of an objects state. Tracking is a loop process 

consisting of the estimation and the update of the target’s state. A tracker is the underlying 

technology responsible for the −→ pose estimation. 

Vision-based Tracking. Also the term optical −→ tracking is used for this tracking technology 

which consists of hardware grabbing video frames and image recognition software 

analyzing the image and providing −→ pose information. 

89

A.1 Conduction of the User Study 

A.1.1 Questionnaire 

APPENDIX A 

User Study 

Here is a short overview of the rationale behind the questions posted in the questionnaire 

(see figure A.1). All the questions except part ’A’ could be answered on a scale of 1 to 5. In 

part ’B’ the scale options were 1=’never heard of it’ and 5=’experienced developer’, in part 

’C’ to ’E’ the scale options were 1=’strongly disagree’ and 5=’strongly agree’. The five scales 

seemed adequate for us, because the focus of the work was not on the questionnaire and it 

provides a suited feedback of the participants. The results can be explored in section B.1. 

A Personal details of participants 

These questions concerning age, occupation, gender and hand usage were posetd to 

have an overview of the participating test persons. Thus it can be evaluated if the 

group of participants is suited for the study. For future research the test person could 

be splitted into groups and the data sets could be compared. 

B Background on Augmented Reality 

This part of the questionnaire should collect information about the previous knowledge 

of the test persons. Questions were posted about knowledge in Augmented Reality 

in general and knowledge on the Magic Book. 

C Behavior of the Magic Book 

The intention of these questions was to collect feedback on the tracking and the usability 

of the Magic Book. Question C1 intended on usability, C2 on the awareness of 

jittering of the tracking and C3 focused on the tracking delay. But intention of this part 

was just to have a feedback about these issues by the test person. It was not used for 

further evaluation. 

D Tasks 

90

A User Study 

The question ”The task was easy to perform” should give information if the test persons 

had any problems performing the different tasks. 

E Handheld device 

Here feedback on the handheld device was gathered. This information can be used for 

future evaluation with other user interfaces. 

F Comments and encouragements 

A.1.2 Instructions and Guideline 

The instructions to the user can be seen in figure A.2. Figure A.3 shows the guideline for the 

study. 

91

A User Study 

QUESTIONAIRE FOR MAGIC BOOK - USER STUDY 

A. Personal Details 

Age: ______________ female O lefthanded O 

Occupation: ______________ male O righthanded O 

B. Background on AR 

How familiar am I with Augmented Reality. 

1 (never heard of it) 2 3 4 5 (experienced developer) 

How familiar am I with the Magic Book. 

1 (never used it) 2 3 4 5 (familiar with technologies used by the Magic 

Book) 

C. Behaviour of the Magic Book 

It easy to use the Magic Book. 

1 (strongly disagree) 2 3 4 5 (strongly agree) 

The scenes on the page were always clear and stable 


The scenes in the book responded to my movements immediately. 


D. Tasks 

The free task was easy to perform. 


The navigation task was easy to perform. 


E. Handheld device 

The Handheld device is a very suitable device for interacting with the Magic Book. 


F. Comments on the Magic Book, UI, Tasks, etc. (voluntarily) 

Date: ID: Thank you very much! ☺ 

Figure A.1: Questionnaire 

92

A User Study 

INSTRUCTIONS FOR MAGIC BOOK - USER STUDY 

1. The study will take 10 minutes 

2. Records are held anonymous 

3. In this user study we try to evaluate in which actions occur when people 

are using the Magic Book. 

4. Usage: 

a. Just use the Handheld device and look at the marker first to obtain 

an initial orientation 

b. Now you can move freely through the scene 

c. If you lose the scene just look on the marker again 

5. Steps: 

a. You first have a chance to do some training with the Magic Book 

b. In the free task you can move around as you feel for it. Goal is to 

understand the scene (30 seconds). 

c. In the navigation task I will ask some questions concerning the 3D 

scene, like “how many trees do you see? (until finished)” 

d. 2 free tasks and 2 navigation tasks 

e. The other scenes are for your pleasure 

6. There is no wrong answer to my questions 

7. Data will only be used for the analysis of your movements of the 

handheld device 

8. Feel free to ask questions any time 

9. Comments are very welcome 

10. Thank you! 

Figure A.2: Instructions to the user study participant 

93

A User Study 

GUIDELINE FOR MAGIC BOOK - USER STUDY 

1. Welcome participant 

2. Guide him through the study (see instructions) 

3. Perform tasks (see below) 

4. Questionaire 

5. Ask for comments 

6. Give contact details in case of further questions 

Scenes (Pages): 

1. Practice 

2. CASE 1: Free Task 

a. 30 seconds 

3. for your pleasure (voluntary) 

4. CASE 2: 

(PRESS T WHEN TASK BEGINS) 

a. Task 1: How many people do you see in this scene (7)? 

b. Task 2: What is the haircolour of the woman with the white skirt 

c. Task 3: What is the colour of her shoes (blond/blue)? 

d. Task 4: How many people wear a headdress (6)? 

e. Task 5: How many pieces of wood do you see (4, 5-6)? 



7. CASE 3: Free Task 

a. 30 seconds 

8. CASE 4: 

(PRESS T WHEN TASK BEGINS) 

a. Task 1: How many people do you see in this scene (7)? 

b. Task 2: How many women and how many men do you see (2/5)? 

c. Task 3: What is the eye colour of the man with the purple coat (blue)? 

d. Task 4: How many windows and how many doors do you the on the 

front of the church (4/2)? 

Figure A.3: Guideline through the user study 

94

A.2 Statistical Tools 

A User Study 

This chapter should deliver an overview of the statistical evaluation methods used in this 

thesis. First the correlation is introduced which gives a measurement for the degree of a 

linear relationship for two sets of data. The regression analysis gives an estimation model 

for this linear relationship. Further and detailed discussions of these methods can be found 

in the corresponding references [18][52]. In my evaluation I mainly used two tools, an open 

source statistical analysis tool called GRETL 1 and Matlab 2 . All the plots have been done with 

Matlab. 

A.2.1 Correlation 

Basic question if we consider a correlation evaluation is how two sets of data (X, Y ) are 

related to each other. If increasing values of measurement X result in increasing values of 

measurement Y , we can expect a positive relationship between those two data sets. The 

correlation method provides a measurement for the grade of relationship between two independent 

measurements, the correlation coefficient. 

Correlation-Coefficient The Correlation-Coefficient r gives the degree of the strength of a relationship 

between two measurements with 

It is also called the Bravais-Pearson correlation coefficient. 

− 1 ≤ r ≤ 1 (A.1) 

|r| < 0.5 expresses a weak relationship, while |r| ≥ 0.8 expresses a very strong dependency. 

If the measurements tend to be on one straight line, the higher will be |r|. If this 

straight line has a positive slope (positive correlation) r ≈ 1 and with a negative slope (negative 

correlation) r ≈ −1 (see figure A.4). If the slope equals 0 we have no correlation at 

all. 

Figure A.4: The Bravais-Pearson-Correlation Coefficient expresses the grade of linear relationship 

between two data sets. The right figure show a positive correlation, the figure in the middle shows 

no correlation and the left figure show a negative correlation 

The correlation coefficient r for the measurement sets (xi, yi), with i = 1, .., n is calculated 

the following way: 

1 http://gretl.sourceforge.net/ 

2 http://www.mathworks.com/ 

95

= rXY = 

A User Study 

n i=1 (xi − ¯x)(yi − ¯y) 

n i=1 (xi − ¯x) 2 ˜sXY 

= 

(yi − ¯y) 2 ˜sX ˜sY 

(A.2) 

with ¯x and ¯y mean values of x and y. ˜sX and ˜sY are the standard deviations of measurements 

X and Y . 

˜sXY = 1 

n 

 

 

 

˜sX = 1 

n 

(xi − ¯x) 

n 

i=1 

2 (A.3) 

 

 

 

˜sY = 1 

n 

(yi − ¯y) 

n 

2 (A.4) 

i=1 

n 

(xi − ¯x)(yi − ¯y) (A.5) 

i=1 

The calculation of r is provided by most of the statistical analysis tools. After applying 

this function we can characterize the degree of linear relationship of our measurements. It is 

important to note that the coefficient only gives information on a linear correlation, not on 

other forms of dependency. If we want to estimate the linear model now we have to apply a 

further technique called linear regression. 

A.2.2 Linear Regression 

The goal of the linear regression is to derive a linear model to estimate the value of the 

predicted variable Y . This is done by performing a linear transform of the predictor variable 

X. The transformation looks like this: 

Y = f(X) = a + bX (A.6) 

Of course this function is only an approximation, because not all of the measurements 

will fall on one straight line (see as well A.4). Therefore for every pair of data (xi, yi) the 

following relationship is applied: 

yi = a + bxi + ei 

(A.7) 

ei is the error resulting because of the adoption to the linear relationship for i = 1, ..n. a 

and b are the parameters of the linear regression model. One assumption of this model is 

that we can not estimate the error ei if we know xi, there is no dependency. The sum of all 

errors has to be minimized in order to obtain a ”best” straight line and therefore a ”best” 

regression model. The error ei for every measurement yi can be calculated as: 

ei = yi − ˆyi 

(A.8) 

where ˆyi is the predicted value of the linear mapping and yi is the actual measured value. 

This error also called residual has to be summed for all data pairs. As we do not want that 

negative and positive values balance each other we square the error values, the so called 

squared residuals. The average of this value Q has to be minimized. 

96

Q(a, b) = 1 

n 

n 

i=1 

A User Study 

(yi − ˆyi) 2 = 1 

n 

n 

(yi − (a + bxi)) 2 

i=1 

(A.9) 

We now have to estimate the values for a and b resulting in a minimal Q(a, b). This method 

is also called sum of least squares method. 

The estimates (â, ˆ b) of (a, b) can be calculated by setting the partial derivative of a and b to 

0. These both equations have to solved. 

δQ(a, b) 

δa 

δQ(a, b) 

δb 

= −2 

= −2 

n 

(yi − (a + bxi)) = 0 (A.10) 

i=1 

n 

(yi − (a + bxi))xi = 0 (A.11) 

i=1 

Further calculations lead to these two equations: 

1 

n 

n 

i=1 

n 1 

yi − â − 

n 

i=1 

ˆb 1 

n 

xiyi − 1 

n â 

n 

xi − 1 

n ˆb i=1 

n 

yi = 0 (A.12) 

i=1 

n 

i=1 

Now â and ˆ b can be calculated out of these equations: 

If we put this in equation A.13 we can derive ˆ b. 

ˆ b = 

x 2 i = 0 (A.13) 

â = ¯y − ˆ b¯x (A.14) 

n i=1 (xi − ¯x)(yi − ¯y) 

n i=1 (xi − ¯x) 2 = ˜sXY 

˜s 2 X 

(A.15) 

Now we have calculated both parameters of the linear regression. But we still need an 

indicator for the quality of this estimation, because if the error e increases with higher x 

values a linear model might not be suited best to model the real world. The idea is that 

we split the Sum of Squares Total (SQT) which gives a value for the total variance into two 

components. 

Sum of Squares Explained (SQE): This is the variance of our linear model. 

SQE = 

n 

(ˆyi − ¯y) 2 

Sum of Squares Residuals (SQR) This is the rest of the distribution of the yi values. 

SQR = 

97 

i=1 

n 

(yi − ˆy) 2 

i=1 

(A.16) 

(A.17)

A User Study 

Sum of Squares Total (SQT): Thus the total variance is the variance described with our linear 

model plus the the sum of the variance we can not explain. 

SQT = SQE + SQR (A.18) 

With these values we can calculate a determination coefficient R 2 . 

R 2 = SQE 

SQT 

= 1 − 

n 

i=1 (yi − ˆy) 2 

n 

i=1 (yi − ¯y) 2 

(A.19) 

R 2 is an indicator for the part of the total dispersion that can be explained by our linear 

model. This coefficient has domain from 0 to 1. If all of our measurements would fall on one 

straight line it would result in coefficient of R 2 = 1, because our derived model matches the 

reality to 100%. If the residuals would be spreaded around the regression line randomly, the 

model would not be suited and results in a low determination coefficient. 

The determination coefficient R 2 is also related to the to correlation coefficient rXY for the 

measurements X and Y. The proof can be found in [18]. 

R 2 = r 2 XY 

(A.20) 

Another method get a feeling of the quality of the linear model is a graphical plot of the 

residuals. If the residuals are close to 0 and vary without any system around the horizontal 

axis we can assume a suited model. 

Example 

This small example taken from [52] should demonstrate the conclusions derived from the 

correlation coefficient and the regression model. The following measurements have to be 

analyzed concerning a linear dependency (A.1). Calculations were made with GRETL, the 

plots were made with Matlab. 

i 1 2 3 4 5 6 7 8 9 10 

xi 1 5 3 8 2 2 10 8 7 4 

yi 1 6 1 6 3 2 8 5 6 2 

Table A.1: Example data measurements: 10 test samples were made for variable X and Y 

In figure A.5 you could see the resulting plot with the linear regression model. Figure A.6 

you could see the residuals. These residuals are distributed randomly, there is no obvious 

dependency between x and the values of the residuals. Putting the data into GRETL results 

in the output shown in figure A.7. 

The correlation coefficient is r = 0.8934 which is a indicator for a high relation between 

both variables. We can also extract the linear model for the prediction of variable Y out of 

this output (COEFFICIENT-column in A.7). 

y = f(x) = 0.395349 + 0.720930x 

98

A User Study 

Figure A.5: The plotted data with the corresponding linear regression model 

Figure A.6: The plot shows the values of the residuals for every predicted y value 

99

A User Study 

Model 4: OLS estimates using the 10 observations 1-10 

Dependent variable: Y 

VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|) 

0) const 0,395349 0,742950 0,532 0,609090 

1) X 0,720930 0,128171 5,625 0,000496 *** 

Mean of dependent variable = 4 

Standard deviation of dep. var. = 2,49444 

Sum of squared residuals = 11,3023 

Standard error of residuals = 1,18861 

Unadjusted R-squared = 0,798173 

Adjusted R-squared = 0,772944 

Degrees of freedom = 8 

corr(X, Y) = 0,8934 

Figure A.7: The output from the GRETL tool: All the important figures concerning the linear regression 

model are shown 

The determination coefficient results in R 2 = r 2 = 0.798173. This lets us conclude that we 

found a very suited model. 

The GRETL output also calculated a p-value p. This p-value is the probability that we 

make a mistake if we deny the null-hypothesis of the t-test. The null-hypothesis H0 in our 

case means that the measurements are not related. In the linear model Y = a + bX it means 

that b = 0, Y can not be explained with the predictor X: 

H0 : b = 0 

H1 : b = 0 

(A.21) 

Thus the p-value is an indicator if we should accept or deny H0. GRETL automatically 

performs this t-test. In our example the p-value for X declaring Y is 0.000496. If this p-value 

is under a certain significance parameter α we can reject H0. GRETL computes the t-test 

for three significance level α = {0.1, 0.05, 0.01}. The significance level is the probability we 

admit for a false choice of a hypothesis. If p < α, H0 is rejected. In our example p is lower 

than all the values for α. This is indicated by the three ∗ behind the p-value. 

Hence we can conclude that X and Y are related and that we can accept H1. The lower 

p the more confidence can be put in the assumption that the data measurements are highly 

related. For the calculation of this value and the testing of hypothesis please have a look in 

the corresponding literature. 

100

APPENDIX B 

Complete Results 

In this section of the appendix the complete results of the questionnaire and the analysis of 

the linear regression of the data sets can be examined. 

B.1 Questionnaire 

Here is the data extracted from the questionnaires of the user study. 

A Personal Details of participants 

1. Gender 

male female 

80% 20 % 

2. Age 

20-22 23-25 26-28 29-31 32+ 

10% 40% 25% 10 % 15 % 

3. Hand 

left right 

95% 5 % 

4. Occupation 

Student Employee Researcher Else 

70 % 10% 10% 10% 

B Background on Augmented Reality 

1. How familiar am I with Augmented Reality? 

1 2 3 4 5 

10% 25% 15% 5% 45% 

(1=never heard of it, 5=experienced developer) 

101

B Complete Results 

2. How familiar am I with the Magic Book? 

1 2 3 4 5 

5% 30% 15% 15% 35% 

C Behavior of the Magic Book 

D Tasks 

1. It was easy to use the Magic Book 

1 2 3 4 5 

0% 0% 45% 40% 15% 

(1=never heard of it, 5=experienced developer) 

(1=strongly disagree, 5=strongly agree) 

2. The scenes on the page were always clear and stable 

1 2 3 4 5 

5% 35% 45% 15% 0% 


3. The scenes in the book responded to my movements immediately 

1 2 3 4 5 

0% 5% 10% 55% 30% 

1. The free task was easy to perform 

1 2 3 4 5 

0% 0% 10% 30% 60% 

2. The navigation task easy to perform 

1 2 3 4 5 

0% 0% 15% 65% 20% 

E Handheld device 




1. The handheld device is a very suitable device for interacting with the Magic Book 

1 2 3 4 5 

5% 35% 45% 15% 0% 

F Comments and encouragements 


Here are some comments made on the Magic Book, the handheld device and the user 

study itself: 

cables were disturbing, use of a head mounted display as comparison, cables limit the 

range of movement, handheld device is too heavy, tracking fails too often, cool book, 

some colors do not look good in the handheld display, screen was a little blury, difficult 

to see the whole scene, giant was not completely visible, image disappears, not stable, 

jittering and tracking failure is annoying, heavy device is hard to keep stable, would 

be good not to hold anything, :-), tasks were good 

102

B.2 Cases 


In this section all the cases with the corresponding tasks are described. The results of the 

linear regression are shown as well. 

B.2.1 Case 1 

Case 1 was the first free task without any questions about the virtual scene (B.1). Figure B.2 

shows the scattered plot with the regression line and a plot of the residuals. The GRETL 

output is shown in B.3. 

Figure B.1: Case 1: Virtual Scene 

Figure B.2: Case 1: The scattered plot with regression line and the plot of the residuals 

103



Dependent variable: Vector 


0) const 1,56895 0,0274115 57,237 < 0,00001 *** 

2) Angle 144,624 3,30650 43,739 < 0,00001 *** 

Mean of dependent variable = 2,31546 


Sum of squared residuals = 122635 





Pairwise correlation coefficients: 

corr(Vector, Angle) = 0,3239 

Figure B.3: The output from the GRETL tool: Case 1 

104

B.2.2 Case 2 


Figure B.4 shows the virtual scene for Case 2. Now with this scene the test persons had to 

perform several tasks. Questions were posted to the test person. 

• Task 1 


Question: ”How many people do you see in the scene ?” (plot: B.5, output: B.6) 

Figure B.5: Case 2 / Task 1: The scattered plot with regression line and the plot of the residuals 

105





0) const 1,26631 0,0238207 53,160 < 0,00001 *** 

2) Angle 194,822 3,12817 62,280 < 0,00001 *** 










Figure B.6: The output from the GRETL tool: Case 2 / Task 1 

106

• Task 2 


Question: ”What is the haircolor of the woman with the white skirt ?” (plot: B.7, output: 

B.8) 





0) const 1,45847 0,0269841 54,049 < 0,00001 *** 

2) Angle 177,196 3,53711 50,096 < 0,00001 *** 











107

• Task 3 


Question: ”What is the color of the shoes of the woman with the white skirt ?” (plot: 

B.9, output: B.10) 





0) const 1,78150 0,0423130 42,103 < 0,00001 *** 

2) Angle 193,941 4,95880 39,110 < 0,00001 *** 











108

• Task 4 


Question: ”How many people wear a hat or a headdress ?” (plot: B.11, output: B.12) 





0) const 1,76134 0,0320220 55,004 < 0,00001 *** 

2) Angle 220,166 3,70264 59,462 < 0,00001 *** 











109

• Task 5 


Question: ”How many pieces of wood do you see ?” (plot: B.13, output: B.14) 





0) const 1,34886 0,0262689 51,348 < 0,00001 *** 

2) Angle 210,083 3,21658 65,312 < 0,00001 *** 











110

B.2.3 Case 3 


Case 3 (B.15 was again a free task without any questions posted to the test person (plot: B.16, 

output: B.17). 


Figure B.16: Case 3: The scattered plot with regression line and the plot of the residuals 

111





0) const 1,55991 0,0190527 81,873 < 0,00001 *** 

2) Angle 127,487 2,00371 63,625 < 0,00001 *** 










Figure B.17: The output from the GRETL tool: Case 3 

112

B.2.4 Case 4 


The scene for case 4 can be seen in figure B.18. Several tasks had to performed again in this 

case and the corresponding questions were posted. 

• Task 1 


Question: ”How many people do you see in the scene ?” (plot: B.19, output: B.20) 


113





0) const 1,16379 0,0269233 43,226 < 0,00001 *** 

2) Angle 190,290 3,79175 50,185 < 0,00001 *** 











114

• Task 2 


Question: ”How many women and how many men do you see?” (plot: B.21, output: 

B.22) 





0) const 1,48687 0,0198384 74,949 < 0,00001 *** 

2) Angle 131,274 2,39249 54,869 < 0,00001 *** 











115

• Task 3 


Question: ”What is the eye color of the man with the purple coat ?” (plot: B.23, output: 

B.24) 





0) const 1,74120 0,0276468 62,980 < 0,00001 *** 

2) Angle 130,608 2,75606 47,390 < 0,00001 *** 











116

• Task 4 


Question: ”How many windows and how many doors do you see and the front of the 

church ?” (plot: B.25, output: B.26) 





0) const 1,49365 0,0361831 41,280 < 0,00001 *** 

2) Angle 197,812 4,51415 43,820 < 0,00001 *** 











117

Bibliography 

[1] R. Azuma, B. Hoff, III Neely, H., and R. Sarfaty. A motion-stabilized outdoor augmented 

reality system. In Virtual Reality, 1999. Proceedings., IEEE, pages 252–259, 1999. 

[2] Ronald Azuma, Yohan Baillot, Reinhold Behringer, Steven Feiner, Simon Julier, and 

Blair MacIntyre. Recent advances in augmented reality. IEEE Computer Graphics and 

Applications, 21(6):34–47, Nov/Dec 2001. 

[3] Ronald Azuma and Gary Bishop. A frequency-domain analysis of head-motion prediction. 

In SIGGRAPH ’95: Proceedings of the 22nd annual conference on Computer graphics 

and interactive techniques, pages 401–408, New York, NY, USA, 1995. ACM Press. 

[4] Ronald Azuma, Jong Weon Lee, Bolan Jiang, Jun Park, Suya You, and Ulrich Neumann. 

Tracking in unprepared environments for augmented reality systems. Computers & 

Graphics, 23(6):787–793, 1999. 

[5] Ronald T. Azuma. A survey of augmented reality. Presence, Special Issue on Augmented 

Reality, 6(4):355–385, August 1997. 

[6] Martin Bauer, Bernd Brügge, Gudrun Klinker, Asa MacWilliams, Thomas Reicher, Stefan 

Riss, Christian Sandor, and Martin Wagner. Design of a component-based augmented 

reality framework. In Proceedings of the 2nd IEEE and ACM International Symposium 

on Augmented Reality (ISAR 2001), New York, NY, October 2001. IEEE Computer 

Society. 

[7] Mark Billinghurst. Augmented reality in education, new horizons for learning. Internet, 

2002. www.newhorzons.org/strategies/technology/billinghurst.htm. 

[8] Mark Billinghurst and Hirokazu Kato. Collaborative augmented reality. Communications 

of the ACM, 45(7):64–70, 2002. 

[9] Mark Billinghurst, Ivan Poupyrev, Hirokazu Kato, and Richard May. Mixing realities 

in shared space: An augmented reality interface for collaborative computing. In IEEE 

International Conference on Multimedia and Expo (III), pages 1641–1644, 2000. 

118

Bibliography 

[10] Mark Billinghurst, Hirokazu Tachibana, Keihachiro Kato, and Michael Grafe. Virtual 

object manipulation on a table top ar environment. In Proceedings of the 2nd IEEE and 

ACM International Symposium on Augmented Reality (ISAR 2001), New York, NY, October 

2001. IEEE Computer Society. 

[11] Mark Billinghurst, Hirokazu Tachibana, Keihachiro Kato, and Michael Grafe. A registration 

method based on texture tracking using artoolkit. In The Second IEEE International 

Augmented Reality Toolkit Workshop, Tokyo, Japan, 2003. 

[12] Leonardo Bonanni, Chia-Hsun Lee, and Ted Selker. Attention-based design of augmented 

reality interfaces. In CHI ’05: CHI ’05 extended abstracts on Human factors in 

computing systems, pages 1228–1231, New York, NY, USA, 2005. ACM Press. 

[13] Bernd Bruegge and Allen A. Dutoit. Object-Oriented Software Engineering; Conquering 

Complex and Changing Systems. Prentice Hall PTR, Upper Saddle River, NJ, USA, 1999. 

[14] J.S. Burdess, A.J. Harris, J. Cruickshank, D. Wood, and G. Cooper. A review of vibratory 

gyroscopes. IEEE Engineering Science and Education Journal, 3(16), 1994. 

[15] Carlos Correa, Andres Agudelo, Allan Meng Krebs, Ivan Marsic, Jun Hou, Ashutosh 

Morde, and Kicha Ganapathy. The parallel worlds system for collaboration among 

virtual and augmented reality users. In Proc. of International Symposium on Mixed and 

Augmented Reality, Arlington, VA, USA, Nov. 2004. 

[16] Corioli’s force. http://www.cmt.phys.kyushu-u.ac.jp/ ∼ M.Sakurai/phys/ 

physmath/rotsys-e.html. 

[17] DWARF Project Homepage. www.augmentedreality.de. 

[18] Ludwig Fahrmeir, Rita Künstler, Iris Pigeot, and Gerhard Tutz. Statistik - Der Weg zur 

Datenanalyse. Springer, 1997. 

[19] Armin Fischer, Michael Kuhn, and Felix Löw. Augmented Furniture Client - Eine digitale 

Vertriebsinnovation für die Möbelbranche, 2004. 

[20] Martin Fowler. UML Distilled: A Brief Guide to the Standard Object Modeling Language. 

Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2001. 

[21] B.D. Allen G. Bishop, G. Welch. Tracking: Beyond 15 minutes of thought. SIGGRAPH 

Course Pack, 2001. 

[22] B.J. Gallagher, J.S. Burdess, A.J. Harris, and M.E. McNie. Principles of a three-axis vibrating 

gyroscope. IEEE Transactions on Aerospace and Electronic Systems, 37(4), 2001. 

[23] Gyroscopes of various types. http://www.spp.co.jp/sssj/sindoue.html. 

[24] Thomas T. Hewett. ACM SIGCHI curricula for Human-Computer Interaction. Technical 

report, New York, NY, USA, 1992. 

[25] Getting around the coriolis force. http://www.physics.ohio-state.edu/ 

∼ dvandom/Edu/newcor.html. 

119

Bibliography 

[26] Michael Kalkusch, Thomas Lidy, Michael Knapp, Gerhard Reitmayr, Hannes Kaufmann, 

and Dieter Schmalstieg. Structured visual markers for indoor pathfinding. In 

Proceedings of the First IEEE International Workshop on ARToolKit, Darmstadt, Germany, 

2002. 

[27] H. Kato and Mark Billinghurst. Marker tracking and hmd calibration for a video-based 

augmented reality conferencing system. In Proceedings of the 2nd International Workshop 

on Augmented Reality (IWAR 99), San Francisco, USA, October 1999. 

[28] Hirokazu Kato, Mark Billinghurst, and Ivan Poupyrev. ARToolKit version 2.33 Manual, 

2000. Available for download at http://www.hitl.washington.edu/research/ 

shared space/download/. 

[29] Georg Klein and Tom Drummond. Robust visual tracking for non-instrumented augmented 

reality. In The Second IEEE and ACM International Symposium on Mixed and Augmented 

Reality, pages 113 – 122. IEEE Computer Society, October 7 – 10 2003. 

[30] Chris Kulas. Usability engineering for ubiquitous computing. Master’s thesis, Technische 

Universität München, 2003. 

[31] Clayton Lewis and John Rieman. Task-Centered User Interface Design - A Practical Introduction. 

1994. http://hcibib.org/tcuid/. 

[32] J. Liang, C. Shaw, and M. Green. On temporal-spatial realism in the virtual reality environment. 

In Proc. of the 4th Annual Symposium on User Interface Software and Technology 

(UIST’91), pages 19–25, Hilton Head, SC, 1991. 

[33] Bruce D. Lucas and Takeo Kanade. An iterative image registration technique with an 

application to stereo vision (darpa). In Proceedings of the 1981 DARPA Image Understanding 

Workshop, pages 121–130, April 1981. 

[34] J. McKenzie and D. Darnell. The ”magic book”: A report into augmented reality storytelling 

in the context of a children’s workshop. Technical report, Christchurch College 

of Education, 2003. 

[35] Gordon E. Moore. Cramming more components onto integrated circuits. Electronics, 

Volume 38(8), April 1965. 

[36] R. Mukundan. Quaternions: From classical mechanics to computer graphics, and beyond. 

In Proceedings of the 7 th Asian Technology Conference in Mathematics, 2002. 

[37] Ulrich Neumann and Suya You. Integration of region tracking and optical flow for 

image motion estimation. In ICIP (3), pages 658–662, 1998. 

[38] Ulrich Neumann and Suya You. Natural feature tracking for augmented reality. IEEE 

Transactions on Multimedia, 1(1):53–64, 1999. 

[39] J. Newman, M. Wagner, M. Bauer, A. MacWilliams, T. Pintaric, D. Beyer, D. Pustka, 

F. Strasser, D. Schmalstieg, and G. Klinker. Ubiquitous tracking for augmented reality. 

In Proc. of International Symposium on Mixed and Augmented Reality, Arlington, VA, USA, 

Nov. 2004. 

120

Bibliography 

[40] Joseph Newman and David Ingram. Augmented reality in a wide area sentient environment. 

In ISAR ’01: Proceedings of the IEEE and ACM International Symposium on 

Augmented Reality (ISAR’01), page 77, Washington, DC, USA, 2001. IEEE Computer Society. 

[41] Charles B. Owen, Fan Xiao, and Paul Middlin. What is the best fiducial? In The First 

IEEE International Augmented Reality Toolkit Workshop, pages 98–105, Darmstadt, Germany, 

September 2002. 

[42] Jun Park, Suya You, and Ulrich Neumann. Natural feature tracking for extendible robust 

augmented realities. In Proceedings of the international workshop on Augmented reality 

: placing artificial objects in real scenes, pages 209–217. A. K. Peters, Ltd., 1999. 

[43] James Patten, Hiroshi Ishii, Jim Hines, and Gian Pangaro. Sensetable: a wireless object 

tracking platform for tangible user interfaces. CHI ’01: Proceedings of the SIGCHI 

conference on Human factors in computing systems, pages 253–260, 2001. 

[44] Jenny Preece, Yvonne Rogers, Helen Sharp, David Benyon, Simon Holland, and Tom 

Carey. Human-Computer Interaction. Addison-Wesley, 1994. 

[45] J. P. Rolland and H. Fuchs. Optical versus video see-through head-mounted displays in 

medical visualization. Presence: Teleoperators & Virtual Environments, 9(3):287–309 (23), 

June 2000. 

[46] Jannick P. Rolland, Larry Davis, and Yohan Baillot. A survey of tracking technology for 

virtual environments. In W. Barfield and T. Caudell, editors, Fundamentals of Wearable 

Computers and Augmented Reality, pages 67–112. Lawrence Erlbaum, Mahwah, NJ, USA, 

2001. 

[47] Christian Sandor, Asa MacWilliams, Martin Wagner, Martin Bauer, and Gudrun 

Klinker. Sheep: The shared environment entertainment pasture. In Proceedings of the 

International Symposium on Mixed and Augmented Reality (ISMAR), October 2002. 

[48] Chris Shaw and Jiandong Liang. An experiment to characterize head motion in VR and 

RR using MR. pages 99–101, 1992. 

[49] Ken Shoemake. Animating rotation with quaternion curves. In SIGGRAPH ’85: Proceedings 

of the 12th annual conference on Computer graphics and interactive techniques, pages 

245–254, New York, NY, USA, 1985. ACM Press. 

[50] Michael Siggelkow. Importance of gaze awareness in augmented reality teleconferencing. 

Master’s thesis, Technische Universität München, 2005. 

[51] G. Simon, A. Fitzgibbon, and A. Zisserman. Markerless tracking using planar structures 

in the scene. In Proc. International Symposium on Augmented Reality, pages 120–128, 

October 2000. 

[52] David W. Stockburger. Introductory Statistics: Concepts, Models, And Applications. http: 

//www.psychstat.smsu.edu/introbook/sbk00.htm. 

[53] Franz Strasser. Bootstrapping of sensor networks in ubiquitous tracking environments. 

Master’s thesis, Technische Universität München, 2004. 

121

Bibliography 

[54] Ivan E. Sutherland. A head-mounted three dimensional display. In AFIPS Conference 

Proceedings, Fall Joint Conference, volume 1, pages 757–764, Washinton (DC), USA, 1968. 

[55] C. Tomasi and Takeo Kanade. Shape and motion from image streams: a factorization 

method - part 3 detection and tracking of point features. Technical Report CMU-CS-91- 

132, Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, April 

1991. 

[56] Julian Looser Trond Nilsen, Steven Linton. Motivations for Augmented Reality Gaming. 

Technical report, HIT Lab New Zealand, 2004. 

[57] Christiane Ulbricht and Dieter Schmalstieg. Tangible Augmented Reality for Computer 

Games. In M.H. Hamza, editor, VIIP Conference Proceedings, pages 950–954. IASTED, 

ACTA Press, September 2003. ISBN 0-88986-382-2. 

[58] Stefan Veigl, Andreas Kaltenbach, Florian Ledermann, Gerhad Reitmayr, and Dieter 

Schmalstieg. Two-handed direct interaction with artoolkit. Technical report, Vienna 

University of Technology, 2002. 

[59] Florent Vial. Natural point feature tracking of a textured plane: A realtime augmented 

reality application. Master’s thesis, Lyon School of Chemistry, Physics and Electronics, 

2003. 

[60] Florent Vial. State of the art report on natural feature tracking for vision-based real time 

augmented reality. Technical report, HIT Lab New Zealand, 2003. 

[61] Daniel Wagner, Thomas Pintaric, Florian Ledermann, and Dieter Schmalstieg. Towards 

massively multi-user augmented reality on handheld devices. In Third International 

Conference on Pervasive Computing (Pervasive 2005), Munich, Germany, 2005. 

[62] Martin Wagner. Tracking with multiple sensors. PhD thesis, Technische Universität 

München, 2005. 

[63] Martin Wagner and Felix Loew. Configuration strategies of an ar toolkit-based wide 

area tracker. In The Second IEEE International Augmented Reality Toolkit Workshop, Tokyo, 

Japan, 2003. 

[64] Mark Weiser. The computer for the 21st century. SIGMOBILE Mob. Comput. Commun. 

Rev., 3(3):3–11, 1999. 

[65] Greg Welch and Gary Bishop. An introduction to the kalman filter. Technical report, 

Chapel Hill, NC, USA, 1995. 

[66] Eric Woods, Mark Billinghurst, Graham Aldrige, Barbara Garrie, Julian Loser, Deidre 

Brown, and Claudia Nelles. Augmenting the science centre and museum experience. 

Technical report, HIT Lab New Zealand, 2004. 

[67] Suya You and Ulrich Neumann. Fusion of vision and gyro tracking for robust augmented 

reality registration. In Proceedings of IEEE Virtual Reality, pages 71–78, Yokohama, 

Japan, March 2001. 

122

Bibliography 

[68] Suya You, Ulrich Neumann, and Ronald Azuma. Hybrid inertial and vision tracking 

for augmented reality registration. In VR ’99: Proceedings of the IEEE Virtual Reality, page 

260, Washington, DC, USA, 1999. IEEE Computer Society. 

123

Diploma Thesis: Improving Augmented Reality Table Top ... - TUM

Create successful ePaper yourself

Delete template?

Save as template?