Diploma Thesis: Improving Augmented Reality Table Top ... - TUM
Technische Universität München
Fakultät für Informatik

Diplomarbeit

Improving Augmented Reality Table Top Applications with Hybrid Tracking

Felix Löw

Supervisor (Aufgabensteller): Univ.-Prof. Gudrun Klinker, Ph.D.
Advisor (Betreuer): Dr. Martin Wagner
Submission date (Abgabedatum): 11 May 2005
Erklärung (Declaration)

I declare that I wrote this diploma thesis independently and used only the cited sources and aids.

München, 11 May 2005. Felix Löw
Zusammenfassung (German abstract, translated)

Augmented Reality applications enrich the real world by overlaying it with virtual objects. To view this fusion of real and virtual environments, graphical output hardware such as Head Mounted Displays or Tablet PCs is used, together with tracking technologies that determine the position and orientation of tracked objects. Frequently used vision-based tracking methods such as Natural Feature Tracking are error-prone under camera movement: certain features must be followed across successive video frames. The basic idea of this work is to adapt the search area for these features to the change in motion of the interaction hardware. This work is a first step towards solving this problem for a special class of Augmented Reality applications, Table Top Augmented Reality. It proposes a hybrid tracking approach that takes both the tracking and the user's movement context into account. The orientation measured by an additional tracker is used to dynamically configure the vision-based tracking method, a texture tracker, at runtime. A software architecture that enables this is proposed.

After an introduction to Table Top Augmented Reality we discuss the design and evaluation of a user study. Its goal is to determine an approximation of a linear mapping between user motion and the search window of the texture tracker. Statistical analysis methods are used to find this mapping, which can be expressed as a simple linear function with the change in orientation as input parameter. In addition, the relationship between user behavior and the performed task is examined. Furthermore, tasks in Table Top applications are identified and consequences for the vision-based tracking method are derived.
Abstract

Augmented Reality (AR) applications enrich the real world by augmenting it with virtual objects. In order to view this fusion of real environment and virtual content, Augmented Reality setups utilize common graphical output hardware like Head Mounted Displays or Tablet PCs, and tracking technologies to estimate the position and orientation of tracked targets. Frequently used vision-based techniques like Natural Feature Tracking are error-prone under camera movement: features have to be found again in subsequent video frames. The basic idea of this work is to adapt the search area for features to the change in orientation of the user interface hardware. This work is a first step towards solving this problem for a special class of Augmented Reality applications, Table Top Augmented Reality. The work provides a hybrid tracking approach that brings tracking and the user's movement context together. Orientation information given by an additional tracker is applied for a dynamic runtime configuration of the vision-based tracking routine, a texture tracking algorithm. To accomplish this, a special software architecture is proposed.

After introducing the basic ideas of Table Top Augmented Reality, we show the design, execution and evaluation of a user study. The goal is to find an approximation for a linear mapping between user motion and the search window of the texture tracking routine. Applying statistical techniques, we show that it is possible to derive such a mapping, which can be expressed by a simple linear function with the change of orientation as input parameter. We also show that user behavior is related to the performed tasks. We identify tasks for Table Top AR and discuss the implications for the tracking routine.
Preface

Purpose of This Document

This work was written as a diploma thesis, which is equivalent to a Master's thesis, at the Technische Universität München in Prof. Gudrun Klinker's Augmented Reality Research Group (Chair for Computer Aided Medical Procedures and Augmented Reality). The work was advised by Dr. Martin Wagner; the ideas for this thesis evolved from fruitful discussions with him.

The thesis was accomplished in cooperation with the Human Interface Technology Laboratory New Zealand (HIT Lab New Zealand 1) in Christchurch. From September 2004 until February 2005 I was at the HIT Lab in New Zealand, developing and conducting the main parts of the thesis. This part was supervised by Prof. Mark Billinghurst.

The stay in New Zealand for this thesis was financially supported by a scholarship for "Kurzfristige Studienaufenthalt für Abschlussarbeiten" by the Deutscher Akademischer Austauschdienst (DAAD) 2.

In this thesis I would like to present the ideas behind my approach, document my results and draw conclusions for future implications of my work.

1 www.hitlabnz.org
2 www.daad.de
Target Audience

General readers who are interested in Augmented Reality should read chapters 1 and 2. These chapters give an overview of the basic terms and technologies and introduce the ideas of my thesis.

Readers interested in hybrid tracking should read chapter 3, where my approach is explained and placed in the context of related work. Subsequent steps and the evaluation of my work are described in chapters 3, 5 and 6.

Human Computer Interaction researchers should read chapter 3, where our user-study-based approach is explained. The design, execution and evaluation of the study are shown in chapters 5 and 6.
Acknowledgments

First of all I would like to thank my buddy Michael "Siggä" Siggelkow for struggling together through the whole studies and through New Zealand. Thanks for helping, motivating, partying and being a friend.

I would like to thank Martin Wagner, who advised this thesis. Although it is a hard task to advise a thesis from the other side of the world, I think you did a very good job. Thanks a lot! Heaps of thanks to Mark Billinghurst for giving me the opportunity to come to the HIT Lab in New Zealand and for his kind hospitality. Thanks to Gudrun Klinker as well for supporting and enabling the stay in New Zealand.

Here is my big respect for the HIT Lab crew. Most of all I would like to thank Raphaël "le docteur" Grasset and Phil Lamb for helping me so much with my user study. Special thanks to Anna-Lee Mason and Nathan Gardiner, the good souls of the HIT Lab. Thanks to all the people working at or involved with the HIT Lab.

Thanks to all the people I met and can now call my friends. Special thanks to my great Swedish flatmates Johan Karlsson and Mikael Selegård. Ett stort tack! Thanks a lot to Claudia Nelles for heaps of things. Also thanks to the rest of the gang in our room: Thomas "The German" Zurbrügg and Michael "Intern of the month" Herchel. Also a huge thank you to Jonno Hill, Raphaël Grasset, Sofi Crosley, Jörg Hauber, Anna-Lee Mason, Phil Lamb, Marcel Lancelle, Matt Keir, Tobi Gefken, Oakley Buchmann and Nathan Gardiner. Thanks to all the people from the Canterbury University Tramping Club and the Wednesday soccer team. I had a great time with all of you.

I would like to thank all my friends here in Munich. Special thanks to the crowd that came all the way to New Zealand to look after us.

Lastly I would like to thank my family for everything and for making me feel at home again. Special thanks to my brother Martin for showing me the secrets of statistics.
ii
Figure 0.1: New Zealand, Mount Cook National Park<br />
iii<br />
Garching, May 2005<br />
Felix Löw
Contents

1 Introduction  1
  1.1 Overview of Augmented Reality  1
  1.2 Tracking in Augmented Reality  3
    1.2.1 Representation of spatial information  3
    1.2.2 Examples for Tracking Technologies  5
    1.2.3 Hybrid Tracking  10
  1.3 Human Computer Interaction (HCI)  10
  1.4 Goals and Outline of this Thesis  11
    1.4.1 Goals  11
    1.4.2 Outline of the thesis  12

2 Table Top Augmented Reality  13
  2.1 Motivation for Table Top AR  13
  2.2 The Magic Book  15
  2.3 Requirements  16
  2.4 Tracking  17
    2.4.1 Marker-Based Tracking  18
    2.4.2 Texture Tracking of a 2D plane  19
    2.4.3 Tracking in the Magic Book  25
  2.5 User Interaction  25
    2.5.1 Graphical Output Hardware  25
    2.5.2 Input Interfaces  27
  2.6 Summary  27

3 A Hybrid Tracking Approach  29
  3.1 Motivation  29
  3.2 An Inertial-Optical Tracker based Runtime Setup  30
  3.3 Configuration of the setup  31
  3.4 Motivation for a User Study  31
  3.5 Related Work  32
    3.5.1 Natural Feature Tracking  32
    3.5.2 Hybrid Tracking  34
    3.5.3 Head motion prediction  36
    3.5.4 Table-Top Augmented Reality  36
    3.5.5 Our approach  37
  3.6 Summary  38

4 A Software Architecture based on DWARF  39
  4.1 DWARF  39
    4.1.1 Services  39
    4.1.2 Service Manager  41
    4.1.3 An example  41
  4.2 Software Architecture for a Dynamic Configuration during Runtime  42
    4.2.1 Existing Architecture  42
    4.2.2 Requirements for new architecture  43
    4.2.3 System Design  44
    4.2.4 Implementation  47
    4.2.5 Summary  48

5 User Study  49
  5.1 Goals of the User Study  49
  5.2 User Study design  51
    5.2.1 Movement Tracking of the Hand-Held Device  51
    5.2.2 Tracking of 2d feature point  54
    5.2.3 Logging Infrastructure  54
    5.2.4 Task Design  56
    5.2.5 Setup  58
    5.2.6 Questionnaire  60
  5.3 Execution of the Study  61
  5.4 Summary  63

6 Evaluation of the User Study  65
  6.1 Evaluation of the User Study  65
    6.1.1 Feature Point Tracking  66
    6.1.2 Feature Point Tracking and Tracking of the Handheld  70
    6.1.3 Feature Point Tracking and Tasks  77
    6.1.4 Further Evaluations  80

7 Conclusions  82
  7.1 Results  82
    7.1.1 Results of the User Study  82
    7.1.2 Natural Feature Tracking  84
    7.1.3 Table Top Augmented Reality  84
    7.1.4 Assessment of our Approach  85
  7.2 Future Work  85
    7.2.1 Factors for User Behavior  85
    7.2.2 Visual Cues  86
    7.2.3 Next Steps  86

Glossary  88

A User Study  90
  A.1 Conduction of the User Study  90
    A.1.1 Questionnaire  90
    A.1.2 Instructions and Guideline  91
  A.2 Statistical Tools  95
    A.2.1 Correlation  95
    A.2.2 Linear Regression  96

B Complete Results  101
  B.1 Questionnaire  101
  B.2 Cases  103
    B.2.1 Case 1  103
    B.2.2 Case 2  105
    B.2.3 Case 3  111
    B.2.4 Case 4  113

Bibliography  118
CHAPTER 1

Introduction

Computers have constantly changed their appearance in the past and will continue to do so in the future. In the last 40 years, huge central computers available only to a small group of researchers or experts have given way to desktop personal computers (PCs) available to everyone. The trend towards small, cheap and mobile computers such as mobile phones or handheld PCs continues. Computers are becoming more and more involved in our everyday life, and new ways of interacting with these new computers have to be researched and evaluated.

Augmented Reality (AR) is such an approach to bring the real world and the virtual computer world together. AR allows Human-Computer Interaction (HCI) in a new way.
1.1 Overview of Augmented Reality

In his survey of Augmented Reality [5][2], Azuma defines Augmented Reality as follows:

"Augmented Reality (AR) is a variation of Virtual Environments (VE), or Virtual Reality (VR) as it is more commonly called. VE technologies completely immerse a user inside a synthetic environment. While immersed, the user cannot see the real world around him. In contrast, AR allows the user to see the real world, with virtual objects superimposed upon or composited with the real world."

In other words, AR tries to enrich the real environment with virtual information: AR brings the real world and the computer world together. Virtual Environments, in contrast, leave the real world outside. The vision of VR is that the user is not aware of the "outside world" at all and cannot interact with real objects. In AR, on the other hand, the user is able to interact with virtual objects as well as real objects. The virtual information is augmented in the user's point of view, using special graphical output devices such as a Head Mounted Display (HMD), as utilized in a classical AR setup. The system immediately responds to the user's actions and gives feedback. Realtime feedback is one of the key requirements for AR applications: if a user wearing an HMD turns his head, the new viewpoint has to be calculated and the 3D objects have to be registered with the real objects in realtime. Otherwise the user will have the feeling that the registration of the 3D objects lags behind his head movements. The virtual information is adapted according to the user's actions, or even when the state of the environment changes. Already in 1968 Ivan Sutherland presented the first Augmented Reality system, introducing the first HMD [54]. Interestingly, the basic concepts proposed by Sutherland are still valid for current AR applications.
Figure (1.1) shows an example of the AR application "Augmented Furniture Client", which allows the user to place virtual furniture into a real living room [19]. A user wearing an HMD can walk through his own real living room while the selected pieces of furniture are displayed in the environment according to the user's viewpoint.

Figure 1.1: Placing virtual furniture in a real environment. A virtual sofa and chair are augmented in the living room.
To realize this we have to find new ways of interacting with computers. The paradigm of a desktop PC with keyboard and mouse as the only way to interact with computers is no longer suited for this kind of application. Often a mobile user has to be enabled to change the behavior of a system in more intuitive ways, or even without any explicit interaction at all. Mark Weiser describes this new way of thinking in his article "The Computer for the 21st Century" [64]. The terms Context-Aware Computing and Mobile Computing are also important when we talk about AR.

Context-Aware Computing. According to Schilit and Theimer, "context-aware software adapts according to the location of use, the collection of nearby people, hosts, and accessible devices, as well as to changes to such things over time". Any information about the environment that is important for the computer system has to be collected, evaluated and responded to by the system. For example, a system may change its internal behavior according to the light conditions at the user's position [63]. A major research area in AR is therefore how to collect this information about the environment, such as the location of the user. These questions will be discussed in more detail in the tracking introduction.
Mobile Computing. Many AR applications enable the user to move around freely in the environment; a mobile setup is therefore attached to the mobile user. Possible scenarios for this class of AR applications are maintenance tasks, such as repairing a car. The user has his hands free to fix the car and receives virtual information in his HMD showing which step to perform next. Another example would be a navigation system displaying information about the environment in the user's view.

In short, the key challenges in AR are:
• Registration of virtual information in the real environment
• Finding new ways of interacting with systems that respond in realtime

For 3D registration in the real environment, tracking technologies are needed. Tracking is a difficult problem, which will be discussed in more detail. For new interaction possibilities, innovative user interfaces (UIs) are needed, and it has to be discovered how users actually use a system and whether they accept or reject new ways of interaction.
1.2 Tracking in Augmented Reality

As noted in the previous section, tracking is one of the main and most difficult issues in AR research.

Virtual and real objects have to be aligned as accurately as possible; this process is called registration. Sensors that gather information about the environment and collect 3-dimensional (3D) spatial information are called trackers. To register virtual objects in 3D, AR applications work in 3-dimensional space. In order to calculate the viewpoint of a user and to display the virtual information at the right position, the pose, which consists of position as well as orientation, has to be tracked by the underlying tracking technology.

This section gives an introduction to the fundamentals of tracking and an overview of the basic tracking technologies used in this thesis. A good introduction and more detailed descriptions of all the terms and technologies introduced can be found in [21].
1.2.1 Representation of spatial information

• Position
Position is a 3D vector estimated by the tracking technology. It contains the coordinates of the specified point in the tracker coordinate system in the current tracking frame. The tracker coordinate system is a Cartesian coordinate system consisting of three perpendicular axes. These axes intersect in one point, the origin of the coordinate system. In homogeneous coordinates a position is represented by a 4-component vector:

v̄ = (x, y, z, w)^T, where typically w = 1 for points (w = 0 denotes a direction)
• Orientation
Orientation describes how an object is rotated with respect to the axes of the tracker coordinate system. Unlike positions, orientations can be represented in several ways, each with its own advantages and disadvantages; which method to choose is always a trade-off.

1. Rotation Matrix
A common way to represent transformations of points in a coordinate system is the 3×3 rotation matrix. Rotation and scaling of a set of points can be combined and performed by a simple matrix multiplication. If a 4×4 homogeneous matrix is used, the rotation matrix is its upper-left 3×3 submatrix; translations can also be represented, in the fourth column. The columns of the rotation matrix can be regarded as the directions of the transformed coordinate axes projected onto the source coordinate axes.

2. Euler Angles
Euler angles are the simplest and most intuitive representation of rotations. Every rotation can be considered as three successive single rotations around the three coordinate axes. In 3-dimensional space there is a rotation matrix for every axis:
\[
R_x = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\varphi & \sin\varphi \\ 0 & -\sin\varphi & \cos\varphi \end{pmatrix}
\qquad
R_y = \begin{pmatrix} \cos\theta & 0 & -\sin\theta \\ 0 & 1 & 0 \\ \sin\theta & 0 & \cos\theta \end{pmatrix}
\qquad
R_z = \begin{pmatrix} \cos\psi & \sin\psi & 0 \\ -\sin\psi & \cos\psi & 0 \\ 0 & 0 & 1 \end{pmatrix}
\]

where R_x is a rotation by φ about the x axis, R_y a rotation by θ about the y axis, and R_z a rotation by ψ about the z axis.
Any rotation in 3D space can be calculated by multiplying the three rotation matrices:

R = R_x · R_y · R_z

Note that matrix multiplication is not commutative; the order of the three rotations matters.
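The construction above can be sketched in a few lines of code. This is a minimal illustration, not part of the original thesis; the matrix entries follow the sign convention of the matrices given here, and the function names are my own:

```python
import numpy as np

def rot_x(phi):
    # rotation by phi about the x axis, using the thesis's sign convention
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[1, 0, 0],
                     [0, c, s],
                     [0, -s, c]])

def rot_y(theta):
    # rotation by theta about the y axis
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0, -s],
                     [0, 1, 0],
                     [s, 0, c]])

def rot_z(psi):
    # rotation by psi about the z axis
    c, s = np.cos(psi), np.sin(psi)
    return np.array([[c, s, 0],
                     [-s, c, 0],
                     [0, 0, 1]])

# R = Rx · Ry · Rz; the order matters, since matrix products do not commute
a, b, g = 0.3, 0.7, 1.1
R1 = rot_x(a) @ rot_y(b) @ rot_z(g)
R2 = rot_z(g) @ rot_y(b) @ rot_x(a)
assert not np.allclose(R1, R2)            # different order, different rotation
assert np.allclose(R1 @ R1.T, np.eye(3))  # rotation matrices are orthogonal
```

The two assertions make the non-commutativity concrete: composing the same three axis rotations in a different order yields a different overall rotation, while each product remains orthogonal as a rotation matrix must be.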
3. Quaternions
Quaternions are extensions of complex numbers to hyper-complex numbers of rank 4. At first glance quaternions might look difficult and confusing, but once you are familiar with them, calculations can be very easy. They can be represented by a 4-dimensional vector and consist of a real scalar and an imaginary vector:

q = w + xi + yj + zk,  w, x, y, z ∈ ℝ
q = (w, x, y, z),  w, x, y, z ∈ ℝ
q = (s, v̄),  s ∈ ℝ, v̄ ∈ ℝ³

Note that i, j, k are imaginary units with i² = j² = k² = ijk = −1, the imaginary vector is v̄ = (x, y, z)^T, and the scalar part is s = w. Mukundan provides an introduction to the basic quaternion algebra, including operations like multiplication and addition, which we will not discuss here in detail [36].
Rotations with quaternions: The vector part containing the imaginary components specifies the rotation axis; the scalar part is the cosine of half the rotation angle. Only unit quaternions, with ‖q‖ = 1, are used to describe rotations. A quaternion q = (s, v̄) specifies a rotation of 2 arccos(s) around the axis v̄ [21]. So if we want to construct a rotation of angle θ around a unit axis v̄, we can express the following quaternion:

q_{θ,v̄} = (cos(θ/2), sin(θ/2) v̄)

Here are two simple examples with q = (s, v̄):
– The identity rotation is specified by a rotation of 0 degrees around an unspecified axis:
  s = 1,  v̄ = (0, 0, 0)^T
– A rotation of 90 degrees about the y-axis is specified the following way:
  s = 1/√2,  v̄ = (0, 1/√2, 0)^T

Consecutive rotations can be expressed as the product of the corresponding quaternions. A rotation q can be applied to a vector p̄ = (x, y, z), embedded as the pure quaternion (0, p̄), in the following way:

p̄′ = q ◦ p̄ ◦ q*

with the conjugated quaternion q* = (s, −v̄).
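The rotation formula above can be sketched directly with the Hamilton product. This is an illustrative sketch, not from the thesis; the helper names are my own:

```python
import math

def quat_mul(q, r):
    # Hamilton product of two quaternions given as (w, x, y, z) tuples
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return (w1*w2 - x1*x2 - y1*y2 - z1*z2,
            w1*x2 + x1*w2 + y1*z2 - z1*y2,
            w1*y2 - x1*z2 + y1*w2 + z1*x2,
            w1*z2 + x1*y2 - y1*x2 + z1*w2)

def quat_conj(q):
    # conjugate q* = (s, -v)
    w, x, y, z = q
    return (w, -x, -y, -z)

def rotate(q, p):
    # p' = q ∘ p ∘ q*, embedding p as the pure quaternion (0, p)
    _, x, y, z = quat_mul(quat_mul(q, (0.0,) + tuple(p)), quat_conj(q))
    return (x, y, z)

# 90 degrees about the y axis: q = (cos 45°, sin 45° · v̄) with v̄ = (0, 1, 0)
q = (math.cos(math.pi / 4), 0.0, math.sin(math.pi / 4), 0.0)
v = rotate(q, (1.0, 0.0, 0.0))  # the x axis maps to (0, 0, -1)
```

The example reproduces the second worked case from the text: the unit quaternion with s = 1/√2 and v̄ = (0, 1/√2, 0)^T sends the x axis to the negative z axis, matching the rotation matrix convention used above.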
A discussion of the advantages and disadvantages of the different representations of orientation can be found in [49].

1.2.2 Examples for Tracking Technologies

In order to select an appropriate tracking system for an AR application, certain criteria have to be evaluated. The most important criteria are latency, accuracy, update rate, working area and mobility [46].

Tracking can be compared to how a human being collects information about the environment: by seeing, by sensing (hearing, feeling, recognizing certain influences) and by the sense of equilibrium. Tracking technologies can therefore be categorized into similar categories. What follows is only a small selection of the tracking technologies used in this thesis.
Vision-Based Tracking (Seeing)<br />
1 Introduction<br />
Vision-based tracking apply image recognition techniques in order to detect certain features<br />
in images grabbed by optical cameras. Thus speaking of optical trackers means a combination<br />
of hardware to grab video frames and software to analyze the frames. The terms<br />
vision-based tracking and optical tracking will be used in the same way during this thesis.<br />
Tracked features can be artificial or natural. They are used to calculate the position of the<br />
target in the reference coordinate system. While often simple markers are used as artificial<br />
features [28], natural features can either be preprocessed points in a 2D plane [11] or any<br />
features in the environment [42]. This method provides full 6 degrees of freedom (DOF). This<br />
means that vision based tracking provide position and the orientation as well.<br />
A common software used for marker-based tracking is the <strong>Augmented</strong> <strong>Reality</strong> Toolkit<br />
(ARToolkit) [28]. Recently a new version of this toolkit has been developed which allows the<br />
texture tracking of a 2D plane instead of fiducials [11]. This software has been developed by<br />
the HIT Lab (Human Interface Technology Laboratories) USA and New Zealand 1 . A cheap<br />
web cam on an average desktop computer can be used with this software and allows to set<br />
up a small AR system even at home. AR applications based on vision-based tracking could<br />
become very important in the future in order to address a broad mass of people.<br />
Optical tracking is often further categorized into Outside-In and Inside-Out tracking. In an Outside-In setup the camera is attached to a fixed position; in an Inside-Out setup the camera is attached to the moving target itself (for example on the HMD of a user).
One disadvantage is high latency, caused by the large amount of video data grabbed by the camera and the processing time of the image recognition algorithms. The main drawback for the user is occlusion: if artificial or natural features are occluded, the tracking either fails completely (marker-based tracking) or becomes less accurate because fewer features are available (natural feature tracking).
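The grab-and-analyze pipeline described above can be sketched as a simple loop. The frame source and the detection and pose-estimation functions below are hypothetical stand-ins, not a real camera API or the ARToolkit:

```python
def track_loop(frames, detect_features, estimate_pose, min_features=4):
    """Run vision-based tracking over a sequence of video frames.

    detect_features and estimate_pose are placeholders for the image
    recognition and pose computation stages of an optical tracker."""
    poses = []
    for frame in frames:
        feats = detect_features(frame)
        if len(feats) < min_features:
            # occlusion or too few visible features: tracking fails here
            poses.append(None)
        else:
            poses.append(estimate_pose(feats))
    return poses

# toy usage: "features" are just numbers, the "pose" their sum
frames = [[1, 2, 3, 4], [1, 2]]
print(track_loop(frames, lambda f: f, lambda f: sum(f)))  # [10, None]
```

The second frame illustrates the occlusion failure mode discussed above: too few features are found, so no pose is produced for that frame.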
Inertial Tracking (Equilibrium Sense)
To measure the position of an object, inertial trackers use accelerometers, which estimate its linear acceleration. Gyroscopes, in contrast, measure angular velocity by exploiting the conservation of angular momentum and can therefore provide the orientation of an object. The orientation is delivered as yaw (y-axis), pitch (x-axis) and roll (z-axis).
Current gyroscope technologies play an important role because the devices are small, cheap and easy to integrate into other devices such as laptops or even mobile phones. Historically, gyroscopes were used for navigation in airplanes and ships; these inertial devices were heavy, expensive and, due to their navigation task, very accurate. New technologies have enabled the development of smaller and cheaper devices. The following is a short overview of common gyroscope techniques [23], with a special focus on vibrating gyroscopes.
1 www.hitlabnz.org<br />
• Spinning Mass Gyroscopes<br />
These classical gyroscopes are also called gimbaled gyroscopes. They use the properties of a spinning wheel and can only measure the rate of rotation about one axis, so three gyroscopes have to be combined if rotation about three orthogonal axes is to be sensed. Because these gyroscopes are heavy and large, they are nowadays only used in ships and aircraft.
• Optical Gyroscopes
These gyroscopes apply the time of flight (TOF) principle: the time until a signal is sensed by a receiver is measured. In a gyroscope, rotation influences the time of flight of light; this time is measured and the rate of turn can be estimated.
• Vibrating Gyroscopes
Vibrating gyroscopes are commonly used in recent applications because they are small, consume little power and require no bearings or motors. A vibrating element is rotated; evaluations have shown that a ring-shaped vibrating resonator is best suited for measuring rates of turn. The measurement of the angular rate exploits a phenomenon known from the aviation domain, the Coriolis effect. If an airplane is heading east it will drift towards the south, although it does not accelerate in the south direction; heading west leads to a drift in the north direction. The "force" responsible for this acceleration in the north-south direction is called the Coriolis force FC. The effect occurs whenever an object moves within a rotating reference frame; in this special case an aircraft moves in the reference frame of the rotating earth. Figure 1.2 shows the effect: an object moves around a rotation axis and the Coriolis force acts on it perpendicular to the direction of movement. The effect can also be observed in thunderstorms, in weather developments, or in water flushing down the sink in the opposite direction on the southern hemisphere. A good demonstration of the effect can be seen in [16].
The Coriolis force FC can also be expressed by the following equation, with the object's mass m, its velocity in the rotating frame vr, the angular velocity of the rotating frame of reference ω, and × the vector cross product:
FC = −2m(ω × vr)
A good introduction to the Coriolis force can be found in [25]. This effect is applied in vibrating gyroscopes. Figure 1.3 shows the reference coordinate frame with the vibrating ring. First consider only rotations around the Z-axis: the ring is vibrated with a constant amplitude ΩZ around the Z-axis, which is called the primary mode. If the gyroscope is now turned around the Z-axis, the Coriolis effect leads to an acceleration perpendicular to the motion of the ring (in the w-direction), the so-called secondary mode. This secondary mode can be measured and the rate of turn calculated. If the ring is vibrated around all axes of the reference frame, Ω = (ΩX, ΩY, ΩZ), the rate of turn can be measured for all directions [22] [14].
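As a small numeric illustration of the Coriolis formula FC = −2m(ω × vr) above (the numbers are invented for illustration, not taken from the thesis):

```python
import numpy as np

def coriolis_force(m, omega, v_r):
    """Coriolis force F_C = -2 m (omega x v_r) on a mass m moving with
    velocity v_r inside a frame rotating with angular velocity omega."""
    return -2.0 * m * np.cross(omega, v_r)

# a 1 kg mass moving at 100 m/s along x in a frame rotating about the
# z-axis at roughly Earth's rate (illustrative values)
omega = np.array([0.0, 0.0, 7.292e-5])   # rad/s
v_r = np.array([100.0, 0.0, 0.0])        # m/s
F_C = coriolis_force(1.0, omega, v_r)
print(F_C)                # acts along -y, perpendicular to v_r
print(np.dot(F_C, v_r))   # 0.0: the force never does work on the motion
```

The dot product with the velocity is always zero, which is exactly the "perpendicular to the movement direction" behavior shown in figure 1.2.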
Figure 1.2: An object moves in a rotating frame. The Coriolis force results in an acceleration perpendicular<br />
to the movement direction<br />
Figure 1.3: The reference frame of a vibrating gyroscope. The ring is vibrated around all the axes (primary mode) and the rate of turn can be measured via the secondary mode caused by the Coriolis effect
Figure 1.4: Intersense Products: Inertial Cube2 (left) and Intertrax2 (right)<br />
Figure 1.5: Example for a Magnetic Tracker: Ascension Flock of Birds<br />
Accelerometers and gyroscopes are often combined to obtain full 6 DOF. Because inertial trackers provide only relative measurements, they are often combined with other tracking technologies to obtain absolute measurements as well. One drawback of this technology is that small measurement errors accumulate and cause drift, which leads to incorrect tracking results after a certain period. Widespread inertial tracking products are the trackers by Intersense 2 (see figure 1.4).
The accumulation of drift is a serious drawback in practice: either only relative orientation is used, or the setup has to integrate other tracking or filtering technologies in order to correct the measurements delivered by the inertial tracker.
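To make the drift problem concrete, the following sketch (with invented numbers) integrates a gyroscope signal that contains a small constant bias. Even though the device is at rest, the integrated orientation error grows without bound:

```python
import numpy as np

dt = 0.01                              # 100 Hz sample rate
bias = np.deg2rad(0.1)                 # constant 0.1 deg/s sensor bias
rng = np.random.default_rng(0)
t = np.arange(0.0, 60.0, dt)           # one minute of samples

# the device is actually at rest: the true angular rate is zero,
# but the sensor reports the bias plus a little white noise
measured_rate = bias + rng.normal(0.0, 1e-4, t.size)

# naive integration of the rate yields the orientation estimate
angle = np.cumsum(measured_rate) * dt
print(np.rad2deg(angle[-1]))           # roughly 6 degrees of drift after 60 s
```

A bias of only 0.1 deg/s thus produces about 6 degrees of orientation error per minute, which is why an absolute reference such as vision-based tracking is needed for correction.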
Magnetic Tracking
A special hardware setup generates a magnetic field in a certain working area; these fields are either low-frequency AC or DC fields. Three orthogonal coils in both the sender and the receiver are used to produce measurements of position and orientation (6 DOF). The physical principles applied are described in [21]. Tracking measurements can be distorted by ferromagnetic objects or, for example, CRT monitors, which produce an artificial magnetic field. An example of a magnetic tracking system is the Ascension 3 Flock of Birds (figure 1.5).
The main drawback for users is the limited tracking range; the best results are achieved near the base station. Interference from artificial magnetic fields or metals also has to be avoided.
2 www.intersense.com
3 http://www.ascension-tech.com
Other Examples
Other tracking technologies include acoustic trackers, the Global Positioning System (GPS), which is important for outdoor applications, and mechanical trackers.
1.2.3 Hybrid Tracking
All tracking technologies have their advantages and drawbacks regarding latency, frame rate, mobility and precision. Since tracking is one of the key issues for AR, these weaknesses have to be eliminated. One approach is to combine several tracking technologies so that the drawbacks of a single technology are compensated. For example, inertial trackers can only deliver relative pose information; combined with a GPS system, the resulting tracking system can deliver absolute pose information (with world coordinates as the reference frame). In such applications user movement heavily affects the 3D registration and has to be stabilized by inertial trackers [1].
As mentioned above, inertial tracking accumulates small measurement errors that cause drift. To compensate for this, vision-based tracking techniques are integrated to filter and even predict orientation estimates. An important filtering technique in this context is Kalman filtering.
Kalman Filter: Kalman filtering is a powerful mathematical tool that uses a prediction and correction loop to filter and stabilize error-prone data. If several trackers are combined, it can be used to let their measurements correct each other. Bishop and Welch provide a good introduction to the Kalman filter [65].
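A minimal one-dimensional sketch of such a prediction and correction loop (illustrative only, not code from DWARF or this thesis): a relative, drifting rate sensor is integrated in the prediction step, and an absolute but noisy vision-based angle measurement corrects the estimate. All noise parameters are made up.

```python
import numpy as np

def kalman_step(x, P, rate, z, dt, q=1e-4, r=1e-2):
    """One predict/correct cycle of a scalar Kalman filter.

    x, P   : current angle estimate and its variance
    rate   : relative angular rate (e.g. from a gyroscope)
    z      : absolute angle measurement (e.g. from vision-based tracking)
    q, r   : process and measurement noise variances (assumed values)"""
    # predict: integrate the gyro rate, inflate the uncertainty
    x_pred = x + rate * dt
    P_pred = P + q
    # correct: blend in the absolute measurement via the Kalman gain
    K = P_pred / (P_pred + r)
    x_new = x_pred + K * (z - x_pred)
    P_new = (1.0 - K) * P_pred
    return x_new, P_new

rng = np.random.default_rng(1)
true_angle, x, P = 0.5, 0.0, 1.0
for _ in range(100):
    rate = rng.normal(0.0, 0.01)            # gyro: device barely moves
    z = true_angle + rng.normal(0.0, 0.1)   # noisy vision reading
    x, P = kalman_step(x, P, rate, z, 0.01)
print(x, P)   # x converges toward the true angle 0.5, P shrinks
```

The gain K automatically weights the noisy absolute measurement against the integrated relative one, which is exactly how hybrid setups keep inertial drift in check.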
In the context of context-aware and ubiquitous computing, multiple sensors in the environment have to be combined and evaluated. In Ubiquitous Tracking these sensors can be integrated dynamically at runtime [39].
1.3 Human Computer Interaction (HCI)<br />
As mentioned above, not only is satisfying 3D registration a must in AR; suitable ways of interacting with AR systems also have to be researched and evaluated. According to [24], research in this area focuses on the following tasks:
• Design<br />
• Evaluation<br />
• Implementation<br />
of interactive computing systems for human use. During these steps, factors influencing user behavior, such as psychological aspects, are studied as well. All these aspects may vary between user groups or with a changing application domain; therefore, understanding the user is one of the key requirements when developing user interfaces. Based on these observations, suitable input and output hardware has to be selected. It is important to notice that this is a new way of thinking: for the last decades the user has always been forced to learn how to interact with a computer system, with fixed input and output devices such as keyboard, mouse and monitor. The vision of HCI is that the user does not have to adapt to the system at all; the system adapts to the user.
1.4 Goals and Outline of this Thesis
My overall vision is to bring tracking and user behavior together. In almost every AR application the user has to learn how tracking works in order to improve her or his performance: the user has to learn which actions are allowed and which ones result in unexpected feedback from the system, or even in no feedback at all. Every class of AR applications has special requirements concerning tracking, user interaction and mobility. As user interaction changes from application to application, the tracking requirements that depend on user behavior differ as well. User behavior therefore has to be studied for each specific application domain, or even for a single task.
1.4.1 Goals
This thesis focuses on a special class of AR applications: AR Table Top applications. Table top applications can be set up at a desk or table, with the user standing or sitting in front of it. All the studies have been performed with a table top application called the Magic Book, developed at the HIT Lab New Zealand.
The goal of this thesis is to evaluate whether it is possible to adjust the behavior of the vision-based tracking algorithm according to the user's movements. This is to be realized with a hybrid tracking approach combining the vision-based tracking of the Magic Book on the one hand and a gyroscope providing information about the orientation of the user interface on the other hand. The gyroscope serves as a measurement of the occurring user movements.
The following issues will be discussed in order to get a little closer to this vision:
• Properties of Table Top AR
The special properties concerning tracking and user interaction in Table Top AR will be evaluated.
• A user-study-based approach to find a mapping between the behavior of the tracking algorithm and user movement
If such a mapping can be found, the algorithm can be adjusted very easily: we can simply apply a function with the movement as input parameter and configuration settings for the vision-based tracking as output parameters. To achieve this, an approach based on a user study is introduced.
• A software architecture allowing a dynamic configuration of the tracking algorithm, based on the DWARF (Distributed Wearable Augmented Reality Framework) framework
The information given by the gyroscope has to be integrated into the tracking routine at runtime, so a software architecture that accomplishes this requirement is needed. DWARF [6] is a component-based AR framework; some of its components have to be extended in order to integrate the texture tracking version of the ARToolkit.
1.4.2 Outline of the Thesis
Here is a brief overview of the chapters of this thesis.
Chapter 2: This chapter gives a motivation for Table Top AR and introduces the Magic Book, the application on which all the research in this thesis is based. Requirements on tracking and user interaction for Table Top AR are discussed, and the fundamental techniques used in the Magic Book are explained.
Chapter 3: This chapter describes the approach to bring tracking and user behavior together and presents the idea of a user-study-based evaluation. The approach is positioned relative to other research projects and existing related work.
Chapter 4: A DWARF-based architecture is introduced that allows a dynamic configuration of the tracking algorithm.
Chapter 5: This was the main part of the work: a user study was performed with the goal of finding a dependency between the tracking algorithm and user movement. The chapter describes the design and execution of the user study.
Chapter 6: The user study is evaluated. We try to measure the correlation between the recorded tracking data of the vision-based tracker and the movement information given by the gyroscope, in order to find a mapping. I also present further ideas on how the gathered data can be evaluated and analyzed.
Chapter 7: Results are evaluated, conclusions are presented and ideas for future work are discussed.
CHAPTER 2
Table Top Augmented Reality
Table Top Augmented Reality is a specific class of AR applications. As the name says, the application is set up on a table or desk; the user stands, sits or moves in front of the table setup. This leads to special requirements on user interaction, tracking and mobility of the whole setup. As an example application, this work deals with the Magic Book, which was developed at the HIT Lab New Zealand.
The concepts and technologies described in this chapter are generally valid for all AR applications, but we will always consider them in the context of table top AR. First we give a motivation for table top AR in various application domains. The Magic Book and some other applications are introduced briefly, and the possibilities for interaction and tracking are evaluated. As we will see, vision-based tracking has serious advantages for these applications; since the Magic Book uses the ARToolkit texture tracking technology, a short introduction to the algorithm is given. Several user interfaces for graphical output are introduced, especially the HMD, a special handheld device and a tablet PC, together with a short overview of input devices.
2.1 Motivation for Table Top AR
The reason for setting up applications in a table environment is very simple: people work, read, discuss, interact and play games at tables. Table top AR applications therefore try to enhance the experience of a current task or social event with AR techniques. The horizontal setup is a defining characteristic of table top environments. We have already discussed that one important issue of AR is the alignment of virtual and real objects; in table top applications these virtual objects are displayed in the work space, the table itself. People can sit, stand or even walk around the table, and applications allow interaction with the virtual environment and even communication and interaction between the participating users themselves.
Here are examples of very different application domains for table top AR, with some examples of related work:
• Exhibitions and Education<br />
A lot of research has been done on bringing new kinds of multimedia interaction into museums and exhibitions, and AR is a new way of interactive multimedia presentation. With a graphical output device such as an HMD, a user can walk through a museum; while looking at cultural objects, audio and virtual information are augmented onto the user's perception. The user stands in front of the exhibition piece, which is placed on a desk or table. The HIT Lab New Zealand developed a special AR kiosk for applications like this (figure 2.1). The kiosk can be used for a variety of applications in education, science presentation and entertainment. [66] presents some of these applications used for educational purposes, such as the AR Volcano (figure 2.1), an interactive tutorial about volcanoes, and S.O.L.A.R. (Solar-System and Orbit Learning in Augmented Reality), where the user succeeds if he is able to arrange augmented planets around the sun in the right way.
Figure 2.1: The <strong>Augmented</strong> <strong>Reality</strong> Kiosk (left) and the AR Volcano application developed by the<br />
HIT Lab New Zealand (right)<br />
Mark Billinghurst writes about the potential of AR in education in his internet essay "New Horizons for Learning" [7].
• Gaming
Gaming is an interesting application domain for AR, and especially for table top AR. Players sit or stand around the table and see the results of their interaction augmented in their viewpoint. This increases the immersion in the game experience and therefore the fun factor [56]; immersion is a measure of the degree to which a player is affected by a virtual or augmented experience. For example, the classical PC game "Worms" was ported to an AR application. The Studierstube also developed a collaborative game in which a virtual train is steered on a real network of wooden play rails; the trains can only be seen and manipulated through a see-through PDA device [61] (see figure 2.2).
• Interactive Storytelling<br />
Figure 2.2: The Invisible Train: a Table Top AR game developed by the Studierstube, Vienna
New ways of telling stories are being explored. Just as the museum experience can be enhanced by augmented audio and virtual information, AR is also used for storytelling. Using simple authoring tools, even children are enabled to create their own content and their own virtual worlds [34]. The Magic Book is such an application.
• Collaboration
New ways of collaboration are being researched as well. Billinghurst evaluated the potential of AR applications for collaboration [8]. Michael Siggelkow developed an application for remote collaboration that can easily be set up on any desktop; in his thesis he explores in what way AR enhances the awareness of the participants in comparison to other technologies [50].
The potential of such applications to catch the attention of a broad mass of people makes them interesting for the future. At the moment AR is mainly used in industry, and the door for non-experts still has to be opened.
2.2 The Magic Book
This thesis focuses on a special application: an interactive fairy tale book. The Magic Book itself is a framework for a variety of applications based on
• The book paradigm
To change the content of the virtual scenes, a book is used. As when reading a real book, the user can turn pages, and the content is augmented on the book page; thus for every page a corresponding virtual model exists. The book is a tangible device for interacting with the application. The paradigm of tangible devices was coined by Ishii [43].
• A handheld visor
With this visor the virtual objects are augmented in the user's viewpoint. It is described in more detail in the user interaction section.
The application content we are working with deals with the fairy tale "Giant Jimmy Jones", which was written especially for this purpose. On every page another part of the story is augmented on the book, and the story continues when the user turns the page. Audio output is also supported: a storyteller explains the scenes, and a soundtrack has been composed (figure 2.3).
The user is equipped with the handheld visor, the so-called handheld device. He can look through this visor, which is a specially prepared HMD, and gaze at the 2D book pages. While standing in front of the AR kiosk he can zoom into the scene if he wants to focus on a certain detail of the 3D animated fairy tale; zooming is done by moving the handheld device closer to the book surface. It is possible to walk around and watch the scene from different points of view. The book is attached to a rotatable plate, so the scene can also be viewed from another angle by simply turning the plate, which can itself be considered a tangible interaction device.
Figure 2.3: Giant Jimmy Jones, an interactive fairy tale<br />
The different aspects concerning tracking and user interaction will be discussed in the<br />
following corresponding sections.<br />
2.3 Requirements
In order to fully understand the term table top AR, we will discuss certain properties of and requirements for AR. This gives a rationale for why certain user interaction hardware and tracking technologies evolved for the class of table top AR applications. We evaluate these requirements in the context of table top AR: table top AR shares the common properties of AR but adds further requirements.
• Alignment in realtime
The introduction already made clear that this is a key issue for AR. Alignment means that the virtual information is registered in the real environment at the exact position and with the exact orientation according to the information sensed by the underlying tracking technology; thus we need a tracker, or a combination of trackers, providing 6 DOF. If the realtime requirement were not met, the user would always perceive a lag between the actual movement and the display of the virtual information. This requirement has to be met by the tracking infrastructure.
• Usability
Usability engineering is a young discipline in software engineering that tries to answer the question of how to make a computer system usable. Because visitors of a museum or participants in a game, for example, are not experienced AR users, interaction with table top applications has to be very easy and intuitive. This requirement has to be considered while designing the user interaction: the right choice of user interface hardware and software design has to be made.
• Mobility of the setup
It should be possible to set up the system on any table without further difficulties. Exhibitions move or have to be rearranged, and a fixed setup is not suited for playing a game. This places serious requirements on the chosen tracking technology as well, because a stationary setup is not suitable.
• Price
A low price is of course always a requirement for software systems, but in order to address the public audience or art galleries on a budget with this new technology, a tracking setup costing several thousand Euros would not make sense, even if it provides better latency, accuracy and update rate. Thus an affordable user interface and tracking infrastructure is needed.
In the next section we will discuss which tracking technology and which user interfaces are best suited to meet these requirements for table top AR.
2.4 Tracking
The technology that best meets the requirements evaluated in the previous section is vision-based tracking, especially Inside-Out tracking. To recapitulate, an optical tracker consists of a camera and image recognition software: the camera grabs video images and the software searches for features to calculate the exact position and orientation at which to display the virtual object. Here is a short discussion of why this technology is best suited for table top environments:
• Perfect alignment in realtime
Vision-based trackers deliver position as well as orientation, and the measurements are accurate enough for this kind of application. If the tracking fails in a few frames this is acceptable, because the consequences are not serious, although the usability decreases. The bottleneck is the high latency: the video data first has to be transmitted from the camera into main memory, and then the computationally expensive image recognition has to detect the features. Hence the quality of the tracking depends on the update rate of the camera and the speed of the image processing software.
• Usability
A camera needs to be integrated into the user interface (Inside-Out tracking), which is an additional requirement for the UI. As mentioned above, the tracking might fail due to fast movements, changing light conditions or incorrect usage, and occlusion is a main drawback: it has to be considered that a user might occlude trackable features during usage. Suitable and easy-to-learn interaction techniques have to be applied.
• Mobility of the setup
This is one of the big advantages, because no large hardware setup is needed: a web camera, usually connected via USB or FireWire, is enough. It can be attached and detached almost without any effort.
• Price
This is definitely the killer argument for the selection of optical tracking. On the one hand, good webcams are already available for less than 100 Euros; on the other hand, free image recognition toolkits such as the ARToolkit are offered to programmers for designing the software.
Next we give a short introduction to marker-based tracking and then a deeper view into the algorithm of the texture tracking version of the ARToolkit. Later in the thesis we describe our approach to adapting parameters of this algorithm to movement information.
2.4.1 Marker-Based Tracking
A basic and freely available software package for marker-based tracking is the ARToolkit [28][27]. Although several marker-based tracking algorithms are available, we focus on the ARToolkit here because it is used in this thesis. The environment has to be prepared with square patterns, the so-called markers, which are used to calculate the relative position of the camera to them (see figure 2.4). These markers are black and white squares with a black border and a configurable pattern on the inside; the pattern can be created individually and has to be preprocessed first. Owen describes the criteria for a "good" fiducial [41]. The AR Volcano uses this vision-based tracking toolkit (see 2.1). The advantage is that the computation of the homography matrix, which describes the relation between the camera and the marker plane, is faster than the texture tracking introduced next. The big disadvantage is that it is extremely sensitive to occlusion: if the marker, or only part of it, disappears from the video image, the tracking fails. This restricts the usage, because the complete marker always has to be visible in the video stream.
The texture tracking introduced next is an extension of the ARToolkit; it still uses marker recognition for obtaining an initial position.
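The homography between the marker plane and the image, mentioned above, can be illustrated with a small Direct Linear Transform (DLT) sketch. This is generic textbook math with made-up pixel coordinates, not the ARToolkit implementation:

```python
import numpy as np

def find_homography(src, dst):
    """Estimate the 3x3 homography H with x' ~ H x from >= 4 point
    correspondences (x, y) -> (u, v) via the DLT: each correspondence
    contributes two linear constraints on the entries of H."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # the stacked entries of H form the null vector of A
    _, _, Vt = np.linalg.svd(np.asarray(A, float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]          # fix the arbitrary scale

# four corners of a unit marker square and invented pixel positions
src = [(0, 0), (1, 0), (1, 1), (0, 1)]
dst = [(10, 12), (110, 15), (115, 118), (8, 112)]
H = find_homography(src, dst)
p = H @ np.array([1.0, 0.0, 1.0])
print(p[:2] / p[2])             # maps the corner (1, 0) back to (110, 15)
```

With exactly four correspondences the system is minimal, so H reproduces all four corners exactly; the camera pose relative to the marker plane can then be decomposed from H given the camera calibration.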
Figure 2.4: Examples of an ARToolkit marker and a 2D textured plane used in the texture tracking version of the ARToolkit (note that this version still uses markers for initialization)
2.4.2 Texture Tracking of a 2D Plane
The ARToolkit was introduced as a fairly simple toolkit for writing small AR applications based on vision-based marker tracking. The texture tracking version of the ARToolkit, however, allows tracking two-dimensional textures instead of black and white markers 1 [11]. These images (figure 2.4) have to be preprocessed in order to calculate a set of feature points used by the tracking algorithm. Note that marker detection is still used for obtaining the initial position and orientation; thus the image to be preprocessed has to contain an ARToolkit marker. Once the initial pose is calculated, the algorithm continues with the tracking of point features. The texture tracking toolkit provides both the preprocessing tools and the tracking algorithms.
In this thesis we distinguish between texture tracking and natural feature tracking; note that this is our own definition of terms. Texture tracking works with preprocessed images, i.e. preprocessed textures: the environment still has to be prepared, since the textures, like markers, have to be placed on the table, the wall or the floor. Natural features, in contrast, are features that are not artificially placed in the environment, such as edges, lines or other properties. In the related work section 3.5 we discuss ideas for natural feature tracking as well. The next subsection is intended as a small tutorial on the texture tracking algorithm of the ARToolkit; the ideas for this work evolved while analyzing this algorithm. A deeper analysis of the algorithm can be found in Vial's master thesis [59].
Algorithm of the Texture Tracking ARToolkit<br />
Every single step of the algorithm is shown in figure 2.6:<br />
1. The data structures for the ARToolkit handles, for the texture tracking as well as for the<br />
marker tracking, are created and initial parameters are set. As mentioned above,<br />
the initial pose is given by the marker position in the image frame. This is done<br />
by a simple call of the marker detection method of the ARToolkit (steps 1 and 2).<br />
1 This version is not available under license yet. Please contact Hirokazu Kato for further information:<br />
kato@sys.es.osaka-u.ac.jp<br />
2. Based on the initial pose, four feature points of the preprocessed image that are visible<br />
in the current video frame are selected to update the pose. Later on the selected<br />
feature points have to be found again in the video frame. Out of all the feature point candidates,<br />
four points are selected according to the following rules; the number in<br />
brackets refers to the step in the algorithm.<br />
• FP1: this point has to be furthest away from the video frame center (5)<br />
• FP2: this point has to be furthest away from FP1 (11)<br />
• FP3: this point has to maximize the area of the triangle formed by FP1, FP2 and<br />
the new FP3 (12)<br />
• FP4: this point has to maximize the area of the quadrilateral formed by FP1, FP2, FP3<br />
and the new FP4 (12)<br />
3. Once a feature point is selected, a template is created for it. This template is used by<br />
the Normalized Cross Correlation (NCC) method to deliver a measurement of similarity<br />
between the template and an area around a pixel in the video frame. The reason for this<br />
is that it is unlikely to find a single feature pixel in the next frame again; thus whole windows<br />
are compared. Figure 2.5 shows a template for a selected feature point FP. The following<br />
NCC parameters are now calculated for the template (6):<br />
• The average pixel value: average_template<br />
• A vector with the normalized pixel values. The advantage of the normalization is<br />
that different light conditions between frames do not result in different correlation<br />
values:<br />
∀(x, y) ∈ template: vector_template(x, y) = value_pixel(x, y) − average_template (2.1)<br />
• The normalized vector length of the template:<br />
length_template = √( Σ_{(x,y) ∈ template} vector_template(x, y)² ) (2.2)<br />
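The template preprocessing above (equations 2.1 and 2.2) can be sketched in a few lines of Python. This is our own illustrative code, not the ARToolkit implementation; the function and variable names are ours:<br />

```python
# Sketch of the NCC template preprocessing (equations 2.1 and 2.2),
# assuming a grayscale template given as a 2D list of pixel values.
import math

def normalize_template(template):
    """Return (average, normalized pixel vector, vector length)."""
    pixels = [v for row in template for v in row]
    average = sum(pixels) / len(pixels)              # average_template
    vector = [v - average for v in pixels]           # equation (2.1)
    length = math.sqrt(sum(x * x for x in vector))   # equation (2.2)
    return average, vector, length

avg, vec, length = normalize_template([[10, 20], [30, 40]])
# avg = 25.0, vec = [-15.0, -5.0, 5.0, 15.0]
```

Note that subtracting the average makes the comparison robust against a uniform brightness offset between frames, as stated above.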
4. The algorithm now estimates the location of the selected feature point in the video frame based<br />
on its previous locations. A simple approach is used here.<br />
Three different estimation methods are provided. These methods take previous pose<br />
calculations at different levels into account. For each frame every method is evaluated<br />
to find the feature point and the best result is taken. Every method represents a<br />
different movement model of the camera.<br />
• The first method assumes that no movement occurs between two frames. Using<br />
this assumption the algorithm simply takes the position p_i^{k−1} of FP_i in the last<br />
frame and searches for the position p_i^k of FP_i in the new frame within a certain<br />
Figure 2.5: Texture Tracking Template: Not only single feature points but whole areas are compared<br />
search window size. This assumption, however, is not really realistic because it<br />
does not consider movements at all. Movements definitely occur between<br />
two frames and cause displacements (7).<br />
p_i^k = p_i^{k−1} (2.3)<br />
• The second method takes the last two tracking frames into account. Here it is<br />
assumed that the displacement between two frames is constant:<br />
v = p_i^{k−1} − p_i^{k−2} (2.4)<br />
Now the position of the feature point in the current frame can be calculated. The<br />
equation<br />
p_i^k = p_i^{k−1} + v (2.5)<br />
results in<br />
p_i^k = 2 p_i^{k−1} − p_i^{k−2}. (2.6)<br />
• The third method uses the last three positions of the feature point in a similar way:<br />
p_i^k = 3 p_i^{k−1} − 3 p_i^{k−2} + p_i^{k−3} (2.7)<br />
A common and more sophisticated method to predict the position of a feature in the<br />
next frame is a Kalman filter which is not applied here. Examples will be shown in the<br />
related work.<br />
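The three movement models can be written as small Python predictors. This is an illustrative sketch under our own naming, not the ARToolkit code; positions are 2D tuples and `history[-1]` is the most recent position p^{k−1}:<br />

```python
# Sketch of the three feature point predictors (equations 2.3, 2.6, 2.7).
def predict_static(history):
    """No movement: p^k = p^{k-1} (2.3)."""
    return history[-1]

def predict_constant_velocity(history):
    """Constant displacement: p^k = 2 p^{k-1} - p^{k-2} (2.6)."""
    (x1, y1), (x2, y2) = history[-1], history[-2]
    return (2 * x1 - x2, 2 * y1 - y2)

def predict_constant_acceleration(history):
    """p^k = 3 p^{k-1} - 3 p^{k-2} + p^{k-3} (2.7)."""
    (x1, y1), (x2, y2), (x3, y3) = history[-1], history[-2], history[-3]
    return (3 * x1 - 3 * x2 + x3, 3 * y1 - 3 * y2 + y3)

history = [(0, 0), (2, 1), (4, 2)]  # feature moving with constant velocity
print(predict_constant_velocity(history))  # -> (6, 3)
```

For each frame all three predictions are evaluated, and the one whose surroundings yield the best correlation match is kept.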
5. Around the estimated position of the tracked feature the algorithm searches for the<br />
best match to update the real position of the feature point. This is done for all three<br />
position estimates described in the previous step. Within a fixed search window the correlation<br />
value between the template and the area around every pixel is calculated. Figure<br />
2.7 shows the parameters used for the calculation of the NCC value.<br />
Figure 2.6: The ARToolkit Texture Tracking Algorithm<br />
Figure 2.7: Template Matching<br />
To obtain the correlation value, which gives a measurement of similarity between the<br />
template and the pixel area at position (i, j), several calculations are made for<br />
every pixel within the search area.<br />
• Similar to the creation of the template, every pixel within the pixel area is normalized<br />
by subtracting the average pixel value. Note that the size of the pixel area is<br />
exactly the template size.<br />
∀(x, y) ∈ pixelarea: vector_pixelarea(x, y) = value_pixel(x, y) − average_pixelarea (2.8)<br />
• Also the length of this vector is calculated:<br />
length_pixelarea = √( Σ_{(x,y) ∈ pixelarea} vector_pixelarea(x, y)² ) (2.9)<br />
• To calculate the correlation between the pixel area and the template, every value in<br />
vector_pixelarea is multiplied with the corresponding value (at the same position)<br />
in vector_template:<br />
corr = Σ_{i=0}^{templateSize² − 1} vector_pixelarea(i) · vector_template(i) (2.10)<br />
• Finally the similarity can be calculated:<br />
sim = corr / (length_pixelarea · length_template) (2.11)<br />
• The result is a measurement of the similarity between the area around pixel (i, j) and<br />
the template. It ranges from −1, indicating no similarity at all, to 1,<br />
indicating a high correlation. (Later on we will use another correlation<br />
method as a statistical analysis tool for the evaluation of the user<br />
study.) This similarity must be calculated for every pixel within the search area.<br />
The pixel with the highest similarity value is most likely to be the feature point in<br />
the current frame.<br />
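Steps (2.8) to (2.11) together form an exhaustive NCC search over the window. The following pure-Python sketch illustrates the idea; it is unoptimized and uses our own names, not the ARToolkit API:<br />

```python
# Sketch of the NCC search (equations 2.8-2.11): slide a template-sized
# window over the search area and keep the position with the highest sim.
import math

def ncc(window, t_vec, t_len):
    """Similarity between one pixel area and the preprocessed template."""
    pixels = [v for row in window for v in row]
    avg = sum(pixels) / len(pixels)
    vec = [v - avg for v in pixels]                  # (2.8)
    length = math.sqrt(sum(x * x for x in vec))      # (2.9)
    if length == 0 or t_len == 0:
        return 0.0
    corr = sum(a * b for a, b in zip(vec, t_vec))    # (2.10)
    return corr / (length * t_len)                   # (2.11)

def best_match(frame, template, top, left, search):
    """Exhaustive search of the best template position in the window."""
    t_pix = [v for row in template for v in row]
    t_avg = sum(t_pix) / len(t_pix)
    t_vec = [v - t_avg for v in t_pix]
    t_len = math.sqrt(sum(x * x for x in t_vec))
    h, w = len(template), len(template[0])
    best, best_pos = -2.0, None
    for i in range(top, top + search):
        for j in range(left, left + search):
            window = [row[j:j + w] for row in frame[i:i + h]]
            sim = ncc(window, t_vec, t_len)
            if sim > best:
                best, best_pos = sim, (i, j)
    return best_pos, best

frame = [[0] * 6 for _ in range(6)]
frame[2][3], frame[2][4] = 10, 20
frame[3][3], frame[3][4] = 30, 40
pos, sim = best_match(frame, [[10, 20], [30, 40]], top=0, left=0, search=4)
# pos == (2, 3), sim close to 1.0 at the exact location of the pattern
```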
6. Since four feature points have to be tracked, these steps have to be performed<br />
four times to obtain the locations of the four feature points in the current frame. Note<br />
that we use the three different methods described above to estimate the<br />
positions of the feature points.<br />
7. Now for every method the position of the 2D plane is calculated and the one producing<br />
the smallest tracking error is taken. Finally the algorithm provides the correct position<br />
of the textured plane in the camera viewpoint.<br />
Complexity of the template matching algorithm<br />
Considering the single steps described, there are two main drivers for the complexity and the<br />
robustness of the texture tracking algorithm. For each of the O(searchSize²) pixels within<br />
the search window, an area of O(templateSize²) pixels around the pixel<br />
is compared with the template of the current feature point. This leads to a complexity of<br />
O(searchSize² · templateSize²) (2.12)<br />
• Search Size<br />
Due to movements of the camera, the estimation of the feature point position with one<br />
of the three methods is not exact, and therefore several points around the estimated<br />
feature point are considered. These points lie within a certain search window. A<br />
large search window provides a higher probability of finding the point with the highest<br />
correlation value. But it also means that the correlation has to be calculated for every<br />
point within the search window, which increases the computation time. On the other<br />
hand, a small search size will speed up the computation, but the tracking robustness<br />
will decrease.<br />
This parameter is set to a constant value during runtime. According to Billinghurst the<br />
minimal error results with a search area of 48² pixels [11]. Thus the algorithm<br />
is configured with a constant search window length of 48. Here the main idea of this<br />
thesis evolves: does the search window size parameter have to be constant? Is it<br />
possible to adjust this parameter during runtime with movement information?<br />
• Template Size<br />
If the template size is large, the algorithm will provide a higher-quality correlation<br />
value. In this thesis the template size is considered constant during runtime.<br />
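The quadratic cost of equation (2.12) is easy to make concrete. The template size of 16 below is an assumed illustrative value; only the search size of 48 comes from the text:<br />

```python
# Back-of-the-envelope cost of equation (2.12): window comparisons per
# feature point grow quadratically with the search size.
def comparisons(search_size, template_size=16):
    return search_size ** 2 * template_size ** 2

print(comparisons(48))  # 48^2 * 16^2 = 589824 pixel operations
print(comparisons(24))  # halving the search size cuts the cost by 4x
```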
2.4.3 Tracking in the Magic Book<br />
The tracking technology used in the Magic Book ”Giant Jimmy Jones” is the texture tracking<br />
of a 2D plane. This has several advantages over previous marker-tracking-based applications:<br />
• The marker does not have to be in the view<br />
As we said, occlusion is a serious problem with marker-based tracking. But as the algorithm<br />
uses certain features of the image, the marker is only needed for initialization.<br />
Of course, if the tracking fails, the marker has to be in the view of the user again. But<br />
after initialization the user can move in a way that the marker disappears from the video<br />
frame.<br />
• It is possible to zoom in and zoom out<br />
The user is now enabled to zoom into a scene. If the user gets closer the tracking still<br />
works; it just selects feature points that are better suited for the current tracking frame.<br />
In contrast, with marker-based tracking zooming in will fail because it is likely that the<br />
marker is not completely in the video image.<br />
• Any preprocessed image can be used<br />
Any image can be used, with the restriction that there still has to be a small marker in<br />
it. No more artificial markers have to be placed in the environment. The user can also<br />
use the Magic Book as a simple textbook, because the pages consist of colorful images<br />
and not of markers anymore.<br />
2.5 User Interaction<br />
In user interaction we can distinguish between interfaces providing feedback to the user, mainly<br />
graphical output hardware, and interfaces enabling the user to provide input to the system.<br />
2.5.1 Graphical Output Hardware<br />
Now we will have a look at the different possibilities for graphical output user interfaces.<br />
Again, the right choice of a suitable user interface is an important issue for table top AR.<br />
Desktop PC<br />
Since some users are not familiar with new user interaction methods,<br />
the common desktop PC can still be used. This is more suited for Outside-In tracking. A<br />
camera is installed at a fixed location delivering images of the table top workspace. On a<br />
monitor the users can see the feedback to their actions, like the rearrangement of markers for<br />
example. Graphical output through a monitor is not suited for Inside-Out tracking because<br />
the user moving the camera is probably not able to see visual feedback at the same moment.<br />
Anyway, this is not the kind of user interaction we want because, as discussed, Inside-Out<br />
tracking is more suitable.<br />
Head Mounted Display<br />
In a classical AR environment a user is equipped with an HMD. It can be compared to data<br />
glasses and is attached to the user's head. The first advantage is that the user is still able<br />
to use his hands. As the target is moving steadily, this device is used for Inside-Out tracking<br />
and is thus suited for our purposes. There are two different kinds of this graphical output<br />
device:<br />
Optical see-through. An optical see-through HMD is based on a semi-transparent mirror<br />
making it possible to look through the display. The virtual objects are augmented in<br />
the mirror as well.<br />
Video see-through. A video see-through HMD shows a video stream captured by a camera<br />
attached to the HMD, augmented with the virtual information. Note that with this<br />
method the setup has to be calibrated in a way that the camera represents the user's<br />
view.<br />
Delay of the vision-based tracking is always a problem with graphical output. With optical<br />
see-through displays the virtual world, and with video see-through the real world, will lag<br />
behind [45]. Thus it is also a research issue to predict the head motion of a user to compensate<br />
for the tracking delay (see the related work chapter 3.5).<br />
Hand-Held device<br />
This is a variation of the HMD developed at the HIT Lab. The HMD is mounted on an iron stick<br />
in order to use it like lenses or a visor (see figure 2.1). The user holds it in front of his eyes<br />
and it has the same effect as a usual HMD. A video see-through device is used, and therefore<br />
a camera is attached to the handheld such that it exactly matches the viewpoint of the user.<br />
The disadvantage is that the user has to use one hand to steer the device. The rationale behind<br />
this device is that during exhibitions a lot of people want to use the application. It is a lot<br />
easier to hand over the device than to adjust the HMD for the next user. Besides, the<br />
HMD might get damaged after a certain period. For the HMD a Sony Glasstron video<br />
see-through HMD 2 and a Logitec Quick Camera 4000 3 are used.<br />
Tablet PC, PDA, Mobile Phone<br />
The idea of this interaction device is that the user holds a video see-through display, usually a<br />
small and flat computer, in his hands. A camera is attached to the computer, producing the<br />
see-through effect. An example is a tablet PC. It is possible to rotate the display on top of<br />
the computer and use it as a tablet. As computers are becoming smaller and smaller and<br />
processing speed increases (Moore's Law [35]), table top applications could also be ported<br />
to devices like a small mobile phone or a Personal Digital Assistant (PDA) handheld. The<br />
2 www.sony.com<br />
3 www.logitec.com<br />
Studierstube uses a PDA as a user interface for the Invisible Train application (see figure<br />
2.2). Efforts are also being made to port the ARToolkit to mobile phones, enabling AR<br />
applications even on these devices.<br />
Projectors<br />
If a lot of people are interacting, everyone has to be equipped with the necessary devices. An<br />
alternative is to use a projector to display the virtual information on the table. Thus users<br />
can interact with interaction devices, like markers or even tangible user interfaces, and see<br />
the immediate result in the projection. An example for this is the Sheep application, which<br />
is a sheepherding game allowing multimodal interaction [47]. This game is also based on<br />
the DWARF framework we will describe later. Another application<br />
applying projectors is an intelligent kitchen [12]. With projectors, virtual information is augmented<br />
onto a kitchen, helping the user to prepare a dinner, for example.<br />
2.5.2 Input Interfaces<br />
An important issue is how a user can manipulate virtual objects. Thus additional user interfaces<br />
for collecting user input have to be provided. Traditional interfaces like a mouse<br />
or keyboard are still used, but as we have discussed earlier, new and more suitable interfaces<br />
have to be found. Billinghurst proposes to build tangible user interfaces based on marker<br />
tracking as well [10]. Marker-based optical tracking is used in his work. Markers are attached<br />
to real objects, allowing users to interact with them. Moving, rotating and occluding<br />
the tangible objects results in feedback by the application. User input can also be provided by<br />
special components like a glove [58]. Again markers are attached to the glove itself and an<br />
optical tracking routine calculates the position and orientation of the hand. This technology<br />
even makes it possible to interact with virtual objects by ”touching” them. This leads to<br />
the question of how to provide feedback caused by a collision of real and virtual objects. One<br />
issue in this kind of research is force-feedback devices, which provide mechanical feedback<br />
to the user. One example is joysticks that adapt to a situation in a computer game and give<br />
feedback by making it harder to move in a certain direction. Another example of a force-feedback<br />
device is the Phantom by Sensable 4, which is a mechanical 6 DOF input device (see<br />
figure 2.8).<br />
2.6 Summary<br />
In this chapter we have discussed the context of table top AR. This class of applications<br />
has special requirements on tracking and user interaction. Applications for exhibitions and<br />
education, being applied not only for research purposes, have to be cheap, easy to install,<br />
usable and high-performing concerning the quality of tracking.<br />
All further evaluations will be based on the Magic Book application. Here is a short summary of<br />
the key properties of the Magic Book.<br />
• Tracking<br />
4 http://www.sensable.com<br />
Figure 2.8: The force-feedback Phantom device by Sensable on the left and a specially prepared glove<br />
for user interaction (Studierstube) on the right<br />
The vision-based tracking technique is based on texture tracking of a 2D plane.<br />
Preprocessed images can be used to calculate the viewpoint of the user.<br />
• User Interaction<br />
The user interface chosen is a hand-held device, an HMD attached to an iron stick.<br />
Because even children are familiar with the book paradigm, the application is based on<br />
a tangible book. Just by turning a page the content changes. The scene can be rotated<br />
just by rotating the plate the book is lying on.<br />
As described, movement information is not considered in the tracking routine. We have<br />
seen that the search window size is a fixed parameter in the texture tracking routine. If we<br />
can apply movement information to change this parameter during runtime, we can achieve<br />
better tracking results in terms of computation speed and robustness. The next chapter will<br />
introduce this idea to improve the tracking in the Magic Book application.<br />
CHAPTER 3<br />
A Hybrid Tracking Approach<br />
We have now discussed the basic requirements for table top AR. The fundamentals of the<br />
underlying technologies used by the Magic Book have been introduced. Now we have to<br />
bring the texture tracking technology and the user behavior together. We have seen that<br />
feature points have to be ”found” again in the next video frame and that the size of the<br />
search window is an important configuration parameter of the tracking routine. Again, with a<br />
large search window the robustness of the tracking will increase on the one hand. But on<br />
the other hand the computation time of the tracking routine will rise. This leads to a lower<br />
update rate.<br />
If we can establish a relationship between the search window and the occurring movements<br />
of the handheld device used in the Magic Book, we can configure the texture tracking<br />
algorithm during runtime. Thus a hybrid tracking approach combining vision-based tracking<br />
and inertial tracking is introduced.<br />
3.1 Motivation<br />
In the previous chapter we had a look at the complexity of the template matching algorithm<br />
used by the texture tracking:<br />
O(searchSize² · templateSize²) (3.1)<br />
We will consider the template size as a fixed constant of the tracking routine. In former<br />
considerations the search size parameter was constant during runtime. If we<br />
lower this value, the computation of the pose information will speed up<br />
quadratically. If we could get the information that almost no movement of the camera<br />
has happened between two tracking frames, the estimation that the feature point will be at<br />
the same position in the next frame would be almost correct. There is no need for a large<br />
window if we have a measurement showing almost no change in position or orientation. In contrast, if it is<br />
possible to derive the information that movement occurred, we can adjust the search window<br />
to a large size. This would of course increase the computation time, but it is more likely that<br />
the feature point will be found in the next frame again, because more potential points within<br />
the search area are considered.<br />
Our approach is to use additional information about the movement of the camera to alter<br />
the search size parameter during runtime according to a simple rule:<br />
movement ↓ ⇒ search size ↓<br />
movement ↑ ⇒ search size ↑<br />
Thus the first step is to evaluate the relationship between movements and adequate search window sizes.<br />
If possible, we want to derive a linear mapping.<br />
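A runtime rule of this kind could look like the following sketch. The slope, offset, and bounds are hypothetical placeholders; the actual mapping is to be derived from the user study described later:<br />

```python
# Sketch of "more movement -> larger search window" as a clamped linear
# mapping. All numeric parameters are assumed illustrative values.
def search_window_size(delta_orientation_deg, slope=8.0, offset=8.0,
                       lower=8, upper=48):
    size = offset + slope * abs(delta_orientation_deg)
    return int(min(max(size, lower), upper))

print(search_window_size(0.0))   # no movement -> minimal window (8)
print(search_window_size(10.0))  # large rotation -> clamped to 48
```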
3.2 An Inertial - Optical Tracker based Runtime Setup<br />
A requirement for getting movement information of the camera is that we have to track<br />
the handheld device. So when we talk of the movement of the handheld device, the movement<br />
of the camera is meant, because the camera is mounted on the device. For tracking the<br />
handheld device movements, another tracking technology with the following requirements<br />
needs to be integrated.<br />
• Integration in the User Interface<br />
To obtain the pose measurements of the handheld device a small tracking device has<br />
to be integrated in the user interface. It has to be assembled in a way that it does not<br />
influence the user behavior at all.<br />
• Update Rate<br />
In order to estimate a realistic search window size for the next tracking frame of the<br />
optical tracker, the update rate of the movement tracker has to be higher than the<br />
frame rate of the vision-based tracking system.<br />
• Measurement state space<br />
A criterion for the measurement state space is the number of degrees of freedom described<br />
in the tracking introduction. If we used a 6 DOF tracker with higher<br />
update rates than the optical tracking, there would be no need for the optical tracking at<br />
all; it could be replaced by the other tracking system if accurate enough.<br />
Therefore we will use a tracker providing 3 DOF relative orientation. No position is<br />
delivered by such a tracker. It still has to be evaluated whether a tracker with these properties<br />
is sufficient for our purposes.<br />
• Price<br />
Of course, price is also a key requirement. If this technology is to be integrated into a<br />
mobile phone, for example, an expensive tracking device will not be considered.<br />
For our purpose an inertial tracker, a gyroscope, is suited best. When we introduced inertial<br />
trackers we also discussed that small measurement errors accumulate and cause drift<br />
after a short period. A gyroscope provides relative orientation measurements. But as we are interested<br />
only in the relative change of orientation between two tracking frames, drift does not<br />
affect our measurements.<br />
But a big question is whether the relative measurement of orientation is suited for the configuration<br />
of the search window size. To see if this is possible, we first have to discover a relationship<br />
between the movement measurements and the feature point tracking routine. If we are able<br />
to find such a relationship, we can integrate the mapping in our software design.<br />
3.3 Configuration of the setup<br />
As described, the first question is whether there is a relationship at all; the second question is to find<br />
a mapping between change in orientation and search window size:<br />
f_texturetracking(∆orientation) = search window size (3.2)<br />
This mapping has to be integrated into the software of the hybrid tracking setup. Thus<br />
one requirement for the software design is to allow a dynamic configuration of the texture<br />
tracking routine. For every tracking frame the proper value for the search window size has to<br />
be set according to the mapping. This leads to the question of how to determine this relationship<br />
and how to evaluate whether this approach is possible at all.<br />
The idea is to conduct a user study. The study should give hints on how people actually use<br />
the Magic Book. While performing the study we want to retrieve data about the movement<br />
of the handheld device on the one hand, but we also want to have a deeper look at certain<br />
properties of a feature point on the other hand. Both data sets have to be explored, and the degree<br />
of correlation has to be measured. Correlation is a measurement of the degree to which two data sets<br />
are related.<br />
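A standard way to quantify such a relationship is the Pearson correlation coefficient, which, like the NCC similarity above, ranges from −1 to 1. A minimal sketch (our own helper, names illustrative):<br />

```python
# Sketch of measuring how strongly two recorded data sets are related,
# using the Pearson correlation coefficient (range -1..1).
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# e.g. change in orientation vs. feature point displacement per frame
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # -> 1.0 (perfectly related)
```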
3.4 Motivation for a User Study<br />
As discussed we have to record data of the movement of the handheld device and try to<br />
relate this data to properties of feature point tracking, which also have to be logged.<br />
• Recording the pose information of the handheld device<br />
In order to record the absolute full 6 DOF pose we have to get the absolute position p_Handheld<br />
as well as the absolute orientation q_Handheld. To record the pose information, a 6 DOF<br />
magnetic tracker will be used. You might ask why we also consider the position,<br />
since for the runtime setup only the orientation is needed. The magnetic<br />
tracker is used for the analysis of movement: if we recognize that people mainly use<br />
the Magic Book by changing the position of the handheld device, our idea does not<br />
work. Please note that the 6 DOF tracker is only for the purpose of the user study, not<br />
for the runtime environment. The runtime setup still consists of the handheld device<br />
and the gyroscope.<br />
Figure 3.1: The 2D coordinates of a feature point are tracked over several frames. The feature point<br />
moves through the 2D video plane<br />
• Recording 2D coordinates of feature points in the video image<br />
One possibility to relate the obtained pose information to feature point tracking is to<br />
record the 2D video frame coordinates of the feature points in every frame. If a feature<br />
point is tracked over a period of several frames, this yields a change in its 2D position,<br />
∆p_FP. This change in position can be annotated with the<br />
corresponding change in orientation given by the magnetic tracker. Deriving these<br />
”chains”, where the same feature point is tracked over several frames, depends on<br />
the selection of the best suited feature points by the algorithm. In figure 3.1 a feature<br />
point is tracked for several frames and ”moves” through the 2D video plane. This<br />
movement is obviously caused by camera movement.<br />
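Annotating such a chain can be sketched as pairing each per-frame displacement with the orientation change reported by the tracker. The data layout (per-frame positions and a single angle per frame) is a hypothetical simplification for illustration:<br />

```python
# Sketch of annotating a feature point "chain": for every pair of
# consecutive frames, pair the 2D displacement |delta p_FP| with the
# orientation change |delta orientation| from the magnetic tracker.
def annotate_chain(fp_positions, orientations):
    """fp_positions: per-frame (x, y); orientations: per-frame angle (deg)."""
    samples = []
    for k in range(1, len(fp_positions)):
        (x0, y0), (x1, y1) = fp_positions[k - 1], fp_positions[k]
        dp = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
        dq = abs(orientations[k] - orientations[k - 1])
        samples.append((dq, dp))
    return samples

print(annotate_chain([(0, 0), (3, 4), (3, 4)], [0.0, 2.0, 2.0]))
# -> [(2.0, 5.0), (0.0, 0.0)]
```

The resulting (∆orientation, ∆position) samples are exactly what the correlation analysis of section 3.3 operates on.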
3.5 Related Work<br />
This section gives an overview of the current research related to this thesis. The main issue<br />
of this work is to characterize the movement of the user interface to improve the computation time<br />
and the robustness of the system. But first we want to have a look at the current algorithms<br />
for natural feature tracking.<br />
3.5.1 Natural Feature Tracking<br />
According to [55] an image sequence can be represented as a function of three variables<br />
I(x, y, t), with x and y discrete spatial variables and t a discrete variable for time. As<br />
patterns move from frame to frame in an image stream, I satisfies the following equation:<br />
I(x, y, t + τ) = I(x − ξ, y − η, t) (3.3)<br />
In other words, this equation says that we can take a picture of a scene at a later point in time<br />
and obtain the image by moving every point p = (x, y) by a displacement d = (ξ, η).<br />
If we want to track certain features of a scene over several frames, algorithms have to cope with<br />
this displacement. In our case camera movement occurs and I(x, y, t_{i+1}) ≠ I(x, y, t_i).<br />
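Equation (3.3) is easy to demonstrate for a purely translational displacement. This small sketch (our own, with images as 2D lists and border pixels filled with 0) shifts a frame by d = (ξ, η):<br />

```python
# Equation (3.3) in code: the frame at time t+tau equals the frame at
# time t with every point moved by the displacement d = (xi, eta).
def shift(image, xi, eta, fill=0):
    """Return I' with I'(x, y) = I(x - xi, y - eta); out-of-frame -> fill."""
    h, w = len(image), len(image[0])
    return [[image[y - eta][x - xi] if 0 <= y - eta < h and 0 <= x - xi < w
             else fill
             for x in range(w)] for y in range(h)]

frame_t = [[1, 2, 0],
           [3, 4, 0],
           [0, 0, 0]]
frame_t1 = shift(frame_t, 1, 1)  # camera-induced displacement d = (1, 1)
# frame_t1 == [[0, 0, 0], [0, 1, 2], [0, 3, 4]]
```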
In his ”State of the Art Report of Natural Feature Tracking”, Vial [60] gives an overview<br />
of principal features that can be found in an image (table 3.1). He also provides an overview of<br />
common methods to extract the features. Generally we can distinguish between model-based<br />
and move-matching methods [51] for feature tracking. Model-based tracking requires<br />
a model definition of the object to be tracked. Marker detection methods like the ARToolkit,<br />
but also the texture tracking of the ARToolkit, can be categorized as model-based. The reason<br />
is that all the images used with the toolkit have to be preprocessed first. During<br />
runtime the preprocessed data is applied. Another possibility for a model-based approach<br />
is to consider a CAD model of the environment in the tracking routine [29]. Thus such<br />
methods are not suited for unprepared environments. In contrast, move-matching methods<br />
estimate a correspondence of 2D image movements to 3D position and orientation without<br />
any underlying model.<br />
0D: Corners, Points<br />
1D: Contours, Edges, Chains, Lines, Circles, Ellipses<br />
2D: Uniform Regions, Textured Areas, Surface Patches<br />
Motion: Regions with similar motion<br />
Table 3.1: Overview of features in an image [60]<br />
Thus the first step is to select ”good features in regions with rich enough texture” [55] and<br />
then apply tracking techniques to find the corresponding point p_{i+1} = (x − ξ, y − η) of point<br />
p_i = (x, y) in the following frame. This is often applied in closed-loop architectures [51]:<br />
1. Detect N interest points in frame i + 1, resulting in the set {x_j^{i+1}}_{j=1}^{N}<br />
2. Match interest points from frame i to i + 1 and find the correspondences x_j^i ↔ x_k^{i+1}<br />
3. Use these correspondences to compute the pose<br />
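The correspondence step of such a closed loop can be illustrated with a toy nearest-neighbor matcher. The real systems in [51] use descriptors and pose estimation; this sketch of ours only shows the correspondence search between two frames:<br />

```python
# Toy closed-loop correspondence step: match each interest point from
# frame i to its nearest detected point in frame i+1, if close enough.
def match_points(points_i, points_i1, max_dist=10.0):
    matches = []
    for p in points_i:
        best, best_d = None, max_dist
        for q in points_i1:
            d = ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
            if d < best_d:
                best, best_d = q, d
        if best is not None:
            matches.append((p, best))
    return matches

print(match_points([(0, 0), (50, 50)], [(1, 1), (52, 51)]))
# -> [((0, 0), (1, 1)), ((50, 50), (52, 51))]
```

The `max_dist` threshold plays the role of the search window: it encodes the assumption that inter-frame displacements are small.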
The problem is to find the correspondences of features between tracking frames. And<br />
that is where our approach makes sense. For tracking feature points, Lucas and Kanade<br />
[33] proposed the measurement of similarity between fixed search windows of two consecutive<br />
frames. This is based on the assumption that the displacements d from frame to frame<br />
are small. The correlation of the windows is defined as the sum of squared intensity differences.<br />
This method is applied by the texture tracking ARToolkit. Every algorithm using this<br />
method can be extended with our hybrid tracking approach.<br />
Neumann and You use a similar closed-loop approach. They use the concept of optical flow,<br />
which observes the motion of the image pixels as a whole. They combine region tracking and<br />
feature point tracking [38][37]. First, regions with similar movements are extracted, and the<br />
tracking is refined by 0D point tracking. Region motion tracking is based on optical flow<br />
and relies on the spatial-temporal gradients of an image. Using region tracking, a movement<br />
model is derived. Because we know where a feature is located within a region, the<br />
region tracking can be refined by matching the corresponding points, applying correlation<br />
methods as well. This work also proposes a verification and evaluation mechanism. For<br />
3 A Hybrid Tracking Approach<br />
every estimation the confidence is assessed; if the confidence is poor, the result is refined. This approach also allows larger movements: generally, region tracking permits larger camera movements, while point feature tracking by itself is only suited for small displacements. In contrast, our approach assumes small inter-frame displacements, which results in a rather simple movement model. It still has to be evaluated whether our idea can also be applied here for the refinement of the tracking.
All these algorithms face the same displacement problem of finding the tracked point of interest again in the next frame. In summary, our idea is suited for all feature tracking algorithms that assume small displacements between frames. To find corresponding feature points, windows are compared, not individual pixels; these windows correspond to the templates of the texture tracking. Our approach tries to reduce the number of comparisons of these windows by applying orientation information: with this information the search window is adjusted.
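A rough geometric motivation for this adjustment, under an assumed pinhole camera model with a known focal length in pixels (the concrete numbers are illustrative, not measured values from our setup): a pure camera rotation of Δθ moves a feature near the principal point by roughly f·tan(Δθ) pixels, which bounds the search window the orientation information has to cover.

```cpp
#include <cmath>

// For a pinhole camera rotating by deltaDeg about an axis through the
// projection center, a feature near the principal point moves by roughly
// f * tan(delta) pixels. The focal length (in pixels) is an assumed value.
double predictedDisplacementPx(double deltaDeg, double focalPx) {
    const double kPi = 3.14159265358979323846;
    return focalPx * std::tan(deltaDeg * kPi / 180.0);
}

// Search window half-size: predicted displacement plus a safety margin.
int searchWindowHalfSize(double deltaDeg, double focalPx, int marginPx = 3) {
    double shift = std::fabs(predictedDisplacementPx(deltaDeg, focalPx));
    return static_cast<int>(std::ceil(shift)) + marginPx;
}
```

For a focal length of 500 pixels, a one-degree rotation between frames already predicts a shift of almost nine pixels; how well such a simple model matches real handheld motion is what the user study has to show.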
3.5.2 Hybrid Tracking<br />
As stated above, hybrid tracking combines different tracking technologies to compensate for the drawbacks of a single tracker. We will focus on related work combining vision-based and inertial tracking. As discussed, the drawbacks of vision-based tracking are the occlusion of tracked features and computationally expensive algorithms causing additional delay. An inertial tracker accumulates small measurement errors that cause drift [46].
Azuma motivates the usage of hybrid tracking systems for unprepared environments [4]. In prepared environments the user or developer is in control of the tracked objects; a user can place fiducial markers on a table, for example. This is more difficult in outdoor AR applications: light conditions change, and visual landmarks used for feature tracking may be occluded. Integrating a gyroscope provides a good estimate of orientation and a reasonable guess to reduce the search space of the optical tracking algorithm. This idea is picked up in our work. He uses the setup in the following way: if the user stops moving, the video tracking locks on traceable features and the accumulated drift of the inertial tracker is corrected.
He distinguishes two methods for the fusion of inputs from inertial and optical tracker:<br />
1. Use the gyroscope orientation as an estimate for the orientation of the vision tracker. This compensates inaccurate measurements of the vision-based tracker, but the inertial tracker will drift and cause wrong results after a certain period.

2. Use the vision-based tracker to compensate drift. Every frame of the vision-based tracker corrects the measurements of the inertial tracker. Thus drift does not occur, but inaccurate measurements of the vision-based tracker will be propagated.
Handling unprepared environments is an important issue for outdoor applications, as it is not realistic to place fiducial markers or preprocessed textures in an outdoor environment. Our environment is prepared, so we can expect accurate measurements from the optical tracker. Drift does not affect our setup either, because we only use relative measurements of the gyroscope tracker. In [68] the vision-based algorithm introduced earlier [37] is applied for
a hybrid setup. The frame-to-frame prediction of camera orientation by the inertial tracker and the correction of its accumulated drift exploit the nature of both trackers: the inertial tracker predicts the motion of image features, and the estimated positions are refined by searching for local matches of the feature points. This work also addresses the importance of calibration issues of the setup.
In an indoor, mobile path finding setup by the Studierstube group, a user is guided through an unfamiliar building to a destination room [26]. This application also combines marker and inertial tracking. A camera mounted on a helmet worn by the user grabs video images containing square markers attached to walls. Additionally, an inertial tracker attached to the HMD provides head orientation. This setup tries to compensate the low update rate of vision-based tracking with a gyroscope: in between measurements from the optical tracker, the gyroscope gives the user's viewing direction. The drift drawback of the gyroscope is corrected with the orientation given by the next frame of the optical tracker:
q_hybridview = q_correction ◦ q_inertial with q_correction = q_vision ◦ q*_inertial
This method has been adopted from [40], where an active ultrasonic tracking system called the Bat system is combined with an inertial tracker. In contrast to Azuma's work, these are indoor setups in a prepared environment, so the vision-based tracking is very reliable.
In chapter 1 we have already introduced a method to predict and correct measurements: the Kalman filter. Figure 3.2 shows the basic mechanism of the filter loop. This filter has several benefits. On the one hand, it can predict measurements even if actual measurements are not yet available due to the low update rate of a tracking system; the prediction estimates can be used prior to the availability of the actual measurements (state estimation). In the correction phase the parameters for the prediction are recalculated (state update). Although prediction is not necessarily a hybrid tracking issue, sensor fusion approaches can be used to predict new measurements. Klein and Drummond propose a filter architecture for a model-based hybrid tracking approach [29]. Model-based in this case means that a CAD model of the tracked environment is available. Again the idea is that the prediction of the new camera pose is estimated by an inertial tracker; with this information the visual tracking system is able to start in the right place. With the results of the visual tracking frame, a new system state is calculated, which is used for further prediction and for the correction of the accumulated tracking error of the inertial tracker. Again, a major focus of this work is the
issue of calibration of the used trackers. Neumann's and You's design of the filter [67] even allows a failure of the vision-based tracker, due to occlusion for example, while still updating the current state of the system. This work focuses on fiducial marker tracking in the vision-based tracking routine. The predicted pose can be utilized in this approach as well; independent correction channels are provided for the vision-based and the gyroscope measurements. For his outdoor augmented reality system, Azuma fuses the output of gyroscopes and of a compass [1]. Thus the user's head movement can be predicted and the noise in the compass measurements can be filtered. His evaluation showed that compass noise makes outdoor registration tasks hard; by applying a filter using additional gyroscopes, the system was stabilized.
We are combining gyroscopes and vision-based tracking, which is a hybrid tracking approach, but we do not fuse the outputs together. We only apply the relative orientation given
Figure 3.2: The prediction and correction loop of the Kalman filter<br />
by the gyroscopes. Thus a difficult calibration step is not necessary, because we consider and<br />
evaluate the orientation independently.<br />
3.5.3 Head motion prediction<br />
As already introduced, a very common setup for AR applications is HMD-based. An important problem is the end-to-end system delay: the user always has the impression that the virtual content lags behind his actual movements. Hence the head movement has to be predicted. We already introduced the Kalman filter as a prediction method, but other filtering techniques are available as well. Azuma compares two classes of head motion
predictors [3]. Both methods are analyzed in the frequency domain in order to obtain<br />
characteristics of the predicted signal as a function of system delay and input motion. Shaw<br />
and Liang address the problem of head motion as well. In a first experiment they try to<br />
characterize head motion [48]. Especially changes in head orientation are important because<br />
the change of viewing direction often causes more changes in the scene. The benefits of this<br />
knowledge should be used for the design of a predictive filter. The experiments consist of<br />
a user study where participants have to fulfill several navigation tasks. The test person sits<br />
on a chair and has to look at markers on a wall in a certain sequence. The head position and<br />
orientation is tracked during the study. They found out that the user’s head moves along a<br />
great-circle arc and that the velocity of orientation seems to be symmetric while accelerating<br />
and slowing down. The second step was to design the filter [32]. They applied the knowledge that the perceived delay was mainly caused by delay in orientation, while jittering is mostly caused by noise in the position data. They also recognized that the noise in position was higher than the noise in orientation. As a consequence, they designed a prediction filter to address the orientation delay and an anisotropic low-pass filter to remove the noise in the position measurements. They also evaluated an adequate prediction length for the filter. As one conclusion they noticed that the prediction of hand movement is a more difficult process. In our approach the movement of the handheld device is more comparable to hand movement than to head motion.
3.5.4 Table-Top Augmented Reality
The basic motivations for table top AR have already been discussed in chapter 2.1, where examples have been shown; for references to related projects please refer to that chapter. Here is just a short summary of application domains for table top AR, although there are obviously still unexplored domains as well.
• Exhibitions<br />
• Education<br />
• Gaming<br />
• Interactive Storytelling<br />
• Conferencing<br />
Every table top AR application has to face the problem of 3D registration. Mark Billinghurst discusses this problem in several publications. The intention of the shared space technology is to enable interaction with virtual and physical objects, but also collaborative interaction with other users [9][8]. The shared space could be used for a variety of applications. He mainly uses the ARToolkit as optical tracking system [27]. A lot of table top applications use the ARToolkit as tracking component, either to display the virtual content [61] or to interact with the system [57][10]. Reasons for that may be that it is freely available and easy to integrate into an application; a deeper knowledge of tracking and image recognition algorithms is not necessary. Therefore the ideas provided in this thesis are suited for all vision-based tracking applications in horizontal table top environments.
3.5.5 Our approach<br />
A cheap gyroscope giving only relative spatial information is sufficient even if drift occurs, because only the relative change of orientation between two frames is considered. Our application is set up in a prepared environment, meaning that we are using a preprocessed image for tracking. There is no need to filter the gyroscope orientation, because it is not used for the registration of the fairy tale scene. And although head mounted displays are also suited for table top AR, the input device used in the Magic Book is the handheld visor. The field of interest of the user will be the horizontal setup on the table; the range of possible movements is rather restricted to the table area, and small inter-frame displacements are likely. Of course, an interesting question would also be how user behavior changes with different input and output devices. We have seen that our approach is suited for vision-based trackers assuming small inter-frame displacements. Future work has to evaluate to what degree our ideas can be applied to other tracking algorithms. As we said, the motion of the handheld device will differ from head motion; hence our approach might motivate the characterization of other user interfaces.
In my opinion table top AR applications are a way to bring the new technology to a broad mass of people. Just imagine augmenting an exhibition piece with virtual information using a cellphone display with an attached camera. Fast, robust and accurate tracking will seriously influence the user's acceptance of the technology, and our approach will help to improve it.
3.6 Summary<br />
According to our idea, we no longer consider the search window size as a fixed parameter inside the texture tracking algorithm. Our approach is to adjust this size during runtime according to relative orientation information given by a cheap and self-contained gyroscope tracker. To find a relationship between the feature point tracking routine and changes in orientation of the handheld device, ideas for a user study have been discussed.
From now on we distinguish between a runtime setup and a user study setup.

Runtime setup. The runtime setup consists of the handheld device with an integrated gyroscope, a camera as input device and an altered HMD as output device. A requirement for this setup is the dynamic configuration of the search window size during runtime.

User study setup. The user study setup consists of a magnetic tracker recording the movements of the handheld device. The ARToolkit tracking has to be extended to log the feature point coordinates in every video frame.
The next chapter will focus on the software architecture of the runtime setup. After that<br />
the main part of the thesis will be introduced: the design, execution and evaluation of the<br />
user study.<br />
CHAPTER 4<br />
A Software Architecture based on DWARF<br />
To enable a dynamic configuration of the optical-inertial setup during runtime, a software architecture has to be provided: the information given by the gyroscope tracker has to be processed and used to set the search window size of the texture tracking routine. The Distributed Wearable Augmented Reality Framework (DWARF) is a component-based framework to build AR applications [17][6]. A CORBA 1 -based infrastructure allows communication between these components, providing the distribution that is important for mobile setups. Reusable components for tracking, rendering and user interaction enable rapid prototyping.
First I will give a short overview of the basic principles of DWARF. Then the requirements<br />
for my architecture will be discussed and I will show the structure of the resulting system.<br />
Only the necessary terms relevant for this thesis will be described. More information and<br />
tutorials about DWARF can be found in the corresponding references.<br />
4.1 DWARF<br />
DWARF has been developed at the Technische Universität München (TUM) by the AR research group of Prof. Klinker. The basic concept of DWARF is that applications consist of interdependent and distributed components that can be reused for a variety of different applications, even across application domains.
4.1.1 Services<br />
In DWARF, components are called services and can be distributed within the network infrastructure. Every service runs as a single process. To build an application, these services have to be combined in order to fulfill a certain task. To accomplish this, interdependent services have to exchange data via mechanisms provided by the framework. Every service has a service description: this XML-notated description specifies the input data needed and the output data provided by the service. This configuration is responsible for
1 http://www.corba.org<br />
4 A Software Architecture based on DWARF<br />
the connection with other services that deliver or demand exactly what is needed: what a service demands are its needs, and what it provides are its abilities. The configuration of a service can also be changed or reconfigured during runtime.
Needs and Abilities<br />
• Needs<br />
A need is a property of a service used to request a functionality from another service running in the network. The middleware connects the two services, and they continue to work on a peer-to-peer basis. The services can communicate via different communication protocols described by the need description.
• Abilities<br />
Abilities are the counterpart to needs: they specify a certain functionality provided to other services. The ability description also sets the communication protocol used for communication (similar to the need description).
Further restrictions can be made with attributes and predicates. An attribute can be set for an ability to specify certain additional properties. For example, if two cameras are attached to a user, two services provide similar abilities; to distinguish between them, an additional attribute is set for every ability. A need for video data can then specify a predicate in order to connect with the right ability. In the example in section 4.1.3 this means that the service only wants information from the video data ability with the attribute "head".
If the middleware recognizes that the service descriptions of two services match according<br />
to their needs and abilities (with predicates and attributes), they get connected and can start<br />
to exchange data via several communication protocols.<br />
Communication<br />
Here I will give a short description of the main communication protocols used in DWARF. As mentioned, the way of communication is specified in the need and ability descriptions, and both have to match. Some of the protocols are CORBA-based mechanisms, so an Interface Definition Language (IDL) interface has to be provided.
Method Calls: A service exports a method provided to other services. The corresponding partner has to import this method and is then able to call it on the remote object, similar to a method call on a local object. This is realized by CORBA Remote Procedure Calls (RPCs).
Events: A service can send events, and another service is able to subscribe to these events. This is realized by the CORBA Notification Service.
Shared memory: A service can also write data into a local shared memory, and another service is able to read it out of the shared memory. Note that both services have to run on the same machine, which restricts the distribution of the components.
Figure 4.1: The example shows two services: the Videograbber with the ability for VideoData demanded<br />
by the need of the VideoDisplay service<br />
Depending on the requirements of the application, the right communication protocol has to be selected. In short, a need and an ability consist of a name, used as an identifier, a type, describing the kind of data offered or demanded, and a connector protocol specifying the communication mechanism.
4.1.2 Service Manager<br />
The DWARF service manager is responsible for connecting the components; it is the "heart" of the framework. At each network node a service manager is running. Every time a service is started, it registers with the service manager, which collects all the information about local services. Once a service is registered, the service manager looks for a suitable connection partner; it fulfills the task of a broker, trying to find the corresponding ability for a need. If two matching services have been found, the service manager establishes a connection between them. The matching services then communicate via a direct channel on a peer-to-peer basis; the service manager is not needed anymore for communication.
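The brokering performed by the service manager can be illustrated with a heavily simplified matching function. The structures below are hypothetical stand-ins for the real DWARF service descriptions, reduced to the three properties discussed above: type, connector protocol, and attribute/predicate.

```cpp
#include <string>

// Hypothetical, heavily simplified stand-ins for DWARF need and ability
// descriptions. Real descriptions are XML documents with more fields.
struct Ability { std::string type, connector, attribute; };
struct Need    { std::string type, connector, predicate; };

// A need matches an ability if type and connector protocol agree and the
// predicate, when present, equals the ability's attribute.
bool matches(const Need& n, const Ability& a) {
    if (n.type != a.type || n.connector != a.connector) return false;
    return n.predicate.empty() || n.predicate == a.attribute;
}
```

With this sketch, a need for VideoData over shared memory with predicate "head" connects to the ability carrying attribute "head", but not to an otherwise identical ability with attribute "hand" or to one using a different connector.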
4.1.3 An example<br />
This little example illustrates the terms described above (see figure 4.1). The notation used for DWARF architectures is a Unified Modeling Language (UML [20]) extension for component diagrams providing notation for the need and ability relationships: needs are represented by half-circles and abilities by full circles.
The example shows two services: the VideoGrabber with the ability of type "VideoData" and the VideoDisplay with the need for type "VideoData". Figure 4.2 shows the corresponding XML service descriptions. At first glance it is obvious that the types of need and ability match. The VideoDisplay service sets the predicate "type=head", which means that this service only wants to connect to an ability with the attribute "type=head". As we can see, the ability "provideImage" fulfills this. Both services are connected by the service manager and communicate via shared memory.
The services can be reused for different purposes. As we will see later, the video stream provided by the VideoGrabber can also be used for optical tracking: the VideoData is forwarded to another service performing the texture tracking.
[The XML listings of the two service descriptions are not reproduced here. They declare the VideoGrabber's ability "provideImage" of type VideoData with the attribute "head" and a shared memory connector, and the VideoDisplay's matching need with the predicate "(type=head)".]
Figure 4.2: An example: XML descriptions of the VideoGrabber and the VideoDisplay services connected via the shared memory communication mechanism. The need and the ability for "type=VideoData" and the predicate for "type=head" match.
4.2 Software Architecture for a Dynamic Configuration during<br />
Runtime<br />
DWARF provides the capabilities for a suitable software design for our runtime setup, which uses the gyroscope information for the dynamic configuration of the texture tracking. First we will look at the requirements for such a software design. One task was to integrate and reuse existing components in the setup: DWARF already provides an architecture for optical tracking based on the ARToolkit, proposed in Wagner's PhD thesis [62].
4.2.1 Existing Architecture<br />
Figure 4.3 shows the existing architecture in the UML syntax described above. Note that PoseData is a data structure containing information about position and orientation (6DOFPoseData) or orientation only (3DOFPoseData); DWARF does not distinguish between them. If a tracker only provides orientation, the values for position are simply not set in the pose data structure.
The ARToolkit is split into several components. The VideoGrabber service grabs the video stream and provides it via shared memory to the ARTkMarkerDetection service. The marker detection component is the core of the ARToolkit: it searches for marker features in the video frame. In order to keep the optical tracker flexible and reusable, the ARTkMarkerConfiguration is responsible for configuring the ARTkMarkerDetection with marker data, so marker data can be loaded and unloaded during runtime. The ARTkMarkerDetection service does not provide pose data directly; it sends an ARTkFrameMarkers structure which contains all the information about the detected markers in the video frame. To extract the PoseData from the marker structure, the ARTkPoseReconstruct service is needed. This has the advantage that the marker detection can also be used for other purposes if the pose information is not relevant for the application; in [63] the ARToolkit is used for wide area tracking, for example. The ARTkPoseReconstruct provides 6DOFPoseData that can be used by other services
Figure 4.3: The existing architecture based on DWARF Services<br />
for several goals, such as displaying a 3D model. The complete rationale behind this architecture can be found in Wagner's thesis.
4.2.2 Requirements for the new architecture
First, the functionality of the architecture has to be extended with the texture tracking version of the ARToolkit. So, similar to the ARTkMarkerDetection, a component has to be written that performs the texture tracking routine: the ARTkNFTDetection. In terms of reusing the existing components, the following considerations have to be made:
• VideoGrabber service
The VideoGrabber can be reused without any restrictions. The CameraData can be read directly from the shared memory, as the ARTkMarkerDetection does. Again, the only restriction is that both services (VideoGrabber and ARTkNFTDetection) have to run on the same machine.
• ARTkPoseReconstruct service
To reuse this component, its interfaces (needs and abilities) have to be redesigned. The ARTkNFTDetection does not provide a data structure with detected markers; it only provides the homography matrix used for the extraction of the 6DOFPoseData. The ARTkPoseReconstruct service normally calculates this matrix from the marker structure and then extracts the pose information. Thus another need of the ARTkPoseReconstruct service has to be integrated, allowing the estimation of pose with matrix data given by the ARTkNFTDetection service.
• Configuration of the search window

manually: A simple graphical UI should make it possible to alter the search window size of the texture tracking. Similar to the ARTkNFTDetection service, a configuration component enabling this is needed: the ARTkNFTConfiguration service. If no configuration of the search window is needed at all, the ARTkNFTDetection service is able to run without the configuration service.

dynamically: This is the requirement for our runtime setup described in the previous chapter. Information given by a gyroscope should be used to estimate a new search window size. This estimation is based on the mapping between the movement of the handheld device and the feature point 2D coordinates, a mapping that still has to be found (see chapter 3). A gyroscope tracking unit can connect to this configuration component and deliver 3DOF orientation information (Gyroscope service).

The ARTkNFTConfiguration allows both a manual and a dynamic configuration of the search window size.
4.2.3 System Design<br />
First we will look at the new components and at the redesign of the ARTkPoseReconstruct interface necessary to realize the architecture.
New services<br />
From the small requirements elicitation for the new software design, the following new components can be identified. For a deeper look into requirements analysis and software engineering in general, see Bruegge's "Object-Oriented Software Engineering" [13].
• ARTkNFTDetection
This service implements the loop for the optical tracking. It has an ability for ARTkNFTPoseMatrix. This data contains the matrix describing the position and orientation of the 2D plane; the pose data can be extracted from it.

• ARTkNFTConfiguration
Interfacing the ARTkNFTDetection service, this service is able to set the search window size of the texture tracking. Thus this service provides the ability NFTSearchWindowSize.
Adaptation of the ARTkPoseReconstruct interface
A new need for ARTkNFTPoseMatrix has to be provided. This need matches the corresponding ability of the ARTkNFTDetection, so these services are able to connect. In contrast to the marker detection, where the marker structure is first used to calculate the pose matrix, the matrix is provided directly by the texture tracking; the reason is that the tracking routine and the calculation of the matrix from the feature points cannot be separated easily. The calculation of the 6DOF pose data can then be performed in the same way as before.
Resulting architecture<br />
Figure 4.4 shows the resulting architecture with the new services. Old components are drawn in gray.
Figure 4.4: The resulting architecture integrating the new components<br />
The following dependencies between the connected services are established by the service manager during runtime. The information is given in the XML service description of every service.
• VideoGrabber ←→ ARTkNFTDetection
Type: CameraData
Communication method: Shared Memory
Description: The ARTkNFTDetection reads the video stream out of the shared memory.
• ARTkNFTDetection ←→ ARTkNFTConfiguration
Type: NFTSearchWindowSize
Communication method: Method Call
Description: The ARTkNFTConfiguration calls an exported method of the ARTkNFTDetection to set the search window size of the texture tracking.
• ARTkNFTConfiguration ←→ Gyroscope
Type: 3DOFPoseData
Communication method: Event
Description: Every frame, the Gyroscope service sends new pose data to the ARTkNFTConfiguration service. This information is evaluated and the appropriate search window size is set.
• ARTkNFTDetection ←→ ARTkPoseReconstruct
Type: ARTkNFTPoseMatrix
Communication method: Event
Description: As described, the pose matrix is sent to the new need interface of the ARTkPoseReconstruct.
Figure 4.5 shows the resulting UML sequence diagram, describing the communication of the services over time. We can see that the frame rate of the gyroscope is higher than the frame rate of the optical tracker and that the ARTkNFTConfiguration sets the search window size according to the received pose data events.
Figure 4.5: UML Sequence diagram: Interaction of the DWARF services<br />
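The behavior of the ARTkNFTConfiguration in this sequence can be sketched as an event handler. The handler signature, the Euler angle representation of the 3DOFPoseData, and the linear mapping with its gain and bounds are all assumptions for illustration; the real mapping is exactly what the user study has to determine.

```cpp
#include <cmath>

// Hypothetical sketch of the ARTkNFTConfiguration logic: every 3DOF pose
// event from the Gyroscope service yields a relative orientation change,
// which is mapped to a search window size. In DWARF the new size would be
// pushed to the ARTkNFTDetection via a remote method call; here we only
// store and return it.
class NFTConfiguration {
public:
    // Returns the search window size (pixels) set for this pose event.
    int onPoseData(double yawDeg, double pitchDeg, double rollDeg) {
        double delta = std::fabs(yawDeg - lastYaw_) +
                       std::fabs(pitchDeg - lastPitch_) +
                       std::fabs(rollDeg - lastRoll_);
        lastYaw_ = yawDeg; lastPitch_ = pitchDeg; lastRoll_ = rollDeg;
        int size = 6 + static_cast<int>(delta * 4.0);  // placeholder gain
        if (size > 48) size = 48;                       // placeholder bound
        searchWindowSize_ = size;
        return size;
    }
private:
    double lastYaw_ = 0.0, lastPitch_ = 0.0, lastRoll_ = 0.0;
    int searchWindowSize_ = 6;
};
```

Because the gyroscope delivers events at a higher rate than the optical tracker produces frames, several such updates may occur between two tracking frames; the texture tracking always uses the most recently set size.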
This architecture meets the requirements described above. On the one hand, the texture tracking can be used as an independent component without any configuration of the search window; in that case the parameter is set to a constant value. On the other hand, the parameter can also be set dynamically if a gyroscope tracking service and the ARTkNFTConfiguration service are active.
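The two configuration paths (a constant, manually set value versus dynamic updates driven by gyroscope events) can be sketched in C++ as follows. This is an illustrative sketch only: the class, its method names, the bounds and the gain are assumptions, not the actual DWARF service interface, and the linear mapping is a placeholder for the mapping the thesis leaves open.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Illustrative sketch of the search window configuration logic.
// All names and constants are assumptions, not DWARF code.
class SearchWindowConfig {
public:
    // Manual path: the GUI slider sets a constant value.
    void setManual(int pixels) { size_ = clamp(pixels); dynamic_ = false; }

    // Dynamic path: invoked for each gyroscope pose event with the
    // orientation change since the last frame (degrees). The linear
    // mapping is a placeholder only.
    void onGyroscopeEvent(double deltaDeg) {
        size_ = clamp(static_cast<int>(8 + 2.0 * std::fabs(deltaDeg)));
        dynamic_ = true;
    }

    int searchWindowSize() const { return size_; }
    bool isDynamic() const { return dynamic_; }

private:
    // Illustrative bounds on the search window, in pixels.
    static int clamp(int v) { return std::min(64, std::max(8, v)); }
    int size_ = 16;       // default constant value
    bool dynamic_ = false;
};
```

With no gyroscope service active, only the manual path is ever used and the parameter stays constant, which matches the standalone use case described above.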
4.2.4 Implementation<br />
Linux is the main platform for DWARF. A Fedora Linux distribution 2 (version 2) was used to run DWARF, although SUSE 3 distributions are better suited: the core components have been developed and tested under SUSE only. The reason for using Fedora was that a SUSE distribution was not available in New Zealand due to high internet costs. DWARF provides support for several programming languages such as C++, Java and Python. Since the existing architecture was written in C++, the new services have also been developed in C++. The graphical user interface for setting the search window size manually (in the ARTkNFTConfiguration service) was implemented with the Qt toolkit. Figure 4.6 shows a screenshot of the runtime setup: the preprocessed 2D image plane and a virtual plane registered on top of it. The search window size can be set by a slider in the graphical user interface. It is still not possible to set the search window dynamically, because it is not clear whether a proper mapping can be found. But even if no such mapping can be found, the texture tracking can now be used in the DWARF framework.
Figure 4.6: Runtime environment: The search window size can be set manually or by the orientation<br />
information given by a gyroscope (3 DOF Intersense Tracker)<br />
2 http://fedora.redhat.com/<br />
3 http://www.suse.com/<br />
4.2.5 Summary<br />
We have introduced a software architecture based on the DWARF framework that fulfills the discussed requirements for dynamic configuration at runtime. The texture tracking ARToolkit is now integrated into the DWARF framework, which was one of the core requirements for this thesis. A mechanism for deriving a search window size from the gyroscope orientation still has to be implemented.
The next chapters will show that it is not an easy task to express this relationship. Next we will describe the design and execution of the user study motivated in chapter 3.
CHAPTER 5<br />
User Study<br />
As explained in the previous chapter, we want to find a mapping between the feature point tracking and the change in orientation of the handheld device. To obtain data for this evaluation, a user observation has to be made. A logging infrastructure records the data during the study.
This chapter will describe the goals and the design of the user study. As said, the overall goal is to get data from a certain number of people in order to analyze it. The texture tracking ARToolkit is altered in a way that lets us retrieve the 2D feature point coordinates in every video frame. The tracking setup is extended with an Ascension Flock of Birds magnetic tracker to track the user's movements. To obtain comparable sets of data, special tasks have been designed in which all the test persons have to answer questions about the current scenes. This is done to force the user to act in a certain way.
5.1 Goals of the User Study<br />
The motivations for user studies and evaluations can be very different. According to [44], four reasons for doing evaluations can be identified:
• Understanding the world
How do future users use a technology? How do they employ the new system in their workplace? The main motivation for this kind of evaluation is understanding the user and his behavior.
• Comparing designs
Often system designers have to decide which input method to choose. Therefore evaluations of the different methods have to be made. An evaluation should give hints as to which method is better accepted by the user and leads to a better performance. An example of this can be found in Kulas' master thesis [30], in which he focuses on usability aspects of ubiquitous systems and performs a sample user study to compare two menu designs.
• Engineering towards a target<br />
Studies are made in order to evaluate whether the system accomplishes certain goals, for example a better performance than a competitor's product.
• Checking conformance to a standard
These studies are mainly testing procedures to evaluate whether a system meets required standards.
In our evaluation we want to "understand the world" better. In particular, we want to take a deeper look at the following aspect: how is the tracking related to the input provided by the user through movements of the handheld device? This user study is meant to collect enough data to describe a relationship and, if possible, a mapping between the two data sources. Here is an overview of the expected outcomes of our user study. Of course not all of these goals could be covered within the limited time of this thesis, but potential for further investigations in this research area is shown.
• Collect user data<br />
Data has to be collected. As we described in the previous chapters, there is a lot of potential to derive possible conclusions from the data. Of course, finding a mapping is the prime goal, but collecting the data is a first big step and took most of the time during this work.
• Find a mapping between user movement and search window size<br />
This is the idea described in chapter 3. The mapping is expressed by the function introduced there:
f_{texture tracking}(\Delta orientation) = search window size   (5.1)
• Find a relationship between user tasks and the related movements<br />
If we know which actions and movements are connected with certain tasks, we can adapt our tracking not only to a general mapping, but also to the designated task. Therefore we first have to identify possible tasks in table top AR and try to discover a dependency between these tasks and the tracking results.
• Characterize movement of the handheld device<br />
We are interested in whether the movement can be characterized and whether such results are valid for every user. Therefore user tasks are needed to let the users perform similar actions.
• Evaluation of a suited task design for table top AR<br />
As Shaw did in his experiment to characterize head motion [48], certain user tasks have to be designed. We will introduce and abstract tasks that might be suited for a variety of table top applications.
• Collect feedback from potential users<br />
A questionnaire was designed to collect additional feedback on the Magic Book and<br />
on the tasks.<br />
• Observe anything else that might be interesting<br />
Still, it is important to keep one's eyes open for any interesting observations during the execution of the user study and the analysis of the collected data.
5.2 User Study Design
This section discusses all the relevant aspects of the design of this user study. First we will have a look at the recording of the separate data sources: the pose data given by the magnetic tracking device that tracks the handheld device, and the 2D positions of the tracked feature points. The Magic Book has been implemented on Windows using Microsoft Visual Studio 6 1 . Therefore the logging infrastructure first had to be integrated into the existing system. After that we will introduce our task design for table top AR.
5.2.1 Movement Tracking of the Hand-Held Device<br />
The issue is to track the position and orientation of the handheld device, so a tracking device providing 6DOF is needed. As introduced at the beginning of this thesis, a magnetic tracker can measure position and orientation. The Flock of Birds system has an update rate of about 90 measurements per second; in comparison, the vision-based texture tracking runs at 30 frames per second. To recapitulate: a base station establishes a magnetic field, and the pose data of several sensors can be tracked within the range of this field. Hence it is possible to track more than one object at a time. A drawback of this tracker is that the data might be disturbed by artificial magnetic fields, produced by a CRT monitor for example, and it is almost impossible to find a setup in a room without any interference. Additionally, the sensors have to be close to the base station to obtain accurate tracking results. We will also see that we have to be careful when attaching a sensor directly to the handheld device, because of its iron stick. Other tracking devices, such as an infrared optical tracking device like the A.R.T. tracking system 2 , might be an alternative. A huge advantage of the magnetic tracker is that there is no need for a line of sight between the sender and the receiver, so the setup does not have to restrict the user from occluding such a line of sight. The setup for this user study would not be well suited for registering 3D virtual objects, because interference leads to jitter.
Software support for Flock of Birds<br />
For using and developing applications with the Ascension Flock of Birds tracker, a commercial library is available. The "Eden Library" provides a threaded query mechanism to obtain the measurements. For the connection to the host system, communication links via TCP/IP or serial port (RS232) are supported 3 . For spatial data representation it supports an OpenGL pose matrix, a position vector and Euler angles. As we will see later, quaternions can be derived by a simple algorithm with the OpenGL matrix as input parameter.
1 http://msdn.microsoft.com/vstudio/<br />
2 http://www.ar-tracking.de/<br />
3 For further information on The Eden Library please contact Phillip Lamb, phil@eden.net.nz<br />
Figure 5.1: The Ascension Flock of Birds with the sender (black cube) and the host system in the<br />
background<br />
Calibration<br />
The pose data given by the Flock of Birds is always expressed in the coordinate system of the sending station: we get the absolute position and orientation relative to the origin of the Flock of Birds coordinate system, which lies in the center of the black cube (see figure 5.1). In order to bring this coordinate system in relation to the Magic Book, we have to do a calibration step. Since we have the possibility to track several targets with the Flock of Birds, we track the tangible book as well. Figure 5.2 shows two sensors: the first one is attached to the upper left corner of the book, which is taken to be the origin of the Magic Book coordinate system. The other sensor is attached to the handheld device. Our aim is to record the pose data of the second "bird" calibrated to the Magic Book coordinate system. Next we will explain the steps to calculate this. All the methods used in the following steps were taken from the DWARF utility package; this toolbox provides all the basic transformations and calculations for spatial data. The mathematical basics for these methods can be found in the corresponding literature [49].
The Eden Library provides an OpenGL matrix for the Magic Book sensor and for the handheld sensor, M_{Book} and M_{Handheld}.
Position: Calculating the position of the handheld device is an easy task: we simply subtract the position vector of the handheld from that of the origin of the coordinate system. The position vector is the last column of the OpenGL matrix.

\bar{p}^{Book}_{Handheld} = \bar{p}^{Flock}_{Book} - \bar{p}^{Flock}_{Handheld}   (5.2)

The notation \bar{p}^{Book}_{Handheld} means the position vector of the handheld device in book coordinates.
Orientation: We want to obtain the orientation of the handheld in book coordinates, q^{Book}_{Handheld}, in quaternion representation. The representation of the matrices is important for the further calculations: because OpenGL uses column-major order and DWARF row-major order, we first have to transpose both matrices.

M_{Book} = M^{T}_{Book}, \qquad M_{Handheld} = M^{T}_{Handheld}   (5.3)

Both matrices contain pose information in Flock coordinates. Now the corresponding quaternions can be derived by a simple method call.

q^{Flock}_{Book} = matrix2quaternion(M^{T}_{Book}), \qquad q^{Flock}_{Handheld} = matrix2quaternion(M^{T}_{Handheld})   (5.4)

To obtain the resulting quaternion, the quaternion representing the orientation of the source coordinate system has to be inverted and multiplied with the quaternion of the handheld device.

q^{Book}_{Handheld} = (q^{Flock}_{Book})^{*} \cdot q^{Flock}_{Handheld}   (5.5)

The resulting pose data consists of \bar{p}^{Book}_{Handheld} and q^{Book}_{Handheld}. This data has to be recorded by the logging infrastructure.

Figure 5.2: Magic Book coordinate system with its origin in the upper left corner and the handheld device with a styrofoam buffer due to ferromagnetic distortion
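The calibration steps above can be sketched in C++ as follows. The matrix2quaternion helper mirrors the DWARF utility of the same name, but its implementation here (a standard rotation-matrix-to-quaternion conversion, valid while the scalar part is nonzero) and all signatures are assumptions, not the actual DWARF utility package code.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Quat = std::array<double, 4>;   // (x, y, z, w), scalar last
using Mat4 = std::array<double, 16>;  // OpenGL column-major layout

// The position vector is the last column of the OpenGL matrix.
Vec3 position(const Mat4& m) { return {m[12], m[13], m[14]}; }

// Equation 5.2: handheld position in book coordinates.
Vec3 relativePosition(const Mat4& book, const Mat4& handheld) {
    Vec3 pb = position(book), ph = position(handheld);
    return {pb[0] - ph[0], pb[1] - ph[1], pb[2] - ph[2]};
}

// Stand-in for the DWARF matrix2quaternion utility: a standard
// rotation-matrix-to-quaternion conversion, valid while w != 0.
// Instead of transposing as in equation 5.3, we index the
// column-major layout directly: R[row][col] = m[col * 4 + row].
Quat matrix2quaternion(const Mat4& m) {
    double w = std::sqrt(std::max(0.0, 1.0 + m[0] + m[5] + m[10])) / 2.0;
    double x = (m[6] - m[9]) / (4.0 * w);   // R21 - R12
    double y = (m[8] - m[2]) / (4.0 * w);   // R02 - R20
    double z = (m[1] - m[4]) / (4.0 * w);   // R10 - R01
    return {x, y, z, w};
}

Quat conjugate(const Quat& q) { return {-q[0], -q[1], -q[2], q[3]}; }

// Hamilton product for quaternions in (x, y, z, w) order.
Quat multiply(const Quat& a, const Quat& b) {
    return {a[3] * b[0] + a[0] * b[3] + a[1] * b[2] - a[2] * b[1],
            a[3] * b[1] + a[1] * b[3] + a[2] * b[0] - a[0] * b[2],
            a[3] * b[2] + a[2] * b[3] + a[0] * b[1] - a[1] * b[0],
            a[3] * b[3] - a[0] * b[0] - a[1] * b[1] - a[2] * b[2]};
}

// Equation 5.5: invert the book orientation and multiply with the
// handheld orientation, both given in Flock coordinates.
Quat relativeOrientation(const Mat4& book, const Mat4& handheld) {
    return multiply(conjugate(matrix2quaternion(book)),
                    matrix2quaternion(handheld));
}
```

For unit quaternions the conjugate equals the inverse, which is why the inversion in equation 5.5 can be implemented as a sign flip of the imaginary vector.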
Interference and noise<br />
As we said, because ferromagnetic objects may seriously distort the measurements, it is not possible to attach the sensor directly to the handheld device. Therefore the handheld device was prepared in a special way. To make sure that the handheld device does not influence the measurements, the sensor had to be attached at a certain distance from the device. To figure out the proper distance, a straight edge was put on top of the handheld, perpendicular to the iron stick. The sensor was then moved steadily along the straight edge from one side to the other. In the middle, on top of the handheld device, a distortion was recognized. To visualize this distortion, a virtual cube was displayed with the pose information given by the Flock of Birds tracker. In the next step the straight edge was moved further away by using styrofoam blocks. This step was repeated until no obvious distortion could be recognized anymore. A styrofoam block of the proper thickness was then attached on top of the device (see figure 5.2). But as discussed, this is a huge drawback of using a magnetic tracker in this setup.
In summary, we realized that it is necessary to track both the handheld device and the Magic Book. We have to calibrate the setup, because we consider the book coordinate system to be our world coordinate system. This is especially important if we want to use position information in future evaluations.
5.2.2 Tracking of 2D Feature Points
The information about which feature points are tracked by the texture tracking ARToolkit is hidden from the programmer: the tracking routine itself is not exposed to the application developer. As a consequence, the tracking method had to be extended to obtain the feature point information as well. Due to the texture tracking algorithm described in chapter 2, in every frame at least the four best-suited feature points are chosen. For every feature point the 2D coordinates are tracked: \bar{p}_{FP} = (p_x, p_y). Note that we only have a 2-dimensional vector here. Every feature point has a unique identity. This is important for the user study, so that we can recognize that a feature point is continuously tracked over several frames and observe its path through the 2D video plane.
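As an illustration, the continuity of feature points across frames can be reconstructed from these unique ids roughly as follows. This is a hypothetical helper, not code from the modified ARToolkit; marking a failed detection with a negative id is an assumption that matches the logging format described below.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical helper (not code from the modified ARToolkit): collect
// the 2D path of every feature point across frames, keyed by its
// unique id, so continuity and tracking loss can be analyzed later.
struct FeaturePoint { int id; double x, y; };

using Paths = std::map<int, std::vector<std::pair<double, double>>>;

void addFrame(Paths& paths, const std::vector<FeaturePoint>& frame) {
    for (const FeaturePoint& fp : frame)
        if (fp.id >= 0)  // a negative id is assumed to mark a failed detection
            paths[fp.id].push_back({fp.x, fp.y});
}
```

A point that appears with the same id in consecutive frames contributes consecutive entries to its path, which is exactly the continuity property the evaluation relies on.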
5.2.3 Logging Infrastructure<br />
To set up a logging infrastructure, the existing Magic Book application had to be extended: a Flock of Birds tracking component had to be integrated, and the method call of the tracking routine had to be altered. Both kinds of tracking information have to be recorded by a logging component, and later in the evaluation it must be possible to synchronize the data. The logging can be started and stopped via the GLUT callback functionality (see figure 5.4). At the end of each tracking frame of the tracking components, the data is passed to the Logger via a method call and written into a file.
Figure 5.3 shows the classes participating in the logging steps. The central component is the Logger. It records the pose information given by the texture tracking and the Flock of Birds. Both tracking components attach a timestamp to the pose data.
Figure 5.3: Static structure of the logging environment<br />
Figure 5.4: Sequence diagram describing the logging steps<br />
The data is written to two files. For the 2D feature point coordinates the following data is recorded. As we said, at least four feature points are needed to calculate the viewpoint of the user. All four feature points are considered (see equation 5.6).

log_{Featurepoints} = (timestamp, id_1, x_1, y_1, id_2, x_2, y_2, id_3, x_3, y_3, id_4, x_4, y_4)   (5.6)
If the tracking fails for one or several frames, id_1 is set to '-1'. This makes it possible to count tracking failures.
The pose data given by the Flock of Birds is logged in the following way (5.7). Each component of the quaternion q = (x, y, z, w) = (q_0, q_1, q_2, q_3) is considered; \bar{v} = (x, y, z) is the imaginary vector and w the real scalar. Note that we changed the order of the scalar and the imaginary vector; the reason is that the calculation methods provide the quaternions in this order. The position \bar{p} = (p_x, p_y, p_z) is logged as well, although we do not know whether we will need it for further evaluation.

log_{Handheld} = (timestamp, q_0, q_1, q_2, q_3, p_x, p_y, p_z)   (5.7)
One thing we have not considered at all in this logging environment is the delay of both trackers. This would be necessary to determine the exact state of the setup at one point in time. The consequences and reasons will be discussed later in this thesis.
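As a sketch, the two log-line formats of equations 5.6 and 5.7 could be produced as comma-separated lines like this. The field order follows the thesis; the CSV layout, the function names and the numeric precision are assumptions.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Sketch of the feature point log line (equation 5.6): a timestamp
// followed by (id, x, y) for each of the four tracked feature points.
// An id of -1 marks a tracking failure, so failures can be counted.
std::string featureLine(long timestamp, const int id[4],
                        const double x[4], const double y[4]) {
    char buf[256];
    int n = std::snprintf(buf, sizeof buf, "%ld", timestamp);
    for (int i = 0; i < 4; ++i)
        n += std::snprintf(buf + n, sizeof buf - n, ",%d,%.2f,%.2f",
                           id[i], x[i], y[i]);
    return std::string(buf, n);
}

// Sketch of the pose log line (equation 5.7). The quaternion is logged
// as (q0, q1, q2, q3) = (x, y, z, w): imaginary vector first, real
// scalar last, as the calculation methods deliver it.
std::string poseLine(long timestamp, const double q[4], const double p[3]) {
    char buf[256];
    int n = std::snprintf(buf, sizeof buf,
                          "%ld,%.4f,%.4f,%.4f,%.4f,%.3f,%.3f,%.3f",
                          timestamp, q[0], q[1], q[2], q[3], p[0], p[1], p[2]);
    return std::string(buf, n);
}
```

Writing one line per tracking frame, each prefixed with its own timestamp, is what later allows the two independently recorded files to be synchronized.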
5.2.4 Task Design<br />
We want to make the participants behave in a certain way so that the data of different participants can be compared. Letting them explore the virtual objects randomly might not lead to satisfying results, as every user might focus on different objects and animations. But this is still an assumption which has to be proved. The idea is that we might have more success comparing data of different participants if we give similar tasks to the test persons. Task-centered user interaction design is one approach to developing specific user interfaces [31]. Future users have to be observed in order to learn their behavior, to gather information about how they handle things, and to evaluate special requirements for tasks. So this user study should rather be understood as a task observation, not a usability study, although we will possibly also collect information that makes it possible to appraise the usability of the Magic Book. We present a categorization of tasks in table top AR applications. These tasks are also related to expected actions or behavior of the participant. They are applied in the user study, but further on we will discuss whether it is possible to abstract them for a variety of table top AR applications too. The primary goal is to design user tasks that are easy, so that even inexperienced participants are able to perform them without further training. First we want to introduce the tasks in general, and then we will give an example of how these tasks have been applied in the user study.
56
Different tasks in table top AR<br />
The following user tasks, relevant for the Magic Book, have been identified. As we said, later on we will examine whether we can use them for other table top applications as well.
Overview task: The user has to get an overview of the scene with its virtual, but also its real objects. It is expected that the user will bring himself into a position where he is able to see the whole scene without moving around in order to get an impression. Special features of interest might be focused on, and he might move around slightly. In the Magic Book this can be achieved by simply asking a question about the content of the scene. A feature in the content of the scene could be a virtual character or an object.
Focus task: In this task the user is pushed to focus on a specific feature of the augmented environment. The location of the feature should be obvious to the participant. As the expected behavior, the user will move closer to the scene and try to hold still to observe the feature. In the Magic Book this is again done by posing a question about a specific virtual object.
Detail task: This task is a combination of the overview and the focus task. While the focus task only concentrates on single features, the detail task forces the user to move around in the scene to get an overview and to move closer to focus on features as well. This can be achieved by asking the participants to count objects that occur at several locations in the scene, for example.
Additionally, a free task will be introduced. This gives the opportunity to observe the user when he is able to move around without any restrictions. The task can be described as an "understanding the scene" task. The following example demonstrates these tasks by applying them to a fairy tale scene of the Magic Book.
Example application of the tasks in the user study<br />
The questions posed to the participant are related to the scene in figure 5.5.
In this scene the participant is confronted with the tasks described above. This is done by asking questions about different features of the scene. It is not relevant whether the participant's answer is right; the focus is on what movements of the handheld device are made to explore the features. The following questions are posed.
• How many people do you see in the scene?<br />
This is an overview task. All the virtual characters are distributed throughout the scene.<br />
Thus a position is needed where the user gets an overview of the whole scene.<br />
• What is the hair color of the woman with the white skirt and the yellow jersey?
The participant should focus on a specific feature of the scene. The feature "woman with the white skirt" is quite obvious to the user. This is a focus task.
• How many people wear a hat or a headdress?
This is a detail task. The features are spread over the scene and a closer look is necessary to answer the question, but it is obvious where the features are located.
Figure 5.5: Magic Book: in order to let participants perform tasks, questions are posed
In the free task the test person is able to observe the scene without further questions or constraints. The user study itself consists of 4 cases. Every case considers one scene displayed on one page. Two of these cases are free tasks; in the other two cases, questions are posed to the user in order to accomplish the demanded task. A full description of the cases can be found in appendix B.2.
5.2.5 Setup<br />
The environment for the user study is set up in the HIT Lab demo room. The Magic Book application with the logging extension runs on a fast Shuttle PC (P4 3.2 GHz processor with 1 GB DDR400 RAM). The Shuttle PC 4 is plugged into the network and is able to connect to the Flock of Birds host system via TCP/IP. The tangible Magic Book is placed on a table at a similar height as the AR kiosk, and one sensor is placed in the upper left corner of the book because of the calibration described above. The Flock of Birds tracker does not need a line of sight between the sender and the receiver, so we do not have to ensure that the participants do not cross such a line of sight. A participant equipped with the tracked handheld device is able to use the application similarly to the usual Magic Book setup. The Shuttle PC itself was placed on a desk near the Magic Book table, and I was sitting at this desk in order to pose the task questions and to start and stop the logging (see figure 5.6). Figure 5.7 shows a top view of the user study setup.
The best tracking results are achieved if the receivers are close to the sending base station. Therefore the distance of the table to the base station was about 1.5 meters. Thus there was still enough space for the user to move around freely.
4 www.shuttle.com<br />
Figure 5.6: User study setup: the Magic Book is placed on a plate at a certain height. The computer system in the back controls the logging and is connected to the Flock of Birds tracker
Figure 5.7: Top view of the user study setup. The base station is placed as close to the sensors as possible.
5.2.6 Questionnaire<br />
In addition to the recorded data, a questionnaire was given to the test persons. It should not take the test person longer than 5 minutes to answer the questions. This questionnaire should give further information about the following issues:
• Background of participants<br />
Data about age, occupation and background knowledge of AR and the Magic Book is collected. It is desirable to have a widespread variety of test persons. AR experts would probably behave differently than new users, but this is also only an assumption. The questions about age and occupation were voluntary; they were not important for the study.
• Feedback on tracking<br />
This feedback focuses on the delay and the jittering of the feature tracking technology. Both factors affect the immersion of the user in the virtual world. This data was also only secondary. The delay issue is interesting because our approach aims to speed up the computation; our idea is to reduce the delay caused by the image processing routine.
• Feedback on tasks<br />
These questions should give feedback on the difficulty of the tasks. We distinguished between the test cases where a user is able to move around without restrictions and the test cases where the user has to fulfill certain tasks.
• Feedback on user interface and usability<br />
The Magic Book works with a handheld visor as a graphical user interface. In the table top AR chapter 2 we discussed the rationale for this choice. Still, there is the question whether another user interface would be better suited for the Magic Book. This is also connected to the question whether the Magic Book is "easy to use" for inexperienced users.
• Further comments and feedback<br />
During user studies, the comments that participants express are valuable as well. These comments can be used to draw further conclusions regarding our user study goals, although they cannot be turned into empirical data.
All of the questions were posed with scalar answer values from 1 to 5. The collection of the tracking data is still the main goal of the user study, so the questionnaire should only give additional feedback and information about the user. Mainly the feedback on the tasks was important because, as we said, we wanted "easy" tasks. If the test persons had mainly agreed that the tasks were difficult, we could hardly compare data sets of expert users and participants who are confronted with the Magic Book for the first time. The complete questionnaire can be found in appendix A.1.1.
5.3 Execution of the Study<br />
The execution of the user study was mainly done in two sets. This section gives an overview of the concrete execution of the user study, concerning the selection of participants, the sequence of tasks during the study, time and place, and the difficulties and problems encountered during the study.
Participants<br />
The method for recruiting test persons for the user study was mainly "hallway testing", to save time. First I asked students and interns at the HIT Lab to join my study. Unfortunately, most of the students were experts in developing AR applications themselves. But I asked the test persons to spread the word, and so I was able to test inexperienced users as well. In total, 20 test persons joined the study and the level of expertise was well distributed, which is satisfying for my purposes (see figure 5.8). I noticed that experienced AR developers are more critical concerning the tracking and UI technologies, whereas test persons who are confronted with AR for the first time are very fascinated. This was my impression while talking with the participants.
Figure 5.8: Overview of the expertise of the user study participants on AR and the Magic Book. The scale was from 1 ("never heard of it") to 5 ("experienced developer"). Overall, 20 participants joined the study
Place and time<br />
Due to the shared resources of the demo room, the study had to be split into two sets. Thus the setup had to be built up several times, including a pilot run. In the pilot run, mainly the questions on the virtual scenes were tested, with a satisfying result. Setting up several times has a huge disadvantage, because it is hard to establish the same conditions for the test persons twice. Even the lighting conditions influencing the optical tracking depend on the time of day. We also had to be very careful with the Flock of Birds tracker: the demo room was equipped with some CRT monitor setups and computers which might interfere with the magnetic field of the Flock of Birds. Thus a lot of testing was required prior to conducting the user study. But in the end, the results achieved with this setup were satisfying.
Steps in the user study<br />
As an introduction for the participants, I worked out a guideline for the study (see A.1.2). The participants should know about the execution and the purpose of the study; details that would probably influence their behavior are hidden from them. The usage of the Magic Book is also introduced to inexperienced users. It was also important to let the users know that they cannot do anything wrong or give a wrong answer to the task questions. The participants were allowed to ask questions during the study as well. For myself, I worked out a schedule with the individual steps of the study (see A.1.2).
1. Practice<br />
The first step should give the test person the possibility to get used to the Magic Book. The participant should figure out which movements of the handheld device are allowed by the setup, especially by the tracking routine. It was also explained that if the tracking fails, it can be reinitialized by looking at the marker.
2. Case 1: Free Task<br />
The movements of the handheld device and the feature point tracking information were recorded for 30 seconds during this task. There were no restrictions for the user, except for the task to understand the scene.
3. Case 2
Now specific questions about features of the current scene were posed to the user. The categorization of tasks introduced earlier in this thesis was applied here. Prior to the study, special scenes (pages in the Magic Book) were chosen that were best suited for this purpose. One property of these scenes was that they contain more features than the other scenes. The user should answer the questions as quickly as possible; there was no time restriction. The user had to accomplish 5 tasks in this case.
4. Case 3: Free Task
This case is similar to the first case, except that another scene was chosen for it (again 30 seconds).
5. Case 4
This case is again similar to case 2. A suitable scene was selected for it. In this case the participant had to accomplish 4 tasks.
6. Questionnaire
After the 4 cases, the questionnaire was handed out to the user.
7. Gather further comments and feedback
In addition to the questionnaire, the participants had the opportunity to make further comments and suggestions. With most of the participants it was possible to chat about the tracking technologies and the Magic Book, and some were also interested in the results of the study.
For the user study only 4 scenes were needed, but most of the participants wanted to enjoy the whole fairy tale consisting of 8 scenes. A complete description of the scenes used for the cases and of the individual tasks, with the corresponding evaluation results, can be found in the appendix (see B.2). Figure 5.9 shows a participant during the user study.
Problems and difficulties<br />
Figure 5.9: A participant during the user study with the handheld visor<br />
As already mentioned, the availability of the demo room was one restriction on the execution and preparation of the study, so I had to set up the study environment several times and had to ensure almost the same conditions for every run. Another important issue, which was not considered in the study setup, is the different delays of the trackers. In order to synchronize both tracking data sets in the evaluation of the study, the states of both tracked objects at one point in time have to match, and estimating the tracking delays is a very difficult task. The delay of the texture tracking is mainly caused by the transport of video data from the camera into main memory and by the image processing routine. The Flock of Birds tracker first has to transfer the data from the receiver to the host system; then the data is transferred via network to the Shuttle PC. To consider both latencies correctly, delay measurements have to be made, using a reference tracking device whose delay is known. Shaw proposes an experiment to measure this delay [32]. Performing it is hard and time intensive, so we have to accept a small shift in our data measurements. Future setups, however, should put in the effort to measure the delay offset between the two trackers.
5.4 Summary<br />
In this chapter we described the design and conduct of the user study and discussed its difficulties. In the end we recorded two sets of data: the 2D feature point coordinates given by the texture tracking algorithm and the pose information given by the
Flock of Birds tracker. In addition, we obtained feedback from the user study participants. The further evaluation will focus on three different aspects. First we will have a look at the feature point tracking itself. As stated in chapter 3, a feature point moves along a path if it is tracked for several frames (see figure 3.1). We will try to discover patterns in the paths of the tracked feature points and draw some conclusions. These patterns can be used to explore the relationship with the handheld orientation (see the intersection between feature point coordinates and handheld orientation in figure 5.10).
We gave a categorization of tasks for table top AR, especially suited for the Magic Book. We will also try to evaluate whether properties of these feature point patterns give hints about the performed task (intersection between feature point coordinates and tasks).
Figure 5.10: Overview of further evaluations<br />
Still, only a few aspects of this evaluation can be discussed in this thesis, but some further ideas on how to continue this line of research will be provided and discussed in the following chapters.
CHAPTER 6<br />
Evaluation of the User Study<br />
We have now performed the user study and collected the desired data. Next we have to discuss how to evaluate and analyze the derived data sets: on the one hand the feature point coordinates, on the other hand the pose information of the handheld device. This chapter evaluates the retrieved data under different aspects; we can consider each data set alone or look for dependencies between them (see figure 5.10). The results will be presented and conclusions for table top AR will be drawn. In order to reach a full understanding of the relationship between tracking, user interface and user, additional ideas for future work will also be discussed.
6.1 Evaluation of the User Study<br />
In the last chapter we introduced the evaluation scheme, highlighting three different aspects of the user study:
1. Analyzing the feature point data sets
We will show that the tracking of feature points results in certain patterns in the logged data. We will use these patterns for our further evaluations.
2. Finding a linear mapping between 2D feature point coordinates and the orientation of the handheld device
This part describes an approach to find a correlation between the data sets of 2D feature point coordinates and the change of orientation given by the magnetic tracker. To do this, we have to refine the data in order to make it comparable, because so far we only have heterogeneous data: 2D coordinates and 3D orientations. The method used to find a linear mapping is called linear regression.
3. Finding a relationship between the performed task and properties of the tracked feature<br />
points.<br />
We will see that certain properties of the tracked feature points change with the task. This leads to a task-based approach, meaning that the search window size could also be adapted to the performed task.
The results from the questionnaire will be used to provide additional insight when we evaluate a certain topic. In addition to this approach, further ideas for evaluations will be discussed.
6.1.1 Feature Point Tracking<br />
In every feature point tracking frame, four feature points are tracked. Thus the logging data contains these four feature point coordinates (see 5.6), or '-1' if the tracking fails in a frame. First we want to have a look at the tracking failures.
Tracking Failures<br />
Tracking failures are caused by fast or unsuitable movements, such that feature points cannot be found again in the next frame. The user then has to reinitialize the tracking by looking at the square marker again. Table 6.1 compares the two free tasks (case 1 and case 3), where the user is allowed to use the Magic Book without any restrictions. Case 1 is performed prior to case 3; each case took 30 seconds, with case 2 performed in between. Hence, while performing case 2, the participants got additional "training" with the Magic Book.
        Mean Tracking Failure  Max  Min  Variance
Case 1  10.60                  35   0    103.62
Case 3  2.25                   13   0    8.30

Table 6.1: Tracking failures during cases 1 and 3: the table shows that the tracking failure in case 3 is lower and does not vary as much as in case 1
At first glance it is remarkable that the mean tracking failure in case 1 is almost five times higher than in case 3. The variance of the tracking failures also leads to the conclusion that the usage of the Magic Book in case 1 is more heterogeneous than in case 3. This first interesting observation could be caused by two different aspects:
• Different content leads to these values
This could be a reason for the differences in these values. But our experience was that the Magic Book content does not lead to a large mismatch in user behavior; the compositions of the virtual scenes are similar throughout the Magic Book fairy tale.
• Learning the usage of the Magic Book leads to these values
As we claimed in the introduction, users adapt to the underlying tracking technology. The tracking failure measurements give evidence for this assumption. The usage in case 1 is very heterogeneous, but users learn which movements lead to tracking failures and apply this knowledge in case 3.
This leads to our first insight:
Result: Tracking Failure
Users adapt to tracking. The comparison of tracking failures shows that these failures decrease with more practice. This important aspect could be considered in our gyroscope runtime setup: in the beginning, an offset is always added to the search window size, which takes the learning aspect into account.
One possibility to figure out whether the Magic Book content influences the user behavior to a high degree would have been to perform another free task with different content and compare it with case 3. However, it is assumed that there would not be a big difference, due to the fact that the arrangement of the 3D scenes is very similar.
It is also interesting whether test persons who considered themselves very familiar with the Magic Book cause fewer tracking failures than non-experienced users. We split the participants into two groups: the non-experienced group, assessing themselves from 1 to 3 on the questionnaire scale, and the experienced group (4-5). The following table shows the results:
        Experienced  Non-Experienced
Case 1  8.70         12.50
Case 3  2.60         2.50

Table 6.2: Mean tracking failure of the experienced and non-experienced group
The figures show that there is no serious difference between these groups. However, the data has to be considered with caution, because every group contains outliers with failures of up to 28, and more persons would have to be tested in order to obtain a reliable result. My observation was that even experienced test persons first had to get used to the Magic Book again.
Connected Chains Pattern<br />
As we have already discussed in chapter 3, we want to analyze the resulting paths of feature points tracked over several frames. This is easily done by comparing the logged IDs of the feature points: if the IDs of feature points match in two consecutive tracking frames, we can calculate the offset between the 2D coordinates. If a feature point is tracked over more than one frame, we call the resulting pattern a connected chain. The feature point in a connected chain is not necessarily logged at the same feature point position (1-4) in the logged data. For example, it is possible that the feature point with id x is logged at position FP1 in frame i and at position FP4 in frame i + 1. Therefore it is necessary to sort the logged data to collect these connected chains at one position. In figure 6.1 a 3D plot of the sorted data is shown; the purpose of this plot is to show what the connected chains look like. Each feature point position FP1-FP4 is colored differently. For the further evaluation we will only consider the feature points at position FP1 (red), because the sorting is done in such a way that the longest matches end up at position FP1. In this special case the logged data consists of only one chain (see figure 6.1). In figure 6.2 the data
set at FP1 consists of 27 connected chains. Both plots were taken from the same task with different test persons.
Figure 6.1: Example 1: 3D plot of the logged feature point coordinates: it shows the (x, y) coordinates in the video plane in the corresponding tracking frame. This data set of FP1 consists of only one connected chain (red chain). The time is measured in milliseconds
The corresponding 2D plots of both data sets can be seen in figure 6.3.
It is obvious that information about the movement of the handheld device can be derived from these connected chains. The first plot (6.1) leads to the conclusion that no heavy movement occurred, because the tracking results in only one connected chain. In the second figure (6.2), 27 connected chains occur. The 2D view of the video plane (6.3) underlines this, because the area covered by the tracked feature points in the second plot is obviously larger than in the first one.
An interesting number derived from these connected chains is the shift of a feature point between two tracking frames. This shift is the length l_{i,i+1} of the vector from the 2D position p_i = (x_i, y_i) in frame i to the position p_{i+1} = (x_{i+1}, y_{i+1}) in frame i + 1 (see figure 6.4). The length of the shift vector can be calculated easily:

l_{i,i+1} = √((x_{i+1} − x_i)² + (y_{i+1} − y_i)²)   (6.1)
Thus we now have two values for analyzing the connected chains: the lengths of the shift vectors and the number of connected chains in a data set. Note that the number of connected chains is a specific property of the ARToolKit feature point tracking algorithm.
Figure 6.2: Example 2: 3D plot of the connected chains. The data set at position FP1 (red) consists of 27 connected chains
Figure 6.3: 2D plots of the data sets: on the left the plot with one chain (example 1), on the right the plot with 27 chains (example 2)
Figure 6.4: Vector of the feature point position from frame i to frame i + 1
Result: Connected Chains
During the analysis of the feature point tracking data we discovered a pattern that we call connected chains. For further evaluations we will consider the lengths of the shift vectors.
The next step takes the logged orientation of the handheld display into account. This corresponds to the intersection of feature point tracking and handheld orientation in figure 5.10.
6.1.2 Feature Point Tracking and Tracking of the Handheld<br />
First we have to refine the logged data so that we can compare it. The idea is to use the shift of the feature points on the one hand and the change in orientation on the other. The shift can be measured by calculating the vector length from the coordinate offsets as described in 6.1. Note that we can only calculate the vector length within connected chains, because there the same feature point is tracked over several frames. In order to synchronize the data sets, the start and the end of the vector have to be annotated with the orientation valid at the corresponding time (see figure 6.5). The timestamps are used to perform this synchronization. The update rates of both trackers are constant: the magnetic tracker has an update rate of 90 fps (frames per second) and the vision-based tracker works with an update rate of 30 fps. Thus the magnetic tracker is about three times faster than the optical tracker.
As we discussed during the design of the user study, this step is not done in an optimal way, because the delay offset between the vision-based and the magnetic tracker is not considered. Now we have to calculate the change in orientation between the orientation at the beginning and at the end of the vector. Since we logged quaternions, we have to calculate the difference of the quaternions to get the angular offset. For every tracking frame i + 1 we can derive data pairs d_{i+1} with

d_{i+1} = d(x_{i+1}, y_{i+1}) = (∆(q_k, q_{k+n}), l_{i,i+1})   (6.2)
Figure 6.5: The beginning and the end of the shift vector are annotated with the orientation using the timestamps
Difference between two quaternion rotations
The calculation of the difference between two quaternion rotations follows Strasser's diploma thesis [53], which should be consulted for a deeper treatment. As a reminder, a quaternion q = (s, sin(θ/2) v̄) specifies a rotation of θ = 2 arccos(s) around the axis v̄:

q = (s, sin(θ/2) v̄) ⇒ θ = 2 arccos(s)   (6.3)
This leads to the result that we only have to consider the scalar value s of a quaternion if we want to derive the angle θ. The difference d between two quaternions p and q is calculated by multiplying the conjugate of the first quaternion, p* = (w, −v̄), with the second quaternion q = (w′, v̄′). This is also called the derivation of a quaternion:

d = p*q = (ww′ − v̄ · (−v̄′), v̄ × (−v̄′) + w(−v̄′) + w′v̄)   (6.4)

Note that · is the scalar product and × the vector product in R³. As we have seen in 6.3, we now only need to consider the scalar part of 6.4:

∆w = ww′ + xx′ + yy′ + zz′   (6.5)

with v̄ = (x, y, z) and v̄′ = (x′, y′, z′). Now we can compute ∆Θ:

∆Θ = 2 arccos(∆w) = 2 arccos(ww′ + xx′ + yy′ + zz′)   (6.6)

In order for the arccos function to yield the angle in the right quadrant, the absolute value of ∆w must be taken.
The angle between the two quaternions q_k and q_{k+n} can now be computed using equation 6.6:

∆Θ_{k,k+n} = 2 arccos(|w_k w_{k+n} + x_k x_{k+n} + y_k y_{k+n} + z_k z_{k+n}|)   (6.7)
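Equation 6.7 translates directly into code. The following sketch (a hypothetical helper, not the evaluation tool actually used) computes the angular offset between two unit quaternions given as (w, x, y, z) tuples:

```python
import math

def quat_angle(q1, q2):
    """Angular offset between two unit quaternions (w, x, y, z).

    Only the scalar part of the quaternion difference is needed
    (equation 6.5); the absolute value keeps arccos in the right
    quadrant (equation 6.7).
    """
    dot = sum(a * b for a, b in zip(q1, q2))
    # clamp against floating point noise slightly above 1.0
    return 2.0 * math.acos(min(1.0, abs(dot)))
```

For the identity quaternion and a rotation of 90° about the z-axis, this yields π/2.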
For all connected chains we compute the corresponding data pairs. A data pair consists of the angular offset of the handheld device between the start and the end of an optical tracking frame and the length of the shift vector:

d_{i+1} = d(x_{i+1}, y_{i+1}) = (∆Θ_{k,k+n}, l_{i,i+1})   (6.8)
The next step is to characterize the relationship between the vector length and angle measurements.<br />
Correlation and Regression of the Data Pairs<br />
As we have already discussed, we now want a measure of how the 3D orientation and the 2D feature point coordinates are related to each other. In our pre-considerations above we refined our data into comparable data pairs d_i, consisting of the angle between the quaternions and the length of the shift vector at the corresponding timestamps. The first question is how the data sets are related to each other. Then we want to characterize this relationship and develop a linear mapping. We will use two statistical techniques to evaluate the data pairs:

Correlation. The correlation measures to what degree two sets of data are related to each other. The correlation coefficient r can be calculated by any standard statistical tool and yields a value between -1 and 1. The idea is very simple: if we plotted all the data pairs d(x, y), with x on the x-axis and y on the y-axis, and all the points fell on one straight line, the correlation coefficient would become |r| = 1. This indicates a very strong relationship. Conversely, the correlation coefficient tends to 0 if the points are randomly spread. If the value of y increases with higher values of x, the coefficient is positive; otherwise it has a negative sign. Such plots are also called scatter plots.
Regression. The regression characterizes the relationship between two measurements. We will concentrate on linear regression only. Linear regression computes a linear model of the measurements in order to predict the dependent variable Y using the predictor variable X. The resulting function looks like this:

Y = a + bX   (6.9)

Linear regression estimates the values for a and b. In a plot, this function is represented by the straight line best suited to minimize the differences from the actual measurements; these differences are called residuals. If the relationship cannot be described by a linear model, a multiple regression has to be applied, which will not be discussed here.
An introduction to correlation and linear regression can be found in appendix A.2 of this thesis.
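Both techniques can be sketched in a few lines. The helper below is a hypothetical illustration (the actual evaluation used a standard statistical tool); it computes the least-squares line y = a + bx and the Pearson correlation coefficient r for a list of (x, y) data pairs:

```python
def linear_regression(pairs):
    """Least-squares fit y = a + b*x and Pearson correlation r
    for a list of (x, y) data pairs, e.g. (angle, shift length)."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    b = sxy / sxx                    # slope of the regression line
    a = my - b * mx                  # intercept
    r = sxy / (sxx * syy) ** 0.5     # correlation coefficient
    return a, b, r
```

For data pairs lying exactly on the line y = 2 + 3x, this returns a = 2, b = 3 and r = 1.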
We want to derive a linear model for our measurements, predicting the length of the shift vector (dependent variable L) using the angular offset (independent variable A). The linear regression calculates the values for a and b in this function:
L = a + bA (6.10)<br />
In order to reach an overall conclusion about the correlation and regression, we concatenated the data sets of all test persons for every test case. This increases the number of measurements and leads to a more accurate estimation of the linear model. The following example is taken from test case 2; it shows the measurements derived from the data of task 1. Figure 6.6 shows the measurements in a scatter plot with the angle A on the x-axis and the vector length L on the y-axis.
Figure 6.6: Case 2 / Task 1: The scattered plot with regression line<br />
The correlation analysis for this task yields a correlation coefficient of 0.6340, an indicator that the measurements are related. If we look at the plot, we see that the length of the shift vector increases with higher angles; thus the coefficient is positive. The regression analysis estimates a = 1.26631 and b = 194.822, leading to the following linear model:

L = f(A) = 1.26631 + 194.822 · A
Of course this regression function is only an estimation. The output value is the length of the shift vector, which is exactly the search window size needed in order not to produce a tracking failure. Thus the feature point lies somewhere on the circle with radius f(A) around the position of the feature point in the previous frame (figure 6.7).
If all the points for every input angle fell on the circle, the correlation coefficient would be r = 1 and we would have a perfect correlation. Now we have a look at the residuals. As stated briefly, the residuals are the differences between the linear model and
Figure 6.7: The feature point falls on a circle with radius r = f(A): points inside the circle correspond to residuals below the regression line, points outside the circle to residuals above the regression line
the actual measurements; they would cause errors for our search window configuration. Looking at figures 6.7 and 6.6, we see that residuals below the regression line will not result in tracking failures, because the search window with length l_window = 2f(A) is larger than actually needed in the current frame. The residuals above the regression line, however, would cause failures: they are located outside the search window. A possible solution to this problem is to add a specific offset to f(A). The regression line would then shift along the y-axis and more points would fall below the line. If all points should fall below the regression line, we can choose the maximum positive residual as offset:

length_i = f(angle_i) + max{ l_measured − f(A) | l_measured − f(A) > 0 }

But the adequate offset still has to be evaluated, which is not part of this thesis. The maximum residual used as offset could be an outlier, making the search window far too large in consideration of all the other measurements.
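Choosing the offset as the maximum positive residual, as in the equation above, can be sketched like this (a hypothetical helper; as noted, a single outlier can make the resulting window far too large):

```python
def search_window_offset(pairs, a, b):
    """Maximum positive residual over the regression line f(A) = a + b*A.

    `pairs` holds (angle, measured shift length) tuples; adding this
    offset to f(A) puts every measured shift below the shifted line.
    """
    residuals = [length - (a + b * angle) for angle, length in pairs]
    positive = [res for res in residuals if res > 0]
    return max(positive) if positive else 0.0
```

With measurements (0, 1), (1, 4), (2, 3) and the line f(A) = 1 + A, only the middle point lies above the line, so the offset becomes its residual of 2.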
Analyzing the different Tasks<br />
As a short recap, the following steps have to be performed in order to analyze all the data sets for the different tasks:
1. Extract the connected chains for each data set.
2. Calculate the shift vector lengths within the connected chains (6.1).
3. Synchronize each start and end of a shift vector with the corresponding quaternions (6.5).
4. Compute the differences between the quaternions in order to obtain data pairs d = (y, x) (6.7, 6.8).
5. Concatenate all the data pairs of one test person for one task, because every task results in several connected chains. As we have seen in example 2, we retrieved 27 connected chains; each chain has to be refined according to the previous steps, and all the data pairs have to be united.
6. Concatenate the data pairs of all the test persons for one task. As stated, we want an overview of all the data of a single task; therefore the data pairs of every test person are united.
7. Finally, for every task, the linear model is calculated (6.10) with a statistical tool.
The following table shows the estimates of the correlation and the linear regression for every test case and the corresponding tasks. The table lists the correlation coefficient r, the parameters a and b of the linear model, and the value h, which gives the angular rate of turn causing an offset of the search window size of 1; it is simply the solution of the equation ∆y_i = b∆x_i ⇒ 1 = b∆x_i, i.e. h = 1/b.
             r       a        b        h
Case 1
  Free Task  0.3239  1.56895  144.624  0.0069
Case 2
  Task 1     0.6340  1.26631  194.822  0.0051
  Task 2     0.5511  1.45847  177.196  0.0056
  Task 3     0.5296  1.78150  193.941  0.0051
  Task 4     0.5949  1.76134  220.166  0.0045
  Task 5     0.6254  1.34886  219.083  0.0046
Case 3
  Free Task  0.4642  1.55991  127.487  0.0078
Case 4
  Task 1     0.5682  1.16379  190.290  0.0052
  Task 2     0.4934  1.48687  131.274  0.0076
  Task 3     0.4366  1.74120  130.608  0.0076
  Task 4     0.5216  1.49365  197.812  0.0051

Table 6.3: Results of the linear regression with the correlation coefficient r, the parameters a and b of the linear model, and the value h, which gives the angular rate of turn causing an offset of the search window size of 1.
The statistical tool also computes a p-value. This p-value is the probability of making a mistake if we reject the assumption that the angle and the shift vector are not related at all (the null hypothesis H0). In our linear model this assumption means that the b parameter is zero (H0: b = 0), i.e. the angle does not influence the length of the shift vector at all. The p-value for all the tasks is 0.00001: only with a probability of 0.00001 would we make a mistake by rejecting the hypothesis that these data sets are unrelated. In other words, it is highly significant that our data sets are related; the angle influences the vector length. Please see the appendix for further explanations (A.2). This matches our expectations exactly.
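The significance test for H0: b = 0 can also be illustrated without a statistics package by a permutation test: shuffle the measured shift lengths relative to the angles, refit the slope, and count how often a slope at least as large appears by chance. This is only a sketch of the idea (the evaluation itself used a standard statistical tool, and the helper names are hypothetical):

```python
import random

def slope(pairs):
    """Least-squares slope b of y = a + b*x for (x, y) data pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    return sxy / sxx

def permutation_p_value(pairs, trials=1000, seed=1):
    """Approximate p-value for H0: b = 0 by shuffling the y-values."""
    rng = random.Random(seed)
    observed = abs(slope(pairs))
    ys = [y for _, y in pairs]
    hits = 0
    for _ in range(trials):
        rng.shuffle(ys)
        shuffled = [(x, y) for (x, _), y in zip(pairs, ys)]
        if abs(slope(shuffled)) >= observed:
            hits += 1
    return hits / trials
```

For strongly related data the shuffled slopes almost never reach the observed one, so the estimated p-value is close to zero.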
Looking at the data, we see a positive correlation r > 0 in every case. The coefficient in case 1, r = 0.3239, is very low compared to the other cases. A reason could be that the usage of the Magic Book in the beginning is not homogeneous; our first result, that users adapt to tracking, underlines this. If we exclude case 1, r ranges from 0.44 to 0.63, which indicates a relation. Possible factors influencing the correlation coefficient should be taken into account:
• Delay of the trackers
In our setup we have not considered the tracking delays of the two tracking technologies. We do not have exact numbers for the delay of each tracker, and the effort for a measurement would be too high for this thesis. An idea to compensate for this is to shift the data sets against each other; the shift maximizing r has to be found.
• Change in position
It still has to be analyzed to what degree the change of position has to be taken into account, because movements in position occur as well. Thus the position data of the handheld device has to be explored in further evaluations.
Summarizing, all these aspects lead to the following result:
Result: Correlation
The change in orientation and the tracking of feature points are significantly related. The correlation coefficient r is positive, ranging from 0.44 to 0.63.
Let us now look at the values computed for the linear models. The value of parameter a ranges from 1.27 to 1.78. Rounding it up to the next integer gives:

⌈a⌉ = 2   (6.11)
This value for a is valid for all cases. The parameter b, on the other hand, ranges from 127.49 to 220.17 and determines the slope of the regression line. If we want to derive a global configuration for all cases and the corresponding tasks, we have to take the maximum value for b. In the next section we will discuss whether we can find a relationship between the tracking data and the performed task; the linear model could then be adapted to the task. Taking the maximum value for a global configuration, the linear model looks like this:

y = 2 + 220.166x   (6.12)
To calculate the search area size l², we still have to add an offset k to address the problem of positive residuals. This offset has to be evaluated in further research; therefore the following equation is only a rough estimation:

l² = (2⌈2 + 220.166x + k⌉)²   (6.13)
Result: Regression
We can derive a linear mapping f(x) with the angular difference as input parameter. Thus we can predict the shift of the tracked feature point and therefore the necessary search window size. This mapping is only an approximation, and therefore an additional offset has to be added to the search window size.
The scatter plots and the output data for each test case can be examined in the corresponding appendix section A.2. In the following section we try to draw conclusions about the relationship between tracking and tasks.
6.1.3 Feature Point Tracking and Tasks<br />
First, let us have a look at the overview of our evaluation again (figure 6.8).
Figure 6.8: Overview of further evaluations<br />
In the last section we described the relationship between the feature point tracking and the handheld orientation. We now want to obtain additional information about the relationship between feature point tracking and tasks (intersection of feature point coordinates and tasks in 6.8). Additional feedback on the tasks was collected from the questionnaire handed out to the test persons. We asked the test persons whether they thought the tasks were easy to accomplish. This was done just to get a feeling for the tasks, because we wanted to give easy tasks to the participants. We collected feedback for the free tasks as well as for the other "navigation" tasks (see figure 6.9). The figure shows that most of the test persons agree that all the tasks were easy. The free task was obviously easier than the navigation tasks, due to the fact that no restrictions were imposed.
We will use information that we have already derived (section 6.1.2). The following table (6.4) shows the correlation coefficient, the mean shift vector length and the mean number of connected chains for every task. With this information we will try to draw conclusions about the performed task.
Figure 6.9: Results of the feedback to the question ”The task was easy to perform” (1=strongly disagree,<br />
5=strongly agree)<br />
             r       ¯s     var(¯s)  ¯t      ¯c/time
Case 1
  Free Task  0.3230  2.34   0.60    30      1.55
Case 2
  Task 1     0.6340  1.91   0.72    10.36   1.47
  Task 2     0.5511  2.33   0.34    10.97   2.26
  Task 3     0.5296  2.79   1.26     7.89   2.99
  Task 4     0.5949  2.93   1.16    12.77   2.68
  Task 5     0.6254  2.13   1.11    12.40   1.80
Case 3
  Free Task  0.4642  2.33   0.61    30      2.05
Case 4
  Task 1     0.5682  1.97   0.80     9.25   1.76
  Task 2     0.4934  2.07   0.45    18.39   2.15
  Task 3     0.4366  2.51   0.70    20.91   2.37
  Task 4     0.5216  2.39   0.72    12.33   1.83

Table 6.4: Results of the feature point tracking evaluation: r is the correlation coefficient, ¯s is the mean length of the shift vector, var(¯s) is the variance of the shift vector length, ¯t is the mean time needed to perform the task and ¯c/time is the number of connected chains in relation to the needed time
In the further evaluation we will not consider the free tasks, because it is hard to infer a common behavior in this case. At first glance we can see that the mean shift vector length ¯s is nearly equal in case 1 and case 3. The idea of the tasks was to make the user perform specific actions. We will now have a look at the values in the table and draw conclusions about the performed actions. A full overview of the tasks can be found in appendix B.2.
First we try to separate the tasks into three groups: one group g1 with a rather low shift vector length ¯s < 2.20, another group g2 with a high value ¯s > 2.50, and a third group g3 formed by the rest. We will use an abbreviation for every task: Case 2 Task 1 becomes C2T1, for example.
• Group g1
g1 = {C2T1, C2T5, C4T1, C4T2}
g2 = {C2T3, C2T4, C4T3}
g3 = {C2T2, C4T4}    (6.14)
The tasks in g1 asked the user to count features in the scene, like people or items. These objects were obvious to the user and distributed all over the scene. Thus the user can view the scene from a certain distance, with the whole scene in his viewpoint, to accomplish the tasks. These are the overview tasks. Because the user wants to see the whole scene, movements of the handheld visor will result in small pixel offsets. Another indicator underlining this is the number of chains: the shift vector length is small, and therefore it is more likely that a feature point is tracked again in the next frame. This results in fewer connected chains. Except for C4T2 these tasks have very good values for the correlation coefficient r, thus the linear model for those tasks is more reliable. The time to perform C4T2 is higher than for the other tasks in g1, which might hint that the task is more "difficult" than the others in this group. In general it took more time to perform C4T2 and C4T3, and their coefficients r are low compared to the other tasks. But we definitely found characteristics for overview tasks using the numbers in the table.
• Group g2
The values for ¯s, the mean length of the shift vector, are high. As we have said, if the camera is close to the 2D surface, small movements will cause a large pixel offset. And indeed all of the tasks in this group were designed to make the user focus on a specific feature. Questions like "What is the eye color of the man?" or "What is the color of her shoes?" were posed. Thus the user has to "zoom" into the scene, which brings the camera closer to the surface. We can characterize the focus task by this behavior. The question for task C2T4 was "How many people wear a headdress?". With our definition of tasks it is rather a detail task than a focus task, more a combination of overview and focus task. But from my observations during the study I could see that the test persons had to focus on a certain person to figure out the headdress. Also the number of chains is relatively high in these tasks.
• Group g3<br />
According to our statements for the other groups we can expect this group to have an in-between property, because the values for ¯s lie in the middle range. This property again is the distance of the camera to the 2D plane. This leads to the conclusion that the distance between the camera and the surface for this group of tasks lies between the overview (large distance) and the focus (short distance) task. Let us have a look at the questions: both questions asked about certain features again. In contrast to the features in the focus task, it is not necessary to zoom into the scene to a high degree. But on the other hand these questions could not be answered just by having an overview of the scene. The number of chains is in the middle range as well.
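The grouping above can be reproduced directly from the ¯s column of table 6.4. The following sketch (values transcribed from the table; all names are ours, not from the thesis) partitions the tasks by the two thresholds:

```python
# Mean shift vector lengths from Table 6.4 (pixels per frame).
MEAN_SHIFT = {
    "C2T1": 1.91, "C2T2": 2.33, "C2T3": 2.79, "C2T4": 2.93,
    "C2T5": 2.13, "C4T1": 1.97, "C4T2": 2.07, "C4T3": 2.51,
    "C4T4": 2.39,
}

def partition(shifts):
    """Split tasks into the three groups used in the text:
    g1 (low, s < 2.20), g2 (high, s > 2.50), g3 (the rest)."""
    g1 = {t for t, s in shifts.items() if s < 2.20}
    g2 = {t for t, s in shifts.items() if s > 2.50}
    g3 = set(shifts) - g1 - g2
    return g1, g2, g3
```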
Because the number of chains is a specific property of the ARToolkit feature point tracking, we use the shift vector length as an indicator. Using this number we can distinguish between our tasks. Here is a summary of the characteristics of each task:
• Overview Task
The user places the camera at a distance that allows him to have the whole 3D scene, or everything else important for the table top application, in his viewpoint. Small movements of the camera result in small pixel offsets (short shift vectors). Here the best results for the linear model can be achieved.
• Detail Task
We have to refine our definition of detail tasks. We said that it is a combination of overview and focus task. Thus the user has to zoom into the scene from time to time to explore certain features, but does not need to zoom in very close to perform the task. Being closer to the plane causes a larger pixel offset compared to the overview task.
• Focus Task
To accomplish this task the user has to zoom close to the 2D plane. Thus even small movements will result in large pixel offsets.
Result: Categorization of tasks<br />
We were able to find a categorization for the proposed tasks and underline this with<br />
the corresponding evaluations of the user study.<br />
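Expressed as a rough runtime heuristic, the categorization could look like the following sketch. The thresholds are the ones derived above; the function name is hypothetical:

```python
def classify_task(mean_shift):
    """Classify a task from the mean shift vector length
    (pixels per frame) using the thresholds from the study."""
    if mean_shift < 2.20:
        return "overview"   # camera far from the 2D plane
    if mean_shift > 2.50:
        return "focus"      # camera zoomed close to the plane
    return "detail"         # in-between distance
```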
For all these results the position data of the handheld device could be taken into account as well. Due to time limitations I was not able to perform a deeper evaluation of the recorded data.
6.1.4 Further Evaluations<br />
The questionnaire also collected feedback on the participants' experience concerning tracking and the user interface. This feedback is not used for additional evaluations here, because the focus was on analyzing the recorded data, but it was a good opportunity to gather this information as well. The questions were structured into the usability of the Magic Book, the awareness of jittering and delay of the vision-based tracker, and the handheld device user interface. One interesting thing I noticed while looking at the tracking questions was that the participants were more aware of jittering than of tracking delay. The test persons were basically satisfied with the usability of the Magic Book, but there was a divergence in the answers to the question whether the handheld device is a "very suited device for interacting with the Magic Book". It would be interesting to evaluate this question with other graphical output devices. All data from the questionnaire can be seen in appendix B.1.
In the next chapter we summarize all the results of the user study, draw conclusions from our user study based evaluation, and discuss implications for future work.
CHAPTER 7<br />
Conclusions<br />
This chapter summarizes the experiences and results of the thesis concerning table top AR.<br />
It also reviews the approach of finding a mapping between user movement and tracking<br />
parameters. Ideas and implications for future work will be shown.<br />
7.1 Results<br />
In this section we will summarize the results obtained from the evaluation of the user study. First let us have a look at our motivation for the user study again: our idea was to characterize the relationship between feature point tracking and user behavior. The system was used by moving the video camera attached to the handheld device. This information should be used to configure our runtime setup, consisting of a combination of a gyroscope and an optical tracker. The orientation of the gyroscope is mapped to the search window size of the tracking routine. In previous setups the search window size is a constant parameter; we want to alter it during runtime according to the change in relative orientation measured by the gyroscope, which runs at a higher frame rate than the optical tracker.
Due to the fact that the texture tracking of the ARToolkit is a special technique for natural feature tracking, we will shortly discuss the relevance for other natural feature tracking algorithms. Further on we will try to generalize our results and ideas for table top AR applications.
7.1.1 Results of the User Study<br />
The question was: "To what degree do changes in orientation influence the pixel offset from frame to frame in the tracking routine?". To answer it we collected data of the handheld movement (magnetic tracker) and the feature point coordinates in the 2D video plane. The evaluation of the user study led to the following results:
1. Relative orientation and feature point offset between two frames are related
We applied a statistical technique called correlation analysis. The correlation coefficient r indicates to what degree two measurements are related. Analyzing the tasks we derived values for r from 0.45 up to 0.63, which indicates a moderate relationship between the two data sets. This result allows us to consider the relative orientation for our runtime setup.
2. We can derive a linear model for this dependency
Using a regression analysis we derived approximations for linear relationships between both measurements X and Y of the form:
Y = a + bX + offset    (7.1)
The linear regression estimates values for a and b. The values differ from case to case, but we can obtain a rough approximation for a global setup. Future work is to optimize this model for each case. Also a suitable value for the constant offset has to be estimated. The output of the regression analysis also indicates a high significance for the dependency between the two data sets.
3. Tasks
We introduced a categorization for tasks, and indeed the tracking results differ in the mean pixel offset for each task group. We proposed that the categorization depends on the usage of the handheld and thus the camera. If the user holds the camera far away from the 2D surface, he wants to have an overview. If he zooms more into the scene but is still not concentrating on a single feature, we speak of a detail task. If the user zooms into the scene to examine a single feature, we speak of a focus task.
4. Tracking Failures
We showed that tracking failures are significantly more frequent when a user starts to use the application; thus users adapt to the tracking technology. This can be considered in the runtime configuration as well: we enlarge the search window for a specific time and reduce it stepwise during usage. But more evaluations are necessary to explore this behavior.
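The two statistical steps of results 1 and 2 can be sketched with a small least-squares routine. This is a generic textbook computation, not the exact tool used in the evaluation, and `linear_fit` is a hypothetical name:

```python
import math
import statistics

def linear_fit(x, y):
    """Least-squares estimates a, b for the model Y = a + bX and
    the correlation coefficient r between the two data sets."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    b = sxy / sxx
    a = my - b * mx
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r
```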
The results give concrete steps towards finding a good configuration for our runtime setup. We proposed a DWARF-based architecture for a dynamic configuration of the search window during runtime. The software architecture provided in chapter 4 has to be extended with the linear mapping. A test environment should enable the developer to test the parameters for the linear model; thus the offset of the linear model can be estimated through testing as well.
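A sketch of how such a dynamically configured component might set the search window before each optical frame, combining the linear model (result 2) with the startup enlargement (result 4). All parameter values and names here are illustrative placeholders, not values established by the thesis:

```python
import math

class SearchWindowController:
    """Sets the search window side length before each optical
    tracking frame from the gyroscope's relative orientation,
    enlarged at startup and reduced stepwise as the user adapts."""

    def __init__(self, a=2.0, b=220.166, offset=2.0,
                 startup_bonus=10, decay=1):
        self.a, self.b, self.offset = a, b, offset
        self.bonus = startup_bonus  # extra pixels while the user adapts
        self.decay = decay          # stepwise reduction per frame

    def window_side(self, angle_delta):
        side = 2 * math.ceil(self.a + self.b * angle_delta + self.offset)
        side += self.bonus
        self.bonus = max(0, self.bonus - self.decay)
        return side
```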
Potentials for further evaluation<br />
If we have a look at our evaluation overview again (6.8), we can see that we have not yet discussed the intersection of the tasks and the movement information of the handheld device. This aspect can be evaluated as well. Since we said that the tasks can be distinguished by the distance from the 2D plane, the recorded position data could be used to underline this. Also, we have not compared the data across test persons, or of a single test person across all the tasks. Our first goal was to look at a global configuration. If the usage differs from user to user, a possible idea would be to estimate user profiles for the runtime configuration.
Another important issue is that we still do not predict the search window size; we only found a relationship between the orientation and the feature point tracking. Thus an adequate mechanism to predict the window size has to be evaluated. The higher update rate of the gyroscope should be utilized, and the window size has to be set before the next tracking frame of the optical tracker starts.
Next we will have a look at how we can use the information gathered in the user study for natural feature tracking and table top AR in general.
7.1.2 Natural Feature Tracking<br />
One thing that all natural feature tracking algorithms have in common is that certain feature points are tracked. The texture tracking of the ARToolkit is a special case, because it only enables the tracking of preprocessed 2D textures. In the related work section 3.5 we considered other algorithms as well. The basic assumption for all algorithms where our idea can be applied is that the inter-frame displacement is small. Our movement model is very simple: it expects the feature point to be almost at the same position. This assumption is realistic in table top AR because the field of interest is restricted to the horizontal table top setup. We also discussed other techniques, like the tracking of whole regions, to address heavier motions. But it has to be evaluated whether our approach is suited for these algorithms as well. The better alternative to allow larger movements with a hybrid tracking setup would be to predict the camera movement from the gyroscope data. We also showed examples for such setups.
7.1.3 Table Top Augmented Reality
In our special case we used the Magic Book, which is a table top application where the user does not have to solve a certain task. The setup allows the user to move around freely and explore the 3D virtual scenes without any restrictions. In the user study we restricted the usage by asking the test participants to fulfill certain tasks, forcing the user to use the Magic Book in a certain way. With this method we achieved a categorization of tasks suited for the Magic Book. As we figured out, these tasks are related to different movements influencing the optical tracking. Still we have to validate whether these tasks are also suited for other applications. We have also discussed the huge potential of table top AR, which has to be addressed by future research.
Let us look at another table top application. An ISMAR 2004 demo introduced a chess game combining Augmented and Virtual Reality [15]. One player sits in front of his chess board wearing a graphical output device. He is able to move his own tangible chessmen, while the chessmen of the opponent are virtual objects. If a natural feature tracking technique were used to align the virtual chessmen on the chessboard, for example by tracking the edges of the chessboard, we could apply the results from our user study. Since the user probably wants an overview of the whole scene most of the time, we can apply the overview task configuration. If the chessmen are nicely animated, the user might have a closer look at the pawns in the game; thus we can switch to the detail task configuration. We can assume that every application has a nature of usage known by the developer. This information can be derived from studies and used for a configuration. Of course it is more difficult if a user does not have to accomplish a measurable goal, as in the Magic Book. If we consider an augmented exhibition, for example, it might not be obvious where a user looks first or whether he will zoom closer to the object. Later on we will also discuss how virtual content might trigger actions by the user.
7.1.4 Assessment of our Approach<br />
In comparison to the related work described in 3.5, our approach was not to improve the tracking itself by using hybrid tracking technologies, but to improve a specific class of applications by considering the user context. In our case the user context was the usage of the handheld device. This usage is estimated by interpreting the relative orientation given by an additional gyroscope tracker. The main part of the thesis was to design, conduct and evaluate a user study to explore a relationship between both tracking measurements. During the evaluation phase we found that the movement context and the behavior of the tracking routine are related. It makes sense to consider this movement context in a runtime environment provided by the proposed architecture. As we have also discussed, the delay of the underlying tracking techniques has to be measured and accounted for in further studies; this will lead to better results. In the research area of human-computer interaction a lot of further studies have to be made to understand users better. We provided some suitable ideas which can be taken into account for future evaluations.
In my opinion the issue of how humans interact with computer systems will become more and more important with the evolution of new interaction methods used by AR applications. On the other hand, humans have always adapted to new technologies and are used to learning how new technologies work. Thus it is also a question to what degree we can expect people to learn new interaction techniques.
7.2 Future Work<br />
In this work we tried to characterize the user behavior for a specific application class. Still, a lot of factors influencing the behavior have been left unconsidered. Further on we will have a look at a technique to make the user perform actions.
7.2.1 Factors for User Behavior<br />
First we will introduce the most important factors influencing the user behavior:
• Hardware for graphical output
For our studies this factor was fixed to the handheld display. Other hardware for user interfaces might be used in a totally different way. Potential future research is to repeat the study with a head mounted display and a tablet PC as well. The results have to be compared and the linear models have to be adapted to the corresponding user interface hardware.
• Tasks<br />
In our work we tried to categorize tasks for table top AR. We varied these tasks and found out that the logging data varies between them. These tasks have to be validated, or possibly rejected, for other applications as well.
• Virtual Content<br />
This parameter was also almost fixed during the study. Of course the scenes change from page to page, but the content was comparable. This content could be varied as well: shapes, animations and textures could influence the user behavior by catching the user's attention. So experiments could be made with altered properties of the virtual scene in each test condition.
• User<br />
These are factors depending on each individual user, such as psychological factors or the user's knowledge. It is a rather difficult task to measure these factors, and performing such studies is an interdisciplinary task between computer scientists and psychologists. Our study led to the result that a user learns about the tracking during usage, for example. These factors can be varied and will probably lead to different results.
7.2.2 Visual Cues<br />
The tasks used in the Magic Book user study were constructed artificially to provoke certain behavior by the user. It is not realistic that a user uses the Magic Book and performs these tasks if he is not asked to accomplish them. Thus we do not know what the user will do, which we would need in order to provide a configuration for the runtime setup. An idea would be to trigger actions by changing the virtual content. This leads to another interesting question: "How can virtual content trigger actions by the user?". If it is possible to use these so-called visual cues, we can predict actions by the user. Usually visual cues are used to give the user additional graphical feedback or hints. For example, an AR kitchen project developed at the MIT provides additional visual cues to the user in order to improve performance; it highlights the drawer containing a needed ingredient, for example [12]. In contrast, we want to use them as a mechanism to initiate actions by the user. In the chess application mentioned earlier, the system highlights a chessman if the user has to take it out of the game [15]. Thus we expect that the user will take the chessman out of the game. Another idea is to animate a chessman when it is moved: it is more likely that the user will focus on this animation than on other features of the scene. Again it will be more difficult with museum exhibition applications, but there as well, chains of actions can be triggered by changing the content.
All of these ideas have to be supported by corresponding studies.
7.2.3 Next Steps<br />
The following list gives a rough guideline for further steps in this research.
1. User Study
The user study should be repeated with the following modifications. First, the tracking delay of both trackers must be considered. Then some of the factors described above can be varied: in a first test condition the study is conducted with the handheld device again, in a second condition with a tablet PC, and the resulting evaluation data are compared. The content could also be changed in a second study: the first condition evaluates the fairy tale Magic Book, while the second condition uses different content. Experiments with visual cues could be made as well. The evaluation methods described in this work could be applied.
2. Extend DWARF Architecture
The DWARF component still has to be fed with a configuration. At this stage the architecture only provides the communication mechanisms; the configuration for the linear model could easily be integrated. A good idea would be to provide a test environment where the parameters of the linear model (7.1) can be altered. Another interesting idea is that the linear model adapts to the occurring tracking failures.
3. Validate Results
Finally it has to be shown that the dynamically configured Magic Book really leads to better performance in terms of robustness and computation. A possible method would be another user study with two test conditions, one without dynamic configuration and one with the gyroscope setup.
Finally, I hope that I could provide ideas and encouragement to continue parts of this work.
Glossary<br />
<strong>Augmented</strong> <strong>Reality</strong>. The goal of <strong>Augmented</strong> <strong>Reality</strong> is to enrich the real world by overlaying<br />
it with virtual information.<br />
Calibration. Calibration is the task of defining and configuring parameters that stay constant<br />
during a −→ tracking task. Especially the integration of different coordinate systems<br />
into a world coordinate system is meant by this term in this thesis.<br />
Correlation. The correlation coefficient of two data sets indicates the relationship between<br />
both measurements. The correlation coefficient r ranges from -1 to 1. |r| ≈ 1 indicates<br />
a strong relationship, |r| ≈ 0 indicates a weak relationship.<br />
DOF. Degrees of freedom determines the state measurement ability of a tracking technology<br />
in a three dimensional environment. 3DOF determines position or orientation of an<br />
object, while 6DOF determines both (see −→ Pose).<br />
DWARF. The Distributed Wearable <strong>Augmented</strong> <strong>Reality</strong> Framework is a component-based<br />
framework enabling fast prototyping for AR applications. These components are<br />
reusable and distributed. A Corba-based infrastructure provides communication<br />
mechanisms for the components.<br />
Human Computer Interaction. Human Computer Interaction is a discipline of computer research focusing on the design, evaluation and implementation of interactive computer systems. Thus the interaction between humans and computers is an important issue.
Immersion. Immersion is a measure of the degree to which a user is affected by a virtual or augmented experience. Psychological factors as well as properties of the setup influence the immersion.
Inertial Tracking. Based on the law of inertia accelerometers estimate the position of an object.<br />
Important for many applications are gyroscopes measuring the relative orientation<br />
of objects. Relative orientation means that we estimate the rate of turn, not<br />
absolute angles in a world coordinate frame.<br />
Linear Regression. Linear Regression is the estimation of a linear model for two variables X<br />
and Y , while X is the independent predictor and Y the dependent prediction. The<br />
linear model has the form Y = a + bX and values for a and b are approximated.<br />
Natural Feature Tracking. Natural Feature Tracking is an optical or −→ vision-based tracking method. Features from the environment are extracted from the video image; in the next video frame these features have to be found again. By computing the correspondences between the 2D feature points in the video frame and the 3D objects, the camera pose can be estimated.
Pose. Pose is a data structure containing spatial information of an object. The spatial information<br />
of an object consists of position and orientation.<br />
Registration. Registration is the problem of aligning virtual objects in the real environment. Two tasks are important for registration: the −→ calibration of the setup and the tracking. This is an important issue for the quality of an −→ Augmented Reality application.
Table Top. Table Top Augmented Reality is a special class of −→ Augmented Reality applications. It can be characterized by a horizontal setup with a restricted interaction area. Main application domains are exhibitions, education, gaming and collaboration.
Tasks. A task is the purpose behind the use of a computer application. The user has to accomplish a certain goal, usually motivated by the user himself. In our case we describe the task to the user.
Texture Tracking. In our context texture tracking is a special case of natural feature tracking. Two-dimensional textures can be used for the tracking of feature points. In the ARToolkit version these textures have to be preprocessed first.
Tracking. Tracking is used for the estimation of an object's state. Tracking is a loop process consisting of the estimation and the update of the target's state. A tracker is the underlying technology responsible for the −→ pose estimation.
Vision-based Tracking. The term optical −→ tracking is also used for this tracking technology, which consists of hardware grabbing video frames and image recognition software analyzing the images and providing −→ pose information.
APPENDIX A
User Study
A.1 Conduction of the User Study
A.1.1 Questionnaire
Here is a short overview of the rationale behind the questions posed in the questionnaire (see figure A.1). All the questions except part 'A' could be answered on a scale of 1 to 5. In part 'B' the scale options were 1='never heard of it' and 5='experienced developer'; in parts 'C' to 'E' the scale options were 1='strongly disagree' and 5='strongly agree'. The five-point scale seemed adequate for us, because the focus of the work was not on the questionnaire and it provides suitable feedback from the participants. The results can be explored in section B.1.
A Personal details of participants
These questions concerning age, occupation, gender and handedness were posed to get an overview of the participating test persons. Thus it can be evaluated whether the group of participants is suited for the study. For future research the test persons could be split into groups and the data sets compared.
B Background on Augmented Reality
This part of the questionnaire collects information about the previous knowledge of the test persons. Questions were posed about knowledge of Augmented Reality in general and of the Magic Book.
C Behavior of the Magic Book
The intention of these questions was to collect feedback on the tracking and the usability of the Magic Book. Question C1 focused on usability, C2 on the awareness of tracking jitter and C3 on the tracking delay. The intention of this part was just to get feedback about these issues from the test persons; it was not used for further evaluation.
D Tasks<br />
The question "The task was easy to perform" should give information on whether the test persons had any problems performing the different tasks.
E Handheld device
Here feedback on the handheld device was gathered. This information can be used for future evaluations with other user interfaces.
F Comments and encouragements<br />
A.1.2 Instructions and Guideline<br />
The instructions to the user can be seen in figure A.2. Figure A.3 shows the guideline for the<br />
study.<br />
QUESTIONNAIRE FOR MAGIC BOOK - USER STUDY<br />
A. Personal Details<br />
Age: ______________ female O lefthanded O<br />
Occupation: ______________ male O righthanded O<br />
B. Background on AR<br />
How familiar am I with Augmented Reality?<br />
1 (never heard of it) 2 3 4 5 (experienced developer)<br />
How familiar am I with the Magic Book?<br />
1 (never used it) 2 3 4 5 (familiar with technologies used by the Magic<br />
Book)<br />
C. Behaviour of the Magic Book<br />
It was easy to use the Magic Book.<br />
1 (strongly disagree) 2 3 4 5 (strongly agree)<br />
The scenes on the page were always clear and stable<br />
1 (strongly disagree) 2 3 4 5 (strongly agree)<br />
The scenes in the book responded to my movements immediately.<br />
1 (strongly disagree) 2 3 4 5 (strongly agree)<br />
D. Tasks<br />
The free task was easy to perform.<br />
1 (strongly disagree) 2 3 4 5 (strongly agree)<br />
The navigation task was easy to perform.<br />
1 (strongly disagree) 2 3 4 5 (strongly agree)<br />
E. Handheld device<br />
The Handheld device is a very suitable device for interacting with the Magic Book.<br />
1 (strongly disagree) 2 3 4 5 (strongly agree)<br />
F. Comments on the Magic Book, UI, Tasks, etc. (voluntarily)<br />
Date: ID: Thank you very much! ☺<br />
Figure A.1: Questionnaire<br />
INSTRUCTIONS FOR MAGIC BOOK - USER STUDY<br />
1. The study will take 10 minutes<br />
2. Records are kept anonymous<br />
3. In this user study we try to evaluate which actions occur when people<br />
are using the Magic Book.<br />
4. Usage:<br />
a. Just use the Handheld device and look at the marker first to obtain<br />
an initial orientation<br />
b. Now you can move freely through the scene<br />
c. If you lose the scene just look on the marker again<br />
5. Steps:<br />
a. You first have a chance to do some training with the Magic Book<br />
b. In the free task you can move around as you like. The goal is to<br />
understand the scene (30 seconds).<br />
c. In the navigation task I will ask some questions concerning the 3D<br />
scene, like “how many trees do you see? (until finished)”<br />
d. 2 free tasks and 2 navigation tasks<br />
e. The other scenes are for your pleasure<br />
6. There is no wrong answer to my questions<br />
7. Data will only be used for the analysis of your movements of the<br />
handheld device<br />
8. Feel free to ask questions any time<br />
9. Comments are very welcome<br />
10. Thank you!<br />
Figure A.2: Instructions to the user study participant<br />
GUIDELINE FOR MAGIC BOOK - USER STUDY<br />
1. Welcome participant<br />
2. Guide him through the study (see instructions)<br />
3. Perform tasks (see below)<br />
4. Questionnaire<br />
5. Ask for comments<br />
6. Give contact details in case of further questions<br />
Scenes (Pages):<br />
1. Practice<br />
2. CASE 1: Free Task<br />
a. 30 seconds<br />
3. for your pleasure (voluntary)<br />
4. CASE 2:<br />
(PRESS T WHEN TASK BEGINS)<br />
a. Task 1: How many people do you see in this scene (7)?<br />
b. Task 2: What is the haircolour of the woman with the white skirt (blond)?<br />
c. Task 3: What is the colour of her shoes (blue)?<br />
d. Task 4: How many people wear a headdress (6)?<br />
e. Task 5: How many pieces of wood do you see (4, 5-6)?<br />
5. for your pleasure (voluntary)<br />
6. for your pleasure (voluntary)<br />
7. CASE 3: Free Task<br />
a. 30 seconds<br />
8. CASE 4:<br />
(PRESS T WHEN TASK BEGINS)<br />
a. Task 1: How many people do you see in this scene (7)?<br />
b. Task 2: How many women and how many men do you see (2/5)?<br />
c. Task 3: What is the eye colour of the man with the purple coat (blue)?<br />
d. Task 4: How many windows and how many doors do you see on the<br />
front of the church (4/2)?<br />
Figure A.3: Guideline through the user study<br />
A.2 Statistical Tools<br />
This chapter gives an overview of the statistical evaluation methods used in this<br />
thesis. First the correlation is introduced, which provides a measure for the degree of a<br />
linear relationship between two sets of data. Regression analysis then yields an estimation model<br />
for this linear relationship. Further and more detailed discussions of these methods can be found<br />
in the corresponding references [18][52]. In my evaluation I mainly used two tools, an open<br />
source statistical analysis tool called GRETL 1 and Matlab 2 . All plots were made with<br />
Matlab.<br />
A.2.1 Correlation<br />
The basic question in a correlation evaluation is how two sets of data (X, Y) are<br />
related to each other. If increasing values of measurement X result in increasing values of<br />
measurement Y, we can expect a positive relationship between those two data sets. The<br />
correlation method provides a measure for the degree of relationship between two independent<br />
measurements: the correlation coefficient.<br />
Correlation Coefficient The correlation coefficient r, also called the Bravais-Pearson correlation<br />
coefficient, gives the strength of the relationship between two measurements, with<br />
\[ -1 \le r \le 1 \tag{A.1} \]
|r| &lt; 0.5 expresses a weak relationship, while |r| ≥ 0.8 expresses a very strong dependency.<br />
The closer the measurements lie to one straight line, the higher |r| will be. If this<br />
straight line has a positive slope (positive correlation) r ≈ 1, and with a negative slope (negative<br />
correlation) r ≈ −1 (see figure A.4). If the slope equals 0 we have no correlation at<br />
all.<br />
Figure A.4: The Bravais-Pearson correlation coefficient expresses the degree of linear relationship<br />
between two data sets. The right figure shows a positive correlation, the figure in the middle shows<br />
no correlation and the left figure shows a negative correlation.<br />
The correlation coefficient r for the measurement sets (x_i, y_i), with i = 1, ..., n, is calculated<br />
the following way:<br />
\[ r = r_{XY} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}} = \frac{\tilde{s}_{XY}}{\tilde{s}_X \tilde{s}_Y} \tag{A.2} \]
with \bar{x} and \bar{y} the mean values of x and y. \tilde{s}_X and \tilde{s}_Y are the standard deviations of measurements<br />
X and Y:<br />
\[ \tilde{s}_X = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2} \tag{A.3} \]
\[ \tilde{s}_Y = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2} \tag{A.4} \]
\[ \tilde{s}_{XY} = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) \tag{A.5} \]
1 http://gretl.sourceforge.net/<br />
2 http://www.mathworks.com/<br />
The calculation of r is provided by most statistical analysis tools. After applying<br />
this function we can characterize the degree of linear relationship of our measurements. It is<br />
important to note that the coefficient only gives information on a linear correlation, not on<br />
other forms of dependency. If we now want to estimate the linear model itself, we have to apply a<br />
further technique called linear regression.<br />
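Equation A.2 can be evaluated directly from its definition. The following Python sketch (for illustration only; the evaluation in this thesis was done with GRETL and Matlab) computes the Bravais-Pearson coefficient; the common factor 1/n cancels between numerator and denominator:

```python
import math

def pearson_r(xs, ys):
    """Bravais-Pearson correlation coefficient r (equation A.2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # deviation products and squared deviations; the 1/n factors cancel in r
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

For perfectly linear data the coefficient reaches its extreme values, e.g. `pearson_r([1, 2, 3], [2, 4, 6])` yields 1.0.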
A.2.2 Linear Regression<br />
The goal of the linear regression is to derive a linear model to estimate the value of the<br />
predicted variable Y . This is done by performing a linear transform of the predictor variable<br />
X. The transformation looks like this:<br />
Y = f(X) = a + bX (A.6)<br />
Of course this function is only an approximation, because not all of the measurements<br />
will fall on one straight line (see also figure A.4). Therefore for every pair of data (x_i, y_i) the<br />
following relationship is applied:<br />
\[ y_i = a + b x_i + e_i \tag{A.7} \]
e_i is the error resulting from the adoption of the linear relationship, for i = 1, ..., n. a<br />
and b are the parameters of the linear regression model. One assumption of this model is<br />
that the error e_i cannot be estimated from x_i; there is no dependency between them. The sum of all<br />
errors has to be minimized in order to obtain a ”best” straight line and therefore a ”best”<br />
regression model. The error e_i for every measurement y_i can be calculated as<br />
\[ e_i = y_i - \hat{y}_i \tag{A.8} \]
where \hat{y}_i is the value predicted by the linear mapping and y_i is the actual measured value.<br />
This error, also called the residual, has to be summed over all data pairs. As we do not want<br />
negative and positive values to cancel each other out, we square the error values, yielding the so-called<br />
squared residuals. The average of these values, Q, has to be minimized.<br />
\[ Q(a, b) = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - (a + b x_i)\bigr)^2 \tag{A.9} \]
We now have to estimate the values for a and b that result in a minimal Q(a, b). This method<br />
is also called the method of least squares.<br />
The estimates (\hat{a}, \hat{b}) of (a, b) can be calculated by setting the partial derivatives with respect to<br />
a and b to 0. Both of these equations have to be solved:<br />
\[ \frac{\partial Q(a, b)}{\partial a} = -2\sum_{i=1}^{n}\bigl(y_i - (a + b x_i)\bigr) = 0 \tag{A.10} \]
\[ \frac{\partial Q(a, b)}{\partial b} = -2\sum_{i=1}^{n}\bigl(y_i - (a + b x_i)\bigr)x_i = 0 \tag{A.11} \]
Further calculations lead to these two equations:<br />
\[ \frac{1}{n}\sum_{i=1}^{n} y_i - \hat{a} - \hat{b}\,\frac{1}{n}\sum_{i=1}^{n} x_i = 0 \tag{A.12} \]
\[ \frac{1}{n}\sum_{i=1}^{n} x_i y_i - \frac{1}{n}\hat{a}\sum_{i=1}^{n} x_i - \frac{1}{n}\hat{b}\sum_{i=1}^{n} x_i^2 = 0 \tag{A.13} \]
Now \hat{a} can be calculated from the first of these equations:<br />
\[ \hat{a} = \bar{y} - \hat{b}\bar{x} \tag{A.14} \]
If we put this into equation A.13 we can derive \hat{b}:<br />
\[ \hat{b} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{\tilde{s}_{XY}}{\tilde{s}_X^2} \tag{A.15} \]
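Because equations A.14 and A.15 give the estimates in closed form, a least-squares fit needs no iterative search. A minimal Python sketch (illustrative only; the thesis used GRETL for the actual fits):

```python
def ols_fit(xs, ys):
    """Least-squares estimates (a_hat, b_hat) from equations A.14/A.15."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # b_hat = s_XY / s_X^2 (equation A.15); the 1/n factors cancel
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx   # a_hat = y_bar - b_hat * x_bar (equation A.14)
    return a, b
```

For data generated exactly by y = 2 + 3x, `ols_fit([0, 1, 2], [2, 5, 8])` recovers (2.0, 3.0).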
Now we have calculated both parameters of the linear regression. But we still need an<br />
indicator for the quality of this estimation, because if the error e increases with higher x<br />
values, a linear model might not be the best way to model the real world. The idea is to<br />
split the Sum of Squares Total (SQT), which gives a value for the total variance, into two<br />
components.<br />
Sum of Squares Explained (SQE): This is the variance of our linear model.<br />
\[ SQE = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 \tag{A.16} \]
Sum of Squares Residuals (SQR): This is the remaining dispersion of the y_i values.<br />
\[ SQR = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 \tag{A.17} \]
Sum of Squares Total (SQT): Thus the total variance is the variance described by our linear<br />
model plus the sum of the variance we cannot explain.<br />
\[ SQT = SQE + SQR \tag{A.18} \]
With these values we can calculate a determination coefficient R^2:<br />
\[ R^2 = \frac{SQE}{SQT} = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2} \tag{A.19} \]
R^2 is an indicator for the part of the total dispersion that can be explained by our linear<br />
model. This coefficient ranges from 0 to 1. If all of our measurements fell on one<br />
straight line, the result would be R^2 = 1, because our derived model matches<br />
reality to 100%. If the measurements were spread randomly around the regression line, the<br />
model would not be well suited, resulting in a low determination coefficient.<br />
The determination coefficient R^2 is also related to the correlation coefficient r_{XY} of the<br />
measurements X and Y. The proof can be found in [18].<br />
\[ R^2 = r_{XY}^2 \tag{A.20} \]
Another method to get a feeling for the quality of the linear model is a graphical plot of the<br />
residuals. If the residuals are close to 0 and vary without any systematic pattern around the<br />
horizontal axis, we can assume a suitable model.<br />
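The decomposition of equations A.16 to A.19 can be checked numerically. The following sketch (an illustration, not the original tooling) fits the line, splits SQT into SQE and SQR, verifies A.18, and returns R²:

```python
def r_squared(xs, ys):
    """Determination coefficient R^2 via the SQE/SQR decomposition (A.16-A.19)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # closed-form least-squares fit (equations A.14/A.15)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    preds = [a + b * x for x in xs]                     # the y_hat values
    sqe = sum((p - my) ** 2 for p in preds)             # explained (A.16)
    sqr = sum((y - p) ** 2 for y, p in zip(ys, preds))  # residual (A.17)
    sqt = sum((y - my) ** 2 for y in ys)                # total
    assert abs(sqt - (sqe + sqr)) < 1e-9 * max(sqt, 1.0)  # A.18 holds for OLS
    return sqe / sqt
```

A perfect linear data set gives R² = 1, in line with equation A.20 (R² equals the squared correlation coefficient).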
Example<br />
This small example, taken from [52], demonstrates the conclusions that can be derived from the<br />
correlation coefficient and the regression model. The following measurements have to be<br />
analyzed concerning a linear dependency (table A.1). Calculations were made with GRETL, the<br />
plots were made with Matlab.<br />
i 1 2 3 4 5 6 7 8 9 10<br />
xi 1 5 3 8 2 2 10 8 7 4<br />
yi 1 6 1 6 3 2 8 5 6 2<br />
Table A.1: Example data measurements: 10 test samples of the variables X and Y<br />
Figure A.5 shows the resulting plot with the linear regression model. Figure A.6<br />
shows the residuals. These residuals are distributed randomly; there is no obvious<br />
dependency between x and the values of the residuals. Putting the data into GRETL results<br />
in the output shown in figure A.7.<br />
The correlation coefficient is r = 0.8934, which is an indicator of a strong relationship between<br />
both variables. We can also extract the linear model for the prediction of variable Y from<br />
this output (COEFFICIENT column in A.7):<br />
y = f(x) = 0.395349 + 0.720930x<br />
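The numbers in the GRETL output can be reproduced from the raw data of table A.1, which makes a quick cross-check possible. A Python sketch (GRETL produced the original output):

```python
import math

# Data from table A.1
xs = [1, 5, 3, 8, 2, 2, 10, 8, 7, 4]
ys = [1, 6, 1, 6, 3, 2, 8, 5, 6, 2]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n      # x_bar = 5, y_bar = 4
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
sxx = sum((x - mx) ** 2 for x in xs)
syy = sum((y - my) ** 2 for y in ys)

r = sxy / math.sqrt(sxx * syy)         # correlation, approx. 0.8934
b = sxy / sxx                          # slope, approx. 0.720930
a = my - b * mx                        # intercept, approx. 0.395349
```

The values agree with the COEFFICIENT column and corr(X, Y) of figure A.7.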
Figure A.5: The plotted data with the corresponding linear regression model<br />
Figure A.6: The plot shows the values of the residuals for every predicted y value<br />
Model 4: OLS estimates using the 10 observations 1-10<br />
Dependent variable: Y<br />
VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)<br />
0) const 0,395349 0,742950 0,532 0,609090<br />
1) X 0,720930 0,128171 5,625 0,000496 ***<br />
Mean of dependent variable = 4<br />
Standard deviation of dep. var. = 2,49444<br />
Sum of squared residuals = 11,3023<br />
Standard error of residuals = 1,18861<br />
Unadjusted R-squared = 0,798173<br />
Adjusted R-squared = 0,772944<br />
Degrees of freedom = 8<br />
corr(X, Y) = 0,8934<br />
Figure A.7: The output from the GRETL tool: All the important figures concerning the linear regression<br />
model are shown<br />
The determination coefficient results in R^2 = r^2 = 0.798173. This lets us conclude that we<br />
have found a well-suited model.<br />
The GRETL output also includes a p-value p. This p-value is the probability of<br />
making a mistake if we reject the null hypothesis of the t-test. The null hypothesis H0 in our<br />
case means that the measurements are not related. In the linear model Y = a + bX it means<br />
that b = 0, i.e. Y cannot be explained by the predictor X:<br />
\[ H_0: b = 0 \qquad H_1: b \neq 0 \tag{A.21} \]
Thus the p-value is an indicator of whether we should accept or reject H0. GRETL automatically<br />
performs this t-test. In our example the p-value for X explaining Y is 0.000496. If this p-value<br />
is below a certain significance parameter α we can reject H0. GRETL computes the t-test<br />
for three significance levels α = {0.1, 0.05, 0.01}. The significance level is the probability we<br />
admit for a false choice of hypothesis. If p &lt; α, H0 is rejected. In our example p is lower<br />
than all the values of α. This is indicated by the three ∗ behind the p-value.<br />
Hence we can conclude that X and Y are related and that we can accept H1. The lower<br />
p is, the more confidence can be put in the assumption that the data measurements are highly<br />
related. For the calculation of this value and the testing of hypotheses please refer to<br />
the corresponding literature.<br />
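The t statistic behind this p-value can be recomputed from the example data as t = b̂/SE(b̂), where SE(b̂) is derived from the residual variance with n − 2 degrees of freedom. A sketch (the two-sided 1% critical value 3.355 for 8 degrees of freedom is taken from a standard t table):

```python
import math

# Example data from table A.1
xs = [1, 5, 3, 8, 2, 2, 10, 8, 7, 4]
ys = [1, 6, 1, 6, 3, 2, 8, 5, 6, 2]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sxx = sum((x - mx) ** 2 for x in xs)
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
a = my - b * mx

# residual variance with n - 2 degrees of freedom
ssr = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # approx. 11.3023
s = math.sqrt(ssr / (n - 2))                               # approx. 1.18861
se_b = s / math.sqrt(sxx)                                  # approx. 0.128171
t = b / se_b                                               # approx. 5.625

# two-sided critical value t_{0.995, 8} from a standard t table
reject_h0 = abs(t) > 3.355   # H0 (b = 0) is rejected at alpha = 0.01
```

The intermediate values ssr, s and se_b match the "Sum of squared residuals", "Standard error of residuals" and STDERROR entries of the GRETL output in figure A.7.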
APPENDIX B<br />
Complete Results<br />
In this section of the appendix the complete results of the questionnaire and the analysis of<br />
the linear regression of the data sets can be examined.<br />
B.1 Questionnaire<br />
Here is the data extracted from the questionnaires of the user study.<br />
A Personal Details of participants<br />
1. Gender<br />
male female<br />
80% 20 %<br />
2. Age<br />
20-22 23-25 26-28 29-31 32+<br />
10% 40% 25% 10 % 15 %<br />
3. Hand<br />
left right<br />
95% 5 %<br />
4. Occupation<br />
Student Employee Researcher Else<br />
70 % 10% 10% 10%<br />
B Background on Augmented Reality<br />
1. How familiar am I with Augmented Reality?<br />
1 2 3 4 5<br />
10% 25% 15% 5% 45%<br />
(1=never heard of it, 5=experienced developer)<br />
2. How familiar am I with the Magic Book?<br />
1 2 3 4 5<br />
5% 30% 15% 15% 35%<br />
(1=never used it, 5=familiar with technologies used by the Magic Book)<br />
C Behavior of the Magic Book<br />
1. It was easy to use the Magic Book<br />
1 2 3 4 5<br />
0% 0% 45% 40% 15%<br />
(1=strongly disagree, 5=strongly agree)<br />
2. The scenes on the page were always clear and stable<br />
1 2 3 4 5<br />
5% 35% 45% 15% 0%<br />
(1=strongly disagree, 5=strongly agree)<br />
3. The scenes in the book responded to my movements immediately<br />
1 2 3 4 5<br />
0% 5% 10% 55% 30%<br />
(1=strongly disagree, 5=strongly agree)<br />
D Tasks<br />
1. The free task was easy to perform<br />
1 2 3 4 5<br />
0% 0% 10% 30% 60%<br />
(1=strongly disagree, 5=strongly agree)<br />
2. The navigation task was easy to perform<br />
1 2 3 4 5<br />
0% 0% 15% 65% 20%<br />
(1=strongly disagree, 5=strongly agree)<br />
E Handheld device<br />
1. The handheld device is a very suitable device for interacting with the Magic Book<br />
1 2 3 4 5<br />
5% 35% 45% 15% 0%<br />
(1=strongly disagree, 5=strongly agree)<br />
F Comments and encouragements<br />
Here are some comments made on the Magic Book, the handheld device and the user<br />
study itself:<br />
cables were disturbing, use of a head mounted display as comparison, cables limit the<br />
range of movement, handheld device is too heavy, tracking fails too often, cool book,<br />
some colors do not look good in the handheld display, screen was a little blurry, difficult<br />
to see the whole scene, giant was not completely visible, image disappears, not stable,<br />
jittering and tracking failure is annoying, heavy device is hard to keep stable, would<br />
be good not to hold anything, :-), tasks were good<br />
B.2 Cases<br />
In this section all the cases with the corresponding tasks are described. The results of the<br />
linear regression are shown as well.<br />
B.2.1 Case 1<br />
Case 1 was the first free task, without any questions about the virtual scene (figure B.1). Figure B.2<br />
shows the scatter plot with the regression line and a plot of the residuals. The GRETL<br />
output is shown in figure B.3.<br />
Figure B.1: Case 1: Virtual Scene<br />
Figure B.2: Case 1: The scatter plot with regression line and the plot of the residuals<br />
Model 1: OLS estimates using the 16327 observations 1-16327<br />
Dependent variable: Vector<br />
VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)<br />
0) const 1,56895 0,0274115 57,237 < 0,00001 ***<br />
2) Angle 144,624 3,30650 43,739 < 0,00001 ***<br />
Mean of dependent variable = 2,31546<br />
Standard deviation of dep. var. = 2,89688<br />
Sum of squared residuals = 122635<br />
Standard error of residuals = 2,74082<br />
Unadjusted R-squared = 0,104897<br />
Adjusted R-squared = 0,104843<br />
Degrees of freedom = 16325<br />
Pairwise correlation coefficients:<br />
corr(Vector, Angle) = 0,3239<br />
Figure B.3: The output from the GRETL tool: Case 1<br />
B.2.2 Case 2<br />
Figure B.4 shows the virtual scene for Case 2. With this scene the test persons had to<br />
perform several tasks. Questions were posed to the test persons.<br />
• Task 1<br />
Figure B.4: Case 2: Virtual Scene<br />
Question: ”How many people do you see in the scene ?” (plot: B.5, output: B.6)<br />
Figure B.5: Case 2 / Task 1: The scatter plot with regression line and the plot of the residuals<br />
Model 1: OLS estimates using the 5773 observations 1-5773<br />
Dependent variable: Vector<br />
VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)<br />
0) const 1,26631 0,0238207 53,160 < 0,00001 ***<br />
2) Angle 194,822 3,12817 62,280 < 0,00001 ***<br />
Mean of dependent variable = 2,11212<br />
Standard deviation of dep. var. = 1,9226<br />
Sum of squared residuals = 12759,7<br />
Standard error of residuals = 1,48694<br />
Unadjusted R-squared = 0,401955<br />
Adjusted R-squared = 0,401851<br />
Degrees of freedom = 5771<br />
Pairwise correlation coefficients:<br />
corr(Vector, Angle) = 0,6340<br />
Figure B.6: The output from the GRETL tool: Case 2 / Task 1<br />
• Task 2<br />
Question: ”What is the haircolor of the woman with the white skirt ?” (plot: B.7, output:<br />
B.8)<br />
Figure B.7: Case 2 / Task 2: The scatter plot with regression line and the plot of the residuals<br />
Model 1: OLS estimates using the 5757 observations 1-5757<br />
Dependent variable: Vector<br />
VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)<br />
0) const 1,45847 0,0269841 54,049 < 0,00001 ***<br />
2) Angle 177,196 3,53711 50,096 < 0,00001 ***<br />
Mean of dependent variable = 2,24338<br />
Standard deviation of dep. var. = 1,99741<br />
Sum of squared residuals = 15991,1<br />
Standard error of residuals = 1,66692<br />
Unadjusted R-squared = 0,303659<br />
Adjusted R-squared = 0,303538<br />
Degrees of freedom = 5755<br />
Pairwise correlation coefficients:<br />
corr(Vector, Angle) = 0,5511<br />
Figure B.8: The output from the GRETL tool: Case 2 / Task 2<br />
• Task 3<br />
Question: ”What is the color of the shoes of the woman with the white skirt ?” (plot:<br />
B.9, output: B.10)<br />
Figure B.9: Case 2 / Task 3: The scatter plot with regression line and the plot of the residuals<br />
Model 1: OLS estimates using the 3927 observations 1-3927<br />
Dependent variable: Vector<br />
VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)<br />
0) const 1,78150 0,0423130 42,103 < 0,00001 ***<br />
2) Angle 193,941 4,95880 39,110 < 0,00001 ***<br />
Mean of dependent variable = 2,79864<br />
Standard deviation of dep. var. = 2,46541<br />
Sum of squared residuals = 17171,3<br />
Standard error of residuals = 2,09161<br />
Unadjusted R-squared = 0,280428<br />
Adjusted R-squared = 0,280245<br />
Degrees of freedom = 3925<br />
Pairwise correlation coefficients:<br />
corr(Vector, Angle) = 0,5296<br />
Figure B.10: The output from the GRETL tool: Case 2 / Task 3<br />
• Task 4<br />
Question: ”How many people wear a hat or a headdress ?” (plot: B.11, output: B.12)<br />
Figure B.11: Case 2 / Task 4: The scatter plot with regression line and the plot of the residuals<br />
Model 1: OLS estimates using the 6456 observations 1-6456<br />
Dependent variable: Vector<br />
VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)<br />
0) const 1,76134 0,0320220 55,004 < 0,00001 ***<br />
2) Angle 220,166 3,70264 59,462 < 0,00001 ***<br />
Mean of dependent variable = 2,89062<br />
Standard deviation of dep. var. = 2,57711<br />
Sum of squared residuals = 27697,3<br />
Standard error of residuals = 2,07159<br />
Unadjusted R-squared = 0,353936<br />
Adjusted R-squared = 0,353836<br />
Degrees of freedom = 6454<br />
Pairwise correlation coefficients:<br />
corr(Vector, Angle) = 0,5949<br />
Figure B.12: The output from the GRETL tool: Case 2 / Task 4<br />
• Task 5<br />
Question: ”How many pieces of wood do you see ?” (plot: B.13, output: B.14)<br />
Figure B.13: Case 2 / Task 5: The scatter plot with regression line and the plot of the residuals<br />
Model 1: OLS estimates using the 6641 observations 1-6641<br />
Dependent variable: Vector<br />
VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)<br />
0) const 1,34886 0,0262689 51,348 < 0,00001 ***<br />
2) Angle 210,083 3,21658 65,312 < 0,00001 ***<br />
Mean of dependent variable = 2,19743<br />
Standard deviation of dep. var. = 2,38431<br />
Sum of squared residuals = 22981,7<br />
Standard error of residuals = 1,86054<br />
Unadjusted R-squared = 0,391181<br />
Adjusted R-squared = 0,391089<br />
Degrees of freedom = 6639<br />
Pairwise correlation coefficients:<br />
corr(Vector, Angle) = 0,6254<br />
Figure B.14: The output from the GRETL tool: Case 2 / Task 5<br />
B.2.3 Case 3<br />
Case 3 (figure B.15) was again a free task without any questions posed to the test person<br />
(plot: B.16, output: B.17).<br />
Figure B.15: Case 3: Virtual Scene<br />
Figure B.16: Case 3: The scatter plot with regression line and the plot of the residuals<br />
Model 1: OLS estimates using the 14743 observations 1-14743<br />
Dependent variable: Vector<br />
VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)<br />
0) const 1,55991 0,0190527 81,873 < 0,00001 ***<br />
2) Angle 127,487 2,00371 63,625 < 0,00001 ***<br />
Mean of dependent variable = 2,24612<br />
Standard deviation of dep. var. = 2,15298<br />
Sum of squared residuals = 53611,2<br />
Standard error of residuals = 1,90706<br />
Unadjusted R-squared = 0,215452<br />
Adjusted R-squared = 0,215399<br />
Degrees of freedom = 14741<br />
Pairwise correlation coefficients:<br />
corr(Vector, Angle) = 0,4642<br />
Figure B.17: The output from the GRETL tool: Case 3<br />
B.2.4 Case 4<br />
The scene for case 4 can be seen in figure B.18. Several tasks had to be performed again in<br />
this case and the corresponding questions were posed.<br />
• Task 1<br />
Figure B.18: Case 4: Virtual Scene<br />
Question: ”How many people do you see in the scene ?” (plot: B.19, output: B.20)<br />
Figure B.19: Case 4 / Task 1: The scatter plot with regression line and the plot of the residuals<br />
Model 1: OLS estimates using the 5284 observations 1-5284<br />
Dependent variable: Vector<br />
VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)<br />
0) const 1,16379 0,0269233 43,226 < 0,00001 ***<br />
2) Angle 190,290 3,79175 50,185 < 0,00001 ***<br />
Mean of dependent variable = 1,93473<br />
Standard deviation of dep. var. = 1,95301<br />
Sum of squared residuals = 13644,7<br />
Standard error of residuals = 1,60725<br />
Unadjusted R-squared = 0,322868<br />
Adjusted R-squared = 0,32274<br />
Degrees of freedom = 5282<br />
Pairwise correlation coefficients:<br />
corr(Vector, Angle) = 0,5682<br />
Figure B.20: The output from the GRETL tool: Case 4 / Task 1<br />
• Task 2<br />
Question: ”How many women and how many men do you see?” (plot: B.21, output:<br />
B.22)<br />
Figure B.21: Case 4 / Task 2: The scatter plot with regression line and the plot of the residuals<br />
Model 1: OLS estimates using the 9356 observations 1-9356<br />
Dependent variable: Vector<br />
VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)<br />
0) const 1,48687 0,0198384 74,949 < 0,00001 ***<br />
2) Angle 131,274 2,39249 54,869 < 0,00001 ***<br />
Mean of dependent variable = 2,04073<br />
Standard deviation of dep. var. = 1,89916<br />
Sum of squared residuals = 25525,9<br />
Standard error of residuals = 1,65193<br />
Unadjusted R-squared = 0,243489<br />
Adjusted R-squared = 0,243408<br />
Degrees of freedom = 9354<br />
Pairwise correlation coefficients:<br />
corr(Vector, Angle) = 0,4934<br />
Figure B.22: The output from the GRETL tool: Case 4 / Task 2<br />
• Task 3<br />
Question: ”What is the eye color of the man with the purple coat ?” (plot: B.23, output:<br />
B.24)<br />
Figure B.23: Case 4 / Task 3: The scatter plot with regression line and the plot of the residuals<br />
Model 1: OLS estimates using the 9535 observations 1-9535<br />
Dependent variable: Vector<br />
VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)<br />
0) const 1,74120 0,0276468 62,980 < 0,00001 ***<br />
2) Angle 130,608 2,75606 47,390 < 0,00001 ***<br />
Mean of dependent variable = 2,44667<br />
Standard deviation of dep. var. = 2,52852<br />
Sum of squared residuals = 49333,1<br />
Standard error of residuals = 2,27486<br />
Unadjusted R-squared = 0,190662<br />
Adjusted R-squared = 0,190577<br />
Degrees of freedom = 9533<br />
Pairwise correlation coefficients:<br />
corr(Vector, Angle) = 0,4366<br />
Figure B.24: The output from the GRETL tool: Case 4 / Task 3<br />
• Task 4<br />
Question: ”How many windows and how many doors do you see on the front of the<br />
church?” (plot: B.25, output: B.26)<br />
Figure B.25: Case 4 / Task 4: The scatter plot with regression line and the plot of the residuals<br />
Model 1: OLS estimates using the 5140 observations 1-5140<br />
Dependent variable: Vector<br />
VARIABLE COEFFICIENT STDERROR T STAT 2Prob(t > |T|)<br />
0) const 1,49365 0,0361831 41,280 < 0,00001 ***<br />
2) Angle 197,812 4,51415 43,820 < 0,00001 ***<br />
Mean of dependent variable = 2,30959<br />
Standard deviation of dep. var. = 2,60671<br />
Sum of squared residuals = 25419,1<br />
Standard error of residuals = 2,22425<br />
Unadjusted R-squared = 0,272055<br />
Adjusted R-squared = 0,271914<br />
Degrees of freedom = 5138<br />
Pairwise correlation coefficients:<br />
corr(Vector, Angle) = 0,5216<br />
Figure B.26: The output from the GRETL tool: Case 4 / Task 4<br />
Bibliography<br />
[1] R. Azuma, B. Hoff, H. Neely III, and R. Sarfaty. A motion-stabilized outdoor augmented<br />
reality system. In Virtual Reality, 1999. Proceedings., IEEE, pages 252–259, 1999.<br />
[2] Ronald Azuma, Yohan Baillot, Reinhold Behringer, Steven Feiner, Simon Julier, and<br />
Blair MacIntyre. Recent advances in augmented reality. IEEE Computer Graphics and<br />
Applications, 21(6):34–47, Nov/Dec 2001.<br />
[3] Ronald Azuma and Gary Bishop. A frequency-domain analysis of head-motion prediction.<br />
In SIGGRAPH ’95: Proceedings of the 22nd annual conference on Computer graphics<br />
and interactive techniques, pages 401–408, New York, NY, USA, 1995. ACM Press.<br />
[4] Ronald Azuma, Jong Weon Lee, Bolan Jiang, Jun Park, Suya You, and Ulrich Neumann.<br />
Tracking in unprepared environments for augmented reality systems. Computers &<br />
Graphics, 23(6):787–793, 1999.<br />
[5] Ronald T. Azuma. A survey of augmented reality. Presence, Special Issue on Augmented<br />
Reality, 6(4):355–385, August 1997.<br />
[6] Martin Bauer, Bernd Brügge, Gudrun Klinker, Asa MacWilliams, Thomas Reicher, Stefan<br />
Riss, Christian Sandor, and Martin Wagner. Design of a component-based augmented<br />
reality framework. In Proceedings of the 2nd IEEE and ACM International Symposium<br />
on Augmented Reality (ISAR 2001), New York, NY, October 2001. IEEE Computer<br />
Society.<br />
[7] Mark Billinghurst. Augmented reality in education, new horizons for learning. Internet,<br />
2002. www.newhorzons.org/strategies/technology/billinghurst.htm.<br />
[8] Mark Billinghurst and Hirokazu Kato. Collaborative augmented reality. Communications<br />
of the ACM, 45(7):64–70, 2002.<br />
[9] Mark Billinghurst, Ivan Poupyrev, Hirokazu Kato, and Richard May. Mixing realities<br />
in shared space: An augmented reality interface for collaborative computing. In IEEE<br />
International Conference on Multimedia and Expo (III), pages 1641–1644, 2000.<br />
[10] Mark Billinghurst, Hirokazu Kato, Keihachiro Tachibana, and Michael Grafe. Virtual<br />
object manipulation on a table top AR environment. In Proceedings of the 2nd IEEE and<br />
ACM International Symposium on Augmented Reality (ISAR 2001), New York, NY, October<br />
2001. IEEE Computer Society.<br />
[11] Mark Billinghurst, Hirokazu Kato, Keihachiro Tachibana, and Michael Grafe. A registration<br />
method based on texture tracking using ARToolKit. In The Second IEEE International<br />
Augmented Reality Toolkit Workshop, Tokyo, Japan, 2003.<br />
[12] Leonardo Bonanni, Chia-Hsun Lee, and Ted Selker. Attention-based design of augmented reality interfaces. In CHI ’05: CHI ’05 extended abstracts on Human factors in computing systems, pages 1228–1231, New York, NY, USA, 2005. ACM Press.
[13] Bernd Bruegge and Allen A. Dutoit. Object-Oriented Software Engineering: Conquering Complex and Changing Systems. Prentice Hall PTR, Upper Saddle River, NJ, USA, 1999.
[14] J.S. Burdess, A.J. Harris, J. Cruickshank, D. Wood, and G. Cooper. A review of vibratory gyroscopes. IEEE Engineering Science and Education Journal, 3(16), 1994.
[15] Carlos Correa, Andres Agudelo, Allan Meng Krebs, Ivan Marsic, Jun Hou, Ashutosh Morde, and Kicha Ganapathy. The parallel worlds system for collaboration among virtual and augmented reality users. In Proc. of International Symposium on Mixed and Augmented Reality, Arlington, VA, USA, Nov. 2004.
[16] Coriolis force. http://www.cmt.phys.kyushu-u.ac.jp/~M.Sakurai/phys/physmath/rotsys-e.html.
[17] DWARF Project Homepage. www.augmentedreality.de.
[18] Ludwig Fahrmeir, Rita Künstler, Iris Pigeot, and Gerhard Tutz. Statistik - Der Weg zur Datenanalyse. Springer, 1997.
[19] Armin Fischer, Michael Kuhn, and Felix Löw. Augmented Furniture Client - Eine digitale Vertriebsinnovation für die Möbelbranche, 2004.
[20] Martin Fowler. UML Distilled: A Brief Guide to the Standard Object Modeling Language. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2001.
[21] G. Bishop, G. Welch, and B.D. Allen. Tracking: Beyond 15 minutes of thought. SIGGRAPH Course Pack, 2001.
[22] B.J. Gallagher, J.S. Burdess, A.J. Harris, and M.E. McNie. Principles of a three-axis vibrating gyroscope. IEEE Transactions on Aerospace and Electronic Systems, 37(4), 2001.
[23] Gyroscopes of various types. http://www.spp.co.jp/sssj/sindoue.html.
[24] Thomas T. Hewett. ACM SIGCHI curricula for Human-Computer Interaction. Technical report, ACM, New York, NY, USA, 1992.
[25] Getting around the Coriolis force. http://www.physics.ohio-state.edu/~dvandom/Edu/newcor.html.
119
Bibliography<br />
[26] Michael Kalkusch, Thomas Lidy, Michael Knapp, Gerhard Reitmayr, Hannes Kaufmann, and Dieter Schmalstieg. Structured visual markers for indoor pathfinding. In Proceedings of the First IEEE International Workshop on ARToolKit, Darmstadt, Germany, 2002.
[27] Hirokazu Kato and Mark Billinghurst. Marker tracking and HMD calibration for a video-based augmented reality conferencing system. In Proceedings of the 2nd International Workshop on Augmented Reality (IWAR 99), San Francisco, USA, October 1999.
[28] Hirokazu Kato, Mark Billinghurst, and Ivan Poupyrev. ARToolKit version 2.33 Manual, 2000. Available for download at http://www.hitl.washington.edu/research/shared_space/download/.
[29] Georg Klein and Tom Drummond. Robust visual tracking for non-instrumented augmented reality. In The Second IEEE and ACM International Symposium on Mixed and Augmented Reality, pages 113–122. IEEE Computer Society, October 7–10, 2003.
[30] Chris Kulas. Usability engineering for ubiquitous computing. Master’s thesis, Technische Universität München, 2003.
[31] Clayton Lewis and John Rieman. Task-Centered User Interface Design - A Practical Introduction. 1994. http://hcibib.org/tcuid/.
[32] J. Liang, C. Shaw, and M. Green. On temporal-spatial realism in the virtual reality environment. In Proc. of the 4th Annual Symposium on User Interface Software and Technology (UIST’91), pages 19–25, Hilton Head, SC, 1991.
[33] Bruce D. Lucas and Takeo Kanade. An iterative image registration technique with an application to stereo vision (DARPA). In Proceedings of the 1981 DARPA Image Understanding Workshop, pages 121–130, April 1981.
[34] J. McKenzie and D. Darnell. The “Magic Book”: A report into augmented reality storytelling in the context of a children’s workshop. Technical report, Christchurch College of Education, 2003.
[35] Gordon E. Moore. Cramming more components onto integrated circuits. Electronics, 38(8), April 1965.
[36] R. Mukundan. Quaternions: From classical mechanics to computer graphics, and beyond. In Proceedings of the 7th Asian Technology Conference in Mathematics, 2002.
[37] Ulrich Neumann and Suya You. Integration of region tracking and optical flow for image motion estimation. In ICIP (3), pages 658–662, 1998.
[38] Ulrich Neumann and Suya You. Natural feature tracking for augmented reality. IEEE Transactions on Multimedia, 1(1):53–64, 1999.
[39] J. Newman, M. Wagner, M. Bauer, A. MacWilliams, T. Pintaric, D. Beyer, D. Pustka, F. Strasser, D. Schmalstieg, and G. Klinker. Ubiquitous tracking for augmented reality. In Proc. of International Symposium on Mixed and Augmented Reality, Arlington, VA, USA, Nov. 2004.
120
Bibliography<br />
[40] Joseph Newman and David Ingram. Augmented reality in a wide area sentient environment. In ISAR ’01: Proceedings of the IEEE and ACM International Symposium on Augmented Reality (ISAR’01), page 77, Washington, DC, USA, 2001. IEEE Computer Society.
[41] Charles B. Owen, Fan Xiao, and Paul Middlin. What is the best fiducial? In The First IEEE International Augmented Reality Toolkit Workshop, pages 98–105, Darmstadt, Germany, September 2002.
[42] Jun Park, Suya You, and Ulrich Neumann. Natural feature tracking for extendible robust augmented realities. In Proceedings of the international workshop on Augmented reality: placing artificial objects in real scenes, pages 209–217. A. K. Peters, Ltd., 1999.
[43] James Patten, Hiroshi Ishii, Jim Hines, and Gian Pangaro. Sensetable: a wireless object tracking platform for tangible user interfaces. In CHI ’01: Proceedings of the SIGCHI conference on Human factors in computing systems, pages 253–260, 2001.
[44] Jenny Preece, Yvonne Rogers, Helen Sharp, David Benyon, Simon Holland, and Tom Carey. Human-Computer Interaction. Addison-Wesley, 1994.
[45] J. P. Rolland and H. Fuchs. Optical versus video see-through head-mounted displays in medical visualization. Presence: Teleoperators & Virtual Environments, 9(3):287–309, June 2000.
[46] Jannick P. Rolland, Larry Davis, and Yohan Baillot. A survey of tracking technology for virtual environments. In W. Barfield and T. Caudell, editors, Fundamentals of Wearable Computers and Augmented Reality, pages 67–112. Lawrence Erlbaum, Mahwah, NJ, USA, 2001.
[47] Christian Sandor, Asa MacWilliams, Martin Wagner, Martin Bauer, and Gudrun Klinker. SHEEP: The shared environment entertainment pasture. In Proceedings of the International Symposium on Mixed and Augmented Reality (ISMAR), October 2002.
[48] Chris Shaw and Jiandong Liang. An experiment to characterize head motion in VR and RR using MR. pages 99–101, 1992.
[49] Ken Shoemake. Animating rotation with quaternion curves. In SIGGRAPH ’85: Proceedings of the 12th annual conference on Computer graphics and interactive techniques, pages 245–254, New York, NY, USA, 1985. ACM Press.
[50] Michael Siggelkow. Importance of gaze awareness in augmented reality teleconferencing. Master’s thesis, Technische Universität München, 2005.
[51] G. Simon, A. Fitzgibbon, and A. Zisserman. Markerless tracking using planar structures in the scene. In Proc. International Symposium on Augmented Reality, pages 120–128, October 2000.
[52] David W. Stockburger. Introductory Statistics: Concepts, Models, And Applications. http://www.psychstat.smsu.edu/introbook/sbk00.htm.
[53] Franz Strasser. Bootstrapping of sensor networks in ubiquitous tracking environments. Master’s thesis, Technische Universität München, 2004.
121
Bibliography<br />
[54] Ivan E. Sutherland. A head-mounted three dimensional display. In AFIPS Conference Proceedings, Fall Joint Conference, volume 1, pages 757–764, Washington, DC, USA, 1968.
[55] C. Tomasi and Takeo Kanade. Shape and motion from image streams: a factorization method - part 3 detection and tracking of point features. Technical Report CMU-CS-91-132, Computer Science Department, Carnegie Mellon University, Pittsburgh, PA, April 1991.
[56] Trond Nilsen, Steven Linton, and Julian Looser. Motivations for Augmented Reality Gaming. Technical report, HIT Lab New Zealand, 2004.
[57] Christiane Ulbricht and Dieter Schmalstieg. Tangible Augmented Reality for Computer Games. In M.H. Hamza, editor, VIIP Conference Proceedings, pages 950–954. IASTED, ACTA Press, September 2003. ISBN 0-88986-382-2.
[58] Stefan Veigl, Andreas Kaltenbach, Florian Ledermann, Gerhard Reitmayr, and Dieter Schmalstieg. Two-handed direct interaction with ARToolKit. Technical report, Vienna University of Technology, 2002.
[59] Florent Vial. Natural point feature tracking of a textured plane: A realtime augmented reality application. Master’s thesis, Lyon School of Chemistry, Physics and Electronics, 2003.
[60] Florent Vial. State of the art report on natural feature tracking for vision-based real time augmented reality. Technical report, HIT Lab New Zealand, 2003.
[61] Daniel Wagner, Thomas Pintaric, Florian Ledermann, and Dieter Schmalstieg. Towards massively multi-user augmented reality on handheld devices. In Third International Conference on Pervasive Computing (Pervasive 2005), Munich, Germany, 2005.
[62] Martin Wagner. Tracking with multiple sensors. PhD thesis, Technische Universität München, 2005.
[63] Martin Wagner and Felix Löw. Configuration strategies of an ARToolKit-based wide area tracker. In The Second IEEE International Augmented Reality Toolkit Workshop, Tokyo, Japan, 2003.
[64] Mark Weiser. The computer for the 21st century. SIGMOBILE Mob. Comput. Commun. Rev., 3(3):3–11, 1999.
[65] Greg Welch and Gary Bishop. An introduction to the Kalman filter. Technical report, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA, 1995.
[66] Eric Woods, Mark Billinghurst, Graham Aldridge, Barbara Garrie, Julian Looser, Deidre Brown, and Claudia Nelles. Augmenting the science centre and museum experience. Technical report, HIT Lab New Zealand, 2004.
[67] Suya You and Ulrich Neumann. Fusion of vision and gyro tracking for robust augmented reality registration. In Proceedings of IEEE Virtual Reality, pages 71–78, Yokohama, Japan, March 2001.
122
Bibliography<br />
[68] Suya You, Ulrich Neumann, and Ronald Azuma. Hybrid inertial and vision tracking for augmented reality registration. In VR ’99: Proceedings of the IEEE Virtual Reality, page 260, Washington, DC, USA, 1999. IEEE Computer Society.
123