Automatic indexing in e-government - VBN - Aalborg Universitet
Automatic indexing in e-government:<br />
Improved access to administrative documents for professional<br />
users?<br />
Tanja Svarre<br />
PhD thesis from Department of Communication<br />
Aalborg University, Denmark
CIP – Cataloguing in Publication<br />
Svarre, Tanja<br />
Automatic indexing in e-government: Improved access to<br />
administrative documents for professional users? / Tanja<br />
Svarre. – Aalborg: Aalborg University, 2012. xiv, 328 p.<br />
© Copyright Tanja Svarre 2012<br />
All rights reserved
Automatisk indeksering indenfor e-government:<br />
Forbedret adgang til administrative dokumenter<br />
for professionelle brugere?<br />
Tanja Svarre<br />
Ph.d.-afhandling fra<br />
Institut for Kommunikation, Aalborg Universitet
Acknowledgments<br />
Finishing this thesis has been possible thanks to the continuous support of colleagues,<br />
family, and friends. For this, I thank you all.<br />
First and foremost I want to thank my supervisor, Professor Marianne Lykke.<br />
You have readily shared valuable knowledge, insight, and experience, from which I have<br />
learned a lot. Your constructive feedback on written and oral work has been<br />
invaluable and always moved the project forward. You have been an enthusiastic,<br />
flexible and helpful supervisor. For this I am very grateful.<br />
Also, I am indebted to a number of present and former colleagues. First I want<br />
to thank former heads of research programme Pia Borlund and Jette Hyldegaard, former<br />
head of department Jesper W. Schneider, head of department Jack Andersen (all from<br />
RSLIS), director of doctoral school Ann Bygholm and head of department Christian<br />
Jantzen for professionally supporting me during the different phases of this project. My<br />
gratitude also goes to Professor Pia Borlund for planting the first seeds of my interest in<br />
research and for always readily discussing theoretical and empirical matters with great<br />
enthusiasm. To associate professor Jesper W. Schneider for being a persistent<br />
discussion partner and inspiration on statistical matters and questionnaires, and for good<br />
companionship on the IC3. To associate professors Haakon Lund and Birger Larsen for<br />
your flexible and patient support in various technical matters. And to former PhD<br />
student Charles Seger: despite the distance, you have been a valuable support in good<br />
times and in bad, and a great travel companion. To assistant professor Mette<br />
Skov, thank you for your encouragement, for proofreading chapters, and for being a good<br />
colleague. PhD Brian Kirkegaard Lunn, you undertook extended responsibilities as my<br />
“buddy” and were an excellent partner in teaching. Thank you for your always open<br />
door, at the office and at home. Lastly, I want to thank all my colleagues at Friis for<br />
your warm welcome. It has been a pleasure to join you.<br />
Further, I am indebted to Associate Professors Katriina Byström and Tom<br />
Nyvang, and Professor Gunilla Widén for joining the assessment committee and for<br />
making time to read and comment on the thesis. I highly appreciate your effort.<br />
I also want to thank people outside the research community of Aalborg University and<br />
the Royal School of Library and Information Science. I am grateful to Professor<br />
Susanne Bødker for making my research visit at Aarhus University, Department of<br />
Computer Science possible, and to PhD Niels Mathiassen for good times during my<br />
stay.<br />
I am also very thankful to the former National IT and Telecom Agency (IT &<br />
Telestyrelsen) for providing the topical frame for the project, and for supporting the<br />
project. Special thanks go to senior consultant Palle Aagaard, my contact person in<br />
the agency, for your interest in the project, for making room for practice-oriented<br />
perspectives on empirical matters, both during planning and analysis, for providing<br />
competent input for the project, and for always being available. I am very grateful to<br />
SKAT too for making the collaboration possible, and for making employees, office and<br />
IT facilities available for my empirical somersaults. In particular I want to express my<br />
gratitude to my contact person in SKAT, special consultant Ebbe Tor Andersen. You<br />
have been an enthusiastic source of inspiration, always seeing possibilities rather than<br />
limitations. I also want to thank all the participants of the project: the 340<br />
questionnaire respondents, the 35 focus group participants and the 42 people<br />
participating in the search test, either as pilot testers or test persons. Thank you for all<br />
your input, your time, and your goodwill. And to my transcriber, Timo Iwersen, I<br />
am grateful for your effort in transforming the interviews into text.<br />
Last, but by no means least, I am grateful to my boyfriend Sune, and my dear<br />
family and friends for your continuous patience with me during my writing and working<br />
on this project. Thank you for your persistence, your help and support, both mentally<br />
and in practical matters, for believing in me, and for still being there. Without you I<br />
could not have succeeded in finishing this thesis. Your dedication is highly treasured.<br />
To my girls, Annika and Maja, I am blessed to have you in my life. Your love carried<br />
me all the way through this project.<br />
Abstract<br />
The overall purpose of the present thesis is to investigate whether automatic assigned<br />
indexing methods can improve professional users’ access to work-based documents in<br />
the domain of e-government. The problem is investigated by means of a case study at<br />
the Danish tax authority SKAT. An experimental comparative test was designed on<br />
the basis of a preceding domain study clarifying the seeking behaviour in e-government.<br />
The introduction of e-government has arisen from a desire for effectiveness,<br />
efficiency and greater transparency in public administration. Today public-sector<br />
employees commonly carry out manual indexing of government documents. With the<br />
thesis we want to investigate whether automatic indexing can replace, and perhaps even<br />
improve on, the current manual procedures in order to support efficiency and<br />
effectiveness.<br />
An employee perspective guides the thesis. That involves a user group with<br />
great knowledge of the topic they are working with. In contrast to citizens and other<br />
e-government stakeholders, not much is known about the seeking behaviour of employees<br />
in the domain. In addition, the introduction of e-government is expected to change<br />
employees’ work tasks, and with that their information needs. That calls for an<br />
investigation of the present information seeking behaviour of e-government employees.<br />
In the thesis this is done by means of a domain study. The study is based on a<br />
questionnaire distributed to employees in SKAT and subsequent focus group interviews.<br />
The domain study shows that the employees use a number of primarily online<br />
information sources to solve their work tasks. The sources are used frequently. The<br />
employees primarily have verificative and conscious topical information needs. In<br />
addition, they are experienced information searchers who request more extensive metadata in<br />
the system that forms the basis of the search test: their intranet.<br />
The knowledge gained from the domain study was incorporated into the search<br />
test design. The test was an experimental test comparing automatic extracted indexing<br />
(free text indexing) and automatic assigned indexing (categorization). In the assigned<br />
indexing a domain-specific taxonomy formed the basis of the categories. The test<br />
system was a prototype of a future version of SKAT’s intranet. 32 test persons carried<br />
out searches with the two indexing types in what were, in an experimental sense, two separate systems.<br />
Three simulated search tasks and one genuine search task guided the searches. The<br />
simulated search tasks were designed in accordance with the findings from the domain<br />
study regarding the information needs of the employees. The test showed that the two<br />
automatic types of indexing are useful to the employees, each in its own way. At a general<br />
level extracted indexing had the best performance measured in terms of the average<br />
number of terms and concepts in queries, the number of sessions with<br />
reformulations, and the number of reformulations in sessions. This showed<br />
that the system with categorization demanded more from the test persons than<br />
the system with free text indexing.<br />
It turned out that the test persons had difficulties using the<br />
categorization in some respects. Thus, the categorization was not relevant to them if they had<br />
already retrieved a highly relevant document at a high rank before using it. Nor<br />
did they find it relevant if the initial search retrieved very few results. In<br />
those cases it was easier for them to go through the results manually. In contrast, the<br />
categorization was helpful in identifying new facets of a search task and in suggesting<br />
new search terms in reformulations. For future e-government indexing guidelines this<br />
resulted in the recommendation that both assigned and extracted indexing should be<br />
represented as search facilitators, as each supports its own aspects of the information<br />
needs arising for employees in e-government.<br />
The thesis contributes by providing new insights into the information seeking behaviour<br />
of employees in e-government and the way in which this behaviour can be supported by<br />
automatic indexing.<br />
Abstract in Danish<br />
The purpose of the present thesis is to examine whether automatic indexing<br />
can improve employees’ access to work-based information seeking within<br />
the domain of e-government. In the PhD project the problem is investigated as a<br />
case study at SKAT. More precisely, a comparative search test is carried out. The test is<br />
designed on the basis of a preceding domain study that clarifies the seeking behaviour<br />
within e-government.<br />
The introduction of e-government has arisen from a desire for<br />
increased efficiency and greater openness in public administration. Today it is common for<br />
public-sector employees to index the documents of administrations manually. Since one of<br />
the reasons for digitizing administrations is precisely a desire for increased efficiency,<br />
this thesis examines whether automatic indexing of documents<br />
can replace, and perhaps even improve on, the manual indexing in the domain.<br />
The thesis approaches the problem from an employee perspective. That<br />
involves a user group with great knowledge of the subject they work with.<br />
In contrast to, for example, citizens, not much is known in the e-government literature<br />
about the information seeking behaviour of employees. Since the digitization of<br />
administrations is also expected to affect employees’<br />
work tasks, and since work tasks are known to influence the information needs that<br />
information seekers develop, there is a need to uncover what<br />
characterizes the seeking behaviour of employees in public administration today. In the<br />
thesis this has been done by means of a domain study. The domain study is based on<br />
a questionnaire survey among employees in a public administration and<br />
follow-up focus group interviews. The domain study showed that the employees use<br />
a number of different information systems in their work, and that they do so frequently<br />
when solving their tasks. They primarily have verificative and conscious topical<br />
information needs. In addition, they are experienced searchers who request far more metadata<br />
in the system that forms the basis of the search test, especially content-related metadata.<br />
The experience from the domain study was incorporated into the design of the search test. The test<br />
is a comparative test that compares automatic extracted indexing (free text<br />
indexing) with automatic assigned indexing (categorization) based on a<br />
domain-specific taxonomy. The test system is a prototype of the employees’<br />
future intranet. 32 test participants searched the two systems on the basis of three assigned<br />
search tasks and one of their own. The constructed search tasks were<br />
designed in accordance with what the domain study had shown about<br />
the employees’ information needs. The test showed that the two forms of indexing are<br />
useful, each in its own way. Overall, the extracted indexing had the<br />
best performance measured by the number of words and concepts used in<br />
queries, the number of sessions containing reformulations, and the number of<br />
reformulations in those sessions. This showed that the system with categorization demanded more<br />
of the users, both in terms of the number of searches and in terms of the number of terms and concepts<br />
entered.<br />
It turned out that the test persons had problems using the<br />
categorization in some contexts. Thus it was not relevant to them if they<br />
retrieved a highly relevant document among the first search results without having used<br />
the categorization. Nor did they find it relevant if the search itself returned so few<br />
results that it was quicker to look through them manually. On the other hand,<br />
the categorization could help them identify new facets of search tasks and<br />
suggest new search terms in connection with reformulations. For the further work on<br />
indexing guidelines, this result led to a recommendation that both<br />
types should be present in the indexing of e-government, as they cover different<br />
aspects of the information needs that arise among employees in e-government.<br />
As a whole, the thesis contributes new knowledge about the relationship between<br />
the information seeking behaviour of employees in e-government and the way<br />
in which the identified behaviour can be supported by automatic indexing.<br />
Table of contents<br />
1 INTRODUCTION<br />
1.1 RESEARCH OBJECTIVE<br />
1.2 EMPIRICAL ASSUMPTIONS<br />
1.3 MOTIVATIONS FOR THE THESIS<br />
1.4 RESEARCH QUESTIONS<br />
1.5 STRUCTURE OF THE THESIS<br />
2 METHODOLOGICAL FRAMEWORK<br />
2.1 A COGNITIVE FRAMEWORK FOR INFORMATION RESEARCH<br />
2.1.1 Towards a holistic cognitive framework<br />
2.1.2 The role of work tasks<br />
2.2 THE COGNITIVE FRAMEWORK AND THE THESIS<br />
2.3 OVERALL RESEARCH METHOD: CASE STUDY<br />
2.4 THE CASE: SKAT<br />
2.4.1 The intranet<br />
2.4.2 The intranet taxonomy<br />
2.5 SUMMARY<br />
3 THE E-GOVERNMENT DOMAIN<br />
3.1 DEFINITION AND PURPOSE<br />
3.2 SUBJECT AREAS IN E-GOVERNMENT RESEARCH & DEVELOPMENT (R&D)<br />
3.3 STAKEHOLDERS IN E-GOVERNMENT<br />
3.4 LIS PERSPECTIVES ON E-GOVERNMENT<br />
3.4.1 Information systems<br />
3.4.2 Knowledge management<br />
3.4.3 ICT tools: Metadata initiatives<br />
3.5 SUMMARY<br />
4 SEEKING BEHAVIOUR IN E-GOVERNMENT<br />
4.1 INFORMATION SEEKING AND RELATED CONCEPTS<br />
4.2 THE PURPOSE OF SEEKING STUDIES<br />
4.3 ENTITIES OF E-GOVERNMENT: STUDIES OF SEEKING BEHAVIOR<br />
4.4 E-GOVERNMENT EMPLOYEE INFORMATION SEEKING<br />
4.4.1 Project INISS<br />
4.4.2 System development in the Danish Parliament<br />
4.4.3 Information behavior of employees in an engineering and technical service government office<br />
4.4.4 Federal, state, and local policy makers’ selection of information sources<br />
4.4.5 Finnish municipal employees<br />
4.4.6 Users of the European Parliamentary Documentation Centre<br />
4.4.7 Information literacy of Scottish government civil service staff<br />
4.4.8 Civil servants’ internet skills<br />
4.5 RELATED STUDIES OF INFORMATION SEEKING AND SEARCHING<br />
4.5.1 Legal seeking behavior<br />
4.5.2 Information behaviour of software engineers<br />
4.5.3 Professional seeking behaviour<br />
4.6 SUMMARY<br />
5 INDEXING OF ELECTRONIC DOCUMENTS<br />
5.1 THE PROCESS OF INDEXING<br />
5.2 QUALITY OF INDEXING<br />
5.2.1 Specificity<br />
5.2.2 Exhaustivity<br />
5.2.3 Consistency<br />
5.2.4 Performance measures<br />
5.3 APPROACHES TO INDEXING<br />
5.3.1 Document, user, and domain oriented indexing<br />
5.3.2 Controlled vs. uncontrolled indexing<br />
5.3.3 Intellectual vs. automatic indexing<br />
5.4 APPROACHES TO AUTOMATIC INDEXING<br />
5.4.1 Automatic extracted indexing<br />
5.4.1.1 Lexical analysis and stop word lists<br />
5.4.1.2 Stemming<br />
5.4.1.3 Weighting factors<br />
5.4.1.4 Compound nouns as index terms<br />
5.4.1.5 Extracted indexing<br />
5.4.2 Automatic assigned indexing<br />
5.5 HYBRID TYPES OF INTELLECTUAL AND AUTOMATIC INDEXING<br />
5.6 SUMMARY<br />
6 EMPIRICAL FRAMEWORK<br />
6.1 DOMAIN STUDY<br />
6.2 QUESTIONNAIRE DESIGN, COLLECTION, AND ANALYSIS<br />
6.2.1 Technique and structure<br />
6.2.2 Content<br />
6.2.2.1 Background data<br />
6.2.2.2 Work tasks<br />
6.2.2.3 Elaboration of work tasks<br />
6.2.3 Data collection<br />
6.2.4 Pilot testing<br />
6.2.5 Data analysis<br />
6.2.6 Methodical reflections<br />
6.3 FOCUS GROUP METHOD<br />
6.3.1 Purpose and design<br />
6.3.2 Data collection: Interview guide<br />
6.3.3 Execution and documentation<br />
6.3.4 Data analysis<br />
6.3.5 Limitations<br />
6.4 SEARCH TEST DESIGN<br />
6.4.1 Test system<br />
6.4.2 Test persons<br />
6.4.3 Search tasks<br />
6.4.4 Test procedure<br />
6.4.5 Pilot test ................................................................................................................................... 140<br />
6.4.6 Techniques for data collection and preparation ...................................................................... 141<br />
6.4.7 Data analysis ........................................................................................................................... 142<br />
6.5 LIMITATIONS ............................................................................................................................ 148<br />
6.6 RELATION BETWEEN RESEARCH METHOD AND RESEARCH QUESTIONS ..................................... 149<br />
7 DOMAIN STUDY RESULTS .............................................................................................. 151<br />
7.1 QUESTIONNAIRE RESPONDENTS, THEIR BACKGROUND AND WORK TASKS ................................ 151<br />
7.2 CHARACTERISTICS OF FOCUS GROUP PARTICIPANTS ................................................................. 155<br />
7.3 RESULTS REGARDING PROFESSIONAL E-GOVERNMENT SEEKING BEHAVIOR ............................. 156<br />
7.3.1 Use of <strong>in</strong>formation sources ...................................................................................................... 157<br />
7.3.1.1 Reference works ...................................................................................................................................... 157<br />
7.3.1.2 Web sites ................................................................................................................................................. 158<br />
7.3.1.3 Internal systems ....................................................................................................................................... 159<br />
7.3.2 Colleagues as sources of information
7.4 SEEKING RESULTS REGARDING DEMANDS FOR INDEXING IN E-GOVERNMENT
7.4.1 The frequency on information seeking
7.4.2 Types of information needs
7.4.3 Preferred metadata
7.5 SUMMARY AND IMPLICATIONS FOR INDEXING
8 SEARCH TEST RESULTS
8.1 THE TEST PERSONS
8.2 OVERALL SEARCHING BEHAVIOUR AND PERFORMANCE
8.2.1 The search situation
8.2.1.1 Sessions
8.2.1.2 Queries
8.2.1.3 Search operators
8.2.1.4 Filtering by metadata
8.2.2 Reformulations
8.2.3 Combined system B sessions and queries
8.3 SUMMARY AND PERFORMANCE IMPLICATIONS FOR FUTURE INDEXING IN E-GOVERNMENT
9 CONCLUSION AND RECOMMENDATIONS FOR FUTURE WORK
9.1 SUMMARY OF EMPIRICAL FINDINGS
9.2 CONTRIBUTIONS OF THE THESIS
9.3 RECOMMENDATIONS FOR FUTURE WORK
10 REFERENCES
List of abbreviations
Appendices
Appendix 1: Generic work tasks at SKAT
Appendix 2: Distribution of employees across main processes in the business model
Appendix 3: E-mail invitation to employees
Appendix 4: Questions contained in questionnaire
Appendix 5: Questionnaire pilot test data
Appendix 6: Link to questionnaire
Appendix 7: Dates for the conduct of focus group interviews
Appendix 8: Example of the slides guiding a focus group interview
Appendix 9: Focus group interview guide
Appendix 10: Transcription conventions
Appendix 11: Verbatim Danish versions of quotes used in the thesis
Appendix 12: E-mail invitation to participate in search test
Appendix 13: Questionnaire for recruiting test persons for the search test
Appendix 14: Simulated search tasks
Appendix 15: Test persons’ insight into simulated search tasks
Appendix 16: E-mail concerning naturalistic information needs
Appendix 17: Instructions for search test persons
Appendix 18: Rotation of search tasks
Appendix 19: Search test interview guide
Appendix 20: Judgement of the relevance of retrieved documents in search test
Appendix 21: Completeness degree of questionnaire responses
Appendix 22: Respondents’ experience with work tasks
Appendix 23: Age distribution of population, respondents and test persons
Appendix 24: Respondents’ length of service in the organization
Appendix 25: Focus group participants’ work tasks
Appendix 26: Additional sources mentioned by respondents
Appendix 27: Test persons’ background data
Appendix 28: Supplementary search test tables
List of figures
Figure 2.1 The participating actors in context. Model adapted from Ingwersen & Järvelin (2005, p. 261) with minor corrections.
Figure 2.2 Extension of the cognitive view, the interactive process of IR and affecting factors. Adapted from Ingwersen & Järvelin (2005, p. 274) with minor corrections.
Figure 2.3 Information behaviour and the influence from job- or non-job related tasks. Adapted from Ingwersen & Järvelin (2005, p. 198).
Figure 2.4 SKAT's revised business model
Figure 2.5 Screen dump from existing intranet interface
Figure 3.1 Disciplines integrated in the multidisciplinary research field of e-government. Adapted from Wimmer (2007, p. 14)
Figure 3.2 Basic elements and relations in governmental systems (Grönlund, 2003, p. 56)
Figure 3.3 E-government hype cycle (Schellong, 2007)
Figure 3.4 Dimensions and stages in e-government (from Layne & Lee, 2001, p. 124)
Figure 4.1 Nested model of information seeking and information searching (Wilson, 1999, p. 263)
Figure 4.2 Comprehensive model of information seeking. Adapted from Johnson et al. (1995).
Figure 4.3 Model of cognitive factors affecting information seeking in the domain of software engineering. Adapted from Freund, Toms & Waterhouse (2005).
Figure 4.4 The process of information seeking of professionals. Adapted from Leckie, Pettigrew & Sylvain (1996, p. 180)
Figure 5.1 Illustration of the subject indexing process (Mai, 2000, p. 279).
Figure 5.2 Document and domain oriented approaches to indexing. Adapted from Mai (2005, p. 607)
Figure 5.3 Types of vocabularies and their relationships. Adapted from Morville & Rosenfeld (2007, p. 195)
Figure 5.4 Generalized characteristics of intellectual indexing. Accumulated on the basis of Rafferty & Hidderley (2007).
Figure 5.5 The resolving power of significant index terms. Adapted from Luhn (1958a, p. 161)
Figure 6.1 Screen dump from atlas.ti coding of focus group interviews
Figure 6.2 Screen dump of the test system: Search fields
Figure 6.3 Screen dump of the test system: Categorization
Figure 6.4 Relevance types in IR evaluation adapted from Borlund (2003a, p. 915).
List of tables
Table 1.1 Timeline for data collection in the PhD project
Table 3.1 Stakeholders in e-government. Adapted from Rowley (2011, p. 56)
Table 3.2 Knowledge management processes and the potential role of IT. Adapted from Alavi & Leidner (2001, p. 125)
Table 4.1 Examples of studies that have examined information seeking and/or searching of various stakeholder roles
Table 5.1 Possible factors affecting consistency. From Lancaster (2003, p. 71).
Table 5.2 Summary of strengths and weaknesses of controlled vocabularies and free text. Adapted from Dubois (1987, p. 249).
Table 6.1 Indicators of information needs in questionnaire and corresponding theoretical descriptions
Table 6.2 List of respondents' preferred metadata listed in questionnaire
Table 6.3 Cross tabulations carried out on the basis of variables in questionnaire data
Table 6.4 Overview of participants in focus groups
Table 6.5 Examples of genuine search tasks
Table 6.6 Search test variables, their definition and measurement
Table 6.7 Simulated search task facets
Table 6.8 Outline of the relation between research questions and empirical data
Table 7.1 Distribution of respondents as to their education (percentages)
Table 7.2 Number of work tasks selected by respondents
Table 7.3 Ranked frequency of work tasks in questionnaire results
Table 7.4 Focus group participants' educational background
Table 7.5 Respondents' use of predefined information sources (percentages)
Table 7.6 Questionnaire results regarding the frequency of information seeking
Table 7.7 Distribution of indicators of information needs
Table 7.8 Average percentage distribution of verificative needs (VN), conscious topical needs (CTN), and muddled topical needs (MTN).
Table 7.9 Metadata preferences distributed across work tasks
Table 8.1 Frequency of test persons' intranet use
Table 8.2 Ranking of test persons' most important information sources
Table 8.3 General evaluation of simulated search tasks in system A, system B, and total (averages)
Table 8.4 Evaluation of simulated search tasks specified to single simulated search tasks (averages)
Table 8.5 General findings of variables in search test
Table 8.6 Session success (percentages)
Table 8.7 Query success (percentages)
Table 8.8 Number of queries in sessions at task level (averages)
Table 8.9 Number of queries in sessions as to success or failure (averages)
Table 8.10 Number of search terms in queries (averages)
Table 8.11 Number of search keys in queries (averages)
Table 8.12 Number of search terms in queries as to success or failure (averages)
Table 8.13 Number of search keys in queries as to success or failure (averages)
Table 8.14 Distribution of search operators in queries (percentages)
Table 8.15 Number of search terms used with search operators in queries (averages)
Table 8.16 Success of search operators (percentages)
Table 8.17 Document type filter used in queries (percentages)
Table 8.18 Search success for the document type filter in system A and system B queries (percentages)
Table 8.19 Number of sessions with query reformulations (percentages)
Table 8.20 Number of reformulations in sessions
Table 8.21 Types of reformulations for all queries (percentages)
Table 8.22 Query success on the basis of types of reformulations (percentages)
Table 8.23 Sessions carried out in system B, or in a combination of system B and system A: Frequency and success (percentages)
Table 8.24 System of successful queries in combined system B sessions
Table 8.25 System B queries: Frequency of category use and query success (percentages)
1 Introduction
Indexing has been carried out for centuries, starting with manual indexing. In the middle
of the last century, automatic methods were introduced as a counterpart. Though both
manual and automatic indexing have been studied theoretically and empirically,
researchers are still able to identify gaps in our knowledge of indexing in terms of
quality, cost-effectiveness, our understanding of the effect of indexers’ and
information users’ cognitive processes, and the like (e.g., Milstead, 1994; Anderson &
Perez-Carballo, 2001a, 2001b). The present PhD project explores indexing in the
context of a specific domain: e-government. Specifically, we investigate the
performance of two methods for subject indexing in this domain. The
purpose of the investigation is to work out a set of recommendations for
indexing practice in e-government.
Information overload is a widely recognized problem today (e.g., Edmunds &
Morris, 2000; Eppler & Mengis, 2004; Codagnone & Wimmer, 2007), and it is a
challenge in both private and public organizations. At the same time, the
importance of information in e-government should not be underestimated. According to
Klischewski, “[d]ocument processing is at the core of administrative performance in
several respects: Documents are the basis for almost all of the administrative
processes, they are the most valuable resources to exploit as they are the main carriers
of information and represent a large portion of the overall administrative knowledge
base” (2006, p. 34). Thus, in democracies, documental support is a key issue for
operations undertaken in public administrations (Kraemer & Dedrick, 1997;
Klischewski, 2006; Sabucedo & Rifón, 2006). The consequences of not being able to
find the documents needed for a given task have previously been considered. The
calculations carried out by Feldman & Sherman suggest that supporting corporate
users’ searching for information is one step towards efficiency and effectiveness
(Glazer, 1993; Feldman & Sherman, 2001). In addition, public administrations are
expected to offer security to the public, and not being able to find needed information
can have severe costs (Kraemer & Dedrick, 1997). Studies have indicated that the facilities
of e-government systems still leave room for improvement, for instance in terms of
searching (e.g., Goh et al., 2008), navigation (e.g., de Jong & Lentz, 2006), and the extent of
metadata adoption (e.g., Kopackova, Michalek & Cejna, 2010). In sum, the support of
access to information should be given high priority if the aim is effective, efficient, and
secure governments.
Edmunds & Morris (2000) mention different methods to reduce information
overload in organizations, e.g., value-added information. Value-added information
limits information overload while increasing users’ access to relevant
information. A concrete way of adding value to information in e-government documents
is the assignment of metadata. Assignment of metadata in the domain serves several
purposes, namely allowing interoperability between systems and enabling users to
retrieve better and more precise search results (Moen, 2001; Tambouris, Manouselis &
Costopoulou, 2007). Further, metadata ease knowledge sharing between employees in
e-government, internally in organizations as well as externally (Schwartz, Divitini &
Brasethvik, 2000; Choo, 2006). The multiplicity of metadata standards developed
specifically for e-government reflects that governments are well aware of the
importance of metadata (cf. Tambouris, Manouselis & Costopoulou, 2007). Metadata
can be assigned either manually by humans or automatically on the basis of a
machine-generated analysis of the words constituting the documents. In e-government, the
predominant approach is manual assignment. With the present thesis, we want to
investigate whether the use of automatic indexing can be a means to effective and
efficient governments that concurrently can support the important process of
information seeking in the domain.
The concept of e-<strong>government</strong> designates <strong>government</strong>s that utilize ICT <strong>in</strong> order<br />
to communicate with and allow access to <strong>in</strong>formation for external parties such as<br />
citizens, bus<strong>in</strong>esses and other <strong>government</strong>s (e.g., Fang, 2002; Jaeger, 2003; Grant &<br />
Chau, 2005). A variety of purposes for e-<strong>government</strong> can be identified <strong>in</strong> the literature.<br />
The more important ones are openness, improved and more flexible services for citizens<br />
and bus<strong>in</strong>esses, and <strong>in</strong>creased coherence and efficiency of <strong>government</strong>al processes (e.g.,<br />
Grönlund & Horan, 2004). The concept of e-<strong>government</strong> has emerged worldwide<br />
dur<strong>in</strong>g the past decade. The Scand<strong>in</strong>avian countries have been pioneers <strong>in</strong> the process<br />
of digitaliz<strong>in</strong>g <strong>government</strong>s. As a result, the Scand<strong>in</strong>avian countries have ranked favourably<br />
<strong>in</strong> the various <strong>in</strong>ternational e-<strong>government</strong> <strong>in</strong>dexes (Andersen et al., 2005; Henriksen &<br />
Damsgaard, 2006). In Denmark three successive strategies have formed the basis for the<br />
development of e-<strong>government</strong> with<strong>in</strong> the framework of Project Digital Government. In<br />
2002 “Towards e-<strong>government</strong>: vision and strategy for the public sector <strong>in</strong> Denmark”<br />
(Project Digital Government & The Digital Taskforce, 2002) was published. In 2004,<br />
“The Danish eGovernment Strategy 2004-2006: realis<strong>in</strong>g the potential” (The Danish<br />
Government et al., 2004) followed. The latest strategy, “The Danish E-Government<br />
Strategy 2007-2010: Towards Better Digital Service, Increased Efficiency, and Stronger<br />
Collaboration” (The Danish Government, Local Government Denmark (LGDK) &<br />
Danish Regions, 2007) appeared <strong>in</strong> 2007. The strategies have been carried out as a<br />
cooperation between the most important actors <strong>in</strong> the Danish <strong>government</strong>al system: the<br />
<strong>government</strong>, the regions, and the municipalities. The strategies altogether cover the<br />
period 2001-2010. Dur<strong>in</strong>g the decade they have been <strong>in</strong> effect, the strategies have<br />
become <strong>in</strong>creas<strong>in</strong>gly specific as knowledge of e-<strong>government</strong> has <strong>in</strong>creased.<br />
In the two latest strategies, automation of employees’ work<strong>in</strong>g processes<br />
has been specifically addressed as a means to reduc<strong>in</strong>g the use of resources. The<br />
pr<strong>in</strong>ciple of effectiveness is carried on <strong>in</strong> the recent mandate for a new strategy that<br />
replaces the exist<strong>in</strong>g strategy <strong>in</strong> 2011 (The Danish Government, Local Government<br />
Denmark & Danish Regions, 2010). Automation of <strong><strong>in</strong>dex<strong>in</strong>g</strong> procedures may thus<br />
support the e-<strong>government</strong> strategy <strong>in</strong> terms of reduc<strong>in</strong>g the resources spent on carry<strong>in</strong>g<br />
out <strong><strong>in</strong>dex<strong>in</strong>g</strong> and search<strong>in</strong>g for <strong>in</strong>formation.<br />
1.1 Research objective<br />
The PhD project has been f<strong>in</strong>anced by the National IT and Telecom Agency,<br />
the Royal School of Library and Information Science, and Department of<br />
Communication, <strong>Aalborg</strong> University. The overall project idea orig<strong>in</strong>ated from the<br />
National IT and Telecom Agency. The agency requested a set of guidel<strong>in</strong>es for the<br />
application of automatic <strong><strong>in</strong>dex<strong>in</strong>g</strong> methods that could be used for the Agency’s work<br />
with standardization and <strong>in</strong>teroperability <strong>in</strong> the Danish public sector. We have met this<br />
assignment by focus<strong>in</strong>g on two <strong><strong>in</strong>dex<strong>in</strong>g</strong> methods: automatically extracted <strong><strong>in</strong>dex<strong>in</strong>g</strong><br />
(full text <strong><strong>in</strong>dex<strong>in</strong>g</strong>) and automatically assigned <strong><strong>in</strong>dex<strong>in</strong>g</strong> (automatic categorization).<br />
Thus, the objective is to evaluate whether automatic categorization as an approach to<br />
automatic <strong><strong>in</strong>dex<strong>in</strong>g</strong> can improve retrieval performance <strong>in</strong> e-<strong>government</strong> <strong>in</strong> a<br />
professional context. We use the case study approach as our general methodical<br />
approach. The Danish tax authorities SKAT have will<strong>in</strong>gly agreed to be our case of<br />
study. We <strong>in</strong>vestigate employees at SKAT, that is, professional users of <strong>in</strong>formation.<br />
Compared to e-<strong>government</strong> customers (e.g., citizens and bus<strong>in</strong>esses), our target group<br />
constitutes a homogeneous user group.<br />
S<strong>in</strong>ce e-<strong>government</strong> represents a specific doma<strong>in</strong>, we carry out the empirical<br />
<strong>in</strong>vestigation of the overall research problem <strong>in</strong> two parts. First we analyse the specific
characteristics of the doma<strong>in</strong>. For this purpose we use a questionnaire for ga<strong>in</strong><strong>in</strong>g an<br />
overview of the organization. Subsequently, focus group <strong>in</strong>terviews are employed <strong>in</strong><br />
order to expla<strong>in</strong> and expand the results of the questionnaire survey. The questionnaire<br />
is used to collect data on the employees’ frequency of <strong>in</strong>formation seek<strong>in</strong>g, the types of<br />
<strong>in</strong>formation needs developed, use of <strong>in</strong>formation sources, and metadata preferences <strong>in</strong><br />
relation to specific work tasks <strong>in</strong> the organization. The assumption is that importance of<br />
<strong>in</strong>formation may depend on the work task <strong>in</strong> question. We refer to this first part of the<br />
empirical foundation for the thesis as the doma<strong>in</strong> study.<br />
The second part of the data collection consists of a search test specifically<br />
<strong>in</strong>vestigat<strong>in</strong>g the performance of the two <strong><strong>in</strong>dex<strong>in</strong>g</strong> methods mentioned above. For the<br />
design of the search test we use knowledge ga<strong>in</strong>ed from the doma<strong>in</strong> study <strong>in</strong> order to<br />
qualify the search test design. The search test <strong>in</strong>vestigates the performance of two test<br />
systems. Both test systems employ automatic <strong><strong>in</strong>dex<strong>in</strong>g</strong>; one extracted (free text<br />
<strong><strong>in</strong>dex<strong>in</strong>g</strong>) and one assigned (automatic categorization). Three simulated search jobs and one real<br />
search job form the basis of the test persons’ evaluation of the performance of the test<br />
systems. The relevance of the search results is evaluated by the test persons. The test<br />
sessions are f<strong>in</strong>ished with a short <strong>in</strong>terview.<br />
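The difference between the two indexing methods compared in the search test can be illustrated with a minimal sketch. It is not the implementation used in the test systems: the stop word list, the category scheme, and the cue words (`STOP_WORDS`, `CATEGORY_RULES`) are hypothetical and chosen only for illustration. Extracted indexing takes its index terms from the words of the document itself, whereas assigned indexing attaches labels from a predefined scheme.

```python
import re

# Hypothetical stop words and category cues, for illustration only.
STOP_WORDS = {"the", "of", "on", "for", "a", "and", "in", "is", "to"}
CATEGORY_RULES = {
    "VAT": {"vat", "invoice"},
    "Income tax": {"income", "salary", "deduction"},
}


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def extracted_index(text):
    """Extracted indexing: index terms are words taken from the document."""
    return sorted({t for t in tokenize(text) if t not in STOP_WORDS})


def assigned_index(text):
    """Assigned indexing: labels are chosen from a predefined category scheme."""
    tokens = set(tokenize(text))
    return sorted(c for c, cues in CATEGORY_RULES.items() if tokens & cues)


doc = "Guidance on the deduction of VAT for a business invoice"
print(extracted_index(doc))  # terms drawn from the text itself
print(assigned_index(doc))   # categories drawn from the scheme
```

In the sketch a category label such as "Income tax" can be assigned even though the phrase never occurs in the document, which is the property that distinguishes assigned from extracted indexing.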
1.2 Empirical assumptions<br />
The empirical design of the PhD project has been guided by our<br />
methodological start<strong>in</strong>g po<strong>in</strong>t: the cognitive view of <strong>in</strong>formation seek<strong>in</strong>g and retrieval<br />
(cf., Ingwersen & Järvel<strong>in</strong>, 2005). The cognitive viewpo<strong>in</strong>t is methodologically<br />
considered with<strong>in</strong> the research tradition of cognitive constructivism (Talja, Tuom<strong>in</strong>en &<br />
Savola<strong>in</strong>en, 2005). The cognitive viewpo<strong>in</strong>t has emerged as a reaction to a biased focus<br />
on users <strong>in</strong> the user oriented research tradition and on systems <strong>in</strong> the system oriented<br />
research tradition. Thus, the cognitive viewpo<strong>in</strong>t aims at a holistic view on the process<br />
of IR <strong>in</strong>teraction <strong>in</strong> order to achieve <strong>in</strong>tegration between the user oriented and the<br />
system driven research traditions (e.g., Ingwersen, 1992, 1996; Ingwersen & Järvel<strong>in</strong>,<br />
2005). The cognitive view emphasizes the cognitive actors <strong>in</strong>teract<strong>in</strong>g <strong>in</strong> <strong>in</strong>formation<br />
seek<strong>in</strong>g and retrieval. With this view of <strong>in</strong>formation seek<strong>in</strong>g and retrieval, the users and<br />
the <strong>in</strong>formation system must be taken <strong>in</strong>to account when test<strong>in</strong>g performance of an<br />
<strong>in</strong>formation system. As a consequence we test the performance of <strong><strong>in</strong>dex<strong>in</strong>g</strong> methods by<br />
<strong>in</strong>volv<strong>in</strong>g real, potential users <strong>in</strong> the search test. Further, we apply an established<br />
evaluation method for the search test, namely simulated search tasks, which have been<br />
suggested by Borlund (Borlund & Ingwersen, 1997; Borlund, 2000, 2003b). The<br />
purpose of simulated search tasks is to be able to evaluate IR systems <strong>in</strong> a way that<br />
ensures both realism and experimental control.<br />
In the latest presentations of the cognitive view the importance of context <strong>in</strong><br />
<strong>in</strong>formation seek<strong>in</strong>g and retrieval has received greater emphasis (e.g., Ingwersen &<br />
Järvel<strong>in</strong>, 2005). The cognitive structures of the <strong>in</strong>dividual still constitute the core of the<br />
viewpo<strong>in</strong>t, but the context is considered an <strong>in</strong>fluential component <strong>in</strong> <strong>in</strong>formation<br />
seek<strong>in</strong>g and retrieval. Accord<strong>in</strong>g to Ingwersen & Järvel<strong>in</strong> (2005, p. 19): “...actors and<br />
other components function as context to one another <strong>in</strong> the <strong>in</strong>teraction process. There<br />
are social, organizational, cultural as well as systemic contexts, which evolve over<br />
time.” The dist<strong>in</strong>ct presence of the concept of context <strong>in</strong> the literature emphasizes that<br />
context must be considered a factor <strong>in</strong> the <strong>in</strong>teraction process. The def<strong>in</strong>ition of what<br />
constitutes context has been discussed and operationalized <strong>in</strong> relation to <strong>in</strong>formation<br />
behaviour (cf., Courtright, 2007). In the present work we are concerned with a work<br />
based, organizational context. This calls for a consideration of the <strong>in</strong>fluence of that<br />
specific context as to the results of the search test. This is the ma<strong>in</strong> reason for carry<strong>in</strong>g<br />
out the first part of the empirical data collection: the doma<strong>in</strong> study. We are not guided<br />
by the theoretical foundation of the doma<strong>in</strong> analysis as formulated by Hjørland and<br />
Albrechtsen (1995) s<strong>in</strong>ce it is primarily concerned with scientific doma<strong>in</strong>s. Rather we<br />
are <strong>in</strong>spired by studies similar to the present doma<strong>in</strong> study. Examples <strong>in</strong>clude Leckie,<br />
Pettigrew & Sylva<strong>in</strong> (1996), Nielsen (2001) and Freund, Toms & Waterhouse (2005).<br />
1.3 Motivations for the thesis<br />
The present research is motivated by different conditions. We have already<br />
mentioned one of the basic premises of e-<strong>government</strong>, namely effectiveness and<br />
efficiency. With the present study we want to <strong>in</strong>vestigate whether automatic <strong><strong>in</strong>dex<strong>in</strong>g</strong><br />
<strong>in</strong> the form of automatic categorization can contribute to this premise. The motivation<br />
for the study relates to two different aspects: <strong><strong>in</strong>dex<strong>in</strong>g</strong> and the target group <strong>in</strong> question.<br />
The seek<strong>in</strong>g behaviour of e-<strong>government</strong> employees is, to our knowledge, not<br />
well explored. By comparison, numerous studies have been made of the customers<br />
of e-<strong>government</strong> (e.g., citizens and bus<strong>in</strong>esses) <strong>in</strong> order to evaluate their use of e-<strong>government</strong><br />
solutions. Reviews can be found <strong>in</strong> Robb<strong>in</strong>, Courtright & Davis (2004) and<br />
Case (2006). A basic premise for the thesis is that we need to know what characterizes<br />
e-<strong>government</strong> employees’ seek<strong>in</strong>g behaviour and the role of <strong>in</strong>formation <strong>in</strong> the doma<strong>in</strong>
<strong>in</strong> order to be able to tailor the <strong><strong>in</strong>dex<strong>in</strong>g</strong> to the seek<strong>in</strong>g behaviour and <strong>in</strong>formation needs<br />
actually experienced by the employees. We will present the studies that nevertheless do<br />
<strong>in</strong>form us about e-<strong>government</strong> users’ seek<strong>in</strong>g behaviour <strong>in</strong> chapter 3.<br />
As for automatic <strong><strong>in</strong>dex<strong>in</strong>g</strong>, there are different motivations for suggest<strong>in</strong>g<br />
automatic categorization <strong>in</strong> the present context. Manual assignment of metadata is a<br />
costly and time consum<strong>in</strong>g process for adm<strong>in</strong>istrative employees. If automatic<br />
categorization proves to support and improve the <strong>in</strong>formation seek<strong>in</strong>g of the thesis<br />
target group, it would at the same time support the <strong>in</strong>tentions about <strong>in</strong>creased<br />
effectiveness and efficiency <strong>in</strong> e-<strong>government</strong>. Also, the literature has demonstrated that<br />
ensur<strong>in</strong>g quality and consistency <strong>in</strong> manually added metadata can be difficult (Anderson<br />
& Perez-Carballo, 2001a; Lancaster, 2003). Thus, manual <strong><strong>in</strong>dex<strong>in</strong>g</strong> tends to vary with the<br />
<strong>in</strong>dexer, both across <strong>in</strong>dexers (<strong>in</strong>ter-<strong>in</strong>dexer consistency) and over time for the same <strong>in</strong>dexer (<strong>in</strong>tra-<strong>in</strong>dexer<br />
consistency). Further, <strong>in</strong> the field of US federal records management, Sprehe, McClure<br />
& Zellner (2002) found that different situational factors affected the quality of federal<br />
employees’ record keep<strong>in</strong>g, caus<strong>in</strong>g the quality of records management to diverge across<br />
<strong>government</strong>s. Factors like availability of resources and guidance, the motivation of the<br />
employees, and efficiency of access to records appeared to be affect<strong>in</strong>g the quality of<br />
records management <strong>in</strong> the study. In a recent study of metadata assignment <strong>in</strong> a F<strong>in</strong>nish<br />
<strong>government</strong> the researchers found that employees prefer not to assign metadata when<br />
they have the option. Also, the employees tend to accept default values, whenever they<br />
are available (Kettunen & Henttonen, 2010). The results suggest that e-<strong>government</strong><br />
<strong><strong>in</strong>dex<strong>in</strong>g</strong> might benefit from an automatic solution to <strong><strong>in</strong>dex<strong>in</strong>g</strong> <strong>in</strong> a number of ways.<br />
The literature has already demonstrated that the assignment of metadata is one among<br />
several prerequisites for retrieval and shar<strong>in</strong>g of knowledge <strong>in</strong> organizations (e.g., Choo,<br />
2006). If automatic <strong><strong>in</strong>dex<strong>in</strong>g</strong> can improve subject metadata, then there is reason to<br />
assume that the retrieval and shar<strong>in</strong>g of knowledge <strong>in</strong> the doma<strong>in</strong> is also <strong>in</strong>fluenced <strong>in</strong> a<br />
positive sense.<br />
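To make the notion of inter-indexer consistency concrete, the following minimal sketch computes a simple set-overlap score between the terms two indexers assign to the same document: the number of shared terms divided by the number of distinct terms used by either. This overlap ratio is one common way of quantifying consistency and is offered here as an assumption for illustration, not as the measure used in the studies cited above. The example terms are hypothetical.

```python
def indexing_consistency(terms_a, terms_b):
    """Share of distinct terms that both indexers assigned (0.0 to 1.0)."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        # Two empty term sets are treated as fully consistent.
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical term sets assigned to the same document by two indexers.
indexer_1 = {"vat", "invoice", "deduction"}
indexer_2 = {"vat", "deduction", "business"}

print(indexing_consistency(indexer_1, indexer_2))  # 2 shared of 4 distinct terms
```

The same function can be applied to two term sets produced by one indexer at different points in time to illustrate intra-indexer consistency.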
To our knowledge, not much is known about how automatically extracted and<br />
automatically assigned <strong><strong>in</strong>dex<strong>in</strong>g</strong> methods supplement each other. The theory of<br />
polyrepresentation suggests that the more different types of representation, the more<br />
cognitive overlap there will be between the representations (Ingwersen, 1996;<br />
Ingwersen & Järvel<strong>in</strong>, 2005). Further, the comb<strong>in</strong>ation of approaches enables tak<strong>in</strong>g<br />
advantage of the strengths of each approach (Anderson & Perez-Carballo, 2001a).<br />
Table 1.1 Timel<strong>in</strong>e for data collection <strong>in</strong> the PhD project<br />
Period of time | Data type<br />
December 2008 | Survey questionnaire<br />
June-July 2009 | Focus group <strong>in</strong>terviews<br />
May 2010 | Recruitment questionnaire for search test<br />
May-June 2010 | Search test<br />
One f<strong>in</strong>al motivation for the thesis concerns automatic categorization.<br />
Categorization represents a structured way of offer<strong>in</strong>g users a subject based overview of<br />
search results. Categorization has been developed <strong>in</strong> different prototypes dur<strong>in</strong>g the<br />
2000s, though rarely for <strong>in</strong>tranets (Käki, 2005a). Thus, we want to <strong>in</strong>vestigate whether<br />
the use of categorization <strong>in</strong> e-<strong>government</strong> is consistent with exist<strong>in</strong>g studies of<br />
categorization.<br />
1.4 Research questions<br />
Our overall research question concerns the performance of <strong><strong>in</strong>dex<strong>in</strong>g</strong> methods<br />
<strong>in</strong> the doma<strong>in</strong> of e-<strong>government</strong>. The overall research methodology is a s<strong>in</strong>gle case<br />
study. The specific research questions address the doma<strong>in</strong> study (research question 1)<br />
and the search test (research question 2) respectively.<br />
1. What characterizes the e-<strong>government</strong> employees’ <strong>in</strong>formation seek<strong>in</strong>g behaviour <strong>in</strong><br />
relation to:<br />
1.1. Their use of <strong>in</strong>formation sources?<br />
1.2. Their frequency of <strong>in</strong>formation seek<strong>in</strong>g?<br />
1.3. Their <strong>in</strong>formation needs?<br />
1.4. Their metadata preferences?<br />
1.5. How does the seek<strong>in</strong>g behaviour affect demands for <strong><strong>in</strong>dex<strong>in</strong>g</strong>?<br />
The first research question and related sub questions are answered on the basis of the<br />
doma<strong>in</strong> study. The question and sub questions are answered by the quantitative data<br />
collected from the questionnaire and the qualitative follow up focus group <strong>in</strong>terviews<br />
(see timel<strong>in</strong>e of the data collection <strong>in</strong> Table 1.1). Thus the responses aim at provid<strong>in</strong>g a
quantitative answer, but also seek to offer explanations for the patterns identified <strong>in</strong> the<br />
questionnaire data.<br />
2. How do automatically extracted <strong><strong>in</strong>dex<strong>in</strong>g</strong> and automatic categorization perform <strong>in</strong><br />
relation to the identified doma<strong>in</strong> characteristics as to:<br />
2.1. Number of queries <strong>in</strong> sessions?<br />
2.2. Number of terms <strong>in</strong> queries?<br />
2.3. Number of concepts <strong>in</strong> queries?<br />
2.4. The type of search operator applied?<br />
2.5. The use of document type filters?<br />
2.6. Number of reformulations?<br />
2.7. Types of reformulations?<br />
2.8. Degree of search success <strong>in</strong> queries and sessions?<br />
2.9. Overall performance measured by performance measures?<br />
2.10. Which implications does the performance of different <strong><strong>in</strong>dex<strong>in</strong>g</strong> methods have<br />
for future <strong><strong>in</strong>dex<strong>in</strong>g</strong> and <strong><strong>in</strong>dex<strong>in</strong>g</strong> guidel<strong>in</strong>es <strong>in</strong> the doma<strong>in</strong> of e-<strong>government</strong>?<br />
The empirical basis for the second research question and related sub questions is the<br />
data collected <strong>in</strong> connection with the search test (see Table 1.1). The search test<br />
consists of an experimental comparison test of two <strong><strong>in</strong>dex<strong>in</strong>g</strong> methods. The test was<br />
carried out <strong>in</strong> a realistic sett<strong>in</strong>g <strong>in</strong> a real life <strong>government</strong>al <strong>in</strong>tranet. As the purpose of<br />
the test is to form a basis for ensur<strong>in</strong>g and develop<strong>in</strong>g <strong><strong>in</strong>dex<strong>in</strong>g</strong> <strong>in</strong> terms of effectiveness<br />
and efficiency, variables measur<strong>in</strong>g search time and effort were important factors <strong>in</strong> the<br />
test design. Questions 2.1-2.7 are answered on the basis of the search log generated<br />
dur<strong>in</strong>g the course of the test. Questions 2.8-2.9 are based on the test persons’<br />
assessment of retrieved outcomes. Post search <strong>in</strong>terviews are <strong>in</strong>cluded to understand<br />
and expla<strong>in</strong> test person behaviour dur<strong>in</strong>g the test. In question 2.10 we sum up the<br />
f<strong>in</strong>d<strong>in</strong>gs of the search test and provide the perspective of <strong><strong>in</strong>dex<strong>in</strong>g</strong> guidel<strong>in</strong>es for e-<strong>government</strong>.<br />
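The log-based measures behind questions 2.1, 2.2 and 2.6 can be illustrated with a minimal sketch. The log format, the session identifiers, and the queries below are hypothetical, and the sketch rests on two simplifying assumptions: terms are counted by splitting a query on whitespace, and every query after the first in a session is counted as a reformulation.

```python
from collections import defaultdict

# Hypothetical search log records: (session id, query string).
log = [
    ("s1", "vat deduction"),
    ("s1", "vat deduction invoice"),
    ("s2", "income tax"),
]


def session_metrics(records):
    """Summarize queries, mean terms per query, and reformulations per session."""
    queries = defaultdict(list)
    for session_id, query in records:
        queries[session_id].append(query)

    metrics = {}
    for session_id, session_queries in queries.items():
        term_counts = [len(q.split()) for q in session_queries]
        metrics[session_id] = {
            "queries": len(session_queries),
            "mean_terms": sum(term_counts) / len(term_counts),
            # Assumption: each query after the first is a reformulation.
            "reformulations": len(session_queries) - 1,
        }
    return metrics


for session_id, values in session_metrics(log).items():
    print(session_id, values)
```

Measures such as the number of concepts in a query or the type of reformulation require manual or semantic coding of the queries and are not covered by this sketch.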
1.5 Structure of the thesis<br />
The thesis reports two <strong>in</strong>terconnected empirical studies. The first study is the<br />
doma<strong>in</strong> study, which is followed by the second study: the search test. The reason for<br />
the succession is that the doma<strong>in</strong> study forms the basis for the search test. The thesis is<br />
<strong>in</strong>troduced by a theoretical part. Next follows the empirical part. The theoretical part is<br />
constituted by the chapters 2, 3, 4, and 5. Chapter 2 makes a more thorough<br />
presentation of the empirical assumptions <strong>in</strong>troduced above. Here the methodological<br />
frame guid<strong>in</strong>g both the theoretical parts and the data collection for both doma<strong>in</strong> study<br />
and search test is outl<strong>in</strong>ed. As the case study comprises a part of the methodological<br />
frame, it is also presented here along with a thorough <strong>in</strong>troduction to the specific case:<br />
SKAT.<br />
Chapters 3, 4, and 5 constitute the theoretical basis for the doma<strong>in</strong> study.<br />
Chapter 3 <strong>in</strong>troduces the research area of e-<strong>government</strong>. The purpose of the chapter is to<br />
outl<strong>in</strong>e the doma<strong>in</strong> that the present thesis navigates <strong>in</strong>. In chapter 4 the focus is<br />
narrowed down to analys<strong>in</strong>g what is known about the seek<strong>in</strong>g behaviour of professional<br />
e-<strong>government</strong> users. The theoretical foundation for the search test is presented <strong>in</strong><br />
chapter 5. The chapter conta<strong>in</strong>s a review of manual and automatic <strong><strong>in</strong>dex<strong>in</strong>g</strong>. The first<br />
part <strong>in</strong>troduces core concepts and understand<strong>in</strong>gs of <strong><strong>in</strong>dex<strong>in</strong>g</strong> and categorization, and<br />
establishes the connection between the two concepts. The second part presents exist<strong>in</strong>g<br />
knowledge on the performance of <strong><strong>in</strong>dex<strong>in</strong>g</strong> methods and categorization.<br />
The empirical part of the thesis comprises the chapters 6, 7, and 8. In chapter 6<br />
the applied methods and underly<strong>in</strong>g considerations are presented, firstly for the doma<strong>in</strong><br />
study, secondly for the search test. The chapter f<strong>in</strong>ishes by connect<strong>in</strong>g the empirical<br />
elements to the research questions of the thesis. Chapter 7 presents the results of the<br />
doma<strong>in</strong> study. First, a questionnaire was carried out. The questionnaire was followed<br />
up by 7 focus group <strong>in</strong>terviews. The purpose of the focus groups was a validation and<br />
elaboration of the questionnaire results. The results are reported when relevant to<br />
research question 1 and connected sub questions. Chapter 8 conta<strong>in</strong>s the results of the<br />
search test. The overall aim of chapter 8 is to be able to answer questions raised <strong>in</strong><br />
research question 2. Chapter 9 summarizes and discusses the empirical results. The<br />
thesis ends with suggestions for further research.
2 Methodological framework<br />
Chapter 2 presents the methodological framework of the thesis. We beg<strong>in</strong> here, as the<br />
theory of scientific method guides the rema<strong>in</strong>der of the thesis. In the research<br />
literature it is suggested to discrim<strong>in</strong>ate methodology from methods. Methodology is a<br />
superior concept that describes, expla<strong>in</strong>s, and justifies the methods used <strong>in</strong> empirical<br />
studies. Methodology may thus be considered a science theoretical or science<br />
philosophical concept address<strong>in</strong>g epistemological concerns. Conversely, method is<br />
subord<strong>in</strong>ate to methodology and designates the specific methods and techniques applied<br />
<strong>in</strong> empirical studies (Wang, 1999). To structure the methodical parts of the thesis we<br />
are follow<strong>in</strong>g this division. Therefore, <strong>in</strong> the present chapter we will present the<br />
methodological issues that have guided the research design and the collection of data.<br />
In a later chapter (Chapter 6), we account for the specific methods applied to collect the<br />
data that constitutes the empirical basis of the thesis.<br />
2.1 A cognitive framework for <strong>in</strong>formation research<br />
As mentioned <strong>in</strong> the <strong>in</strong>troduction, we have been work<strong>in</strong>g with<strong>in</strong> the<br />
cognitive framework of <strong>in</strong>formation science. The cognitive view was first<br />
proposed <strong>in</strong> 1977 (De Mey, 1977; Ingwersen & Järvel<strong>in</strong>, 2005). Here, the cognitive<br />
viewpo<strong>in</strong>t was proposed as a reaction to the two predom<strong>in</strong>ant research traditions at the<br />
time: the system driven and the user oriented research traditions. With<strong>in</strong> the system-driven<br />
research tradition significant results have been achieved regard<strong>in</strong>g, for <strong>in</strong>stance,<br />
best-match retrieval models, Boolean logic, question answer<strong>in</strong>g, and cross-language<br />
retrieval. The user oriented research tradition, on the other hand, has obta<strong>in</strong>ed<br />
equally essential results, though <strong>in</strong> relation to <strong>in</strong>creas<strong>in</strong>g our understand<strong>in</strong>g of end-user<br />
search<strong>in</strong>g, doma<strong>in</strong> oriented <strong>in</strong>formation behaviour and the like (Ingwersen, 1996;<br />
Ingwersen & Järvel<strong>in</strong>, 2005). Despite the respective importance of their f<strong>in</strong>d<strong>in</strong>gs, the<br />
two research traditions have been criticized for be<strong>in</strong>g unilateral <strong>in</strong> their methodological<br />
approaches. Thus, the system-driven tradition has been follow<strong>in</strong>g the pr<strong>in</strong>ciple of test<br />
collections, a pr<strong>in</strong>ciple that arose from the Cranfield model. The Cranfield model<br />
measured retrieval performance on the basis of a test collection, a set of queries, and a
Figure 2.1 The participat<strong>in</strong>g actors <strong>in</strong> context. Model adapted from Ingwersen & Järvel<strong>in</strong> (2005, p.<br />
261) with m<strong>in</strong>or corrections.<br />
set of relevance assessments (Borlund, 2003b). This laboratory-like approach is the<br />
counterpo<strong>in</strong>t to the user oriented tradition. As <strong>in</strong> the system-driven tradition, the user<br />
oriented tradition is to a large extent based on empirical <strong>in</strong>vestigations, but the<br />
perspective is operational. Furthermore <strong>in</strong>formation users, and not IR systems, are the<br />
focus of attention (Ingwersen, 1996). This leads to a contrast between the traditions,<br />
which has been summed up by Robertson & Hancock-Beaulieu to comprise “…on the<br />
one hand, control over experimental variables, observability, and repeatability, and on<br />
the other hand, realism.” (1992, p. 460).<br />
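The laboratory-style evaluation described above, which relies on a test collection, a set of queries, and a set of relevance assessments, can be sketched for a single query as follows. The document identifiers and relevance assessments are hypothetical, and precision and recall are used here only as familiar examples of performance measures computed from such assessments.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: document ids returned by the system for the query.
    relevant:  document ids assessed as relevant for the query.
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical system output and relevance assessments for one query.
retrieved_docs = ["d1", "d2", "d3", "d4"]
relevant_docs = {"d1", "d3", "d7"}

precision, recall = precision_recall(retrieved_docs, relevant_docs)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Because the queries and assessments are fixed in advance, such a computation is fully repeatable, which is the experimental control that the quotation contrasts with realism.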
It was as a reaction towards just this contrast that the cognitive view emerged.<br />
The pioneers of the cognitive viewpo<strong>in</strong>t reacted towards what was considered a one-sided<br />
focus on IR systems or users respectively. Instead an alternative approach was<br />
suggested that offered a holistic picture of the IR process. It was acknowledged that <strong>in</strong><br />
order to ga<strong>in</strong> a comprehensive picture of the process of IR <strong>in</strong>teraction, the cognitive<br />
structure of all cognitive actors of the process of <strong>in</strong>teraction needed to be acknowledged<br />
and taken <strong>in</strong>to consideration (cf. Figure 2.1). Five dimensions represent and summarize<br />
the cognitive view. They comprise:<br />
1. “Information process<strong>in</strong>g takes place <strong>in</strong> senders and recipients of messages;<br />
2. Process<strong>in</strong>g takes place at different levels;<br />
3. Dur<strong>in</strong>g communication of <strong>in</strong>formation any actor is <strong>in</strong>fluenced by its past<br />
and present experiences (time) and its social, organizational and cultural<br />
environment;<br />
4. Individual actors <strong>in</strong>fluence the environment or doma<strong>in</strong>;<br />
5. Information is situational and contextual.” (Ingwersen & Järvel<strong>in</strong>, 2005, p.<br />
25).<br />
Thus, in the cognitive view, senders and recipients of messages encompass not only<br />
information users, but any actor contributing to or participating in any aspect of the<br />
IR process (Ingwersen & Järvelin, 2005, p. 27). In this way, the framework<br />
supported the <strong>in</strong>tegration of IR techniques and IR systems <strong>in</strong>clud<strong>in</strong>g their underly<strong>in</strong>g<br />
cognitive structures and human <strong>in</strong>formation users and their <strong>in</strong>formation behaviour. In<br />
sum it was emphasized that the approach was not solely user oriented, but rather offered<br />
a framework for all human actors and their cognitive structures <strong>in</strong>volved <strong>in</strong> IR<br />
<strong>in</strong>teraction (Ingwersen & Järvel<strong>in</strong>, 2007, p. 141).<br />
However, the attention to all cognitive actors did not reduce interest in the<br />
information user. The information need of the user still functioned as the benchmark<br />
for measuring the success of IR systems. The understanding of users’ information<br />
needs and their formation has been captured by the ASK-hypothesis. The hypothesis<br />
stated that an <strong>in</strong>formation need arises from an anomaly <strong>in</strong> a user’s state of knowledge<br />
concern<strong>in</strong>g a topic or situation. Thus, <strong>in</strong> preparation for IR, users should be asked to<br />
describe the anomaly rather than to state a request represent<strong>in</strong>g the <strong>in</strong>formation need to<br />
an IR system (Belk<strong>in</strong>, Oddy & Brooks, 1982, p. 62). To summarize, the cognitive view<br />
allowed for a more detailed representation of <strong>in</strong>formation users compared to what was<br />
previously known from the system driven and the user oriented research traditions.<br />
2.1.1 Towards a holistic cognitive framework<br />
From the very beginning, researchers within the cognitive view were mainly<br />
concerned with individual variances of cognitive structures. However, developments in<br />
surrounding research areas in the early 1990s caused a corresponding change within<br />
the cognitive framework towards increased attention to contextual matters.<br />
Ingwersen singles out two particular papers as landmarks of the change of focus<br />
(Ingwersen, 1999, p. 11 ff.). One is Schamber, Eisenberg & Nilan’s (1990) paper on the<br />
concept of situational relevance. On the basis of a thorough review, the authors<br />
characterize situational relevance as a:<br />
1. “[…] multidimensional cognitive concept whose mean<strong>in</strong>g is largely dependent<br />
on users’ perceptions of <strong>in</strong>formation and their own <strong>in</strong>formation need<br />
situations[…]<br />
2. […] dynamic concept that depends on users’ judgments of the quality of the<br />
relationship between <strong>in</strong>formation and <strong>in</strong>formation need at a certa<strong>in</strong> po<strong>in</strong>t <strong>in</strong><br />
time[…]<br />
3. […] complex but systematic and measurable concept if approached<br />
conceptually and operationally from the user’s perspective.” (Schamber,<br />
Eisenberg & Nilan, 1990, p. 774).<br />
With Schamber, Eisenberg & Nilan’s paper, the discussion of relevance was re-opened.<br />
The other paper highlighted by Ingwersen is Robertson & Hancock-Beaulieu’s (1992)<br />
account of the relevance revolution, the cognitive revolution, and the interactive<br />
revolution. The relevance revolution addresses the change towards seeing stated<br />
requests and information needs as two separate phenomena. The implication is that<br />
relevance should be assessed on the basis of the information need, and not the request.<br />
The cognitive revolution is closely connected to the relevance revolution and describes the<br />
growing tendency towards including cognitive perspectives in the process of IR.<br />
Lastly, the <strong>in</strong>teractive revolution articulates the <strong>in</strong>creased <strong>in</strong>teractivity of IR<br />
systems. This development necessitates a move away from the pr<strong>in</strong>ciple of evaluat<strong>in</strong>g<br />
IR systems in terms of “one request in, one set of results out”. Instead, time and<br />
situation need to be taken into consideration in order to do justice to the special<br />
characteristics of interactive IR (IIR) (Robertson & Hancock-Beaulieu, 1992, pp. 458-459).<br />
The three revolutions challenge the simplified conception of the IR process<br />
presented in the system driven research tradition and point out that far more factors<br />
influence the process. The outcome of the developments was an increased focus on<br />
context and interaction in the process of IR (see Figure 2.2).<br />
Figure 2.2 Extension of the cognitive view, the interactive process of IR and affecting factors.<br />
Adapted from Ingwersen & Järvelin (2005, p. 274) with minor corrections.<br />
With the shift of focus, a corresponding change in potential research areas<br />
emerged. To illustrate, five categories of variables appear from Figure 2.2: 1)<br />
organizational task dimensions; 2) actor dimensions; 3) document dimensions; 4)<br />
algorithmic dimensions; and 5) access and interaction dimensions (Ingwersen &<br />
Järvelin, 2005, pp. 313-314). The intention of the model is to illustrate the influences<br />
and interactions taking place during IR interaction. A study need not<br />
incorporate all elements in order to fall within the framework. Rather, the elements<br />
serve as possible explanations for patterns identified in empirical findings.<br />
2.1.2 The role of work tasks<br />
Along with the <strong>in</strong>creased <strong>in</strong>clusion of context and <strong>in</strong>teraction <strong>in</strong> the cognitive<br />
framework, work tasks (or daily-life tasks) have become more central. The work task<br />
methodology was <strong>in</strong>troduced to LIS <strong>in</strong> the early 1990s (Vakkari, 2003). The basic<br />
assumption of us<strong>in</strong>g tasks as the foundation of <strong>in</strong>formation seek<strong>in</strong>g and retrieval studies<br />
is that an <strong>in</strong>formation <strong>in</strong>tensive task <strong>in</strong>volves <strong>in</strong>formation related actions. Thus, the task<br />
becomes a framework for analysis of IR systems (Byström & Hansen, 2005). The work<br />
task methodology has mainly been applied to professional work tasks. Lately, however,<br />
non-job-related tasks have also been investigated within the context of the task<br />
methodology (e.g., Savolainen, 1995; Skov, 2009).<br />
Tasks are important to the cognitive view because the task is considered “the<br />
central element of the context” (Ingwersen & Järvelin, 2005, p. 29). A work task<br />
arises from an incident outside of the user and triggers an information need within the<br />
user, which in turn triggers seeking behaviour (see Figure 2.3). As a result, to understand<br />
seeking behaviour and IR interaction, we must understand the composition of tasks and<br />
their contextual origin. For evaluation purposes within a cognitive frame, building on<br />
genuine tasks may be challenging, as their extent and usefulness may vary considerably. As a<br />
consequence, comparison between results is impeded. Therefore, to ensure both experimental<br />
control and realism, simulated work tasks have been proposed as a methodical tool.<br />
Here cover stories are handed out to information users to form the basis for information<br />
searching. On the basis of the story, information needs are formed within the user, which<br />
serve as an equal point of departure for interaction with the IR system under evaluation<br />
(Borlund, 2000, 2003b). A consequence of using tasks as the baseline for evaluation is<br />
the application of situational relevance for measurement of performance (cf. Saracevic,<br />
1996; Borlund, 2003a).<br />
Figure 2.3 Information behaviour and the influence from job- or non-job related tasks. Adapted<br />
from Ingwersen & Järvelin (2005, p. 198).<br />
2.2 The cognitive framework and the thesis<br />
The cognitive framework was chosen as the methodological frame of reference<br />
in the present work. How widely the framework has been adopted is open to discussion:<br />
some argue that it has spread widely (Cole & Leide, 2006, p. 175), others<br />
the opposite (e.g., Järvelin, 2007). Regardless of its prevalence, we have applied it to<br />
guide the empirical part of the project. The overall reason was the nature of the task set<br />
by the National IT and Telecom Agency: to produce a foundation for giving guidelines<br />
for automatic indexing within the particular domain of e-government. To be able to<br />
give guidelines, we needed to discover the actual use of the IR technique among e-government<br />
employees, as they are the target user group of the project. That required a<br />
methodological framework allowing for a search test with a contextual perspective. For<br />
this purpose the cognitive framework was found suitable. It enabled us to<br />
discover the domain specific characteristics of indexing methods and add to the general<br />
and very extensive body of knowledge regarding the performance of indexing methods.<br />
The methodology is mirrored throughout the research design of the thesis. The<br />
<strong>in</strong>itial doma<strong>in</strong> study serves the purpose of uncover<strong>in</strong>g contextual characteristics of the<br />
doma<strong>in</strong> <strong>in</strong> question, and of provid<strong>in</strong>g doma<strong>in</strong> knowledge and <strong>in</strong>sight. In Figure 2.2 this<br />
corresponds to the right hand side of the model. Different methods have been comb<strong>in</strong>ed<br />
for the doma<strong>in</strong> study. Initially, exist<strong>in</strong>g studies on seek<strong>in</strong>g behavior with<strong>in</strong> the doma<strong>in</strong><br />
and adjacent domains were reviewed. As the number of existing studies turned out to<br />
be fairly limited, the review was followed up by an empirical domain study consisting of a<br />
survey questionnaire and seven focus group interviews. In a similar manner, the search test<br />
also reflects the methodology. In Figure 2.2 the search test comprises the center and left<br />
hand side components. Here employees are asked to evaluate a test system on the basis<br />
of a number of simulated work tasks. As called for <strong>in</strong> the cognitive framework,<br />
situational relevance is applied for assessment of search results.<br />
2.3 Overall research method: Case study<br />
The method applied in the thesis is a single case study (Yin, 2003, pp. 39-40)<br />
of a large Danish governmental organization: SKAT. Different motivations exist for<br />
doing case studies. The predominant rationale in the present research study is that the<br />
organization in question constitutes a unique case in Denmark due to its pioneer<br />
position within e-government (see e.g., Østergaard & Olesen, 2004). The strength of<br />
case studies is their ability to draw on multiple sources of data. Further, case studies<br />
cover contextual aspects of the case in question (Yin, 2003). The research design<br />
reported here consists of two main parts: a domain study and a search test. The domain<br />
study employs a survey questionnaire and focus group interviews as data sources. The<br />
search test aims at a controlled environment. As in the domain study, we document the<br />
search test with both quantitative and qualitative data.<br />
2.4 The case: SKAT<br />
The primary task of SKAT is to collect the major part of taxes in Denmark.<br />
The organization handles all administration related to taxes, duties, customs, debt<br />
collection, tax assessment of real estate and cars, and gaming activities (SKAT, 2010).<br />
SKAT is among the largest administrations of the Danish state in terms of employees,<br />
when compared to similar administrations (Personalestyrelsen, 2010). The<br />
organization has approximately 8,500 employees located at different offices<br />
across Denmark. SKAT has grown over the years due to several mergers of formerly<br />
separate, smaller organizations (e.g., Johansen, 2007). As a result, the organization<br />
handles highly diverse work tasks. Snellen (1989, cited from Lips, 1998, p. 326) has<br />
identified three levels in governments’ service environment: the macro level, the meso<br />
level, and the micro level. SKAT operates at all three levels, serving the parliament,<br />
businesses, and citizens. SKAT is organized by tasks rather than geography. In<br />
practice this means that specialized work tasks have been consolidated at certain<br />
geographic locations. The purpose of the sub departments is to serve at national level<br />
(SKAT, 2010). This organizational structure allows for highly specialized<br />
knowledge among the employees.<br />
Some years ago SKAT developed a business model for internal use. The<br />
purpose of the business model was to encompass all work tasks carried out by<br />
the organization. The work identified 19 condensed work tasks distributed across six<br />
main processes. The main processes are: instruction, settlement, inspection, collection,<br />
processes of support, and management and development. The two latter main<br />
processes are internal processes or aimed at servicing the parliamentary part of<br />
Denmark, while the former four have citizens and companies as their target group of<br />
service. Work tasks carried out across the organization had been described centrally in<br />
the organization, while department specific work tasks were described by the<br />
responsible departments. A description of the main processes and condensed work<br />
tasks appears in Appendix 1. In between the data collection for the domain study<br />
and the search test, a slight correction of the business model was made. The six main<br />
processes remained intact, but the condensed work tasks were extended to be applied<br />
across all main processes. The revised business model is depicted in Figure 2.4. The<br />
size and importance of the main processes are, at least quantitatively, mirrored by the<br />
distribution of employees. Thus, settlement and inspection are the largest of the main<br />
processes, covering approximately 60 percent of the entire workforce. The remaining<br />
40 percent are divided between the four remaining main processes (see Appendix<br />
2). Translated to the terminology of Byström & Hansen (2005), the condensed work<br />
tasks are at task description level. The main processes represent the lowest level of<br />
granularity compared to the condensed work tasks (cf. Vakkari, 2003). But the<br />
condensed work tasks are also fairly coarse grained. In the business model the generic<br />
work tasks contain more specific sub task descriptions. In Freund, Toms &<br />
Waterhouse’s (2005) terminology this way of operationalizing work tasks is content-based.<br />
As a result it is specifically directed towards tax employees in the case<br />
organization.<br />
Figure 2.4 SKAT’s revised business model<br />
2.4.1 The <strong>in</strong>tranet<br />
The <strong>in</strong>tranet of SKAT functions as the test system for the search test. The<br />
<strong>in</strong>tranet is a CMS based solution accessible to all employees with<strong>in</strong> the organization<br />
(White, 2005). The <strong>in</strong>tranet mirrors the official web portal of SKAT, which is open to<br />
the public on the web (see http://www.skat.dk). The public portal communicates<br />
<strong>in</strong>formation directed towards citizens, bus<strong>in</strong>esses and legal advisors. Specifically, the<br />
portal conta<strong>in</strong>s legal directions, citizen and bus<strong>in</strong>ess directions and brochures, legal<br />
documents, forms, news, etc. Further, the portal conta<strong>in</strong>s a section for self service for<br />
both citizens and businesses. On the intranet additional documents are available to the<br />
employees. Examples include minutes, job postings, reports from finished internal<br />
projects, HR information and other internal information from the organization and<br />
departments to the remaining employees. The intranet contains documents from June<br />
25, 1998 and onwards. By June 2010 the number of documents in the database was<br />
681,640. The intranet further facilitates personalization of the interface in order to<br />
optimize which information is offered to individual employees. In sum, we may<br />
characterize the intranet as a knowledge portal in the terms of Dias (2001). Thus, the<br />
intranet is a corporate portal enabling decision support and collaborative processing. In<br />
addition, the “Find colleague” function (“Find kollega”) assists in locating colleagues<br />
on the basis of organizational affiliation, physical location or expertise, which<br />
corresponds to an integrated expertise portal.<br />
Apply<strong>in</strong>g the <strong>in</strong>tranet for the search test has a number of implications <strong>in</strong><br />
empirical respect. With this choice of test system the search test belongs to the research<br />
area of enterprise search. Enterprise search <strong>in</strong>cludes organizations with electronic text<br />
content, and search of the organization’s <strong>in</strong>tra-, Internet, or other digitalized text (cf.<br />
Hawk<strong>in</strong>g, 2004). Furthermore, a number of characteristics are shared between<br />
corporate <strong>in</strong>tranets and the web. Thus, both are based on web technology. They<br />
demonstrate a great heterogeneity as to the document collection, a dynamic nature, and<br />
both enable hyper l<strong>in</strong>k<strong>in</strong>g between documents (cf., Fag<strong>in</strong> et al., 2003; Rasmussen,<br />
2003). However, the two system types also differ in several respects. First, the<br />
premises of the two system types differ. The web functions as a<br />
democratic instrument allowing everyone to express anything. Intranets, by contrast,<br />
are organizational tools communicating information of relevance for maintaining<br />
enterprise work tasks (Fagin et al., 2003; Mukherjee & Mao, 2004; Stenmark, 2005).<br />
Fagin et al. (2003) stated four axioms compiling further differences between the<br />
web and intranets. In short, the axioms state that in contrast to internet documents,<br />
intranet documents are mainly created for distribution of information, not for attracting<br />
the attention of potential users. In addition, a large share of intranet queries have a<br />
small set of correct answers, if not a unique answer. Also, intranets are most<br />
likely spam free due to limitations as regards publishing access. Lastly, intranets are<br />
not expected to be search engine friendly due to the lack of interlinking between<br />
documents. By denoting the characteristics as axioms, Fagin et al. indicate that we do not<br />
have empirical evidence for the correctness of the differences. The lack of empirical<br />
confirmation may be explained by the difficulties of gaining systematic access to<br />
perform data collection at corporate intranets (cf. Stenmark, 2005). In terms of the<br />
present investigation we will account for the specific characteristics concerning<br />
intranets, whenever we have empirical evidence as support.<br />
Figure 2.5 Screen dump from existing intranet interface<br />
2.4.2 The <strong>in</strong>tranet taxonomy<br />
The process of indexing on the Internet is far more extensive than on an<br />
intranet due to the disparity between numbers of documents. However, the need for<br />
organizing documents on corporate intranets also increases along with the number of<br />
documents stored (cf. Gilchrist, 2001). This is mirrored by the differences between the<br />
former and the current taxonomy used on SKAT’s intranet. As mentioned above, a new<br />
and enlarged taxonomy was introduced on the intranet as of the beginning of 2008. The<br />
main functions of a taxonomy are to eliminate uncertainty, control synonyms,<br />
and establish hierarchical relationships (Zeng, 2008). The preceding taxonomy<br />
corresponded to these characteristics apart from the last. Thus, the taxonomy had a<br />
flat structure with a one level hierarchy. 25 subject terms represented the taxonomy.<br />
The succeeding taxonomy was expanded in different respects, resulting in a more detailed<br />
presentation of corporate, controlled terms. One change was the introduction of a<br />
second level in the hierarchy that enabled an increase of specificity in topic<br />
representations. The number of terms included also increased. As of March 2010, the<br />
taxonomy incorporated 169 terms across both levels of the hierarchy. Lastly, the controlled<br />
terms of the taxonomy had been supplied with mouse over texts, which basically had<br />
the form of scope notes as known from thesauri. By these means further reduction of<br />
ambiguity and increased control of synonyms are gained.<br />
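The taxonomy structure described above (two hierarchical levels, synonym control, and scope notes shown as mouse-over text) can be sketched as a small data structure. This is only an illustrative sketch in Python: the class name, the function names, and the example terms are assumptions made for illustration and do not reproduce SKAT's actual taxonomy or its implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Term:
    """A controlled term. All example labels used below are hypothetical."""
    label: str
    scope_note: str = ""                                   # shown as mouse-over text
    synonyms: Set[str] = field(default_factory=set)        # non-preferred entry terms
    narrower: List["Term"] = field(default_factory=list)   # second hierarchy level


def build_lookup(top_terms: List[Term]) -> Dict[str, Term]:
    """Map every label and synonym (lower-cased) to its controlled term."""
    lookup: Dict[str, Term] = {}
    for top in top_terms:
        for term in [top, *top.narrower]:
            lookup[term.label.lower()] = term
            for synonym in term.synonyms:
                lookup[synonym.lower()] = term
    return lookup


def preferred_label(lookup: Dict[str, Term], word: str) -> Optional[str]:
    """Return the controlled label for a word, or None if it is uncontrolled."""
    term = lookup.get(word.strip().lower())
    return term.label if term is not None else None


# Hypothetical two-level fragment: one top term with one narrower term.
vat = Term(
    "VAT",
    scope_note="Use for value added tax on goods and services.",
    synonyms={"value added tax", "moms"},
)
taxes = Term("Taxes and duties", narrower=[vat])
LOOKUP = build_lookup([taxes])

print(preferred_label(LOOKUP, "Moms"))
```

In this sketch a flat taxonomy, like the one used before 2008, is simply the case where every `narrower` list is empty.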
Hitherto the indexing of intranet documents has been carried out manually by<br />
a large group of indexers distributed across the organization (between 1000 and 1500<br />
indexers). A corporate taxonomy has formed the basis for the controlled indexing. It is<br />
a common practice in e-governments in general that employees attach subject terms to<br />
administrative documents. In section 5.3.3, we presented three different kinds of<br />
intellectual indexing, namely expert-led, author-based, and user-based indexing. The<br />
manual assignment of subject terms carried out by the employees in the organization is<br />
not easily characterized as one or the other. The expert-led type is represented in the<br />
way that not all employees handle the assignment. Rather, a group of employees carry<br />
out the task, though the number is quite large. On the one hand, when a group of employees<br />
has been appointed to the task, it is reasonable to expect that they have a more detailed<br />
insight into the taxonomy compared to their non-indexing colleagues. On the other hand,<br />
the large number of indexers could mean that the indexing task is not a very frequent<br />
one, which in turn results in a limited insight into the taxonomy. One thing is certain<br />
about the group of indexers: the typical indexer is not a professional indexer in the sense<br />
that he or she holds a LIS degree. The indexing at SKAT also contains elements of<br />
author-based indexing in the sense that the indexers occasionally will be the authors of<br />
the indexed documents. Lastly, the indexing may also be characterized as user-based in<br />
the sense that the indexers, apart from being indexers, are also users of the system.<br />
The document collection at the existing intranet can be divided into two groups:<br />
documents published up to December 31, 2007 and documents published from January<br />
1, 2008 onwards. January 1, 2008 marks the day when a revised taxonomy was<br />
taken into use in the case organization. The implementation of the revised taxonomy<br />
had different implications. The manual assignment of index terms continued after this<br />
date, though following the structure of the revised taxonomy. However, at the same<br />
time the index terms assigned to the former group of documents were deleted in the<br />
database. Therefore documents published before January 1, 2008 could only be<br />
searched by free text indexing. When our cooperation with SKAT started, the<br />
organization was already working on a new portal solution encompassing their internet<br />
and intranet. The new portal comprises different changes and improvements including<br />
automatic categorization of search results, which is brought into focus in the present<br />
thesis.<br />
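The effect of the taxonomy change on searchability can be expressed as a simple date rule. The following is a minimal sketch in Python, assuming two hypothetical field names (`free_text` and `index_terms`) that are not taken from the SKAT system; only the cut-over date comes from the description above.

```python
from datetime import date
from typing import List

# Day the revised taxonomy was taken into use, as stated in the text.
TAXONOMY_CUTOVER = date(2008, 1, 1)


def searchable_fields(published: date) -> List[str]:
    """Return the (hypothetical) fields a document can be searched by.

    Index terms of documents published before the cut-over were deleted,
    so those documents are reachable through free text only.
    """
    if published < TAXONOMY_CUTOVER:
        return ["free_text"]
    return ["free_text", "index_terms"]


print(searchable_fields(date(2007, 12, 31)))
print(searchable_fields(date(2008, 1, 1)))
```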
2.5 Summary<br />
The present chapter has presented and argued for the overall research<br />
methodology applied for the PhD project. We have reviewed the cognitive framework<br />
and its development from an <strong>in</strong>dividualistic towards a contextual methodological<br />
foundation. The choice of methodological standpo<strong>in</strong>t enables the collection and<br />
analysis of data that supplements the exist<strong>in</strong>g general knowledge on the performance of<br />
automatic <strong><strong>in</strong>dex<strong>in</strong>g</strong> methods. With<strong>in</strong> the cognitive framework the case study<br />
methodology has been applied as the overall frame for the specific collection and<br />
analysis of data reported later. Specifically, we carry out a case study of a Danish<br />
organization, a pioneer <strong>in</strong> terms of e-<strong>government</strong>: SKAT.
3 The e-<strong>government</strong> doma<strong>in</strong><br />
Chapter 3<br />
During the past century governments all over the world have experienced a continuous<br />
increase in demands for efficient procedures and work routines, together<br />
with expectations of accuracy and quality in public servants’ handling of work tasks<br />
(e.g., Homburg, 2004). Increased transparency of governments towards citizens has<br />
been another predominant demand on governments during the period (e.g., Bertot,<br />
Jaeger & Grimes, 2010). The demands for transparency have resulted in numerous<br />
technical solutions for citizen access, e.g., self-service, and subsequent user<br />
evaluations. However, the citizen perspective on e-government will not be treated in<br />
further detail here due to our focus on employees.<br />
The development of governments has taken place at local, national, and<br />
international levels. It is in this light that the concept of e-government has emerged.<br />
Thus, digitalization of governments has been an important step towards resolving the<br />
challenges of increasing the efficiency and quality of governmental processes. The<br />
examination of e-government as a research area started to grow in the late 1990s (e.g.,<br />
Grönlund & Horan, 2004; Helbig et al., 2008). Since then the increasing number of<br />
emerging journals and conferences has in its own way demonstrated the importance of<br />
the research field. However, e-government is a complex construction due to its roots<br />
in a number of related research fields. Public administration, management science,<br />
organization science, information technology, computer science, and library and<br />
information science are among the fields contributing to the development<br />
of e-government. With the present chapter we introduce the e-government domain.<br />
The purpose is to provide an overview and understanding of the domain framing the<br />
PhD project. Further, the presentation enables a characterization and placing of the<br />
thesis in the domain. The chapter forms the first of two parts of the domain study<br />
review. We begin the chapter by defining the concept of e-government and related<br />
concepts along with the purpose of digitalizing governments. This is followed by an<br />
overview of the steps that have characterized, and still characterize, the development within the<br />
domain. Models are included here for a graphical presentation of different authors’<br />
perception and interpretation of the development of the field. The chapter ends with a<br />
presentation and discussion of the research field of e-government. In this closing<br />
section, we focus on subject matters relevant to the PhD project, as a thorough review<br />
of the entire field of e-<strong>government</strong> is outside our scope. Specifically, we address<br />
<strong>in</strong>formation systems, knowledge management, and metadata <strong>in</strong>itiatives.<br />
3.1 Def<strong>in</strong>ition and purpose<br />
Numerous suggestions of what def<strong>in</strong>es the concept of e-<strong>government</strong> exist. A<br />
fairly general def<strong>in</strong>ition is put forward by Gil-Garcia & Mart<strong>in</strong>ez-Moyano (2007, p.<br />
266), who see e-<strong>government</strong> as:<br />
“The use of information and communication technologies in government<br />
settings.”<br />
However, more detailed definitions have also been formulated, e.g., by Fang (2002, pp.<br />
3-4):<br />
“- the ability to obta<strong>in</strong> <strong>government</strong> services through nontraditional electronic<br />
means, enabl<strong>in</strong>g access to <strong>government</strong> <strong>in</strong>formation and to completion of<br />
<strong>government</strong> transaction on an anywhere, any time basis and <strong>in</strong> conformance<br />
with equal access requirement.<br />
–offers potential to reshape the public sector and build relationships between<br />
citizens and the <strong>government</strong>.”<br />
Defining the concept of e-government is not a straightforward task. A number of<br />
researchers have collected and compared several definitions (e.g., Grönlund & Horan,<br />
2004; Robbin, Courtright & Davis, 2004; Grant & Chau, 2005; Yildiz, 2007; Hu, Pan<br />
& Wang, 2010). These comparisons illustrate the lack of a common understanding of the<br />
definition. Today, after more than a decade of research, researchers still call for an<br />
unambiguous definition (e.g., Grönlund, 2010; Hu, Pan & Wang, 2010). Overall, the<br />
difficulties relate to the content and the designation of the concept. As regards the<br />
content, a number of factors complicate the task. One factor is the lack of<br />
agreement as to the definition of central concepts (Robbin, Courtright & Davis, 2004).<br />
Thus, e-government is defined and referred to differently, depending on the actual<br />
scope of research papers (Fang, 2002; Grönlund & Horan, 2004; Grant & Chau, 2005;<br />
Grönlund, 2005). In addition, the multidisciplinary nature of the research field<br />
increases the disagreement (Grönlund & Horan, 2004; Hovy, 2008a). The disciplines<br />
considered as contributing to the field also vary. Wimmer presents the most<br />
comprehensive set of contributing disciplines in her model (see Figure 3.1).<br />
An analysis of e-government researchers’ home departments<br />
supports Wimmer’s model (Heeks & Bailur, 2007).<br />
Figure 3.1 Disciplines integrated in the multidisciplinary research field of e-government. Adapted<br />
from Wimmer (2007, p. 14)<br />
The continuing development of the concept is a third factor (cf. Jaeger, 2003).<br />
Secondly, e-government takes place at two different levels: the micro level,<br />
which concerns the technological changes taking place within governments, including<br />
ICT; and the macro level, which refers to the institutional changes that are usually also a<br />
part of e-government research. The two levels are often treated separately, which complicates<br />
the understanding of the concept (Meijer & Homburg, 2008). At the micro level,<br />
Grönlund (2003) distinguishes between two fields within e-government, one with an<br />
internal and one with an external organizational focus. Both fields imply<br />
changes in line with Meijer & Homburg’s (2008) two levels. The internal field concerns<br />
the internal changes in governments that follow from employing ICT for different<br />
professional operations. This field has been developing for some decades already. The<br />
external field concerns the increasing availability of internet services aimed at external<br />
parties, e.g., citizens or enterprises (Grönlund, 2003). The ICT systems supporting the<br />
two fields are referred to as back office and front office systems respectively (e.g.,<br />
Meijer & Homburg, 2008). In this thesis we are concerned with the micro level<br />
concerning internal governmental changes from ICT, and more specifically information<br />
seeking and retrieval in relation to employees’ work task fulfilment.<br />
Figure 3.2 Basic elements and relations in governmental systems (Grönlund, 2003, p. 56)<br />
What can be inferred from the above is that the definition to some degree depends<br />
on the references consulted. The number of related concepts does not ease the<br />
task of definition. Consequently, we define below the concept and its related terms the way<br />
they are used in the present work. We use Figure 3.2 to illustrate the concepts.<br />
The figure is a simplified model of the democratic system, which in practice is far more<br />
complex. The figure outlines three zones (civil society, formal politics, and<br />
administration) and their reciprocal interactions.<br />
Government is considered the overall notion for the concepts to follow. The<br />
concept <strong>government</strong> “covers several aspects of manag<strong>in</strong>g a country, rang<strong>in</strong>g from the<br />
very form of <strong>government</strong> to strategic management to daily operations” (Grönlund, 2003,<br />
p. 56). Others suggest <strong>government</strong> to be more focused on the political aspect yet<br />
without leav<strong>in</strong>g out the adm<strong>in</strong>istrative field. Accord<strong>in</strong>g to Beynon-Davies <strong>government</strong><br />
“connotes a political organization, which is comprised of the <strong>in</strong>dividuals and <strong>in</strong>stitutions<br />
that are authorised to formulate public policies and conduct affairs of state.<br />
Governments are normally tasked with establish<strong>in</strong>g and regulat<strong>in</strong>g the <strong>in</strong>terrelationships<br />
of <strong>in</strong>dividuals, groups and organisations with<strong>in</strong> the boundaries of some territory” (2007,<br />
p. 11). In Figure 3.2 <strong>government</strong> covers the two areas of formal politics and<br />
adm<strong>in</strong>istration. Public adm<strong>in</strong>istration denotes the sector, enterprises, and activities<br />
necessary <strong>in</strong> order to serve a <strong>government</strong> (Mar<strong>in</strong>i, 2000; Johnston, 2004). Here serv<strong>in</strong>g<br />
involves formulating, advising on, and implementing governmental policy, and<br />
managing resources. Thus, public administration deals with all aspects of government<br />
matters apart from the political, democratic issues. In Figure 3.2 public administration<br />
covers the field referred to as administration. The thesis belongs to the<br />
administration subfield, as we do not consider either formal politics or civil society.<br />
As for the designation of the concept, the literature does not offer a single<br />
label for e-government. Examples of synonyms are digital government (e.g.,<br />
Marchionini, Samet & Brandt, 2003), one-stop government (e.g., Glassey, 2002),<br />
eGovernment (e.g., Schellong, 2007), and online government (e.g., Peres, Guzmán &<br />
Valbuena, 2009). Digital government appears to be the predominant term in the<br />
United States, while electronic government is the preferred term elsewhere (Grönlund<br />
& Horan, 2004). Grönlund & Horan (2004) differentiate between e-government and e-governance.<br />
To illustrate the difference, they draw on Figure 3.2. In their definition,<br />
e-government covers administration and perhaps formal politics, while e-governance<br />
embraces all three spheres. Though e-governance in this manner appears to be the<br />
broader concept, e-government is the more dominant term in the research field.<br />
Further, the definition of e-government by Grönlund & Horan (2004) suits the<br />
scope of the present thesis well, given our focus on administrative governmental<br />
employees. Our focus also means that the operationalization of the concept is placed solely in the<br />
administrative part of Figure 3.2. Given the lack of agreement as to the terminology,<br />
we use the predominant European choice of term and refer to the concept as e-government<br />
throughout the thesis. However, in the light of the diversity of<br />
definitions demonstrated above, we will draw on literature working with<br />
other definitions as long as it falls within the definition applied here.<br />
3.2 Subject areas <strong>in</strong> e-<strong>government</strong> research & development (R&D)<br />
The use of information technology in governmental administrations is not a<br />
new phenomenon. Rather, it has been going on for decades (e.g., Kraemer &<br />
King, 1986; Andersen & Kraemer, 1994; Bellamy & Taylor, 1998). However, the term<br />
e-government was not introduced until the late 1990s. The two eras have been<br />
considered separate for some time. Whether they still are, or whether they are becoming more<br />
integrated, remains a matter of disagreement (Grönlund & Horan, 2004; Andersen et<br />
al., 2005).<br />
E-government may be seen as a natural consequence of historical and<br />
technological developments. Historically speaking, public administration has from the<br />
late 1970s become increasingly market-oriented (Johnston & Callender, 1997; Box,<br />
1999; Johnston, 2004). In the wake of this change, the focus of public administration<br />
has been on “organizational efficiency, the creation of internal market-style<br />
competitive conditions and the more purposive application of private-sector business<br />
techniques” (Johnston, 2004, p. 12510). Concurrently, information technology has<br />
developed rapidly, providing possibilities for technological support of the change of<br />
focus in public administration.<br />
In 2007, Schellong presented a modified model of the e-government hype cycle<br />
(see Figure 3.3). The model presents 2002-2003 as the point in time at which e-government<br />
peaked. The period leading up to the peak lasted approximately seven years. Those<br />
years introduced information sites, single-agency online services, and portals, among<br />
other things. In the period after the peak, some problematic issues needed to be dealt<br />
with, for instance security issues and low citizen uptake. However, this does not<br />
mean that e-government has come to an end. Rather, the hype has<br />
been replaced by a more stable plateau of productivity with more advanced and<br />
technically demanding solutions. Examples are interoperability, enterprise architecture,<br />
and integrated data management (Schellong, 2007). The optimism identified by Heeks<br />
& Bailur (2006) indicates a continued belief in the potential of e-government.<br />
Investigating automatic indexing methods is potentially relevant to many of the<br />
subareas mentioned in the model.<br />
Figure 3.3 E-government hype cycle (Schellong, 2007)<br />
Though implementing e-government is in focus across the world and across<br />
types of governments, the degree of implementation varies. In order to identify the<br />
stage of development, Layne & Lee (2001) developed a four-stage model (see<br />
Figure 3.4) encompassing technological and organizational complexity (ranging<br />
from simple to complex) and integration (sparse to complete). The model suggests<br />
that the relation between the two variables is proportional, that is, as the technological<br />
and organizational complexity increases, so does the degree of integration. The<br />
model expresses the technological level of public administrations, which allows for different<br />
degrees of service to citizens. Layne & Lee take their point of departure in the first<br />
websites created by governments. Thus, the use of ICT before then is not reflected in<br />
the model.<br />
The four steps contained in Layne & Lee’s model are 1) Catalogue; 2)<br />
Transaction; 3) Vertical integration; and 4) Horizontal integration. The step<br />
“Catalogue” refers to the introductory stage of e-government, where governments create<br />
websites with information about the government. At this step citizens and other<br />
stakeholders are helped with fact finding. At this stage there are different<br />
motivations for going online. One reason is the possibility of providing external<br />
stakeholders with information that would otherwise have to be handled by front office<br />
employees. Another reason is the pressure and expectation from outside that<br />
information about the government can be found on the internet. At the second step,<br />
“Transaction”, we see the beginning of online transactions for government stakeholders.<br />
Thus, it becomes possible to carry out transactions online, for example to report one’s taxes.<br />
The step is characterized by automation and digitalization of existing processes.<br />
“Vertical integration” is defined by a renovation of existing processes and an increased<br />
degree of connection between government systems in order to enhance the services<br />
towards stakeholders. Also, vertical integration allows for exchange of transaction<br />
data across systems. At the final step, “Horizontal integration”, additional integration is<br />
developed. Aside from exchanging data between governments, horizontal integration<br />
offers integration across government functions, e.g., in the form of one-stop services<br />
that are able to meet the range of administrative service needs following from a life or<br />
business incident (cf. Gouscos et al., 2003). It should be noted that the placement of governments in<br />
Layne & Lee’s model most likely differs considerably across countries. For instance, the latest report from<br />
the United Nations (2012) demonstrates geographical differences as to e-government<br />
implementation levels. Also, Gil-Garcia & Martinez-Moyano (2007) hypothesize that<br />
the evolution of e-government depends on whether the context is at national, state, or<br />
local level, indicating that e-government initiatives start at national level and are subsequently<br />
followed up at state and local levels of government. The geographical location and level<br />
of government will probably not have significant influence on the succession of the<br />
steps but may result in different placements in the model. As one among a number of<br />
e-government forerunners, Denmark is placed in the upper right corner of Layne &<br />
Lee’s model. To exemplify, a recent investigation among Danish municipal IT<br />
managers showed some prevalence of horizontal integration between information<br />
systems (Nielsen et al., 2009). Investigating automatic indexing is in principle useful at<br />
all stages of the model.<br />
Figure 3.4 Dimensions and stages in e-government (from Layne & Lee, 2001, p. 124)<br />
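The ordered nature of the four stages can be illustrated with a small sketch. It is our own illustration and not part of Layne & Lee's (2001) work: the class name, the numeric values, and the helper function are hypothetical. Only the stage names and their order are taken from the model as described above.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Stages of e-government development, ordered from simple to complex.

    The integer values are an illustrative assumption; they only encode
    the order of the stages, not a measured degree of complexity.
    """
    CATALOGUE = 1
    TRANSACTION = 2
    VERTICAL_INTEGRATION = 3
    HORIZONTAL_INTEGRATION = 4


def next_stage(stage):
    """Return the stage following `stage`, or None for the final stage."""
    if stage is Stage.HORIZONTAL_INTEGRATION:
        return None
    return Stage(stage + 1)


if __name__ == "__main__":
    for stage in Stage:
        print(stage.name, "->", next_stage(stage))
```

Representing the stages as an ordered type reflects the point made in the text that the succession of the steps is assumed to be stable, while the stage reached by a given government may differ.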
The expected outcome of digitalizing governments is impressive. Two main<br />
potentials recur: changes in the communication between government and civil<br />
society, and more efficient work processes internally in governments. Or, in a<br />
simpler form, the purpose of e-government is to deliver “government that works better<br />
and costs less” (Office of the Vice President, 1993, cited from Bellamy, 2002, p. 214).<br />
Thus, by introducing e-government, governments aim to offer improved access to their<br />
services, make the most of their resources or perhaps even reduce costs, and<br />
enhance democracy by improving the access to government employees and offering e-democracy<br />
(Edmiston, 2003). With the introduction of e-government, a shift may be<br />
observed from government-centric services towards more citizen-centred (or other<br />
stakeholder-centred) services. At the same time, the transparency of governmental<br />
work is intended to increase along with the level of service. Thus, allowing citizens to<br />
have access to government day and night is considered one way of increasing the level<br />
of service towards citizens (Bellamy, 2002). As a consequence, researchers have<br />
started to investigate, for instance, applications, changes in the administrations, and<br />
interaction between government and civil society.<br />
The development of government processes, organization and technologies has<br />
been expected to change the work tasks of government employees. Before the dawn of<br />
e-government, the concern about information technology and computerization of<br />
governments to a large extent related to employment (e.g., Kraemer & Dedrick, 1997).<br />
Changes are still expected as a consequence of digitalizing governments. Today, however,<br />
the use of information technology is rather expected to affect the composition of<br />
work tasks for governmental employees (Snellen, 2002; Dörfler, 2003; Marchionini,<br />
Samet & Brandt, 2003; Brown, 2005; Landsforeningen af Kommunale Servicecentre,<br />
2005; Mahler & Regan, 2005). This is supported by research-based suggestions for<br />
process models that can support governments’ way of handling work tasks (e.g.,<br />
Palkovits, Woitsch & Karagiannis, 2003; Becker, Pfeiffer & Räckers, 2007).<br />
In 2005, the Danish National Association of Municipal Service Centres<br />
predicted a change in the work tasks of municipal e-government employees. Due<br />
to self-service solutions, the number of complex situations was expected to increase,<br />
because citizens take care of the simpler tasks themselves. Further, the<br />
share of tasks related to assisting citizens who are not able to use self-service was<br />
also expected to increase (Landsforeningen af Kommunale Servicecentre, 2005). This<br />
expectation is consistent with the results Grundén (2009) found when interviewing employees<br />
in a Swedish County Administration. Here the need for assistance was<br />
explained by the digital divide among the customers of the government. Also, Mahler &<br />
Regan (2005) expect changes due to digitalization of federal government agencies.<br />
Their conclusions are based on qualitative interviews with staff in agencies<br />
with either strong or weak internet presence. They find that the expected increase in<br />
complaints from citizens does not actually occur. Further, some, but not all, citizens are<br />
able to find needed information on the agency websites and thereby avert casework for the<br />
agency. In relation to the present work, this means that we cannot assume government<br />
work tasks to have remained unchanged. The possible change of tasks may also<br />
influence the complexity of the information needs that develop (cf. Byström’s<br />
findings, see section 4.4.5). As a consequence, we cannot design a search test based on<br />
older seeking studies in the domain without further ado, at least as regards information<br />
needs. A validation of their continued relevance will be needed.<br />
3.3 Stakeholders <strong>in</strong> e-<strong>government</strong><br />
The amount of e-government research is constantly increasing. Reviews of the<br />
literature have proposed different ways of categorizing the research in order to<br />
systematize it. A common way of characterising studies of e-government<br />
is to divide the research according to the relation it expresses. Thus, a relation<br />
between one (or several) governments and a stakeholder is predominantly articulated.<br />
The literature has suggested a number of different relations (e.g., Fang, 2002;<br />
Beynon-Davies, 2007). The primary emphasis in the e-government literature has been on<br />
citizens, businesses, and governments. The relations indicate the government as the<br />
key communicator towards different recipient groups. This is stressed by the common<br />
way of denoting the relations as G2C (government-to-citizen), G2B (government-to-business),<br />
G2G (government-to-government) and so forth. This way of referring to the<br />
relations is inspired by the field of e-commerce, where B2B and B2C are common<br />
designations for business-to-business and business-to-consumer.<br />
1 People as service users<br />
2 People as citizens<br />
3 Bus<strong>in</strong>esses<br />
4 Small-to-medium sized enterprises<br />
5 Public adm<strong>in</strong>istrators (employees)<br />
6 Other <strong>government</strong> agencies<br />
7 Non-profit organizations<br />
8 Politicians<br />
9 E-Government project managers<br />
10 Design and IT developers<br />
11 Suppliers and partners<br />
12 Researchers and evaluators<br />
Table 3.1 Stakeholders <strong>in</strong> e-<strong>government</strong>. Adapted from Rowley (2011, p. 56)<br />
The underly<strong>in</strong>g thought about e-<strong>government</strong> stakeholders is that their<br />
respective relations to <strong>government</strong>s differ as to their characteristics. Thus,<br />
<strong>government</strong>s cannot necessarily communicate the same way across different<br />
stakeholders. In her literature review of relations, Rowley proposes a thorough<br />
typology of stakeholders (see Table 3.1). It is stressed that stakeholders must be<br />
characterized as to the roles they play rather than as to the groups they form.<br />
Highlighting roles rather than groups allows individuals and organizations to take<br />
different roles depending on the situation they are currently engaged in. The purpose of<br />
elaborat<strong>in</strong>g a typology of stakeholders is to be able to identify characteristics of<br />
specific stakeholders and allow for comparisons (Rowley, 2011). Further, the typology<br />
enables a more specific address<strong>in</strong>g of stakeholders, when their specific characteristics<br />
are described. In the present work we are concerned with one particular type of<br />
stakeholder, namely public administrators (government employees). Rowley’s<br />
division of stakeholders emphasizes precisely that stakeholder groups differ. As a<br />
consequence, seeking behaviour identified in other stakeholder groups is not<br />
necessarily representative of the behaviour taking place among employees.
3.4 LIS perspectives on e-<strong>government</strong><br />
Above we gave a general introduction to the concept of e-government.<br />
However, three core LIS areas within the government context form an important frame<br />
of reference for our further work: information systems, knowledge<br />
management, and metadata schemas and standards. Further, since our overall<br />
perspective is on employees, this perspective will also guide the presentation of the LIS<br />
subject areas.<br />
3.4.1 Information systems<br />
The number of <strong>in</strong>formation systems <strong>in</strong> e-<strong>government</strong> is impressive. Bekkers &<br />
Homburg refer to the amount as “myriad registration functions” (2007, p. 374). The<br />
<strong>in</strong>formation systems are highly diverse <strong>in</strong> their nature and content (cf. Veal, 2001; Liu,<br />
Zhu & Gorton, 2007). Content <strong>in</strong>cludes for <strong>in</strong>stance statistical <strong>in</strong>formation,<br />
geographical <strong>in</strong>formation, legal materials, and <strong>in</strong>formation related to specific cases (e.g.,<br />
Bountouri et al., 2009). In addition, information systems are often designed with a<br />
specific administration in mind, and perhaps even developed by the administration<br />
itself (Ministry of finance, 2001). The designations used to refer to types of<br />
information systems are also remarkably varied. A thorough presentation of all system types is a<br />
comprehensive task and beyond the scope of the thesis. Instead we will present<br />
different ways of typologizing e-government information systems below. The purpose<br />
is to identify existing types of systems and to provide a context for the characterization<br />
of the system that is the subject of the search test in the empirical part of our work.<br />
The types of systems may be divided according to different characteristics. One way<br />
of characterizing information systems is according to whether they are back or front office<br />
systems. Front office systems are systems directed towards the customers of e-government:<br />
citizens, businesses, and organizations external to the government<br />
(Millard, 2003). Examples are citizen portals such as www.borger.dk or<br />
www.direct.gov.uk or business portals like www.virk.dk. A front end service in the<br />
form of a front end system is a product of the introduction of e-government. Obviously,<br />
governments have always communicated with citizens and businesses, but<br />
prior to the introduction of portals and other front end systems the communication<br />
took place in contact offices or through call centres (Codagnone & Wimmer, 2007).<br />
Back office processes are processes internal to the government in question. Back office<br />
processes comprise general management and accounting, but also processing of<br />
customers’ applications (Codagnone & Wimmer, 2007). Back office systems, then, is<br />
the designation for systems that support internal processes of widely diverging kinds.<br />
Further, the systems deliver the data communicated through front end systems. Back<br />
office systems themselves are commonly not visible to the government customers.<br />
Back end systems have been applied in governmental administrations for decades.<br />
As we are testing a back office system in the search test, we will focus on this<br />
type of system below.<br />
Van de Donk & Snellen (1989) have presented a typology of knowledge-based<br />
systems that can be used to differentiate back office systems further. The suggested<br />
typology has been developed within the domain of public administration. The<br />
typology is based on the elements that distinguish experts from<br />
laymen:<br />
“1. encyclopedic knowledge of facts and relationships concern<strong>in</strong>g a certa<strong>in</strong><br />
field;<br />
2. proficient reason<strong>in</strong>g as the basis of a diagnosis;<br />
3. practical short-circuit reason<strong>in</strong>g to arrive at a diagnosis;<br />
4. proficient reason<strong>in</strong>g as the basis for a solution;<br />
5. practical short-circuit reason<strong>in</strong>g to arrive at a solution” (van de Donk &<br />
Snellen, 1989, p. 4).<br />
Items 3 and 5 in particular differentiate the expert from the layman. On the basis of these<br />
characteristics three types of knowledge systems are suggested: handling systems,<br />
advisory systems, and expert systems. Handling systems embrace items 1, 2, and 4<br />
above. Handling systems contain facts related to specific cases. Cases are handled by<br />
being placed in a category for which solutions are known or diagnoses can be made (van<br />
de Donk & Snellen, 1989). A core example of handling systems is electronic records<br />
management systems (ERMS) (also known as electronic document management<br />
systems (EDMS)), which support the creation, capturing, processing, sharing, and managing of<br />
organizations’ records or documents (Gunnlaugsdottir, 2008; Hu et al., 2010). Advisory<br />
systems embrace items 1, 2, 3, and 4 above. Thus, the practical ability to arrive at a<br />
diagnosis for a problem is what distinguishes advisory systems from handling systems.<br />
Advisory systems are useful, e.g., when there is uncertainty about the<br />
facts of a case or when the qualifications needed for reaching a decision are vague<br />
(van de Donk & Snellen, 1989). Expert systems in principle contain all five elements<br />
mentioned above. Thus, they are also able to help users to arrive at solutions for<br />
problems. Several characteristics distinguish advisory systems from expert systems. One<br />
main difference between the two system types is that advisory systems support users’<br />
own decisions by providing access to data and models, while expert systems offer<br />
decisions and conclusions. This is what leads Ford (1985, p. 26) to characterize<br />
advisory systems as more flexible than expert systems. In terms of Van de Donk &<br />
Snellen, the present system of investigation may be characterized as an expert system,<br />
as it contains documents supporting the professional and legal basis for the employees.<br />
Organizational information is also contained, while information related to specific cases<br />
is stored in other systems.<br />
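As an illustration only, the element-based typology of van de Donk & Snellen (1989) described above can be sketched as a small lookup structure. The Python names used here (EXPERTISE_ELEMENTS, SYSTEM_TYPES, classify, added_elements) are our own hypothetical choices and not part of the cited typology; the sketch merely encodes which of the five elements each system type embraces.<br />

```python
# Illustrative sketch; all identifiers are hypothetical and not taken
# from van de Donk & Snellen (1989).

# The five elements of expertise, numbered as in the quotation above.
EXPERTISE_ELEMENTS = {
    1: "encyclopedic knowledge of facts and relationships",
    2: "proficient reasoning as the basis of a diagnosis",
    3: "practical short-circuit reasoning to arrive at a diagnosis",
    4: "proficient reasoning as the basis for a solution",
    5: "practical short-circuit reasoning to arrive at a solution",
}

# Which elements each type of knowledge system embraces, as stated in the text.
SYSTEM_TYPES = {
    "handling system": frozenset({1, 2, 4}),
    "advisory system": frozenset({1, 2, 3, 4}),
    "expert system": frozenset({1, 2, 3, 4, 5}),
}


def classify(elements):
    """Return the system type whose element set equals `elements`, else None."""
    wanted = frozenset(elements)
    for name, covered in SYSTEM_TYPES.items():
        if covered == wanted:
            return name
    return None


def added_elements(simpler, richer):
    """Return the element numbers that `richer` embraces beyond `simpler`."""
    return sorted(SYSTEM_TYPES[richer] - SYSTEM_TYPES[simpler])
```

For instance, added_elements("handling system", "advisory system") returns [3], which corresponds to the observation that the practical ability to arrive at a diagnosis separates the two system types.<br />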
Saxena & Aly (1995) embrace a wider variety of systems <strong>in</strong> their typology.<br />
The context of their work is public adm<strong>in</strong>istration <strong>in</strong>clud<strong>in</strong>g policy plann<strong>in</strong>g, policy<br />
implementation, and policy adm<strong>in</strong>istration. The typology counts:<br />
1. Adm<strong>in</strong>istrative process<strong>in</strong>g systems (APS): are able to process large amounts<br />
of data <strong>in</strong> order to support adm<strong>in</strong>istrative rout<strong>in</strong>es, typically <strong>in</strong> the form of<br />
statistical compilation systems or transaction process<strong>in</strong>g systems.<br />
2. Management reporting systems (MRS): offer information to management for<br />
routine, structured, and expected decisions. Compared to APS, which are more<br />
oriented towards data and efficiency, MRS are rather characterized by<br />
information and effectiveness.<br />
3. Decision support systems (DSS): assist users <strong>in</strong> decision mak<strong>in</strong>g by offer<strong>in</strong>g<br />
technological support <strong>in</strong> order for users to become able to develop <strong>in</strong>dividual<br />
decision models, databases, and report formats.<br />
4. Group decision support systems (GDSS): are the group equivalent to DSS.<br />
GDSS are commonly used to refer to systems that support group work such as<br />
communication, <strong>in</strong>formation shar<strong>in</strong>g, generation of ideas and so forth.<br />
5. Executive support systems (ESS): As indicated by the name, these systems<br />
offer top executives direct access to management reports, information and<br />
mail services without the connecting link of an intermediary of some sort.<br />
6. Expert systems (ES): imitate human processes of reasoning in a form that<br />
could also be handled by human experts. In other words, ES are able to<br />
supplement or even replace human experts. Expert systems take the form of<br />
either handling systems or advisory systems (cf. van de Donk & Snellen,<br />
1989) (Saxena & Aly, 1995, p. 280-281).<br />
One may question Saxena & Aly’s interpretation of handling systems and advisory<br />
systems as examples of ES. On the basis of the introduction made by van de Donk & Snellen<br />
(1989), we rather see handling systems as an example of APS and advisory systems as<br />
equivalent to either DSS or GDSS. This is the reason why we place the test<br />
system as an ES overall, also in terms of Saxena & Aly, though on the basis of their<br />
description of the system type.<br />
The application of systems depends on whether the context of use is policy<br />
plann<strong>in</strong>g, implementation, or adm<strong>in</strong>istration. Here we are concerned with public<br />
adm<strong>in</strong>istration <strong>in</strong> the form of policy adm<strong>in</strong>istration. Accord<strong>in</strong>g to Saxena & Aly, the<br />
relevant systems for this sub area are APS: transaction process<strong>in</strong>g systems (TPS),<br />
transaction summary <strong>in</strong>formation (TPS-TSI) and detailed transaction lists (TPS-DTI);<br />
DSS, and ES. However, one must keep in mind that it is a complex task to put<br />
forward an unequivocal typology due to the great variety of tasks carried out by public<br />
administrations, even within policy administration. The actual system use in a real-life<br />
administration may thus differ from the typology. To draw a parallel to our<br />
characterization of the test system above, the system also contains information that is<br />
not necessarily ES oriented, as just mentioned.<br />
In accordance with the focus on efficiency and effectiveness in e-government,<br />
initiatives and systems obviously need to be evaluated in order to be justified.<br />
Information systems need to function as intended in order to be able to support<br />
efficiency and effectiveness. Evaluation may help discover shortcomings in the<br />
system, but also inform the developers of the strengths and weaknesses of the<br />
system as regards users’ use of it. Evaluation consequently constitutes a<br />
rather inevitable direction in the e-government literature on information systems. The<br />
literature on evaluation takes two forms. One is concerned with evaluation of specific<br />
systems. The other represents a methodological perspective, support<strong>in</strong>g researchers<br />
with tools for evaluat<strong>in</strong>g either prototypes or systems already <strong>in</strong> use. Evaluation of<br />
specific systems is either carried out when a new system is proposed or when the<br />
system has been <strong>in</strong> function for some time. Examples are Floropoulos et al.’s (2010)<br />
evaluation of the Greek Tax Information system (TAXIS) from an employee<br />
perspective, Hu et al.’s (2010) evaluation of agency satisfaction with an ERMS, and
Quam’s (2001) examination of citizens’ use of Bridges¹. The LIS literature has<br />
outlined directions for and analyses of system evaluation (e.g., Robertson & Hancock-Beaulieu,<br />
1992; Kelly, 2009). But core e-government research also offers methods for<br />
evaluation. For instance, Goh et al. (2008) have developed a checklist that can be used<br />
to evaluate the degree of knowledge management in e-government portals. The<br />
evaluation carried out in the search test to follow has been designed with established<br />
LIS evaluation methods as the foundation. A prototype is tested, that is, the system<br />
had not been in use among the employees at SKAT at the time of the testing.<br />
3.4.2 Knowledge management<br />
Knowledge management designates the process of identifying and controlling<br />
organizational knowledge in order to support the competitiveness of businesses (de<br />
Groot, 2003). The attempts to manage knowledge in organizations have arisen from<br />
problems with maintaining, locating and applying knowledge in a systematic manner<br />
(Alavi & Leidner, 2001). Competitiveness may not be a core issue in public<br />
organizations as such. However, a clear parallel exists between the measurement of<br />
private sector outcome in the form of competitiveness and public sector measurements<br />
of effectiveness. This is reflected in the literature on knowledge management in e-government.<br />
Thus, though originating from private sector businesses, knowledge<br />
management has been widely adopted in the public sector. In spite of fundamental<br />
differences between the goals of private and public organizations, knowledge<br />
management also has the potential to improve effectiveness, efficiency, and consumer<br />
satisfaction in a government context (Ha & Zenebe, 2008). These benefits are very<br />
much in line with the desired outcome of e-government (cf. Goh et al., 2008; Ha &<br />
Zenebe, 2008). The inclusion of knowledge management among the future-oriented themes<br />
pointed out by eGovRTD2020² reflects the relevance of the concept to e-government<br />
(cf. Dawes, 2009).<br />
However, <strong>government</strong>s are usually organized <strong>in</strong> a more complex manner than<br />
bus<strong>in</strong>esses. This larger degree of complexity may affect the realization of knowledge<br />
management (Ha & Zenebe, 2008). Conversely, the complexity may underp<strong>in</strong> the need<br />
1 M<strong>in</strong>nesota’s Gateway to Environmental Information (http://www.bridges.state.mn.us/, accessed on 19-<br />
06-2012).<br />
2 eGovRTD2020 is a research project funded by the EU with the purpose of<br />
of a systematic way of handling organizational knowledge by making visible knowledge<br />
that would otherwise be hidden. In this respect, work tasks that cross government<br />
boundaries constitute a particular challenge (cf. Peel & Rowley, 2010). De Groot (2003,<br />
p. 95) summarizes the consequences of not being able to access employees’ knowledge as follows:<br />
“...knowledge is available only to small group of people, [k]nowledge is often not<br />
available to the people who need certain knowledge, [and] [e]mployees are overloaded<br />
with irrelevant information”. More tangible factors like financial and time<br />
constraints may also challenge the realization of knowledge management in governments<br />
(Hazlett, McAdam & Beggs, 2008). However, as Southon, Todd & Seneque’s (2002)<br />
study of two private and one public organization shows, the management of knowledge<br />
can also be challenged in private organizations.<br />
Knowledge management is fundamentally a construct of organization theory.<br />
Knowledge management concerns both tacit knowledge and explicit knowledge<br />
Table 3.2 Knowledge management processes and the potential role of IT. Adapted from Alavi &<br />
Leidner (2001, p. 125)<br />
KM process | Supporting information technologies | IT enables<br />
Knowledge creation | Data mining; learning tools | Combining new sources of knowledge; just in time learning<br />
Knowledge storage and retrieval | Electronic bulletin boards; knowledge repositories; databases | Support of individual and organizational memory; inter-group knowledge access<br />
Knowledge transfer | Electronic bulletin boards; discussion forums; knowledge directories | More extensive internal network; more communication channels available; faster access to knowledge sources<br />
Knowledge application | Expert systems; workflow systems | Knowledge can be applied in many locations; more rapid application of new knowledge through workflow automation<br />
manifested in some kind of physical form, usually as documents (Cong & Pandya, 2003).<br />
As regards the latter type, information systems in the form of knowledge management<br />
systems are commonly applied to support the process (Abecker et al., 1998; Alavi &<br />
Leidner, 2001; Martin, 2008). Both employees in governments and government<br />
“customers”, that is, citizens (cf. Yang et al., 2006) but also businesses, interest groups and the<br />
like, may reap the benefits of government knowledge management.<br />
Groupware, communication technologies, and specifically <strong>in</strong>tranets are<br />
important ICT based tools <strong>in</strong> terms of mediat<strong>in</strong>g knowledge management (Alavi &<br />
Leidner, 2001, p. 125). Four processes constitute knowledge management: knowledge<br />
creation, knowledge storage and retrieval, knowledge transfer, and knowledge<br />
application (see Table 3.2). The search test system takes the form of an <strong>in</strong>tranet with<br />
different functions. First of all, in terms of Alavi & Leidner, the system is a repository<br />
of knowledge. Both organizational and specialist knowledge is contained. In addition,<br />
sub-portals on topics of relevance to the employees are included. Therefore,<br />
storage and retrieval, transfer and application of knowledge are supported by the<br />
system. However, in the search test we are primarily concerned with the support of<br />
retrieval of knowledge, including how both individual and organizational memory are<br />
supported. Findings may be made as to knowledge transfer and knowledge application,<br />
but they are not the object of our investigations.<br />
3.4.3 ICT tools: Metadata <strong>in</strong>itiatives<br />
One way of supporting information retrieval is to mark up pieces of information<br />
by means of metadata. The principle of describing and representing information units in<br />
order to be able to retrieve known items, explore new ones, and establish relations<br />
between items reaches far back in LIS (cf. Haynes, 2004). Referring to the assigned<br />
data as metadata came into play along with the introduction of electronic resources (El-Sherbini<br />
& Klim, 2004). Thus, one of the first occurrences of the term metadata appears<br />
in the beginning of the 1990s (Gilliland-Swetland, 2005).<br />
Information units can be characterized according to their content, context, and<br />
structure. The content expresses what the information is about. The content is<br />
considered intrinsic to the information unit. The context, on the other hand, is considered<br />
extrinsic to the information and is associated with the creation of the information. Wh-questions<br />
may help map the contextual issues of the information. The structure of<br />
the information may be either intrinsic or extrinsic or both and expresses formal<br />
associations inside one information unit or across several units (Gilliland, 2008). The<br />
purpose of add<strong>in</strong>g metadata is to be able to “arrange, describe, track and otherwise<br />
enhance access to <strong>in</strong>formation objects” (Gilliland, 2008, p. 2). NISO (2004) applies a<br />
slightly different tripartition to characterize metadata. NISO divides metadata <strong>in</strong>to<br />
descriptive, structural, and adm<strong>in</strong>istrative metadata. Here descriptive metadata<br />
describes the <strong>in</strong>formation unit <strong>in</strong> order to support discovery and identification.<br />
Structural metadata has the purpose of <strong>in</strong>dicat<strong>in</strong>g the relation between compound<br />
objects such as the order<strong>in</strong>g of pages to form chapters. Adm<strong>in</strong>istrative metadata<br />
supports the management of resources by <strong>in</strong>form<strong>in</strong>g about creation, file type, technical<br />
information and access information. The difference of perspective between Gilliland<br />
and NISO is caused by their difference of application. Gilliland’s tripartition is aimed<br />
at the LIS sector, while NISO’s is rather aimed at interoperability and other technically<br />
oriented contexts. Haynes (2004) suggests a further elaboration on the purpose of<br />
metadata and identifies five core areas of application: 1) resource description, 2)<br />
information retrieval, 3) management of information, 4) rights management, ownership<br />
and authenticity, and 5) interoperability and e-commerce. Haynes’<br />
identification thus appears more thorough in that it comprises the perspectives of<br />
Gilliland and NISO at the same time.<br />
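A minimal sketch of how the three NISO metadata types might be kept apart for a single information unit is given below. The class name, the field names and all values are invented for illustration; they are taken neither from NISO (2004) nor from the test system.<br />

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """Hypothetical container separating NISO's three metadata types."""
    # Describes the unit in order to support discovery and identification.
    descriptive: dict = field(default_factory=dict)
    # Indicates relations between the parts of compound objects.
    structural: dict = field(default_factory=dict)
    # Supports management: creation, file type, technical and access information.
    administrative: dict = field(default_factory=dict)

    def discovery_terms(self):
        """Collect lower-cased words from the descriptive metadata only."""
        terms = set()
        for value in self.descriptive.values():
            terms.update(str(value).lower().split())
        return terms


# Invented example values, for illustration only.
record = MetadataRecord(
    descriptive={"title": "Guideline on travel expenses", "subject": "expenses"},
    structural={"page_order": [1, 2, 3]},
    administrative={"file_type": "application/pdf", "access": "internal"},
)
```

In this sketch only the descriptive part feeds retrieval, so a term such as "internal" from the administrative part would not match a search, whereas "guideline" would.<br />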
Metadata formats differ as to their level of complexity. At the lowest level of<br />
complexity we find full text indexes based on the documents contained in the indexed<br />
information system. Full text indexes are placed at the lowest level due to the lack of structure in<br />
the metadata. The next level of complexity contains simple, structured formats. This<br />
medium level does not necessarily require professionals for metadata assignment. An<br />
example is the Dublin Core metadata standard designed for mark-up of internet<br />
resources. The highest level of complexity contains more detailed and<br />
structured standards. Examples are domain-specific standards that aim at characterizing<br />
the information units in a more detailed manner, for example the MARC format<br />
(Dempsey & Heery, 1998). In the most complex group of standards the assignment of<br />
metadata requires a thorough knowledge of the format. Hence it cannot be carried out<br />
by novices.<br />
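To give an impression of the medium level of complexity, the sketch below lists the fifteen elements of the simple Dublin Core element set and checks a record against them. The function names and the example values are hypothetical; rendering the elements as HTML meta tags with a "DC." prefix follows a commonly used convention, which we assume here rather than prescribe.<br />

```python
from html import escape

# The fifteen elements of the simple Dublin Core element set.
DUBLIN_CORE_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def unknown_elements(record):
    """Return the keys of `record` that are not Dublin Core elements."""
    return sorted(set(record) - set(DUBLIN_CORE_ELEMENTS))


def to_meta_tags(record):
    """Render the Dublin Core elements of `record` as HTML meta tags.

    Keys that are not Dublin Core elements are skipped; values are escaped.
    """
    return [
        f'<meta name="DC.{name}" content="{escape(str(value), quote=True)}">'
        for name, value in record.items()
        if name in DUBLIN_CORE_ELEMENTS
    ]


# Invented example record, for illustration only.
example = {"title": "Taxes & duties", "language": "da", "author": "N.N."}
```

Here unknown_elements(example) reports "author", since the corresponding Dublin Core element is called creator. A check of this kind requires no cataloguing expertise, which is in line with the characterization of the medium level above.<br />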
Metadata developed for specific doma<strong>in</strong>s are referred to as doma<strong>in</strong>-specific<br />
metadata. Doma<strong>in</strong>-specific metadata have been developed for various fields such as<br />
museums, archives, and mov<strong>in</strong>g pictures (e.g., Vellucci, 1998; Haynes, 2004).<br />
Within e-government, metadata has also received quite some attention as a<br />
means of improving access to governmental information. Metadata is considered<br />
particularly challenging in e-government due to the heterogeneity of the user group
(Alasem, 2009). A number of <strong>in</strong>fluential nations have developed standards for metadata<br />
with the aim of support<strong>in</strong>g e-<strong>government</strong>.<br />
The forerunner for the introduction of government metadata initiatives was the<br />
global information locator service (GILS) initiative presented in the early 1990s. The<br />
intention behind GILS was to outline a standard for locating information that was<br />
applicable to different domains including governments. GILS is based on a set of<br />
metadata in order to support semantic mapping, locator records, and interoperability.<br />
GILS is to a large extent inspired by the principles of bibliographic<br />
cataloguing (Christian, 1999, 2001). In the United States a large project, government<br />
information locator service (also referred to as GILS³), was initiated in the mid-1990s<br />
(Andrews & Duhon, 1997; Moen, 2001). The stepping stone for the project was a<br />
politically based decision about paper reduction in the United States. The government<br />
information locator service was heavily based on the GILS. The service was<br />
thoroughly evaluated in 1996-1997 with a number of purposes, among other things<br />
understanding how GILS worked as a tool for information resources management and<br />
how GILS served different user groups (Moen & McClure, 1997). The evaluation<br />
indicated that the level of implementation at the time of the evaluation still left room for<br />
improvement. In particular, the implementation was uneven and diverse in nature in the<br />
administrations selected as evaluation units, which is hardly surprising considering the time<br />
span. Thus, the problems were not necessarily caused by shortcomings<br />
in the service itself but rather by local characteristics of the administrations.<br />
3 In order to avoid confusion, we use the acronym GILS when designating the Global<br />
Information Locator Service. The Government Information Locator Service is designated by its<br />
full name.<br />
The development of specific e-government metadata has continued across the<br />
world throughout the 2000s (Tambouris, Manouselis & Costopoulou, 2007; Alasem,<br />
2009). In many cases the Dublin Core metadata standard (Weibel, 1997) has<br />
constituted a central element (Alasem, 2009). Dublin Core contains 15 data elements<br />
and may thus be considered a simple format for metadata compared to, for instance, the<br />
highly detailed MARC format. In Australia the standard for government metadata,<br />
Australian Government Locator Service (AGLS), was initiated in 1997. Instead of<br />
following GILS, Australia developed a standard based on the Dublin Core standard<br />
(Haynes, 2004; National Archives of Australia, 2010). The European Union has also<br />
developed a mark-up language (GovML) in a 2-year project funded by the European<br />
Commission. GovML is based on an open XML document structure (Kavadias &<br />
Tambouris, 2003; Glassey, 2004).<br />
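To illustrate the simplicity of Dublin Core noted above, the following is a minimal sketch in Python of a flat record restricted to the 15 elements of the Dublin Core Metadata Element Set. The helper name `make_record` and all field values are hypothetical and serve only as illustration; they are not taken from any of the standards or systems discussed in this chapter.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def make_record(**fields):
    """Return a flat metadata record (hypothetical helper, for illustration only).

    Every element is optional, but names outside the element set are rejected.
    """
    unknown = sorted(set(fields) - set(DUBLIN_CORE_ELEMENTS))
    if unknown:
        raise ValueError(f"not Dublin Core elements: {unknown}")
    return dict(fields)


# Invented values describing an imaginary administrative document.
record = make_record(
    title="Guidelines for case handling",
    creator="Example Agency",
    date="2010-05-01",
    language="da",
    type="Text",
)
```

A record of this kind is a flat set of optional name-value pairs, which is what makes the format simple in comparison with a detailed cataloguing format such as MARC.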
In Denmark, the National IT and Telecom Agency has functioned as an advisor<br />
to governmental offices within the framework of the FESD project. The purpose was to<br />
give directions and recommendations for digitalizing governments with specific focus<br />
on implementing electronic document management systems (EDMS) (Center for<br />
effektivisering og digitalisering, 2002; Steinmark, 2005). However, applying the<br />
guidelines was not mandatory, as also indicated by the choice of terminology. Likewise,<br />
applying the government information locator service profile in the United States was<br />
voluntary. Some American states have adopted it while others have applied alternative<br />
solutions (Moen, 2001). Recently, the Danish initiatives have concerned the standardization<br />
of data by means of OIOXML, an XML standard developed with the specific purpose of<br />
exchanging and reusing data across administrations (National IT and Telecom Agency,<br />
2009). Obviously, an important precondition for enabling exchange of data between<br />
administrations is interoperability.<br />
Fewer initiatives have been taken to standardize descriptive metadata<br />
in e-government. The initiatives are commonly proposed as a component of enterprise<br />
architecture and take the form of ontologies (Peristeras, Tatabanis & Goudos, 2009).<br />
Ontologies are considered a type of KOS (see section 5.3.2) though with different<br />
characteristics compared to e.g. taxonomies and thesauri (Soergel, 1999; Gilchrist,<br />
2003; Haynes, 2004; Zeng, 2008). In Denmark, FORM has been developed to<br />
provide a common language for exchange between Danish governments. FORM is the<br />
Danish acronym for Joint Cross Governmental Business Reference Model (cf., OECD,<br />
2010). Abecker et al. (1998) outline three levels for characterizing<br />
ontologies: information, domain, and enterprise. FORM is characterized as an<br />
enterprise ontology by virtue of its identification of work tasks carried out across the<br />
entire Danish public sector (cf., Gilchrist, 2003; OECD, 2010). At present FORM is<br />
applied in the national portal to the public domain, borger.dk. A number of similar<br />
initiatives and tools have been developed in other countries (cf., Peristeras, Tatabanis &<br />
Goudos, 2009). As appears from the above, the undertakings regarding metadata have to a<br />
large extent been concerned with the development of standards. The evaluation of<br />
initiatives has received less attention in the research literature. In this sense, the present<br />
project can increase our understanding of the role of metadata when professional users<br />
seek information.<br />
3.5 Summary<br />
E-government is a fairly new interdisciplinary research field comprising<br />
research fields such as social sciences, computer science, public administration, and<br />
organization studies. The field started out some 20 years ago and in the intervening<br />
time it has been consolidated with academic journals and conferences. The point of<br />
departure of the research field was an increased focus on effectiveness and efficiency of<br />
governments along with demands for transparency of public administrations. To some<br />
extent the development of e-government has been inspired by e-commerce, that is, the<br />
business world. However, the two worlds differ in a number of characteristics.<br />
Examples include the number and types of stakeholders and the complexity of<br />
organizations. Thus experiences cannot be directly transferred between the two areas.<br />
The present PhD project is placed within information science, which also<br />
shapes the approach to e-government. A variety of system types exist that support e-government.<br />
The system that hosts the comparative test of categorization in the PhD<br />
project is an intranet. Overall, we characterize it as an expert system on the basis of the different<br />
definitions above. As it is an intranet, other objects are contained in the<br />
database too. Nevertheless, the character of the system places it as a tool for<br />
knowledge management. Here, we are mainly concerned with the retrieval of<br />
knowledge. We test the system with professional employees. From Rowley we have<br />
learned that many stakeholders exist within e-government and that they do not<br />
necessarily act the same as regards information seeking. Further, we have seen that the<br />
introduction of e-government most likely has meant a change of work tasks for<br />
employees. Together, this places demands on the design of the search test. Lastly, the<br />
investigations of metadata in e-government have to a large extent been concerned with<br />
metadata formats and to a lesser degree with descriptive metadata. Further, the existing<br />
knowledge of the significance of metadata in e-government information seeking is limited.<br />
Therefore it is our aim to add to this knowledge by means of the project.<br />
4 Seeking behaviour in e-government<br />
Information seeking constitutes a core research field in the user-oriented research<br />
tradition (e.g., Ingwersen, 1996; Åström, 2007). Further, information seeking has been<br />
studied in LIS for decades (Ingwersen & Järvelin, 2005). Thus, ARIST started out with<br />
annual reviews on information needs and uses in 1966. Though the reviews on the<br />
subject only had an annual frequency until 1972, the ever increasing number of research<br />
articles and reviews on the subject attests to the importance of the research field.<br />
Studies of information seeking in the context of e-government serve different<br />
purposes. One purpose is the evaluation of (digital) information services. Are they<br />
being used and how? Does the use reflect the intentions behind the service? Another<br />
purpose is to characterize the use of information and information services so that<br />
new initiatives can be designed to accommodate this use (e.g., Wilson, 1999).<br />
The purpose of the present chapter is to supplement the prior theoretical<br />
chapter with a review of information seeking studies within e-government. With the<br />
present chapter we want to provide an overview of the current state of knowledge<br />
concerning professional users of information in the context of e-government. We<br />
introduce the chapter with a definition of the concept of information seeking, as it serves<br />
as the frame of reference for reviewing studies of information seeking. Next follows a<br />
presentation of the coverage of different e-government stakeholders’ seeking behaviour.<br />
The purpose of this subsection is to compare the amount of knowledge of other<br />
stakeholders to that of the particular group in question here: employees. The brief comparison<br />
is followed by a review of the current state of knowledge about the information seeking<br />
of e-government employees. We conclude the chapter with a summary.<br />
4.1 Information seeking and related concepts<br />
Information seeking designates “the conscious effort to acquire information in<br />
response to a need or gap in your knowledge” (Case, 2007, p. 5). Further, information<br />
seeking describes “the variety of methods people employ to discover, and gain access to<br />
information resources…” (Wilson, 1999, p. 263). Two concepts are closely related to<br />
information seeking: information behavior and information searching. Information<br />
behavior is superordinate to information seeking. The concept is considered a part of<br />
general human communication behavior and may be defined as “the more general field<br />
of investigation…” (Wilson, 1999, p. 263). Information searching, on the other hand, is<br />
subordinate to information seeking and represents the situation in which a user interacts<br />
with an information system in order to satisfy a need for information. Since consulting an<br />
IR system is one among several possible ways to satisfy an information need, information<br />
searching must be characterized as potentially contained in information seeking. The<br />
relation between the three concepts is illustrated in Figure 4.1. The figure is based on<br />
analyses of a number of existing models and therefore serves as a metamodel.<br />
Figure 4.1 Nested model of information seeking and information searching (Wilson, 1999, p. 263)<br />
It is implicit in the concept of information seeking that it occurs when a subject<br />
has experienced some sort of gap in their knowledge and a need for information has<br />
arisen. Information needs as the triggering element have been a common point of<br />
departure for studies of information seeking and searching. The concept of information<br />
need presupposes a problematic situation which, unless the problem is very simple or<br />
routine, gives rise to an information need (MacMullin & Taylor, 1984, p. 93). Different<br />
theories of the nature of the information need have been presented. Taylor (1968) has<br />
outlined four stages to characterize an information need (Q1-Q4). Libraries can apply<br />
the stages in order to help the user at whichever stage their information need is. Belkin,<br />
Oddy & Brooks’ (1982) contribution is concerned with the background of the<br />
information need. They have put forward the ASK hypothesis, which states that an<br />
information need arises from an anomaly in the user’s state of knowledge. The idea<br />
behind the hypothesis is that it will be easier for the user to describe the anomaly<br />
than to describe the information need in the language of the information system<br />
(Belkin, Oddy & Brooks, 1982, p. 62). Ingwersen (1986a) has also offered an<br />
empirically based typology of information needs of users. The typology comprises<br />
three different types: verificative information needs, conscious topical information<br />
needs, and muddled topical information needs. Originally the identification of the three<br />
types was based on empirical results from library users. When having a verificative<br />
information need the user wants to locate or verify an item. The user possesses<br />
characteristic bibliographic data on the item wanted. The conscious topical information<br />
need refers to a situation where “the user wants to clarify, review or pursue aspects of<br />
known subject matter”. Finally, the muddled topical information need describes a user<br />
wanting to explore new concepts outside of subject matters known to the user ahead of<br />
the information need. Recently, Ingwersen & Järvelin (2005, pp. 289-293) have added<br />
further specification to the theories of the information need. Here, three dimensions<br />
characterizing the information need have been identified, namely the user’s<br />
intentionality behind the search task (whether searching for source contents or data<br />
entities), the type of knowledge known by the user (whether declarative and/or<br />
procedural domain knowledge), and the quality of the user’s current knowledge<br />
(whether well- or ill-defined). Combinations of the three dimensions lead the authors to<br />
specify eight different information need types ranging from different known item<br />
searches to muddled types.<br />
From the 1990s the concept of task has gained attention as a means of explaining<br />
information seeking and information searching (Vakkari, 2003). Information needs and<br />
seeking strongly depend on the underlying task, which explains the relevance of the<br />
concept in seeking studies. Tasks may have been implicit in earlier theories of<br />
information need formation (cf., Byström & Järvelin, 1995). However, it is with the<br />
empirically based identification of types of tasks and their consequences for information<br />
seeking actions that the value of tasks as a qualified methodical alternative to<br />
information needs as the point of departure for studies of information seeking has been<br />
demonstrated (see e.g., Byström & Järvelin, 1995; Byström, 1997, 2002).<br />
4.2 The purpose of seeking studies<br />
The study of information seeking is important because it provides essential<br />
knowledge about users of information. This knowledge is needed when developing<br />
information services regardless of the choice of channel. Thus, studies of information<br />
seeking may inform the design process since they are able to specify the navigational<br />
structure and data needed by a particular user group in order for them to be able to<br />
locate specific information (cf. Rouse & Rouse, 1984; Wilson, 1999). But, as pointed<br />
out by Wilson (1981, p. 7), studies of information seeking can also stand alone as basic<br />
research, not necessarily with any practical applications or implications but rather<br />
increasing our knowledge of why users act the way that they do. This second type of<br />
study in particular expresses the change of approach in information seeking studies.<br />
Thus, the focus of information seeking studies has moved away from examining the<br />
artifacts of information seeking in what is referred to as system-centered research.<br />
Recent studies rather emphasize the information user in the user-centered research<br />
tradition of information seeking studies (e.g., Case, 2007; Courtright, 2007; Vakkari,<br />
1999). Along with this change of emphasis towards the users of information, the<br />
context for information seeking has received more attention. Taylor’s (1991) paper on<br />
information use environments points out the differences in use and perception of<br />
information in different groups of users, suggesting that information seeking must be<br />
studied with a point of departure in specific user groups.<br />
4.3 Entities of e-government: studies of seeking behavior<br />
Information seeking behaviour in general has been thoroughly investigated<br />
within library and information science. One area of seeking studies has focused<br />
on seeking behaviour in relation to work contexts, e.g., engineers and lawyers (e.g.,<br />
Case, 2007). But the area of e-government has also been the subject of investigation.<br />
We have previously outlined the stakeholders of e-government (see section 3.3).<br />
Rowley’s (2011) typology has been presented in order to be able to, among other things,<br />
identify the differences between the needs or demands that characterize different<br />
stakeholders. In this section we will apply the typology as a tool for categorizing<br />
studies of seeking behavior within e-government. The purpose of introducing the<br />
typology in the present chapter is to outline the research coverage of the different<br />
stakeholders as to their patterns of information seeking. Obviously, since the typology<br />
has not been developed with this particular purpose in mind, not all roles are necessarily<br />
relevant as objects of investigation in the present framework. For instance, we do not<br />
expect to find seeking studies investigating roles that are not subject to government<br />
services. By this, we particularly mean the meta actors comprising the last four roles in<br />
Rowley’s typology, namely project managers, design and IT developers, suppliers and<br />
partners, and researchers and evaluators. Further, since this chapter serves the function<br />
of setting the stage for our empirical domain study, we are limiting the following review<br />
to geographic locations that share a level of development with Denmark, which is the<br />
geographic location of our case organization.<br />
We have already mentioned (section 3.3) that citizens, businesses, and<br />
governments have received much attention in the e-government research literature.<br />
This is also reflected in seeking studies of e-government stakeholders; in<br />
particular, the seeking and searching behavior of citizens is well investigated.<br />
Politicians elected for office have also been rather well investigated. Table 4.1 presents<br />
studies exemplifying seeking studies of different stakeholders. Employees have been<br />
left out of the table since we go into more detail with this particular stakeholder<br />
role from section 4.4 onwards. The division of the typology does have some<br />
influence on how seeking studies can be placed. For instance, citizens are divided as to<br />
whether they are general citizens or users of a particular service. This means that the<br />
studies that can be placed in the latter group are mainly searching studies reflecting the<br />
use and often also evaluation of a particular service. The evaluative character of the<br />
latter type of studies also means that they do not necessarily include searching behavior<br />
per se, such as selection of search terms or modification of queries.<br />
4.4 E-government employee information seeking<br />
A number of selection criteria have guided the inclusion and exclusion of studies in<br />
this review. We have previously mentioned the diverging maturity levels of e-government<br />
at national levels. In our review we are focusing on countries that have a<br />
maturity level similar to Denmark. It would be reasonable to argue that an even<br />
narrower geographical delimitation would be required due to the specific<br />
characteristics of the Scandinavian administrative tradition (cf. Arellano-Gault & del<br />
Castillo-Vega, 2004). However, since we are concerned with seeking behaviour in<br />
relation to carrying out administrative work tasks and not the administrative tradition<br />
per se, we find it reasonable to include studies from other administrative traditions as<br />
well. Finally, we have not limited the included studies as to their time of publication.<br />
The rationale behind this decision is that the presence of ICT in governments dates far<br />
back. Thus, employees have been using ICT as a part of their work for decades<br />
already. Therefore we suppose that studies carried out before the<br />
introduction of the concept of e-government may offer valuable insights into the<br />
seeking behaviour of our target group.<br />
Table 4.1 Examples of studies that have examined information seeking and/or searching of various<br />
stakeholder roles.<br />
Stakeholder | Author(s) | Object of study | Methods applied<br />
People: Service users | Fu, Farn & Chao (2006) | Citizens’ acceptance of e-tax filing | Survey questionnaire<br />
People: Service users | Hu et al. (2008) | Determinants of service quality in e-tax filing | Two-stage online survey of citizens<br />
People: Service users | Wang & Shih (2009) | Factors influencing citizens’ use of government information kiosks | Survey questionnaire<br />
People: Citizens | Jaeger & Thompson (2004) | E-government non-users among citizens | Literary study<br />
People: Citizens | Reddick (2005) | Citizens’ interaction with e-government | Citizen telephone surveys<br />
People: Citizens | Chau, Fang & Sheng (2007) | Citizens’ use of a particular website: Utah.gov | Log analysis<br />
People: Citizens | Rains (2008) | Citizens’ information behavior when looking for health related information | Survey questionnaire<br />
People: Citizens | Cuillier & Piotrowski (2009) | College students’, internet volunteers’ and citizens’ general seeking behavior and perceptions of access to government information | In-class paper questionnaires, online surveys, and phone surveys<br />
Businesses and Small and Medium sized Enterprises | Ren (1999) | SME executives’ use of government information sources | Survey questionnaire<br />
Non-profit organizations | Elwood (2008) | The particular challenges connected to meeting the data needs of grass root organizations | Observation of 2 organizations and semi-structured interviews of respondents surrounding the organizations<br />
Non-profit organizations | Nikoi (2008) | NGO-workers’ information needs | Interviews, observation, and analyses of the content of information already gathered by the respondents<br />
Politicians | Nicholas & Colgrave (1996) | Information needs of British local councilors | Interviews and subsequent survey questionnaire<br />
Politicians | Orton, Marcella & Baxter (2000) | Parliamentary members in the United Kingdom | Case study of two parliamentary members including observation and log analysis<br />
Politicians | Askim (2007; 2009) | | <br />
As it will appear from the sections to follow, the amount of research<br />
conducted of <strong>in</strong>formation seek<strong>in</strong>g of employees with<strong>in</strong> e-<strong>government</strong> is limited.<br />
Therefore we will supplement the review with studies of related and relevant user<br />
groups that can enrich our uncover<strong>in</strong>g of e-<strong>government</strong> employees. One area we are<br />
draw<strong>in</strong>g on is seek<strong>in</strong>g studies of professions with similar characteristics to the context<br />
<strong>in</strong> question here. Also, we will consult studies of e-<strong>government</strong> employees that are not
core seek<strong>in</strong>g or search<strong>in</strong>g studies, but still contribute to our knowledge of the target<br />
group <strong>in</strong> question. Not all studies strictly <strong>in</strong>vestigate employees. We also see<br />
examples of studies that <strong>in</strong>form us about employees, but at the same time <strong>in</strong>clude other<br />
stakeholders such as politicians or citizens (e.g., Marcella et al., 2007). These studies<br />
will be <strong>in</strong>cluded <strong>in</strong> the review given that they are able to add to the knowledge about<br />
employees. Numerous studies exist on information needs and seeking behaviour<br />
among medical health professionals (a review of recent studies can be found in Fourie,<br />
2009). Though medical health may be considered a subcategory of e-government,<br />
these studies will not be considered in the review, since the nature of the information<br />
applied diverges considerably from the information employed by the user group<br />
in focus here.<br />
4.4.1 Project INISS<br />
In the late seventies, Wilson et al. (e.g., Wilson & Streatfield, 1977; Wilson,<br />
1980) performed a large observational study of social workers and social<br />
administrators: Project INISS. Project INISS (Information Needs and Information<br />
Services in local authority social services departments) had the<br />
purpose of examining information needs and information behaviour among social<br />
workers and social administrators (Wilson, 1980, p. 199). The results were intended<br />
to be used for improving and developing information system organization and<br />
information service delivery (Wilson, 1980, p. 199). The project was carried out in a<br />
selected set of British local authority departments representing both urban and rural<br />
departments. Furthermore, the test persons reflected different categories of employees<br />
(Wilson, 1980, p. 203). Twenty-two subjects were observed using structured observation,<br />
providing 6,000 records of communication events (Wilson & Streatfield, 1977, p. 282).<br />
The study is primarily a study of <strong>in</strong>formation behaviour. Hence, it is concerned with<br />
multiple aspects of the work situation of the subjects being studied. Still, there are<br />
elements of information seeking in the results, e.g., when referring to the role of current<br />
awareness bulletins (Wilson & Streatfield, 1977, p. 285).<br />
The study shows that 74% of all sessions last 5 minutes or less (Wilson &<br />
Streatfield, 1977, p. 284). The issue of limited time is still noted years after Wilson &<br />
Streatfield’s study (see e.g., Quirchmayr & Traunmüller, 1991). In addition, the social<br />
service staff members stress the importance of clearly and succinctly presented texts,<br />
preferably in a format that makes key elements easy to identify<br />
(Wilson, 1980, p. 211). As regards information needs, the study indicates that<br />
information needs among the participants are more complex than just verificative needs.<br />
The study finds that topical needs, whether conscious or muddled, are present among the<br />
participants. However, this does not exclude the presence of verificative information<br />
needs in the social services departments. Put another way, the work<br />
tasks of the employees generate both verificative and topical information needs. In the<br />
personal files observed in the study, a number of different information types are<br />
found, e.g., committee papers, pamphlets, reports, and statistics (Wilson, 1980, p. 211).<br />
This means that the length of the individual units of information varies.<br />
4.4.2 System development <strong>in</strong> the Danish Parliament<br />
From 1989 onwards, Ingwersen (1994) worked as a consultant on a project<br />
regarding the introduction of a new information system in the Danish Parliament. The<br />
project included the design and development of a thesaurus. As an introductory part of<br />
the project, the information structure and working processes of people employed in the<br />
Parliament were analysed. For this purpose, a user and domain analysis was<br />
carried out. The empirical basis for the analysis comprised interviews with 32<br />
respondents on the basis of a structured questionnaire. In some cases a respondent<br />
was a group of people, resulting in more than 32 individuals. The respondents comprise<br />
members of parliament (MPs), assistants, and secretariat employees. The questionnaire<br />
consisted of four parts: 1) Characteristics of respondents; 2) Quality of information<br />
(critical incident); 3) Quality of information (in general); and 4) Types of subject terms<br />
and subject levels.<br />
The results of the user and doma<strong>in</strong> analysis primarily <strong>in</strong>form us about the<br />
search<strong>in</strong>g behaviour of the participants. Thus, the study gives directions for the<br />
required functionalities of the future <strong>in</strong>formation system to be implemented. What<br />
characterizes the documents of the organization is that they are connected <strong>in</strong> a highly<br />
complex manner reflect<strong>in</strong>g the different stages <strong>in</strong> the law-mak<strong>in</strong>g process. An<br />
important feature of the system is that it allows for high-precision searches. Thus, the<br />
respondents consider it an important search facility that they are able to identify a<br />
specific document when descriptive data or subject data are known in advance. This<br />
demand is connected to the participants’ need to be able to limit search results as much<br />
as possible. The paper outlines different ways of reaching this goal. One way is to<br />
assign several controlled index terms to each document, which allows for discrimination<br />
between documents; the need for such discrimination is worth noting here.<br />
Another parameter is the short w<strong>in</strong>dow of currency of the documents expressed by the
respondents. Thus, the participants consider the documents of the information system<br />
to be outdated after two years or even less. Thirdly, exhaustive descriptive metadata need<br />
to be available in order to support searches when one or several descriptive<br />
characteristics of the document are known.<br />
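As a minimal illustration of how several controlled index terms combined with a currency window can narrow a result set, consider the following Python sketch. The Document record, its field names, the sample documents and terms, and the search function are hypothetical and are not taken from Ingwersen (1994); the two-year default only mirrors the currency window expressed by the respondents.<br />

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Document:
    """Hypothetical record; the field names are illustrative only."""
    title: str
    index_terms: frozenset  # controlled index terms assigned to the document
    published: date


def search(docs, required_terms, today, max_age_years=2):
    """Return documents that carry ALL required terms and are still current."""
    required = frozenset(required_terms)
    hits = []
    for doc in docs:
        age_days = (today - doc.published).days
        # A document matches only if every requested term was assigned to it
        # and it falls inside the (assumed) currency window.
        if required <= doc.index_terms and age_days <= max_age_years * 365:
            hits.append(doc)
    return hits


# Invented sample data reflecting stages of a law-making process.
docs = [
    Document("Bill A, first reading",
             frozenset({"taxation", "bill", "first reading"}), date(2011, 3, 1)),
    Document("Bill A, committee report",
             frozenset({"taxation", "bill", "committee report"}), date(2011, 5, 1)),
    Document("Old tax bill",
             frozenset({"taxation", "bill", "first reading"}), date(2005, 1, 1)),
]
today = date(2012, 1, 1)

broad = search(docs, {"taxation"}, today)                       # one term
narrow = search(docs, {"taxation", "committee report"}, today)  # two terms
```

With one term the sketch returns both current documents on the bill; adding a second term isolates a single document, and the outdated document is excluded in both cases.<br />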
Another result of the study regards potential terms for the future thesaurus.<br />
Here differences of opinion were expressed as to what makes a synonym. In the fourth<br />
part of the interview, the respondents were presented with different synonyms<br />
that they marked as either related or not related terms. It appears that the different<br />
employment functions did not agree on the relations of specific terms. MPs in several<br />
cases seemed more certain than the other two employment groups as to when<br />
synonyms were related and when they were not. Thus, when applying thesauri or other<br />
controlled indexing languages, one must take into account that subgroups of an<br />
organization may differ in their perception of relations between concepts even though<br />
they work with the same subject areas. On the other hand, all three groups represented<br />
in the study share the opinion that popular expressions for laws and political concepts<br />
must be present in the thesaurus.<br />
Figure 4.2 Comprehensive model of <strong>in</strong>formation seek<strong>in</strong>g. Adapted from Johnson et al. (1995).<br />
4.4.3 Information behavior of employees in an engineering and technical service<br />
government office<br />
Johnson et al.’s (1995) study takes its point of departure in the employees of a<br />
governmental agency concerned with engineering and technical services. The intention<br />
behind the study is to investigate the background factors that affect information seeking<br />
actions. The dependent variable of the study, information seeking actions, involves two<br />
aspects, namely scope, that is, the range of people consulted in order to access<br />
information, and depth, that is, the amount of information sought. In this sense the<br />
study is not a seeking study per se. Rather, the study informs us about the factors that<br />
affect seeking behavior. The model tested in the study is shown in Figure 4.2.<br />
The empirical basis of the study includes 380 responses to a survey<br />
questionnaire. 26 percent did not respond to the questionnaire. The respondents were<br />
characterized as having fairly long seniority in the organization. Also,<br />
communication in the organization is extensive, as are interpersonal and group<br />
interdependence. The tests of the model show that the strongest paths exist between<br />
characteristics and action, and between cultural beliefs and characteristics. As regards<br />
the former path, it means that among the tested independent variables, the<br />
respondents’ assessment of the quality of communication channels guides the amount of<br />
information and the number of people approached in order to meet an information need.<br />
Though this is not explicitly stated in the paper, we assume that the relation between the<br />
two is inverse, meaning that the higher the quality of the channels, the less information<br />
and the fewer people need to be approached. The latter path moves a step backwards in<br />
the model from the former and expresses a strong relation between cultural<br />
beliefs and characteristics. Thus, the relation documents that, for the respondent group,<br />
the cultural conception of a channel determines the subsequent assessment of the<br />
channel. The study also finds that some modifications need to be made to<br />
the proposed model. Some of the variables put forward in the left column of<br />
Figure 4.2 do not take the path through utility. Instead, they have a direct effect on both<br />
characteristics (in the middle column) and actions (right column). This finding suggests<br />
that the variables in the left column can be seen as important to several stages of the<br />
process outlined in the model. In other words, we can see demographics, direct<br />
experience, beliefs, and salience as direct indicators of consulted channels in<br />
information seeking.<br />
4.4.4 Federal, state, and local policy makers’ selection of <strong>in</strong>formation sources<br />
The purpose of Oh’s (1996) study is to investigate which factors have an<br />
influence on selection of information in bureaucracies. The relation between the<br />
identified influencing factors is also under study. Thus, the study is similar to the study by<br />
Johnson et al. (1995). Oh’s study informs us about the seeking behaviour of<br />
government employees. On the basis of existing theory and results, a theoretical path<br />
model is created to explain information selection. One assumption behind the study is<br />
that the policy area affects the selection of information. This is the reason for<br />
testing the path model in two different policy areas within mental health, namely<br />
delivery and financing of mental health services. The two policy areas are selected on<br />
the basis of the assumption that the former primarily comprises generalists while the<br />
latter mainly includes specialists. The method applied in the study is twofold. First, a<br />
series of open-ended interviews was carried out. The interviews were subsequently<br />
coded. The purpose of this first part was to discover basic information about the<br />
policy-making process of the policy areas involved. Second, a series of questionnaires was<br />
administered in order to test the path model.<br />
The results demonstrate that the generalist and the specialist groups have<br />
some features in common while they differ on other points. A characteristic of the<br />
selection of information sources shared by generalists and specialists is that<br />
internal sources are preferred regardless of prior knowledge of the problem at<br />
hand. It also appears that the education of the employees is more likely than their age<br />
to affect the selection of sources. The type of information sought also<br />
influences the selection process of the respondents. The influence concerns not just<br />
which sources are selected, but also the number of sources selected. Thus, some<br />
information types require searching more sources than others. As mentioned above,<br />
the two groups differ on some points. The specialists have a greater probability of<br />
comparing different sources when searching for information than the generalists do.<br />
One reason for this is the specialists’ need to ensure the reliability and validity of the<br />
collected information.<br />
It is the differences between specialists and generalists that lead Oh to sum up<br />
that “the factors influencing selection of information sources strongly differ between the<br />
two policy areas”, suggesting that future studies must take this difference into account.<br />
However, since this finding has only been verified for employees’ selection of sources,<br />
we do not attempt to make the same distinction in our domain study and search test. The<br />
major reason for this is that our field of study comprises more general seeking and<br />
searching behaviour, which results in a slightly different focus compared to Oh’s study.<br />
4.4.5 F<strong>in</strong>nish municipal employees<br />
As a part of her dissertation work, Byström (1999) conducted a study of two<br />
Finnish local (municipal) governments. The study has been presented with different<br />
foci across a number of works (Byström, 1997, 1999, 2002). Therefore, we base the<br />
present section on a combination of these three publications. Using diaries, interviews,<br />
organizational document review and observation (Byström, 1997, p. 132), 54 cases (80 of the<br />
cases from the pilot are included) handled by 19 officials are analyzed. Data on<br />
the cases were collected from the moment they arrived at the registrar’s office<br />
(Byström, 1999, pp. 67-68). In the study Byström focuses on information seeking, and<br />
she is not specifically concerned with the actual information searching. Among other<br />
things, Byström analyzes information needs, the relation between the complexity of<br />
work tasks and the subject expertise of the participants in the study (1999, p. 85). As<br />
expected with a group of fairly experienced participants, the subject expertise is in many<br />
cases rather high.<br />
The results of the study are interpreted within a theoretical framework of types<br />
of work tasks and types of information needed. Five types of work tasks of increasing<br />
complexity were defined for the study, namely automatic information processing<br />
tasks, normal information processing tasks, normal decision tasks, known-genuine<br />
decision tasks, and unknown-genuine decision tasks. The increasing level of complexity<br />
is expressed in terms of a subject’s level of a priori determinability of the information<br />
needed, the information seeking process, and the expected outcome of the seeking<br />
process. Three information types are also specified for the information needed. These<br />
include task information (or single task related), domain information (or multi task<br />
related), and task-solving information (or instructional) (Byström, 1997, 2002).<br />
From the analysis of the collected data it turns out that the tasks with the highest degree<br />
of a priori determinability (automatic and normal information processing tasks) are by<br />
far the most frequent tasks among the participants. Next, with decreasing<br />
frequency, follow normal decision tasks and known-genuine decision tasks. Unknown-genuine<br />
decision tasks are not present in the data material and must thus be expected to be the<br />
rarest type of task for the participants. Thus, the participants most often take care of tasks that<br />
have a low degree of uncertainty as to what information is needed and what constitutes<br />
the process of getting hold of the information (Byström, 1997). Further, it seems that
different types of information are needed for the particular work tasks. Thus, the results<br />
indicate that with an increased level of uncertainty about the task in question, the number<br />
of information types increases. Automatic information processing tasks have the largest share of<br />
cases in which no information is acquired. This share decreases as the complexity of tasks<br />
increases. Task information is most common in normal information processing tasks, while a<br />
combination of task and domain information, or even of task, domain, and task-solving<br />
information, is most common in decision tasks. Combined with the frequency of work<br />
tasks, task information becomes the information type with the highest frequency, while<br />
domain information has a medium frequency and task-solving information has the<br />
lowest frequency. A similar distribution is indicated in Serola (2006).<br />
Likewise, the type of source applied changes with an increase of uncertainty in the work<br />
task at hand. Hence, the share of documentary sources decreases with an increase of<br />
uncertainty, while the share of people as information sources increases (Byström, 2002). In sum,<br />
Byström’s results indicate that the complexity of work tasks is somewhat connected to<br />
the type and amount of information acquired. Also, the use of documentary information<br />
is more widespread in tasks of low uncertainty and complexity, which suggests that it is<br />
these types of tasks in particular that should be supported by information systems.<br />
4.4.6 Users of the European Parliamentary Documentation Centre<br />
In 2004 Marcella et al. (2007) examined users of the European Parliamentary<br />
Documentation Centre (PDC) regarding information needs and information seeking<br />
behaviour. The main purpose of the study is to make recommendations for service<br />
development in the PDC on the basis of the study of its users. Semi-structured<br />
interviews were conducted with 72 persons of different types:<br />
administrative staff, MEP assistants, legal service administrators, and<br />
MEPs. Since only 5 of the 72 persons are MEPs, we have decided to include the study<br />
in the present review as it reflects seeking behaviour from employees’ point of view.<br />
In addition, 11 PDC staff members were interviewed. In order to ensure experienced test<br />
persons, the data were collected prior to the 2004 election for the European Parliament.<br />
The study explores elements of information behaviour, information seeking<br />
behaviour and information searching. 90% of the interviewees use information at least<br />
daily. The information originates from both internal and external sources. It<br />
is applied for a wide range of activities and takes the form of both raw and analysed<br />
data. The interviewees express difficulties in locating relevant information. The<br />
reasons for the difficulties include transparency, lack of digitization (of older materials)<br />
and representation of different points of view, and objectivity of data. In accordance<br />
with prior studies, the limited time available is an essential factor. The time pressure<br />
intensifies the difficulties of locating information. This is indicated by the fact that the<br />
participants use other people to perform their searches and that an important criterion<br />
for relevance of information is the size of the information.<br />
In line with Byström’s (1997) results presented in the previous section, the<br />
participants have information needs of varying complexity. Marcella et al. do not<br />
estimate the relative extent of different types of information needs, but the paper<br />
indicates the presence of varying complexity in several places. Searching by entering<br />
complete citations points to information needs of low complexity. On the other hand,<br />
the information seeking connected to the legislative process, where the information need<br />
starts out wide-ranging and later becomes more focused, points to more<br />
complex information needs.<br />
4.4.7 Information literacy of Scottish <strong>government</strong> civil service staff<br />
The overall purpose of Crawford & Irving’s (2009) study is to investigate the<br />
nature of civil service employees’ information literacy in order to be able to target<br />
improvement initiatives more specifically at actual practice. The research<br />
method applied in the study is structured interviews that are allowed to vary slightly<br />
depending on the specific type of staff being interviewed. Thus, the 20 interviews that<br />
were conducted covered different types of government employees: care home staff, civil<br />
service staff, and social work staff. The paper reports neither the wording of the<br />
interview questions nor how the respondents are distributed across the<br />
different employee types.<br />
The most recurrent f<strong>in</strong>d<strong>in</strong>g of the study is the importance of humans as sources<br />
of <strong>in</strong>formation. People are used <strong>in</strong> <strong>in</strong>formation seek<strong>in</strong>g at different levels. Thus, other<br />
people are used as sources of <strong>in</strong>formation, but also <strong>in</strong> order to support the selection of<br />
websites for <strong>in</strong>formation search<strong>in</strong>g. The employees evaluate the sources employed for<br />
<strong>in</strong>formation seek<strong>in</strong>g, whether they are human or ICT-based sources. In this sense they<br />
appear very information literate. At the same time the information environment seems<br />
introverted. The authors do not explain what this introversion covers. However, the<br />
scope of the paper could suggest that it covers a lack of openness towards changing<br />
information practice in accordance with information literacy courses. The paper also<br />
investigates aspects of searching behaviour. Thus, in connection with the electronic<br />
resource data management system of the administration it is mentioned that the
specificity of subject terms assigned is not sufficient. This f<strong>in</strong>d<strong>in</strong>g suggests that the<br />
employees request high precision when search<strong>in</strong>g for <strong>in</strong>formation. The employees also<br />
demonstrate a high understand<strong>in</strong>g of the value of the <strong>in</strong>formation sought and applied.<br />
On the other hand, the authors point out that the quality of internet searches varies<br />
across the employees, leaving room for improvement, e.g., through information literacy<br />
courses.<br />
4.4.8 Civil servants’ <strong>in</strong>ternet skills<br />
In a recent study by van Deursen & van Dijk (2010) the <strong>in</strong>ternet skills of civil<br />
servants were the subject of <strong>in</strong>vestigation. The purpose of the study was to f<strong>in</strong>d out the<br />
strength of skills at different levels, namely operational, formal, <strong>in</strong>formation, and<br />
strategic levels. The levels are specified to comprise:<br />
1. Operational: operate browsers and online search engines, and complete<br />
online forms,<br />
2. Formal: navigate the internet and maintain a sense of location while<br />
navigating,<br />
3. Information: locate required <strong>in</strong>formation, and<br />
4. Strategic: take advantage of the <strong>in</strong>ternet for specific goals.<br />
98 civil servants from different Dutch executive policy agencies and<br />
municipalities served as the empirical basis of the study. The four levels were<br />
operationalized into two search assignments each, a total of eight assignments. For every<br />
assignment a maximum time allowed to solve the task was specified. The paper does<br />
not explain how the allowed search times were determined. The assignments were<br />
used to test the participants’ ability to fulfill the assignment within the period. The<br />
degree of accomplishment was taken to express the skills of the participants. This<br />
measure was subsequently controlled for different background variables.<br />
The general f<strong>in</strong>d<strong>in</strong>gs of the study are that the participants’ operational and<br />
formal skills are stronger than <strong>in</strong>formation and strategic skills. Also, it seems that age<br />
affects the skills <strong>in</strong> the sense that younger participants perform better <strong>in</strong> solv<strong>in</strong>g the<br />
assignments than do older participants. Another difference <strong>in</strong> performance was<br />
identified, namely as to the type of employment. Thus, the executive employees had a<br />
lower degree of performance compared to policy advisors and adm<strong>in</strong>istrators.<br />
Unfortunately, the paper does not report in a more qualitative manner how the<br />
different assignments have been solved by the participants. However, we can use the<br />
results of the study to make clear that different characteristics about the respondents<br />
may affect respondents’ skills and that the skills to some degree depend on the type of<br />
employment.<br />
4.5 Related studies of <strong>in</strong>formation seek<strong>in</strong>g and search<strong>in</strong>g<br />
As appears from the sections above, there are not many clear-cut studies of the<br />
seeking behaviour of government employees. For this reason we present<br />
some studies below of professional employees sharing some common features with the<br />
domain in question. The studies present the seeking behaviour of professional<br />
information users for whom the use of information constitutes a core activity in<br />
solving daily work tasks as a part of their job. Further, we have included professional<br />
legal seeking behaviour in the review considering that legal sources are expected to play<br />
an important role for government employees, since their job is to administer the law.<br />
4.5.1 Legal seek<strong>in</strong>g behavior<br />
Different authors have <strong>in</strong>vestigated seek<strong>in</strong>g behavior of both academic lawyers<br />
and attorneys. Parts of the f<strong>in</strong>d<strong>in</strong>gs are <strong>in</strong>terest<strong>in</strong>g <strong>in</strong> this review because both the legal<br />
profession and e-<strong>government</strong> employees take their po<strong>in</strong>t of departure <strong>in</strong> the legal<br />
framework constituted by the law. As a consequence it is to be expected that they to a<br />
certa<strong>in</strong> degree share <strong>in</strong>formation sources and seek<strong>in</strong>g behavior.<br />
Kuhlthau & Tama (2001) have conducted a study that <strong>in</strong>vestigates the seek<strong>in</strong>g<br />
behavior of practicing lawyers. Eight lawyers from different small and medium-sized<br />
enterprises were interviewed following a semi-structured interview guide. The study<br />
comprises both routine and complex tasks, though most attention is paid to the complex<br />
tasks in the analysis. The interviewees prefer printed over electronic sources. It is<br />
expressed that the searching possibilities in electronic sources do not support<br />
serendipity (cf. Foster & Ford, 2003), which is often needed in the lawyers’ work with<br />
complex cases. When carry<strong>in</strong>g out rout<strong>in</strong>e tasks the <strong>in</strong>terviewees are more will<strong>in</strong>g to<br />
apply electronic sources. A number of electronic sources are applied for stay<strong>in</strong>g up to<br />
date, e.g., e-mail and listserv. The <strong>in</strong>terviewees stress the need to be able to filter the<br />
<strong>in</strong>formation <strong>in</strong> order to avoid <strong>in</strong>formation overload. Here, time pressure makes the<br />
difference. Thus, the interviewees do not have time to go through all the information<br />
and are concerned about missing important information. In addition to printed and electronic<br />
sources the lawyers use persons as <strong>in</strong>formation sources <strong>in</strong> accordance with other user
groups presented above. A similar f<strong>in</strong>d<strong>in</strong>g was made by Choo et al. (2006), who<br />
reported that the employees of a Canadian law firm regularly exchanged information<br />
with the people they worked with. A final interesting result for the present work is the<br />
lawyers’ expressed need to have uniform and well organized access to their documents.<br />
In a later, related study by Makri, Blandford & Cox (2008a) legal <strong>in</strong>formation<br />
seek<strong>in</strong>g was <strong>in</strong>vestigated for academic lawyers. The purpose of the study is to be able<br />
to make recommendations for the development of established law databases based on<br />
the users’ seeking and searching behavior. 27 participants, ranging from first-year<br />
undergraduate to professor, performed searches to find information for their work while<br />
th<strong>in</strong>k<strong>in</strong>g aloud. The frame for the analysis is Ellis’ (1989) model for <strong>in</strong>formation<br />
seek<strong>in</strong>g. Dur<strong>in</strong>g the course of the analysis different sub processes to Ellis’ model are<br />
identified. The academic background of the participants means that search<strong>in</strong>g for<br />
scientific articles is <strong>in</strong> focus throughout the paper at the expense of legal sources. As<br />
regards e-<strong>government</strong> employees we expect the use of scientific articles to be close to<br />
noth<strong>in</strong>g. Still, some of the results should be emphasized. Thus, the authors f<strong>in</strong>d that<br />
stay<strong>in</strong>g updated is particularly important <strong>in</strong> legal matters <strong>in</strong> order to avoid bas<strong>in</strong>g one’s<br />
work on materials that have been overruled, have changed the law, or are no longer the<br />
case <strong>in</strong> general. Updat<strong>in</strong>g behavior takes place <strong>in</strong> connection with Ellis’ level<br />
“monitor<strong>in</strong>g” and at a new level identified by the authors, namely “access<strong>in</strong>g”.<br />
Monitor<strong>in</strong>g is def<strong>in</strong>ed as “ma<strong>in</strong>ta<strong>in</strong><strong>in</strong>g awareness of developments of an area through<br />
regularly follow<strong>in</strong>g particular sources” (Ellis, 1989, p. 177). In the study of Makri,<br />
Blandford & Cox monitor<strong>in</strong>g takes place at source, document, and content level. Active<br />
monitor<strong>in</strong>g is carried out by the participants by conduct<strong>in</strong>g searches <strong>in</strong> different law<br />
databases, by brows<strong>in</strong>g particular sources, and by follow<strong>in</strong>g previously bookmarked<br />
web pages. Passive monitor<strong>in</strong>g takes the form of subscrib<strong>in</strong>g to e-mail alert lists. The<br />
“updat<strong>in</strong>g” behavior identified by the authors is a behavior subord<strong>in</strong>ate to “access<strong>in</strong>g”<br />
def<strong>in</strong>ed as “ga<strong>in</strong><strong>in</strong>g access to resources, sources or documents/content” (Makri,<br />
Blandford & Cox, 2008a, p. 625). Updat<strong>in</strong>g differs from monitor<strong>in</strong>g <strong>in</strong> that it<br />
designates the behavior of <strong>in</strong>vestigat<strong>in</strong>g the current understand<strong>in</strong>g of a document or the<br />
content of a document. Updat<strong>in</strong>g is primarily direct <strong>in</strong> the form of searches <strong>in</strong> a law<br />
database or check<strong>in</strong>g footnotes to legal texts. The importance of stay<strong>in</strong>g updated is also<br />
verified <strong>in</strong> other studies of legal <strong>in</strong>formation behavior (e.g., du Plessis & du Toit, 2006).<br />
4.5.2 Information behaviour of software eng<strong>in</strong>eers<br />
The overall purpose of Freund, Toms & Waterhouse’s (2005) study of software<br />
eng<strong>in</strong>eers was to identify which contextual factors have an effect on <strong>in</strong>formation<br />
seeking in a work context. The study was based on a combination of four methods:<br />
focus group, semi-structured interviews, observation, and finally analysis of documents<br />
and digital information (phase I). In the second study (phase II) reported in the paper,<br />
14 software services consultants were <strong>in</strong>terviewed on the basis of a semi-structured<br />
<strong>in</strong>terview. The purpose of the study was to <strong>in</strong>vestigate <strong>in</strong>formation behavior on the<br />
basis of a work task framework.<br />
An essential result of phase I is the dependency of information needs on the<br />
type of work task at hand. Work tasks can range from short-term to long-term<br />
commitments, and the development of information needs is highly dependent on this<br />
work task context. Phase II of the study reveals that information is extremely important<br />
to the interviewees’ work. Thus, on average they spend approximately 20-30% of their<br />
working hours searching for and consulting information sources. The results of phase<br />
II’s study have been summed up in Figure 4.3. The figure illustrates the influence of<br />
work content on access constraints and information characteristics, which in turn affect<br />
the strategies applied for searching and selecting information. Specifically, the figure<br />
shows that different characteristics of the work context to a large extent affect the<br />
seeking process. Influencing elements of the work context comprise the employee’s<br />
characteristics, such as her existing knowledge about the task at hand. Also the type of<br />
task (whether consulting or engineering) and the specific problem at hand (e.g.,<br />
learning, collecting advice, or finding facts) seem to affect information seeking for the<br />
interviewees. The work contextual factors affect the selection of sources in terms of the<br />
time available and the availability of sources, but also the characteristics of information<br />
and knowledge of the subject. This in turn has an effect on the type of channel, source,<br />
and genre selected. The seeking process mirrored in the model reflects a linear<br />
conception of information seeking. The strength of the model lies rather in its<br />
enumeration of factors that influence information seeking.<br />
4.5.3 Professional seek<strong>in</strong>g behaviour<br />
The purpose of Leckie, Pettigrew & Sylva<strong>in</strong>’s (1996) paper is to model the<br />
<strong>in</strong>formation seek<strong>in</strong>g of not just one specific profession but to identify what characterizes<br />
the <strong>in</strong>formation seek<strong>in</strong>g that takes place across professionals. A review of exist<strong>in</strong>g
Figure 4.3 Model of the factors affecting information seeking in the domain of software<br />
engineering. Adapted from Freund, Toms & Waterhouse (2005).<br />
studies of eng<strong>in</strong>eers’, lawyers’, and health care professionals’ seek<strong>in</strong>g behaviour, and of<br />
seek<strong>in</strong>g models forms the basis of the general model developed by the authors. The<br />
model is depicted <strong>in</strong> Figure 4.4. The model as such <strong>in</strong>forms us about the major role that<br />
work roles play for the subsequent <strong>in</strong>formation seek<strong>in</strong>g of professionals. What can be<br />
discovered from the model is that work roles should be taken <strong>in</strong>to account, when<br />
<strong>in</strong>vestigat<strong>in</strong>g the seek<strong>in</strong>g behaviour of professionals. Compared to the model of Freund,<br />
Toms & Waterhouse (2005), the present model to a greater extent reflects the<br />
<strong>in</strong>teractivity of <strong>in</strong>formation seek<strong>in</strong>g <strong>in</strong> that it <strong>in</strong>cludes feedback loops. On the other<br />
hand, the model <strong>in</strong> itself is not very thorough as to the specific steps <strong>in</strong> <strong>in</strong>formation<br />
seek<strong>in</strong>g. Thus, the dist<strong>in</strong>ct steps of <strong>in</strong>formation seek<strong>in</strong>g are not mirrored <strong>in</strong> the model.<br />
However, the authors compensate for this <strong>in</strong> their presentation of the model.<br />
4.6 Summary<br />
The present review has made different perspectives on <strong>government</strong> employee<br />
seek<strong>in</strong>g behaviour clear. A diversity of <strong>in</strong>formation needs is present. Thus, <strong>in</strong>formation<br />
needs range from simple <strong>in</strong>formation needs to far more complex <strong>in</strong>formation needs.<br />
However, apparently simple <strong>in</strong>formation needs are the most common <strong>in</strong> the doma<strong>in</strong>.<br />
The diversity of <strong>in</strong>formation needs <strong>in</strong> the doma<strong>in</strong> requires the presence of different<br />
types of <strong><strong>in</strong>dex<strong>in</strong>g</strong>. The <strong>in</strong>formation conta<strong>in</strong>ed <strong>in</strong> <strong>in</strong>formation systems needs to be<br />
represented by sufficient descriptive metadata <strong>in</strong> order to support verificative searches<br />
(Ingwersen & Wormell, 1989). In addition, <strong>in</strong> order to meet more complex <strong>in</strong>formation<br />
needs an adequate amount of subject metadata also needs to be present <strong>in</strong> e-<strong>government</strong><br />
<strong>in</strong>formation systems. Further, the assignment of both descriptive and topic metadata is<br />
required <strong>in</strong> order for the employees to be able to discrim<strong>in</strong>ate between large sets of<br />
documents conta<strong>in</strong>ed <strong>in</strong> <strong>in</strong>formation systems. The diversity of work tasks and<br />
<strong>in</strong>formation needs also affects the amount and types of <strong>in</strong>formation applied by the<br />
employees. The review has shown that the amount of <strong>in</strong>formation and the <strong>in</strong>formation<br />
types applied depend on the work task at hand. With simple work tasks, task<br />
information is the most dominant type of information applied. When the complexity of<br />
tasks increases, so do the amount of information collected and the types of<br />
information applied. Thus, domain information and task solving information are<br />
primarily used for solving complex work tasks.
Figure 4.4 The process of <strong>in</strong>formation seek<strong>in</strong>g of professionals. Adapted from Leckie, Pettigrew &<br />
Sylva<strong>in</strong> (1996, p. 180)<br />
Time is an issue to the employees at different levels. In general the time<br />
available for handl<strong>in</strong>g tasks is limited. The same f<strong>in</strong>d<strong>in</strong>g has been made for other user<br />
groups (cf. Savola<strong>in</strong>en, 2006). The time pressure of the employees calls for the<br />
possibility of carry<strong>in</strong>g out effective searches with high precision <strong>in</strong> <strong>in</strong>formation systems.<br />
One way of meeting this requirement is again by assigning topic metadata at a sufficient<br />
level of specificity. Time is also significant to the respondents as to the importance of<br />
staying updated on the subject of their work. The topics of e-government are dynamic<br />
and the employees need to keep updated on the latest developments. The updating<br />
is carried out by active information seeking and in a more passive manner by following<br />
newsletters and other forms of updating. Finally, the development of subjects means<br />
that documents become obsolete. Being able to sort documents by their currency and<br />
to indicate their news value is thus important in information systems.<br />
A wide range of information sources is applied by employees in order to meet<br />
information needs. Information is collected from printed, digital, and human<br />
<strong>in</strong>formation sources. The assessment of sources as to the <strong>in</strong>formation need <strong>in</strong> question<br />
is highly qualified. The preferences for <strong>in</strong>formation sources depend on a number of<br />
characteristics of the employees. Among other th<strong>in</strong>gs, the policy areas and the type of<br />
employment <strong>in</strong>fluence the selection of sources. Also it seems that the number and type<br />
of sources <strong>in</strong>crease along with the complexity of work tasks and <strong>in</strong>formation needs.<br />
Persons as sources <strong>in</strong> general are very frequent and the importance of this particular<br />
source also <strong>in</strong>creases with the complexity of the work task at hand.<br />
In sum, we do get some <strong>in</strong>sight <strong>in</strong>to the seek<strong>in</strong>g behaviour of e-<strong>government</strong><br />
employees from the presented studies. However, what has also become clear from the<br />
review is that the body of knowledge on the seek<strong>in</strong>g behaviour of <strong>government</strong><br />
employees is limited. Firstly, the number of studies specifically <strong>in</strong>vestigat<strong>in</strong>g the<br />
seeking behaviour of government employees is small. Secondly, some of the<br />
studies mentioned above are of an earlier date. This becomes problematic s<strong>in</strong>ce we<br />
have previously stated that the work tasks of employees are expected to change with the<br />
digitalization of <strong>government</strong>s. With a change of work tasks we might also see a change<br />
<strong>in</strong> the character of <strong>in</strong>formation needs and as a consequence also a change <strong>in</strong> seek<strong>in</strong>g<br />
behaviour. The behaviour mirrored <strong>in</strong> the older studies may therefore not reflect the<br />
current situation for <strong>government</strong> employees. Thirdly, several of the studies above do<br />
not provide direct <strong>in</strong>sight <strong>in</strong>to the seek<strong>in</strong>g behaviour of the user group <strong>in</strong> question. This<br />
does not have to do with the quality of the studies. Rather it is an expression of the fact<br />
that the studies were carried out with another purpose than <strong>in</strong>vestigat<strong>in</strong>g specific<br />
seek<strong>in</strong>g behaviour. A core assumption of the empirical foundation of the present work<br />
is that the evaluation of <strong>in</strong>formation systems needs to take its po<strong>in</strong>t of departure <strong>in</strong><br />
potential users. On this basis, we estimate that there is a need for a more thorough<br />
<strong>in</strong>vestigation of the current state of civil servants’ seek<strong>in</strong>g behaviour with particular<br />
emphasis on tax <strong>government</strong>s. This <strong>in</strong>vestigation serves the purpose of qualify<strong>in</strong>g the<br />
design of the search test. This is the primary reason for carry<strong>in</strong>g out the empirical<br />
doma<strong>in</strong> study of the thesis.
5 Index<strong>in</strong>g of electronic documents<br />
The concept of <strong><strong>in</strong>dex<strong>in</strong>g</strong> has different mean<strong>in</strong>gs. In LIS, the widest sense of the concept<br />
designates <strong>in</strong>dex terms as a set of labels that <strong>in</strong>formation searchers can apply <strong>in</strong><br />
<strong>in</strong>formation search<strong>in</strong>g <strong>in</strong> order to denote authors, subjects, journal names etc. (cf.,<br />
Rowley, 1994). Here, we are <strong>in</strong>vestigat<strong>in</strong>g the subject of documents. Hence, we employ<br />
a narrower definition of the term. The understanding of the concept of indexing that<br />
guides the present work is that it designates the act of creating representations of<br />
the subject of information in order to enable inclusion and retrieval of documents in a<br />
database (Lancaster, 2003; Rowley & Hartley, 2008). As such, indexing supports the<br />
purpose of subject retrieval systems, namely “...to retrieve documents, whose aboutness<br />
suggest that a user may find in them meaning(s) expedient to a certain need of the<br />
moment” (Beghtol, 1986, p. 85). The subject representation can take the form of, for<br />
instance, descriptors, subject headings, or classification codes (Mai, 2005).<br />
Indexing has three main purposes: to facilitate easy location of documents by<br />
topic, to enable the identification of relations between documents, and to predict the<br />
relevance of a document to information needs (Korfhage, 1997). In other words,<br />
indexing is a highly important factor in the process of information retrieval. Or as<br />
Soergel puts it: Indexing “...sets an upper limit for retrieval performance...” (1985, p.<br />
327). When seen in relation to the IR process, indexing represents the input to a system<br />
and retrieval of documents the output (Milstead, 1992, p. 408).<br />
Accord<strong>in</strong>gly, the dist<strong>in</strong>ction between <strong>in</strong>put and output stresses the close relation<br />
between <strong><strong>in</strong>dex<strong>in</strong>g</strong> and retrieval as part of the IR process. Also, the applied <strong><strong>in</strong>dex<strong>in</strong>g</strong><br />
practice <strong>in</strong>fluences the results of <strong>in</strong>formation retrieval and retrieval should affect how<br />
<strong><strong>in</strong>dex<strong>in</strong>g</strong> is carried out.<br />
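The input/output relation described above can be illustrated with a minimal sketch. It is not taken from any of the cited works: the function names `build_index` and `retrieve`, the document identifiers, and the index terms are all hypothetical and chosen only for illustration. The sketch shows how the terms assigned at indexing time bound what a later search can return, which is one way of reading Soergel's "upper limit".

```python
from collections import defaultdict


def build_index(assignments):
    """Indexing (input): map each assigned index term to the documents carrying it."""
    index = defaultdict(set)
    for doc_id, terms in assignments.items():
        for term in terms:
            index[term.lower()].add(doc_id)
    return index


def retrieve(index, query_term):
    """Retrieval (output): return the documents indexed under the query term."""
    return sorted(index.get(query_term.lower(), set()))


# Hypothetical documents and index terms, invented for illustration only.
assignments = {
    "doc1": ["income tax", "deductions"],
    "doc2": ["income tax"],
    "doc3": ["customs"],
}

index = build_index(assignments)
print(retrieve(index, "income tax"))  # ['doc1', 'doc2']
print(retrieve(index, "VAT"))         # [] (no document was indexed under this term)
```

A document that was never assigned a given term cannot be found through that term, however relevant it may be, so the quality of the input constrains the quality of the output.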
The relation between subject <strong><strong>in</strong>dex<strong>in</strong>g</strong>, subject catalogu<strong>in</strong>g, and classification<br />
is close. All three concepts are used to designate aspects of labell<strong>in</strong>g and describ<strong>in</strong>g<br />
documents accord<strong>in</strong>g to their content, whether it is <strong>in</strong> the form of classification codes,<br />
subject terms or other <strong>in</strong>dicators (Anderson & Pérez-Carballo, 2005; Lancaster, 2003).<br />
This is reflected <strong>in</strong> the literature, where the concepts are used <strong>in</strong> an ambiguous way.<br />
Turn<strong>in</strong>g to automated acts of <strong><strong>in</strong>dex<strong>in</strong>g</strong> and classification, the situation is the same. Here<br />
the process of decid<strong>in</strong>g the content of documents and group<strong>in</strong>g them accord<strong>in</strong>gly may<br />
be referred to as automatic <strong><strong>in</strong>dex<strong>in</strong>g</strong> (Lancaster, 2003; Moens, 2000; Salton & McGill,
1983) or automatic classification (e.g., Golub, 2007). In the present work, we def<strong>in</strong>e<br />
<strong><strong>in</strong>dex<strong>in</strong>g</strong> <strong>in</strong> a broad sense, tak<strong>in</strong>g the similarities of subject <strong><strong>in</strong>dex<strong>in</strong>g</strong> and classification<br />
<strong>in</strong>to account. This def<strong>in</strong>ition allows for a broad view on the literature on automated<br />
approaches to <strong><strong>in</strong>dex<strong>in</strong>g</strong> and classification as well. We apply the term automatic<br />
<strong><strong>in</strong>dex<strong>in</strong>g</strong>, s<strong>in</strong>ce our focus is on the labell<strong>in</strong>g and group<strong>in</strong>g of documents.<br />
In order to expose the context of indexing, we begin with an introduction to the<br />
purpose of indexing. Second, the indexing process is presented, followed by<br />
approaches to and core concepts in indexing. Afterwards approaches to automatic<br />
indexing are discussed. We finish the chapter by looking at hybrid indexing types that<br />
combine elements of both human and automatically based indexing approaches.<br />
5.1 The process of <strong><strong>in</strong>dex<strong>in</strong>g</strong><br />
The process of <strong><strong>in</strong>dex<strong>in</strong>g</strong> refers to the act of assign<strong>in</strong>g subject terms to<br />
documents or other types of <strong>in</strong>formation <strong>in</strong> order to enable retrieval. Accord<strong>in</strong>g to<br />
Philipson (2008), the process of indexing starts when the indexer begins to become familiar<br />
with a document and ends when the subject description has been completed. The<br />
process of indexing has been presented as containing different numbers of steps in the<br />
literature. In its simplest form, the indexing process is constituted by two steps.<br />
In the first step the document is analyzed in order to decide on the subject. Here,<br />
the aboutness of the document is identified. The first step may be<br />
referred to as the conceptual analysis (Lancaster, 2003), but the literature shows<br />
alternative designations as well (Mai, 2000, p. 281). In the second step of the <strong><strong>in</strong>dex<strong>in</strong>g</strong><br />
process, the document subject is translated <strong>in</strong>to a set of <strong>in</strong>dex terms (Mai, 2005). This<br />
part of the <strong><strong>in</strong>dex<strong>in</strong>g</strong> process is denoted translation (Lancaster, 2003).<br />
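The two steps can be sketched in code as follows. This is only an illustrative sketch under strong assumptions: word frequency stands in for the conceptual analysis, which in human indexing relies on the indexer's judgment, and the stopword list, the controlled vocabulary, and the sample text are hypothetical. It is not the procedure of Lancaster (2003) or Mai (2005).

```python
import re
from collections import Counter

# Hypothetical stopword list, kept short for illustration.
STOPWORDS = {"the", "of", "and", "a", "to", "in", "for", "on", "is", "are"}

# Hypothetical controlled vocabulary: free-text concept -> preferred index term.
VOCABULARY = {
    "tax": "Taxation",
    "taxes": "Taxation",
    "deduction": "Tax deductions",
    "deductions": "Tax deductions",
    "customs": "Customs duties",
}


def conceptual_analysis(text, max_concepts=3):
    """Step 1: decide what the document is about.

    Assumption: the most frequent non-stopwords approximate the aboutness.
    """
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(max_concepts)]


def translation(concepts):
    """Step 2: translate the identified concepts into index terms."""
    terms = []
    for concept in concepts:
        term = VOCABULARY.get(concept)
        if term and term not in terms:
            terms.append(term)
    return terms


text = "Deductions reduce the taxes owed. Taxes and deductions are reported yearly."
concepts = conceptual_analysis(text)
index_terms = translation(concepts)
print(sorted(index_terms))  # ['Tax deductions', 'Taxation']
```

Keeping the two functions separate mirrors the distinction in the text: the first step is independent of any vocabulary, while the second depends entirely on the vocabulary chosen.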
The two-step conception of the indexing process has been challenged by other<br />
scholars (Mai, 2000). The <strong><strong>in</strong>dex<strong>in</strong>g</strong> process has been presented with up to five steps.<br />
The <strong>in</strong>creased number of steps allows for a more differentiated presentation of the<br />
process. However, the two steps <strong>in</strong> the simplified model can always be identified as<br />
underly<strong>in</strong>g the more detailed presentations. Rowley (1988, p. 50) presents the <strong><strong>in</strong>dex<strong>in</strong>g</strong><br />
process as conta<strong>in</strong><strong>in</strong>g 3 steps: 1) familiarization, 2) analysis, and 3) conversion of<br />
concepts <strong>in</strong>to <strong>in</strong>dex terms. In the first step the <strong>in</strong>dexer becomes acqua<strong>in</strong>ted with the<br />
content of the document. Among other th<strong>in</strong>gs the <strong>in</strong>dexer should be aware of the<br />
structure of the subject. The familiarization forms the basis of the second phase: the<br />
analysis of the document. The second phase can to a certa<strong>in</strong> degree be guided by<br />
guidel<strong>in</strong>es such as <strong>in</strong>structions, but experience and <strong>in</strong>tuition are also important here. In<br />
the third phase concepts from the document are matched with <strong>in</strong>dex terms from an <strong>in</strong>dex<br />
vocabulary. Compared to Lancaster’s (2003) two-step model, Rowley expands the first<br />
step <strong>in</strong>to her first two phases, while Rowley’s third phase corresponds with Lancaster’s<br />
second step. Chowdhury (2004, p. 74) operates with a 5-step model of subject <strong><strong>in</strong>dex<strong>in</strong>g</strong><br />
conta<strong>in</strong><strong>in</strong>g the follow<strong>in</strong>g steps: 1) analysis of subject; 2) identification of keywords; 3)<br />
standardization of keywords; 4) choice of indexing system, whether pre- or post-coordinate,<br />
and preparation of entries; and 5) filing of entries. Again we see Lancaster’s<br />
two steps as underly<strong>in</strong>g Chowdhury’s five. Thus, Chowdhury’s steps 1-3 correspond to<br />
Lancaster’s first step. At Chowdhury’s first step the <strong>in</strong>dexer analyses the subject of the<br />
document while the second step <strong>in</strong>volves the decision on which part of perhaps several<br />
subjects should be represented <strong>in</strong> the <strong><strong>in</strong>dex<strong>in</strong>g</strong>. Whether the third step is ma<strong>in</strong>ly<br />
oriented towards conceptual analysis or translation is difficult to state due to<br />
Chowdhury’s limited description. However, we consider it as a part of the conceptual<br />
analysis since it is positioned prior to the introduction of the controlled vocabulary.<br />
In addition, it contains a standardization of the keywords selected on the basis of the<br />
conceptual analysis. Chowdhury’s fourth and fifth steps match Lancaster’s translation<br />
step. Here the entries in the controlled vocabulary are generated and filed<br />
<strong>in</strong>to the system. In sum, vary<strong>in</strong>g levels of detail may be identified <strong>in</strong> presentations of<br />
the <strong><strong>in</strong>dex<strong>in</strong>g</strong> process. Different advantages are associated with a more detailed<br />
presentation of the <strong><strong>in</strong>dex<strong>in</strong>g</strong> process. Mai (2000) mentions the usefulness when<br />
carry<strong>in</strong>g out analyses of the process. An additional advantage is that it allows for more<br />
specificity when <strong><strong>in</strong>dex<strong>in</strong>g</strong> guidel<strong>in</strong>es are developed.<br />
Figure 5.1: Illustration of the subject <strong><strong>in</strong>dex<strong>in</strong>g</strong> process (Mai, 2000, p. 279).
Although a number of presentations of the indexing process exist, scholars<br />
within the field of indexing agree that not much is known about the subject indexing<br />
process. In particular, the part concerning the indexer’s determination of the subject of a<br />
document is not well explored in the literature. Despite available indexing<br />
policies, standards, and guidelines, it is difficult to determine what takes place in the initial<br />
step: identifying the subject of a document (Mai, 2000; 2005). This is a<br />
problem, because the initial step of the indexing process may be considered the most<br />
important one: it forms the basis for the steps that follow. However, the entire process<br />
of <strong><strong>in</strong>dex<strong>in</strong>g</strong> is associated with a reduction or perhaps even loss of <strong>in</strong>formation compared<br />
to the full text of the document. Figure 5.1 illustrates this. A reduction of <strong>in</strong>formation<br />
is needed, because to end users it reduces the amount of <strong>in</strong>formation to keep track of.<br />
On the other hand, if documents are represented by wrong or mislead<strong>in</strong>g <strong>in</strong>dex terms, it<br />
could cause severe problems. Therefore, ensur<strong>in</strong>g the quality of <strong><strong>in</strong>dex<strong>in</strong>g</strong> is essential to<br />
successful retrieval.<br />
5.2 Quality of <strong><strong>in</strong>dex<strong>in</strong>g</strong><br />
Index<strong>in</strong>g quality is closely connected to the retrieval of documents. Thus, if<br />
the quality of the <strong><strong>in</strong>dex<strong>in</strong>g</strong> is low, it will reflect on the quality of search results (Mai,<br />
2000). Index<strong>in</strong>g quality may be expressed <strong>in</strong> different terms. Two overall perspectives<br />
exist on how to measure <strong><strong>in</strong>dex<strong>in</strong>g</strong> quality. One perspective considers the quality <strong>in</strong><br />
terms of retrieval effectiveness. That is, quality is measured in terms of the ability<br />
of the indexing to discriminate relevant documents from irrelevant documents<br />
for a given search request (e.g., Schultz, 1970; Borko, 1977; Lancaster, 2003). The other<br />
po<strong>in</strong>t of view considers quality <strong>in</strong> terms of the degree of consistency of the <strong><strong>in</strong>dex<strong>in</strong>g</strong>,<br />
that is, the accuracy <strong>in</strong> the <strong><strong>in</strong>dex<strong>in</strong>g</strong> of a document (e.g., Roll<strong>in</strong>g, 1981). However,<br />
other concepts also add to the identification of <strong><strong>in</strong>dex<strong>in</strong>g</strong> quality, namely specificity and<br />
exhaustivity. The concepts are important to <strong><strong>in</strong>dex<strong>in</strong>g</strong> because they help characterize the<br />
<strong><strong>in</strong>dex<strong>in</strong>g</strong> and are known to affect <strong><strong>in</strong>dex<strong>in</strong>g</strong> quality. Below follows an <strong>in</strong>troduction to the<br />
concepts.<br />
5.2.1 Specificity<br />
Specificity expresses the generic level of assigned <strong>in</strong>dex terms (Soergel, 1994).<br />
The concept of specificity is <strong>in</strong>herently connected to the vocabulary applied for<br />
<strong><strong>in</strong>dex<strong>in</strong>g</strong> <strong>in</strong> the sense that the specificity of the applied vocabulary decides the possible<br />
level of specificity in indexing. Thus, the generic levels of vocabularies will differ according to<br />
the scope of the vocabulary. For instance, the same content of a document will most<br />
likely have a different depth of assigned <strong>in</strong>dex terms <strong>in</strong> a general vocabulary compared<br />
to a special vocabulary (cf. Mai, 2004b).<br />
It is a common approach <strong>in</strong> <strong><strong>in</strong>dex<strong>in</strong>g</strong> practices that <strong>in</strong>dex terms are chosen at<br />
the most specific level possible with<strong>in</strong> the frame of the <strong><strong>in</strong>dex<strong>in</strong>g</strong> language (e.g., Bates,<br />
1979; Lancaster, 2003). Hereby, the “force of discrim<strong>in</strong>ation” (Blair, 2002, p. 280) is<br />
supported. By force of discrim<strong>in</strong>ation is meant that by assign<strong>in</strong>g the most specific <strong>in</strong>dex<br />
terms possible, the database allows for discrim<strong>in</strong>ation between documents <strong>in</strong> the<br />
database, <strong>in</strong> particular between general and specific documents. Plac<strong>in</strong>g documents at<br />
the most specific level <strong>in</strong> the <strong><strong>in</strong>dex<strong>in</strong>g</strong> language ensures that documents at the same<br />
level of description will be retrieved <strong>in</strong> the same search session. For <strong>in</strong>stance,<br />
documents deal<strong>in</strong>g with <strong>in</strong>come taxes <strong>in</strong> general are at a higher generic level than<br />
documents deal<strong>in</strong>g with allowance for travel expenses. The pr<strong>in</strong>ciple of assign<strong>in</strong>g the<br />
most specific <strong>in</strong>dex term possible is beneficial when carry<strong>in</strong>g out specific searches.<br />
However, if the search is broader, the system needs to allow for inclusion of narrower<br />
descriptors in order to avoid the exclusion of possibly relevant documents of greater<br />
specificity (Soergel, 1994).<br />
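The need to include narrower descriptors in broad searches can be sketched in code. The following is a minimal illustration only: the thesaurus `NARROWER`, the index `INDEX`, and the function names are hypothetical and merely reuse the income tax example above. Each document is indexed at its most specific level, and a broad search is expanded to all narrower descriptors before matching.

```python
# Hypothetical toy thesaurus: descriptor -> direct narrower descriptors.
NARROWER = {
    "taxes": ["income taxes"],
    "income taxes": ["allowance for travel expenses"],
}

# Hypothetical index: each document carries only its most specific descriptor.
INDEX = {
    "doc1": {"income taxes"},
    "doc2": {"allowance for travel expenses"},
    "doc3": {"taxes"},
}


def expand(descriptor, narrower=NARROWER):
    """Return the descriptor together with all (transitively) narrower descriptors."""
    seen = set()
    stack = [descriptor]
    while stack:
        term = stack.pop()
        if term not in seen:
            seen.add(term)
            stack.extend(narrower.get(term, []))
    return seen


def search(descriptor, index=INDEX, expand_narrower=True):
    """Return the sorted ids of documents indexed with the descriptor (or a narrower one)."""
    terms = expand(descriptor) if expand_narrower else {descriptor}
    return sorted(doc for doc, assigned in index.items() if assigned & terms)


print(search("income taxes", expand_narrower=False))  # specific search only
print(search("income taxes"))                         # broad search with expansion
```

Without expansion, the broad search for income taxes misses the more specific document on travel expense allowances; with expansion, both documents are retrieved.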
5.2.2 Exhaustivity<br />
Exhaustivity deals with <strong><strong>in</strong>dex<strong>in</strong>g</strong> terms’ coverage of the content of a document<br />
(Salton, 1986; Soergel, 1994; Lancaster, 2003; Anderson & Pérez-Carballo, 2005). Are<br />
just core aspects of the document covered by <strong><strong>in</strong>dex<strong>in</strong>g</strong> terms, or are sub aspects<br />
represented as well? Obviously, the larger the number of terms assigned, the greater<br />
the exhaustivity of the indexing of the document will be. The counterpoint to exhaustivity is selective<br />
indexing, where only the central subjects of a document are covered by the indexing<br />
(Lancaster, 2003).<br />
Soergel (1994) dist<strong>in</strong>guishes between viewpo<strong>in</strong>t exhaustivity and importance<br />
exhaustivity. Importance exhaustivity addresses thresholds for when an aspect of a<br />
document is important enough to be represented <strong>in</strong> the <strong><strong>in</strong>dex<strong>in</strong>g</strong>. That is, how important<br />
must an element of a document be <strong>in</strong> order to be <strong>in</strong>cluded <strong>in</strong> the description of the<br />
document? Viewpoint exhaustivity, on the other hand, points to the depth or range of the<br />
applied indexing language. Thus, viewpoint exhaustivity designates the degree to<br />
which facets and viewpo<strong>in</strong>ts expressed <strong>in</strong> a document are represented <strong>in</strong> the <strong><strong>in</strong>dex<strong>in</strong>g</strong>
language. One could say that the level of viewpo<strong>in</strong>t exhaustivity is def<strong>in</strong>ed by the limits<br />
of the <strong><strong>in</strong>dex<strong>in</strong>g</strong> language. This way the two types of exhaustivity complement each<br />
other. In the first case the level of the exhaustivity is set by the <strong>in</strong>dexer or the <strong><strong>in</strong>dex<strong>in</strong>g</strong><br />
rules. In the second case the <strong><strong>in</strong>dex<strong>in</strong>g</strong> language sets the upper limit for exhaustivity. In<br />
practice the two types of exhaustivity <strong>in</strong>teract. Importance exhaustivity will be restricted<br />
by the nature of the <strong><strong>in</strong>dex<strong>in</strong>g</strong> language. At the same time there is no need for a highly<br />
exhaustive <strong><strong>in</strong>dex<strong>in</strong>g</strong> language, if the def<strong>in</strong>ed <strong><strong>in</strong>dex<strong>in</strong>g</strong> policy prescribes a low level of<br />
importance exhaustivity. However, distinguishing between the two types of<br />
exhaustivity allows for identification of the factors affecting indexing exhaustivity.<br />
The level of exhaustivity has economic implications (Lancaster, 2003). A high<br />
level of exhaustivity will require more effort from <strong>in</strong>dexers than a low level of<br />
exhaustivity. It is not necessarily useful to estimate exhaustivity quantitatively <strong>in</strong> terms<br />
of the number of assigned terms. Thus, other factors have an impact on exhaustivity,<br />
such as the size of the documents. Few <strong>in</strong>dex terms added to short documents may be<br />
just as exhaustive as more <strong>in</strong>dex terms added to longer documents (Anderson & Pérez-<br />
Carballo, 2005). The indexing approach is another factor. Thus, a single controlled<br />
term may represent the content of a document more exhaustively than a number<br />
of uncontrolled terms added by an indexer (Fugmann, 1993). In terms of recall and<br />
precision (see section 5.2.4), high exhaustivity of indexing will reduce the precision of<br />
search results in the sense that documents dealing only partially with the searched subject<br />
will be retrieved along with documents whose main focus is on that subject.<br />
Simultaneously, recall is improved by high exhaustivity, since documents can be found<br />
that have a more peripheral mention of the searched subject (Rowley, 1988). The<br />
ability to discriminate between documents must also be considered in relation to<br />
exhaustivity. Thus, if the same terms are assigned to many documents, the<br />
discrimination value of the terms decreases (Lancaster, 2003).<br />
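The observation that a term assigned to many documents discriminates poorly can be illustrated numerically. The sketch below uses an inverse-document-frequency style weight, log(N / n), where N is the number of documents and n the number of documents carrying the term. This weighting is an assumption chosen for illustration, not a measure taken from Lancaster (2003); the index and the function name are hypothetical.

```python
import math

# Hypothetical index: document id -> set of assigned index terms.
INDEX = {
    "doc1": {"taxation", "income tax"},
    "doc2": {"taxation", "income tax", "travel allowance"},
    "doc3": {"taxation"},
    "doc4": {"taxation"},
}


def discrimination_weight(term, index=INDEX):
    """IDF-style weight log(N / n): zero when the term is assigned to every document."""
    n_docs = len(index)
    n_with_term = sum(1 for terms in index.values() if term in terms)
    if n_with_term == 0:
        raise ValueError(f"term not used in the index: {term!r}")
    return math.log(n_docs / n_with_term)


for term in ("taxation", "income tax", "travel allowance"):
    print(term, round(discrimination_weight(term), 3))
```

In this toy index the term assigned to all four documents receives a weight of zero, whereas the term assigned to a single document receives the highest weight.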
5.2.3 Consistency<br />
Consistency becomes an issue when deal<strong>in</strong>g with human <strong><strong>in</strong>dex<strong>in</strong>g</strong>. The<br />
consistency problem arises from the subjective process tak<strong>in</strong>g place when <strong>in</strong>dexers<br />
decide on the aboutness of a document. Hence consistency refers to the level of<br />
agreement between two or more <strong>in</strong>dexers on which <strong>in</strong>dex terms to use for the<br />
representation of a document (Zunde & Dexter, 1969). This type of consistency is also<br />
known as inter-indexer consistency (Lancaster, 2003). In other words: do two or more<br />
indexers agree on what the subject of a document is? And do they select the same index<br />
term to represent the subject? The deviation between <strong>in</strong>dexers may take place at<br />
different levels. Lancaster (2003) lists 7 factors that may <strong>in</strong>fluence the degree of<br />
consistency between <strong>in</strong>dexers. The factors appear <strong>in</strong> Table 5.1. A related concept,<br />
intra-indexer consistency, refers to one indexer’s level of agreement with himself or herself<br />
(Lancaster, 2003). Here the question would be: Does the same indexer have the same<br />
interpretation of the subject of a document at different times? In this sense, the concept<br />
of consistency takes the subjective nature of human indexers into account and acknowledges<br />
that indexing is a highly subjective process when performed<br />
by human beings.<br />
1. Number of terms assigned<br />
2. Controlled vocabulary versus free text <strong><strong>in</strong>dex<strong>in</strong>g</strong><br />
3. Size and specificity of vocabulary<br />
4. Characteristics of subject matter and its term<strong>in</strong>ology<br />
5. Indexer factors<br />
6. Tools available to <strong>in</strong>dexer<br />
7. Length of item to be <strong>in</strong>dexed<br />
Table 5.1 Possible factors affect<strong>in</strong>g consistency. From Lancaster (2003, p. 71).<br />
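Agreement between indexers, as described above, can be quantified in several ways. The sketch below uses a simple set-overlap ratio (shared terms divided by all distinct terms assigned by either indexer). This particular formula is an assumption for illustration, since the text above does not prescribe one, and the function name and example terms are hypothetical.

```python
def indexer_consistency(terms_a, terms_b):
    """Share of distinct terms that both indexing sessions have in common (0.0 to 1.0)."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 1.0  # two empty term sets are treated as fully consistent
    return len(a & b) / len(a | b)


# Two indexers describing the same document (hypothetical terms).
indexer_1 = {"income taxes", "deductions", "travel expenses"}
indexer_2 = {"income taxes", "travel expenses", "commuting"}

print(indexer_consistency(indexer_1, indexer_2))  # 2 shared terms out of 4 distinct terms
```

The same function could be applied to intra-indexer consistency by comparing the terms one indexer assigns to the same document on two different occasions.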
We have briefly mentioned that consistency could be one way to express the<br />
quality of <strong><strong>in</strong>dex<strong>in</strong>g</strong>. Roll<strong>in</strong>g (1981, p. 71) even def<strong>in</strong>es <strong><strong>in</strong>dex<strong>in</strong>g</strong> quality <strong>in</strong> terms of<br />
consistency. The assumption is that the similarity of documents <strong>in</strong> an IR system cannot<br />
be properly expressed, if the <strong>in</strong>dexers do not demonstrate a sufficient level of<br />
consistency when assign<strong>in</strong>g <strong>in</strong>dex terms. However, express<strong>in</strong>g <strong><strong>in</strong>dex<strong>in</strong>g</strong> quality <strong>in</strong><br />
terms of consistency has been disputed by other scholars. Cooper (1969) challenges<br />
consistency as a measure of quality, because consistency does not necessarily imply<br />
good <strong><strong>in</strong>dex<strong>in</strong>g</strong>. Instead, he emphasizes the need to carry out <strong><strong>in</strong>dex<strong>in</strong>g</strong> <strong>in</strong> accordance<br />
with the requests users make to an IR system <strong>in</strong> order to ensure successful retrieval. As<br />
a consequence Cooper suggests <strong>in</strong>dexer-requester consistency as highly relevant to<br />
indexing quality. Indexer-requester consistency is implicitly relevant when<br />
<strong><strong>in</strong>dex<strong>in</strong>g</strong> quality is expressed <strong>in</strong> terms of retrieval effectiveness. The assumption is that<br />
consistency might very well be high between <strong>in</strong>dexers, but if users apply other search<br />
terms than the ones consistently assigned by <strong>in</strong>dexers, the performance of searches will<br />
not be good. Achiev<strong>in</strong>g a high degree of <strong>in</strong>dexer-requester consistency is made difficult<br />
by the diverse conditions characteriz<strong>in</strong>g the <strong>in</strong>dexer and the requester respectively.
However, the findings of Gomez, Lochbaum & Landauer (1990) suggest that the richer<br />
the applied vocabulary, the more likely it is to see correspondence between indexers’<br />
index terms and searchers’ search terms. A related finding of the study is that the<br />
more names an information object is allowed to have in an information system, the<br />
more likely it is that it will be retrieved by searchers.<br />
5.2.4 Performance measures<br />
An alternative way of measur<strong>in</strong>g <strong><strong>in</strong>dex<strong>in</strong>g</strong> quality is to <strong>in</strong>vestigate retrieval<br />
effectiveness. An important <strong>in</strong>strument here is the application of performance<br />
measures. Performance measures give an <strong>in</strong>dication of <strong>in</strong>dexer-requester consistency as<br />
suggested by Cooper (1969). Performance measures provide a macro analysis of the<br />
system performance and should preferably be supplemented by microanalysis in the form of<br />
specific investigations of retrieval success and failure (Soergel, 1985). Using<br />
performance measures for IR evaluation has been common practice since the<br />
1950s. Kent et al. (1955) were among the first to propose different measures of<br />
performance <strong>in</strong> the shape of a number of factors express<strong>in</strong>g system performance. Two<br />
performance measures - recall and precision - have traditionally been employed <strong>in</strong> order<br />
to measure the quality of <strong><strong>in</strong>dex<strong>in</strong>g</strong>. The performance measures are quantitative<br />
measures express<strong>in</strong>g respectively:<br />
\[ \text{Recall} = \frac{\text{Number of relevant documents retrieved}}{\text{Total number of relevant documents in the collection}} \]<br />
\[ \text{Precision} = \frac{\text{Number of relevant documents retrieved}}{\text{Total number of documents retrieved from the collection}} \]<br />
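The two measures defined above can be computed directly from sets of document identifiers. The following is a minimal sketch: the function name and the example document ids are hypothetical, and returning 0.0 when a denominator is empty is a convention assumed here rather than part of the definitions.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for one search request.

    retrieved: ids of the documents returned by the system
    relevant:  ids of all relevant documents in the collection
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents retrieved
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d2", "d5", "d6", "d7"]
print(recall_precision(retrieved, relevant))  # 2 of 5 relevant found, 2 of 4 retrieved relevant
```

Note that the `relevant` argument presupposes relevance judgements for the whole collection, which is exactly what makes recall the harder of the two measures to establish in practice.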
Technically speaking, precision is easier to measure, since the evaluator only<br />
needs to know which documents in the list of retrieved documents are actually<br />
relevant. As for recall, one needs to know the relevance of all documents in the<br />
collection. In other words, recall challenges the setup of IR evaluation. Further, it<br />
becomes clear that the concept of relevance is highly important for the outcome of IR<br />
evaluation due to its core position in the equations above. The concept of relevance<br />
represents a large and independent research area. Since the concept as such is beyond<br />
the scope of the present work, we will not explore it further here. However, a<br />
thorough review of the concept can be found in Borlund (2003a). Since the first<br />
introductions of recall, precision, and related performance measures, additional measures<br />
have been introduced that take into account the characteristics of large-scale<br />
IR systems. Examples are mean average precision, <strong>in</strong>teractive recall, and relative<br />
relevance (Kelly, 2009). We will not go further <strong>in</strong>to detail with these measures here.<br />
Different elements <strong>in</strong> <strong><strong>in</strong>dex<strong>in</strong>g</strong> languages can help <strong>in</strong>crease recall or precision<br />
or both. Accord<strong>in</strong>g to Lancaster (2003) exhaustive <strong><strong>in</strong>dex<strong>in</strong>g</strong> will <strong>in</strong>crease recall and<br />
lower precision s<strong>in</strong>ce exhaustivity <strong>in</strong>creases the number of retrieved items <strong>in</strong> search<strong>in</strong>g.<br />
Further, vocabulary control and the presence of different relationships <strong>in</strong> the vocabulary<br />
will increase recall. Conversely, specificity of indexing, scope notes, and relationships in<br />
the <strong><strong>in</strong>dex<strong>in</strong>g</strong> language are examples of precision devices (Aitchison, 1992). In sum, it is<br />
possible to adjust the <strong><strong>in</strong>dex<strong>in</strong>g</strong> quality accord<strong>in</strong>g to the expected use of the <strong><strong>in</strong>dex<strong>in</strong>g</strong>.<br />
5.3 Approaches to <strong><strong>in</strong>dex<strong>in</strong>g</strong><br />
Index<strong>in</strong>g can be divided and characterized <strong>in</strong> a number of different ways, depend<strong>in</strong>g on<br />
the scope. In the sections to follow, we will present the perspectives needed <strong>in</strong> order to<br />
<strong>in</strong>troduce the Ph.D. project. The approaches presented below have been empirically<br />
tested in a variety of ways. Since some of the approaches are usually closely related (e.g.,<br />
<strong>in</strong>tellectual and controlled <strong><strong>in</strong>dex<strong>in</strong>g</strong>), empirical comparisons of the approaches may be<br />
relevant to several of the sections below. Therefore we present the empirical studies<br />
where we consider them most relevant.<br />
5.3.1 Document, user, and doma<strong>in</strong> oriented <strong><strong>in</strong>dex<strong>in</strong>g</strong><br />
The approach to indexing may be defined by the point of departure of the<br />
indexing: whether it is document, user, or domain oriented. The orientation of the indexing<br />
captures the focus of the subject analysis that takes place ahead of the assignment of<br />
<strong>in</strong>dex terms, that is, the <strong>in</strong>itial step of the <strong><strong>in</strong>dex<strong>in</strong>g</strong> process.<br />
Document oriented <strong><strong>in</strong>dex<strong>in</strong>g</strong> (or entity oriented <strong><strong>in</strong>dex<strong>in</strong>g</strong>) seeks to represent the<br />
content of documents (Soergel, 1985; Fidel, 1994; Mai, 2005). Thus, the analysis of<br />
the document carried out <strong>in</strong> the first step of the <strong><strong>in</strong>dex<strong>in</strong>g</strong> process is based solely on the<br />
content of the document and does not take <strong>in</strong>to account the potential use of the<br />
document. The purpose of the document oriented <strong><strong>in</strong>dex<strong>in</strong>g</strong> is to carry out a description<br />
that is loyal to the content of the document. Document oriented <strong><strong>in</strong>dex<strong>in</strong>g</strong> may <strong>in</strong><br />
pr<strong>in</strong>ciple be carried out without any preced<strong>in</strong>g knowledge of the users expected to
benefit from the <strong><strong>in</strong>dex<strong>in</strong>g</strong>. The strength of the document centred approach is that<br />
<strong><strong>in</strong>dex<strong>in</strong>g</strong> is kept stable due to the static nature of the document. Hereby, the <strong>in</strong>dexers do<br />
not need to consider potential future use of the document (Mai, 2005). Further, <strong>in</strong>dexers<br />
do not need extensive knowledge about the context of the document, whether the<br />
context implies users or the doma<strong>in</strong> <strong>in</strong> question (Fidel, 1994). As po<strong>in</strong>ted out by Mai<br />
(2005), the document oriented approach is supported by the <strong>in</strong>ternational standard for<br />
<strong><strong>in</strong>dex<strong>in</strong>g</strong> (ISO, 1985).<br />
User (or request) oriented <strong><strong>in</strong>dex<strong>in</strong>g</strong> designates <strong><strong>in</strong>dex<strong>in</strong>g</strong> aimed at meet<strong>in</strong>g the<br />
requests expected from a particular audience (Fidel, 1994; Soergel, 1994; Lancaster,<br />
2003). Here, users’ anticipated requests form the basis of the index terms<br />
assigned to a document. Thus, the indexer considers whether a document should be<br />
retrieved for a certa<strong>in</strong> request or not. Soergel (1985, p. 233) equates descriptors with<br />
queries. By work<strong>in</strong>g through (parts of) an <strong><strong>in</strong>dex<strong>in</strong>g</strong> language the <strong>in</strong>dexer checks<br />
whether a descriptor is relevant to the document <strong>in</strong> question. This sort of <strong><strong>in</strong>dex<strong>in</strong>g</strong> is<br />
also referred to as checklist <strong><strong>in</strong>dex<strong>in</strong>g</strong> (Soergel, 1985; Fidel, 1994). By reflect<strong>in</strong>g<br />
anticipated requests, user oriented <strong><strong>in</strong>dex<strong>in</strong>g</strong> seeks to <strong>in</strong>crease <strong>in</strong>dexer-requester<br />
consistency (cf. Cooper, 1969).<br />
Doma<strong>in</strong> oriented <strong><strong>in</strong>dex<strong>in</strong>g</strong> may be considered an extension of user oriented<br />
<strong><strong>in</strong>dex<strong>in</strong>g</strong>. The conception of doma<strong>in</strong> is commonly associated with Hjørland’s (2002;<br />
Hjørland & Albrechtsen, 1995) concept of doma<strong>in</strong> analysis, which is primarily<br />
concerned with scientific discipl<strong>in</strong>es. However, Mai (2005, p. 605) considers the term<br />
doma<strong>in</strong> <strong>in</strong> a broader sense and def<strong>in</strong>es it as “a group of people who share common<br />
goals.” This way e.g., professional group<strong>in</strong>gs and <strong>in</strong>terest communities are also<br />
potential recipients of the <strong><strong>in</strong>dex<strong>in</strong>g</strong>. The assumption beh<strong>in</strong>d doma<strong>in</strong> oriented <strong><strong>in</strong>dex<strong>in</strong>g</strong><br />
is that the subject of a document is to a large extent determ<strong>in</strong>ed by the contextual use of<br />
the document. Domain oriented indexing extends user oriented indexing in the sense that<br />
it takes the context of users into account. Thus, it is assumed that the domain of which users<br />
are members has a significant influence on the role of the document within that<br />
particular doma<strong>in</strong> (Mai, 2005). It may be discussed whether users or doma<strong>in</strong>s change<br />
the most over time and as a consequence which of the two approaches is the most<br />
durable. However, both approaches need regular updates <strong>in</strong> order to ma<strong>in</strong>ta<strong>in</strong> their<br />
currency towards the users (cf. Lancaster, 2003; Mai, 2005).<br />
Figure 5.2 Document and doma<strong>in</strong> oriented approaches to <strong><strong>in</strong>dex<strong>in</strong>g</strong>. Adapted from Mai (2005, p.<br />
607)<br />
The approaches mentioned above may be summed up by Mai’s illustration (see<br />
Figure 5.2). It is important to note that the three approaches are to a certain degree<br />
condensed constructions that serve the purpose of identifying tendencies in subject<br />
identification. In practice, the approaches will in some cases be difficult to apply in a<br />
clear-cut manner. As an example, Mai (2005) mentions the difficulty indexers<br />
may have in refraining from using, for instance, contextual knowledge when interpreting the subject of<br />
a document solely on the basis of the document.<br />
5.3.2 Controlled vs. uncontrolled <strong><strong>in</strong>dex<strong>in</strong>g</strong><br />
Controlled and uncontrolled indexing refer to the indexing languages used to<br />
perform indexing. In its basic form, a controlled vocabulary is an authority list<br />
specifying the index terms indexers can assign when performing indexing. In<br />
addition, controlled indexing languages commonly express some sort of<br />
semantic structure in order to, for instance, control synonyms, differentiate<br />
between homographs, and link related terms (Lancaster, 2003). Controlled indexing<br />
languages belong to the type of systems referred to as knowledge organization systems<br />
or KOS. KOS may be characterized by their structure (or the relationships expressed)<br />
and function (cf. Zeng, 2008). As a consequence, a number of different KOS exist.
Figure 5.3 Types of vocabularies and their relationships. Adapted from Morville & Rosenfeld (2007, p. 195).<br />
Subject heading lists are alphabetically ordered lists of controlled terms and related<br />
subheadings. Thesauri, on the other hand, differ by having fully organized terms<br />
that elaborate relations between concepts (Aitchison, 1992). Thesauri and subject heading<br />
lists have two features in common: when in use, they control the use and form of<br />
indexing terms, and they enable relations between terms in the indexing language (Rowley,<br />
1988, p. 68). Additionally, taxonomies, ontologies, and classification schemes are<br />
variants of controlled indexing languages (Aitchison, 1992; Gilchrist, 2003). Controlled<br />
indexing languages can be characterized by their degree of complexity. Morville &<br />
Rosenfeld have illustrated this graphically (see Figure 5.3). According to the model, the<br />
lowest level of complexity represents equivalence relationships, while the highest level<br />
represents associative relationships, as expressed, for instance, in thesauri.<br />
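The relationship types described above can be illustrated with a minimal Python sketch.<br />
It is not taken from any of the cited works: the names `Term`, `ControlledVocabulary`,<br />
`normalize` and `expand`, as well as the sample terms, are hypothetical and serve<br />
illustration only. The sketch assumes that equivalence is modelled as synonyms pointing to<br />
one preferred term, and that broader and related terms are stored alongside it.<br />

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One preferred term in a hypothetical thesaurus."""
    preferred: str
    synonyms: set = field(default_factory=set)  # equivalence relationships
    broader: set = field(default_factory=set)   # hierarchical relationships
    related: set = field(default_factory=set)   # associative relationships


class ControlledVocabulary:
    """Authority list that maps entry words to preferred terms."""

    def __init__(self, terms):
        self._by_preferred = {t.preferred: t for t in terms}
        self._lookup = {}
        for t in terms:
            self._lookup[t.preferred.lower()] = t.preferred
            for synonym in t.synonyms:
                self._lookup[synonym.lower()] = t.preferred

    def normalize(self, word):
        """Return the preferred term for a word, or None if it is not controlled."""
        return self._lookup.get(word.lower())

    def expand(self, word):
        """Return the preferred term together with its broader and related terms."""
        preferred = self.normalize(word)
        if preferred is None:
            return set()
        term = self._by_preferred[preferred]
        return {preferred} | term.broader | term.related


# Invented sample data, for illustration only.
vocabulary = ControlledVocabulary([
    Term("taxation", synonyms={"tax", "taxes"},
         broader={"public finance"}, related={"tax law"}),
    Term("public finance"),
])

print(vocabulary.normalize("Taxes"))
print(sorted(vocabulary.expand("tax")))
```

In this sketch, a simple authority list would only need `normalize`, whereas a thesaurus-like<br />
vocabulary also makes use of the broader and related terms returned by `expand`.<br />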
In controlled indexing languages “both the terms used to represent subjects,<br />
and the process whereby terms are assigned to particular documents, are controlled or<br />
executed by a person” (Rowley, 1994, p. 109). This is the main reason for the close<br />
relation between controlled and manual indexing mentioned in section 5.3 (see footnote 4). However,<br />
as will be seen later in the present chapter, automatic methods for controlled indexing<br />
have been developed. In other words, the relation between controlled and manual<br />
indexing is not unequivocal.<br />
Footnote 4: We elaborate further on manual indexing in the following section (section 5.3.3).<br />
Uncontrolled indexing extracts indexing words from the document itself or<br />
from another source outside the controlled indexing language. Two generic types of<br />
uncontrolled indexing exist. Free indexing languages assign terms<br />
to documents that do not necessarily originate from the document itself. Natural language<br />
indexing, on the other hand, applies terms from the document for representation and is<br />
usually employed when performing automatic indexing (Rowley, 1988). Indexing by<br />
natural language forms a subfield of the general field of natural<br />
language processing (NLP) (Chowdhury, 2003). Uncontrolled indexing may be carried<br />
out by humans or machines. However, natural language indexing is commonly<br />
associated with automatic indexing (see section 5.4).<br />
Dubois (1987, p. 249) has summarized the strengths and weaknesses of<br />
controlled vocabularies and free text indexing. The key points are shown in Table 5.2.<br />
To a large extent, Blair & Maron’s (1985) study empirically supports<br />
Dubois’ summary concerning free text indexing. Blair & Maron tested the retrieval<br />
effectiveness of a full text retrieval system, that is, indexing by natural language.<br />
Involving two test persons within the legal domain, Blair & Maron found that the level<br />
of recall in the searches carried out was surprisingly low. A number of different reasons<br />
explained the results regarding recall. For one thing, the test persons had difficulties<br />
predicting the exact wording applied in the documents searched for. It turned out that<br />
the test persons’ selection of words was determined by their point of view on the problem<br />
in question. Also, misspellings in the documents contained in the retrieval system<br />
resulted in documents not being retrieved. Both these findings illustrate how searchers are<br />
challenged when searching for information, as pointed out by Dubois (1987). Further, it<br />
was found that search terms rated important by the searchers did not occur in documents<br />
relevant to given requests. In some cases, the terms were simply not included in the<br />
documents. In other cases, they occurred but were expressed as narrower<br />
or broader concepts. This problem is also addressed by Dubois. On the other hand, this<br />
can be considered the strength of natural language indexing. For instance, Tenopir<br />
(1985) found that the use of synonyms in natural language indexing was able to<br />
compensate for users’ incomplete queries.<br />
The performance of controlled versus uncontrolled indexing languages has been a<br />
core subject of investigation in the LIS research literature. One of the first<br />
investigations comparing the retrieval effectiveness of different indexing languages was<br />
the Cranfield tests. The tests took place for approximately a decade beginning in the
Controlled vocabularies<br />
Advantages: Solves many semantic problems; Permits generic relationships to be identified; Maps areas of knowledge<br />
Disadvantages: High cost; Possible inadequacies of coverage; Human error; Possible out of date vocabulary; Difficulty of systematically incorporating all relevant relationships between terms<br />
Free text<br />
Advantages: Low cost; Simplified searching; Full information content searchable; Every word has equal retrieval value; No human indexing errors; No delay in incorporating new terms<br />
Disadvantages: Greater burden on searcher; Information implicitly but not overtly included in text may be missed; Absence of specific to generic linkage; Vocabulary of discipline must be known<br />
Table 5.2 Summary of strengths and weaknesses of controlled vocabularies and free text. Adapted from Dubois (1987, p. 249).<br />
mid-1950s. The overall purpose of the Cranfield tests was to carry out comparative<br />
evaluations of a number of different controlled and uncontrolled indexing languages.<br />
However, the tests have become at least equally known for their pioneering<br />
contribution to the methodological body of knowledge on evaluation of IR systems (cf.<br />
Sparck Jones, 1981). The Cranfield tests comprised two tests: Cranfield I and Cranfield<br />
II. Cranfield I identified the complexity of isolating a single indexing language in a test<br />
situation, since the tested indexing languages were found to interact in their<br />
functions as precision and recall devices respectively (Cleverdon, 1967). Criticism was<br />
put forward by different authors, mainly concerning methodological issues (Sparck Jones,<br />
1981). Next followed Cranfield II, with a slightly enlarged test collection compared to<br />
Cranfield I. Cranfield II built upon Cranfield I and served the purpose of carrying out a<br />
closer investigation of the effect single indexing languages had on performance. As in<br />
Cranfield I, a number of different indexing languages were tested against each other.<br />
The languages were distributed across three main types: 1) Single term indexing
languages, 2) Simple concept indexing languages, and 3) Controlled term indexing<br />
languages. Furthermore, indexing languages representing keywords in titles and<br />
abstracts were included in the test. Among the results of the test was the inverse relation<br />
between recall and precision: when recall is high,<br />
precision tends to be low, and vice versa (Cleverdon, 1967). An ordered list of the<br />
performance of the tested indexing languages, measured in terms of normalized<br />
recall, further showed that applying single terms (the first group of indexing languages<br />
mentioned above) was superior to the remaining two groups. Simple concepts (the<br />
second group of indexing languages tested) had the lowest performance, while<br />
controlled terms and keywords from titles and abstracts had a medium score (Cleverdon<br />
& Keen, 1966, p. 253; Cleverdon, 1967, p. 189) (see footnote 5). Thus, the results suggest that<br />
uncontrolled indexing languages are certainly a valuable tool for retrieval purposes, but that they<br />
should preferably take the form of single term languages rather than simple concept<br />
languages.<br />
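The inverse relation between recall and precision can be made concrete with a small Python<br />
sketch. It uses the standard set-based definitions (precision as the share of retrieved<br />
documents that are relevant, recall as the share of relevant documents that are retrieved),<br />
not the normalized recall measure reported for Cranfield II. The function name<br />
`precision_recall` and the document identifiers are invented for illustration.<br />

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, using set-based definitions."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Invented example: four documents are relevant to the request.
relevant = {"d1", "d2", "d3", "d4"}

# A narrow search retrieves few documents, all of them relevant.
narrow = ["d1", "d2"]
# A broad search retrieves every relevant document, but also much noise.
broad = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"]

print(precision_recall(narrow, relevant))  # (1.0, 0.5)
print(precision_recall(broad, relevant))   # (0.5, 1.0)
```

In this constructed case, broadening the search doubles recall while halving precision, which is<br />
the kind of trade-off the Cranfield results point to.<br />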
In another study, Cousins (1992) compared the performance of basic MARC<br />
records and records enriched with either natural language index terms or controlled<br />
index terms. Performance was measured in terms of recall. The natural language terms<br />
of the study originated from the tables of contents and back-of-the-book indexes of the<br />
indexed units. PRECIS represented the controlled vocabulary for the test. The choice of<br />
PRECIS was based on a preceding investigation, where it was found that, out of three<br />
indexing languages, PRECIS was the most suitable for the queries guiding the test. Eleven<br />
queries on varying themes were used in the test. Cousins found that the<br />
retrieval performance of the enriched records exceeded that of the basic records. However, it<br />
was also found that the relative retrieval performance of the enriched records depended<br />
on whether the queries applied in the test were truncated or not. Thus, it turned out<br />
that PRECIS performed better when queries were not truncated. Conversely,<br />
when test queries were truncated, the retrieval performance of the natural language<br />
indexing was superior. Overall, truncated queries applied to natural language indexing<br />
had the best retrieval performance of the test. In her discussion, Cousins mentions the<br />
influence of the test queries on the test results. Thus, the formulation of some of the<br />
queries turned out to have quite an effect on the test result due to their choice of terms<br />
Footnote 5: A thorough presentation of the results of the Cranfield tests is given in Cleverdon (1960)<br />
(Cranfield I) and Cleverdon & Keen (1966) (Cranfield II).<br />
and subsequent potential for truncation. Apart from the search test results, Cousins’<br />
study emphasizes the importance of the number and nature of the queries applied in<br />
retrieval tests. This is particularly the case when the test setup does not include real<br />
users but is carried out in an experimental setting like Cousins’.<br />
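Why truncation can change the outcome of such a test may be easier to see in a minimal<br />
Python sketch of right truncation. The sketch does not reproduce Cousins' test setup: the<br />
functions `matches` and `search`, the asterisk as truncation symbol, and the sample records<br />
are assumptions made for illustration.<br />

```python
def matches(query_term, index_term):
    """Match an index term exactly, or by prefix if the query ends with '*'."""
    if query_term.endswith("*"):
        return index_term.startswith(query_term[:-1])
    return index_term == query_term


def search(query_term, records):
    """Return the ids of all records with at least one matching index term."""
    return [
        record_id
        for record_id, terms in records.items()
        if any(matches(query_term, term) for term in terms)
    ]


# Invented records with natural language index terms.
records = {
    "r1": ["library", "catalogue"],
    "r2": ["libraries", "funding"],
    "r3": ["librarian", "education"],
    "r4": ["archive"],
}

print(search("library", records))  # ['r1']
print(search("librar*", records))  # ['r1', 'r2', 'r3']
```

The untruncated query only finds the record that happens to use the exact word form, whereas<br />
the truncated query also reaches the word variants that natural language terms tend to contain.<br />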
In a study with a slightly different focus, Gross & Taylor (2005) investigated<br />
the number of relevant records that would be missed if controlled index terms were removed<br />
from records in a library catalogue. Thus, though not made explicit by the authors, recall<br />
was used to compare the performance of records including and records excluding<br />
controlled subject data. A sample of 227 queries drawn from a log of the library<br />
catalogue functioned as the information needs of the study. The study found that<br />
approximately one third of records would not have been retrieved without the<br />
assignment of controlled subject data. The study supports the general perception that<br />
controlled subject data support recall. Also, controlled subject data evidently need to<br />
supplement the natural language appearing in records. In a similar study, Veenema<br />
(1996) evaluated the performance of controlled index terms and natural language in a<br />
small test collection (553 documents of highly varying content and form) compiled<br />
from a Canadian embassy. The indexing policy guiding the manual indexing was far from<br />
aiming at exhaustivity, resulting in an average of 2 assigned terms per document in<br />
the test collection. The comparison of the two indexing languages showed that the nature<br />
of the information need affects the performance of the respective languages. Thus, due<br />
to the highly restrictive indexing policy for the controlled indexing language, natural<br />
language performed better on information needs concerning locations, while the<br />
controlled index terms performed better on information needs regarding a certain sector.<br />
Though the empirical basis of the study is rather limited, it helps illustrate the<br />
implications of indexing policies for test results, as well as how specific characteristics of<br />
information requests may affect the outcomes of comparisons of indexing languages.<br />
Savoy (2005) compared manual, assigned indexing and automatic,<br />
extracted indexing in a study of a French database named Amaryllis. Here, we<br />
present the results relevant to the performance of controlled versus uncontrolled<br />
indexing. However, the study also implicitly illustrates the differences between manual and<br />
automatic indexing. In the study, manual indexing was mainly carried out<br />
using a controlled vocabulary. The indexers were allowed to supplement with<br />
uncontrolled index terms. In practice, uncontrolled terms occurred rarely, though the<br />
share was not specified in the paper. Automatic indexing was represented in the study<br />
by ten different indexing models, such as the Okapi probabilistic model, a binary model<br />
where a term either occurs or does not occur, and a number of weighted approaches. The<br />
test collection contained approximately 145,000 documents. Thus, the results of the<br />
study cannot necessarily be transferred to real-life databases, which commonly contain<br />
millions of documents. Twenty-five queries represented the information needs of the study.<br />
Concerning controlled versus uncontrolled indexing, the study found the best<br />
performance to be achieved by using a combination of the two general indexing<br />
languages. Similar findings have been made by Tenopir (1985) regarding controlled<br />
indexing and natural language indexing. A comparison between controlled and<br />
uncontrolled indexing slightly favored controlled indexing when measured by mean<br />
average precision. However, the results were not statistically significant. Going<br />
through the results manually revealed rather comprehensive variations at query level.<br />
This result emphasizes the influence of test queries on test results and the importance of<br />
validation.<br />
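Mean average precision, the measure referred to above, can be sketched in a few lines of<br />
Python. The sketch follows the common textbook definition (average precision per query,<br />
averaged over all queries) and is not Savoy's evaluation code. The function names and the<br />
two example rankings are invented for illustration.<br />

```python
def average_precision(ranking, relevant):
    """Average of the precision values at each rank where a relevant document occurs."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    # Relevant documents that are never retrieved contribute zero.
    return total / len(relevant)


def mean_average_precision(runs):
    """Mean of the per-query average precision over (ranking, relevant) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Invented rankings for two queries.
runs = [
    (["a", "b", "c", "d"], {"a", "c"}),  # relevant at ranks 1 and 3
    (["x", "y", "z"], {"y"}),            # relevant at rank 2
]

for ranking, rel in runs:
    print(round(average_precision(ranking, rel), 4))  # 0.8333, then 0.5
print(round(mean_average_precision(runs), 4))         # 0.6667
```

The two queries score quite differently although they contribute equally to the mean, which<br />
illustrates why variations at query level can be hidden behind a single averaged figure.<br />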
The studies above have compared controlled and uncontrolled indexing<br />
languages. Recently, Price et al. (2007; 2009) have introduced the notion of semantic<br />
components, which allow for a simultaneous combination of controlled and free text<br />
indexing. Semantic component indexing provides a supplementary, enriched<br />
description of document contents by manually marking up segments of text in a<br />
document (i.e., semantic component instances) with labels (semantic component<br />
names). Domain-specific documents tend to contain characteristic types of information<br />
(semantic components). With semantic components, a searcher can search for query<br />
terms within specific semantic components, or specify a preference for documents<br />
containing particular semantic components. Hereby, the searcher can combine the<br />
advantages of uncontrolled full text search and domain-oriented controlled indexing that<br />
emphasizes topics or components of the documents. Semantic components have been<br />
empirically evaluated (e.g., Price et al., 2007; Price et al., 2009). The results suggest<br />
that this particular type of indexing can be a valuable improvement of full text indexing.<br />
As appears from the studies presented above, precision and in particular recall<br />
have been applied several times. However, the results do not point to an unambiguous<br />
relation between types of indexing languages and the mentioned performance measures.<br />
Rather, it seems that Svenonius’ (1986, p. 335) perception that both free text and<br />
controlled vocabularies contribute to recall and precision, but in different ways, is<br />
validated. Apparently, a combination of controlled and uncontrolled indexing may be<br />
advisable, taking into account the strengths and weaknesses of the respective<br />
indexing languages. Rowley (1994) concludes her paper by outlining a number of
factors that may help decide on the optimal combination of the indexing languages: 1)<br />
the searching environment, 2) the searchers, 3) available retrieval facilities and<br />
strategies, and 4) the nature of the search. On the basis of these factors and<br />
of this section in general, we can conclude that the selection of indexing languages<br />
should reflect the actual area of application.<br />
5.3.3 Intellectual vs. automatic indexing<br />
Overall, indexing may be carried out by humans (intellectual indexing), by<br />
machines (automatic indexing), or by a combination of the two (semi-automatic<br />
indexing). As indicated by the name, intellectual indexing is indexing carried out by<br />
humans, that is, indexers assign index words to documents, usually on the basis of a<br />
controlled vocabulary. The literature also applies the terms human or manual indexing<br />
to designate intellectual indexing.<br />
Rafferty & Hidderley (2007) identify three approaches to intellectual indexing:<br />
expert-led indexing, author-based indexing, and user-based indexing. Traditionally,<br />
intellectual indexing has been carried out by professional indexers, that is, expert-led<br />
indexing. The purpose is to establish a connection between user and document on the<br />
basis of a controlled vocabulary, by using free text identifiers, or a combination of the<br />
two. In scientific databases it is also common that authors attach keywords to their<br />
contributions. This is referred to as author-based indexing. These keywords are not<br />
selected from a controlled vocabulary. Rather, they represent the authors’ perception of<br />
the content of their document in the form of uncontrolled index terms. With the amount<br />
of information produced today, e.g., on the Internet, supplementing or perhaps even<br />
replacing professional indexers with other indexers can be a means to ensure subject<br />
representation of information objects. Thus, in the last decade, we have seen the<br />
emergence of online sources that allow users to assign tags to information sources (e.g.,<br />
Hunter, 2009; Trant, 2009). User tags broaden the conception of indexing due to the<br />
supplementary functions that tags also have (Golder & Huberman, 2006). Thus, tags allow<br />
for more fine-grained access to information sources than is usually possible through<br />
professional indexing using a controlled vocabulary (Kipp, 2005). This latter type is<br />
known as user-based indexing.<br />
Rafferty & Hidderley (2007) characterize the three types of intellectual<br />
indexing by their communicative potential. Thus, both expert-led and author-based<br />
indexing are characterized as monologic, because they express a kind of indexing that is<br />
Monologic, controlled indexing: Professional indexers<br />
Monologic, uncontrolled indexing: Author-based indexing<br />
Dialogic, controlled indexing: (empty)<br />
Dialogic, uncontrolled indexing: User-based indexing<br />
Figure 5.4 Generalized characteristics of intellectual indexing. Compiled on the basis of Rafferty & Hidderley (2007).<br />
not communicating with the potential users of the indexing. User-based indexing, on the<br />
other hand, represents a dialogic type of indexing, because it allows the users of<br />
documents to express their individual interpretation of an information unit. This is<br />
graphically illustrated in Figure 5.4. One must keep in mind that the figure presents a<br />
generalized view of the intellectual indexing types. In other words, exceptions to the<br />
figure do exist. For instance, in some cases professional indexers also carry out<br />
uncontrolled indexing. The reason for nevertheless connecting professional indexers with<br />
controlled indexing is that this is the type of indexing most frequently<br />
performed by this particular group of indexers. As appears from the figure, the box<br />
representing dialogic, controlled indexing is empty. The reason is that this type of<br />
indexing has not yet been fully developed. Different authors have addressed the<br />
problem, and different solutions for the lack of control in folksonomies have been<br />
proposed (cf. Trant, 2009).<br />
Different studies have <strong>in</strong>vestigated the characteristics of the different types of<br />
<strong>in</strong>dexers mentioned above. Kipp (2005) has compared users’, authors’ and professional<br />
<strong>in</strong>dexers’ assignment of <strong>in</strong>dex terms and tags to 165 scientific papers from core LIS<br />
journals. The analysis presented mostly <strong>in</strong>vestigated the terms assigned by professional<br />
<strong>in</strong>dexers and users. Kipp found that there was some overlap <strong>in</strong> the terms assigned, but<br />
that the overlap often represented narrower terms, broader terms, related terms and<br />
synonyms. However, quite a number of the terms assigned by one group were unrelated to<br />
those assigned by the other. Kipp suggested that one explanation could be that users could<br />
apply one specific term to address a new concept, whereas indexers needed to express<br />
new concepts in a controlled vocabulary by a combination of controlled terms already<br />
existing in the vocabulary.<br />
In a later comparative study, Strader (2009) investigated the<br />
degree of overlap between author-assigned keywords and Library of Congress Subject<br />
Head<strong>in</strong>gs (LCSH), that is, controlled <strong>in</strong>dex terms assigned by professional <strong>in</strong>dexers.<br />
The subject of <strong>in</strong>vestigation was bibliographic records represent<strong>in</strong>g doctoral students’
publications in an online catalogue. A total of 285 theses and dissertations containing<br />
1,681 author keywords and 1,181 LCSH terms were analyzed. The study showed that<br />
there was a certain overlap between author-assigned keywords and LCSH. However,<br />
approximately half of the author-assigned keywords did not match LCSH. A number of<br />
reasons can expla<strong>in</strong> the lack of overlap between subject terms. One reason may be that<br />
LCSH are not updated frequently enough to reflect current research. Another reason is<br />
that the authors use a different term<strong>in</strong>ology to represent similar concepts. Strader also<br />
found that about one-tenth of author-assigned subject terms and one-third of LCSH<br />
supplements data could be found elsewhere <strong>in</strong> the bibliographic record. In other words,<br />
LCSH to a larger degree supply users with unique access po<strong>in</strong>ts to the <strong>in</strong>vestigated<br />
records. However, it was concluded that both types of indexing enrich the retrieval<br />
environment for users.<br />
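The kind of overlap measured in comparisons such as these can be sketched as a simple set comparison. The following is only an illustration and not the procedure used in the studies cited: the normalization (lowercasing and trimming whitespace), the function name `overlap_share`, and the sample terms are all assumptions.

```python
def overlap_share(author_keywords, controlled_terms):
    """Share of author keywords that exactly match a controlled term
    after simple normalization (lowercase, collapsed whitespace)."""
    def normalize(term):
        return " ".join(term.lower().split())

    controlled = {normalize(t) for t in controlled_terms}
    keywords = [normalize(k) for k in author_keywords]
    if not keywords:
        return 0.0
    matched = sum(1 for k in keywords if k in controlled)
    return matched / len(keywords)


# Hypothetical sample data, not taken from any of the studies.
keywords = ["Information retrieval", "tagging", "e-government", "Indexing"]
lcsh = ["Information retrieval", "Indexing", "Internet in public administration"]
share = overlap_share(keywords, lcsh)
```

Exact matching of this kind would miss partial matches and variant forms, so it understates the overlap a human analyst would recognize.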
Thomas, Caudle & Schmitz (2009) also examined LCSH, but compared it to<br />
user tags in LibraryThing. Ten books were selected to form the basis of the<br />
investigation. The criteria for selection were that the books were popular and that they<br />
represented weak LCSH areas. Both criteria must be taken into account when applying<br />
the results of the investigation, since they favor user tags, which potentially affects the<br />
generalizability of the study. On the basis of the investigation, the authors found that<br />
users tag for their own purposes. Also, there was a certain overlap between LCSH<br />
subject terms and user tags, but user tags were stronger than LCSH terms<br />
with regard to task organization. Users of information systems will get the richest<br />
indexing when the system applies a combination of user tags and LCSH terms, but the<br />
benefits are greater when the number of tags is large.<br />
As a part of her Ph.D. work, Choi (2010a; 2010b) carried out a study<br />
<strong>in</strong>vestigat<strong>in</strong>g user-based <strong><strong>in</strong>dex<strong>in</strong>g</strong> and expert-led <strong><strong>in</strong>dex<strong>in</strong>g</strong>. Though prelim<strong>in</strong>ary <strong>in</strong><br />
nature, the study compared <strong>in</strong>dex terms assigned to web documents at the web sites<br />
Intute, BUBL and Delicious. The first study took <strong>in</strong>to account both controlled and<br />
uncontrolled keywords from Intute. The study showed that the subject perspectives<br />
expressed at the three exam<strong>in</strong>ed websites differed, even between the two sites<br />
represent<strong>in</strong>g professional <strong>in</strong>dexers (Choi, 2010a). The second study left out subjective<br />
and personal tags from Delicious. The study found that the level of similarity between<br />
indexers and users differed depending on the subject of the indexed websites. Thus, subjects<br />
with a larger <strong>in</strong>take of new words (e.g., technology) tend to generate less consistency<br />
between <strong>in</strong>dexers (Choi, 2010b).<br />
Attar (2006) carried out a study evaluat<strong>in</strong>g the <strong><strong>in</strong>dex<strong>in</strong>g</strong> performance of student<br />
<strong>in</strong>dexers. Unlike the studies just mentioned, Attar’s study is not comparative <strong>in</strong> nature.<br />
The study <strong>in</strong>vestigated subject <strong><strong>in</strong>dex<strong>in</strong>g</strong> and the formal description of <strong>in</strong>formation units<br />
in a library catalogue. Thirty-seven undergraduate and graduate students catalogued and indexed<br />
a full library collection with very diverse document types after having received two<br />
days of detailed training. The students came from different fields of study, but none were LIS<br />
students. When possible, the students indexed information within the subject area they<br />
were familiar with from their studies. Evaluating the indexing subsequently, Attar found<br />
that the problems related in particular to inconsistent and<br />
incorrect use of subject headings. For literary works, the use of genre in particular<br />
caused trouble. The problems were caused by lack of training and lack of familiarity<br />
with LCSH. In this manner, the study stresses the importance of proper training when<br />
carrying out indexing at a professional level.<br />
The empirical studies presented above to a large extent <strong>in</strong>form us about the<br />
characteristics of different types of <strong>in</strong>tellectual <strong><strong>in</strong>dex<strong>in</strong>g</strong>. However, as reflected <strong>in</strong><br />
Figure 5.4, the results of the comparisons also elucidate the pros and cons of controlled<br />
vs. uncontrolled indexing languages presented in Section 5.3.2. Taking into account<br />
that user-based indexing appears to follow a power law distribution, in which few information<br />
units receive most of the assigned tags and most units receive few (cf. Thomas, Caudle & Schmitz,<br />
2009), user-based indexing should not be the only type of indexing, at least in systems<br />
that are also used for high-precision searches. Further, the studies<br />
presented above have one thing in common: the method applied is to analyze the<br />
product of the indexing, namely the assigned tags or index terms. Several authors draw<br />
conclusions about the intentions of the indexers on the basis of the indexing product. A<br />
study investigating indexer intentions in a more qualitative manner would<br />
be an interesting supplement to the existing and highly enlightening studies.<br />
Automatic indexing constitutes a contrast to intellectual indexing, since it<br />
designates indexing carried out solely through a mechanical identification of index<br />
terms on the basis of word occurrences in documents. We will explain the concept more<br />
thoroughly in Section 5.4 and onwards. According to Albrechtsen (1993), automatic<br />
indexing represents the most simplistic conception of subject analysis, since the subject<br />
of the document is solely based on the frequency of terms. However, automatic<br />
indexing may take the form of either extracted or assigned indexing (see Sections 5.4.1<br />
and 5.4.2 below). As far as automatically assigned indexing is concerned,<br />
Albrechtsen’s statement may be discussed. Here the assignment of <strong>in</strong>dex terms is
carried out on the basis of a set of rules direct<strong>in</strong>g the occurrence of certa<strong>in</strong> words to<br />
specific po<strong>in</strong>ts <strong>in</strong> a controlled vocabulary. S<strong>in</strong>ce these rules have been formulated by<br />
humans, some sort of <strong>in</strong>tellectual <strong>in</strong>terpretation of the subject relation between the<br />
document and the controlled vocabulary has been established.<br />
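The idea of rules that direct word occurrences to points in a controlled vocabulary can be sketched as a lookup table. This is a minimal illustration under stated assumptions: the trigger words, the descriptors, and the function name `assign_descriptors` are hypothetical and do not come from any particular system.

```python
# Hypothetical rules: trigger word -> descriptor in a controlled vocabulary.
# A human wrote these rules, which is where the intellectual
# interpretation enters the otherwise mechanical process.
RULES = {
    "tax": "Taxation",
    "taxes": "Taxation",
    "vat": "Value-added tax",
    "pension": "Social security",
}


def assign_descriptors(text, rules=RULES):
    """Return the sorted controlled descriptors whose trigger words occur in the text."""
    words = {w.strip(".,;:()!?").lower() for w in text.split()}
    return sorted({descriptor for trigger, descriptor in rules.items()
                   if trigger in words})


descriptors = assign_descriptors("New rules on VAT and other taxes.")
```

Note that the assigned descriptors ("Taxation", "Value-added tax") need not appear in the document text at all, which is the difference from extracted indexing.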
If manual <strong><strong>in</strong>dex<strong>in</strong>g</strong> is taken to represent controlled <strong><strong>in</strong>dex<strong>in</strong>g</strong> and automatic<br />
<strong><strong>in</strong>dex<strong>in</strong>g</strong> to represent uncontrolled <strong><strong>in</strong>dex<strong>in</strong>g</strong>, the differences between the two<br />
approaches will to a large extent be reflected <strong>in</strong> Table 5.2. However, additional<br />
differences exist between manual and automatic methods. One obvious difference<br />
between human and automatic indexing relates to economy. Manual indexing is quite costly<br />
in terms of both money and time, at least<br />
where expert-led indexing is concerned. This can explain some of the efforts put into<br />
developing automatic methods. Accordingly, the low costs connected with automatic<br />
indexing are implicitly considered a strength. Here it is important to keep in mind the<br />
costs related to not being able to find information (cf. Feldman & Sherman, 2001).<br />
Ineffective retrieval may be caused by both manual and automatic methods. Thus for<br />
both <strong><strong>in</strong>dex<strong>in</strong>g</strong> methods Feldman & Sherman’s calculations emphasize the need to carry<br />
out evaluations <strong>in</strong> order to ensure the functionality and quality of <strong><strong>in</strong>dex<strong>in</strong>g</strong>.<br />
However, the two approaches differ with regard to more qualitative aspects as<br />
well. We have previously mentioned consistency, which is a highly relevant concept<br />
here. As appears from the papers reviewed above on manual indexing, human indexers<br />
undertake an interpretation of the content of a piece of information ahead of the<br />
assignment of index terms, whether they are controlled or uncontrolled. This<br />
interpretation most likely leads to inconsistencies due to differences in the indexers’<br />
conception of what the document is about (Anderson & Perez-Carballo, 2001a).<br />
At the same time, the human interpretation allows for documents to be represented by terms<br />
not present in the document, which potentially enriches the indexing. In automatic<br />
indexing, the interpretation rests on statistical calculations of term occurrence,<br />
which increases consistency considerably. But, as stated by Bloomfield (2002),<br />
indexing may be consistently good or consistently bad. As a consequence, it could be<br />
added that indexing can be inconsistently good. By this is meant that consistently bad<br />
indexing is not necessarily preferable to inconsistent indexing that contains very<br />
good elements along with very bad elements. According to Mandersloot, Douglas &<br />
Spicer (1970, p. 50), “[human...] <strong><strong>in</strong>dex<strong>in</strong>g</strong> may have <strong>in</strong>consistencies, but it is flexible.<br />
Mach<strong>in</strong>e <strong><strong>in</strong>dex<strong>in</strong>g</strong> may be consistent, but it is rigid.” Whether this op<strong>in</strong>ion is also<br />
reflected <strong>in</strong> empirical comparisons of the two approaches will be <strong>in</strong>vestigated below.<br />
Different authors have referred to the difficulties of isolat<strong>in</strong>g <strong><strong>in</strong>dex<strong>in</strong>g</strong> as a<br />
variable when measur<strong>in</strong>g the performance of an IR system (e.g., Anderson & Perez-<br />
Carballo, 2001a). However, numerous studies exist that have compared the<br />
performance of the two approaches. Salton’s (1986a) review of early studies argues for<br />
the potential of automatic indexing in various evaluative settings. As mentioned<br />
previously, several of the studies reviewed in Section 5.3.2 are also relevant in the<br />
present section due to their comparison of manual, controlled indexing on the one hand and<br />
automatic, uncontrolled indexing on the other. Examples are the Cranfield<br />
experiments (e.g., Cleverdon, 1967), which demonstrated promising results for<br />
automatic indexing in the form of single terms compared to manually assigned<br />
controlled index terms, and Savoy’s (2005) study, which found that the best performance<br />
was achieved by a combination of manual and automatic methods. Experiments regarding<br />
automatic indexing and retrieval have also been carried out within TREC (see footnote 6). Here the<br />
automatic methods have not been compared to human indexing, but have been tested in isolation. In<br />
particular, the tracks testing term weighting are relevant to automatic indexing (Harman<br />
& Voorhees, 2006). In sum, apart from expected lower expenditures on automatic<br />
<strong><strong>in</strong>dex<strong>in</strong>g</strong>, what can also be expected from automation of <strong><strong>in</strong>dex<strong>in</strong>g</strong> procedures is an<br />
<strong>in</strong>creased level of consistency. In the sections below a more detailed presentation of the<br />
characteristics of automatic <strong><strong>in</strong>dex<strong>in</strong>g</strong> will follow.<br />
5.4 Approaches to automatic <strong><strong>in</strong>dex<strong>in</strong>g</strong><br />
Automatic indexing designates the situation in which machines replace human<br />
indexers and carry out the indexing of documents (Lancaster, 2003, p. 283). With our<br />
point of departure in the broad definition of automatic indexing as outlined in the<br />
introduction to the present chapter, automatic indexing is covered by a number of<br />
diverse research communities. Golub (2006) differentiates between four approaches (text<br />
categorization, document clustering, document classification, and mixed approaches)<br />
originating from different research communities such as machine learning, information<br />
retrieval, and library science. A fair number of the automatic approaches to indexing<br />
are based on techniques and principles that go back in time. The most significant<br />
difference between the time of development of the techniques and the present is that the<br />
power and capacity of hardware have increased along with the number of digitized<br />
documents. As a consequence, the automatic indexing achieved today performs<br />
better, though there is still room for improvement (Lancaster, 2003, pp. 330-331).<br />
Footnote 6: Short for Text REtrieval Conference. TREC first started out in 1992 (Harman & Voorhees, 2006).<br />
We divide the automatic approaches according to whether they represent extracted or<br />
assigned methods. In relation to the present work, this division makes sense because it<br />
reflects the manual approaches that are mirrored in their automatic counterparts.<br />
Moreover, this is the categorization employed by Lancaster (2003) and Moens (2000).<br />
We will use this division in the sections to follow. In practice, algorithms for automatic<br />
indexing usually make use of more than one method at a time (Coyle, 2008). In our<br />
review, however, we will present and discuss the methods <strong>in</strong> their pure form with<br />
whatever characteristics they may have.<br />
5.4.1 <strong>Automatic</strong> extracted <strong><strong>in</strong>dex<strong>in</strong>g</strong><br />
In automatic extracted indexing, terms are drawn from the document itself to<br />
represent the content of the document, in line with the natural language indexing mentioned<br />
previously. The most basic kind of automatic extracted indexing is indexing all<br />
words occurring in a collection of documents (Anderson & Perez-Carballo, 2001b, p.<br />
258). However, not all natural language terms appearing in documents make<br />
good descriptors of a document. Therefore, a number of techniques are necessary in<br />
order to increase the quality of descriptors when extracting index terms from<br />
documents. The basic procedure consists of five steps, of which some or all may be<br />
included in order to improve the automatic indexing: 1) identifying the words<br />
appearing in the document collection (lexical analysis); 2) removing function words<br />
using a stop word list; 3) stemming in order to reduce words to their basic<br />
form; 4) computing a weighting factor for the remaining words, taking into account the<br />
term frequency and inverse document frequency; and 5) representing documents by the<br />
calculated weights on the basis of the previous steps (cf. Salton, 1989, p. 304; Salton &<br />
McGill, 1983). Others supplement the five-step procedure with additional steps such as<br />
formation of phrases. In some cases this results in a changed succession of steps<br />
(Moens, 2000, p. 78). In the sections to follow, we will elaborate on the individual steps.<br />
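The five steps can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not an implementation from the cited literature: the stop word list is deliberately tiny, the suffix stripping is a crude stand-in for a real stemmer, and all names (`tokenize`, `stem`, `index_collection`) and sample documents are hypothetical.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"the", "of", "a", "and", "in", "is", "are", "to"}  # tiny illustrative list
SUFFIXES = ("ing", "es", "s")  # crude stand-in for a real stemming algorithm


def tokenize(text):
    """Step 1: lexical analysis, here lowercase alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def stem(word):
    """Step 3: strip the first matching suffix, keeping at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_collection(documents):
    """Run steps 1-5 and return one {stem: weight} dict per document."""
    processed = []
    for text in documents:
        tokens = [t for t in tokenize(text) if t not in STOP_WORDS]  # steps 1-2
        processed.append(Counter(stem(t) for t in tokens))           # step 3
    n_docs = len(processed)
    # Number of documents each stem occurs in.
    doc_freq = Counter(term for counts in processed for term in counts)
    # Steps 4-5: weight = TF * log10(N / n_t), stored per document.
    return [
        {term: tf * math.log10(n_docs / doc_freq[term]) for term, tf in counts.items()}
        for counts in processed
    ]


docs = [
    "Indexing of documents and indexing of records",
    "Taxes and the taxing of documents",
    "Records management",
]
vectors = index_collection(docs)
```

In this toy collection the stem `index` receives the highest weight in the first document, because it occurs twice there and in no other document.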
5.4.1.1 Lexical analysis and stop word lists<br />
Lexical analysis converts a stream of characters into a stream of words.<br />
Single words are identified when separated by space or punctuation (Moens, 2000).<br />
Some challenges that might occur in the process are abbreviations, hyphenated<br />
terms, punctuation in general, and digits. A machine-readable dictionary may help<br />
solve the problems of abbreviations and, in some cases, hyphenated terms. These<br />
cases do not necessarily cause problems. However, they need to be considered<br />
when performing lexical analysis (Fox, 1992; Moens, 2000).<br />
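A small sketch may make these challenges concrete. The choices below are assumptions made for illustration only: abbreviations are looked up in a hypothetical mini-dictionary, hyphenated terms are kept as single tokens, and pure digit strings are dropped unless requested.

```python
import re

# Hypothetical machine-readable dictionary of abbreviations kept as single tokens.
ABBREVIATIONS = {"e.g.", "i.e.", "u.s."}

# A word is a run of letters or digits, optionally joined by internal hyphens.
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def lexical_analysis(text, keep_digits=False):
    """Split text into lowercase tokens, handling abbreviations, hyphens and digits."""
    tokens = []
    for chunk in text.split():
        lowered = chunk.lower().strip(",;:()\"'")
        if lowered in ABBREVIATIONS:
            tokens.append(lowered)  # dictionary hit: keep the abbreviation whole
            continue
        for match in TOKEN_PATTERN.findall(lowered):
            if match.isdigit() and not keep_digits:
                continue  # design choice: drop pure numbers by default
            tokens.append(match)
    return tokens


tokens = lexical_analysis("E-government portals, e.g. in the U.S., served 120 users.")
```

Whether hyphenated terms should be kept whole or split, and whether digits carry meaning, depends on the domain, which is why these are parameters and not fixed rules.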
Stop word lists are lists of the most common words, which are removed from full<br />
text documents in order to reduce the number of possible index terms. Alternative<br />
designations include stop lists and negative vocabularies (Fox, 1989). The assumption<br />
about stop words is that they are poor candidates for index terms. In particular, it is<br />
desirable to eliminate function words from the list of potential index terms (e.g., Luhn,<br />
1957; Salton, 1989). Further, stop word lists limit the space needed in indices (Wilbur<br />
& Sirotkin, 1992). Stop word lists commonly contain between 50 and 400 words when<br />
directed towards English text (Moens, 2000). For both lexical analysis and stop word<br />
lists, the domain in question should be taken into consideration, since<br />
the usefulness of potential index terms may differ depending on the application area<br />
(Fox, 1992). When preparing a stop word list, different choices must be made. The size<br />
of the list, whether large or small, is initially decided by the cut-off<br />
level based on the frequency of terms. For instance, Fox (1989) set the cut-off to<br />
occurrences above 300 for a large stop word list. In addition, different qualitative<br />
measures can be taken in order to refine the stop word list further. Examples are<br />
taking account of alternative spellings and variants of stop words with diverging prefixes<br />
and suffixes. Examining potentially relevant and irrelevant words with an<br />
occurrence close to the cut-off limit is also likely to improve the stop word list (cf. Fox,<br />
1989).<br />
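The two stages described above, a frequency cut-off followed by qualitative adjustment, can be sketched as below. The function name `build_stop_list`, its `keep` and `add` parameters, and the toy token counts are assumptions for illustration; the cut-off of 3 is scaled down to fit the toy data.

```python
from collections import Counter


def build_stop_list(tokens, cutoff, keep=(), add=()):
    """Words occurring more than `cutoff` times become stop word candidates.

    `keep` removes frequent words judged useful as index terms, and `add`
    inserts extra words (e.g. variants close to the cut-off); together they
    model the manual, qualitative refinement of the list.
    """
    counts = Counter(tokens)
    stop_words = {word for word, n in counts.items() if n > cutoff}
    stop_words -= set(keep)
    stop_words |= set(add)
    return stop_words


# Hypothetical token stream from a small tax-related collection.
tokens = ["the"] * 5 + ["of"] * 4 + ["tax"] * 4 + ["and"] * 2 + ["index"]
stop_list = build_stop_list(tokens, cutoff=3, keep={"tax"}, add={"and"})
```

Here `tax` is frequent enough to pass the cut-off but is kept as an index term because it is meaningful in the domain, which illustrates why the application area matters.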
5.4.1.2 Stemm<strong>in</strong>g<br />
Stemming identifies morphologically related terms by reducing variants of a<br />
word to a common stem or root (Moens, 2000; Salton & McGill, 1983; Anderson & Perez-Carballo,<br />
2001b). Specifically, affixes, that is, prefixes and suffixes, are removed from<br />
natural language words in order to identify stems (cf. Hammarström, 2006). The assumption<br />
is that “when stems are used as index terms, a greater number of potentially relevant<br />
items can be identified than when one of the orig<strong>in</strong>al full text words is <strong>in</strong> use” (Salton &<br />
McGill, 1983, p. 72). Us<strong>in</strong>g a stemmer is likely to <strong>in</strong>crease recall, as documents with<br />
morphological variations of the same stem are merged to be represented by the same<br />
<strong>in</strong>dex term. Further, like stop word lists, the use of stemm<strong>in</strong>g reduces the need for
space in the index, since the number of potential index terms is reduced during the<br />
process (Salton & McGill, 1983; Moens, 2000; Willett, 2006).<br />
Stemming can be divided into manual and automatic methods. Manual methods<br />
employ some type of regular expressions. Automatic stemming, on the other hand, uses,<br />
for instance, affix removal, successor varieties, table look-ups or n-grams (Frakes, 1992).<br />
Automated stemming is carried out by a stemming algorithm that removes prefixes,<br />
suffixes or both. Two potential problems challenge the performance of stemming:<br />
under stemming and over stemming. Under stemming removes too little of the term,<br />
while over stemming removes too much of the term; the two thus correspond to what is<br />
known as under truncation and over truncation in retrieval (cf. Chowdhury, 2004).<br />
However, what causes over stemming and under stemming differs between languages<br />
due to the differences in morphological structure (Moens, 2000; Willett, 2006).<br />
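Under stemming and over stemming can be illustrated with a naive suffix stripper. The sketch below is not one of the published stemming algorithms; the two suffix lists and the function name `strip_suffix` are assumptions chosen to provoke each problem.

```python
def strip_suffix(word, suffixes, min_stem=3):
    """Remove the longest matching suffix, leaving at least `min_stem` letters."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


CAUTIOUS = ("s",)                                      # removes very little
AGGRESSIVE = ("ization", "ation", "ing", "al", "s")    # removes a lot

# Under stemming: related forms end up with three different stems.
under = {strip_suffix(w, CAUTIOUS) for w in ("index", "indexing", "indexes")}

# Over stemming: the unrelated words "organ" and "organization" are merged.
over = {strip_suffix(w, AGGRESSIVE) for w in ("organ", "organization", "organs")}
```

The first case lowers recall because variants of the same word are not conflated; the second lowers precision because documents about different subjects share an index term.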
A number of stemmers have been proposed. However, two algorithms <strong>in</strong><br />
particular stand out, namely Lov<strong>in</strong>s’ stemmer (1968) and Porter’s stemmer (1980).<br />
Both algorithms are aimed at suffix removal, which is the most common type of<br />
stemming. Further, both stemmers are aimed at single-word terms (Galvez, de Moya-Anegon<br />
& Solana, 2005). Lovins’ stemmer involves two steps. In the first step the<br />
stemm<strong>in</strong>g is carried out. In the second step spell<strong>in</strong>g exceptions are handled by a set of<br />
rules <strong>in</strong> order to avoid the merg<strong>in</strong>g of stems with differ<strong>in</strong>g spell<strong>in</strong>gs. Examples are<br />
collide and collision (Lov<strong>in</strong>s, 1968). Like Lov<strong>in</strong>s, Porter specifies a set of suffixes to be<br />
removed from stems. However, spell<strong>in</strong>g exceptions are not <strong>in</strong>corporated <strong>in</strong> the Porter<br />
stemmer (Porter, 1980). Recently, Porter has developed a new generic stemmer,<br />
Snowball, which provides stemmers for a number of different European languages<br />
<strong>in</strong>clud<strong>in</strong>g Danish (Porter, 2001).<br />
5.4.1.3 Weight<strong>in</strong>g factors<br />
When terms are weighted, it is implied that some terms, even after lexical<br />
analysis, stop word lists, and stemming have been applied, are more important than<br />
others. In other words, when differentiating the weights of the remaining terms, it is<br />
implied that the first three steps are not sufficient for the identification of good index<br />
terms. Luhn (e.g., 1961) was a pioneer in suggesting that terms occurring in documents<br />
could substitute for controlled vocabularies with respect to indexing. The assumption was<br />
that the subject of a document is reflected by the occurrence of terms designat<strong>in</strong>g that<br />
subject (Moens, 2000; Salton, 1989; Salton & McGill, 1983). The higher the frequency<br />
of a term (TF), the higher is the probability that the document is concerned with the<br />
subject referred to by the term. Obviously, this is only true up to a certain point. Stop<br />
words and other high-frequency words do not make good index terms. According to<br />
Luhn (1958a), good index terms should be found among terms with a medium<br />
frequency in the document. Luhn’s thoughts are a continuation of Zipf’s findings.<br />
Approximately a decade earlier, Zipf (1949) discovered that the frequency of a term in a<br />
document is inversely proportional to its rank position. The principles are illustrated<br />
below (see Figure 5.5).<br />
Figure 5.5 The resolving power of significant index terms. Adapted from Luhn (1958a, p. 161)<br />
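The two ideas can be sketched numerically. If frequency is inversely proportional to rank, the product of rank and frequency stays roughly constant, and medium-frequency terms can be selected with an upper and a lower cut-off. The toy counts and the cut-off values below are assumptions chosen for illustration; they are not values proposed in the cited works.

```python
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, word, frequency, rank * frequency) rows, most frequent first."""
    ranked = Counter(tokens).most_common()
    return [(rank, word, freq, rank * freq)
            for rank, (word, freq) in enumerate(ranked, start=1)]


def medium_frequency_terms(tokens, lower, upper):
    """Keep words whose frequency lies between two cut-offs (inclusive)."""
    counts = Counter(tokens)
    return sorted(w for w, n in counts.items() if lower <= n <= upper)


# Hypothetical token counts that follow the inverse rank relation closely.
tokens = ["the"] * 12 + ["of"] * 6 + ["tax"] * 4 + ["index"] * 3 + ["vat"]
rows = rank_frequency(tokens)
candidates = medium_frequency_terms(tokens, lower=2, upper=5)
```

In this toy data the product of rank and frequency is 12 for the four most frequent words, and the cut-offs discard both the very frequent function words and the single-occurrence word.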
The early ideas by Luhn were refined and expanded in the following years<br />
in response to different issues, among others a lack of uniform application and of empirical support<br />
(Salton, 1970, 1988). Thus, it has turned out that fundamental problems arise if TF is<br />
used as the only basis for extracted indexing. The reason is that mere TF does not take<br />
into account the comparative occurrence of terms across documents. The result will be<br />
low precision <strong><strong>in</strong>dex<strong>in</strong>g</strong>, s<strong>in</strong>ce a term with a high frequency <strong>in</strong> a large collection of<br />
documents is not able to dist<strong>in</strong>guish the documents from each other, which is the<br />
implicit purpose of precision (Salton & Buckley, 1988; Salton, 1989). One way of<br />
correct<strong>in</strong>g for the limitations of TF is to add the <strong>in</strong>verse document frequency (IDF) <strong>in</strong>to<br />
the calculation of term weights. IDF expresses the occurrence of terms <strong>in</strong> a collection
<strong>Automatic</strong> <strong><strong>in</strong>dex<strong>in</strong>g</strong> <strong>in</strong> e-<strong>government</strong><br />
of documents. The assumption is, that terms with a high frequency across a collection<br />
is less able to discrim<strong>in</strong>ate between the documents conta<strong>in</strong><strong>in</strong>g that particular term, than<br />
a term that is high frequent <strong>in</strong> just a few documents (Salton, 1989). The formula for<br />
IDF takes just this <strong>in</strong>to account. The formula is (Moens, 2000; Salton & Buckley,<br />
1988):<br />
<br />
$$\log \frac{N}{n_t}$$<br />
where<br />
log = common logarithm<br />
$N$ = number of documents in the collection<br />
$n_t$ = number of documents in the collection containing the term $t$<br />
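As a minimal sketch of the formula above, the following Python function computes IDF with the common logarithm.<br />
The function name and the example counts are illustrative assumptions and are not taken from the cited sources.<br />

```python
import math


def inverse_document_frequency(n_docs: int, n_docs_with_term: int) -> float:
    """Return log10(N / n_t) for a term found in n_t of N documents."""
    if not 0 < n_docs_with_term <= n_docs:
        raise ValueError("expected 0 < n_t <= N")
    return math.log10(n_docs / n_docs_with_term)


# A term in 10 of 1,000 documents is rare and receives a high value;
# a term in every document receives 0 and cannot discriminate.
rare = inverse_document_frequency(1000, 10)
ubiquitous = inverse_document_frequency(1000, 1000)
```

In this toy case the rare term scores 2 and the ubiquitous term scores 0, which mirrors the assumption stated by Salton (1989).<br />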
By combining TF and IDF, high weights are allocated to terms that<br />
simultaneously have a high frequency in a document and a low frequency in the<br />
document collection. Further, the product TF*IDF is one way of measuring term<br />
discrimination values. Thus, term discrimination is comparable with IDF (Salton, Yang<br />
& Yu, 1975). The term discrimination value expresses a term’s ability to distinguish<br />
the documents of a collection from each other. A core concept in relation to the term<br />
discrimination value is connectivity. High connectivity is characteristic of bad index<br />
terms due to their lack of capacity to distinguish between documents in a collection. In<br />
contrast, good index terms have a low connectivity between documents (Jones &<br />
Furnas, 1987; Moens, 2000). The applicability of the TF*IDF factor has been the<br />
subject of different opinions over the years. In a presentation of early empirical retrieval<br />
tests, Sparck Jones (1973) concludes that the combination of the two frequencies<br />
improves retrieval considerably when compared to weighting based on TF alone.<br />
However, as pointed out by Salton & Buckley (1988), a major weakness of the TF*IDF<br />
product is the need for continuous updates of the frequency factor. This is particularly<br />
necessary in dynamic document collections. Thus, TF*IDF is more suitable for static<br />
collections.<br />
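The combination of TF and IDF can be sketched as follows. This is a simplified illustration that uses raw term counts as TF and the common logarithm for IDF.<br />
The function name `tf_idf` and the three toy documents are assumptions made for the example.<br />

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term in each document by TF * log10(N / n_t)."""
    n_docs = len(docs)
    # n_t: number of documents in which each term occurs at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: f * math.log10(n_docs / doc_freq[t]) for t, f in tf.items()}
        )
    return weights


# Hypothetical, already tokenized documents.
docs = [
    ["tax", "tax", "audit", "report"],
    ["tax", "school", "report"],
    ["tax", "school", "school", "budget"],
]
weights = tf_idf(docs)
```

Here "tax" occurs in every document and therefore receives weight 0 everywhere, whereas "audit", which occurs in a single document, receives the highest weight in the first document.<br />
Note that adding a document changes N and possibly n_t, so all weights must be recomputed, which is the weakness pointed out by Salton & Buckley (1988).<br />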
In addition to IDF, the length of documents (or vectors, cf. Salton & Buckley<br />
(1988)) could be taken into consideration when determining term weights. Long<br />
documents contain more terms than short ones, which makes a long document more<br />
retrievable than a short document due to the higher frequency of terms. In retrieval, the<br />
problem of the frequency of terms may be reduced by normalizing the term frequency<br />
by the length of the document (Singhal et al., 1996). Evidently, normalization is<br />
particularly necessary in document collections containing heterogeneous documents.<br />
Further, a long document usually contains more synonyms for the same concept, which<br />
also increases retrieval. In this case, obviously, normalization of length will not be<br />
useful for correction. Instead, more qualitative tools must be considered. The possibly<br />
higher degree of semantic variability in longer documents could, at least partly, explain<br />
the tendencies observed by Singhal et al. (1996). They find that in spite of<br />
normalization, longer documents still tend to be retrieved better than shorter<br />
documents. Similar observations were made earlier by Sparck Jones (1973), who<br />
concluded that document length normalization has little, if any, effect on retrieval.<br />
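The simplest form of length normalization divides each raw term count by the number of tokens in the document.<br />
The sketch below assumes this relative-frequency scheme for illustration only; it is not the specific normalization studied by Singhal et al. (1996).<br />

```python
from collections import Counter


def normalized_tf(doc: list[str]) -> dict[str, float]:
    """Return term counts divided by document length (relative frequency)."""
    if not doc:
        return {}
    counts = Counter(doc)
    return {term: count / len(doc) for term, count in counts.items()}


# Hypothetical documents: the long one repeats "tax" five times.
short_doc = ["tax", "audit"]
long_doc = ["tax", "audit"] * 5 + ["budget"] * 10

raw_long = Counter(long_doc)["tax"]          # raw TF favours the long document
norm_short = normalized_tf(short_doc)["tax"]
norm_long = normalized_tf(long_doc)["tax"]
```

With raw counts the long document scores five times higher for "tax"; after normalization the short document scores higher because half of its tokens are "tax".<br />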
5.4.1.4 Compound nouns as index terms<br />
The procedure outlined above refers to the extraction of single-word index terms.<br />
However, phrases may also be taken into account when considering weighting factors.<br />
Phrases constitute a particular challenge since the occurrence of two or more words in a<br />
phrase frequently has a quite different meaning than the single terms included in the<br />
phrase itself. This is the case for noun phrases and proper names. Usually,<br />
phrases carry more meaning and specificity as index terms than the single terms<br />
constituting the phrase (Croft, Turtle & Lewis, 1991; Lancaster, 2003). A classic<br />
example is the phrase “venetian blinds”. As a phrase, the concept refers to a<br />
certain kind of blinds. When divided into single terms, it refers to people from a<br />
specific city and something used to cover windows respectively, that is, a completely<br />
different meaning. When combined, but not as a phrase, it may refer to Venetians who<br />
cannot see. On the other hand, the probability that the two words occur in the same<br />
sentence without appearing as a phrase is rather low (Salton & McGill, 1983, p.<br />
86). A number of techniques, whether simple (e.g., simple collocations, statistically<br />
validated n-grams, syntactic structures) or advanced (e.g., extended n-grams or<br />
syntactic parsing), may be used in order to identify phrases in documents (Strzalkowski<br />
et al., 1999, p. 117). At present, the methods are expensive and time consuming to<br />
perform, and many questions remain unanswered, such as the payoff of the<br />
investments undertaken in different contexts (Voorhees & Pazienza, 1999; Anderson &<br />
Perez-Carballo, 2001b). This may also explain why at present the weighting functions<br />
of single terms are to some extent employed when weighting phrases as well (Moens,<br />
2000). Phrases may be considered either as a set of words or as separate concepts when<br />
weighted (Croft, Turtle & Lewis, 1991). Three scenarios may be outlined for the<br />
calculation of phrase weights: 1) the phrase is considered as one unit and the assigned<br />
weight is independent of the components constituting the phrase; 2) the phrase weight is<br />
calculated on the basis of the single terms comprising the phrase; or 3) a combination of<br />
1) and 2), where the results of both approaches are considered when the weight is<br />
computed (Moens, 2000).<br />
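The three scenarios for phrase weights can be sketched as one small function.<br />
The use of the mean of the component weights in scenario 2 and of a linear mix in scenario 3 are assumptions made for illustration; Moens (2000) is not quoted here as prescribing these particular formulas.<br />

```python
def phrase_weight(
    phrase_w: float,
    component_ws: list[float],
    scenario: int,
    mix: float = 0.5,
) -> float:
    """Weight a phrase under one of the three scenarios described above.

    phrase_w     -- weight computed for the phrase treated as a single unit
    component_ws -- weights of the single terms making up the phrase
    mix          -- hypothetical share given to phrase_w in scenario 3
    """
    if scenario == 1:
        return phrase_w
    component_avg = sum(component_ws) / len(component_ws)
    if scenario == 2:
        return component_avg
    if scenario == 3:
        return mix * phrase_w + (1 - mix) * component_avg
    raise ValueError("scenario must be 1, 2, or 3")


# Hypothetical weights for "venetian blinds" and its two components.
as_unit = phrase_weight(0.8, [0.2, 0.4], scenario=1)
from_parts = phrase_weight(0.8, [0.2, 0.4], scenario=2)
combined = phrase_weight(0.8, [0.2, 0.4], scenario=3)
```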
The challenges of phrases mentioned here particularly concern the English<br />
language. In the present empirical work the test collection consists primarily of Danish<br />
texts. The Danish language belongs to the Germanic family of languages along with<br />
German, Swedish, and others. The other Germanic languages differ from English in a number of<br />
ways. The way compound nouns are created is particularly pertinent to automatic<br />
indexing. Thus, where English compound nouns are created as phrases, these<br />
languages create compounds by joining the components together in one word (Hedlund, 2002).<br />
This means that results of IR and indexing studies cannot be transferred to these<br />
languages as a matter of course (Ahlgren & Kekäläinen, 2007). Thus, the challenges<br />
related to English basically consist of identifying when two or more single terms are in<br />
fact a noun phrase. The purpose is to enable an increase in precision. In the other Germanic<br />
languages the challenges are if anything the opposite. Here techniques need to be able<br />
to identify the components of compound nouns in order to increase recall (cf. Pedersen,<br />
Navarretta & Hansen, 2005). Techniques for identifying the components of compound<br />
nouns have been developed and tested for a number of languages other than English.<br />
However, Danish is not among the most thoroughly investigated languages in this<br />
respect. A 2-year research project, the VID-project, was carried out in the mid-2000s by<br />
the Centre for Language Technology, University of Copenhagen. The overall purpose of the<br />
project was to investigate the potential of human language technologies as regards<br />
acquisition and representation of information (Pedersen, Navarretta & Henriksen, 2004).<br />
Among other things, the project contributed knowledge about how marking up texts<br />
with word classes would affect recall and precision in IR. The study found that precision<br />
was very satisfying (=0.9), whereas recall was, surprisingly, lower (=0.6). The<br />
lower recall was explained by errors in the recognition of terms, and by a<br />
general lack of complexity in the recognition (Pedersen, Navarretta & Hansen, 2005, p.<br />
28). The results of the study emphasize the need for language tools to allow for a high<br />
degree of complexity when aimed at Danish and similar languages.<br />
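To make the idea of identifying the components of a compound concrete, the following sketch splits a word against a small lexicon and allows an optional glue morpheme between components.<br />
It is a deliberately naive illustration: the lexicon, the glue morphemes "s" and "e", and the function name are assumptions, and it does not reproduce the method of the VID-project or any other cited study.<br />

```python
from typing import Optional

GLUE = ("", "s", "e")  # assumed linking elements between components


def split_compound(
    word: str, lexicon: set[str], min_len: int = 2
) -> Optional[list[str]]:
    """Return the lexicon components of a compound, or None if none is found."""
    if word in lexicon:
        return [word]
    for i in range(min_len, len(word) - min_len + 1):
        head = word[:i]
        if head not in lexicon:
            continue
        rest = word[i:]
        for glue in GLUE:
            if rest.startswith(glue):
                tail = split_compound(rest[len(glue):], lexicon, min_len)
                if tail:
                    return [head] + tail
    return None


# Hypothetical mini-lexicon of Danish single words.
LEXICON = {"land", "by", "hus", "leje", "skole", "bus"}

village = split_compound("landsby", LEXICON)   # land + s + by
rent = split_compound("husleje", LEXICON)      # hus + leje
```

Indexing the components in addition to the full compound lets a query for "by" or "leje" match these documents, which is how splitting is meant to raise recall.<br />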
To our knowledge no other research supplements the VID-project as regards<br />
the investigation of the Danish language. Swedish, on the other hand, has been<br />
investigated in different studies. Due to the large share of similarities we may<br />
reasonably transfer Swedish results to the Danish language. Ahlgren & Kekäläinen<br />
(2007) have tested a number of different techniques in a comparative study of Swedish<br />
text. Four indexing methods, ranging from raw text over inflection<sup>7</sup> to two variants of<br />
compound splitting, were compared in the study. The indexing methods were evaluated<br />
and compared on a collection of Swedish newspaper articles using topics from CLEF.<br />
To set the scene for the evaluations, six user profiles were set up. The profiles varied as to<br />
their degree of patience and their perception of what makes a relevant document.<br />
Normalized discounted cumulated gain (nDCG) (Järvelin & Kekäläinen, 2002) was<br />
used to measure the performance of the indexing methods. The study found that, compared<br />
to the remainder of the tested methods, the simplest indexing method in general had the<br />
lowest performance when original words from the topics were used as query terms. In<br />
contrast, when the topic terms were truncated for queries, the same method had the<br />
best performance compared to the remainder. In sum, inflection and compound<br />
splitting did not improve retrieval compared to simple truncation. It appears that the lessons<br />
learned in phrase indexing, namely that it is time consuming and that the payoff is<br />
questionable, also seem to hold for methods applicable to more complex<br />
languages than English.<br />
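The nDCG measure mentioned above can be sketched as follows, assuming the formulation in which gains at ranks below the logarithm base are not discounted.<br />
Two simplifications are assumptions of this example: only the value at the final rank is computed, and the ideal ranking is derived from the retrieved list itself instead of from the full set of known relevant documents.<br />

```python
import math


def dcg(gains: list[float], base: int = 2) -> float:
    """Discounted cumulated gain at the last rank of a ranked list."""
    total = 0.0
    for rank, gain in enumerate(gains, start=1):
        # Ranks below the base are not discounted; later ranks are
        # divided by log_base(rank).
        discount = math.log(rank, base) if rank >= base else 1.0
        total += gain / discount
    return total


def ndcg(gains: list[float], base: int = 2) -> float:
    """DCG divided by the DCG of the same gains in ideal (descending) order."""
    ideal = dcg(sorted(gains, reverse=True), base)
    return dcg(gains, base) / ideal if ideal > 0 else 0.0


# Hypothetical graded relevance scores (0-3) in ranked order.
perfect = ndcg([3, 2, 1, 0])
reversed_order = ndcg([0, 1, 2, 3])
```

A ranking that places the most relevant documents first scores 1, and the score drops as relevant documents move down the list, which is how an impatient user profile can be modelled by a small base.<br />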
5.4.1.5 Extracted indexing<br />
Extracted indexing refers to the automatic extraction of index terms based on<br />
various techniques. The techniques included here correspond to what Golub<br />
designates as document clustering (Golub, 2006). However, in order to be able to<br />
separate the overall concept from the specific technique of clustering, we apply the term<br />
extracted indexing below. As noted by Golub (2006), this approach lies within<br />
the IR tradition. The close relation between extracted indexing and information<br />
retrieval is constituted by the common use of advanced IR techniques for marking up<br />
documents and matching queries with documents.<br />
Extracted indexing in its simplest form is based on the steps described in<br />
the preceding sections (lexical analysis, removal of stop words, stemming, and term<br />
weighting) (Salton, 1991). However, more advanced techniques also exist. One such<br />
technique is the vector space model. The vector space model may be<br />
applied for comparison between documents (indexing) or between documents and<br />
queries (IR). For indexing purposes documents are represented by vectors on the basis<br />
of terms occurring in the documents of a collection. The steps comprised by simple<br />
extracted indexing mentioned above are followed in order to generate a vector<br />
representing each document. Subsequently the vectors are processed using cluster<br />
analysis (Salton, Wong & Yang, 1975).<br />
7 Inflection designates the different forms a word can take, whether due to mutation caused by plural<br />
form, strong verbs, compounding, use of glue morphemes, or others (cf. Ahlgren & Kekäläinen, 2007).<br />
Two main steps constitute the process of advanced extracted indexing: 1)<br />
documents are represented as vectors, and the vectors are compared<br />
using a similarity measure; and 2) clusters are formed by means of clustering<br />
algorithms (cf. Golub, 2006, p. 356). Cluster analysis designates a method for data<br />
analysis with numerous areas of application. By means of cluster analysis unlabelled<br />
patterns within a set of items can be grouped into meaningful clusters (Jain, Murty &<br />
Flynn, 1999). Applied to documents, cluster analysis groups the documents in a collection<br />
according to the features they have in common.<br />
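The two steps can be illustrated with a small sketch: documents are given as sparse term-weight vectors, compared by cosine similarity, and grouped by a single-pass clustering procedure.<br />
Cosine similarity, the single-pass algorithm, the threshold of 0.5, and the toy vectors are all assumptions chosen for brevity; the cited works cover many other similarity measures and clustering algorithms.<br />

```python
import math


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def single_pass_cluster(
    vectors: list[dict[str, float]], threshold: float = 0.5
) -> list[list[int]]:
    """Assign each document to the first cluster whose first member is
    similar enough; otherwise start a new cluster."""
    clusters: list[list[int]] = []
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical document vectors (term -> weight).
vectors = [
    {"tax": 2.0, "audit": 1.0},
    {"school": 1.0, "budget": 1.0},
    {"tax": 1.0, "audit": 1.0},
    {"school": 2.0, "teacher": 1.0},
]
clusters = single_pass_cluster(vectors)
```

No labelled training documents are involved: the two groups emerge solely from the terms the documents share.<br />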
Extracted indexing is an unsupervised way of organizing documents, because<br />
no pre-categorized documents are used as training documents. Document clustering<br />
may be based on terms occurring in the documents, or on co-occurring citations. Terms<br />
themselves can also be clustered. In that case co-occurrence in the document<br />
collection constitutes the unit of analysis (Rasmussen, 1992). Document clustering is<br />
characterized as an extracted kind of indexing, since the clusters are not matched against<br />
a controlled vocabulary.<br />
The performance of extracted indexing based on various clustering techniques<br />
and/or on simple extracted indexing has been examined in different comparative studies.<br />
We have already mentioned the Cranfield tests, one of the first attempts to evaluate<br />
automatic extracted indexing (see section 5.3.2).<br />
An early attempt to implement a clustering interface for post-retrieval use, Grouper,<br />
was presented by Zamir & Etzioni (1999) in the late 1990s. The functionality was made<br />
for a meta search engine, HuskySearch. The technology behind Grouper consisted of<br />
three steps: 1) stemming, 2) identification of basic clusters, and 3) merging of clusters<br />
with a high degree of overlap between the documents they contain. Further, Grouper had<br />
built-in technology for correcting redundant cluster titles along with<br />
technology allowing for fast processing of search results. The interface was evaluated<br />
using search logs. 3,183 queries had been logged at the Grouper interface within 2<br />
months, while 19,330 queries had been logged from the HuskySearch interface<br />
(representing ranked search results with no clustering of results). The data material<br />
does not allow for an explanation of the patterns identified in the search logs due to the<br />
lack of qualitative data. Another limitation of the study, which was pointed out by the<br />
authors, was the lack of control over who used the test system (Grouper) and the baseline<br />
system (HuskySearch). From the data it appeared that users explored several clusters in<br />
order to locate relevant documents in the Grouper interface. The authors explained this<br />
undesired situation either by users searching for different perspectives on<br />
their information need or simply by the generated clusters not matching user<br />
needs sufficiently. When compared to the baseline system it was found that Grouper<br />
users found more documents, perhaps suggesting that the clustering made it easier to<br />
locate relevant documents. A qualitative follow-up on the study would provide a more<br />
thorough understanding of the hypotheses put forward by the authors in the light of their<br />
findings.<br />
A later study based on extracted indexing is Käki’s (2005a; 2005b; Käki &<br />
Aula, 2005) investigation of categorization of web documents for his dissertation work.<br />
Here, two algorithms for extracting category candidates were applied. One allowed<br />
for single terms along with phrases, while the other required phrases consisting of at<br />
least 2 terms. The algorithms were used in order to build a list of categories for<br />
organizing web results. A result was added to a category if it contained the name of<br />
the category in its result summary text (Käki, 2005a). It is the extraction of candidate<br />
terms from the documents themselves and the lack of supervision that cause us to<br />
classify Käki’s work within extracted indexing.<br />
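The assignment rule described above, where a result joins a category if its summary text contains the category name, can be sketched in a few lines.<br />
Case-insensitive substring matching and the example summaries are assumptions of this illustration and are not details taken from Käki (2005a); a real system would need word-boundary handling so that "tax" does not match "taxi".<br />

```python
def assign_to_categories(
    summaries: list[str], categories: list[str]
) -> dict[str, list[int]]:
    """Map each category name to the indices of summaries that contain it."""
    assignment: dict[str, list[int]] = {c: [] for c in categories}
    for idx, summary in enumerate(summaries):
        text = summary.lower()
        for category in categories:
            if category.lower() in text:
                assignment[category].append(idx)
    return assignment


# Hypothetical result summaries and extracted category candidates.
summaries = [
    "Income tax rates for 2012",
    "School budget and tax rules",
    "Weather forecast",
]
categories = ["tax", "school budget"]
assignment = assign_to_categories(summaries, categories)
```

A result may appear in several categories or in none, as the second and third summaries show.<br />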
On the basis of the two extraction algorithms two interfaces were set up for<br />
testing. Different evaluations have been reported from the study. Käki & Aula (2005)<br />
made a comparative study of an interface combining the algorithm with a categorized<br />
search interface, with the World Wide Web (hereafter: the web) as the test base. The<br />
baseline was a Google web page displaying the results as a ranked list. 20 test persons<br />
participated in the test, where 9 predefined queries in general topic areas formed the<br />
starting point of the searches. The test persons were allowed 1 minute for each task in<br />
order to reflect a faster behavior that supposedly would be more realistic. The<br />
performance of the experimental system and the baseline system was measured in<br />
terms of 1) time to accomplish a task, 2) number of results selected for a task, 3)<br />
relevance of selected results measured on a 3-point scale (relevant, related, and not<br />
relevant), and 4) subjective attitude concerning both experimental and baseline systems<br />
(Käki & Aula, 2005, p. 199). In addition, recall and precision were measured on the<br />
basis of the relevance judgments carried out by the test persons. The study found that<br />
the categorized interface had a better average performance in precision (62%, sd=13<br />
against 49%, sd=15) and recall (33%, sd=4 against 19%, sd=7). The results on the test<br />
persons’ attitudes towards the two systems demonstrated a somewhat more positive attitude<br />
towards the test system compared to the baseline system. The test did not find<br />
substantial differences as to the time spent, most likely due to the very short time<br />
window applied in the test.<br />
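For reference, precision and recall for a single task can be computed from the set of selected results and the set of results judged relevant, as sketched below.<br />
The document identifiers are hypothetical and do not reproduce data from Käki & Aula (2005).<br />

```python
def precision_recall(selected: set[int], relevant: set[int]) -> tuple[float, float]:
    """Precision = hits / selected; recall = hits / relevant."""
    hits = len(selected & relevant)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical task: four results selected, five relevant results exist.
selected = {1, 2, 3, 4}
relevant = {2, 4, 6, 8, 10}
precision, recall = precision_recall(selected, relevant)
```

In this example two of the four selected results are relevant (precision 0.5), and two of the five relevant results were found (recall 0.4).<br />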
The highly controlled test just referred to was followed up by a longitudinal study<br />
that was considerably less experimental (Käki, 2005b). 16 people participated in the<br />
study. The participants did not receive any instruction for the use of the test system<br />
besides using it any way they would like. This reflected the purpose of the study,<br />
namely to capture the participants’ real behavior. This time no comparison was made to<br />
a baseline system. As in the previous study, the test system was applied to the web.<br />
The data collection lasted for three months including one month of compensation for a<br />
holiday period. Two types of data were collected: search logs and questionnaires. One<br />
questionnaire was distributed a week or two after the launch of the data collection, the<br />
other at the end of the study. 3099 queries were logged, while 3232 result pages were<br />
accessed and 1915 categories were selected. The relevance of retrieved documents was<br />
not registered. The study found that categories were used to select 26% of the accessed<br />
result pages. In the qualitative part of the first questionnaire, participants indicated that<br />
categories were useful when “…the original query was vague, broad, general, or<br />
contained words that have multiple meanings...” (Käki, 2005b, p. 138). The ability of<br />
the categories to help increase the focus of a less precise query was also expressed in the<br />
second questionnaire. Further, categories were found useful when result rankings were<br />
deficient. The results of the study are interesting because they demonstrate that<br />
categorizing results is not necessarily useful in all information searching situations.<br />
From the analysis we do get some indication of when categories may be useful.<br />
However, a more systematic investigation would clearly be relevant.<br />
5.4.2 Automatic assigned indexing<br />
Automatic assigned indexing is the automatic equivalent of human controlled<br />
indexing. The major difference between automatic extracted and automatic assigned<br />
indexing is that a coupling is established between terms occurring in a collection of<br />
documents and a controlled vocabulary. The apparent advantage of coupling natural<br />
language index terms with a controlled vocabulary is that it enables relations<br />
between documents that share one or more controlled index terms.<br />
Different approaches exist for performing automatic assigned indexing. Two<br />
methods can be identified within the text categorization literature.<sup>8</sup> One is based on<br />
knowledge engineering, the other on machine learning (Sebastiani, 2002). Text<br />
categorization is considered an assignment type of automatic indexing due to its<br />
categorization of documents into a predefined set of categories. By comparison,<br />
information filtering represents another means of categorizing documents, though with<br />
dynamic categories (Belkin & Croft, 1992).<br />
Initially, text categorization was carried out us<strong>in</strong>g a rule based approach (or<br />
knowledge eng<strong>in</strong>eer<strong>in</strong>g approach, cf. Sebastiani (2002)). Typically, a set of rules was<br />
built after the pr<strong>in</strong>ciple of if-then, mean<strong>in</strong>g that if a document met certa<strong>in</strong> criteria it<br />
would be categorized <strong>in</strong> a specific category (Sebastiani, 1999). In practice, a profile<br />
was created for each term to be assigned, conta<strong>in</strong><strong>in</strong>g words and phrases with a high<br />
frequency <strong>in</strong> the documents that usually would be assigned with the controlled <strong>in</strong>dex<br />
term (cf. Lancaster, 2003, p. 287-288). The sum of rules is referred to as a classifier.<br />
Hayes & We<strong>in</strong>ste<strong>in</strong> (1990) are frequently mentioned <strong>in</strong> the literature as an example of<br />
this approach. They reported a system for categorization developed for news stories at<br />
Reuters. The categorization of documents is based on two steps: 1) concept recognition,<br />
and 2) categorization rules. In the first step both single terms and phrases are included<br />
in the recognition. Further, the system relies on a certain degree of human<br />
intervention to either limit or extend the context of terms if necessary. Thus, the system<br />
may basically be considered a hybrid, cf. section 5.5. The rules have also been extended<br />
compared to plain if-then rules: the context of a term is included in order to<br />
decide on the strength of the term with regard to a specific category. Further, when generating<br />
the rules, the developers may take into account the specific position of terms in a news story,<br />
and the length of the document may also be considered. 674 rules were created in order<br />
to meet the needs of the document collections at Reuters. Hayes & Weinstein (1990)<br />
report an evaluation in their presentation of the system. However, due to the very<br />
concise presentation we will not go further into the results here. Further, we have not<br />
been able to locate supplementary studies performing more systematic evaluations of<br />
the system. However, the authors estimate the savings from introducing the system<br />
to be approximately $752,000 in the first full year of deployment, despite the expense<br />
of 6.5 person-years for the development of the system.<br />
8 Here we see an example of inconsistent terminology. In section 5.4.1.5 we referred to Käki<br />
(e.g., Käki, 2005a), who applies the term categorization to denote an extracted type of automatic<br />
indexing. In the present section text categorization denotes an assigned type of automatic indexing.<br />
To avoid confusion we will distinguish between the two by referring to the former as categorization or<br />
extracted categorization and to the latter as text categorization or assigned categorization.<br />
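The two-step, rule-based procedure described for the Reuters system can be illustrated with a<br />
minimal sketch. It is not the system reported by Hayes & Weinstein (1990): the `Rule` fields, the<br />
weights, the threshold, and the example rules are all hypothetical and only show how rules may<br />
go beyond plain if-then matching by considering the context and position of a term.<br />

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    category: str
    concept: str              # single term or phrase to recognize
    context: tuple = ()       # terms that must co-occur somewhere in the story
    lead_only: bool = False   # only accept the concept in the opening of the story
    weight: float = 1.0       # strength of the concept for the category


def contains(concept, text):
    """True if the concept occurs as a whole word or phrase in the text."""
    return re.search(rf"\b{re.escape(concept)}\b", text) is not None


def categorize(text, rules, lead_words=30, threshold=1.0):
    """Step 1 recognizes concepts, step 2 applies the categorization rules."""
    words = re.findall(r"[a-z]+", text.lower())
    full = " ".join(words)
    lead = " ".join(words[:lead_words])
    scores = {}
    for rule in rules:
        scope = lead if rule.lead_only else full
        if not contains(rule.concept, scope):
            continue
        if not all(contains(term, full) for term in rule.context):
            continue
        scores[rule.category] = scores.get(rule.category, 0.0) + rule.weight
    return sorted(c for c, s in scores.items() if s >= threshold)


# Invented rules for illustration only.
RULES = [
    Rule("gold", "gold", context=("ounce",)),
    Rule("gold", "bullion", weight=0.5),
    Rule("mergers", "takeover bid", lead_only=True),
]

story = "Takeover bid lifts miner. Gold rose to 400 dollars an ounce after the offer."
print(categorize(story, RULES))
```

In this sketch a story mentioning gold without the context term is not assigned to the category,<br />
which mirrors the idea that context decides the strength of a term.<br />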
The American Petroleum Institute also exemplifies the knowledge engineering<br />
approach. Here, the document collection consists of abstracts containing detailed<br />
technical information. The units of analysis were abstracts that were subjected to<br />
stemming and analysis at phrase level due to the large proportion of phrases within the<br />
chemical domain. The set of rules was to a large extent built around the thesaurus<br />
applied to the collection. For instance, cross references in the thesaurus functioned as<br />
rules pointing natural language terms to the preferred terms in the thesaurus (Martinez,<br />
Lucey & Linder, 1987). As in Hayes & Weinstein’s (1990) study, the evaluation in<br />
this study is limited. In the paper by Martinez et al. the performance of the<br />
knowledge engineering approach is evaluated in terms of percentages of hits and noise. The<br />
evaluation reports a hit rate of approximately 50%, which cannot be considered impressive. In<br />
addition, noise is reported to comprise about 15% of the retrieved documents.<br />
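As a rough illustration of how thesaurus cross references can act as rules, the sketch below maps<br />
natural language terms found in an abstract to preferred terms. The cross references are invented,<br />
and stemming is left out for brevity; the actual rule base of the American Petroleum Institute is<br />
not reproduced here.<br />

```python
import re

# Hypothetical cross references: natural language entry term -> preferred term.
USE_REFERENCES = {
    "gasoline": "motor fuel",
    "petrol": "motor fuel",
    "catalytic cracking": "cracking",
}


def assign_preferred_terms(abstract, use_references):
    """Assign the preferred term for every entry term or phrase found in the abstract."""
    text = abstract.lower()
    assigned = set()
    for entry, preferred in use_references.items():
        if re.search(rf"\b{re.escape(entry)}\b", text):
            assigned.add(preferred)
    return sorted(assigned)


print(assign_preferred_terms(
    "Petrol and gasoline yields from catalytic cracking.", USE_REFERENCES))
```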
The manual and at times rigid elaboration of rules in the knowledge<br />
engineering approach has turned out to be expensive and time consuming. To illustrate,<br />
the solution of the American Petroleum Institute contained about 14,000 rules<br />
(Martinez, Lucey & Linder, 1987, p. 162). In addition, the preparation of term profiles<br />
has been somewhat challenging. Further, deciding on the relation between document<br />
terms and controlled terms has led to weak results in early studies (Apté, Damerau<br />
& Weiss, 1994; Lancaster, 2003, p. 288). As a consequence of these challenges,<br />
alternative solutions were sought, and the machine learning approach emerged during<br />
the 1990s (Sebastiani, 1999; 2002). In the machine learning approach a classifier is also<br />
built for each category. Essentially, the process consists of three stages. First, a<br />
number of documents are categorized manually into a set of predefined categories. The<br />
selected documents serve as training documents. Preferably, the training<br />
documents already exist in the collection to be classified. Alternatively, artificial<br />
documents may be constructed. Next, a classifier is constructed on the basis of<br />
characteristics of the training documents. A learner forms the basis for building the<br />
classifier. The learner will usually be available in advance. If a learner does not exist,<br />
some effort must be put into constructing one, since the learner to a large extent<br />
determines the ultimate effectiveness. It is in this second stage of the categorization process that<br />
the manual production of rules in the knowledge engineering approach is replaced by<br />
machine learning. A number of techniques exist for building the classifier. Examples<br />
include multivariate regression models, nearest neighbour classifiers, probabilistic<br />
Bayesian models, neural networks, symbolic rule learning, and Support Vector<br />
Machines (Dumais et al., 1998). Explaining each technique in detail is beyond the<br />
scope of the present work, but thorough reviews can be found in Dietterich (1997) and<br />
Kotsiantis, Zaharakis & Pintelas (2006). The third and final stage of the machine<br />
learning approach to categorization consists of applying the classifier to the full<br />
collection of documents (cf. Sebastiani, 2002; Golub, 2006, pp. 352-353).<br />
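The three stages described above can be sketched with a small multinomial naive Bayes learner,<br />
one of the probabilistic Bayesian techniques mentioned. The training documents and category<br />
names are invented for illustration, and the learner is a deliberately simple assumption rather than<br />
the method of any of the cited studies.<br />

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def train(labelled_docs):
    """Stage 2: the learner builds a classifier from the training documents."""
    doc_counts = Counter()
    term_counts = defaultdict(Counter)
    vocabulary = set()
    for text, category in labelled_docs:
        doc_counts[category] += 1
        for term in tokenize(text):
            term_counts[category][term] += 1
            vocabulary.add(term)
    return doc_counts, term_counts, vocabulary


def score(model, text):
    """Log-probability per category, using Laplace smoothing."""
    doc_counts, term_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    terms = [t for t in tokenize(text) if t in vocabulary]
    scores = {}
    for category in doc_counts:
        total_terms = sum(term_counts[category].values())
        log_prob = math.log(doc_counts[category] / total_docs)
        for term in terms:
            count = term_counts[category][term]
            log_prob += math.log((count + 1) / (total_terms + len(vocabulary)))
        scores[category] = log_prob
    return scores


def classify(model, text):
    scores = score(model, text)
    return max(scores, key=scores.get)


# Stage 1: manually categorized training documents (invented examples).
TRAINING = [
    ("income tax return deadline", "tax"),
    ("tax deduction for commuting", "tax"),
    ("customs duty on imported goods", "customs"),
    ("import declaration and customs clearance", "customs"),
]
model = train(TRAINING)

# Stage 3: apply the classifier to documents from the full collection.
print(classify(model, "deadline for the tax return"))
```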
Text categorization is characterized by being either hard or ranked. Hard text<br />
categorization basically denotes a fully automated procedure, while ranked text<br />
categorization involves approval by a human indexer (Sebastiani, 2002). Thus, ranked<br />
text categorization is basically a semiautomatic approach, which will be explored<br />
further in section 5.5.<br />
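The difference between the two modes can be sketched as follows. The scores, the threshold, and<br />
the `approve` callback standing in for the human indexer are assumptions made for the example.<br />

```python
def hard_categorization(scores, threshold=0.5):
    """Fully automatic: assign every category whose score clears the threshold."""
    return sorted(c for c, s in scores.items() if s >= threshold)


def ranked_categorization(scores, approve, top_n=3):
    """Semiautomatic: present the top-ranked categories and keep those approved."""
    ranking = sorted(scores, key=scores.get, reverse=True)[:top_n]
    return [c for c in ranking if approve(c)]


# Hypothetical classifier scores for one document.
scores = {"tax": 0.81, "customs": 0.47, "vehicles": 0.12}

hard = hard_categorization(scores)
# The lambda simulates an indexer who accepts two of the three suggestions.
ranked = ranked_categorization(scores, approve=lambda c: c in {"tax", "customs"})
print(hard, ranked)
```

Here the hard procedure assigns only the category above the threshold, whereas the ranked<br />
procedure lets the indexer also accept a lower-ranked category.<br />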
Machine learning as an approach to categorization has been thoroughly tested<br />
in different studies (see e.g., Cunningham, Littin & Witten, 1997). The tests have<br />
investigated one or several of the techniques mentioned above from a system-oriented<br />
perspective. Core examples include Apté, Damerau & Weiss (1994), Chen<br />
(1995), and Dumais et al. (1998). However, in the present work we are concerned with<br />
the usefulness of automatic indexing from a user perspective. Therefore, our review<br />
below will include studies that have incorporated users in their evaluation. The<br />
aim is to establish a frame of reference for the results found in the search test.<br />
Some authors have investigated particular government Web pages. The<br />
GovStat Project (http://www.ils.unc.edu/govstat) has given rise to a number of studies<br />
relevant to the topic of the present section. The project is concerned with a specific kind<br />
of information within the public domain: US governmental statistical information.<br />
The main project focus is on user access to and use of governmental statistical<br />
information, which is why some of the studies provide valuable insight into the domain<br />
of e-government. We present three studies from the GovStat project below,<br />
namely the studies by Efron et al. (2004) and Kules & Shneiderman (2004; 2005). We<br />
finish this section with a review of Roitblat, Kershaw & Oot (2010).<br />
Efron et al. (2004) have carried out a study of machine learning within the<br />
context of the GovStat project. The purpose of the study was to compare three<br />
representations of documents: keywords, title, and the full text of the documents. The<br />
study consisted of two phases. The first phase clustered 1279 content-rich documents<br />
using k-means. The clusters were generated on the basis of either the full text of the<br />
documents, the documents’ titles, or human-generated keyword<br />
metadata. Each document could appear in only one cluster. The purpose was to identify<br />
the topics of the documents. The quality of the three approaches was evaluated as part of<br />
the first phase. Ten categories were generated on the basis of phase 1.<br />
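For readers unfamiliar with k-means, the sketch below shows the basic procedure on toy data. It<br />
assumes that each document has already been turned into a vector of term counts; the vectors and<br />
the simple initialisation are invented and do not reflect the setup used by Efron et al. (2004).<br />

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means: every point is assigned to exactly one cluster."""
    # Naive deterministic initialisation: the first k points act as centroids.
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment


# Toy vectors: (count of term A, count of term B) for six documents.
points = [(3, 0), (0, 3), (4, 1), (1, 4), (5, 0), (0, 5)]
print(kmeans(points, k=2))
```

Because each point receives a single cluster index, the sketch reflects the constraint that a<br />
document appears in only one cluster.<br />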
In the second phase, the remaining 14,000 documents of the collection were<br />
labelled using automatic classification. Ahead of the classification of documents, a<br />
classifier had been trained on the basis of the topics identified in phase 1. Four models<br />
formed the basis for the classifier: probabilistic Rocchio, naive Bayes (below: limited<br />
model), support vector machines, and an augmented model that applied naive Bayes to<br />
a training set extended with supplementary documents from the web domain in<br />
question (below: extended model). Based on an analysis of the accuracy of the four<br />
models’ classification, the second phase compared the two versions of the naive Bayes<br />
classifier. Eleven human judges working on the GovStat project tested the generality of the<br />
two remaining classifiers.<br />
The analysis demonstrated that if the success of the classifiers is measured<br />
by their ability to classify documents correctly in either first or second place, the extended<br />
model performs better than the limited model. However, if success is measured by<br />
the two models’ ability to classify documents correctly the first time, the limited model<br />
performs better. Further, when compared to human assignments to the classes,<br />
naive Bayes tends to have a more even distribution of documents between the classes.<br />
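The two success measures can be made concrete with a small sketch. The rankings and labels<br />
below are invented solely to show how one model can lead on first-place accuracy while another<br />
leads when second-place hits are also counted; they are not data from Efron et al. (2004).<br />

```python
def accuracy_at(rankings, gold, n):
    """Share of documents whose correct class is among the top n ranked classes."""
    hits = sum(1 for ranked, label in zip(rankings, gold) if label in ranked[:n])
    return hits / len(gold)


# Invented judgements for four documents.
gold = ["health", "labour", "labour", "labour"]

# Invented ranked output of two hypothetical classifiers.
limited = [["health", "labour"], ["labour", "energy"],
           ["energy", "health"], ["health", "energy"]]
extended = [["labour", "health"], ["labour", "health"],
            ["health", "labour"], ["energy", "labour"]]

for name, rankings in (("limited", limited), ("extended", extended)):
    print(name, accuracy_at(rankings, gold, 1), accuracy_at(rankings, gold, 2))
```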
5.5 Hybrid types of intellectual and automatic indexing<br />
Above we have presented what are considered prototypical approaches to<br />
automatic indexing. In reality, however, hybrid forms of indexing also<br />
appear. Computer assisted manual indexing refers to the process where elements of<br />
manual and automatic methods are combined in order to handle indexing. In the<br />
literature, computer assisted manual indexing may also be referred to as computer aided<br />
indexing (e.g., Lancaster, 2003), semiautomatic indexing (e.g., Fangmeyer, 1974),<br />
machine aided indexing (e.g., Milstead, 1992), or simply MAI.<br />
Two basic approaches to computer assisted manual indexing exist. One<br />
approach is labelled candidate term systems. In essence, candidate term systems<br />
suggest terms for assignment that are subsequently approved by human indexers<br />
(Milstead, 1994, p. 579; Lancaster, 2003, p. 292). This kind of MAI is represented in a<br />
system for indexing at NASA, the NASA Lexical Dictionary (NLD) (Silvester,<br />
Genuardi & Klingbiel, 1994). In NLD an automatic indexing procedure is carried out,<br />
which subsequently presents controlled index terms to indexers for manual approval.<br />
The indexing system was tested for its effect on the indexing process of human<br />
indexers. One result of the implementation of MAI was that the average number of<br />
index terms assigned to documents was reduced, resulting in increased<br />
uniformity in the indexing. Also, the majority of the indexers were able to save<br />
time due to the suggestions for index terms provided by the system. This corresponds<br />
to the results of a later study by Berrios, Cucina & Fagan (2002). They found that<br />
the number of indexed documents increased along with the degree of automation in the<br />
test system. Lastly, Silvester & Klingbiel’s (1993) work indicated that the quality of the<br />
selection of index terms had improved, since the indexers did not need to spend their time<br />
looking up terms in the controlled indexing language.<br />
The second approach supplements human indexing by means of some sort of<br />
automatic procedure. Here, indexing terms assigned by humans (or similar human<br />
inputs) are taken as the point of departure for a subsequent addition of index terms (Milstead,<br />
1994, pp. 579-80; Lancaster, 2003, p. 291). In this sense, the approach corresponds to<br />
text categorization mentioned above. When text categorization takes the form of<br />
semiautomatic indexing, a ranked ordering of potentially relevant categories is presented<br />
to the indexer for approval (Sebastiani, 2002). One example of this approach is the<br />
MedIndEx project presented by Humphrey (1989). The project has been carried out<br />
within the National Library of Medicine and is based on Medical Subject Headings<br />
(MeSH) and the literature found in Medline. In MedIndEx a detailed system of<br />
predefined frames, facts, and rules guides the automatic analyses of documents in the<br />
system. These tools form the human input to the automatic part of the indexing<br />
procedure. However, though mentioned as an example of supplementing human<br />
indexing by Milstead (1994, p. 580), MedIndEx also shares some characteristics<br />
with candidate term systems by involving indexers to approve or reject the suggestions<br />
provided after automatic procedures have been carried out.<br />
Hodge (1994) characterizes MAI along a continuum according to the degree of<br />
support provided by an indexing aid, ranging from no computer support (basically<br />
manual indexing) to fully automatic indexing. At the lowest level of machine<br />
support, we find support for clerical activities. Examples are location of index terms and<br />
entry of terms in machine-readable form. Tools for this type of support comprise<br />
thesauri and other kinds of controlled indexing languages. Next follows support for<br />
quality control. The quality control may take different forms. In general this type of<br />
support checks the manual input of indexers, ranging from spelling corrections to<br />
suggestions for candidate preferred terms in the case of invalid terms. The last step of<br />
the continuum also supports intellectual activities regarding the selection of terms.<br />
One way would be to prompt the indexer for index terms, e.g. in relation to other terms<br />
entered by the indexer. Another would be to remind the indexer of required<br />
elements in the indexing process. Basically, semiautomatic indexing can be useful in<br />
the introductory stages of fully automatic indexing for a manual review of suggested<br />
index terms. Also, the hybrid between manual and automatic indexing can be applied<br />
with the purpose of enhancing manual indexing.<br />
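The quality control level of the continuum can be illustrated with a small sketch that validates an<br />
indexer's input against a controlled vocabulary and suggests candidate preferred terms for invalid<br />
entries. The vocabulary and the similarity cutoff are assumptions; the sketch relies on approximate<br />
string matching from the Python standard library rather than on any system described by Hodge.<br />

```python
import difflib

# Hypothetical controlled vocabulary.
CONTROLLED_TERMS = [
    "income tax",
    "customs duty",
    "vehicle registration",
    "value added tax",
]


def check_term(term, vocabulary, max_suggestions=3):
    """Return (valid_term, suggestions); valid_term is None for invalid input."""
    normalized = " ".join(term.lower().split())
    if normalized in vocabulary:
        return normalized, []
    suggestions = difflib.get_close_matches(
        normalized, vocabulary, n=max_suggestions, cutoff=0.6)
    return None, suggestions


print(check_term("Income  Tax", CONTROLLED_TERMS))
print(check_term("custom duty", CONTROLLED_TERMS))
```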
5.6 Summary<br />
The present chapter has presented the concept and process of indexing. What<br />
we have seen is a number of ways to characterize indexing. The range of variables<br />
outlined throughout the chapter stresses a basic premise of empirical evaluations of<br />
indexing, namely the challenge of controlling variables.<br />
We have outlined the characteristics of manual and automatic indexing, as well as<br />
hybrid types of indexing. We have seen that irrespective of the type of indexing carried<br />
out, pros and cons can be identified. Automatic and semiautomatic methods for indexing<br />
have been tested in a variety of settings. The introduction of automatic, extracted<br />
methods has allowed for an automatic production of controlled indexing, which is<br />
highly desirable given the amounts of documents produced today. Automatic, extracted<br />
indexing allows for a controlled indexing free of the consistency challenges of<br />
manual indexing.<br />
As appears from the reviews in the automatic part of the chapter, the Web or<br />
parts of it have been the subject of investigation in many studies. In the search test of<br />
the present Ph.D. study we investigate the applicability of automatic, assigned indexing in a<br />
particular test setting, namely an intranet. This implies a considerably smaller number of<br />
documents compared to the Web. Further, automated approaches have been tested on<br />
e-government subfields with promising results. However, we do not have knowledge of<br />
studies testing the methods in a collection of documents embracing the entire range of<br />
document types applied in e-government. This will be the aim of the search test that<br />
follows.<br />
6 Empirical framework<br />
We have previously presented the methodological standpoint of the thesis. In the<br />
present chapter, we move on to report on the empirical methods applied in the two<br />
studies constituting the research project: the domain study and the search test. In the<br />
presentation we follow the sequence in which the individual studies were actually carried out.<br />
This means that we begin with the domain study, including questionnaire and focus<br />
group interview designs. Next follows the design of the search test. The chapter<br />
closes with a section explaining the relation between the collected empirical data and the<br />
research questions put forward in the introductory chapter.<br />
6.1 Domain study<br />
The case study is initiated by a domain study. As explained, the purpose of<br />
the domain study is to identify and account for the contextual framework of the search<br />
test as regards the e-government domain. The domain study consists of two separate<br />
parts: an analytical part and an empirical part. The analytical part has been reported in the<br />
information seeking review (Chapter 4). The empirical part comprises two elements: a<br />
survey questionnaire followed by focus group interviews. To distinguish between them,<br />
we will use the term respondent to denote a questionnaire participant and the term<br />
participant to denote a focus group participant in the remainder of the thesis.<br />
The aim of combining different types of data collection methods is to be able to<br />
compensate for the inherent limitations of the individual methods. Thus, the<br />
weaknesses and the strengths of methods have an effect on the outcome of the data<br />
collection. The combination of different research methods in order to explore a research<br />
problem is commonly known as method triangulation. The order and types of methods<br />
applied for triangulation may vary. Miles & Huberman have identified four overall<br />
successions of research methods. The succession may start out with quantitative<br />
methods followed by qualitative methods, start out with qualitative methods followed by<br />
quantitative methods, or employ both methods in a parallel manner. The choice of<br />
succession depends on the purpose of the study carried out (Miles & Huberman, 1994,<br />
p. 41). In the domain study we followed the first type of succession, quantitative data<br />
followed by qualitative. This succession helps the researcher gain an overview of<br />
important phenomena in the first part of the data collection (in the present work through<br />
a questionnaire). The quantitative data collection is subsequently followed up by<br />
qualitative data collection (in the present work through focus group interviews) to<br />
provide insight into and understanding of the patterns identified in the quantitative data.<br />
The survey questionnaire was distributed to a sample of the employees in the<br />
case organization. The purpose of the questionnaire was to gain insight into the<br />
distribution of work tasks, information needs and metadata preferences across the case<br />
organization. Further, we wanted to investigate whether there was a dependency<br />
between work tasks and seeking behaviour, as it could influence the choice of test<br />
persons for the search test. Questionnaires are associated with both advantages and<br />
disadvantages. Among the strengths, the costs are lower compared to other<br />
methods, as the researcher is not present during data collection.<br />
Also, the analysis of data is less time consuming (Frankfort-Nachmias &<br />
Nachmias, 1996). This is particularly the case when using Kalus (see section 6.2.1) for<br />
data collection, due to the feature in the system allowing the researcher to extract the<br />
results of the survey directly into an Excel spreadsheet, which can subsequently be<br />
imported into analysis software, e.g., SPSS. Further, bias may be reduced due to the<br />
lack of interaction between interviewer and interviewee, and due to the high degree of<br />
anonymity (Frankfort-Nachmias & Nachmias, 1996). In both cases the reduction of bias is<br />
ascribed to the absence of an interviewer. The presence of an interviewer may result in<br />
poor communication between interviewer and interviewee. Also, the skills of an<br />
interviewer may influence the results when conducting interviews. The presence of an<br />
interviewer can also make the answers of the respondent less honest, because<br />
the respondent’s feeling of anonymity is low. However, questionnaires also have<br />
weaknesses. In questionnaires it is highly important that questions are understandable<br />
to the respondents, since the researcher is not present to explain the meaning of<br />
questions to the respondents. In this manner the absence of the researcher<br />
becomes a strength and a weakness of questionnaires at the same time. Thus, in<br />
questionnaires, the importance of understandable and unambiguous questions cannot be<br />
emphasized enough. Further, a common problem in questionnaires is low response<br />
rates (Frankfort-Nachmias & Nachmias, 1996).<br />
The second and qualitative part of the domain study consisted of seven focus<br />
group interviews. Focus groups are associated with a number of characteristics<br />
implying strengths or weaknesses in terms of their function as a tool for data collection.<br />
A distinctive feature of focus groups is the synergy arising from the interaction between<br />
the participants. This is a strength when it results in a more thorough discussion than<br />
can be achieved in an individual interview. On the other hand, the presence of other<br />
participants may cause censoring and conforming among the participants, which is not<br />
desirable (Carey & Smith, 1994, p. 124). Having several interviewees present at the same<br />
time further enables the interviewer to ask participants to compare experiences and<br />
understandings, which in turn enriches the understanding of the individual in the group<br />
(Morgan, 1996, p. 139). In quantitative terms, the method provides data quickly<br />
and at lower costs compared to individual interviews (Walden, 2006, p. 224).<br />
The combination of these features made us choose focus group interviews over<br />
individual interviews for the qualitative exploration of the survey results.<br />
The focal point of the focus group interviews was the results of the<br />
questionnaire. Thus, we wanted to present the participants with the patterns identified in<br />
the questionnaire results in order to encourage elaboration and discussion in the group.<br />
In the sections that follow, we elaborate on the methods applied for the domain study.<br />
6.2 Questionnaire design, collection, and analysis<br />
A questionnaire was used as the quantitative part of the domain study to get an<br />
overall view of the distribution of employees, work tasks, and seeking behaviour across<br />
the case organization. The collection of data for the survey lasted one week and took<br />
place between 11 and 18 December 2008. We kept a rather short time window for<br />
the investigation, because we hypothesized that most people, if they respond to a<br />
questionnaire, respond rather quickly, while they still remember having received an<br />
invitation to participate. As expected, the majority of responses were received within<br />
the first two days after the launch of the survey. An invitation to participate was<br />
distributed by e-mail (see Appendix 3) to a stratified sample of the employees. The e-mail<br />
explained the background of the investigation. Following a link in the e-mail, the<br />
respondents were taken to the online survey. After 5 days an e-mail was sent to remind<br />
the potential respondents of the survey. We settled for one reminder in order to avoid<br />
annoying the respondents (cf. Cook, Heath & Thompson, 2000, p. 831).<br />
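To illustrate what drawing a stratified sample involves, the sketch below selects the same fraction<br />
of employees from each stratum. The strata, the sampling fraction, and the employee records are<br />
hypothetical; the strata actually used in the survey are not specified in this section.<br />

```python
import random
from collections import defaultdict


def stratified_sample(employees, stratum_of, fraction, seed=0):
    """Draw the same fraction of employees from every stratum."""
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    strata = defaultdict(list)
    for employee in employees:
        strata[stratum_of(employee)].append(employee)
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample


# Invented employee records: (identifier, stratum).
employees = [(f"e{i}", "assessment") for i in range(20)]
employees += [(f"e{i}", "collection") for i in range(20, 30)]

sample = stratified_sample(employees, stratum_of=lambda e: e[1], fraction=0.2)
print(len(sample))
```

Sampling within each stratum ensures that small groups are represented in proportion to their<br />
size, which a simple random draw does not guarantee.<br />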
In Chapter 2 we introduced SKAT's business model, which comprises and<br />
describes all work tasks handled by the employees, external as well as internal (see<br />
Appendix 1). The condensed work tasks and the main processes of the business model<br />
have formed the basis for the questionnaire and the recruitment for the focus groups<br />
respectively. As emphasized earlier, the diversity of tasks handled by SKAT is large as<br />
regards the topics and the form of the tasks. In response to this, we hypothesized<br />
that different work tasks might generate differences in seeking behaviour. To test<br />
this hypothesis, we used the work tasks from the business model as the focal point in the<br />
questionnaire. Thus, each respondent answered clarifying questions about work tasks<br />
relevant to them. The questionnaire consists of two overall parts: one common to all<br />
respondents, identifying background data, and a second part identifying the work tasks<br />
handled by the respondents. In the second part of the questionnaire, the respondents<br />
answered a number of questions exploring seeking behaviour triggered by their work<br />
tasks. We will elaborate further on this below.<br />
6.2.1 Technique and structure<br />
The questionnaire was developed using the software Kalus<br />
(http://www.kalus.dk). The questionnaire mainly consisted of pre-coded (or closed-choice)<br />
questions, which have a finite range of answers for the respondent to choose from<br />
when responding to a question. The questionnaire also contains a few examples of<br />
open-ended questions. Open-ended questions were included in order to allow for<br />
clarification or supplementation of a prior pre-coded question. The strength of using pre-coded<br />
questions is that responses are not subject to potential misinterpretation before<br />
they can be compared and calculated. This is the main reason for the prevailing role of<br />
these particular questions in the questionnaire. However, it should be kept<br />
in mind that a common problem with this type of question is the missing possibility<br />
for respondents to elaborate on their answers (Buckingham & Saunders, 2004; de Vaus,<br />
2002b). The choice of primarily pre-coded questions and mandatory answers was<br />
closely tied to the purpose of the questionnaire and its relation to the overall research<br />
questions. The overall purpose of the questionnaire was to provide an overview of<br />
the distribution across the organization as to work tasks and information seeking. The<br />
more detailed elaboration and explanation of the questionnaire results were to be<br />
investigated in the focus groups. With this in mind, it was reasonable to support the<br />
overview function of the questionnaire by mainly pre-coded questions and prompted<br />
answers. When applying this approach to a questionnaire, pilot testing becomes<br />
even more important (de Vaus, 2002b). Further, research shows that the wording of<br />
questions may have an impact on the outcome of surveys (e.g., Olsen, 1997). The<br />
introduction to a question also affects how the question is answered (Clark & Schober, 1992).<br />
When designing the questionnaire, we wanted to take into account the considerable<br />
sensitivity towards use of language that respondents have. One way of doing this is to<br />
aim at making the questions as precise as possible, for instance by using probes or<br />
incorporating cognitive reliefs into the questions (Olsen, 1997, p. 300). What we<br />
wanted to achieve was to reduce the degree of uncertainty by elaborating on the<br />
questions and the possibilities for replies from the respondents. The wording of the<br />
questionnaire was tested ahead of the data collection. We elaborate on the pilot and<br />
pretesting in section 6.2.4.<br />
Contingency questions were used to direct respondents to questions relevant to<br />
them (de Vaus, 2002b). In the questionnaire, all questions about work tasks worked as<br />
contingency questions in order to guide the respondents to the questions relevant to the<br />
work task in question. The purpose was to make sure respondents only reported seeking<br />
behaviour regarding their actual work tasks. Further, contingency questions increase<br />
the likelihood that the respondent finishes the survey, as the cognitive complexity is<br />
reduced (Shropshire, Hawdon & Witte, 2009, p. 356). In most questions we prompted<br />
for answers. This choice is open to discussion. Optional answers have the advantage<br />
of not forcing the respondent to respond to a question. At the same time, optional<br />
answers tend to be skipped when respondents work through the questionnaire (cf.<br />
Evans & Mathur, 2005, p. 200). Prompted answers, on the other hand, may cause<br />
respondents to give up answering the questionnaire before finishing. After careful<br />
consideration we chose to prompt for answers in order to make sure that the important<br />
questions were answered and to avoid having to leave out too many answers due to<br />
incompleteness.<br />
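The routing performed by contingency questions can be illustrated with a minimal sketch. It is not the logic of the survey software used in the study; the task identifiers and the function name `route_respondent` are hypothetical, and the six follow-up questions per work task are those described in section 6.2.2.3.

```python
# Hypothetical labels for the six follow-up questions asked per work task.
FOLLOW_UP_QUESTIONS = [
    "frequency of work task",
    "experience with work task",
    "frequency of information seeking",
    "information sources",
    "information needs",
    "preferred metadata",
]

# 19 condensed work tasks; the identifiers are placeholders, not SKAT's names.
ALL_TASKS = [f"work_task_{i}" for i in range(1, 20)]


def route_respondent(selected_tasks, all_tasks=ALL_TASKS):
    """Return the (work task, question) pairs shown to one respondent.

    Follow-up questions are only shown for work tasks the respondent has
    selected; the pages of all other work tasks are skipped.
    """
    return [
        (task, question)
        for task in all_tasks              # keep the questionnaire order
        if task in selected_tasks          # the contingency filter
        for question in FOLLOW_UP_QUESTIONS
    ]


route = route_respondent({"work_task_3", "work_task_7"})
print(len(route))  # 12: two selected work tasks times six follow-up questions
```

A respondent who selects two work tasks is thus presented with twelve follow-up questions instead of the full set, which is the reduction in complexity the contingency design aims for.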
6.2.2 Content<br />
The questionnaire contains six questions for each work task. The six questions<br />
are replicated for all of the nineteen condensed work tasks included in the questionnaire,<br />
resulting in a 75-page questionnaire. Due to the contingency character of the questions<br />
regarding work tasks, not all pages were presented to the respondents. Before<br />
elaborating on the work tasks, the respondents were asked for a number<br />
of background data. The questionnaire finished by thanking the respondents for their<br />
participation, time and contribution.<br />
6.2.2.1 Background data<br />
The questionnaire was initiated by a number of demographic questions. We<br />
refer to these as background data. The questions cover the respondents’:<br />
year of birth (1a),<br />
gender (1b),<br />
most recently completed education (2a),<br />
title of education (2b),<br />
place of employment (3a),<br />
departmental affiliation (4-10), and<br />
length of service in the organization (11).<br />
(The numbers in parentheses refer to the question numbers, see Appendix 4.)<br />
The purpose of the questions is to enable testing for the impact of demographic<br />
characteristics on seeking behaviour. Further, the background data are needed in order to<br />
control for the degree to which the sample reflects the population it has been drawn<br />
from.<br />
6.2.2.2 Work tasks<br />
Ahead of the design of the questionnaire, we hypothesized that seeking<br />
behaviour could differ as to the work task in question, both when it comes to the subject<br />
and in particular the complexity of the work task. The assumptions were based on<br />
Byström’s findings of the correspondence between work task complexity and seeking<br />
behaviour (e.g., reported in Byström & Järvelin, 1995; Byström, 1997) (see section<br />
4.4.5). The generic work tasks described by SKAT do not address the complexity in<br />
Byström’s terms as such. Rather, they are topical descriptions of the areas of<br />
responsibility of the organization. Despite the difference in definitions, SKAT's<br />
descriptions of work tasks were used as the basic foundation of the questionnaire.<br />
Further, in combination with the information need types (see section 6.2.2.3) we do<br />
get an impression of the complexity of the work task. This decision had several<br />
reasons. The large diversity of tasks, which has been mentioned previously, is a core<br />
characteristic of the organization. Identifying information seeking characteristics on the<br />
basis of work tasks allowed for data that could inform us about potential (and expected)<br />
differences in seeking behaviour among the work tasks. We needed this knowledge for<br />
two purposes. The main purpose was to be able to answer the research questions<br />
concerned with seeking behaviour. Secondly, we wanted to use the data for the<br />
selection of test persons for the search test. For this secondary purpose we wanted to<br />
explore the use of the intranet for different work purposes. Specifically, we wanted to<br />
identify potential variations in the intensity of use of the intranet. Lastly, the work task<br />
descriptions allowed for a standardized framework of the work areas covered by SKAT<br />
in a language familiar to the respondents. The work tasks are represented on pages 12,<br />
16, 38, 45, 49, and 65 in the questionnaire (see Appendix 4). In total, 19 work tasks are<br />
included, distributed among six main processes. Probes were considered particularly<br />
important in the questions regarding work tasks. Since the selection or<br />
deselection of work tasks is highly influential on the results, it was important that the<br />
respondents were able to identify their work tasks in the generic descriptions. For that<br />
reason we used the probe to give examples of the subtasks contained in the overall work<br />
task.<br />
6.2.2.3 Elaboration of work tasks<br />
The work tasks were elaborated on through six questions. The first, frequency<br />
of work tasks, was considered relevant in order to explore the relation between work<br />
tasks and information seeking (see question 15, Appendix 4). For the case organization,<br />
this question was particularly relevant due to the share of work tasks that are highly<br />
seasonal. The frequency also allowed insight into the relative importance of the work<br />
task in question, and thereby made it possible to examine whether the frequency affects other<br />
aspects of seeking behaviour. Next followed the respondents’ experience with the work<br />
task (see question 16, Appendix 4). The question was included since experience was expected<br />
to have an influence on their seeking behaviour, e.g., as to selection of sources and<br />
frequency of information seeking.<br />
The third question regarded the frequency of information seeking (see question<br />
17, Appendix 4). The rationale for asking this question was that some of the work tasks<br />
might have a tendency to generate information seeking more often than others. By asking<br />
this question we wanted to explore whether the outlined work tasks differed in how<br />
information demanding they were. We defined the answer categories<br />
by their frequency (every time, every second time, every 3rd or 4th time, or<br />
practically never) instead of using less exact alternatives like almost always, sometimes,<br />
once in a while, and the like. We are aware that information seeking in practice does<br />
not occur at such fixed intervals as suggested in the listed answer categories, which<br />
may have confused the respondents. Further, the responses were expected to express<br />
average frequencies. On the other hand, we considered the alternative (e.g., often,<br />
rarely etc.) as too semantically open. The challenge of semantically open categories is<br />
the interpretation of results, which becomes less exact.<br />
Fourth came information sources (see question 18, Appendix 4). The selection<br />
of information sources reflects aspects of how a work task is dealt with. This was the<br />
main reason for including the question in the questionnaire. Further, the question about<br />
information sources had the purpose of identifying the relative importance of the<br />
intranet compared to other information sources. Finally, with this question, we wanted<br />
to identify whether there was a difference in the importance of the intranet depending on the<br />
work task in question. This could point to particular work tasks of relevance when<br />
identifying test persons for the search test. Being aware that the work tasks handled in<br />
the case organization are quite diverse, we listed some information sources but also<br />
allowed the respondents to add missing sources in an open field. This way we were<br />
able to get a comprehensive picture of the use of information sources without having to<br />
list too many sources that might not be relevant to the majority of respondents. The<br />
question was constructed in a way that allowed the respondents to choose the<br />
sources relevant to them, whether one or more. The question was measured in terms of<br />
dichotomous variables since this enables us to compare the results with prior results. In<br />
the light of the changes of direction in information seeking studies mentioned in section<br />
4.2, one could dispute the relevance of the information sources question in the<br />
questionnaire. On the other hand, seeking studies that include sources of information<br />
continue to find their legitimacy (e.g., Davies, 2007; Makri, Blandford & Cox, 2008a;<br />
Connaway, Dickey & Radford, 2011; Lu & Yuan, 2011). In addition, more recent<br />
studies and models of information seeking also involve the aspect of information sources,<br />
yet with another focal point (e.g., Byström & Järvelin, 1995; Byström, 1999). In the<br />
present study, investigating the use of sources had one significant reason: we wanted to<br />
map the intranet of the case organization, since it is the object of the evaluation of<br />
indexing methods later in the thesis. By mapping the intranet along with other<br />
information sources, we wanted to display the relative importance of the intranet as to<br />
its function, strengths, and weaknesses as experienced by the employees of the organization. A<br />
side effect of the question was the possibility of portraying the setting for information<br />
seeking in the organization.<br />
The fifth question measured the information needs that emerge when dealing<br />
with a work task (see question 19, Appendix 4). With the variable ‘information need’,<br />
we wanted to discover whether the identified work tasks of the organisation have a tendency<br />
to generate certain types of information needs. However, it may be complicated<br />
to represent the variable by the theoretical concepts themselves. This is particularly<br />
difficult when considering the problem of respondents’ sensitivity towards the<br />
formulation of questions discussed above. Therefore we represented the information<br />
needs with eight indicators of different information needs. The rationale is that it is<br />
easier for the respondents to relate to an indicator than to a theoretical concept.<br />
The decision about which theoretical basis to use for operationalization of the<br />
information needs was highly influenced by the method selected for data collection. An<br />
obvious choice would have been to use the recent proposal for types of information<br />
needs suggested by Ingwersen & Järvelin (2005). However, the proposal contains eight<br />
different types of information needs, which would be difficult to operationalize in a<br />
form understandable to the respondents. Instead we used the trichotomy suggested by<br />
Ingwersen (1992, pp. 116-117). Here the suggested information needs are: 1)<br />
verificative needs (VN), 2) conscious topical needs (CTN), and 3) muddled topical<br />
needs (MTN).<br />
Table 6.1 Indicators of information needs in questionnaire and corresponding theoretical descriptions<br />
Indicator | Description | Corresponding information need<br />
1 | I know exactly which documents I need in order to solve the work task | VN<br />
2 | I need to find a document I have used before | VN<br />
3 | I pretty much know which documents exist on the subject | CTN<br />
4 | I am working with a new project within a subject area well known to me. I would like to acquaint myself with the part that is new to me | MTN<br />
5 | I am looking for documents for a new work task within a subject area that is familiar to me | CTN<br />
6 | I am working with a subject area that I have not been working with before | MTN<br />
7 | I know the subject well but need a specific piece of information | CTN<br />
When a user has a verificative information need, he wants to locate an<br />
item or piece of information where some kind of bibliographic information is known.<br />
Conscious topical needs cover information needs in which the user wants to discover<br />
aspects of a subject matter known to her. Both verificative and conscious topical needs<br />
are associated with strong cognitive structures. That is, the uncertainty of the<br />
information user is low. The muddled topical needs cover a situation where a user<br />
wants to discover concepts and relations within a subject area not well known to him.<br />
In this latter type of information need the cognitive structures are weaker as to the topic<br />
in question. Ingwersen’s (1992) trichotomy of information needs allowed each of<br />
the distinct information needs to be represented by more than one indicator while at the<br />
same time not overloading the respondents with statements to relate to. Between two and<br />
three indicators were developed to represent each information need. We restricted the<br />
number of indicators due to the possible length of the questionnaire, in line with de<br />
Vaus’ (2002b) directions. The indicators used in the questionnaire are shown in Table<br />
6.1.<br />
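The operationalization can be sketched as a simple lookup from indicator to information need type. The mapping below is read from the indicators listed in Table 6.1; the function name `need_types` and the example answer are hypothetical and only illustrate how selected indicators could be aggregated into need types.

```python
from collections import Counter

# Indicator number -> information need type, as listed in Table 6.1.
INDICATOR_TO_NEED = {
    1: "VN",   # verificative need
    2: "VN",
    3: "CTN",  # conscious topical need
    4: "MTN",  # muddled topical need
    5: "CTN",
    6: "MTN",
    7: "CTN",
}


def need_types(selected_indicators):
    """Count how often each need type is implied by the selected indicators."""
    return Counter(INDICATOR_TO_NEED[i] for i in selected_indicators)


# Number of indicators representing each need type.
indicators_per_need = Counter(INDICATOR_TO_NEED.values())

# A hypothetical respondent ticking indicators 1, 3 and 7 for one work task.
example = need_types([1, 3, 7])
```

Each need type is represented by two or three of the listed indicators, in line with the description above.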
Table 6.2 List of respondents' preferred metadata listed in questionnaire<br />
Metadata<br />
1 Target audience (e.g. accountants, employers, divorced, exporters)<br />
2 Superior subjects (from the taxonomy)<br />
3 Subject (description of the specific topic of the document)<br />
4 Name/title of legal text/ruling (e.g. LBK no. 931 as of 18/09/2008)<br />
5 Object of the document (e.g. car, property, stays abroad)<br />
6 Activity (e.g. deposits, assessments, billing, imports)<br />
7 Geographical data (e.g. name of city, country, region)<br />
8 Responsible institution or department (who published the document?)<br />
9 Project (is the document connected to a specific project?)<br />
10 Document type (e.g. ruling, form, guidance)<br />
11 Document number (e.g., journal number, number of rulings, ISBN)<br />
12 Document ID (continuous number attached to documents at the intranet)<br />
13 Work task (searching for colleagues engaged in a particular service or task, regardless of location)<br />
The sixth and last question concerned preferred metadata (see question 20,<br />
Appendix 4). The overall rationale for asking the respondents about preferred metadata<br />
was to investigate the elements of an ideal search situation. Further, we wanted to use<br />
the results of this question to encourage the participants to explain their present<br />
searching behaviour when presenting the results to the focus groups. The question was<br />
designed as a list of 13 different metadata that the respondents could choose from. The<br />
metadata represented both intrinsic and extrinsic metadata, that is, metadata that can be<br />
found directly or indirectly in the document and metadata that designate something<br />
external but still relevant to the understanding of the document.<br />
Metadata are usually divided into three types depending on whether they refer to the<br />
content, the context, or the structure of the document (Gilliland, 2008). The thirteen<br />
metadata included were aimed at representing all three types (the included<br />
metadata are shown in Table 6.2). The question finished with the possibility to suggest<br />
metadata missing from the list.<br />
6.2.3 Data collection<br />
At the time of the launch of the questionnaire SKAT had 8679 employees, who<br />
comprised the population of the survey. We distributed the questionnaire to a sample of<br />
this population. A number of advantages are associated with sampling. One is the<br />
reduction of time and costs when collecting and analysing results. In addition, samples<br />
for the most part reflect the population sufficiently (Zikmund, 2000). In the present<br />
investigation a sample was preferred over including the whole population in order to reduce<br />
the amount of time spent on responding by the employees. The questionnaire was<br />
distributed to a stratified random sample of the employees within the organization<br />
(Levy & Lemeshow, 2008). The strata were constructed on the basis of the<br />
departmental affiliation of the employees. Within each stratum a random sample was<br />
drawn reflecting the relative size of the departments. The sample size was set to 799.<br />
In this way the sample was well above the size required for a precision of<br />
results of less than 5% (cf. Israel, 1992).<br />
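The sampling procedure can be illustrated with a minimal sketch under stated assumptions. The simplified formula n = N / (1 + N e^2) is one of the sample-size formulas presented by Israel (1992); whether exactly this formula was applied in the study is an assumption. The department names and sizes are hypothetical placeholders that merely sum to the population of 8679, and the function names are illustrative.

```python
import math
import random


def simplified_sample_size(population, precision):
    """Simplified formula n = N / (1 + N * e^2), rounded up."""
    return math.ceil(population / (1 + population * precision ** 2))


def proportional_allocation(strata_sizes, sample_size):
    """Split sample_size across strata in proportion to their size.

    Largest-remainder rounding makes the allocations sum to sample_size.
    """
    total = sum(strata_sizes.values())
    exact = {k: sample_size * v / total for k, v in strata_sizes.items()}
    alloc = {k: math.floor(x) for k, x in exact.items()}
    leftover = sample_size - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc


def draw_stratified_sample(strata_members, allocation, seed=0):
    """Draw a simple random sample without replacement within each stratum."""
    rng = random.Random(seed)
    return {k: rng.sample(m, allocation[k]) for k, m in strata_members.items()}


# Hypothetical department sizes (assumption); only the total of 8679 is from the text.
departments = {"dept_a": 4000, "dept_b": 3000, "dept_c": 1679}

minimum = simplified_sample_size(8679, 0.05)         # 383
allocation = proportional_allocation(departments, 799)
members = {k: list(range(n)) for k, n in departments.items()}
sample = draw_stratified_sample(members, allocation)
```

Under this formula a precision of 5% would require roughly 383 respondents for a population of 8679, so a sample of 799 lies well above that threshold, and the proportional allocation keeps the relative size of the departments in the sample.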
6.2.4 Pilot test<strong>in</strong>g<br />
In order to reduce the risk of errors (e.g., Buckingham & Saunders, 2004, p.<br />
84), the questionnaire was pilot- and pretested ahead of initiating the survey. The<br />
questionnaire was discussed with our contact person at SKAT and presented at a<br />
research meeting with colleagues at RSLIS. Next, a pilot was carried out among a<br />
number of SKAT employees. The selection criteria reflected the stratified sample, yet<br />
with fewer participants. The purpose of the pilot test was twofold. Firstly, we wanted<br />
to get an impression of the recipients’ perceived understanding of the questionnaire and<br />
allow for feedback on potential ambiguities. Secondly, we wanted an indication of what<br />
could be an expected response rate in the actual survey. This was needed in order to<br />
calculate the size of the sample. Further, we wanted to test the questionnaire on a group<br />
of people resembling the ones that would be answering the final version of the<br />
questionnaire, as recommended by de Vaus (2002b). The pilot questionnaire<br />
corresponded to the final questionnaire. However, in the pilot questionnaire text boxes<br />
had been inserted in order to welcome the pilot respondents’ comments on the<br />
questions. The pilot was distributed to 89 respondents. Of these, 29% finished the<br />
questionnaire (see Appendix 5 for further details on the pilot). The feedback from the<br />
pilot- and pre-tests was incorporated into the questionnaire before it was distributed to<br />
the respondents. Corrections comprised adding options, rewording probes,<br />
simplifying the layout, and the like.<br />
6.2.5 Data analysis<br />
The questionnaire data consisted of scales ranging from categorical to interval<br />
scale. The categorical data obviously appear when asking about the gender of the<br />
respondent. However, the questionnaire also contains a number of questions allowing<br />
the respondents to select one or more predefined answers, e.g., regarding information<br />
sources (question 18, see Appendix 4). In this case every choice also constitutes a<br />
categorical variable, implying that the variable either has been selected (=1) or<br />
deselected (=0) by the respondent. The amount of categorical data in the data set<br />
determines the analysis of the questionnaire data. Thus, we used nonparametric<br />
statistics to analyse the data, since parametric tests require data at<br />
interval level (Siegel & Castellan, 1988, p. 33). The questionnaire data was analysed<br />
using descriptive statistics. Inferential statistics were carried out too, but did not<br />
yield results of adequate significance to be reported here.<br />
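The dichotomous coding of multiple-choice answers can be sketched as follows. The source labels and the example answers are hypothetical and do not reproduce the options or data of question 18; the sketch only shows how each option becomes its own variable with the values 1 (selected) and 0 (deselected).

```python
# Hypothetical answer options for a multiple-choice question (assumption).
SOURCES = ["intranet", "colleagues", "internet", "printed material"]


def dichotomize(selected, options):
    """Code one respondent's selection as a 0/1 variable per option."""
    return {option: int(option in selected) for option in options}


# Hypothetical answers from three respondents.
responses = [
    {"intranet", "colleagues"},
    {"internet"},
    {"intranet"},
]

coded = [dichotomize(r, SOURCES) for r in responses]

# Summing a dichotomous variable gives the number of respondents selecting it.
counts = {option: sum(row[option] for row in coded) for option in SOURCES}
print(counts)
# {'intranet': 2, 'colleagues': 1, 'internet': 1, 'printed material': 0}
```

Because every option is coded separately, a respondent may contribute to several variables at once, which is why the option counts need not sum to the number of respondents.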
The descriptive, univariate analysis of the questionnaire data consists of<br />
frequency distributions as to the respondents and their seeking behaviour. Frequencies<br />
are usually reported as percentages, because they are easier to read than raw<br />
frequencies. Further, compared to raw frequencies the comparison of percentages is<br />
more distinct, because the figures have been normalized. However, the<br />
normalization rescales the frequencies to a base of 100. The smaller the sample is, the<br />
more impact the single unit gets when reporting results as percentages (Healey, 2007).<br />
A predominant part of the work tasks reported in the questionnaire part of the domain<br />
study have fewer than 100 answers. In order to avoid comparison of figures in the<br />
univariate part of the analysis that are not true to the actual responses within the single<br />
work task, we report the frequencies for all responses. Yet, raw frequencies are<br />
difficult to compare across two or more groups that do not have the same number of<br />
responses. With the comparisons of frequencies across work tasks in mind, we also<br />
report the percentages in the relevant tables.<br />
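The reason for reporting both raw frequencies and percentages can be illustrated with a small sketch. The counts are hypothetical and do not reproduce results from the survey; the function names are illustrative.

```python
def frequency_table(counts):
    """Return {category: (raw frequency, percentage)} for a single-choice question."""
    n = sum(counts.values())
    if n == 0:
        return {k: (0, 0.0) for k in counts}
    return {k: (c, round(100 * c / n, 1)) for k, c in counts.items()}


def weight_of_one_response(n):
    """Percentage points contributed by a single respondent in a group of size n."""
    return 100 / n


# Hypothetical answers from 20 respondents for one work task.
small_group = {
    "every time": 8,
    "every second time": 6,
    "every 3rd or 4th time": 4,
    "practically never": 2,
}

table = frequency_table(small_group)
print(table["every time"])            # (8, 40.0)
print(weight_of_one_response(20))     # 5.0
print(weight_of_one_response(200))    # 0.5
```

In a group of 20 answers a single respondent moves a category by five percentage points, against half a point in a group of 200, so the raw frequency next to each percentage shows how much weight a difference between work tasks can bear.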
Table 6.3 Cross tabulations carried out on the basis of variables in questionnaire data<br />
Independent variables (rows): Education; Department; Length of service; Periodicity of occurrence of work task; Experience with work task; Frequency of information seeking; Use of information sources<br />
Dependent variables (columns): Frequency of information seeking; Use of information sources; Indicators of information needs; Preferred metadata<br />
Further, whenever relevant, we provide the average percentages <strong>in</strong> the univariate<br />
statistics tables. Two rationales lie beh<strong>in</strong>d this decision. One reason is that some tables<br />
are rather comprehensive because of the number of reported values. In these cases the<br />
average percentages can help gain an overview of the content of the table. Also,<br />
average percentages can help clarify whether a certain work task lies above or below<br />
the average distribution. Whenever reported, the average percentages are
reported at the bottom of the tables. As for the descriptive, bivariate analyses, we<br />
carried out cross tabulations of central variables. The exact cross tabulations are<br />
shown in Table 6.3. In the table, the columns represent dependent variables while the rows<br />
represent independent variables. That is, we examined the degree of influence<br />
of the independent variables listed in the rows of the table on the four dependent<br />
variables. The results are reported in Chapter 7.<br />
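As a minimal sketch of what a cross tabulation of an independent variable against a dependent variable involves, the following example counts value pairs and expresses each cell as a percentage of its row. The respondents, variable names, and values are invented for illustration; the helper functions are hypothetical and do not reproduce the statistical software used in the study.

```python
from collections import Counter


def cross_tabulate(records, row_var, col_var):
    """Count how often each (row value, column value) pair occurs."""
    return Counter((r[row_var], r[col_var]) for r in records)


def row_percentages(table):
    """Express each cell as a percentage of its row total."""
    row_totals = Counter()
    for (row, _col), count in table.items():
        row_totals[row] += count
    return {(row, col): round(100 * count / row_totals[row], 1)
            for (row, col), count in table.items()}


# Invented respondents: education is the independent (row) variable,
# frequency of information seeking the dependent (column) variable.
respondents = [
    {"education": "academic", "seeking_frequency": "daily"},
    {"education": "academic", "seeking_frequency": "daily"},
    {"education": "academic", "seeking_frequency": "weekly"},
    {"education": "clerical", "seeking_frequency": "daily"},
    {"education": "clerical", "seeking_frequency": "weekly"},
    {"education": "clerical", "seeking_frequency": "weekly"},
    {"education": "clerical", "seeking_frequency": "weekly"},
]

table = cross_tabulate(respondents, "education", "seeking_frequency")
percentages = row_percentages(table)
for cell in sorted(table):
    print(cell, table[cell], f"{percentages[cell]}%")
```

Row percentages are used here because the groups defined by the independent variable differ in size, so the raw cell counts alone are hard to compare.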
6.2.6 Methodical reflections<br />
340 respondents completed the questionnaire, resulting in a response rate of<br />
42.6%. 302 respondents (37.8%) did not log into the questionnaire at all. In addition, of<br />
the 799 employees that constituted the sample, 156 (19.5%) began the questionnaire<br />
but did not finish (see Appendix 5). In the latter group of respondents, one part is<br />
of particular interest: 66 respondents stopped responding after finishing the questions<br />
concerning the work task Inspection. Further, 10.9% of the respondents completing the<br />
questionnaire did not choose any work tasks (see Table 7.2). Different motives may be<br />
detected for non-response in surveys (see e.g., Nakash et al., 2008). We do not know<br />
the exact reasons for non-response in the present survey. Both internal and external<br />
motives can be identified. An internal cause could be that the respondents got bored with<br />
the questionnaire, because the same questions were repeated when more than one<br />
work task was chosen. This was indicated by some respondents. Another motive could<br />
be that the respondents could not relate to the description of work tasks, and therefore<br />
ended up choosing none. An external reason could be that the employees in the<br />
organization are presented with questionnaires from time to time. It might be the case<br />
that this particular questionnaire was deselected because the invited employees felt<br />
overloaded with questionnaire surveys. Whether the reasons are internal or external, the<br />
number of respondents quitting the questionnaire before finishing it and the number of<br />
respondents not selecting any work tasks stress the importance of questionnaire<br />
design.<br />
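The response figures reported above can be reproduced with a short calculation. This is only a sketch: the counts and the sample size are those stated in the text, while the labels and the helper name `share_of_sample` are chosen here for illustration.

```python
SAMPLE_SIZE = 799  # employees in the sample, as stated in the text

# Outcome counts as reported in the text.
outcomes = {
    "completed": 340,
    "never logged in": 302,
    "started but did not finish": 156,
}


def share_of_sample(count, sample_size=SAMPLE_SIZE):
    """Return a count as a percentage of the sample, rounded to one decimal."""
    return round(100 * count / sample_size, 1)


rates = {label: share_of_sample(n) for label, n in outcomes.items()}
for label, rate in rates.items():
    print(f"{label}: {rate}%")
```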
Another methodical challenge to the questionnaire data is caused by the design<br />
of the questionnaire itself. A central feature of the questionnaire guides the<br />
respondents to the specific work tasks they are carrying out. This allows insight into<br />
the characteristics of specific work tasks in the organization. At the same time, however,<br />
this particular feature has had the effect that some work tasks received<br />
very few answers (see Table 7.3). Consequently, the reliability of results for<br />
work tasks with few respondents must be considered. We will report frequencies and<br />
percentages with regard to the univariate statistics, but will be cautious with the results<br />
from work tasks with few respondents.<br />
Despite the methodical challenges, the data report answers from 340 people<br />
regard<strong>in</strong>g their seek<strong>in</strong>g behaviour. The respondents represent a stratified sample of<br />
approximately 8000 employees, ensur<strong>in</strong>g that many types of employees are represented.<br />
The purpose of the questionnaire data was to ga<strong>in</strong> an overview of the seek<strong>in</strong>g behaviour<br />
across work tasks. This purpose has been met by the questionnaire. The subsequent<br />
focus groups counterbalance the limitations of the questionnaire.<br />
6.3 Focus group method<br />
Focus group <strong>in</strong>terviews were <strong>in</strong>cluded as the qualitative counterpart <strong>in</strong> the<br />
doma<strong>in</strong> study. We refer to the group <strong>in</strong>terviews as focus group <strong>in</strong>terviews <strong>in</strong> order to<br />
mirror Morgan’s (1996) def<strong>in</strong>ition. He def<strong>in</strong>es focus groups as <strong>in</strong>terviews with a<br />
composite group of people that are controlled by a moderator while discuss<strong>in</strong>g a topic<br />
def<strong>in</strong>ed by the moderator or researcher. The overall purpose of the focus groups was to<br />
validate and elaborate on the survey results. In the follow<strong>in</strong>g sections, we will account<br />
for the research method applied in this part of the data collection. We begin by<br />
present<strong>in</strong>g the data collection as regards purpose and design of the focus groups<br />
(Section 6.3.1), the questions guid<strong>in</strong>g the focus groups (Section 6.3.2), and the conduct<br />
and documentation (Section 6.3.3). We f<strong>in</strong>ish by explicat<strong>in</strong>g the methods used for data<br />
analysis (Section 6.3.4).<br />
6.3.1 Purpose and design<br />
The general intention behind the focus group interviews was to reduce the<br />
restrictions around them and allow for the elaborations put forward by the participants,<br />
since elaborations were precisely the purpose of the interviews. On the other hand, we aimed<br />
for a fairly tight form of the focus groups to make sure that all subareas were covered by<br />
the discussions (cf. Halkier, 2008, pp. 38-41). A slide show and an <strong>in</strong>terview guide<br />
were applied to reta<strong>in</strong> structure. The slide show was presented to the participants dur<strong>in</strong>g<br />
the <strong>in</strong>terview sessions. The <strong>in</strong>tention beh<strong>in</strong>d the slide show was to prompt discussions<br />
of the questionnaire results among the participants and encourage them to expla<strong>in</strong> and<br />
clarify the underly<strong>in</strong>g <strong>in</strong>formation behaviour and mean<strong>in</strong>g of <strong>in</strong>formation <strong>in</strong> their daily<br />
work. An example of the focus group slide shows appears in Appendix 8. In
addition, a semi-structured interview guide was applied to support and guide the group<br />
discussions. The intention behind the interview guide was not to force the questions on<br />
the participants. Thus, if the participants had other relevant issues that they wanted to<br />
discuss in relation to the presented results, they were allowed to do so. Rather, the interview<br />
guide had the function of supporting the focus group moderator in case discussions<br />
moved too far from the subject in question, or in case the conversation stalled. In this<br />
sense the interview guide served as a supportive tool to ensure that discussions<br />
would develop. This also meant that not all questions necessarily needed answers from<br />
the participants.<br />
Main process | Number of participants
Settlement | Participants 1-6 (6 persons)
Instruction | Participants 7-12 (6 persons)
Processes of support | Participants 13-17 (5 persons)
Customs inspection | Participants 18-22 (5 persons)
Common inspection | Participants 23-27 (5 persons)
Management and development | Participants 28-31 (4 persons)
Collection | Participants 32-35 (4 persons)
Table 6.4 Overview of participants <strong>in</strong> focus groups<br />
7 workshops were conducted, each representing one of the main processes of<br />
the business model of the case organization. One process, Inspection, was represented<br />
by two workshops, since the two work tasks contained in the main process were<br />
considered so diverse that it might have affected the outcome if they had been merged into one<br />
workshop. The workshops took place in June 2009 in four different locations across<br />
Denmark (see specifications in Appendix 7). Each workshop lasted approximately 2<br />
hours and had between 4 and 6 participants and may therefore be characterized as a<br />
mini group compared to full groups, which usually have between 8 and 10<br />
participants (Greenbaum, 1993). In total, 35 persons were interviewed. The<br />
distribution between the main processes is shown in Table 6.4.<br />
Each workshop represents one of the main processes in the business model.<br />
The recruiting of participants consisted of two steps. Firstly, a number of managers<br />
were asked by e-mail to identify approximately five participants in their department.<br />
Secondly, the managers reported back a list of names, and the persons listed were contacted directly by e-mail<br />
afterwards. Different locations were used in order to allow for representation of all six<br />
ma<strong>in</strong> processes of the bus<strong>in</strong>ess model <strong>in</strong> the workshops. The workshops took place <strong>in</strong><br />
four different physical locations. Collecting the data in different locations<br />
had the benefit of represent<strong>in</strong>g different types of offices. Also different types of<br />
employees were represented <strong>in</strong> the focus groups. Thus the participants represented<br />
employees with an academic background, employees educated with<strong>in</strong> the case<br />
organization, and employees with a clerical background.<br />
6.3.2 Data collection: Interview guide<br />
The interview guide appears in Appendix 9. The function of the interview guide<br />
was to have a set of questions to bring into play in case the participants had trouble<br />
discussing the presented slides without triggering questions. The literature suggests that<br />
interviews start out with general questions followed by more specific<br />
questions (e.g., Stewart, Shamdasani & Rook, 2007, p. 61). We decided to apply this<br />
succession throughout the interview, starting out with an introduction to the participants’<br />
background. Bloor et al. (2001) recommend that demographic data are collected<br />
ahead of the focus group, e.g., by using a short questionnaire. However, we found that<br />
letting the participants start out by introducing themselves worked well as a way of<br />
getting everyone into play from the beginning of the interview on a topic comfortable to<br />
them. This is precisely the function of opening questions (Krueger, 1998, p. 23). Next, the<br />
second part of the interview followed, concerning the findings of the survey. Here, the<br />
questionnaire results relevant to the current focus group were introduced in the slide<br />
show. The questions asked followed the four themes of the questionnaire, beginning with the<br />
frequency of information seeking, use of information sources, and developed<br />
information needs. The focus groups finished with a discussion of the fourth theme, preferred metadata<br />
when seeking information on the intranet.<br />
6.3.3 Execution and documentation<br />
The interview guide was accompanied by the slide show as an object for discussion and<br />
explanation. This meant that the slide show came to serve as a probe for the questions<br />
asked by the interviewer, helping to keep the interview on topic (e.g., Rubin & Rubin,<br />
2005, p. 164). The workshops began with an introduction to the interviewer, to<br />
the workshop purpose, and the agenda. Hand-outs of the slide show were distributed to<br />
the participants to enable them to go back in the slides if they had additional comments<br />
later in the interview. Some goodies were offered to the participants in order to show
our appreciation of their efforts. A Dictaphone recorded the interview for<br />
documentation purposes. The Dictaphone was started when the participants started<br />
introducing themselves. The group interviews ended when the participants had<br />
discussed the slides contained in the slide show. We finished the session by thanking the<br />
participants for their time, input, and contributions, and invited them to contact us in<br />
case they recalled topics of relevance after the interview had ended.<br />
The <strong>in</strong>terviews were subsequently transferred to the transcription software<br />
Express Scribe and transcribed. For the transcription, we developed a list of criteria for<br />
what to <strong>in</strong>clude and what to exclude from the transcription (see Appendix 10). Bloor et<br />
al. (2001) suggest that all speech be transcribed, including passages where other<br />
participants agree with a single person’s statements. Since we are not performing<br />
content analysis of the focus groups on other passages than the ones concern<strong>in</strong>g the<br />
participants’ background, we are not go<strong>in</strong>g to be calculat<strong>in</strong>g the degree of agreement.<br />
This is the ma<strong>in</strong> reason why we have not transcribed these support<strong>in</strong>g “mm”s and<br />
“yeah”s. We f<strong>in</strong>ally anonymized the participants’ names before convert<strong>in</strong>g the<br />
<strong>in</strong>terviews <strong>in</strong>to the rtf format required for import<strong>in</strong>g files <strong>in</strong>to atlas.ti.<br />
6.3.4 Data analysis<br />
The focus group transcriptions were analysed in two parts. The analysis<br />
software atlas.ti (version 5.6.2) was used to support the analysis (see Figure 6.1). The<br />
first analysis concerns the <strong>in</strong>troductory part of the <strong>in</strong>terviews, where the participants<br />
presented themselves. The purpose was to discover the distribution of the participants<br />
as to their work tasks, education and length of service. In this <strong>in</strong>troductory analysis we<br />
were <strong>in</strong>spired by the pr<strong>in</strong>ciples of content analysis, which is a quantitatively oriented<br />
type of analysis with the purpose of summariz<strong>in</strong>g a complete set of data or parts of it<br />
(Neuendorf, 2002; Krippendorff, 2004). In the present analysis, we used the pr<strong>in</strong>ciples<br />
of content analysis to get a quantitative overview of the distribution of the participants.<br />
The second part of the analysis concerns the elaboration and validation of the<br />
survey results <strong>in</strong> preparation for answer<strong>in</strong>g the research questions. This second part of<br />
the analysis was guided by Halkier’s (2008) three steps <strong>in</strong> focus group analysis: 1)<br />
cod<strong>in</strong>g; 2) categorization, and 3) conceptualization. In the cod<strong>in</strong>g process, passages of<br />
text are marked up with prelim<strong>in</strong>ary labels. Here, categorization designates the process,<br />
where the <strong>in</strong>itial codes are related to each other, identify<strong>in</strong>g subord<strong>in</strong>ate, superior, and<br />
co-ord<strong>in</strong>ate codes among the <strong>in</strong>itial codes attached. The categorization can imply a<br />
Figure 6.1 Screen dump from atlas.ti cod<strong>in</strong>g of focus group <strong>in</strong>terviews<br />
reduction of the data, when codes are comb<strong>in</strong>ed <strong>in</strong>to superior categories, but also further<br />
complication of the data, if codes are expanded and supplemented with more detailed<br />
sub codes. Identification of relations and contradictions between codes is inherent in<br />
the process of categorization. Finally, conceptualization designates the part of the analysis,<br />
where the categorization and codes are related to the data, but also the theoretical<br />
concepts underly<strong>in</strong>g the data, either as to similar studies, theoretical concepts, or other<br />
empirical parts of the research project.<br />
We started out by coding the interviews with free codes, corresponding to<br />
Halkier’s first step, coding. Next, we used the function in atlas.ti that allows grouping of<br />
the initial codes into coding families. Hereby we were able to categorize the codes<br />
according to Halkier’s second step, categorization. The third step, conceptualization,<br />
was represented by the analysis of the codes and coding families and by relating these to<br />
other studies, to the questionnaire data and to theoretical concepts. Quotes from the<br />
questionnaire, the focus group interviews and the search test are presented throughout the<br />
thesis. The applied quotes have been translated into English, but they appear in their<br />
original Danish wording in Appendix 11. The results of the seven focus group<br />
interviews are reported in Chapter 7 along with the questionnaire results.
6.3.5 Limitations<br />
The focus group interviews were based on a convenience sample of<br />
employees. We acknowledge that a random sample perhaps could have been more<br />
representative of SKAT as such. However, the educational level of the organization<br />
was reflected in the participants along with the majority of the organization’s work tasks.<br />
Further, the focus groups were carried out in four different locations across Denmark in<br />
order to reflect the geographical distribution of the organization. 35 people<br />
participated in 7 focus groups, providing valuable insight into their daily information<br />
seeking patterns.<br />
6.4 Search test design<br />
The search test compares full text indexing, an extracted type of automatic indexing,<br />
and automatically assigned indexing in the form of text categorization. The search test was<br />
set up as an experimental test. The test took place in June 2010 in two different office<br />
locations of SKAT, referred to below as location 1 and location 2. In accordance with our methodological<br />
standpoint we asked employees at SKAT to participate in the search test. In the<br />
remainder of the thesis we will use the term test person to denote a search test<br />
participant.<br />
6.4.1 Test system<br />
The first draft of the search test design was to carry out the test when the<br />
revised intranet had been implemented and the employees had had some time to adjust<br />
to the system. However, the process of implementing the new portal on SKAT’s pages<br />
was delayed. This meant that it was not possible to execute the test in the operational portal<br />
environment. Instead we used a prototype of the future intranet as our test<br />
base. At the time of the search test the categorization was still being trained. In<br />
addition, the prototype had some functional shortcomings. We explained these to the<br />
test persons as a part of the introduction to the test system, but nevertheless the course<br />
of the test was in some cases challenged. In order to avoid changes in the system across<br />
the individual test sessions, the test system was not updated during the search test period.<br />
From a technical perspective the test system was embedded in a separate test<br />
environment. The test database was generated in August 2009 and was not updated<br />
in the intervening period up to the search test in June 2010. Thus, the newest<br />
documents contained in the test base at the time of the search test were from August<br />
2009. The test base contained a sample of the documents contained in the current<br />
intranet. The test base contained 188,600 documents that had been randomly drawn<br />
from the intranet. By comparison, at the time of the search test the intranet in use<br />
contained 681,640 documents. That is, the test base contained approximately 28% of<br />
the full version of the intranet.<br />
As with the intranet in operation, the prototype was based on CMS technology.<br />
Autonomy’s (www.autonomy.com) search software IDOL provided the search<br />
functionalities of the search interface. The interface is depicted in Figure 6.2. Though<br />
more fields were available, the test persons solely used the fields “Søgetekst” (Query<br />
box), “Søgetype” (Search operator), and “Dokumenttype” (Document type) during<br />
testing. The possibility to limit searches to forms (“Blanket”), information, or self-service<br />
(“Selvbetjening”) just below the grey bar (in the middle of the interface, see<br />
Figure 6.2) was set to “Information” by default and was not changed during the test.<br />
Neither was the default setting of ranking search results by relevance.<br />
The query box was used for entering query terms. The box supported the use<br />
of quotation marks for phrase searches. Search terms entered were automatically<br />
Figure 6.2 Screen dump of the test system: Search fields
truncated. The search operator field specified how search terms were comb<strong>in</strong>ed. One<br />
of four options could be chosen. “Free text” (FT) retrieved documents conta<strong>in</strong><strong>in</strong>g most,<br />
but not necessarily all, entered search terms. “Pages conta<strong>in</strong><strong>in</strong>g all words” (AW)<br />
retrieved documents conta<strong>in</strong><strong>in</strong>g all search terms <strong>in</strong> their exact or truncated form. Thus,<br />
the operator corresponds to us<strong>in</strong>g Boolean “AND” (Large, Tedd & Hartley, 2001, p.<br />
148 ff.). “This exact sentence” (ES) retrieved documents that conta<strong>in</strong>ed the search<br />
terms <strong>in</strong> the exact form and order entered <strong>in</strong>to the query box. The operator corresponds<br />
to entering the search terms in quotation marks. The system thereby treats the<br />
search terms as a single term (Large, Tedd & Hartley, 2001, p. 167 ff.). Lastly, the “At<br />
least one of the words” (OW) operator retrieved documents conta<strong>in</strong><strong>in</strong>g at least one of the<br />
entered search terms <strong>in</strong> truncated form. The operator corresponds to apply<strong>in</strong>g Boolean “OR”<br />
(Large, Tedd & Hartley, 2001, p. 148 ff.). Of the four, the ES operator is the most<br />
restrictive. Next follows the AW operator. The FT and the OW operators <strong>in</strong><br />
comparison retrieve larger sets of documents. The last field available to the test persons<br />
was the metadata field “Document type”. The field made it possible to limit search<br />
results to specific document types. The test persons could choose between 12 different document types in<br />
a drop-down menu. An empty field at the top of the menu was the default setting of the<br />
menu, which enabled a search with no limitation as to document types. Search results<br />
were delivered in a list ranked by the relevance of the documents to the search terms<br />
entered. For each hit different pieces of information were provided: a document title, a<br />
snippet highlighting the search terms and the surrounding terms, the document type (cf.<br />
the document type field mentioned above), and the date of publication. An example of<br />
a result list is shown in Figure 6.3.<br />
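The relative restrictiveness of the four search operators can be illustrated with a toy model. This is a sketch under stated assumptions, not IDOL's implementation: truncation is modelled as prefix matching, "most" terms under FT is assumed to mean more than half of them, ES is assumed to require adjacent terms, and the documents, the query, and all function names are invented for illustration.

```python
def tokens(text):
    """Lower-case a text and split it on whitespace."""
    return text.lower().split()


def term_hits(term, doc_tokens):
    """Truncated match (assumed): the term is a prefix of some token."""
    return any(tok.startswith(term) for tok in doc_tokens)


def phrase_hits(terms, doc_tokens):
    """Exact match: the terms occur adjacently, in order, untruncated."""
    n = len(terms)
    return any(doc_tokens[i:i + n] == terms
               for i in range(len(doc_tokens) - n + 1))


def search(query, documents, operator):
    """Return the ids of documents matching the query under an operator."""
    terms = tokens(query)
    result = []
    for doc_id, text in documents.items():
        doc_tokens = tokens(text)
        matched = sum(term_hits(t, doc_tokens) for t in terms)
        if operator == "AW":      # all words, Boolean AND
            hit = matched == len(terms)
        elif operator == "OW":    # at least one word, Boolean OR
            hit = matched >= 1
        elif operator == "ES":    # exact sentence, phrase search
            hit = phrase_hits(terms, doc_tokens)
        elif operator == "FT":    # free text: assumed "more than half"
            hit = matched > len(terms) / 2
        else:
            raise ValueError(f"unknown operator: {operator}")
        if hit:
            result.append(doc_id)
    return result


# Invented miniature collection.
docs = {
    1: "customs inspection of imported goods",
    2: "reports on customs inspection",
    3: "annual tax settlement reports",
    4: "imported goods and customs inspection reports",
}
query = "customs inspection reports"
for op in ("ES", "AW", "FT", "OW"):
    print(op, search(query, docs, op))
```

In this toy collection the result sets grow from ES through AW and FT to OW, mirroring the ordering of the operators described above.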
A central feature of IDOL is the ability to automatically categorize<br />
documents on the basis of machine learning as described in section 5.4.2. (For further<br />
elaboration of IDOL, white papers on the system can be found at www.autonomy.com;<br />
in addition, Chaudhry (2010) has made a comparison with 11 other similar systems.) The IDOL<br />
categorization facilities were applied to categorize the test system search results. The<br />
taxonomy taken into use on January 1, 2008 (see section 2.4.2) formed the basis of the<br />
categories that search results are automatically placed into when presented to the end<br />
users. The categorization training started in November 2008. The first step of the<br />
training consisted of giving each subject in the taxonomy a rough introduction to the<br />
Figure 6.3 Screen dump of the test system: Categorization<br />
understanding of the content of that subject. The procedure consisted of selecting 5<br />
terms representative of the subject. The 5 terms were subsequently used to search the<br />
test base. The search result was screened in order to identify candidate documents to<br />
represent each category. The minimum number of candidate documents in each<br />
category was set to 20. IDOL’s manual recommends between 40 and 50<br />
candidate documents. The status of the categorization at the time of the search test was<br />
as follows: If a document had been manually indexed at the time of import to the test<br />
database, the manual mark-up of the document decided the placing of the document in<br />
the portlet. This is the case for documents published after January 1, 2008. However,<br />
older documents did not have any subject terms attached. For this group of documents<br />
the placing in the portlet was based on the training that IDOL had achieved at the time<br />
of the search test.<br />
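The placement rule described above amounts to a simple fallback, which can be sketched as follows. The document fields, the function names, and the stand-in categorizer are hypothetical and serve only to illustrate the rule; they are not part of IDOL or of the test system.

```python
def portlet_category(document, trained_category):
    """Pick the category used to place a document in the portlet.

    Manually assigned subject terms take precedence; otherwise the
    category suggested by the trained categorizer is used.
    """
    if document.get("manual_subject"):
        return document["manual_subject"], "manual"
    return trained_category(document), "trained"


def toy_categorizer(document):
    """Hypothetical stand-in for the trained categorization."""
    return "Customs" if "customs" in document["title"].lower() else "Other"


# Invented documents: one with manual subject terms, one without.
indexed_doc = {"title": "VAT guidance", "manual_subject": "VAT"}
older_doc = {"title": "Customs tariff notes", "manual_subject": None}

print(portlet_category(indexed_doc, toy_categorizer))
print(portlet_category(older_doc, toy_categorizer))
```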
The categorization is shown in Figure 6.3 (the box at the right hand side of<br />
the result list). The selection of one or more categories took place after a search had<br />
been carried out and a result existed. On the basis of the retrieved documents, the<br />
search result could be limited to subjects present in the search results. The categorization<br />
window only showed the terms from the taxonomy actually containing documents in the
current result set. If several categories were selected on the basis of the same query, the<br />
first category was not <strong>in</strong>cluded <strong>in</strong> the subsequent category choices.<br />
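The behaviour of the categorization window described above can be sketched in Python. This is a minimal illustration and not the IDOL implementation: the function names, the document identifiers and the category labels (apart from Penalty, which appears in Table 6.5) are assumptions made for the example.<br />

```python
from collections import Counter


def visible_categories(result_set, doc_categories, already_selected=()):
    """Count, per taxonomy term, the documents in the current result set.

    Terms without documents in the result set never appear, and terms that
    were already selected for the same query are left out.
    """
    counts = Counter()
    for doc_id in result_set:
        for category in doc_categories.get(doc_id, ()):
            if category not in already_selected:
                counts[category] += 1
    return dict(counts)


def narrow(result_set, doc_categories, category):
    """Keep only the retrieved documents placed in the chosen category."""
    return [d for d in result_set if category in doc_categories.get(d, ())]


# Hypothetical category placements (manual mark-up or trained categorization).
doc_categories = {
    "doc1": {"Penalty"},
    "doc2": {"Penalty", "VAT"},
    "doc3": {"VAT"},
    "doc4": {"Customs"},  # not retrieved below, so "Customs" is never offered
}
results = ["doc1", "doc2", "doc3"]

first_choices = visible_categories(results, doc_categories)
narrowed = narrow(results, doc_categories, "Penalty")
next_choices = visible_categories(
    narrowed, doc_categories, already_selected={"Penalty"}
)
print(first_choices, narrowed, next_choices)
```

In this sketch a category that has already been chosen is excluded from the next set of choices, mirroring the rule that the first category was not included in subsequent category choices.<br />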
In the test situation, when the test persons used the test system without<br />
categorization, the right-hand side of the screen was covered to prevent the<br />
test persons from being affected by the controlled terms from the taxonomy when composing<br />
queries for the system. In addition, the test persons were not tempted to use the<br />
categorization when it was not visible to them. The covering of the categorization<br />
window means that, in a methodological sense, two test systems were produced: one based on<br />
free text indexing and one based on categorization of search results. Since the free text<br />
indexing system functions as the baseline for measuring the effect of categorization, we<br />
refer to this system as System A. Accordingly, the system employing categorization<br />
is denoted System B.<br />
6.4.2 Test persons<br />
32 test persons participated <strong>in</strong> the search test. The test persons were recruited at<br />
location 1 and location 2. Ingwersen (2000) recommends 40-50 test persons for purely<br />
quantitative studies, and fewer for qualitative studies. Since we were carrying out a<br />
qualitative study, we found 32 people to be sufficient. From the results of the domain<br />
study we had found that the frequency of intranet use was high in most parts of the<br />
organization. Therefore, we did not find it necessary to exclude certain work tasks from<br />
the search test. The choice of the two offices was motivated by the fact that the<br />
two departments represent the different educational groups of employees identified in<br />
the doma<strong>in</strong> study questionnaire.<br />
To locate relevant test persons all employees with<strong>in</strong> the specified offices<br />
received a web questionnaire. In total the questionnaire was sent to 459 employees. In<br />
the questionnaire the employees answered questions about their background, work<br />
tasks, frequency of use of the <strong>in</strong>tranet and frequency of <strong>in</strong>formation seek<strong>in</strong>g (Appendix<br />
4). We refer to this questionnaire as the recruitment questionnaire. Reliability of<br />
research designs is affected by the consistency of measures, among other things<br />
(Carmines & Woods, 2005). Keeping the work tasks consistent between<br />
the domain study questionnaire and the recruitment questionnaire posed a special<br />
challenge, since in the intervening time the business model had changed and another<br />
merger had taken place <strong>in</strong> the organization. In order to capture the modified bus<strong>in</strong>ess<br />
model and still be able to mirror the previous bus<strong>in</strong>ess model we expanded the report<strong>in</strong>g<br />
of current work tasks. Yet, the widen<strong>in</strong>g was carried out <strong>in</strong> a way that allowed for the<br />
current work tasks to be fitted into the work tasks of the previous business model. As<br />
with the domain study questionnaire, we aimed at reducing the semantic openness of the<br />
questionnaire by the use of probes (see Section 6.2.1). As for probes regarding the<br />
work tasks, we used the latest annual report of SKAT as inspiration (SKAT, 2009).<br />
In our selection of test persons, we emphasized the test persons’ frequency of<br />
use of the <strong>in</strong>tranet and their general frequency of <strong>in</strong>formation seek<strong>in</strong>g. As for frequency<br />
of use, the most important parameter was that <strong>in</strong>formation needs and derived<br />
<strong>in</strong>formation seek<strong>in</strong>g took place more often than “practically never”. 42 people met<br />
these requirements. Of these, 10 were used as pilot testers. The remaining 32 carried<br />
out the actual search test.<br />
6.4.3 Search tasks<br />
The literature suggests three generic types of search tasks for IR evaluation,<br />
namely natural, simulated, and assigned search tasks (cf., Vakkari, 2003). We used<br />
simulated and genu<strong>in</strong>e work tasks for the test evaluation. Based on the<br />
recommendations put forward by Ingwersen (2000, p. 173), three simulated work tasks<br />
were carried out. The purpose of employ<strong>in</strong>g simulated work tasks was to <strong>in</strong>crease the<br />
degree of experimental control <strong>in</strong> the operational evaluation (cf. Borlund, 2000, p. 72,<br />
2003b). Different recommendations have been given for the development and use of<br />
simulated work tasks. Among other th<strong>in</strong>gs, the recommendations comprise that<br />
simulated work tasks and genu<strong>in</strong>e <strong>in</strong>formation needs are employed <strong>in</strong> the same test, that<br />
the work tasks are tailored to the <strong>in</strong>formation environment and the test persons, and that<br />
search jobs are permuted (Borlund, 2003b). As can be seen below, these<br />
recommendations were <strong>in</strong>corporated <strong>in</strong>to the present test design.<br />
IR evaluations are frequently carried out with graduate or undergraduate<br />
students. This is also the case for the empirical use of simulated work tasks (Borlund &<br />
Schneider, 2010). However, a few studies have applied different versions of simulated<br />
work tasks to professional users (e.g., Nielsen, 2004; Suomela & Kekäläinen, 2005,<br />
2006; Wacholder et al., 2007; Blomgren, Vallo & Byström, 2004). On the basis of<br />
their study of professional users, Blomgren, Vallo &<br />
Byström (2004, p. 66) conclude that “…composing a<br />
simulated work task situation that offers a sufficient level of reality for all participants,<br />
must be done with great care”. Obviously,<br />
trigger<strong>in</strong>g real <strong>in</strong>formation needs <strong>in</strong> a simulated and professional context is challeng<strong>in</strong>g,<br />
not least when participants have different work tasks and backgrounds with<strong>in</strong> the
professional context, which is the case <strong>in</strong> the present study and <strong>in</strong> the study by<br />
Blomgren, Vallo & Byström. In the study by Price et al. (2009), a subject expert<br />
participated in the development of simulated work tasks in order to ensure well-functioning<br />
tasks. The importance of realism in simulated work tasks is emphasized by<br />
several authors (e.g., Blomgren, Vallo & Byström, 2004; Borlund, 2000). Different<br />
aspects may be kept <strong>in</strong> m<strong>in</strong>d <strong>in</strong> order to ensure realism. Here we operationalize realism<br />
as a relevant subject comb<strong>in</strong>ed with a level of complexity correspond<strong>in</strong>g to the test<br />
persons’ genu<strong>in</strong>e <strong>in</strong>formation needs.<br />
As regards the subject content of the simulated work tasks we used different<br />
sources as <strong>in</strong>spiration. We went through the fields <strong>in</strong> the doma<strong>in</strong> study questionnaire<br />
that allowed for open responses. The focus group interviews were also scanned in order<br />
to locate ideas for search tasks. Lastly, we consulted web pages communicating<br />
citizens’ and small businesses’ questions about taxes for inspiration. To decide on the<br />
level of complexity of the simulated work tasks, we consulted the results of the doma<strong>in</strong><br />
study. The doma<strong>in</strong> study revealed that <strong>in</strong>formation needs of low complexity were far<br />
more common than more complex types. In the questionnaire, the most frequent<br />
<strong>in</strong>dicators were “I need to f<strong>in</strong>d a document I have used before” and “I know the subject<br />
well but need to f<strong>in</strong>d a specific piece of <strong>in</strong>formation”. Saracevic et al. (1987, p. 35)<br />
define the complexity of search tasks by the number of concepts contained. Iivonen<br />
(1995) operationalizes the complexity further by deciding that simple search tasks<br />
consist of up to three concepts, whereas complex search tasks contain more than three concepts.<br />
Taking into account that the employees reported simple information needs as their<br />
predominant type, we developed simulated work tasks that contained three concepts (or<br />
search keys) or fewer. About ten simulated work tasks were developed. We subsequently<br />
carried out a pilot test in order to find out how the work tasks worked in the test<br />
situation. We wanted information about the understandability of the work tasks for the<br />
test persons, and specifically, whether the test persons were able to solve the work tasks<br />
within a reasonable amount of time. We also wanted to reduce the number of work tasks.<br />
The work resulted in three simulated work tasks concerning the sale of an apartment<br />
(SIM 1), taxation of e-businesses (SIM 2), and tax issues related to working as a<br />
freelancer (SIM 3). The last of the three search tasks contained four search keys (see<br />
Table 6.7). However, one is a non-topical facet, which is the reason for still considering<br />
it a simple task in Iivonen's terms. The final simulated work tasks are shown in<br />
Appendix 14.<br />
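The complexity rule applied above can be written out as a short Python sketch. It assumes, following the reasoning in the text, that non-topical facets are not counted as concepts. The class and function names are hypothetical, and the key labels are placeholders, since the actual search keys are listed in Table 6.7.<br />

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchKey:
    label: str
    topical: bool = True  # non-topical facets are not counted as concepts


def complexity(search_keys, simple_max=3):
    """Classify a task as 'simple' (up to three concepts) or 'complex'."""
    concepts = sum(1 for key in search_keys if key.topical)
    return "simple" if concepts <= simple_max else "complex"


# Placeholder keys: a task like SIM 3 with four keys, one of them non-topical.
sim3 = [
    SearchKey("key 1"),
    SearchKey("key 2"),
    SearchKey("key 3"),
    SearchKey("key 4", topical=False),
]
# For contrast: four topical keys exceed the threshold.
four_topical = [SearchKey(f"key {i}") for i in range(1, 5)]

print(complexity(sim3), complexity(four_topical))
```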
To be able to control for the test persons’ insight into the controlled search task,<br />
an on-screen questionnaire was filled out every time a task had been completed (see<br />
Appendix 15). We asked the test persons about their <strong>in</strong>sight <strong>in</strong>to the subject of the work<br />
task, their view on the difficulty of the work task and the resemblance of the work task<br />
with their usual work tasks. All questions were graded on a 5-po<strong>in</strong>t Likert scale.<br />
In addition to the simulated work task situations, the test persons were asked to<br />
br<strong>in</strong>g a genu<strong>in</strong>e <strong>in</strong>formation need to the test session. A genu<strong>in</strong>e <strong>in</strong>formation need serves<br />
several purposes (Borlund & Schneider, 2010). We consider the most important ones to<br />
be the function as a basel<strong>in</strong>e for simulated needs and the possibility to ga<strong>in</strong> <strong>in</strong>sight <strong>in</strong>to<br />
the system’s effect on real information needs. Also, it appears that genuine information<br />
needs may get better scores on different performance measures (cf., Blomgren, Vallo &<br />
Byström, 2004). Specifically, we e-mailed the test persons shortly before their test<br />
session to ask them to bring a genuine information need. This way, the e-mail served a<br />
second function, namely as a reminder for the test persons to show up. The exact<br />
wording of the e-mail is given in Appendix 16.<br />
The genuine tasks brought by the respondents confirmed the lack of<br />
controllability of genuine tasks in controlled test settings. The tasks varied highly in their<br />
content, reflecting mainly specialist matters. However, organisational matters such as the<br />
annual summer party were also represented. The tasks also included examples that<br />
could not be solved using the prototype as the information sought was not included in<br />
the database. In those cases the test persons made up a new task for themselves. The<br />
character of the tasks corresponded to the simulated search tasks in terms of the number<br />
of facets included. Thus, the genuine tasks contained between one and three facets.<br />
Three examples are listed in Table 6.5.<br />
Table 6.5 Examples of genuine search tasks<br />
Search terms | Document type | Category | Search operator | Facets included<br />
Ordrenumre store selskaber | Internal information | - | Free text | 3<br />
Bødetakster | - | Penalty | Free text | 2<br />
Skattekvittance | - | - | Free text | 1<br />
6.4.4 Test procedure<br />
The test procedure consisted of three parts: 1) an introduction to the session, 2) the<br />
search part, where the test persons searched the two systems using the search tasks and<br />
evaluated retrieved documents, and 3) a post search interview.<br />
The introduction to the test session consisted of different elements. Firstly, the<br />
guidelines for performing the search tasks were presented. Next, the test system was<br />
introduced to the test persons. Due to time constraints the introduction did not include<br />
time for the test persons to try out the system. The presentation included the<br />
characteristics of the system as to search possibilities and the shortcomings in the<br />
prototype. The elements contained in the introduction are listed in Appendix 17. The<br />
introduction was closed by assuring the test persons of their anonymity in<br />
the test (Kvale & Brinkmann, 2009, p. 63 ff.).<br />
The test persons searched using four search tasks: three simulated and one genuine.<br />
The tasks were rotated as to their succession and the succession of the test systems<br />
(System A and B) in order to control for order effects on the test results (cf. Kelly,<br />
2009) and to meet the recommendations put forward by Borlund (2003b) as to the use of<br />
simulated search tasks. The missing try-out of the system even further<br />
necessitated the rotation of work tasks. The rotations applied, including the<br />
succession of test systems, are shown in Appendix 18.<br />
When searching in System B, it was mandatory that the test<br />
persons made use of the categorization menu on the right-hand side of the screen. This<br />
was necessary since the only indication of the categorization in the system is visible<br />
here. That is, search results are not presented according to the inherent categorization.<br />
This decision also means that searches omitting categorization when it should have been<br />
applied were removed from the results. Whenever a task was completed (or abandoned),<br />
a short questionnaire was completed on the screen.<br />
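The kind of rotation described above can be illustrated with a small counterbalancing sketch in Python. It is not the scheme in Appendix 18: the cyclic Latin square, the assumption that two tasks are searched in each system, and all function names are choices made for this example only.<br />

```python
from collections import Counter

TASKS = ["SIM 1", "SIM 2", "SIM 3", "GENUINE"]


def latin_square(items):
    """Cyclic Latin square: every item occurs once in every position."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


def rotation_plan(n_persons, tasks=TASKS):
    """Assign each test person a task order and a system order."""
    orders = latin_square(tasks)
    half = len(tasks) // 2
    plan = []
    for person in range(n_persons):
        task_order = orders[person % len(orders)]
        # Switch the starting system after each full cycle of task orders.
        first = "A" if (person // len(orders)) % 2 == 0 else "B"
        second = "B" if first == "A" else "A"
        # Assumption: first half of the tasks in one system, the rest in the other.
        systems = [first] * half + [second] * (len(tasks) - half)
        plan.append(list(zip(task_order, systems)))
    return plan


plan = rotation_plan(32)
# How often each task occurs in each position, and with each system.
position_counts = Counter(
    (pos, task) for session in plan for pos, (task, _) in enumerate(session)
)
system_counts = Counter(pair for session in plan for pair in session)
print(plan[0])
```

With 32 test persons, this sketch places every task in every position 8 times and pairs every task with each system 16 times.<br />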
The documents were evaluated on the basis of the title and snippets <strong>in</strong>cluded <strong>in</strong><br />
the result lists. The ma<strong>in</strong> reason for this was that it removes the snippet-document<br />
relationship as a variable <strong>in</strong> the results (cf. Turp<strong>in</strong> et al., 2009; He et al., 2010) and<br />
allows for comparison with correspond<strong>in</strong>g studies (e.g., Käki & Aula, 2005). Further,<br />
the prototype had trouble opening documents from links in the result lists for certain document<br />
types. The test persons were asked to assess the relevance of documents as to the work<br />
task <strong>in</strong> question, that is, situational relevance (cf. Figure 6.4).<br />
The relevance of search results was noted when the result lists were shown to<br />
Figure 6.4 Relevance types in IR evaluation adapted from Borlund (2003a, p. 915).<br />
Legend: W: work task situation; CW: cognitive perception of W; N: information need; r/q: request/query version; O: retrieved information object(s); A: algorithmic relevance; IT: intellectual topicality; P: pertinence relevance; SR: situational relevance. Graphical symbols in the figure mark the assessor’s/user’s cognitive space, relevance assessment(s) or interpretation(s), transformation, and the IR system.<br />
the test persons. This way we received the immediate evaluation of the document while<br />
the test person remembered the document. After the search part of the test, a short post<br />
search <strong>in</strong>terview was conducted. The purpose of the <strong>in</strong>terview was to make the test<br />
persons sum up and reflect on their overall impressions of the test system, on their<br />
present use of the <strong>in</strong>tranet, and how categorization could be useful <strong>in</strong> their daily work.<br />
Due to time constraints the interview guide was kept rather short. The interview guide<br />
is included in Appendix 19.<br />
Dur<strong>in</strong>g the test the test manager was present <strong>in</strong> the room. There were several<br />
purposes for this. One was that the test persons could be observed dur<strong>in</strong>g their searches.<br />
This made it possible to ask the test persons to elaborate on specific moves in the<br />
subsequent <strong>in</strong>terview. Further, the schedule did not leave time for the test persons to get<br />
acqua<strong>in</strong>ted with the test system before the test started. By lett<strong>in</strong>g the test manager be<br />
present dur<strong>in</strong>g the session, the test persons had the possibility to ask clarify<strong>in</strong>g questions<br />
dur<strong>in</strong>g the session. At the closure of the session, the test persons received a m<strong>in</strong>or<br />
acknowledgement for their <strong>in</strong>volvement.<br />
Physically the test took place at location 1 and 2. At location 1 a test room was<br />
available for the conduct of the test. The test room had a stationary mach<strong>in</strong>e for the test<br />
persons to use and a laptop for the test manager. Morae was <strong>in</strong>stalled on both mach<strong>in</strong>es.<br />
The Morae Observer module monitored the test persons’ actions on the laptop screen
for the test manager to follow. The monitoring was not kept secret from the test persons,<br />
since the purpose of using it was to avoid physically having to look over the test persons’<br />
shoulders. At location 2 a test room was not available. Therefore we brought a<br />
laptop with Morae <strong>in</strong>stalled to enable logg<strong>in</strong>g. Dur<strong>in</strong>g the tests at location 2 the test<br />
manager was obliged to follow the test persons’ moves on the test mach<strong>in</strong>e. The<br />
predom<strong>in</strong>ant part of the tests was carried out at location 1.<br />
6.4.5 Pilot test<br />
Pilot tests were carried out at several stages of the process ahead of the launch of the<br />
search test. Specifically, the recruitment questionnaire, the simulated work tasks, and<br />
the test procedure were tested ahead of the actual collection of data.<br />
The recruitment questionnaire was pretested by a number of colleagues at the<br />
RSLIS. Further, a number of employees at SKAT pilot tested the questionnaire. Since<br />
the recruitment questionnaire resembled the domain study<br />
questionnaire in several respects, we could to a certain extent rely on the methodological experience gained<br />
there. However, the changes in the business model necessitated a pilot to ensure that the<br />
modified work tasks were understandable to the recruitment respondents. The<br />
questionnaire was adjusted accord<strong>in</strong>g to the feedback from both RSLIS colleagues and<br />
SKAT employees.<br />
The search task pretest also conta<strong>in</strong>ed different elements. We have already<br />
mentioned the pretest with the purpose of identify<strong>in</strong>g the most relevant work tasks and<br />
reducing the total number of work tasks. In addition, we tested the work tasks in the<br />
test system. In advance of the pilot of the search tasks among employees at<br />
SKAT, the search tasks were tested for their relevance to the test system. That is, we<br />
tested whether the outputs of the search tasks were suited to the purpose of the search<br />
tasks. We wanted to find out whether the number of documents that would match the<br />
requests was sufficient. In their principles for search result visualization, Kules &<br />
Shneiderman (2004, p. 2) suggest that 100-1000 results are<br />
needed as a minimum for an adequate basis of a categorized overview. However, their<br />
principles are based on the web, where the number of documents by far exceeds that of our<br />
test collection. Kules & Shneiderman do state the principle with the reservation that<br />
the optimal number of results depends on many factors such as task domain and<br />
document quality. Due to the size of the test collection we have had to aim at a lower<br />
number of results. Instead we have emphasized the availability of highly relevant<br />
documents to match the simulated work tasks <strong>in</strong> our f<strong>in</strong>al choice of tasks. 11 work tasks<br />
were tested and of these 3 were picked out for the search test.<br />
Also the test situation as such was pilot tested. We needed <strong>in</strong>formation about<br />
how to handle practical matters such as how to document the searches, which<br />
succession of test elements to follow, and how to carry out the evaluations of work tasks<br />
and search results. We also wanted an approximate estimate of the duration of a test<br />
session. The pilot tests provided very useful <strong>in</strong>sight <strong>in</strong>to these matters and the test<br />
design was corrected accord<strong>in</strong>g to the experiences ga<strong>in</strong>ed <strong>in</strong> the pilot tests. In actual<br />
practice the simulated search tasks and the test procedure were pilot tested<br />
simultaneously. We let the first test persons recruited by the recruitment questionnaire<br />
function as pilot testers and cont<strong>in</strong>ued to pilot until the test design was suitable for data<br />
collection. In total 10 pilot testers participated.<br />
6.4.6 Techniques for data collection and preparation<br />
Dur<strong>in</strong>g the course of the search test different methods for data collection were used <strong>in</strong><br />
order to allow for elaboration of the search process. The test persons’ <strong>in</strong>teraction with<br />
the test system was logged us<strong>in</strong>g the software Morae (see<br />
http://www.techsmith.com/morae.asp). Morae facilitates logging of key and screen<br />
activity. Both options were applied for documentation of the test, though we are<br />
primarily us<strong>in</strong>g the key log for analysis. Search (or transaction) logs have been widely<br />
used <strong>in</strong> order to document and analyze <strong>in</strong>teractions with retrieval systems and search<strong>in</strong>g<br />
behavior. The most significant strength of search logs as to the present test setup is the<br />
unobtrusiveness of the method (Jansen, 2006, pp. 424-425). However, the data delivered<br />
are descriptive (Jansen & Pooch, 2001, p. 242). That implies that the search log data<br />
should not stand alone, if we want to expla<strong>in</strong> and understand the <strong>in</strong>teraction between<br />
system and user. For that reason the log data were supplemented with qualitative data<br />
<strong>in</strong> order to compensate for the limitations of the search log as a research tool.<br />
Participant observation also took place dur<strong>in</strong>g the test procedure (cf. Ely, 1991,<br />
p. 41 ff.). As the test manager was present dur<strong>in</strong>g the test, observations were made <strong>in</strong><br />
order to capture moves, comments, modifications, and other acts of relevance to the<br />
search test. The observations are not reported here <strong>in</strong>dependently. Rather, the purpose<br />
of the observation was to qualify the post search <strong>in</strong>terview and enable the test manager<br />
to ask the test persons specifically about their <strong>in</strong>teraction with the system.<br />
Interviews, both oral and <strong>in</strong> questionnaire form, were carried out along the<br />
course of the search test. The recruitment questionnaire provided background data on
the test persons’ demographics, seeking behavior, and the like. During the search<br />
test the simulated work tasks were assessed as to the test persons’ knowledge of the<br />
subject, their perception of the degree of difficulty, and the extent of similarity with<br />
their genuine work tasks. Lastly, after the test persons had carried out the search tasks,<br />
a post search interview was carried out. The purpose of the interview was to ask follow-up<br />
questions <strong>in</strong> order to get a more comprehensive picture of the search situation. For<br />
documentation purposes a Dictaphone was set to record the search test and the post<br />
<strong>in</strong>terview. It was decided to record the full event <strong>in</strong> case the test persons gave<br />
comments dur<strong>in</strong>g search<strong>in</strong>g that would be of value to our understand<strong>in</strong>g of their<br />
<strong>in</strong>teraction with the test system. Further, us<strong>in</strong>g the Dictaphone reduced the need for<br />
note tak<strong>in</strong>g dur<strong>in</strong>g search<strong>in</strong>g and allowed for the test manager to focus on the test<br />
persons and their actions. The recorded sequences were transcribed subsequently.<br />
The last type of documentation comprises the relevance assessments made<br />
during searching. Relevance was captured along two dimensions: the degree of<br />
relevance and the criteria applied for the assessment (Borlund, 2003a). The more<br />
systematic of the two was the measurement of the degree of relevance. We have<br />
already mentioned that the relevance assessments took their point of departure in situational<br />
relevance. The degrees of situational relevance of the documents retrieved were<br />
measured on a 4-point scale. We followed Sormunen’s (2002) four-point scale since it<br />
allows for a distinction between two categories of partial relevance: relevant and<br />
useful, and relevant and potentially useless (Sormunen, 2002, p. 329). In order to reflect<br />
this distinction we followed Sormunen’s description of the respective degrees in our<br />
explanation to the test persons. In addition, we asked the test persons about the<br />
motivations for their assessments, i.e. the relevance criteria applied. The purpose of<br />
including relevance criteria was not to make a systematic investigation of relevance<br />
criteria. Rather, the criteria were included as a tool to encourage the test persons to<br />
explain the assessments given. The questions are listed in the post search interview<br />
guide, though they were asked in connection with the relevance assessments.<br />
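As an illustration of how the two dimensions of a relevance assessment could be recorded for later tabulation, consider the Python sketch below. The numeric coding 0-3, the labels of the two end points of the scale, the field names and the sample assessments are all assumptions made for the example and are not taken from the test data.<br />

```python
from collections import Counter
from dataclasses import dataclass

# Assumed numeric coding of the 4-point scale; end-point labels are placeholders.
DEGREES = {
    0: "not relevant",
    1: "relevant and potentially useless",
    2: "relevant and useful",
    3: "highly relevant",
}


@dataclass(frozen=True)
class Assessment:
    system: str     # "A" (free text) or "B" (categorization)
    doc_id: str
    degree: int     # key in DEGREES
    criterion: str  # motivation given by the test person

    def __post_init__(self):
        if self.degree not in DEGREES:
            raise ValueError(f"degree must be one of {sorted(DEGREES)}")


def degree_distribution(assessments):
    """Count assessments per system and degree of relevance."""
    return Counter((a.system, a.degree) for a in assessments)


# Made-up assessments for illustration only.
assessments = [
    Assessment("A", "doc1", 3, "answers the task directly"),
    Assessment("A", "doc2", 1, "right subject, outdated"),
    Assessment("B", "doc1", 3, "answers the task directly"),
    Assessment("B", "doc7", 2, "useful background"),
]
distribution = degree_distribution(assessments)
print(distribution[("A", 3)], distribution[("B", 0)])
```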
6.4.7 Data analysis<br />
The data collected consisted of 1) background data (from the recruitment<br />
questionnaire), 2) interview transcriptions (from the search sessions and the post-search<br />
interview), 3) search logs, 4) relevance assessments, and 5) assessments of the<br />
simulated search tasks. Background data and assessments of tasks were analysed using<br />
descriptive statistics in SPSS. As the data were used to gain an insight into the<br />
characteristics of the test persons and the appropriateness of the search tasks, we did not<br />
find reason to expand this part of the analysis further. Again, the recordings from the<br />
search test were transcribed to facilitate structured analysis. In the present case the<br />
transcription was carried out by an external transcriber. The procedure is clarified in<br />
Appendix 10. As with the focus group interviews, we used atlas.ti for analysis and<br />
followed Halkier’s (2008) three steps for analysis of qualitative data.<br />
The search log registered search time and keys applied. From the screen video<br />
recorded during the searches, we manually extracted the number of hits retrieved, the selection of<br />
subject categories, the use of information filters and the search types. All were registered in<br />
SPSS for analysis purposes. Lastly, the relevance assessments of documents were<br />
typed into SPSS. This work resulted in the identification of a number of variables listed<br />
in Table 6.6.<br />
At query level we measured the number of terms applied, the search keys applied, the<br />
search operators and document type specifications used, the number of hits, the success<br />
of queries, and the type of reformulations undertaken. The number of search terms is<br />
included to provide information about the number of terms needed in order to achieve a<br />
satisfactory number of results. Search keys provide knowledge about the number of<br />
search task facets covered in queries. The facets identified in Table 6.7 (the rightmost<br />
column) form the basis of the interpretation of the queries. All elements of a query could<br />
add to the facets: query terms, document types, and categories (the latter only in system<br />
B queries). When a category was included in a query, it was counted as one concept no<br />
matter the number of terms describing the category. The variable is included for two<br />
reasons. One reason is to be able to identify the average number of terms used to<br />
represent search keys. This indicates which search keys are considered more<br />
important by the test persons, but also the level of detail in the representation of search<br />
keys. The other reason is the option of identifying the optimal number of search keys<br />
for obtaining a useful search result.<br />
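To make the counting rule for terms concrete, the following is a minimal sketch in Python. It assumes the definition given in Table 6.6: terms are separated by spaces, a free-standing dash is not a term, and a hyphenated compound such as "e-handel" is one term. The function names and the example queries are hypothetical and are not taken from the search log.

```python
from statistics import mean


def count_terms(query: str) -> int:
    """Count query terms: whitespace-separated tokens, ignoring lone dashes."""
    tokens = query.split()
    # A token consisting only of dashes is punctuation, not a term.
    # A hyphenated compound ("e-handel") stays one token, hence one term.
    return sum(1 for token in tokens if token.strip("-"))


def average_terms_per_query(queries: list[str]) -> float:
    """Average number of terms per query, the query-level measurement."""
    return mean(count_terms(q) for q in queries)


# Hypothetical queries, for illustration only.
example_queries = ["e-handel", "moms - registrering", "fast driftssted e-handel"]
print([count_terms(q) for q in example_queries])
print(average_terms_per_query(example_queries))
```

Counting search keys would additionally require a manual mapping from query elements to the facets in Table 6.7, which is an interpretive step and is therefore not sketched here.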
Search operators and the use of the document type filter express how searchers<br />
combine their search terms, and whether they aim for a narrow or a broad search result.<br />
The number of hits is another indicator of the success of a search. Thus, a set of results<br />
can be very small (e.g. 0 hits) or very large (e.g. 50,000 hits). Including the number of<br />
hits further makes it possible to compare the quantitative output of different queries. The test<br />
system provided an approximate count of the number of results. In very small sets it<br />
Table 6.6 Search test variables, their definition and measurement<br />
Variable | Definition | Measurement<br />
Query level<br />
Terms per query | Number of words separated by a single spacing. Dashes were not counted as single terms. Terms connected with a dash (e.g. "e-handel") were counted as one term. | Average number of terms per query<br />
Search keys per query | Number of search keys applied in queries | Average number of search keys per query<br />
Use of search operators in queries | The search operator chosen for a specific query | Distribution of queries using each of the four search types in percentages<br />
Use of the filter “Document type” (DT) in queries | The DT filter chosen (if any) for a specific query | Average number of queries using the DT filter in percentages<br />
Number of hits in queries | The number of hits retrieved in queries | Average number of hits retrieved<br />
Query success | Queries retrieving at least one document with a relevance score of 2 or 3 are considered successful | Percentage of successful queries<br />
Type of reformulations | Reformulations in queries. Registered as the change from the past to the present query. Registered types count: Category, query terms, document type, search operator, and a combination of the above. | Percentage of reformulations in queries<br />
Session level<br />
Number of sessions with reformulations | Number of sessions containing more than one query. Reformulations comprise changes of queries, search type (or categories in system B), or document type. | Percentage of sessions with reformulations<br />
Number of reformulations per session | Number of times a query has been reformulated in a session | Average number of reformulations per session<br />
Session success | Sessions containing at least one successful query are considered successful | Average number of sessions solved<br />
Test persons’ assessment of their insight into the simulated search tasks | Measured on a scale from 1-5, where 1=No insight, and 5=Great insight | Average score on insight<br />
Test persons’ assessment of simulated search tasks’ level of difficulty | Measured on a scale from 1-5, where 1=Very easy, and 5=Very difficult | Average score on the level of difficulty<br />
Test persons’ assessment of the resemblance between the simulated search task and their daily work tasks | Measured on a scale from 1-5, where 1=No resemblance, and 5=Great resemblance | Average score on resemblance between search task and daily work tasks<br />
could be verified that the count was approximated, as the actual number of results<br />
sometimes differed slightly from the reported count. To give equal conditions to small<br />
and large retrieval sets, the number of search results reported by the system was<br />
registered as the result for all searches. Lastly, the type of reformulations was included.<br />
Query reformulations (or modifications) designate the actions taken by searchers in<br />
order to adjust an inadequate search result. For that reason reformulations are highly<br />
informative as to users’ interaction with an IR system. Huang & Efthimiadis (2009, p.<br />
79) have suggested a taxonomy of reformulations that reflects modifications of search<br />
terms alone. With the present identification of reformulations we wanted to reflect the<br />
changes made in all fields of the search interface, including the categorization window.<br />
Overall, we may term these variables interaction variables (cf. Kelly, 2009, p. 105<br />
ff.). A related and very common variable to include in this type of study is the search<br />
time applied. Search time was excluded from the present data, as interaction with the<br />
Table 6.7 Simulated search task facets<br />
Search task | Description | Facets<br />
Sim1 | Selling apartment purchased by parents for their children. Can the parent get a tax relief for expenses concerning the estate agent, repairs, and the loss gained, when the apartment was sold? Find documents outlining the fiscal conditions concerning apartments purchased by parents for their children. | Topical facets: Business activity: Parents’ purchase; Taxation: Tax relief. Non-topical facets: Information type: Legal guidances, citizen booklets, legislation<br />
Sim2 | Taxation of e-commerce: An owner-managed one man publishing house wants to sell books online in the United States and other countries. The permanent establishment is in Denmark. How is the owner taxed on his earnings? Find documents outlining, how e-commerce with permanent establishment in Denmark is taxed. | Topical facets: Business activity: E-commerce; Taxation: Permanent establishment DK, foreign income. Non-topical facets: Information type: Legal guidances, business guidances, legislation<br />
Sim3 | Freelance work: A freelance teacher is about to expand his activities, which will make him earn about 100.000 DKR per year. Now he is not sure, whether he can continue as a salaried worker, or if he must start his own business and become registered for VAT. Find documents outlining the rules for when to become registered for VAT. | Topical facets: Business format: Freelance; Business activity: Teaching; Taxation: VAT registering. Non-topical facets: Information type: Legal guidances, business guidances, legislation<br />
observer took place in many search sessions and affected the time spent. As a result,<br />
search time would not have been a valid variable in the present data set.<br />
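The registration of reformulation types described above can be illustrated with a small Python sketch. It assumes that each query is recorded with the four interface fields named in Table 6.6 (query terms, search operator, document type, and category) and that a reformulation is classified by the fields that changed from the previous to the present query. The `Query` class, its field names, and the example operator and filter values are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Query:
    terms: str
    operator: str                        # one of the four search types
    document_type: Optional[str] = None  # the DT filter, if any
    category: Optional[str] = None       # only available in system B


def reformulation_type(previous: Query, current: Query) -> Optional[str]:
    """Classify the change from the previous to the present query."""
    changed = []
    if previous.category != current.category:
        changed.append("category")
    if previous.terms != current.terms:
        changed.append("query terms")
    if previous.document_type != current.document_type:
        changed.append("document type")
    if previous.operator != current.operator:
        changed.append("search operator")
    if not changed:
        return None  # identical query, no reformulation
    # More than one changed field is registered as a combination.
    return changed[0] if len(changed) == 1 else "combination"


# Hypothetical session with three queries.
session = [
    Query("e-handel", "all words"),
    Query("e-handel moms", "all words"),
    Query("e-handel moms", "any word", document_type="legislation"),
]
for prev, curr in zip(session, session[1:]):
    print(reformulation_type(prev, curr))
```

With this registration, the number of reformulations in a session is simply the number of consecutive query pairs for which the function returns a type.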
Performance is another prevalent variable type in IR evaluation studies (cf.<br />
Kelly, 2009, p. 106 ff.). Commonly established performance measures are used to<br />
quantify and compare the performance of IR systems. We have already mentioned<br />
precision and recall (section 5.2.4). Other examples include the discounted cumulative<br />
gain (DCG), a measure taking into account the ranking of documents (Järvelin &<br />
Kekäläinen, 2002), and mean average precision, a measure based on the mean of the<br />
precision values obtained after each relevant document has been retrieved (Voorhees, 2000). However,<br />
the form of the log file did not enable these calculations, as it did not store the<br />
documents retrieved. We did, however, measure query success in terms of the query’s<br />
ability to retrieve relevant documents as outlined in the previous section. For the<br />
purpose of performance measurement, we set a successful search to be a query<br />
retrieving at least one document with a relevance of 2 or 3 (on a scale from 0-3, where 3<br />
is the score of full relevance). Level 2 was included in the measurement of success, as it<br />
turned out that the test persons on several occasions stopped their searching when a<br />
level 2 document had been retrieved. To exemplify, two test persons stopped with the<br />
following statements:<br />
“Well, I didn’t find anything that states exactly how to do it, but I have found<br />
something indicating where I might find the rules.” (TP1, line 243-244), and<br />
“[I still think it is a 2..] because I do get some information about the tax<br />
rules… But of course you do need to go one level deeper in order to hit a 3” (TP6, line<br />
49-52).<br />
Another reason for including level 2 documents in the definition of a successful query is<br />
that documents were assessed from the metadata contained in the result lists, and not from the<br />
full document. As one test person puts it:<br />
“...I would not give it a 3. Actually, I would probably give 1 to both of them,<br />
because I can’t know if it is what I need, before I get in and see if it is correct. But<br />
those are the ones, I would choose. Unless I can see that I can move on...” (TP3, line<br />
73-76).<br />
It appears from the quote that documents might be rated lower because the test persons<br />
do not get to assess the full version of documents.<br />
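Although the ranking-based measures could not be computed from the present log, a short sketch may clarify what they would have required: the ranked list of retrieved documents with a relevance score for each. The Python sketch below follows the DCG formulation of Järvelin & Kekäläinen (2002) with logarithm base 2, in which ranks below the base are not discounted. For average precision, the graded scores are made binary with a threshold of 2, which is an assumption made here for illustration and mirrors the success criterion above. The ranked scores are invented.

```python
import math


def dcg(gains, base=2):
    """Discounted cumulative gain of a ranked list of graded scores."""
    total = 0.0
    for rank, gain in enumerate(gains, start=1):
        # Ranks below the log base are not discounted.
        discount = math.log(rank, base) if rank >= base else 1.0
        total += gain / discount
    return total


def average_precision(scores, threshold=2, total_relevant=None):
    """Mean of the precision values at each rank holding a relevant document."""
    hits = 0
    precision_sum = 0.0
    for rank, score in enumerate(scores, start=1):
        if score >= threshold:
            hits += 1
            precision_sum += hits / rank
    # If the total number of relevant documents is unknown,
    # fall back to the number of relevant documents retrieved.
    denominator = total_relevant if total_relevant is not None else hits
    return precision_sum / denominator if denominator else 0.0


# Invented relevance scores (0-3) for one ranked result list.
ranked_scores = [3, 0, 2, 1]
print(dcg(ranked_scores))
print(average_precision(ranked_scores))
```

Mean average precision is then the mean of `average_precision` over all queries, which presupposes that the retrieved documents are stored for every query.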
At session level a number of interaction variables were also included: the<br />
number of sessions with reformulations, the number of reformulations, and the session<br />
success. The reformulations basically provide the same information as with queries, though<br />
at session level the impression might change slightly, which is why they were included here<br />
too. Likewise, the session success is a condensation of the query level in order to be<br />
able to compare across the search tasks at a more overall level. In terms of Kelly (2009,<br />
p. 104-105) the remaining three variables at session level are characterized as<br />
information need variables. Here we measured the test persons’ assessments of the<br />
search tasks (solely for simulated search tasks) in terms of the level of difficulty, their<br />
insight into the topic, and the similarity of the task with genuine work tasks. Despite<br />
the risk of receiving highly subjective answers in this type of assessment, we included<br />
them to have some indication of the test persons’ perception of simulated search tasks.<br />
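The success criteria at query and session level can be summarized in a few lines of Python. The sketch assumes that each query is represented by the list of relevance scores (0-3) assigned to the documents assessed for it, and that a session is a list of such queries. The function names and the example data are hypothetical.

```python
def query_is_successful(relevance_scores, threshold=2):
    """A query succeeds if at least one document scores 2 or 3."""
    return any(score >= threshold for score in relevance_scores)


def session_is_successful(session):
    """A session succeeds if at least one of its queries succeeds."""
    return any(query_is_successful(scores) for scores in session)


def percentage(flags):
    """Share of True values in percent."""
    return 100 * sum(flags) / len(flags)


# Hypothetical data: two sessions, each a list of queries,
# each query a list of relevance scores for the assessed documents.
sessions = [
    [[0, 1], [1, 2, 0]],  # second query retrieves a level 2 document
    [[0], [1, 1], []],    # no query reaches level 2
]
query_flags = [query_is_successful(q) for s in sessions for q in s]
session_flags = [session_is_successful(s) for s in sessions]
print(percentage(query_flags))
print(percentage(session_flags))
```

A query with no assessed documents, such as a query with zero hits, counts as unsuccessful in this sketch.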
Subsequent to the registration of data, statistical analyses were carried out.<br />
The analysis consisted of univariate and bivariate statistics, frequencies, means, and<br />
correlations. In addition, inferential statistics were carried out when relevant. We used<br />
Pearson’s r for interval and scale level data and chi-square (χ²) for data at nominal<br />
level.<br />
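The two test statistics named above can be written out as follows. The analyses in the thesis were carried out in SPSS; the Python sketch below only shows the textbook formulas for Pearson's r and for the chi-square statistic of a contingency table, without the associated significance tests. The function names and the numbers are invented for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation for two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variance_x = sum((a - mean_x) ** 2 for a in x)
    variance_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(variance_x * variance_y)


def chi_square(table):
    """Chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * column_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom


# Invented example: terms per query against number of search keys.
print(pearson_r([1, 2, 3, 4], [1, 1, 2, 3]))
# Invented example: system (rows) against query success (columns).
print(chi_square([[10, 20], [20, 10]]))
```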
6.5 Limitations<br />
It is recognized that the search test has limitations. As the test is designed as a<br />
controlled laboratory test, it does not necessarily reflect the everyday seeking behaviour<br />
of the employees. Also, the test persons searched on the basis of three simulated search<br />
tasks. The challenges of designing suitable tasks for professional users have been<br />
outlined above. According to the results presented in Chapter 8, the searchers’ handling of the<br />
genuine search tasks differs in some respects from the simulated search tasks. However,<br />
in most cases the differences are minor. Further, the accordance of facets in simulated<br />
and genuine search tasks demonstrates realism to the employees concerning this aspect.<br />
In addition, the test persons carried out their own interpretations of the search tasks when<br />
constructing queries, providing 128 sessions and 564 queries. In this respect the test<br />
provides knowledge of the test persons’ understanding of and ability to incorporate<br />
elements of a search interface into their queries. Lastly, we want to address the state of<br />
the prototype used for the test. Although the system included about a fourth of the<br />
documents of the running intranet, this could have meant that known documents were not<br />
included. In addition, the training of the categorization was not final at the time of the<br />
test, which at times challenged the test persons and may have affected the search log<br />
data. However, the search interviews provided valuable qualitative data to explain and<br />
understand the nature of the challenges and the test persons’ use of the prototype.<br />
6.6 Relation between research method and research questions<br />
In the previous sections we have outlined the research methods forming the basis for the<br />
collection and analysis of data. We will close the present chapter by connecting the<br />
research method with the research questions guiding the thesis in order to clarify the<br />
purpose of the specific elements of the research method. The relations between research<br />
questions and their empirical basis are outlined in Table 6.8. As appears from the table,<br />
RQ 1.1-1.4 and 2.1-2.9 are empirically based, while RQ 1.5 and RQ 2.10 put the<br />
empirical findings into perspective. Next we will present the results of the domain<br />
study.<br />
Table 6.8 Outline of the relation between research questions and empirical data<br />
Research question | Empirical basis<br />
RQ1: What characterizes the e-government employee’s information seeking behaviour in relation to:<br />
1.1 Their use of information sources? | Survey questionnaire and focus group interviews<br />
1.2 Their frequency of information seeking? | Survey questionnaire and focus group interviews<br />
1.3 Their information needs? | Survey questionnaire and focus group interviews<br />
1.4 Their metadata preferences? | Survey questionnaire and focus group interviews<br />
1.5 How does the seeking behaviour affect demands for indexing? | The empirical findings of RQ 1.1-1.4 are analysed from an indexing perspective. The response to the question is analytical.<br />
RQ2: How do automatic extracted indexing and automatic categorization perform in relation to the identified domain characteristics as to<br />
2.1 Number of queries in sessions? | Search log supported by search interviews<br />
2.2 Number of terms in queries? | Search log supported by search interviews<br />
2.3 Number of concepts in queries? | Search log supported by search interviews<br />
2.4 The type of search operator applied? | Search log supported by search interviews<br />
2.5 The use of document type filters? | Search log supported by search interviews<br />
2.6 Number of reformulations? | Search log supported by search interviews<br />
2.7 Types of reformulations? | Search log supported by search interviews<br />
2.8 Degree of search success in queries and sessions? | Search log supported by search interviews<br />
2.9 Overall performance measured by performance measures | Search log supported by search interviews<br />
2.10 Which implications does the performance of different indexing methods have for future indexing and indexing guidelines in the domain of e-government? | The empirical findings of RQ 2.1-2.9 are analysed in terms of their implications. The response to the question is analytical.<br />
7 Domain study results<br />
The purpose of the domain study is to be able to answer the research questions<br />
regarding the information seeking behaviour of e-government employees and how the<br />
domain characteristics affect demands for indexing within the domain (research<br />
question 1) outlined in Chapter 1. The investigation of seeking behaviour in the domain<br />
served several purposes in the project. Most importantly, the domain study should inform<br />
the design of the subsequent search test so that it reflects the behaviour within<br />
the domain. Secondly, we wanted a validation of the relevance of the system chosen for<br />
the search test (a prototype of a future version of the intranet at SKAT, see section<br />
6.4.1).<br />
The results of the questionnaire (see section 6.2) and the focus groups (see<br />
section 6.3) form the basis for the domain study. The chapter is introduced by a<br />
presentation of the questionnaire respondents and the focus group participants (sections<br />
7.1 and 7.2). Next follow the results and analysis of the empirical data<br />
collection regarding research questions 1.1-1.4. The purpose of the section is to be able<br />
to characterize the seeking behaviour of e-government employees in the case study. We<br />
have divided the analysis into two parts. The first section concerns the findings related to<br />
the general seeking behaviour of the employees (section 7.3). The succeeding section is<br />
concerned with the results generating demands for indexing (section 7.4). The chapter<br />
closes with a summary.<br />
7.1 Questionnaire respondents, their background and work tasks<br />
340 respondents completed the questionnaire, resulting in a response rate of 42.6% (see<br />
Appendix 21), which was an increase in responses compared to the pilot test. Here the<br />
response rate was 29% (see Appendix 5). The degree of response of the remaining 57%<br />
also appears from Appendix 21. As we are only using the full responses as the basis for<br />
the data analysis, the 42.6% are the focal point of the remainder of this chapter as to the<br />
questionnaire part.<br />
Table 7.1 Distribution of respondents as to their education (percentages)<br />
Education | # | Percentages<br />
Internal clerk programme | 97 | 28.5<br />
Administrative assistant | 95 | 27.9<br />
Other vocational education and training | 26 | 7.6<br />
Upper secondary education | 10 | 2.9<br />
Short-cycle higher education | 10 | 2.9<br />
Bachelor degree | 7 | 2.1<br />
Medium-cycle higher education | 26 | 7.6<br />
Long-cycle higher education | 43 | 12.6<br />
Master’s programme | 26 | 7.6<br />
Total | 340 | 100<br />
The age of the respondents ranges between 19 and 68 years. The average age of the<br />
respondents is slightly above 47 years with a standard deviation of 9.5 years, which<br />
reflects the population figures (see Appendix 23). The respondents overall have quite a<br />
long length of service in the organization (see Appendix 24). Accordingly, the<br />
respondents’ experience with the single work tasks is also extensive, when measured<br />
as the number of years the respondents have been working with the task (see Appendix<br />
22). Thus, the turnover of employees is limited, and the respondents tend to<br />
continue carrying out the same work tasks for some time. However, internal circulation<br />
of employees in the organization also takes place. Thus, both the focus group<br />
interviews and the search test have revealed employees that have carried out numerous<br />
different and diverse tasks during their time of service. The majority of the respondents<br />
are educated within the organization or are administrative assistants (see Table 7.1).<br />
Another large group have finished a higher education or master’s programmes. In sum,<br />
the respondents may be characterized as employees of a certain age that are expected to<br />
have considerable insight into organizational matters and topics due to<br />
the generally long length of service within the organization and due to the educational<br />
background that in many cases can be considered organization specific.<br />
The respondents could select among 19 different generic work tasks as their work<br />
tasks in the questionnaire. There were neither upper nor lower limits to the number of<br />
selections. The frequencies are shown in Table 7.2. We have already discussed the<br />
10.9% of the respondents not selecting any work tasks in section 6.2.6 and will not<br />
elaborate further on this issue here. Respondents most frequently chose one (27.9%)<br />
or two (35.8%) work tasks. From three and upwards, the number of respondents<br />
decreases. The number of work tasks selected by the respondents shows that employees<br />
predominantly carry out a few work tasks during their work day. This corresponds to<br />
the task oriented organization structure mentioned in section 2.4. Further, the size of the<br />
organization allows for highly specialized employees.<br />
It may be questioned how one person can take care of as many as six<br />
generic work tasks. The answer may be found in exactly the word generic. It may have<br />
caused the respondents some problems identifying exactly their work area in the generic<br />
nature of the description of the work tasks (see Appendix 1), when the actual work<br />
Table 7.2 Number of work tasks selected by respondents<br />
Data from web questionnaire: All employees, N=340<br />
Number of work tasks carried out by employees | # | %<br />
0 WT | 37 | 10.9%<br />
1 WT | 95 | 27.9%<br />
2 WT | 122 | 35.8%<br />
3 WT | 52 | 15.2%<br />
4 WT | 20 | 5.9%<br />
5 WT | 10 | 2.9%<br />
6 WT | 5 | 1.5%<br />
Total | 340 | 100%<br />
Table 7.3 Ranked frequency of work tasks in questionnaire results<br />
Data from web questionnaire: All respondents, n=340<br />
Work task carried out by employees | # | %<br />
Instruction | 181 | 53%<br />
Inspection: common | 61 | 18%<br />
Settlement: preliminary assessment of income/personal taxes | 57 | 17%<br />
Settlement: business relations | 57 | 17%<br />
Processes of support: legal support | 45 | 13%<br />
Collection | 39 | 12%<br />
Management and development: development | 27 | 8%<br />
Settlement: corporation taxes | 25 | 7%<br />
Settlement: common | 20 | 6%<br />
Settlement: vehicles | 18 | 5%<br />
Inspection: customs | 16 | 5%<br />
Management and development: strategy | 16 | 5%<br />
Processes of support: internal activities | 15 | 4%<br />
Settlement: estate | 14 | 4%<br />
Processes of support: IT service and administration | 14 | 4%<br />
Processes of support: HR and education | 14 | 4%<br />
Management and development: business management | 14 | 4%<br />
Settlement: customs | 12 | 4%<br />
Processes of support: minister service | 10 | 3%<br />
Total | 655 |<br />
area consists of a combination of several work tasks. 11 Further, the respondents were<br />
asked to “pick also work tasks that you carry out elements of” (page 12 of the<br />
questionnaire, see Appendix 4). However, the majority of the respondents selected<br />
between one and three work tasks. The work tasks are represented by the respondents<br />
according to Table 7.3. In total, the respondents answered questions about 655 work<br />
tasks distributed between the 19 generic work tasks. The table demonstrates the relative<br />
extent of the work tasks among the respondents. The most dominant work task is<br />
Instruction. Instruction differs from most of the other work tasks. According to<br />
the definition of Instruction, it represents a different layer, because it operates at a meta<br />
level basically concerning the contact with clients, whether citizens or businesses.<br />
Instruction does not refer to specific subject areas in the organization, which is the case<br />
for the remainder of the work tasks.<br />
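Because respondents could select several work tasks, the percentages in Table 7.3 relate to the 340 respondents and not to the 655 selections, and they therefore add up to more than 100%. The following Python sketch illustrates the difference between the two bases using three rows of the table. The variable and function names are hypothetical, and the share of selections is shown only for contrast; it is not reported in the table.

```python
RESPONDENTS = 340        # full responses to the questionnaire
TOTAL_SELECTIONS = 655   # work task selections across all respondents

# Excerpt of Table 7.3: number of respondents selecting each work task.
selected_rows = {
    "Instruction": 181,
    "Inspection: common": 61,
    "Processes of support: legal support": 45,
}


def share_of_respondents(count, respondents=RESPONDENTS):
    """Percentage of respondents who selected the work task."""
    return 100 * count / respondents


def share_of_selections(count, selections=TOTAL_SELECTIONS):
    """Percentage of all selections that fall on the work task."""
    return 100 * count / selections


for task, count in selected_rows.items():
    print(task, round(share_of_respondents(count)), round(share_of_selections(count)))

# Average number of work tasks selected per respondent.
print(round(TOTAL_SELECTIONS / RESPONDENTS, 2))
```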
7.2 Characteristics of focus group participants<br />
The participants in the focus groups were assembled in order to represent the six main<br />
processes in the business model of the organization. As can be seen from Appendix 25,<br />
all six main processes in the business model were represented by the participants. It<br />
turned out, however, that several of the participants covered more than one of the work<br />
tasks. This is in line with the questionnaire results just mentioned. As a consequence,<br />
some participants are placed in several places in the table. Instruction constitutes a special<br />
case, since it is a part of most of the participants’ daily work in some sense next to their<br />
other primary functions. The six participants placed here are the ones participating in<br />
the focus group specifically concerning Instruction.<br />
The participants represented a number of different educational backgrounds.<br />
When counted by the division from the questionnaire, the participants are distributed<br />
according to Table 7.4. Some of the educations mentioned in the questionnaire<br />
11<br />
For <strong>in</strong>stance one comb<strong>in</strong>ation <strong>in</strong> the questionnaire represents three tasks: Settlement: Prelim<strong>in</strong>ary<br />
assessment of <strong>in</strong>come/personal taxes, Inspection: Common, and Processes of support: Legal support.<br />
Thus, the work area is concerns <strong>in</strong>spections and legal support <strong>in</strong> regards to <strong>in</strong>come taxes.
<strong>Automatic</strong> <strong><strong>in</strong>dex<strong>in</strong>g</strong> <strong>in</strong> e-<strong>government</strong><br />
Table 7.4 Focus group participants' educational background<br />
Title of education Data from focus groups:<br />
Focus group participants,<br />
N=35<br />
156<br />
# %<br />
Internal clerk programme 19 54<br />
Adm<strong>in</strong>istrative assistant 3 9<br />
Other vocational education<br />
and tra<strong>in</strong><strong>in</strong>g<br />
Upper secondary education -<br />
Short-cycle higher education -<br />
Bachelor degree -<br />
Medium-cycle higher<br />
education<br />
Long-cycle higher education 8 23<br />
Master’s programme 2 6<br />
Could not be placed 3 9<br />
Total 35<br />
have not been represented <strong>in</strong> the focus groups. We do not consider this a problem s<strong>in</strong>ce<br />
it is the same educations that are less frequent <strong>in</strong> the questionnaire results (see Table<br />
7.1). Also, the aim of the focus groups was not necessarily to be representative for the<br />
organization as to the level of education. The participants range between a few months<br />
and up to about 40 years as to their length of service with<strong>in</strong> the organization. Thus,<br />
both high experience employees and newcomers are represented <strong>in</strong> the groups.<br />
7.3 Results regard<strong>in</strong>g professional e-<strong>government</strong> seek<strong>in</strong>g behavior<br />
The purpose of section 7.3 is to present the general seek<strong>in</strong>g behavior found <strong>in</strong> the<br />
questionnaire and the focus group <strong>in</strong>terviews. The section addresses the employees’<br />
seek<strong>in</strong>g behavior <strong>in</strong> terms of <strong>in</strong>formation sources applied.<br />
-<br />
-
7.3.1 Use of information sources
The respondents’ selection of sources appears in Table 7.5. The questionnaire does not reveal the relative importance of the listed sources for solving particular work tasks, as this was not incorporated in its design. Thus, we asked which sources the respondents use, but not how frequently each source is used. The content of Table 7.5 therefore expresses the range of information sources. The questionnaire allowed the respondents to propose additional sources besides the predefined ones. The focus groups also contributed supplementary sources and verified the sources mentioned by the respondents. The organization demonstrates a very broad use of information sources. From the percentages in the bottom row of Table 7.5 it appears that the average use of the predefined sources varies to a large extent. The table shows that the intranet is the predominant source of information for the employees: averaged across all work tasks, 85% of the respondents use the system for problem solving. The WWW and reference works are also important to the employees.
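As a rough check of how the bottom row of Table 7.5 can be read, the sketch below assumes that the "total average percentage" is the unweighted mean of the 19 per-task percentages, so that each work task counts equally regardless of how many respondents selected it. This is our assumption rather than a documented calculation; it reproduces the printed values for the intranet (85) and for digital reference works (39). The function name is hypothetical.

```python
# Per-task percentages from Table 7.5, in the row order of the table
# (Instruction first, "Management and development: development" last).
INTRANET = [85, 80, 88, 81, 84, 75, 83, 79, 80, 69,
            82, 96, 90, 79, 86, 87, 100, 100, 96]
DIGITAL_REFERENCE_WORKS = [56, 55, 74, 60, 84, 8, 17, 57, 84, 6,
                           41, 80, 50, 7, 21, 7, 19, 7, 15]


def total_average_percentage(task_percentages):
    """Unweighted mean of the per-task percentages, rounded to a whole percent.

    Assumption: every work task counts equally, whatever its number of
    respondents.
    """
    return round(sum(task_percentages) / len(task_percentages))


intranet_average = total_average_percentage(INTRANET)
digital_average = total_average_percentage(DIGITAL_REFERENCE_WORKS)
print(intranet_average, digital_average)
```

Under this reading, a small work task such as "Settlement: customs" influences the average as much as Instruction does, which is worth keeping in mind when comparing sources by the bottom row alone.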
The predefined sources can be arranged in three overall groups: reference works, various web sites, and internal systems. The groups are not mutually exclusive, but are used to characterize the systems applied. A fourth group came up in the open questions of the questionnaire and during the focus groups: colleagues as sources of information. The results regarding this particular source of information are presented in section 7.3.2. The groups guide the analysis of the use of sources in the sections to follow. The additional sources cover internal systems apart from the predefined sources, other specialized systems, specific websites, and colleagues (see Appendix 26). The appendix reflects the myriad sources used in a large specialized organization such as SKAT. These sources are included in the sections below where relevant.
7.3.1.1 Reference works<br />
Due to its area of function, SKAT is to a large extent guided by legislation and rules. This is mirrored in the importance of reference works in Table 7.5; in this section, the term reference works covers both digital and printed reference works. The table emphasizes the importance of the legal basis of the organization. In general terms the employees use reference works to a large extent: both types were used in about 40% of the work tasks.

The distinction between printed and electronic sources addresses a general change in organizations: printed books are being phased out in favor of digital editions. Focusing on the work tasks with more than 50 respondents, the digital versions have a slightly higher score (see Table 7.5). However, printed versions are still important to the employees. The participants mentioned different reasons for still needing printed versions of reference works. The overall label for the reasons was practical matters, which covers different motivations. One is the nature of the work task: some employees in the organization carry out parts of their work away from their desktop due to meetings, either internal ones or external visits to citizens, businesses, and other governments. This supports the recommendation given by Garcia et al. (2006) that the implementation of technology in work places should be closely related to how work tasks are carried out in particular settings. Some participants also found printed versions easier to read, and lastly it was mentioned that printed versions are easier to search. Regarding searching, differing opinions were expressed in the focus groups, though, which may also explain the even use of the two. To exemplify:

“I use electronic reference works a lot. I believe that I find by far the most here. If I search right, I will get it. But they also contain cross references to all the things on the intranet, and it is supposed to get what is on the Internet too, through the Parliament and the like.” (R16, p. 5), and

“…as long as you have a printed reference work, they are easier to consult. That is, if you know where to look.” (R23, p. 11).

The two quotes illustrate how the selection of either electronic or printed sources is a matter of the user’s preferences and experience.
7.3.1.2 Web sites<br />
The predefined websites covered by the heading “Web sites” are the homepage of the Danish Parliament, ministry homepages, borger.dk¹², “Retsinformation”¹³, and the internet in general. As appears from Table 7.5, the preferred resource of the five listed is the internet in general, which is also the second most used information source of all the predefined types. Of course, the designation of the source may have a say in its predominance: in principle, the internet in general could include the remainder of the web-based sources listed in the questionnaire. Using Google for searching was brought up several times during the focus groups. The search engine was used for explorative searches and as a gateway to searching other systems such as the intranet. To exemplify:

“For me, if I need rulings, I use Google even though I know I can access Retsinformation and Thomson too. But I search Google, because I find the electronic reference works too bad. Then I find the ruling in Google and then I might get referred to one of those pages that we are perhaps supposed to use, but I simply find their search functionalities too bad.” (R11, p. 5)

Furthermore, websites are one example of sources that are closely related to specific work tasks. Thus Retsinformation, ministry homepages, and the homepage of the Danish Parliament are all far more widely used in the main processes “Processes of support” and “Management and development”. The increased use here demonstrates that these employees are, to a larger extent than other employees, engaged in detailed legal matters in the organization.

12 Borger.dk is the Danish common portal for communication between citizens and government. The portal enables self-service for citizens, but also has an area for public authorities. See www.borger.dk.

13 Retsinformation is the official Danish website containing the acts, their procedural history, historical law, and the like. The database is located at: www.retsinfo.dk.
7.3.1.3 Internal systems<br />
In the predefined sources, the intranet and Captia (an electronic case management system) represent the internal sources of the organization. Apart from these two, the questionnaire and focus groups reported a number of additional sources within the internal group of systems. To a large extent, the internal systems added are equivalent to Captia. Examples are Dipsy, KMD, Remedy, TST, DR, and the like. The systems serve different purposes in the organization, but have one thing in common: they are all systems for registration of cases, requests, or other data. The systems mentioned reflected local differences as an implication of the mergers, but also highly specialized systems supporting the professional activities of the employees (see Appendix 26).

Table 7.5 Respondents' use of predefined information sources (percentages)

| Sources used for certain work tasks | Intranet | Digital reference works | Printed reference works | Homepage of the Danish Parliament | Captia | Ministry homepages | Borger.dk | Retsinformation | The Internet in general |
|---|---|---|---|---|---|---|---|---|---|
| Instruction | 154 (85%) | 102 (56%) | 98 (54%) | 17 (9%) | 47 (26%) | 31 (17%) | 21 (12%) | 42 (23%) | 89 (49%) |
| Settlement: common | 16 (80%) | 11 (55%) | 9 (45%) | 2 (10%) | 6 (30%) | 2 (10%) | 2 (10%) | 3 (15%) | 8 (40%) |
| Settlement: preliminary assessment of income/personal taxes | 50 (88%) | 42 (74%) | 33 (58%) | 1 (2%) | 6 (11%) | 4 (7%) | 6 (11%) | 5 (9%) | 22 (39%) |
| Settlement: business relations | 46 (81%) | 34 (60%) | 33 (58%) | 8 (14%) | 14 (25%) | 8 (14%) | 4 (7%) | 12 (21%) | 24 (42%) |
| Settlement: corporation taxes | 21 (84%) | 21 (84%) | 17 (68%) | 4 (16%) | 10 (40%) | 4 (16%) | - | 6 (24%) | 16 (64%) |
| Settlement: customs | 9 (75%) | 1 (8%) | 7 (58%) | - | 2 (17%) | - | - | - | 2 (17%) |
| Settlement: vehicles | 15 (83%) | 3 (17%) | 4 (22%) | 2 (11%) | 3 (17%) | 2 (11%) | 3 (17%) | 3 (17%) | 10 (56%) |
| Settlement: estate | 11 (79%) | 8 (57%) | 9 (64%) | 1 (7%) | 6 (43%) | 5 (36%) | 3 (21%) | 5 (36%) | 9 (64%) |
| Inspection: common | 49 (80%) | 51 (84%) | 43 (71%) | 6 (10%) | 14 (23%) | 4 (7%) | 5 (8%) | 19 (31%) | 35 (57%) |
| Inspection: customs | 11 (69%) | 1 (6%) | 5 (31%) | 1 (6%) | 4 (25%) | 1 (6%) | - | 1 (6%) | 4 (25%) |
| Collection | 32 (82%) | 16 (41%) | 11 (28%) | 2 (5%) | 16 (41%) | 3 (7%) | 6 (15%) | 8 (21%) | 13 (33%) |
| Processes of support: legal support | 43 (96%) | 36 (80%) | 32 (71%) | 17 (38%) | 17 (38%) | 18 (40%) | 3 (7%) | 22 (49%) | 20 (44%) |
| Processes of support: minister service | 9 (90%) | 5 (50%) | 3 (30%) | 8 (80%) | 4 (40%) | 6 (60%) | - | 5 (50%) | 5 (50%) |
| Processes of support: IT service and administration | 11 (79%) | 1 (7%) | 2 (14%) | - | 3 (21%) | - | 1 (7%) | - | 9 (64%) |
| Processes of support: HR and education | 12 (86%) | 3 (21%) | 1 (7%) | - | 3 (21%) | 3 (21%) | - | 1 (7%) | 9 (64%) |
| Processes of support: internal activities | 13 (87%) | 1 (7%) | - | - | 5 (33%) | - | 1 (7%) | 2 (13%) | 12 (80%) |
| Management and development: strategy | 16 (100%) | 3 (19%) | 5 (31%) | 5 (31%) | 1 (6%) | 6 (38%) | 1 (6%) | 3 (19%) | 12 (75%) |
| Management and development: business management | 14 (100%) | 1 (7%) | 4 (29%) | 2 (14%) | 3 (21%) | 2 (14%) | 1 (7%) | 1 (7%) | 8 (57%) |
| Management and development: development | 26 (96%) | 4 (15%) | 6 (22%) | 8 (30%) | 6 (22%) | 7 (26%) | - | 9 (33%) | 19 (70%) |
| Total average percentage | 85% | 39% | 40% | 15% | 26% | 17% | 7% | 19% | 52% |

Legend: The table states the percentages of respondents that use a specific information source for a certain work task; each cell gives the number of respondents followed by the percentage in parentheses. Since the table, at least for some information sources, reflects a wide variation between work tasks, the last row summarizes the total average percentage across all work tasks reported.

As a preparation for the search test we were interested in finding out whether the intranet use differed across work tasks, that is, whether some work tasks were more appropriate for the search test than others. It turned out that the intranet was listed as the most frequently used source across all work tasks except for one (Inspection: common) (see Table 7.5), and may thus overall be considered the most important source of information in the organization. Taking the focus groups into account, it appears that the intranet serves different functions for the participants. Key functions are as a library of internal messages and documents, as a tool for staying updated on topics of interest, and as a library of specialist information. To illustrate:
“…so the information need that I have [regarding the intranet] is more aimed towards changes in new court decisions, new legislation, and we usually get that from the intranet. That means that I go there every morning to see if anything new has come in relation to collection, and that is how I stay updated” (R33, p. 1)

Although the intranet is the most important general source of information, the participants at the same time consider it a challenging system to use. The challenges fall into three overall categories: too much information, irrelevant information, and trouble locating relevant information. Two quotes exemplify:

“So the intranet, it is our common notice board. And that also decides the search results. You might even get recipes, if they have been published.” (R7, p. 5), and

“It is very often, when we are answering agent telephones. For instance when e-income was new, they would ask us “how do you do a correction of wrongly stated taxes”, and we also didn’t know many of the questions and then we could search the intranet, but we gave up. We had to pass them on to someone dealing with it, because it took us too long, and it was confusing to search the intranet. We couldn’t find the answers we needed, because you got page by page containing the least bit about e-income, that’s what you get.” (XX, settlement, p. 3)
The problems are solved in different ways. For documents that may also be found on the official web page of the organization, several of the participants mention that they find the web site easier to navigate than the intranet. Others perform a Google search, either across the whole WWW or limited to the domain www.skat.dk. A third common way of solving the problem is to ask a colleague for help. Regardless of the approach applied to solve the intranet search problems, the importance of qualified indexing is stressed.
To sum up, a variety of sources are used by the employees, along with creativity, to find information when needed. In terms of the search test, we learned that, apart from “Settlement: Customs”, the intranet was used extensively in the organization. That supported our choice of system for the search test.
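As an illustration only, the domain-limited Google search mentioned by the participants can be expressed with Google's `site:` operator. The sketch below builds such a query URL with the Python standard library. The function name and the search terms are hypothetical and are not taken from the focus groups; the participants did not describe their exact queries.

```python
from urllib.parse import urlencode


def google_search_url(terms, site=None):
    """Build a Google search URL, optionally limited to one domain.

    The restriction uses the `site:` operator; `terms` and `site` are
    illustrative parameters, not part of the study.
    """
    query = f"{terms} site:{site}" if site else terms
    return "https://www.google.com/search?" + urlencode({"q": query})


# Hypothetical search terms, once across the whole web and once
# limited to the organization's public domain.
whole_web = google_search_url("e-income correction")
skat_only = google_search_url("e-income correction", site="www.skat.dk")
print(skat_only)
```

The second URL restricts the results to pages under www.skat.dk, which corresponds to the workaround of searching the public web site instead of the intranet.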
7.3.2 Colleagues as sources of <strong>in</strong>formation<br />
One general characteristic throughout all of the focus groups is the importance of colleagues as information sources. We did not ask about this particular type of source in the questionnaire. However, a number of respondents mentioned colleagues and neighbor training as additional sources in the open box below the predefined answer options for information sources. Colleagues as information sources have been investigated previously in LIS research, in the public domain (e.g., Hazlett, McAdam & Beggs, 2008; Woudstra & van den Hooff, 2008) as well as in other professional contexts (e.g., Herzum et al., 2002; Herzum & Pejtersen, 2000; Xu, Tan & Yang, 2006). Here, the employees put forward two main reasons for using colleagues.

For efficiency matters:

“Well, if it is tasks within special problem areas, and we know we have colleagues with knowledge about it, then it is tempting to go ask, because the person is likely to know the latest decisions in the area. Instead of starting to… It is also a matter of time. You can save time by…” (XX, Guidance, p. 11),

And for validation matters:

“Well, I do prefer to consult the customs guidance in the first place, and then… if I am not really sure if anything new has come, then I will check out the electronic and stuff. And then I always go ask…” (R19, p. 5)
To sum up, colleagues are important to the employees, here and in related studies. However, to some extent this is due to ineffective retrieval systems, which emphasizes the need for an improved indexing practice.

7.4 Seeking results regarding demands for indexing in e-government

In the sections to follow, we report on the findings that inform the demands for indexing in e-government.

7.4.1 The frequency of information seeking

The need for information seeking was documented in the questionnaire by question 17 (see Appendix 4) regarding the frequency of information seeking. The question was not aimed specifically at the intranet. Rather, it was formulated broadly in order to investigate information seeking in general. This means that the question maps information seeking regardless of the source applied. The distribution for the individual work tasks appears in Table 7.6. The table shows that the most common frequency for information seeking is every third or fourth time (column 3): in 12 of 19 work tasks this is the frequency with the highest score. Apparently the general picture is that information seeking does take place rather frequently, but not necessarily every time a work task is solved.

Table 7.6 Questionnaire results regarding the frequency of information seeking

| Work tasks | Every time | Every second time | Every 3rd or 4th time | Practically never |
|---|---|---|---|---|
| Instruction | 33 (18%) | 20 (11%) | 86 (48%) | 42 (23%) |
| Settlement: common | 5 (25%) | 2 (10%) | 5 (25%) | 8 (40%) |
| Settlement: preliminary assessment of income/personal taxes | 5 (9%) | 6 (11%) | 27 (47%) | 19 (33%) |
| Settlement: business relations | 10 (18%) | 6 (11%) | 27 (47%) | 14 (25%) |
| Settlement: corporation taxes | 10 (40%) | 4 (16%) | 10 (40%) | 1 (4%) |
| Settlement: customs | 2 (17%) | 1 (8%) | 3 (25%) | 6 (50%) |
| Settlement: vehicles | 4 (22%) | 2 (11%) | 7 (39%) | 5 (28%) |
| Settlement: estate | 4 (29%) | 1 (7%) | 6 (43%) | 3 (21%) |
| Inspection: common | 18 (30%) | 13 (21%) | 24 (39%) | 5 (8%) |
| Inspection: customs | 5 (31%) | - | 4 (25%) | 7 (44%) |
| Collection | 8 (21%) | - | 14 (36%) | 17 (44%) |
| Processes of support: legal support | 14 (31%) | 9 (20%) | 18 (40%) | 4 (9%) |
| Processes of support: minister service | 5 (50%) | 1 (10%) | 2 (20%) | 2 (20%) |
| Processes of support: IT service and administration | 4 (29%) | 1 (7%) | 5 (36%) | 4 (29%) |
| Processes of support: HR and education | 2 (14%) | 3 (21%) | 6 (43%) | 3 (21%) |
| Processes of support: internal activities | 3 (20%) | 1 (7%) | 5 (33%) | 6 (40%) |
| Management and development: strategy | 2 (13%) | 2 (13%) | 7 (44%) | 5 (31%) |
| Management and development: business management | 3 (21%) | 2 (14%) | 5 (36%) | 4 (29%) |
| Management and development: development | 6 (22%) | 5 (19%) | 14 (52%) | 2 (7%) |
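The count of 12 out of 19 work tasks can be retraced from the percentages in Table 7.6. The sketch below is a minimal check, assuming that "highest score" means strictly higher than each of the other three frequencies, and that cells printed as "-" stand for 0%. Both points are our assumptions, and the names in the code are hypothetical.

```python
# Percentages per work task from Table 7.6, in column order:
# (every time, every second time, every 3rd or 4th time, practically never).
# Cells printed as "-" in the table are entered as 0 (an assumption).
FREQUENCIES = {
    "Instruction": (18, 11, 48, 23),
    "Settlement: common": (25, 10, 25, 40),
    "Settlement: preliminary assessment of income/personal taxes": (9, 11, 47, 33),
    "Settlement: business relations": (18, 11, 47, 25),
    "Settlement: corporation taxes": (40, 16, 40, 4),
    "Settlement: customs": (17, 8, 25, 50),
    "Settlement: vehicles": (22, 11, 39, 28),
    "Settlement: estate": (29, 7, 43, 21),
    "Inspection: common": (30, 21, 39, 8),
    "Inspection: customs": (31, 0, 25, 44),
    "Collection": (21, 0, 36, 44),
    "Processes of support: legal support": (31, 20, 40, 9),
    "Processes of support: minister service": (50, 10, 20, 20),
    "Processes of support: IT service and administration": (29, 7, 36, 29),
    "Processes of support: HR and education": (14, 21, 43, 21),
    "Processes of support: internal activities": (20, 7, 33, 40),
    "Management and development: strategy": (13, 13, 44, 31),
    "Management and development: business management": (21, 14, 36, 29),
    "Management and development: development": (22, 19, 52, 7),
}

THIRD_OR_FOURTH = 2  # index of the "every 3rd or 4th time" column


def tasks_where_column_leads(table, column):
    """Return the work tasks whose score in `column` strictly exceeds all other columns."""
    leaders = []
    for task, scores in table.items():
        others = scores[:column] + scores[column + 1:]
        if scores[column] > max(others):
            leaders.append(task)
    return leaders


leading = tasks_where_column_leads(FREQUENCIES, THIRD_OR_FOURTH)
print(len(leading), "of", len(FREQUENCIES))
```

Under the strict reading, "Settlement: corporation taxes" is not counted, because "every time" and "every 3rd or 4th time" tie at 40%.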
In the focus groups, the issue of <strong>in</strong>formation seek<strong>in</strong>g received some attention,<br />
because some of the frequencies from the questionnaire did not mirror the frequencies<br />
of the participants. Thus, the participants discussed, what constitutes <strong>in</strong>formation<br />
seek<strong>in</strong>g. Some participants <strong>in</strong>tuitively understood <strong>in</strong>formation seek<strong>in</strong>g as mere look ups<br />
<strong>in</strong> an <strong>in</strong>formation system. One participant says:<br />
“If a client reports to the counter out here, you ask for his civil registration<br />
number and log <strong>in</strong>to his <strong>in</strong>formation. This is the first <strong>in</strong>formation. You cannot serve a<br />
client unless you seek <strong>in</strong>formation at least once... But if someone asks a specialist<br />
question, then the need for <strong>in</strong>formation is not nearly as substantial. Because then you<br />
answer on the basis of someth<strong>in</strong>g you know like the back of your hand...The only<br />
requests that do not require <strong>in</strong>formation are the ones ask<strong>in</strong>g for direction to the motor<br />
unit. They are handed over an <strong>in</strong>struction. Everyone else <strong>in</strong>volves look ups.” (R7, p. 2-<br />
3)<br />
Another participant (R33) adds:<br />
“We cannot do anything without having the ICT-based possibilities of looking<br />
up companies, demands, what does this company owe, this person, what does he or she<br />
owe. We need to access the network all the time.”<br />
In other words, if information seeking is understood by the respondents in the sense of<br />
mere look-ups in some sort of information system, then information seeking occurs very<br />
frequently, if not every time a work task is solved.<br />
Information seeking triggered by an information need occurs less frequently.<br />
The frequency is affected by different conditions. One condition is the number of<br />
self-service solutions developed in the organization. Self-service implies that citizens<br />
handle a range of tasks by themselves. One consequence of this is that some knowledge<br />
areas of the employees are not maintained, because the work tasks that used to help<br />
maintain these knowledge areas are now handled by the citizens themselves. For the<br />
frequency of information seeking, this means that the frequency increases, because an<br />
information need now emerges in situations that used to be dealt with from the<br />
employees’ memory. The following quote illustrates this:<br />
“...when we were employed by the municipality, our job was to assess as many<br />
people as possible, that is going through their tax return to see whether they did it right<br />
or wrong... this means that back then we gained experience all the time and kept up<br />
with what happened in this area and this area... now we need to make people use<br />
self-service and make error lists, so we keep losing what we once used to know by memory.<br />
I certainly feel that many of the questions I used to answer just like that now require<br />
reading. Just to be brought up to date and see if something new has occurred since the<br />
last time.” (R10, p. 4)<br />
This discussion may also explain why several work tasks in Table 7.6 have a frequency<br />
peak at both “Every 3rd or 4th time” and “Every time”.<br />
Another condition is the prior knowledge of the case handled. According to<br />
R35, information seeking only takes place if:<br />
“... you are handling a completely new case. Then, obviously, I need to seek<br />
more information about this company. If it is a company I know in advance, I might<br />
just check what has been declared and what has been paid. But no matter what, I<br />
always seek before I am going to talk to a company.”<br />
Some work tasks differ from the general tendency of “Every 3rd or 4th time”<br />
being the most common frequency. Using the percentage distribution as an indicator, seven<br />
work tasks generate noticeably more or less frequent information seeking than the most<br />
frequent category.<br />
Within “Processes of support”, two work tasks differed from the overall pattern.<br />
“Minister service” generated a higher frequency of information seeking, with the largest<br />
percentage share of all work tasks on “Every time”. “Internal activities”, on the other<br />
hand, had a lower frequency than the general picture, with the majority of the<br />
respondents seeking information every third or fourth time or practically never.<br />
Neither of the work tasks had many respondents. Still, the focus groups and the<br />
description of the work tasks added to our understanding of the respondents’ behaviour<br />
in these two work tasks.<br />
“Minister service” was not directly represented in the focus group interviews,<br />
but was discussed in the focus group for “Processes of support”. The interview<br />
confirmed that “Minister service” is a special case as to the frequency of<br />
information seeking due to the content of the work task. R14 compares it to the other<br />
work tasks within “Processes of support” this way:<br />
“My spontaneous explanation for “Minister service” is that, well, so much<br />
more is at stake when servicing the minister. You need to be so much more certain.<br />
...with “Minister service”, you need to be 100% certain. Of course you need to in other<br />
cases as well, but more is just at stake with “Minister service”... You need to be 100%<br />
sure that what you write and produce and contribute with is correct.” (R14, p. 3)<br />
Thus, it seems that correct information becomes even more important when it is<br />
passed on to the target group of “Minister service”.<br />
“Internal activities”, on the other hand, generated an average frequency of<br />
information seeking that was fairly low compared to the general picture. Thus, the<br />
majority of respondents selected either “Every 3rd or 4th time” or “Practically never” to<br />
describe their frequency. In the focus groups, the two representatives of “Internal<br />
activities” were rather different. One (R32) took care of mail that could not be<br />
delivered directly to the relevant party. The other (R28) worked with<br />
communication. This difference in work tasks may explain the distribution of<br />
frequencies of information seeking. To R32, what made the frequency of information<br />
seeking decrease was the kind of information needed:<br />
“In our group, experience is more important. We almost need to know what<br />
the different departments are doing, and that is what we try to... But all the time it is<br />
what you remember. He has got something to do with this and he has got something to<br />
do with that. You can practically not look it up anywhere.” (R32, p. 4)<br />
However, when the participant was asked about information sources later in the<br />
interview, it appeared that a number of sources were considered highly necessary in<br />
order to solve the work task at hand.<br />
R28, on the other hand, considered “Every 3rd or 4th time” insufficient when<br />
describing her own frequency of information seeking. When asked if she looked for<br />
information more often than “Every 3rd or 4th time”, she replied:<br />
“Yes, I think so, because I also use it to orientate myself about some things<br />
before I show up or answer an e-mail... But it is also related to how I understand a task<br />
because to me the intranet and seeking is a part of my job all the time about, well both<br />
SKAT as a business but also the subject area I am working with. So you somehow<br />
either seek information or have signed up for a news mail... And all that information<br />
contributes to how a task is solved one way or the other.” (R28, p. 6)<br />
It seems that information seeking is in fact rather frequent within<br />
“Processes of support”. The high share of “Practically never” for<br />
“Internal processes” may be explained by the sub work tasks that are also included in<br />
the overall description of “Internal processes”, for instance purchasing and<br />
administering goods, services, and buildings. These are not work tasks that necessarily<br />
generate a high frequency of information seeking.<br />
Other work tasks generated less information seeking than the general tendency<br />
and had the majority of respondents indicating “Practically never” as the frequency of<br />
their information seeking. The specific work tasks are “Settlement: common”,<br />
“Settlement: customs”, “Inspection: customs”, and “Collection”.<br />
“Customs”, whether in the main process of “Settlement” or “Inspection”, was<br />
discussed in the fourth focus group. All participants turned out to have their primary<br />
function within the main process of “Settlement”, but also had some insight into<br />
“Inspection”. The participants had difficulty recognizing that the majority of respondents<br />
of “Settlement: customs” had answered “Practically never” to represent their frequency<br />
of information seeking. They did provide examples of work tasks that did not require<br />
information seeking, either because the information needed was well known to them or<br />
because it was already part of the papers provided for the case. But a large part of the<br />
work tasks carried out needed some kind of information seeking. The participants of the<br />
focus group did not come to full agreement on what the correct frequency was, but the<br />
group agreed that “Practically never” did not provide an adequate picture of the actual<br />
frequency.<br />
A hypothesis for the domain study was that the seeking behaviour would differ<br />
depending on the work task at hand. Looking at Table 7.6, this hypothesis is to some<br />
extent confirmed. A few work tasks stand out with a more frequent behaviour, others<br />
with a less frequent behaviour. The general impression, though, is that the employees<br />
look for information regularly, but not every time they are engaged with a work task.<br />
However, disagreements as to the general figures could be traced in the focus groups,<br />
indicating that the numbers in the table are aggregate percentages and that individual differences<br />
occur. In the recruitment of test persons for the search test we wanted to reflect this<br />
individuality. To do this, it was decided to let each individual’s frequency of intranet use<br />
guide who was selected as test persons for the test.<br />
Table 7.7 Distribution of indicators of information needs<br />
Work task | Information needs<br />
 | 1 | 2 | 3 | 4 | 5 | 6 | 7<br />
Instruction (181) | 38 (21%) | 84 (46%) | 46 (25%) | 26 (14%) | 48 (27%) | 51 (28%) | 117 (65%)<br />
Settlement: common (20) | 5 (25%) | 11 (55%) | 6 (30%) | 4 (20%) | 5 (25%) | 4 (20%) | 13 (65%)<br />
Settlement: preliminary assessment of income/personal taxes (57) | 16 (28%) | 26 (46%) | 17 (30%) | 4 (7%) | 19 (33%) | 20 (35%) | 36 (63%)<br />
Settlement: business relations (57) | 17 (30%) | 27 (47%) | 15 (26%) | 8 (14%) | 13 (23%) | 15 (26%) | 27 (47%)<br />
Settlement: corporation taxes (25) | 3 (12%) | 13 (52%) | 6 (24%) | 3 (12%) | 5 (20%) | 6 (24%) | 18 (72%)<br />
Settlement: customs (12) |  | 3 (25%) | 4 (33%) | 1 (8%) | 2 (17%) | 2 (17%) | 6 (50%)<br />
Settlement: vehicles (18) | 8 (44%) | 9 (50%) | 5 (28%) | 3 (17%) | 4 (22%) | 3 (17%) | 11 (61%)<br />
Settlement: estate (14) | 5 (36%) | 8 (57%) | 4 (29%) | 1 (7%) | 3 (21%) | 4 (29%) | 9 (64%)<br />
Inspection: common (61) | 13 (21%) | 37 (61%) | 18 (30%) | 16 (26%) | 24 (39%) | 26 (43%) | 42 (69%)<br />
Inspection: customs (16) | 4 (25%) | 5 (31%) | 8 (50%) | 3 (19%) | 3 (19%) | 4 (25%) | 5 (31%)<br />
Collection (39) | 6 (15%) | 13 (33%) | 7 (18%) | 3 (8%) | 6 (15%) | 12 (31%) | 23 (59%)<br />
Processes of support: legal support (45) | 14 (31%) | 26 (58%) | 16 (36%) | 5 (11%) | 16 (36%) | 14 (31%) | 34 (76%)<br />
Processes of support: minister service (10) |  | 4 (40%) | 3 (30%) | 5 (50%) | 5 (50%) | 4 (40%) | 5 (50%)<br />
Processes of support: IT service and administration (14) | 4 (29%) | 7 (50%) | 1 (7%) | 1 (7%) | 3 (21%) | 6 (43%) | 7 (50%)<br />
Processes of support: HR and education (14) | 2 (14%) | 8 (57%) | 5 (36%) | 1 (7%) | 4 (29%) | 6 (43%) | 9 (64%)<br />
Processes of support: internal activities (15) | 3 (20%) | 8 (53%) | 5 (33%) | 4 (27%) | 2 (13%) | 4 (27%) | 11 (73%)<br />
Management and development: strategy (16) | 3 (19%) | 8 (50%) | 5 (31%) | 8 (50%) | 8 (50%) | 9 (56%) | 10 (63%)<br />
Management and development: business management (14) | 3 (21%) | 6 (43%) | 3 (21%) | 5 (36%) | 4 (29%) | 7 (50%) | 7 (50%)<br />
Management and development: development (27) | 5 (19%) | 13 (48%) | 5 (19%) | 9 (33%) | 10 (37%) | 15 (56%) | 17 (63%)<br />
Legend:<br />
1) I know exactly which documents I need in order to solve the work task<br />
2) I need to find a document I have used before<br />
3) I pretty much know which documents exist on the subject<br />
4) I am working with a new project within a subject area well known to me. I would like to<br />
acquaint myself with the part that is new to me<br />
5) I am looking for documents for a new work task within a subject area that is familiar to me<br />
6) I am working with a subject area that I have not been working with before<br />
7) I know the subject well but need a specific piece of information<br />
7.4.2 Types of information needs<br />
Types of information needs were investigated in the questionnaire by means of a<br />
number of indicators for each of the three information needs employed in the thesis. It is<br />
important to keep in mind that the question about information needs is formulated<br />
specifically towards the intranet due to the search test. If the range and diversity of<br />
sources applied by the employees is taken into account (see section 7.3.1), it is possible<br />
that specific sources are used for certain information needs. What we are reporting on<br />
in the present section is therefore the information needs that are met using the<br />
intranet.<br />
Information needs were represented in the questionnaire as a number of<br />
indicators representing the information needs suggested by Ingwersen (1992). The<br />
distribution of the respondents across work tasks and information need indicators<br />
is shown in Table 7.7. Two indicators in particular describe the situation of the<br />
respondents across the work tasks, namely indicator 2 (I need to find a document I have<br />
used before) and indicator 7 (I know the subject well but need a specific piece of<br />
information). In most of the 19 work tasks, these are the most frequently<br />
occurring situations triggering an information need.<br />
This distribution corresponds well with the focus group results. Several<br />
participants state that seeking carried out on the intranet is usually focused and that<br />
more open searches are carried out elsewhere. According to R23:<br />
“I do not use [the intranet] to seek without a specific goal. I would at Google,<br />
otherwise not. Used or seen before… It is possible that you have not used the<br />
document before, but you have seen it before at the least.” (R23, p. 12)<br />
R18 agrees:<br />
“I know this document is in there and I need to use it now. Or: I know this<br />
court ruling exists and I need to find it now. Or something else... Typically probably<br />
something I have seen before that I need to use again.” (R18, p. 5)<br />
A third indicator is common in the work tasks belonging to the main process<br />
Management and development, namely indicator 6 (I am working with a subject area<br />
that I have not been working with before). The focus group on Management and<br />
development clarifies why. Management and development is a main process<br />
where new projects are planned, developed, and launched, and looking for inspiration<br />
was the participants’ explanation for the higher frequency of indicator 6. For<br />
Inspection: customs, the most frequent indicator is number 3 (I pretty much know which<br />
documents exist on the subject). This may be related to the frequency of information<br />
seeking for this work task mentioned above (Section 7.4.1). It seems that this<br />
particular work task, which deals with field work related to controlling goods and means of<br />
transportation, is routine, and that the employees know the sources needed to solve the<br />
work task.<br />
On the other hand, two indicators generally have low frequencies, namely 4 (I<br />
am working with a new project within a subject area well known to me. I would like to<br />
acquaint myself with the part that is new to me) and 1 (I know exactly which documents<br />
I need in order to solve the work task). Indicator 4 may have a low frequency because<br />
starting up new projects is less common than dealing with routine types of tasks.<br />
We previously outlined the information needs corresponding to the indicators<br />
(see Table 6.1). Translating Table 7.7 into the information needs referred to<br />
by the indicators reveals the predominant information needs of the respondents. Table<br />
7.8 displays the average percentage distribution of the three types of information needs<br />
underlying the indicators from Table 7.7. Again we see that the verificative and the<br />
conscious topical needs are the most common information needs.<br />
Table 7.8 Average percentage distribution of verificative needs (VN), conscious topical needs<br />
(CTN), and muddled topical needs (MTN).<br />
Work tasks | Information needs<br />
 | VN | CTN | MTN<br />
Instruction (181) | 38% | 39% | 21%<br />
Settlement: common (20) | 40% | 40% | 20%<br />
Settlement: preliminary assessment of income/personal taxes (57) | 37% | 42% | 21%<br />
Settlement: business relations (57) | 39% | 32% | 20%<br />
Settlement: corporation taxes (25) | 32% | 39% | 18%<br />
Settlement: customs (12) | 13% | 33% | 13%<br />
Settlement: vehicles (18) | 47% | 37% | 17%<br />
Settlement: estate (14) | 46% | 38% | 18%<br />
Inspection: common (61) | 41% | 46% | 34%<br />
Inspection: customs (16) | 28% | 33% | 22%<br />
Collection (39) | 24% | 31% | 19%<br />
Processes of support: legal support (45) | 44% | 49% | 21%<br />
Processes of support: minister service (10) | 20% | 43% | 45%<br />
Processes of support: IT service and administration (14) | 39% | 26% | 25%<br />
Processes of support: HR and education (14) | 36% | 43% | 25%<br />
Processes of support: internal activities (15) | 37% | 40% | 27%<br />
Management and development: strategy (16) | 34% | 48% | 53%<br />
Management and development: business management (14) | 32% | 33% | 43%<br />
Management and development: development (27) | 33% | 40% | 44%<br />
Legend: The table displays the mean of occurrences of one or more representatives of the<br />
indicators of information needs.<br />
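The step from Table 7.7 to Table 7.8 can be illustrated with a minimal sketch. It assumes that each information need is the plain mean of the percentage shares of its indicators, and that the indicators are grouped as VN = 1 and 2, CTN = 3, 5 and 7, MTN = 4 and 6. This grouping is an assumption inferred from the reported figures, not quoted from Table 6.1, which remains the authoritative mapping. The names `NEED_INDICATORS` and `average_need_shares` are hypothetical and introduced for illustration only.

```python
from statistics import mean

# Indicator shares (percent of respondents) for one work task, keyed by
# indicator number. Values: the "Settlement: common" row of Table 7.7.
settlement_common = {1: 25, 2: 55, 3: 30, 4: 20, 5: 25, 6: 20, 7: 65}

# ASSUMPTION: grouping of indicators per information need, inferred from
# the reported figures. Table 6.1 is the authoritative mapping.
NEED_INDICATORS = {
    "VN": (1, 2),      # verificative needs
    "CTN": (3, 5, 7),  # conscious topical needs
    "MTN": (4, 6),     # muddled topical needs
}


def average_need_shares(indicator_shares):
    """Return the mean indicator share (in percent) per information need."""
    return {
        need: mean(indicator_shares[i] for i in indicators)
        for need, indicators in NEED_INDICATORS.items()
    }


shares = average_need_shares(settlement_common)
for need, share in shares.items():
    print(f"{need}: {share:.0f}%")
```

For this row the sketch gives 40%, 40%, and 20%, which agrees with the “Settlement: common” row of Table 7.8.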
Some work tasks have a different ranking of information needs.<br />
Three work tasks in particular stand out, namely the work tasks belonging to the main<br />
process Management and development. Here, muddled topical needs are the most<br />
frequently occurring needs.<br />
One aspect of information seeking on the intranet at SKAT is not triggered by a<br />
specific work task. The information seeking in question is the<br />
seeking carried out by the employees in order to maintain a current state of knowledge<br />
as to their work tasks. Thus, besides seeking information in relation to specific work<br />
tasks, the employees also use the intranet to stay updated on recent developments within<br />
topics of interest to them. The intranet is continuously updated with news and updates.<br />
This flow of information is partly caused by the nature of the foundation of<br />
the organization: the work at SKAT is largely guided by legal rules that<br />
constantly evolve. This further means that the knowledge of the employees needs to<br />
stay updated. Several of the participants state that they consult the intranet on a daily<br />
basis for updates within their working areas.<br />
We cannot verify this particular behaviour on the basis of the questionnaire respondents,<br />
since the questionnaire did not aim at investigating this kind of behaviour. Instead, this<br />
characteristic of the employees’ seeking behaviour was revealed during the focus group<br />
interviews.<br />
“...I think that the intranet and seeking is a part of my work all the time, also<br />
just keeping myself updated on, well both on SKAT as a business but also the field I am<br />
working with. So you somehow either seek information or have signed up for a<br />
newsletter, and then you receive the information that way. And all that information<br />
helps you solve the work task one way or the other.”<br />
Another participant (R28) agrees:<br />
“It is also the place where control signals and the like are coming. What we<br />
need to obey within the business. And also… the directions, the legal directions. When<br />
they are updated, they are published there too. So there is a lot to be attentive to there,<br />
really. You cannot avoid it. It would be scary if it was not at 100%, our intranet. You<br />
sort of need to be in there to be able to do your job.” (R28, p. 10-11)<br />
This behaviour is not distinctive for the domain in question here. Similar findings<br />
have been made in different domains. Within the domain of engineering, Bigdeli (2007)<br />
found that developing knowledge and expertise was among the most important<br />
motivations to look for information. Further, Del Fiol et al. (2008) include knowledge<br />
update as a criterion for success in their evaluation of an information system for<br />
clinicians. Information needs that are not directly tied to a work task thus occur in other<br />
professional user groups apart from the one in question in the thesis.<br />
To sum up, the most frequently occurring information needs on the intranet are<br />
verificative and conscious topical needs. Again we see “Minister service” stand out<br />
with a high score on all indicators. However, this reflects the high frequency of<br />
information seeking for this work task reported in the prior section. In terms of<br />
information seeking this work task differs from the remainder.<br />
7.4.3 Preferred metadata<br />
Each group of questions regarding a work task in the questionnaire was closed by<br />
asking the respondents which metadata they would like to be able to apply for<br />
searching the intranet 14 . The distribution of the respondents’ preferences is shown in<br />
Table 7.9. Here it is evident that the most desired type of metadata among the<br />
employees is concerned with the topic of the document. Though the percentages<br />
vary, the metadata “subject” has the highest occurrence in 16 out of 19 work tasks.<br />
The importance of a well-functioning description of the subjects of the documents is<br />
obvious (with an emphasis on well-functioning). As addressed by a focus group<br />
participant:<br />
“It all depends how good you are at describing the subject. Which terms are<br />
used? Who divides it into the superior subjects that can be searched for? It all depends<br />
on the quality of what is there. And the people who uploaded it.” (R1, p. 10).<br />
The orientation towards the subject and content of documents is hardly surprising.<br />
What is interesting, though, is that requests for superior subjects (the upper level of the<br />
taxonomy) are far less widespread. We interpret it as a request for metadata supporting<br />
14 The list of preferred metadata and their probes from the questionnaire appears from Table 6.2.
Table 7.9 Metadata preferences distributed across work tasks<br />
Work tasks | Metadata<br />
 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13<br />
Instruction | 38 (21%) | 31 (17%) | 117 (65%) | 65 (36%) | 77 (43%) | 59 (33%) | 20 (11%) | 23 (13%) | 24 (13%) | 60 (33%) | 53 (29%) | 18 (10%) | 83 (46%)<br />
Settlement: common | 3 (15%) | 3 (15%) | 12 (60%) | 8 (40%) | 10 (50%) | 8 (40%) | 2 (10%) | 2 (10%) | 2 (10%) | 8 (40%) | 5 (25%) | 2 (10%) | 5 (25%)<br />
Settlement: preliminary assessment of income/personal taxes | 21 (37%) | 14 (25%) | 36 (63%) | 19 (33%) | 24 (42%) | 12 (21%) | 5 (9%) | 8 (14%) | 9 (16%) | 18 (32%) | 17 (30%) | 7 (12%) | 24 (42%)<br />
Settlement: business relations | 17 (30%) | 10 (18%) | 33 (58%) | 21 (37%) | 22 (39%) | 16 (28%) | 8 (14%) | 4 (7%) | 10 (18%) | 17 (30%) | 14 (25%) | 7 (12%) | 19 (33%)<br />
Settlement: corporation taxes | 4 (16%) | 9 (36%) | 15 (60%) | 14 (56%) | 9 (36%) | 5 (20%) | 2 (8%) | 3 (12%) | 3 (12%) | 12 (48%) | 11 (44%) | 2 (8%) | 9 (36%)<br />
Settlement: customs | 1 (8%) | 1 (8%) | 4 (33%) | 3 (25%) | 1 (8%) | 2 (17%) |  |  |  | 2 (17%) | 2 (17%) |  | 5 (42%)<br />
Legend: The table displays the total number of respondents within a work task choosing a certain type of metadata. The percentages refer to<br />
percentages of all respondents within the work task in the questionnaire. The column numbers represent the metadata contained in the<br />
questionnaire, namely: 1) Target group, 2) Superior subject, 3) Subject, 4) Name of legal text or court decision, 5) Object, 6) Activity, 7) Geographic<br />
data, 8) Responsible institution or department, 9) Project, 10) Document type, 11) Document number, 12) Document ID, 13) Work task.<br />
Work tasks / Metadata | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13<br />
Settlement: vehicles | 3 (17%) | 5 (28%) | 9 (50%) | 7 (39%) | 11 (61%) | 2 (11%) | 2 (11%) | 2 (11%) | 1 (6%) | 1 (6%) | 1 (6%) | 1 (6%) | 6 (33%)<br />
Settlement: estate | 3 (21%) | 3 (21%) | 12 (86%) | 7 (50%) | 8 (57%) | 10 (74%) | 3 (21%) | 2 (14%) | 1 (7%) | 7 (50%) | 6 (43%) | 1 (7%) | 5 (36%)<br />
Inspection: common | 13 (21%) | 19 (31%) | 44 (72%) | 30 (49%) | 24 (39%) | 17 (28%) | 1 (2%) | 8 (13%) | 11 (18%) | 29 (48%) | 27 (44%) | 5 (5%) | 23 (38%)<br />
Inspection: customs | 4 (25%) | 2 (13%) | 8 (50%) | 5 (31%) | 3 (19%) | 6 (38%) | 3 (19%) | 3 (19%) | 2 (13%) | 6 (38%) | 4 (25%) | 1 (6%) | 9 (56%)<br />
Collection | 6 (15%) | 8 (21%) | 23 (59%) | 8 (21%) | 15 (39%) | 16 (41%) | 3 (8%) | 4 (10%) | 3 (8%) | 12 (31%) | 7 (18%) | 2 (5%) | 18 (46%)<br />
Processes of support: legal support | 12 (27%) | 16 (36%) | 35 (78%) | 29 (64%) | 18 (40%) | 12 (27%) | 5 (11%) | 11 (24%) | 5 (11%) | 24 (53%) | 19 (42%) | 5 (11%) | 22 (49%)<br />
Processes of support: minister service | 4 (40%) | 6 (60%) | 8 (80%) | 6 (60%) | 6 (60%) | 4 (40%) | 3 (30%) | 3 (30%) | 5 (50%) | 8 (80%) | 4 (40%) | 2 (20%) | 6 (60%)<br />
Processes of support: IT service and administration | 3 (21%) | 5 (36%) | 6 (43%) | 4 (29%) | 3 (21%) | 4 (29%) | 4 (29%) | 3 (21%) | 6 (43%) | 3 (21%) | 2 (14%) | 2 (14%) | 5 (36%)<br />
Processes of support: HR and education | 3 (21%) | 4 (29%) | 9 (64%) | 2 (14%) | 3 (21%) | 1 (7%) | 2 (14%) | 5 (36%) | 2 (14%) | 1 (7%) | 2 (14%) | 2 (14%) | 6 (43%)
Work tasks / Metadata | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13<br />
Processes of support: internal activities | 2 (13%) | 5 (33%) | 9 (60%) | 2 (13%) | 2 (13%) | 2 (13%) | 3 (20%) | 3 (20%) | 4 (27%) | 3 (20%) | 1 (7%) | 3 (20%) | 9 (60%)<br />
Management and development: strategy | 4 (25%) | 6 (38%) | 13 (81%) | 3 (19%) | 6 (38%) | 7 (44%) | 4 (25%) | 10 (63%) | 9 (56%) | 6 (38%) | 3 (19%) | 3 (19%) | 7 (44%)<br />
Management and development: business management | 6 (43%) | 8 (57%) | 10 (71%) | 5 (36%) | 6 (43%) | 8 (57%) | 3 (21%) | 7 (50%) | 4 (29%) | 4 (29%) | 2 (14%) | 2 (14%) | 6 (43%)<br />
Management and development: development | 7 (26%) | 12 (44%) | 18 (67%) | 10 (37%) | 9 (33%) | 11 (41%) | 8 (30%) | 13 (48%) | 15 (56%) | 6 (22%) | 4 (15%) | 4 (15%) | 11 (41%)<br />
Legend: The table displays the total numbers of respondents within a work task choosing a certain type of metadata. The percentages refer to<br />
percentages of all respondents within the work task in the questionnaire. The column numbers represent the metadata contained in the<br />
questionnaire, namely: 1) Target group, 2) Superior subject, 3) Subject, 4) Name of legal text or court decision, 5) Object, 6) Activity, 7) Geographic<br />
data, 8) Responsible institution or department, 9) Project, 10) Document type, 11) Document number, 12) Document ID, 13) Work task.
highly specific searches in a system that tends to overload the users with many<br />
irrelevant documents.<br />
Another type of metadata is frequently requested by the employees: work task.<br />
Work task metadata is defined as “searching for colleagues engaged in a particular<br />
service or task regardless of location” (from Table 6.2). Three work tasks in the<br />
questionnaire ranked it as the most important metadata (“Settlement: customs”,<br />
“Inspection: customs”, and “Processes of support: internal activities”). The remaining<br />
work tasks ranked this particular metadata in the middle of the spectrum<br />
of importance. Work task metadata also received considerable attention in the focus<br />
groups, where the participants, regardless of work task, asked for better<br />
ways of locating colleagues across the organization. We saw above that<br />
colleagues are widely used as information sources across the organization (see section<br />
7.3.2). The focus on work tasks as metadata in the focus groups corresponds well to the<br />
role of colleagues as information sources.<br />
Geographic data, document number, and document ID are found in the lower<br />
end of requirements for metadata. These metadata types were not mentioned in the<br />
focus groups either. This is interpreted as an indication that the employees would not<br />
commonly use these metadata actively to retrieve information. One thing is<br />
surprising, though. Document types were ranked middle to low across the work tasks<br />
when compared to the other types of metadata. In the focus groups, document types received<br />
more attention. Here they were assessed as an important type of metadata. To<br />
exemplify:<br />
“Often you go look for, well, decisions, orders or judgments in the equivalent<br />
area. And then you actively go search for judgments or orders, so it is exclusively the<br />
document type in the first place, that you know that you want. But it is not because it is<br />
the most important thing, but it is a part of what we use exactly in handling that<br />
case.” (R7, p. 8).<br />
As appears from the search test in the following chapter, document types were used as<br />
an important filter there too. On this basis we must consider it an important type, in<br />
particular within a domain with such a variety of document types as is the case in e-government.<br />
For the remainder of the types a medium or low frequency was found in<br />
the table, suggesting that all types could be relevant at some point, but that not<br />
necessarily all should be included in a default search interface.
7.5 Summary and implications for indexing<br />
We introduce the present section with two quotes from the focus groups,<br />
emphasizing the role of information in e-government:<br />
“There is a high, high frequency of information seeking. It is indeed necessary<br />
and important that everything going out from here is correct. Whether a rate or a<br />
reference for a paragraph or whatever it just needs to be in order.” (R26, p. ), and:<br />
“...you cannot memorize all the rules. That is why you go in and read them.”<br />
(R16, p. ).<br />
The quotes emphasize two important aspects of information use in SKAT: The<br />
information passed on to customers, such as citizens or governments, must be accurate.<br />
In addition, the area is governed by so many rules that it is not possible to memorize<br />
everything. The purpose of the present section is to summarize the findings of the<br />
chapter and to derive requirements for indexing from the seeking behavior<br />
identified above.<br />
On the basis of the employees’ preferences for information sources it was<br />
found that the intranet was an important source of information. However, several<br />
participants in the focus groups expressed dissatisfaction with the system’s ability to<br />
retrieve relevant information. It was also found that along with the intranet and the<br />
internet, colleagues were important sources of information, used to validate findings and to<br />
save time searching.<br />
The frequency of information seeking varied between work tasks, but the most<br />
common frequency in the questionnaire was every 3rd or 4th time a task was solved.<br />
This indicates a frequent seeking behavior and suggests that the employees are<br />
experienced information searchers. In terms of demands for indexing practice this also<br />
means that the employees are able to perform exact searches if they have the right<br />
options available, and that they are able to assess the consequences of a query. The<br />
predominant information needs among the employees were verificative and conscious<br />
topical needs. A single work task stood out, but it had few cases and does not change<br />
the general picture. To meet these information needs, indexing must be able to support<br />
verificative searches by adding or drawing metadata from the documents, since<br />
verificative information needs are characterized by being guided by some kind of<br />
known bibliographic information about the document. The conscious topical needs<br />
should be supported by sufficient and high-quality metadata describing the content of<br />
documents. This is supported by the employees’ demands for metadata. However, the<br />
reduced interest in superior subjects indicates that subject metadata must be at a certain<br />
level of specificity in order to match the employees’ deep insight into their work areas.<br />
Lastly, the employees made requirements for metadata accessibility. Apart<br />
from subject metadata, work tasks were highly desired by the employees, indicating the<br />
importance of being able to locate topic experts in the national organization. Document<br />
types did not receive much attention in the questionnaire, but in the focus groups the<br />
participants emphasized the document type as an important type of metadata. No<br />
metadata listed in the questionnaire were dismissed. However, the metadata varied in<br />
their importance to the employees, indicating that the particular work area in question<br />
must be explored when developing metadata in e-government.
8 Search test results<br />
In the search test we conducted an experimental test in a prototype of SKAT’s future intranet.<br />
Two systems were tested: system A and system B (for screen dumps, see section 6.4.1).<br />
System A represents a free-text, web-based search interface with the possibility of<br />
limiting search results by document type and adjusting search results by means of<br />
search operators. System B extends system A’s search facilities by offering a subject-based<br />
categorization of search results (see section 6.4.1). In the present chapter we<br />
present the findings of the search test.<br />
8.1 The test persons<br />
32 test persons participated in the search test, 11 males and 21 females. The mean age<br />
of the test persons was 47, while the average length of service was approximately<br />
22 years (see Appendix 27). The age distribution corresponds to that of the population<br />
and of the survey questionnaire respondents (see Appendix 23). The majority of the test<br />
persons either had academic educations or were educated within the organization. The<br />
same pattern appeared in the questionnaire results of the domain study, though the share<br />
of persons with an academic education was slightly higher in the search test compared<br />
to the domain study (see sections 7.1 and 7.2). In our selection of test persons we<br />
emphasized that the test persons should have a certain frequency of use of the current intranet.<br />
This is mirrored in the frequency of intranet use depicted in Table 8.1. Thus, 25 of the<br />
32 participants estimate their frequency of use to be on a daily basis or even several<br />
times a day. The remaining 7 consult the system on a weekly basis.<br />
Table 8.1 Frequency of test persons' intranet use<br />
Frequency Percent of N=32<br />
Several times a day 18 56.3<br />
On a daily basis 7 21.9<br />
On a weekly basis 7 21.9<br />
Total 32 100.0<br />
Legend: The intranet use frequency of the test persons participating in the search test. N=32.
Table 8.2 Ranking of test persons' most important information sources<br />
Information sources Frequency Percent of N=32<br />
Intranet 29 91%<br />
Internal systems 19 60%<br />
The Internet 18 56%<br />
Electronic reference works 15 47%<br />
Colleagues (including the staff register) 13 41%<br />
Legend: The table depicts the information systems most frequently mentioned by the test persons<br />
as being among the three most important systems in terms of solving daily work tasks. Systems<br />
mentioned by less than 40 % of the respondents have been excluded from the table. N=32.<br />
In the recruitment questionnaire the prospective test persons were asked to point out<br />
their three most important information sources from a predefined list. An open field<br />
provided the option of indicating additional sources.<br />
The sources important to most of the test persons are listed in Table 8.2. As<br />
emerges from the table, the vast majority of the test persons list the intranet as being<br />
among their three most important sources of information, followed by internal<br />
systems and the Internet. To sum up, the test persons are experienced users of<br />
the current intranet, and we can expect them to have a good idea of what can be found<br />
there and how. This also makes it possible to compare the test persons’ use of the<br />
experimental system in the search test to their genuine use of the running intranet.<br />
Three simulated search tasks (sim1, sim2, and sim3) and one genuine<br />
information need (NWT) formed the basis of the search test. Sim1 was concerned with<br />
fiscal conditions when selling an apartment, sim2 with taxation of e-commerce, and<br />
sim3 with VAT registration of freelance teachers. To control for the possible influence of<br />
the simulated search tasks on the test results, the test persons carried out a short<br />
evaluation every time a task had been completed. The evaluation was measured on a<br />
5-point Likert scale. The general scores of the evaluation appear in Table 8.3. Across<br />
all sessions the questions were assessed as just below average. The test persons rate<br />
their insight into the task topics at just below 3. Along with an average resemblance<br />
with daily work tasks of 2.34 and the test persons’ long average length of service, we<br />
take this to mean that the test persons consider their knowledge of the work tasks to be<br />
general, but not detailed. The average of 2.59 concerning the difficulty of the search<br />
Table 8.3 General evaluation of simulated search tasks in system A, system B, and total (averages)<br />
 | System A (N=48) | System B (N=47, one missing) | All sessions with SWT (N=95, one missing)<br />
Difficulty of search task | 2.19 | 3.00 | 2.59<br />
Insight into the topic of the search task | 2.88 | 2.85 | 2.86<br />
Resemblance with daily tasks | 2.40 | 2.28 | 2.34<br />
Legend: In total, 60 sessions were carried out in each system, including the genuine work tasks.<br />
However, we did not ask the test persons to evaluate their own search tasks. Therefore: N=48,<br />
when calculated for the respective systems.<br />
tasks indicates that the tasks have been neither too hard nor too easy to solve using the<br />
two systems. However, here we see a fairly large difference in the level of<br />
difficulty between system A (2.19) and system B (3.00). It appears that the system has<br />
had some influence on the test persons’ perception of the level of difficulty.<br />
Table 8.4 is more specific and distinguishes the task assessments between the<br />
three simulated search tasks. Here, minor differences exist as to the insight into the<br />
search tasks and their resemblance with the test persons’ genuine work tasks. Again,<br />
system B differs most markedly from system A regarding the level of difficulty. The<br />
largest distance between the two systems concerns sim2 (e-commerce). The assessment<br />
of sim2 in system B corresponds well to the trouble the test persons experienced when<br />
solving the task. We will explore possible explanations for the differences in<br />
assessments between system A and system B later in this chapter.<br />
Table 8.4 Evaluation of simulated search tasks specified to single simulated search tasks (averages)<br />
 | Sim1 SysA (n=16) | Sim1 SysB (n=16) | Sim2 SysA (n=16) | Sim2 SysB (n=16) | Sim3 SysA (n=16) | Sim3 SysB (n=15, 1 missing) | Total SysA | Total SysB<br />
Difficulty | 2.13 | 2.44 | 2.44 | 4.06 | 2.00 | 2.47 | 2.19 | 3.00<br />
Insight | 3.06 | 3.44 | 2.75 | 1.88 | 2.81 | 3.27 | 2.88 | 2.85<br />
Resemblance | 2.31 | 2.25 | 2.38 | 2.00 | 2.50 | 2.60 | 2.40 | 2.28
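
The Total columns of Table 8.4 can be cross-checked against the per-task averages. The following is a minimal Python sketch, assuming that the totals are means pooled over all ratings, so that each task average is weighted by its number of ratings. The function name `pooled_mean` and the variable names are illustrative and do not come from the thesis. Because the per-task averages are already rounded to two decimals, the recomputed totals can only be expected to agree with the table to about two decimals.

```python
# Per-task mean difficulty ratings and group sizes, as read from Table 8.4.
# All names here are hypothetical helpers for illustration only.
SYS_A_DIFFICULTY = [(2.13, 16), (2.44, 16), (2.00, 16)]  # sim1, sim2, sim3
SYS_B_DIFFICULTY = [(2.44, 16), (4.06, 16), (2.47, 15)]  # one sim3 rating missing


def pooled_mean(groups):
    """Weighted mean of group means; each mean is weighted by its group size."""
    total_n = sum(n for _, n in groups)
    return sum(mean * n for mean, n in groups) / total_n


pooled_a = pooled_mean(SYS_A_DIFFICULTY)
pooled_b = pooled_mean(SYS_B_DIFFICULTY)
print(f"System A difficulty: {pooled_a:.2f}")
print(f"System B difficulty: {pooled_b:.2f}")
```

Under this assumption the recomputed values agree with the totals of 2.19 and 3.00 reported in the table, which suggests that the missing sim3 rating in system B was left out of the total rather than imputed.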
8.2 Overall searching behaviour and performance<br />
The search test provides data on the searching behaviour in the two test systems, system<br />
A and system B. The empirical data supporting the remainder of the chapter comprise<br />
the search log, search interviews, and relevance assessments. In total, 128 sessions<br />
consisting of 564 queries were undertaken by the 32 test persons, 64 sessions in each of<br />
the two systems. Table 8.5 summarizes the general findings.<br />
The average number of terms used in the queries of the test is 2.25 for system<br />
A and slightly higher in system B: 2.43. This corresponds to the average number of<br />
terms found in similar studies. For instance, Jansen, Spink & Saracevic (2000, p. 214)<br />
measured an average of 2.21 terms in their analysis of search logs in Excite. In a log<br />
analysis of a university OPAC, Lau & Goh (2006, p. 1322) found the average query<br />
length to be 2.86. In a clustering search engine (vivisimo.com) Koshman, Spink &<br />
Jansen (2006, p. 1879) found an average of 3.13, also based on log analysis. Some<br />
years later Hochstotter & Koch (2009, p. 55) identified a slightly lower average<br />
(between 1.6 and 1.8) in their study based on live tickers in a number of general and<br />
meta Web search engines. More recently, Lykke, Price & Delcambre (2012) showed averages<br />
of 1.5 and 2.0 in their comparative search test of a web-based health portal. Lastly, in a<br />
study comparing categorized searches with non-categorized searches, Käki (2005b, p.<br />
136) found an average of 2.10 for the former type, and 2.04 for the latter. Our findings<br />
thus correspond to the findings of a highly similar study, supporting that on average<br />
more search terms are applied in categorized queries than in non-categorized queries.<br />
Table 8.5 General findings of variables in search test<br />
Variables System A System B<br />
Sessions N=64 Sessions N=64<br />
Queries N=229 Queries N=335<br />
Number of terms in queries (average) 2.25 2.43<br />
Number of search keys in queries (average) 1.67 1.90<br />
Search filter “document type” applied (percentage) 43.2 31.6<br />
Number of sessions with reformulations (percentage) 65.6 82.8<br />
Number of reformulations in sessions (average) 2.58 4.23<br />
Query success (percentage) 30.6 21.5<br />
Session success (percentage) 89.1 84.4<br />
As regards search keys, the slightly higher average number of<br />
terms in system B is reflected in the average number of search keys. Thus, system B<br />
queries contain 1.90 search keys compared to 1.67 in system A. By comparison, the<br />
difference between the average number of terms and search keys in Lykke, Price &<br />
Delcambre’s (2012) study was slightly lower than in the present results. Thus, the<br />
test persons used more terms to represent a search key in the present test.<br />
Both systems offered filtering by document type. The filter was used in 42.3 %<br />
of queries in system A and in 31.6 % of queries in system B. This distribution was<br />
expected as system A has fewer query specification options. Reformulations took place<br />
in both systems. However, in system A the share of sessions with reformulations was<br />
65.6 %, while 82.8 % of the sessions in system B required reformulations. In addition,<br />
the average number of reformulations was notably higher in system B (4.23) compared<br />
to system A (2.58). This means that an average session in system A contains<br />
3.58 queries, while the corresponding number for system B is 5.23. The averages are<br />
slightly above the findings of similar studies of web search engines and web portals. By<br />
comparison, Lykke, Price & Delcambre (2012) found averages of 2.5 and 3.2 queries per<br />
session. Koshman, Spink & Jansen’s (2006, p. 1879) average was marginally higher:<br />
3.37. To sum up, the present study, and in particular system B, has a higher number<br />
of queries per session than similar studies.<br />
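
The relation between sessions, queries, and reformulations stated above can be reproduced from the counts in Table 8.5. The following is a minimal Python sketch, assuming that every session starts with exactly one initial query and that each further query in the session counts as one reformulation. The dictionary and function names are illustrative and not taken from the thesis.

```python
# Session and query counts per system, as reported in Table 8.5.
SESSIONS = {"A": 64, "B": 64}
QUERIES = {"A": 229, "B": 335}


def queries_per_session(system):
    """Average number of queries issued in one session of the given system."""
    return QUERIES[system] / SESSIONS[system]


def reformulations_per_session(system):
    """Average number of reformulations, assuming one initial query per session."""
    return queries_per_session(system) - 1


for system in ("A", "B"):
    print(
        f"System {system}: "
        f"{queries_per_session(system):.2f} queries, "
        f"{reformulations_per_session(system):.2f} reformulations per session"
    )
```

Under this assumption the sketch gives 3.58 queries and 2.58 reformulations per session for system A, and 5.23 and 4.23 for system B, matching the averages quoted in the text.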
The success of sessions and queries is summed up in Table 8.6 and<br />
Table 8.7. The total success at session level is slightly higher for system A, with relevant<br />
documents found in 89.1 % of all sessions. System B succeeded in 84.4 % of sessions. A<br />
breakdown by search task reveals a fairly even distribution of successful sessions<br />
Table 8.6 Session success (percentages)<br />
 | Sim1 SysA | Sim1 SysB | Sim2 SysA | Sim2 SysB | Sim3 SysA | Sim3 SysB | NWT SysA | NWT SysB | Total SysA | Total SysB<br />
Session succeeded | 15 (93.8) | 16 (100.0) | 15 (93.8) | 9 (56.3) | 16 (100.0) | 16 (100.0) | 11 (68.8) | 13 (81.3) | 57 (89.1) | 54 (84.4)<br />
Session failed | 1 (6.3) | 0 (0.0) | 1 (6.3) | 7 (43.8) | 0 (0.0) | 0 (0.0) | 5 (31.3) | 3 (18.8) | 7 (10.9) | 10 (15.6)<br />
Total | 16 | 16 | 16 | 16 | 16 | 16 | 16 | 16 | 64 | 64
Table 8.7 Query success (percentages)<br />
 | Sim1 SysA | Sim1 SysB | Sim2 SysA | Sim2 SysB | Sim3 SysA | Sim3 SysB | NWT SysA | NWT SysB | Total SysA | Total SysB<br />
Query succeeded | 18 (58.1) | 23 (33.3) | 17 (30.4) | 11 (9.7) | 20 (27.8) | 22 (25.6) | 15 (21.4) | 16 (23.9) | 70 (30.6) | 72 (21.5)<br />
Query failed | 13 (41.9) | 46 (66.7) | 39 (69.6) | 102 (90.3) | 52 (72.2) | 64 (74.4) | 55 (78.6) | 51 (76.1) | 159 (69.4) | 263 (78.5)<br />
Total | 31 (100.0) | 69 (100.0) | 56 (100.0) | 113 (100.0) | 72 (100.0) | 86 (100.0) | 70 (100.0) | 67 (100.0) | 229 (100.0) | 335 (100.0)<br />
between the two systems except in sim2. For the remainder of the sessions, the systems<br />
perform equally, or even with a minor advantage for system B. In sim2, 1 session<br />
failed in system A, while 7 sessions failed in system B. This may very well explain<br />
part of why the test persons assessed the task as markedly more difficult in system B, as we have<br />
just seen.<br />
At query level the total number of successful searches is fairly even between<br />
the two systems. In system B, however, the total number of failed queries is markedly<br />
higher than in system A, particularly concerning sim1 and sim2, and as a consequence<br />
also when compared at system level in the last two columns of Table 8.7. Thus, the<br />
performance at query level widens the difference in performance in favour of<br />
system A compared to the more even overall performance at session level. In short, the<br />
two systems provide approximately the same number of successful queries. System B just<br />
requires more failed queries to get there.<br />
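
The point that system B needs more failed queries for roughly the same number of successful ones can be made concrete with the totals from Table 8.7. The following is a minimal Python sketch. The ratio of failed to successful queries is an illustrative measure introduced here, not one used in the thesis, and all names are hypothetical.

```python
# Total counts of succeeded and failed queries per system, from Table 8.7.
QUERY_COUNTS = {
    "A": {"succeeded": 70, "failed": 159},
    "B": {"succeeded": 72, "failed": 263},
}


def success_rate(counts):
    """Percentage of all queries that retrieved at least one relevant document."""
    total = counts["succeeded"] + counts["failed"]
    return 100 * counts["succeeded"] / total


def failed_per_success(counts):
    """Number of failed queries issued for each successful query."""
    return counts["failed"] / counts["succeeded"]


for system, counts in QUERY_COUNTS.items():
    print(
        f"System {system}: success rate {success_rate(counts):.1f} %, "
        f"{failed_per_success(counts):.2f} failed queries per successful query"
    )
```

The success rates reproduce the 30.6 % and 21.5 % reported in the table. On the illustrative ratio, system A has about 2.3 failed queries per successful query and system B about 3.7.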
To sum up, the overall comparison of the two test systems shows a slight<br />
advantage of system A at session level in terms of ability to retrieve relevant<br />
documents. The advantage of system A increases when measured at query level. In<br />
addition, system A differs from system B in that fewer terms are needed in queries, and the<br />
share and number of reformulations are lower. In the sections to follow we will explore<br />
the nature and causes of the difference in performance between the two systems. We will<br />
explore what characterizes the search situation (section 8.2.1), the number and types of<br />
reformulations carried out (section 8.2.2), and the unintended use of system A in system B<br />
searches (section 8.2.3).
8.2.1 The search situation<br />
The search situation is characterized by different components. In the dataset<br />
we have identified four components that is guid<strong>in</strong>g this presentation of results: sessions,<br />
queries, search operators, and filter<strong>in</strong>g by document type. We present the results <strong>in</strong> that<br />
order.<br />
8.2.1.1 Sessions<br />
128 sessions were carried out in the search test, 64 in system A and 64 in system B. As<br />
appears from the total numbers, more queries are executed in system B than in system<br />
A. This is also the case at task level (see Table 8.8). Here, the average number of<br />
queries needed to solve a task differs by almost 2 queries between the two systems (the last<br />
column). As regards the individual search tasks, the genuine information need has a<br />
slightly lower average in system B than in system A, indicating that the genuine<br />
information need actually benefitted from the categories. In the remainder of the search<br />
tasks system B is above system A in terms of averages. It has already been shown that<br />
particularly sim1 and sim2 contained a significantly higher share of failed queries in<br />
system B compared to system A. That also appears in the present table, where sim1 and<br />
sim2 executed in system B have an average number of queries twice as large as in system A. In<br />
terms of variance, the standard deviation of the two systems is practically the same. At<br />
task level, the two differ more, with the highest maximum of system A in sim3 (27<br />
reformulations) and the highest maximum of system B in sim1 (18 reformulations) (see<br />
Table 1, Appendix 28). Thus, the variances within both systems are fairly large. For<br />
sim1 the difference is caused by a very high success rate in system A. A further<br />
explanation could be that sim1 in system A was assessed as below average as regards<br />
difficulty, and that the test persons had rather good knowledge in advance (cf.<br />
Table 8.4).<br />
Table 8.8 Number of queries in sessions at task level (averages)<br />
Sim1 Sim2 Sim3 NWT Total<br />
System A 1.94 (n=16) 3.50 (n=16) 4.50 (n=16) 4.38 (n=16) 3.58 (n=64)<br />
System B 4.31 (n=16) 7.06 (n=16) 5.38 (n=16) 4.19 (n=16) 5.23 (n=64)<br />
Total 3.13 (n=32) 5.28 (n=32) 4.94 (n=32) 4.28 (n=32) 4.41 (n=128)
Table 8.9 Number of queries in sessions as to success or failure (averages)<br />
Sim1 Sim2 Sim3 NWT Total<br />
SysA SysB SysA SysB SysA SysB SysA SysB SysA SysB<br />
Session succeeded 2.00 (n=15) 4.31 (n=16) 2.93 (n=15) 6.78 (n=9) 4.50 (n=16) 5.38 (n=16) 3.73 (n=11) 3.46 (n=13) 3.28 (n=57) 4.83 (n=54)<br />
Session failed 1.00 (n=1) - 12.00 (n=1) 7.43 (n=7) - - 5.80 (n=5) 7.33 (n=3) 6.00 (n=7) 7.40 (n=10)<br />
Total 1.94 (n=16) 4.31 (n=16) 3.50 (n=16) 7.06 (n=16) 4.50 (n=16) 5.38 (n=16) 4.38 (n=16) 4.19 (n=16) 3.58 (n=64) 5.23 (n=64)<br />
Table 8.9 illustrates the number of queries in successful and failed sessions<br />
respectively. The majority of the sessions finish with the retrieval of one or more<br />
relevant documents. From the table it is clear that, apart from one exception (sim1,<br />
system A), sessions with fewer reformulations are more likely to succeed. In the three<br />
simulated search tasks, system A is superior to system B, as system A sessions have a<br />
lower average number of queries. For the genuine information need (NWT) the average is a<br />
little lower for system B sessions than for system A sessions, but not by enough to<br />
change the overall impression of system A as the more efficient system in terms of a low<br />
average number of queries per session. To sum up, at session level test persons put<br />
more effort, in terms of the number of queries, into sessions that remain unsolved in the<br />
end. The average number of queries is higher in system B searches, and a session is<br />
more likely to succeed if it is solved with fewer queries.<br />
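The relation between the subgroup averages and the overall averages in Table 8.9 can be<br />
checked with a size-weighted mean. The following is a minimal Python sketch and not part of the<br />
original analysis: the function name pooled_mean is our own, and the input figures are the<br />
rounded values from the "Total" columns of Table 8.9, so the result is only accurate to rounding.<br />

```python
def pooled_mean(groups):
    """Combine (mean, n) pairs into one size-weighted mean."""
    total_n = sum(n for _, n in groups)
    return sum(mean * n for mean, n in groups) / total_n


# (average number of queries, number of sessions) from Table 8.9,
# "Total" columns: succeeded sessions first, failed sessions second.
system_a = [(3.28, 57), (6.00, 7)]
system_b = [(4.83, 54), (7.40, 10)]

mean_a = pooled_mean(system_a)
mean_b = pooled_mean(system_b)

# The pooled values should agree with the "Total" row of Table 8.8
# (system A and system B) up to rounding of the inputs.
print(round(mean_a, 2), round(mean_b, 2), round(mean_b - mean_a, 2))
```

Under these assumptions the pooled values agree with the system totals reported in Table 8.8,<br />
which suggests that the table totals are averages over all 64 sessions per system rather than<br />
averages of the task averages.<br />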
8.2.1.2 Queries<br />
The average number of terms has already been summed up to be 2.25 for<br />
system A and 2.43 for system B. In Table 8.10 the calculations have been made at<br />
task level. The table shows that more terms have been entered in system B queries<br />
than in system A queries for all tasks except sim3. Here the average number of terms is<br />
notably lower in system B than in system A. One possible reason for this could again<br />
be found in the test persons’ assessments of the task: sim3 in system B<br />
received the absolute highest score on resemblance with the test persons’ daily work<br />
tasks. Thus, due to the scores of sim3, the average number<br />
of search terms in queries is not consistently higher in system B than in system A.<br />
However, when measured as the number of search terms entered in the respective<br />
systems, the overall impression is a superior system A.
Table 8.10 Number of search terms in queries (averages)<br />
Sim1 Sim2 Sim3 NWT Total<br />
System A 2.32 (n=31) 2.39 (n=56) 2.42 (n=72) 1.94 (n=70) 2.25 (N=229)<br />
System B 2.54 (n=69) 2.88 (n=113) 1.79 (n=86) 2.39 (n=67) 2.43 (N=335)<br />
Total 2.47 (n=100) 2.72 (n=169) 2.08 (n=158) 2.16 (n=137) 2.36 (N=564)<br />
Table 8.11 outlines the number of search keys used for the individual search<br />
tasks of the test. Overall, the average number of search terms is above the average<br />
number of search keys in queries. That means that on average each search key was<br />
represented by more than one term. The figures include synonym terms for<br />
the same concept and phrases such as “when to become VAT registered”. The average<br />
number of search keys to some extent reflects the average number of search terms just<br />
identified. Thus, the average is higher in system B for sim1 and sim2, while for sim3 it is<br />
higher in system A. The low average in sim1, system A reveals that the search task had<br />
one very significant word, parents’ purchase (forældrekøb), which, when used as a<br />
query term, listed a highly relevant citizen booklet that most test persons assessed as<br />
relevant. As can be seen from the table, more concepts have been used in system B<br />
compared to system A. That may, at least partially, be explained by the test persons’<br />
lack of insight into the test system. In a number of cases the test persons composed a<br />
full query that was able to retrieve the documents wanted, and then they were asked to<br />
filter by a category. In some of these cases categories representing a search key already<br />
represented by the search terms were chosen. In other cases an additional search key<br />
Table 8.11 Number of search keys in queries (averages)<br />
Sim1 Sim2 Sim3 Total<br />
System A 1.29 (n=31) 1.82 (n=56) 1.72 (n=72) 1.67 (N=159)<br />
System B 1.97 (n=69) 2.12 (n=113) 1.56 (n=86) 1.90 (N=268)<br />
Total 1.76 (n=100) 2.02 (n=169) 1.63 (n=158) 1.81 (N=427)<br />
Legend: The table reflects the figures from the simulated search tasks, as we could not perform the<br />
query search key analysis for the genu<strong>in</strong>e search tasks. That expla<strong>in</strong>s the reduced N compared to<br />
Table 8.10.
Table 8.12 Number of search terms in queries as to success or failure (averages)<br />
Sim1 Sim2 Sim3 NWT Total<br />
SysA SysB SysA SysB SysA SysB SysA SysB SysA SysB<br />
Query success 2.28 (n=18) 2.3 (n=23) 2.35 (n=17) 3.00 (n=11) 2.2 (n=20) 1.77 (n=22) 1.93 (n=15) 2.13 (n=16) 2.20 (n=70) 2.21 (n=72)<br />
Query failure 2.38 (n=13) 2.65 (n=46) 2.41 (n=39) 2.86 (n=102) 2.5 (n=52) 1.8 (n=64) 1.95 (n=55) 2.47 (n=51) 2.28 (n=159) 2.49 (n=263)<br />
Total 2.32 (n=31) 2.54 (n=69) 2.39 (n=56) 2.88 (n=113) 2.42 (n=72) 1.79 (n=86) 1.94 (n=70) 2.39 (n=67) 2.25 (n=229) 2.43 (n=335)<br />
was added to the query by the choice of a category. These latter cases explain part of<br />
the general increase in the number of search keys in system B searches.<br />
In Table 8.12, the average number of search terms in successful queries and in queries<br />
that failed is listed. With the exception of sim2, system B, successful queries consistently<br />
contain fewer search terms than failed queries. In terms of search keys<br />
the overall picture is the same (see Table 8.13). Here successful queries<br />
consistently represent fewer search keys when compared to failed queries. Thus, in the<br />
present database a query based on few search terms and search keys is more likely to<br />
retrieve relevant documents. A part of the explanation could be the relatively small<br />
database behind the prototype. The more search terms entered, the fewer documents may<br />
match the search terms. This is supported by a correlation analysis showing a<br />
statistically significant relation between the number of search terms entered and the<br />
number of hits (see table 4, Appendix 28). Further, the succession of search tasks did<br />
Table 8.13 Number of search keys in queries as to success or failure (averages)<br />
Sim1 Sim2 Sim3 Total<br />
SysA SysB SysA SysB SysA SysB SysA SysB<br />
Query success 1.28 (n=18) 1.57 (n=23) 1.53 (n=17) 2.09 (n=11) 1.65 (n=20) 1.55 (n=22) 1.49 (n=55) 1.66 (n=56)<br />
Query failure 1.31 (n=13) 2.17 (n=46) 1.95 (n=39) 2.12 (n=102) 1.75 (n=52) 1.56 (n=64) 1.77 (n=104) 1.96 (n=211)<br />
Total 1.29 (n=31) 1.97 (n=69) 1.82 (n=56) 2.12 (n=113) 1.72 (n=72) 1.56 (n=86) 1.67 (N=159) 1.90 (N=267)<br />
not have an effect on the number of search terms applied either (see table 4, Appendix<br />
28). Thus, there was no significant relation between the succession of search tasks and the<br />
number of terms entered in the query field. Another reason for the higher success of<br />
queries with fewer search terms and search keys may be the test persons’ professional<br />
background. By this is meant that the test persons in a number of queries entered<br />
specific and correct search terms that efficiently retrieved documents. When entering<br />
more terms or search keys at the same time the number of search results became very<br />
limited. To conclude, the experiences gained during the test did not change how test<br />
persons composed their queries, at least in terms of the number of terms entered. Also,<br />
queries with fewer terms and concepts were superior, most likely because the test<br />
persons’ insights into the general topic made them enter qualified search terms, and<br />
because fewer terms and concepts did not restrict the number of results too much. More<br />
terms and concepts were applied in system B, partially because categories were added in<br />
system B to queries that at times were complete without the category. When the<br />
category was added, it occasionally represented a new concept, which increased the<br />
average number of concepts in system B.<br />
Table 8.14 Distribution of search operators in queries (percentages)<br />
Sim1 Sim2 Sim3 NWT Total<br />
SysA SysB SysA SysB SysA SysB SysA SysB SysA SysB<br />
Free text (FT) 16 (51.6) 22 (31.9) 27 (48.2) 61 (54.0) 21 (29.2) 62 (72.1) 38 (54.3) 32 (47.8) 102 (44.5) 177 (52.8)<br />
Pages containing all words (AW) 13 (41.9) 45 (65.2) 28 (50.0) 48 (42.5) 46 (63.9) 23 (26.7) 23 (32.9) 29 (43.3) 110 (48.0) 145 (43.3)<br />
This exact sentence (ES) 2 (6.5) - 1 (1.8) 4 (3.5) 5 (6.9) 1 (1.2) 5 (7.1) 6 (9.0) 13 (5.7) 11 (3.3)<br />
At least one of the words (OW) - 2 (2.9) - - - - 4 (5.7) - 4 (1.7) 2 (0.6)<br />
Total 31 69 56 113 72 86 70 67 229 335<br />
Legend: The AW operator retrieves documents containing all search terms. The FT operator retrieves<br />
documents that contain most, but not necessarily all, search terms. ES corresponds to applying quotation<br />
marks. And the OW operator retrieves documents in which at least one of the typed search terms is<br />
contained. In the search test all search results were ranked as to the best match (relevance).
8.2.1.3 Search operators<br />
In the search interface the default setting of the search operator field is “Free<br />
text” (FT). As the test persons did not have any prior experience with the test<br />
systems, it was plausible that the default FT operator would be the most frequently used<br />
in the test, since users tend to use the default settings put forward by the system<br />
(Markey, 2007a, p. 1077). As expected, the FT operator had a high frequency across the<br />
queries, though along with the “Pages containing all words” (AW) operator. What is<br />
unexpected is that in system A the AW operator is slightly more frequent than the default FT<br />
operator (48.0 % against 44.5 % of the queries), while in system B the FT operator is the more<br />
frequent of the two (52.8 % against 43.3 %) (see Table 8.14). We have previously mentioned that the AW<br />
operator is the more restrictive of the two (see section 6.4.1). In combination with the<br />
mandatory categorization in system B, it is likely to result in large differences between<br />
the sizes of search results in the two systems.<br />
One explanation for the unexpected distribution of the FT and the AW<br />
operators between system A and system B is that some test persons had trouble<br />
Table 8.15 Number of search terms used with search operators in queries (averages)<br />
Sim1 Sim2 Sim3 NWT Total<br />
SysA SysB SysA SysB SysA SysB SysA SysB SysA SysB<br />
Free text (FT) 2.25 (n=16) 2.41 (n=22) 2.30 (n=27) 2.64 (n=61) 1.57 (n=21) 1.58 (n=62) 1.76 (n=38) 2.03 (n=32) 1.94 (n=102) 2.13 (n=177)<br />
Pages containing all words (AW) 2.62 (n=13) 2.62 (n=45) 2.46 (n=28) 3.10 (n=48) 2.85 (n=46) 2.17 (n=23) 1.83 (n=23) 2.76 (n=29) 2.51 (n=110) 2.74 (n=145)<br />
This exact sentence (ES) 1.0 (n=2) - 3.00 (n=1) 3.75 (n=4) 2.00 (n=5) 6.00 (n=1) 2.40 (n=5) 2.50 (n=6) 2.08 (n=13) 3.27 (n=11)<br />
At least one of the words (OW) - 2.0 (n=2) - - - - 3.75 (n=4) - 3.75 (n=4) 2.0 (n=2)<br />
Total 2.32 (n=31) 2.54 (n=69) 2.39 (n=56) 2.88 (n=113) 2.42 (n=72) 1.79 (n=86) 1.94 (n=70) 2.39 (n=67) 2.25 (n=229) 2.43 (n=335)<br />
incorporating the two operators and separating them from each other. Thus, test persons<br />
intermittently wondered why search terms did not occur in their result list when using<br />
the FT operator. To exemplify:<br />
“Yes, but on the other hand it could also give… free text… then they all ought<br />
to come…” (TP15, line 306)<br />
In addition, the test persons consistently used more search terms when applying the AW<br />
operator than when the FT operator was used (see Table 8.15), resulting in a gap<br />
concerning the number of documents retrieved when using one of the two preferred<br />
operators. To illustrate, an average search in system A using the FT operator retrieved<br />
548 documents, while the AW operator in the same system on average retrieved 121<br />
documents. In system B, average FT searches retrieved 25 documents, while average<br />
AW searches retrieved 10 documents (see Table 3, Appendix 28). Thus, the searches<br />
carried out in system B were significantly narrower than the broader system A searches,<br />
as the search results in addition were filtered by subject. In terms of Boolean<br />
logic, the addition of a category corresponds to combining a query with an additional<br />
term, and in some cases an additional concept, by a Boolean “AND”. Again it appears<br />
that some test persons had trouble fully understanding the comparative<br />
implications of the two search operators. Unfortunately, it has not been possible to<br />
deduce causes for the difference in search operators between the two systems from the<br />
search interviews, as the test persons did not address it during their searches.<br />
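The difference between the four operators, and the narrowing effect of adding a category as a<br />
Boolean “AND”, can be illustrated with a small Python sketch. It is not the implementation of the<br />
prototype, which is not described at this level of detail here. The toy documents, the category<br />
names and the function names are hypothetical, and the rule that “most” search terms means more<br />
than half of them is our own assumption for the FT operator. Relevance ranking is left out.<br />

```python
def tokenize(text):
    """Lower-case a text and split it on whitespace."""
    return text.lower().split()


def matches(document, terms, operator):
    """Return True if the document satisfies the query under the operator."""
    words = tokenize(document["text"])
    terms = [term.lower() for term in terms]
    found = sum(1 for term in terms if term in words)
    if operator == "AW":  # all words must occur
        return found == len(terms)
    if operator == "OW":  # at least one word must occur
        return found >= 1
    if operator == "ES":  # the terms must occur as one consecutive phrase
        size = len(terms)
        return any(words[i:i + size] == terms
                   for i in range(len(words) - size + 1))
    if operator == "FT":  # ASSUMPTION: "most" means more than half of the terms
        return 2 * found > len(terms)
    raise ValueError(f"unknown operator: {operator}")


def search(documents, terms, operator, category=None):
    """Apply the operator, then AND the result with an optional category."""
    return [doc["id"] for doc in documents
            if matches(doc, terms, operator)
            and (category is None or category in doc["categories"])]


# Hypothetical toy collection; not taken from the test database.
DOCUMENTS = [
    {"id": 1, "text": "VAT registration rules for new business", "categories": {"vat"}},
    {"id": 2, "text": "VAT rules for import", "categories": {"vat", "customs"}},
    {"id": 3, "text": "registration of vehicles", "categories": {"vehicles"}},
    {"id": 4, "text": "business tax rules", "categories": {"tax"}},
]
QUERY = ["VAT", "registration", "rules"]

for operator in ("FT", "AW", "ES", "OW"):
    print(operator,
          search(DOCUMENTS, QUERY, operator),
          search(DOCUMENTS, QUERY, operator, category="customs"))
```

In this sketch a category can only remove documents from a result list, never add any, which is<br />
the sense in which a mandatory category acts like an extra “AND” on top of the chosen operator.<br />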
Lastly, the search operator field was rarely used to adjust search results in<br />
reformulations (cf. Table 8.21). That indicates that the test persons did not feel<br />
sufficiently safe using the operators for reformulations and instead preferred other types<br />
of reformulations (we analyse reformulations more closely in section 8.2.2). That the<br />
understanding of Boolean operators challenges end users corresponds to the findings of<br />
similar studies (e.g. Markey, 2007a).<br />
As is evident from the above, the use of search operators has resulted in<br />
significant differences in the number of hits retrieved in the two test systems.<br />
However, the success of the queries by search<br />
operator is needed in order to identify which performed better. That appears from<br />
Table 8.16. The success rate of system A is higher on a general level when compared<br />
to system B. System A queries have a slightly higher success rate in FT searches
Table 8.16 Success of search operators (percentages)<br />
System A System B<br />
FT AW ES OW FT AW ES OW<br />
Success 33 (32.4) 30 (27.3) 7 (53.8) - 38 (21.5) 32 (22.1) 1 (9.1) 1 (50.0)<br />
Failure 69 (67.6) 80 (72.7) 6 (46.2) 4 (100.0) 139 (78.5) 113 (77.9) 10 (90.9) 1 (50.0)<br />
Total 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0<br />
Legend: FT=Free text, AW=Pages containing all words, ES=This exact sentence, OW=At least one<br />
of the words.<br />
compared to AW searches. As appeared from Table 8.15, the number of search terms<br />
was lower in FT searches compared to AW searches. This suggests that system A has<br />
the best performance in open searches (using the FT operator and fewer terms). As<br />
regards system B, the success of FT and AW is fairly even, but below the average of<br />
system A. Apparently, the queries in system B have filtered out too many relevant hits.<br />
The best performance was thus found in system A using the FT operator. This<br />
also indicates a well-functioning relevance ranking within the system. To conclude,<br />
system A managed to perform better on the basis of the broader queries applied. The<br />
test persons had difficulties applying and understanding the search operators correctly,<br />
which led to a weaker performance of system B due to a majority of small result sets.<br />
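The percentages in Table 8.16 follow directly from the success and failure counts. As a minimal<br />
sketch in Python, not part of the original analysis, they can be recomputed as below. The counts<br />
are copied from Table 8.16, while the names COUNTS, success_rate and overall_rate are our own.<br />

```python
# (successful queries, failed queries) per search operator, from Table 8.16.
COUNTS = {
    "A": {"FT": (33, 69), "AW": (30, 80), "ES": (7, 6), "OW": (0, 4)},
    "B": {"FT": (38, 139), "AW": (32, 113), "ES": (1, 10), "OW": (1, 1)},
}


def success_rate(successes, failures):
    """Share of successful queries in percent."""
    return 100.0 * successes / (successes + failures)


def overall_rate(system):
    """Success rate of a system across all four operators."""
    successes = sum(s for s, _ in COUNTS[system].values())
    failures = sum(f for _, f in COUNTS[system].values())
    return success_rate(successes, failures)


for system, operators in COUNTS.items():
    for operator, (s, f) in operators.items():
        # Print the number of queries next to the rate, since small
        # groups (ES and OW) give unstable percentages.
        print(system, operator, s + f, round(success_rate(s, f), 1))
    print(system, "all operators", round(overall_rate(system), 1))
```

Printing the number of queries next to each rate makes it visible that the ES and OW rates rest<br />
on very few queries, which is why the comparison above concentrates on the FT and AW operators.<br />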
8.2.1.4 Filtering by metadata<br />
The test system documents were marked up as to which document type each<br />
document belongs to. That facilitates the inclusion of the particular metadata field<br />
“document type” when queries are built. Document type metadata are a powerful<br />
retrieval tool in a collection like the test collection with many heterogeneous document<br />
types, as using the document type filter removes many irrelevant documents by<br />
their type. Requesting a specific document type was optional in the prototype. From<br />
the comments given by test persons during their search sessions it is clear that the<br />
specification of document types is an important option in the search interface. The<br />
possibility for specification was not commented on by the test persons as such, but it was<br />
used as a natural function in queries. In addition, document types were also mentioned<br />
as one among several important metadata in the domain study (see section 7.4.2). In<br />
particular, legal guidances were emphasized as important to the employees’ work. For<br />
instance:<br />
“Well, it is the common assessment guidance, the one we refer to as our Bible,<br />
you can say, where you need to go check, if it is right in the legal rules…” (TP02, line<br />
112-113).<br />
And the statement is supported:<br />
“And that is just the problem: Is it a business or is it not? And I know the<br />
assessment guidance so well that I know that all facets are included here. There might<br />
be four or seven sub divisions to that document, but it is in there. It is just a matter of<br />
clicking further and further down until you find it…” (TP05, line 65-68).<br />
From Table 8.17 it appears that overall a larger share of system A queries used<br />
the document type filter in comparison to system B. When applying the document type<br />
filter, legal guidances were the preferred document type across both systems.<br />
That emphasizes the importance of the document type in the test persons’ daily work,<br />
which was also expressed during the interviews. Further, legal guidances are listed as<br />
one among several relevant document types in the non-topical facet for all three simulated<br />
search tasks (see Table 6.7).<br />
At search task level it becomes evident that the overall shares (the outer right<br />
columns in Table 8.17) conceal a large variation. In system A the overall share of 56.8 %<br />
of “None chosen” spans from the highest share of 83.9 % in sim1 to the lowest of 29.2 %<br />
in sim3. The differences in system B are smaller, but still of interest. Here the highest<br />
share is 86.6 % in the genuine information need and the lowest 55.1 % in sim1.<br />
The generally lowest use of the document type filter is in sim1, system A and in the<br />
genuine search task, system B. Here, less than a quarter of the queries applied the filter.<br />
In sim3 the biggest difference between the two systems appears. Here the filter was<br />
used in approximately 70 % of the system A queries, while only approximately a<br />
quarter of the system B queries applied the filter. We may explain the generally higher<br />
use of the document type filter in system A by the lower number of filtering<br />
possibilities when compared to system B.<br />
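The share of queries that applied the document type filter is the complement of the “None<br />
chosen” share. The following Python sketch, which is not part of the original analysis, derives<br />
these shares from the “None chosen” counts and the query totals in Table 8.17. The dictionary and<br />
function names are our own.<br />

```python
# Queries without a document type filter ("None chosen") and total number
# of queries per search task and system, copied from Table 8.17.
NONE_CHOSEN = {
    "sim1": {"A": 26, "B": 38},
    "sim2": {"A": 33, "B": 71},
    "sim3": {"A": 21, "B": 62},
    "nwt": {"A": 50, "B": 58},
}
TOTAL_QUERIES = {
    "sim1": {"A": 31, "B": 69},
    "sim2": {"A": 56, "B": 113},
    "sim3": {"A": 72, "B": 86},
    "nwt": {"A": 70, "B": 67},
}


def filter_share(task, system):
    """Percentage of queries in which a document type filter was set."""
    total = TOTAL_QUERIES[task][system]
    return 100.0 * (total - NONE_CHOSEN[task][system]) / total


def overall_filter_share(system):
    """Percentage of all queries in a system with a document type filter."""
    total = sum(task[system] for task in TOTAL_QUERIES.values())
    none = sum(task[system] for task in NONE_CHOSEN.values())
    return 100.0 * (total - none) / total


for task in TOTAL_QUERIES:
    print(task, round(filter_share(task, "A"), 1), round(filter_share(task, "B"), 1))
print("all", round(overall_filter_share("A"), 1), round(overall_filter_share("B"), 1))
```

Working from the counts rather than from the rounded percentages keeps the task-level and the<br />
overall shares consistent with each other.<br />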
When compared to the facet analysis of the simulated search tasks (section<br />
6.4.7; Table 6.7), the largest share of correct document filter settings was used in sim1.<br />
Here legal guidances, legislation, and citizen booklets were listed as correct document<br />
types for the task. All queries used either no filter or one of the three just mentioned.<br />
The same was the case for sim3, system B. For sim3, system A and sim2, a greater<br />
variety of document types was applied in the queries. Considering the findings<br />
Table 8.17 Document type filter used in queries (percentages)<br />
Sim1 Sim2 Sim3 NWT Total<br />
SysA SysB SysA SysB SysA SysB SysA SysB SysA SysB<br />
None chosen 26 (83.9) 38 (55.1) 33 (58.9) 71 (62.8) 21 (29.2) 62 (72.1) 50 (71.4) 58 (86.6) 130 (56.8) 229 (68.4)<br />
Legal guidances 2 (6.5) 26 (37.7) 11 (19.6) 16 (14.2) 10 (13.9) 8 (9.3) 4 (5.7) 8 (11.9) 27 (11.8) 58 (17.3)<br />
Business guidances - - 5 (8.9) 18 (15.9) 11 (15.3) 3 (3.5) - - 16 (7.0) 21 (6.3)<br />
Citizen booklets 3 (9.7) 5 (7.2) 4 (7.1) - 3 (4.2) - 4 (5.7) - 14 (6.1) 5 (1.5)<br />
Legislation - - - 5 (4.4) 13 (18.1) 13 (15.1) - - 13 (5.7) 18 (5.4)<br />
Internal information - - - - - - 6 (8.6) - 6 (2.6) -<br />
Internal guidances - - 1 (1.8) 1 (0.9) 5 (6.9) - - - 6 (2.6) 1 (0.3)<br />
Business newsletters - - - 2 (1.8) 5 (6.9) - - - 5 (2.2) 2 (0.6)<br />
Case law - - - - - - 4 (5.7) - 4 (1.7) -<br />
Others - - 2 (3.6) - 1 (1.4) - - - 3 (1.3) -<br />
Legislative materials - - - - 2 (2.8) - - - 2 (0.9) -<br />
Forms - - - - - - 2 (2.9) 1 (1.5) 2 (0.9) 1 (0.3)<br />
SKAT circulars - - - - 1 (1.4) - - - 1 (0.4) -<br />
Total 31 69 56 113 72 86 70 67 229 335<br />
Table 8.18 Search success for the document type filter in system A and system B queries<br />
(percentages)<br />
System A System B<br />
Success Failure Total Success Failure Total<br />
None chosen 52 (40.0) 78 (60.0) 130 (100.0) 59 (25.8) 170 (74.2) 229 (100.0)<br />
Legal guidances 2 (7.4) 25 (92.6) 27 (100.0) 6 (10.3) 52 (89.7) 58 (100.0)<br />
Legislation 0 13 (100.0) 13 (100.0) 2 (11.1) 16 (88.9) 18 (100.0)<br />
Business guidances 7 (43.8) 9 (56.3) 16 (100.0) 3 (14.3) 18 (85.7) 21 (100.0)<br />
Business newsletters 0 5 (100.0) 5 (100.0) 0 2 (100.0) 2 (100.0)<br />
Internal guidances 1 (16.7) 5 (83.3) 6 (100.0) 0 1 (100.0) 1 (100.0)<br />
Citizen booklets 6 (42.9) 8 (57.1) 14 (100.0) 2 (40.0) 3 (60.0) 5 (100.0)<br />
Internal information 1 (16.7) 5 (83.3) 6 (100.0) - - -<br />
Legend: Document types that have been applied less than 5 times in total across the two systems<br />
have been omitted from the table.<br />
regard<strong>in</strong>g sessions (section 8.2.1.1), where just the same two simulated search tasks had<br />
the highest average number of reformulations it appears that a wrong choice of<br />
document types lead to more reformulations <strong>in</strong> both systems. For sim2 and sim3 the<br />
correct document types (<strong>in</strong> terms of the facet analysis) was legal guidances, legislation,<br />
and bus<strong>in</strong>ess guidances. To conclude, the use of the document filter is overall higher <strong>in</strong><br />
system A. Further, the frequency of application very much depends on the specific task<br />
at hand. And lastly, if a wrong document type has been chosen, the query is likely to<br />
result <strong>in</strong> an unsatisfactory search result and subsequent reformulation.<br />
The influence of the document type filter on search success appears from<br />
Table 8.18. In Table 8.5 it became evident that system A had the highest share of<br />
successful sessions and queries. The difference between the two systems was slightly<br />
higher for queries than for sessions. These general results are reflected in the breakdown<br />
by use of the document type filter.<br />
For system A searches, business guidances and citizen booklets in particular<br />
were helpful in retrieving relevant documents. Both document types were mentioned as<br />
one among several possibilities in the aforementioned facet analysis of the simulated<br />
search tasks. Queries that did not include the document type filter also performed well.<br />
At least 40 % of the queries using these settings in system A retrieved relevant<br />
documents. At a general level the percentages of successful search results are lower in<br />
system B. The highest share of successful queries (25.8 %) was found in queries that<br />
did not include the document type filter. One exception is the document type “Citizen<br />
booklet” (40.0 %), but the result is less significant as it was used in only 5 queries. Apart from the<br />
“Citizen booklet”, the share of successful queries decreases when the document type<br />
filter is included. We have already mentioned the over-specification of queries in<br />
system B, and the results in Table 8.18 support that impression.<br />
Thus, using the document type filter in system A helps<br />
specify and reduce search results, while the filter in system B tends to limit the search<br />
results too much.<br />
8.2.2 Reformulations<br />
Reformulations are interesting because they can inform us about whether and how<br />
searchers try to correct a query on the basis of an unsatisfying search result. Previously,<br />
Table 8.5 demonstrated frequent reformulations in both systems, though with a higher<br />
frequency in system B than in system A. System B also accounts for the highest<br />
average number of reformulations. Table 8.19 and Table 8.20 specify the figures at task<br />
level. From the first of the two tables it appears that the general figures are mirrored at<br />
Table 8.19 Number of sessions with query reformulations (percentages)<br />
Sim1 Sim2 Sim3 NWT Total<br />
SysA SysB SysA SysB SysA SysB SysA SysB SysA SysB<br />
Reformulations 6 (37.5) 11 (68.8) 12 (75.0) 16 (100.0) 10 (62.5) 15 (93.8) 14 (87.5) 11 (68.8) 42 (65.6) 53 (82.8)<br />
No reformulations 10 (62.5) 5 (31.3) 4 (25.0) 0 (0.0) 6 (37.5) 1 (6.3) 2 (12.5) 5 (31.3) 22 (34.4) 11 (17.2)<br />
Total 16 16 16 16 16 16 16 16 64 64<br />
Table 8.20 Number of reformulations in sessions<br />
Sim1 Sim2 Sim3 NWT Total<br />
SysA 2.50 (n=6) 3.33 (n=12) 5.60 (n=10) 3.86 (n=14) 3.93 (n=42)<br />
SysB 4.82 (n=11) 6.06 (n=16) 4.67 (n=15) 4.64 (n=11) 5.11 (n=53)<br />
Total 4.00 (n=17) 4.89 (n=28) 5.04 (n=25) 4.20 (n=25) 4.59 (N=95)<br />
Legend: Sessions without reformulations have been excluded from the present table, which makes<br />
N=95. That implies that of the total of 128 sessions in the search test, 95 had reformulations.<br />
session level with one exception. The genuine search task is the only task having fewer<br />
reformulations in system B than in system A. However, the number of reformulations<br />
is still high. Further, sim1 in system A is the only example of a task where the number of<br />
sessions without reformulations surpasses the number of sessions with reformulations.<br />
In sessions with reformulations the overall average number of reformulations was 4.59,<br />
slightly lower for system A searches (3.93) and slightly higher for system B searches (5.11) (see Table<br />
8.20, outer right column). From Table 8.20 it is apparent that sim3 had<br />
a higher average number of reformulations in system A. For the remainder of the tasks, system<br />
B had the highest average number of reformulations. As concluded in section 8.2.1.1, it<br />
required more queries to retrieve relevant documents in system B.<br />
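The averages in Table 8.20 can be cross-checked with a short calculation. The following Python sketch recomputes the system-level and overall averages as session-weighted means of the per-task figures. The variable and function names are illustrative and not part of the search log coding, and the inputs are the rounded values printed in the table.

```python
# Per-task averages and session counts (n) copied from Table 8.20.
# The names below are illustrative; they do not come from the search log.
TABLE_8_20 = {
    "SysA": [(2.50, 6), (3.33, 12), (5.60, 10), (3.86, 14)],
    "SysB": [(4.82, 11), (6.06, 16), (4.67, 15), (4.64, 11)],
}


def weighted_mean(cells):
    """Mean of per-task averages, weighted by the number of sessions."""
    total_reformulations = sum(avg * n for avg, n in cells)
    total_sessions = sum(n for _, n in cells)
    return total_reformulations / total_sessions


sys_a = weighted_mean(TABLE_8_20["SysA"])
sys_b = weighted_mean(TABLE_8_20["SysB"])
overall = weighted_mean(TABLE_8_20["SysA"] + TABLE_8_20["SysB"])

print(f"SysA {sys_a:.2f}, SysB {sys_b:.2f}, overall {overall:.2f}")
# prints: SysA 3.93, SysB 5.11, overall 4.59
```

Because the inputs are already rounded to two decimals, the recomputed values can only be expected to agree with the Total column up to rounding.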
Types of reformulations add to our understanding of the search moves carried<br />
out by the test persons. We have analysed reformulations as to whether the category,<br />
the search terms, the document type, or the search operator were changed, whether several<br />
parameters were changed, or whether no reformulation occurred (mostly in the first query of a<br />
session) (see Table 8.21). As mentioned before, changes of search operators are rare in<br />
both systems. In system A the overall preferred reformulation is a change of search<br />
terms. Next follow a change of the document type and a simultaneous change of two or<br />
more parameters. As discussed in section 8.2.1.3, the search operator is rarely changed<br />
on its own. The document type filter<br />
is used far more in system A than in system B, most likely because this is the only possible way of<br />
reducing search results in system A apart from changing the search terms or the search<br />
operator. Thus, the test persons actually used the available options for modifying<br />
their search results. Further, the regular use of the document type filter emphasizes the<br />
importance and relevance of the filter.<br />
In system B the preferred reformulation was a change of categories, closely<br />
followed by a combination of two or more parameters. Next in terms of frequency<br />
Table 8.21 Types of reformulations for all queries (percentages)<br />
Sim1 Sim2 Sim3 NWT Total<br />
SysA SysB SysA SysB SysA SysB SysA SysB SysA SysB<br />
No reformulations 16 (51.6) 16 (23.2) 16 (28.6) 15 (13.3) 20 (27.8) 15 (17.4) 17 (24.3) 16 (23.9) 69 (30.1) 62 (18.5)<br />
Category - 15 (21.7) - 46 (40.7) - 41 (47.7) - 12 (17.9) - 114 (34.0)<br />
Query terms 11 (35.5) 15 (21.7) 28 (50.0) 6 (5.3) 23 (31.9) 8 (9.3) 35 (50.0) 18 (26.9) 97 (42.4) 47 (14.0)<br />
Document type - - 4 (7.1) 6 (5.3) 18 (25.0) 1 (1.2) 6 (8.6) 1 (1.5) 28 (12.2) 8 (2.4)<br />
Search operators 1 (3.2) 1 (1.4) 3 (5.4) 2 (1.8) - - 4 (5.7) 2 (3.0) 8 (3.5) 5 (1.5)<br />
>1 types simultaneously 3 (9.7) 22 (31.9) 5 (8.9) 38 (33.6) 11 (15.3) 21 (24.4) 8 (11.4) 18 (26.9) 27 (11.8) 99 (29.6)<br />
Total 31 (100) 69 (100) 56 (100) 113 (100) 72 (100) 86 (100) 70 (100) 67 (100) 229 (100) 335 (100)<br />
followed a change of query terms, while document type and search operators were<br />
rarely used as query modifiers. Here it is evident that categories are important, which is<br />
to be expected as they were mandatory in system B. In addition, categories were to a<br />
large extent combined with other parameters. Most commonly a change of category<br />
was combined with a change of search terms (see table 6, Appendix 28). This reflects<br />
the design of the system, where only categories with content were shown to the<br />
searchers. Thus, when search terms were changed, a change of available categories was<br />
likely to occur, as the categories reflected the list of retrieved documents. This also<br />
explains the importance of a change of query terms as a reformulation.<br />
The breakdown by search task in Table 8.21 shows some individual<br />
characteristics. One characteristic is the use of categories across system B queries:<br />
category changes were approximately twice as frequent in sim2 and sim3 as in<br />
sim1 and the genuine work task. As these category changes were not combined with other<br />
modification tools, the figure refers to queries where the test persons clicked<br />
different categories on the basis of the same query terms to find relevant documents.<br />
Table 8.22 Query success on the basis of types of reformulations (percentages)<br />
System A success, System A failure, Total system A, System B success, System B failure, Total system B<br />
Category - - - 24 (21.1) 90 (78.9) 114 (100.0)<br />
Query terms 22 (22.7) 75 (77.3) 97 (100.0) 5 (10.6) 42 (89.4) 47 (100.0)<br />
Document type 9 (32.1) 19 (67.9) 28 (100.0) 1 (12.5) 7 (87.5) 8 (100.0)<br />
Search operators 1 (12.5) 7 (87.5) 8 (100.0) 1 (20.0) 4 (80.0) 5 (100.0)<br />
>1 types simultaneously 11 (40.7) 16 (59.3) 27 (100.0) 19 (19.2) 80 (80.8) 99 (100.0)<br />
The success of the respective types of reformulations has been summed up in<br />
Table 8.22. Overall, system A has a higher share of successful reformulations than<br />
system B. At the level of types of reformulations the best performance is<br />
achieved in system A by changing several parameters simultaneously. Here, about 40 % of queries<br />
manage to retrieve relevant documents. Next follows a change of settings of the<br />
document type filter. In system B the variance in performance was smaller than in<br />
system A. Here the test persons had less success in improving their outputs by<br />
changing query terms and document types. The two most frequent<br />
reformulation types, a change of category and a simultaneous change of several parameters,<br />
accounted for roughly the same share of successful queries (21.1 % and 19.2 %).<br />
Categories, search operators, and a combination of query modifiers had the best<br />
performance within the system, but the performance was below the percentages achieved<br />
in system A. Thus, within system B we may conclude that reformulations based on a<br />
change of categories perform better than the remaining modification<br />
tools. System A reformulations were most successful when they consisted of a<br />
combination of several parameters changed simultaneously. However, the share of successful<br />
reformulations leaves room for improvement in both systems.<br />
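The shares discussed above follow directly from the success and failure counts in Table 8.22. As a minimal sketch, the Python snippet below recomputes the success share for each reformulation type and ranks the types within each system. The dictionary layout and function names are illustrative assumptions, not part of the original analysis.

```python
# Success and failure counts per reformulation type, copied from Table 8.22.
# Dictionary layout and function names are illustrative assumptions.
TABLE_8_22 = {
    "System A": {
        "Query terms": (22, 75),
        "Document type": (9, 19),
        "Search operators": (1, 7),
        ">1 types simultaneously": (11, 16),
    },
    "System B": {
        "Category": (24, 90),
        "Query terms": (5, 42),
        "Document type": (1, 7),
        "Search operators": (1, 4),
        ">1 types simultaneously": (19, 80),
    },
}


def success_share(success, failure):
    """Percentage of reformulated queries that retrieved relevant documents."""
    return 100.0 * success / (success + failure)


def ranked(system):
    """Reformulation types of one system, sorted by descending success share."""
    shares = {
        kind: success_share(*counts)
        for kind, counts in TABLE_8_22[system].items()
    }
    return sorted(shares.items(), key=lambda item: item[1], reverse=True)


for system in TABLE_8_22:
    print(system)
    for kind, share in ranked(system):
        n = sum(TABLE_8_22[system][kind])
        print(f"  {kind:<24} {share:5.1f} % (n={n})")
```

Printing n next to each share makes it visible that some percentages rest on very few queries, for example the search operator reformulations in system B (n=5).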
8.2.3 Combined system B sessions and queries<br />
During the course of the search test, test persons occasionally ended up assessing<br />
documents before choosing a category in system B queries. This behavior had different<br />
causes. One cause was the speed of the system: while waiting for the<br />
system to categorize the search results, some test persons began to review the documents<br />
found on the basis of the initial query. On other occasions the test persons saw<br />
the document they were looking for in the results list before even deciding on a category<br />
to reduce the search results by, and ended up assessing the initial search results without<br />
filtering by a category. We denote these searches combined system B searches. The<br />
following quote serves as an illustration of combined system B searches:<br />
“But the first time I searched, I got an e-commerce handbook. I would have<br />
preferred that to going down there [“down there” refers to the categorization window<br />
on the right hand side of the screen]” (TP10, line 10-11).<br />
In several cases where a highly relevant document had been discovered before the choice<br />
of a category in system B, the test persons could not locate the document in the<br />
categories, which occasionally led to frustration. To exemplify:<br />
“It is just as bad, because it says “Arrears”. And “Employers”, and it is<br />
neither of them. So let’s see about “Employers”… Because it says “Employers and A-taxes”.<br />
And it is withheld by the A-taxes, just like our employers withhold our taxes. I<br />
simply can’t find it. I know it is in there. But on the basis of this, I can’t get in there.<br />
Because when I know where it is at, I would go directly for it instead.” (TP05, line 113-<br />
117).<br />
A third type of behavior also triggered combined system B queries. It has previously<br />
been observed that system B searches tended to be narrow. When the initial query<br />
resulted in very few search results, it did not seem natural to the test persons to further<br />
reduce an already limited set of search results. Some test persons undertook the<br />
categorization despite the few results, while others omitted the categorization and<br />
assessed the results retrieved on the basis of the remaining search possibilities.<br />
“It says just that... Well, the costs to the European border should be included<br />
in the customs value. The other one regarding transportation, I can see that it is<br />
explained with great precision. But in this case I did not search for “Customs” down<br />
here [in the categories]. I got it by searching for freight and customs value and “pages<br />
with all words”. And then I got the customs guidance, which is also the one referring to<br />
the customs codes treating the rules about the amount of carriage to add. So this<br />
[document] is a three then. But I didn’t get it by searching for “Business imports” or<br />
“Shipping” or “Exports” [referring to categories]” (TP32, line 295-301)<br />
The quote illustrates how, in a combined system B search with just two retrieval results,<br />
the test person ends up assessing the retrieved documents without categorization.<br />
The combined system B queries and sessions were coded as system B searches<br />
inasmuch as the test persons had access to the taxonomy and could be influenced by it.<br />
From a methodological point of view, however, an overview of the extent of these queries must be<br />
provided. To this end, additional codes were added to separate them from<br />
the regular system B queries. Reporting on the extent of combined system B sessions is<br />
the purpose of the present section. Table 8.23 lists the share of combined system B<br />
sessions. The table shows that about 60 % of the system B sessions contained one or<br />
more queries omitting categories. It is also evident from the table that approximately 60<br />
% of the successful sessions in system B had at least one query that did not include the<br />
choice of a category. The share of sessions that to some degree bypass the<br />
categorization is thus substantial.<br />
Table 8.24 elaborates on the combined system B sessions. The table shows which<br />
system delivered the successful results for the queries contained in these sessions. In that way the<br />
Table 8.23 Sessions carried out in system B, or in a combination of system B and system A:<br />
Frequency and success (percentages)<br />
Session type, Number of sessions in system B, Number of successful sessions in system B<br />
System B 26 (40.6) 22 (40.7)<br />
Combined system B sessions 38 (59.4) 32 (59.3)<br />
Total 64 (100.0) 54 (100.0)<br />
Legend: “System B” denotes sessions that have been carried out in system B exclusively. “Combined<br />
system B sessions” refers to the sessions that should have been carried out in system B, but where<br />
test persons have assessed the relevance of documents found in system A and in system B.<br />
Table 8.24 System of successful queries in combined system B sessions<br />
Frequency Percent<br />
Valid Task not solved 6 15.8<br />
System A 13 34.2<br />
System B 15 39.5<br />
Both systems applied 4 10.5<br />
Total 38 100.0<br />
Legend: The table lists the systems that have provided documents with a relevance score of 2 or 3 in<br />
combined system B sessions. That explains why N=38.<br />
table addresses the sessions based on a combination of the two test systems. It shows<br />
that although a combined system B session included queries conducted in both<br />
system A and system B, both systems did not necessarily provide useful search<br />
results. The share of successful sessions is fairly even between the two systems: 13<br />
sessions were solved by omitting categories, and 15 sessions had success by including the<br />
categories in their queries. Only 4 sessions found relevant documents by means of both<br />
systems. It also means that the test persons may have omitted the categorization in<br />
some queries of a session, but that it may still be by means of categorization that relevant<br />
documents are found.<br />
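The figures in Table 8.23 and Table 8.24 can be checked against each other. The sketch below, with illustrative variable names that are not taken from the search log, verifies that the solved combined sessions in Table 8.24 add up to the successful combined sessions in Table 8.23, and recomputes the percentages.

```python
# Counts copied from Table 8.23 and Table 8.24; names are illustrative.
combined_sessions = 38            # Table 8.23, combined system B sessions
successful_combined = 32          # Table 8.23, successful combined sessions

source_of_success = {             # Table 8.24
    "Task not solved": 6,
    "System A": 13,
    "System B": 15,
    "Both systems applied": 4,
}

# Every combined session appears exactly once in Table 8.24.
assert sum(source_of_success.values()) == combined_sessions

# Solved sessions in Table 8.24 match the successes reported in Table 8.23.
solved = combined_sessions - source_of_success["Task not solved"]
assert solved == successful_combined

# Percentage of the 38 combined sessions per source of success.
shares = {
    label: round(100.0 * count / combined_sessions, 1)
    for label, count in source_of_success.items()
}
print(shares)
```

Both assertions hold for the published counts, so the two tables describe the same 38 sessions.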
Table 8.25 extends the prior table and presents the share of successes at query<br />
level. The table presents all queries carried out in system B, both distinct system B<br />
queries and combined system B queries. Though the test persons in a number of cases<br />
found the categorization irrelevant, it was still used in approximately two thirds of the<br />
queries (see outer right hand column). In addition, when calculated in terms of the<br />
Table 8.25 System B queries: Frequency of category use and query success (percentages)<br />
Success Failure Total<br />
Queries with categories 52 (24.2) 163 (75.8) 215 (100.0)<br />
Queries without categories 20 (16.7) 100 (83.3) 120 (100.0)<br />
Total 72 263 335<br />
Legend: The table contains all queries processed in system B, both regular system B queries and<br />
combined system B queries (N=335).<br />
share of successful queries, queries including categories performed better (24.2<br />
% of queries were successful) than queries omitting categorization (16.7 % of queries were<br />
successful). To sum up on combined system B searches, more than half of the system B<br />
sessions included system A queries to some extent. However, at query level for all<br />
system B queries, queries including a category had a larger chance of succeeding than<br />
queries that basically corresponded to system A queries.<br />
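The two shares compared above, and the "approximately two thirds" of queries that used categories, can be reproduced from the counts in Table 8.25. The following sketch uses a small helper class that is purely illustrative. It restates descriptive shares only and does not test whether the difference is statistically significant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryGroup:
    """Success and failure counts for one group of system B queries."""
    label: str
    success: int
    failure: int

    @property
    def total(self) -> int:
        return self.success + self.failure

    @property
    def success_rate(self) -> float:
        return 100.0 * self.success / self.total


# Counts copied from Table 8.25; the class is an illustrative helper.
with_categories = QueryGroup("Queries with categories", 52, 163)
without_categories = QueryGroup("Queries without categories", 20, 100)

all_queries = with_categories.total + without_categories.total
category_use = 100.0 * with_categories.total / all_queries
rate_gap = with_categories.success_rate - without_categories.success_rate

print(f"Category use: {category_use:.1f} % of {all_queries} queries")
for group in (with_categories, without_categories):
    print(f"{group.label}: {group.success_rate:.1f} % successful")
print(f"Difference: {rate_gap:.1f} percentage points")
```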
In the post-search interviews the test persons were asked to assess system B<br />
(see interview guide in Appendix 19). The responses indicated when the<br />
categorization was useful and when it was not. The answers are analysed in the present<br />
section in order to elaborate further on the results gained from the search log presented<br />
above.<br />
There was overall agreement among the test persons that the<br />
categorization was mainly useful when they had a large set of results. TP21 said on the<br />
basis of a query with 14 results:<br />
“It did not help me so much there, because the query didn’t have that many<br />
results. It was possible to cope with the documents there, whether the categorization<br />
had been there or not. Only 14 documents were retrieved. You could cope with that. It<br />
is mainly helpful, when you get large results, a thousand documents or so” (TP21, line<br />
257-260)<br />
The retrieval set size at which the categorization became useful varied. Some<br />
mentioned 40 documents, others far more, like TP21. Categorization was also found<br />
useful for generating new perspectives on the composition of a query and for<br />
understanding the facets of the search task. That supports the decision to code<br />
combined system B queries and sessions as system B queries and sessions in the overall<br />
coding of the search log. One example is TP02, who would have liked to have access to<br />
the categorization in a system A session:<br />
“At the end I would have liked to be able to go over there [into the<br />
categorization], because no matter what I did, I could not find anything. And then I need<br />
somewhere else to search, where I have the option of seeing other sub-topics, in order<br />
to perhaps access it that way.” (TP02, line 625-633).<br />
TP09 supports the statement of TP02 in discussing a system B session:<br />
“It worked well there, because suddenly I found a principal topic that I could<br />
click on. And that gave me that… Hey! Yes! That has to do with company taxation. So it<br />
also helped me thinking what this is at all” (TP09, line 553-555)<br />
The findings confirm Käki’s findings (though these were based on extracted categorization, see<br />
section 5.4.1.5) that categorization is helpful when “…the original query was vague, broad, general, or<br />
contained words that have multiple meanings...” (Käki, 2005b, p. 138). Still, the test<br />
persons of the present search test discussed whether the categorization was more useful to<br />
people with some insight into the topic of the tasks or to people with none. TP06 knew what to look for<br />
in one of the tasks:<br />
“I knew that if I was to look for something about the taxation, then I would<br />
also know something about independent businesses. And then I could go in there faster.<br />
So I knew that I should choose “Personal incomes” over “Capital income” [examples<br />
of categories]. I know the tax rules. So it is easier to choose between the categories,<br />
when the answer is known in advance” (TP06, line 392-395)<br />
TP20, on the other hand, did not find much help in the categorization:<br />
“But I don’t know, if I would ever start going through all this [the categories]. I<br />
think it takes more time, because I don’t know what is behind. If I was a specialist in<br />
SKAT and knew all about company tax settlements or the like, then it [the<br />
categorization] might be perfect for me. Because then I would know that I can go in<br />
there exactly, click that, and get the documents out. But I don’t know if it would forget<br />
about some documents that I need, if it limits the results too much”. (TP20, line 339-<br />
344)<br />
TP24 sums up the usefulness both for users with extensive knowledge of the task topic and<br />
for users with less knowledge:<br />
“If I know what I am looking for, or at least think I know where to go [in the<br />
categories], then it is really good. But when I don’t know, it might also be good,<br />
because you get to try out different keywords [taxonomy terms]. But if you have the<br />
wrong keyword, you will definitely not find it that way.” (TP24, line 320-323)<br />
The difference of opinion may be due to a lack of insight into the<br />
functionalities of the system and into the structure and content of the taxonomy. Thus,<br />
a considerable number of the test persons mentioned lack of experience with the test<br />
system as an important reason when they experienced difficulties locating relevant<br />
documents. The difficulties can be read from Table 8.21 above. Here 34 % of all system<br />
B queries consist of a reformulation that changes only the category, meaning that test persons clicked<br />
around between categories with no simultaneous changes of the remaining search<br />
options. In other cases the trouble experienced by the test persons was caused by<br />
apparently curious categorizations offered by system B. One example was the presence<br />
of the taxonomy term “Tonnage taxes” in a query regarding property gain taxes (TP13).<br />
We have already mentioned the varying sizes of the documents in the collection and the<br />
importance of the document type directions to the employees, a very large document<br />
type. The finding suggests that in collections with large documents, the documents<br />
should be indexed in smaller units to obtain precision in the search results. On the other<br />
hand, when categorization is performed on search results that are already very limited, as<br />
was the case in many system B searches, the results of the categorization may also be<br />
skewed. Be it lack of experience with the categorization in system B, too narrow<br />
queries, or odd suggestions for categories, we consider all three to be explanations for the<br />
generally increased number of queries in system B sessions described in section 8.2.1.1.<br />
TP14 summarizes the discussion by saying:<br />
TP14 summarizes the discussion by say<strong>in</strong>g:<br />
Once you beg<strong>in</strong> to get an idea, what the categories are, what they stand for…<br />
Then you fumble, until you f<strong>in</strong>d out what it is. Are there more roads lead<strong>in</strong>g to Rome,<br />
or which is the fastest, or… Well, it is an adaptation with some th<strong>in</strong>gs. What is the<br />
wisest th<strong>in</strong>g to do…” (TP14, l<strong>in</strong>e 493-495)<br />
8.3 Summary and performance implications for future indexing in e-government<br />
The purpose of the present chapter was to answer research questions 2 and 3 regarding<br />
the comparative performance of automatic indexing in terms of extracted versus<br />
assigned indexing, and the implications for future indexing guidelines in e-government.<br />
Above, the focus has been on research question 2 and the results of the search test. In<br />
this summary we will unify the conclusions drawn in the respective sections of the<br />
chapter in order to be able to state the implications of the results for e-government<br />
indexing guidelines (research question 2.10). The general figures of the test (section<br />
8.2) demonstrated a better performance of system A in terms of fewer terms and<br />
concepts in queries, fewer sessions with reformulations, fewer queries in sessions with<br />
reformulations, and a higher share of success in sessions and queries. However, a more<br />
detailed analysis of figures and interviews provided a more differentiated picture. Thus,<br />
when broken down by search task, system B was equal to or better than system A in some tasks. This<br />
was the case for the genuine work tasks in terms of session success, query success, the<br />
number of queries in sessions, and the number of queries in successful sessions. In<br />
sim3, system B outperformed system A in terms of a lower average number of search<br />
terms and search keys in queries, both as regards total numbers and successful queries.<br />
The analysis also detected a higher use of the document type filter in system A, which<br />
was explained by the reduced number of query composition tools in system A. In<br />
addition, the search log revealed that more than half of the system B sessions included one or<br />
more queries omitting categorization. The search log and the search interviews revealed<br />
different reasons for the omissions: the search results were too few, a relevant<br />
document was discovered in the list of results while waiting for the system to categorize<br />
the results, or, related to the previous reason, a highly relevant document was found among the<br />
first results before a category was chosen.<br />
Different causes were found for the lower general performance of system B. One reason was the test persons’ difficulties in handling the search operators available in the prototype. Significantly more restrictions were applied in system B queries, resulting in at times very few search results, and also reducing the assessment of the documents retrieved. Another reason was found in the post-search interviews. Here lack of experience with the categorization features of system B was a frequent explanation for the difficulties experienced. Furthermore, some test persons found it difficult to identify from the label which documents were contained in the respective categories. The findings emphasize the importance of users’ familiarity with the design and functionality of retrieval systems. The outcome of the difficulties could be detected in the types of reformulations carried out: about one third of all queries carried out in system B were reformulations based on a change of categories alone. Opposite understandings also existed among the test persons, though. Overall, categorization was useful when there was a certain amount of documents to categorize. With few results it was easier to look through the documents manually. System B was also useful when the employees had some knowledge of the search task topic. Then it was considered easier to assess the relevance of the categories, as the labels of the categories made sense. However, the categorization of system B was also beneficial when test persons had a limited knowledge of the search task at hand. In those cases categories helped the test persons discover and understand facets contained in the task. Here it is important to make clear that limited knowledge should be understood as generalist knowledge of the organization's topics.
As appears from the concluding remarks, the use and omission of categorization in solving search tasks is not the same for all users, even though they may find themselves within the same domain, as in the case study carried out in the thesis. On the basis of the search test it is concluded that at times free text indexing as represented in system A is preferred by users. This is in particular the case when they know precisely what to look for. In these situations metadata like the type of the document is helpful in composing queries of high precision. When few documents of high precision are the result of a query, the employees prefer searching by metadata. What has also become evident during the test is the employees’ emphasis on document types, both concerning queries and when assessing query outputs. The employees had considerable insight into the range of documents on the intranet, as specific document types were often the outcome of a work process. That stresses the importance of metadata in e-government, when composing queries, but also in document snippets of search results.
The overall implication of the search test for indexing guidelines in e-government is that both extracted and assigned automatic indexing types should be present in professional e-government. As appeared from chapter 5, categorization has primarily been tested on the WWW, but as demonstrated in the search test, smaller and more specialized systems may also benefit from the indexing approach. From the domain study we learned that verificative information needs and conscious topical information needs are prevalent among e-government employees. Extracted indexing in terms of free text indexing has proven to be particularly useful for the verificative type of information needs, while categorization was used more when the test persons needed ideas for search terms or perspectives on the work task at hand, that is, the conscious topical needs.
Assigned indexing in the form of categorization assisted the users in their information seeking when they needed ideas for query reformulation or when they had difficulties interpreting the concepts contained in a search task. In future e-government indexing both indexing approaches should be represented in order to meet the diversity of information need types identified in the domain study. The search test has also emphasized the importance of users’ familiarity with the KOS (in this case the taxonomy). When the employees’ understanding did not correspond with the categories, it was easier to manually go through a number of results than it was to click a number of categories to find something relevant to the task at hand.
9 Conclusion and recommendations for future work
The purpose of the thesis was to investigate if and how automatic indexing can improve professional e-government users’ access to digitalized, work based information. To do this, the preceding chapters have reviewed, investigated, and analysed indexing, information seeking and searching in e-government from a professional, user based perspective. Chapter 2 explained the methodological standpoint of the thesis. Chapters 3, 4, and 5 reviewed the e-government domain, e-government seeking behaviour, and indexing methods respectively. The review chapters served the purpose of guiding the empirical investigations. In chapter 6 we outlined and accounted for the empirical designs, data collection and analysis of the two overall empirical studies of the thesis, the domain study and the search test. The results of the studies were reported and analysed in Chapters 7 and 8. Chapter 7 addressed research question 1 concerning professional e-government seeking behaviour and the related indexing demands by accounting for the results of the domain study. Chapter 8 was concerned with the search test. By doing this, research questions 2 and 3 concerning the domain specific performance of two indexing methods, and the related implications for indexing guidelines within the domain, were answered. The purpose of the present chapter is to unify the thesis’ threads in order to answer the research questions put forward in Chapter 1. In section 9.1 we summarize the empirical findings of the thesis. Section 9.2 identifies the contributions of the thesis, and section 9.3 makes recommendations for future work.
9.1 Summary of empirical findings
From the domain study it was found that the e-government employees applied a myriad of mainly electronic information sources in their daily work. The predominant source was the intranet. It had the highest use across all work tasks, while the use of other types of sources depended on the work task at hand. The general prevalence of the intranet supports its relevance as our choice of test system for the search test. Apart from direct information sources, both the open field of the questionnaire and the focus group participants expressed an extensive use of colleagues as sources of information.
The employees had extensive work experience within SKAT. With a long length of service in the organization, information seeking predominantly took place at regular intervals, though not all the time. The employees demonstrated a good basic insight into their work topics on the basis of their experience. However, particularly the employees engaged in citizen service had experienced that their work tasks had changed with the introduction of self-service. The change had caused a reduced memorizing of rules and regulations, as the employees were less in contact with citizens. The result was an increased need for verifying and updating information. Beyond that, employee routine and topic insight are reflected in the general frequency of information seeking of approximately every 3rd or 4th time a work task is handled. To conclude, the study confirms the expected changes of employees’ work tasks with the introduction of e-government, at least regarding employees occupied with servicing citizens.
The main reasons for consulting the intranet were verificative and conscious topical information needs. A few work tasks from the administrative parts of the organization stood out with a high share of more complex information needs in terms of muddled topical needs, but they were exceptions to the general picture. It must be taken into account, though, that the information needs questions of the questionnaire specifically concerned the intranet, and that other information needs may occur in relation to other information sources. However, the results correspond well to the experience and insight of the employees and to the conclusion drawn above that employees often check up on information and rules to make sure they are updated. In addition, the results regarding information needs were verified by the focus groups.
Concerning metadata, the domain study found an extensive need for metadata among the employees. Part of the reason for requiring more and higher quality metadata originated from a general difficulty of locating relevant documents in the current intranet. The difficulties often made the employees consult a colleague instead of the intranet. In particular content metadata in terms of subject metadata were requested by the employees for the intranet, but other types were also requested by many employees. Thus, a general interest in metadata existed among the employees. The findings emphasize the importance of high quality mark-up of documents for effective sharing of knowledge in e-government.
On the basis of the findings of the domain study, the following demands for indexing were deduced. As both verificative and conscious topical needs were dominant among the employees when consulting the intranet, both contextual and content metadata should be represented in the indexing. Part of the definition of a verificative information need is that the user wants to locate a document on the basis of some kind of known bibliographic information. This calls for contextual metadata. Conscious topical needs, in turn, are solved by exploring aspects of a known subject matter. Here both content and contextual metadata are appropriate. To conclude, employees can gain from both types of metadata in terms of their information needs, which is why both should be represented in the indexing. In addition, the dissatisfaction with the search outcomes of the present intranet was remarkable, and in many cases it resulted in giving up and consulting colleagues instead. For indexing guidelines it is emphasized that not only are metadata needed, they also need to be carefully added in order to ensure quality. Quality is a premise for employees to be able to carry out effective and efficient information seeking.
The search test had its main focus on content metadata in terms of the subject categorization tested. However, metadata in terms of document types also turned out to be important to the test persons during the test. On the basis of the domain study findings regarding information needs, three low complexity simulated search tasks guided the test searches along with one genuine information need brought by each test person. Both simulated and genuine search tasks were simple in terms of the number of concepts included: all tasks consisted of three topical concepts or fewer.
At a general level the search test found system B (comprising categorization) to have more terms per query on average (2.43 to 2.25 in system A), more concepts per query on average (1.90 to 1.67), and a lower share of queries applying the document type filter (31.6 to 43.2). Furthermore, it required more work from the test persons to gain success in system B. Here the share of sessions with reformulations was 82.8 to 65.6 in system A, and the average number of reformulations was higher (4.23 to 2.58 in system A). At session level system B was equal to or above system A in 3 of the 4 tasks in terms of the number of successful sessions. In terms of queries the total number of successful queries was fairly even between the two systems, though the number of failed queries was significantly higher in system B compared to system A. To conclude, the effort required to locate relevant documents in system B was significantly higher.
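The session and query figures above are the kind of measures that can be derived from a search log. As a minimal sketch, and not the procedure used in the thesis, the following Python code computes comparable measures from a small invented log. The `Query` and `Session` structures, the `summarize` function, the sample terms, and the convention that every query after the first in a session counts as a reformulation are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Query:
    terms: List[str]            # search terms entered by the test person
    used_doc_type_filter: bool  # whether the document type filter was applied
    successful: bool            # whether the query led to a relevant document


@dataclass
class Session:
    system: str                 # "A" (free text) or "B" (with categorization)
    queries: List[Query]


def summarize(sessions: List[Session], system: str) -> Dict[str, float]:
    """Compute per-system log measures (hypothetical definitions)."""
    own = [s for s in sessions if s.system == system]
    queries = [q for s in own for q in s.queries]
    # Assumption: a session with more than one query contains reformulations.
    reformulated = [s for s in own if len(s.queries) > 1]
    return {
        "avg_terms_per_query": mean(len(q.terms) for q in queries),
        "share_doc_type_filter": 100 * sum(q.used_doc_type_filter for q in queries) / len(queries),
        "share_sessions_reformulated": 100 * len(reformulated) / len(own),
        "avg_reformulations": (
            mean(len(s.queries) - 1 for s in reformulated) if reformulated else 0.0
        ),
    }


# Invented toy log, not data from the search test.
LOG = [
    Session("A", [Query(["vat"], True, True)]),
    Session("A", [Query(["tax", "deduction"], False, False),
                  Query(["deduction"], True, True)]),
    Session("B", [Query(["vat", "rules", "2010"], False, False),
                  Query(["vat", "rules"], False, False),
                  Query(["vat"], True, True)]),
]

for name in ("A", "B"):
    print(name, summarize(LOG, name))
```

Keeping the definitions in one function makes it explicit which sessions and queries enter each denominator, which matters when figures for two systems are compared.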
Further, a general finding of the study was that queries with fewer terms were more likely to succeed. That indicates that the test persons are very good at finding relevant search terms. It also means that the combination with a category carries a risk of over-restricting results. This could indicate that a less specific taxonomy could be useful to the employees, at least in a relatively small database such as the one tested here.
Different causes were found for the increased effort to retrieve relevant documents in system B. The test persons consistently used more search terms with the more restrictive search operator “Pages containing all words” and fewer search terms with the less restrictive search operator “Free text”. This shows that the test persons had difficulties understanding the meaning of the two predominant operators of the system. However, in terms of search results, system B further reduced the results in the categorization in order to complete the query, resulting in very limited search results, while the same operation in system A resulted in faster retrieval of relevant documents, because no further restrictions were added to the query. Further, in terms of system B, some test persons expressed trouble finding suitable categories in the categorization to match their queries due to lack of knowledge of the taxonomy. The trouble was also identified in the analysis of types of reformulations in system B, where a change of categorization alone accounted for 40% of all reformulations in sim2, 47% in sim3, and 34% in total numbers. The results stress the importance of an appropriate and meaningful level of detail in controlled vocabularies. However, the results also stress that though the employees are considered experienced information searchers, they may be confused by the meaning of Boolean operators. To compare, the average number of queries using the document type filter was higher in system A, though with large variations at task level. However, the use reflected a better understanding of the use of the document type filter as a query tool in the two systems.
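To illustrate why combining a restrictive operator with a category can leave very few results, the following sketch runs the same terms through two matching modes over a handful of invented documents. It assumes that “Pages containing all words” behaves like a Boolean AND and that “Free text” matches documents containing any of the terms; the actual behaviour of the prototype's operators is not specified here, and the documents, category labels, and `search` function are hypothetical.

```python
from typing import Dict, List, Optional

# Invented mini collection; texts and category labels are illustrative only.
DOCS: Dict[int, Dict[str, str]] = {
    1: {"text": "vat rules for small companies", "category": "VAT"},
    2: {"text": "vat deduction guidance", "category": "VAT"},
    3: {"text": "deduction rules for commuters", "category": "Income tax"},
    4: {"text": "company car rules", "category": "Income tax"},
}


def search(terms: List[str], operator: str = "any",
           category: Optional[str] = None) -> List[int]:
    """Return ids of matching documents.

    operator="all" requires every term (assumed AND semantics);
    operator="any" requires at least one term (assumed free-text semantics).
    A category, if given, is applied as an additional restriction.
    """
    hits = []
    for doc_id, doc in DOCS.items():
        words = set(doc["text"].split())
        matches = [term in words for term in terms]
        ok = all(matches) if operator == "all" else any(matches)
        if ok and (category is None or doc["category"] == category):
            hits.append(doc_id)
    return hits


print(search(["vat", "rules"], "any"))                         # [1, 2, 3, 4]
print(search(["vat", "rules"], "all"))                         # [1]
print(search(["vat", "rules"], "all", category="Income tax"))  # []
print(search(["vat", "rules"], "any", category="VAT"))         # [1, 2]
```

In this toy collection, adding terms under the "all" operator already narrows the result to one document, and a category that does not match the searcher's expectation then empties it, whereas the "any" operator leaves enough results for a category to be a useful narrowing step.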
Omissions of categorization in one third of system B queries were the result of the test persons’ challenges. Analyses of the queries carried out in system B showed a fairly even distribution of successful sessions as to whether the session had been solved by means of categorization or not. At query level, queries that included a category were successful in 24.2 % of cases, while the queries that omitted categories had a success rate of 16.7 %. In the interviews carried out, the omissions of categories were explained. Categorization was not supportive in queries where a highly relevant result came out among the first results. Neither was it relevant if a very small set of results was retrieved. In those cases the categorization was considered inconvenient to the retrieval process, as it was easier to manually look through the results instead of deciding on the correct category. On the other hand, categorization was useful in suggesting new search terms for a query.
Overall, it is concluded that there is a basis for implementing categorization in information systems supporting professional e-government users. Metadata based extracted indexing is important for successful retrieval in the domain too, in order to be able to support verificative information needs.
9.2 Contributions of the thesis
The contributions of the thesis in terms of the theoretical and empirical framework are identified to be:
A confirmation of the non-verified assumption in the domain of e-government that work tasks of e-government employees are expected to change as a result of increased self-service among external stakeholders in the domain (Snellen, 2002; Dörfler, 2003; Marchionini, Samet & Brandt, 2003; Brown, 2005; Landsforeningen af Kommunale Servicecentre, 2005; Mahler & Regan, 2005). The results have shown that, at least for employees engaged in servicing citizens, the need to verify information has increased, as less is memorized due to less routine. To LIS, the consequences of increased information seeking in order to remain updated are important.
As regards information seeking of professional e-government users it has been outlined that the user group is not very well explored. In the light of the changes in work tasks just mentioned, an updated account of e-government employees was needed. The thesis has added to our knowledge of the user group in terms of their:
use of information sources
frequency of information seeking
metadata preferences
predominant types of information needs developed and how these needs are met by means of contextual and content metadata
searching behavior
Regarding the performance of automatic categorization in the domain different things have been learned:
Categorization is supportive to users in tasks where a new perspective on a task is needed, either in the form of suggestions for new search terms or in offering an understanding of the facets contained in the search task. In addition, categorization supports users in reducing large search results. In verificative searches categorization is less useful if highly relevant documents are retrieved fast. In those cases categorization reduces efficiency.
Categorization has primarily been tested in larger collections than the present test collection. From the present results it has been learned that categorization is also useful in smaller document collections.
However, in order to be supportive to the user group, an appropriate level of specificity must be expressed through the KOS. In addition, the categorization of documents must be correct and meet the employees’ understanding of the domain.
9.3 Recommendations for future work
In continuation of the conclusions drawn above, the present section suggests recommendations for future work. Though the thesis has added to our knowledge about professional e-government information seeking and the ability of automatic categorization to support this behavior, new questions have arisen that remain to be answered. The suggestions are divided in two: suggestions regarding the empirical setting and suggestions regarding the tools applied in the study.
The empirical setting of the thesis was a case study of SKAT, the largest government agency in Denmark. We have touched upon the long length of service of the employees and its implications for information needs and seeking behavior. This is not necessarily a general tendency. Therefore it would be interesting to investigate whether the behavior is different in smaller government agencies.
As a consequence of the information needs characterized in the domain study, low complexity simulated search tasks were used as the point of departure of the search test along with one genuine search task. It was found that system B performed better in some variables in relation to the genuine search task alone. For that reason it would be enriching to explore the performance of e-government categorization in a study designed to reflect genuine search tasks to a larger extent. In this connection another question arises. It has not been within the aim of the test design to state the performance of categorization in terms of more complex information needs. A study investigating just complex information needs in e-government would add further to our knowledge of the performance of categorization in the domain. Lastly, in relation to the empirical setting, the search test provided an insight into employees’ use of a system for a short amount of time in a system that was new to them. This has advantages, which have been outlined in the empirical framework. However, a study investigating categorization in a more natural setting could add other perspectives to our understanding of the field.
The search test has applied different tools. The tools have also raised questions for future work. The categorization made use of a two level taxonomy for arranging search results. Different opinions have been put forward by the test persons as to the specificity of the taxonomy. As it was not a part of the purpose and design of the search test to address this question, we have not been able to validate the cause of the differences. For professional users it would therefore be interesting to gain more knowledge of what the appropriate specificity and choice of concepts within KOS like taxonomies are. In addition, the project has investigated automatic assigned indexing in terms of automated categorization and found it supportive of e-government seeking behavior. Investigations of other types of assigned indexing would increase our knowledge of the relative performance of different assigned indexing methods in the domain.
This thesis adds to our knowledge about professional e-government seeking behavior<br />
and has increased our understanding of how this behavior can be supported by<br />
automatic indexing.
10 References<br />
223<br />
References<br />
Abecker, A., Bernardi, A., H<strong>in</strong>kelmann, K., Kühn, O. & S<strong>in</strong>tek, M. (1998). Toward a<br />
technology for organizational memories. IEEE Intelligent Systems, 13(3), 40-48.<br />
Ahlgren, P. & Kekälä<strong>in</strong>en, J. (2007). Index<strong>in</strong>g strategies for Swedish full text retrieval<br />
under different user scenarios. Information Process<strong>in</strong>g & Management, 43(1),<br />
81-102.<br />
Aitchison, J. (1992). Index<strong>in</strong>g languages and <strong><strong>in</strong>dex<strong>in</strong>g</strong>. In: Dossett, P. (Ed.), Handbook<br />
of Special Librarianship and Information Work (6. ed., pp. 191-233). London:<br />
Aslib.<br />
Alasem, A. (2009). An overview of e-Government metadata standards and Initiatives<br />
based on Dubl<strong>in</strong> Core. Electronic Journal of e-Government, 7(1), 1-10.<br />
Alavi, M. & Leidner, D.E. (2001). Review: Knowledge management and knowledge<br />
management systems: Conceptual foundations and research issues. MIS<br />
Quarterly, 25(1), 107-136.<br />
Albrechtsen, H. (1993). Subject analysis and <strong><strong>in</strong>dex<strong>in</strong>g</strong>: from automated <strong><strong>in</strong>dex<strong>in</strong>g</strong> to<br />
doma<strong>in</strong> analysis. The Indexer, 18(4), 219-224.<br />
Andersen, K.V., Grönlund, Å., Moe, C.E. & Se<strong>in</strong>, M.K. (2005). Introduction to the<br />
special issue. Scand<strong>in</strong>avian Journal of Information Systems, 17(2), 3-10.<br />
Andersen, K.V. & Kraemer, K.L. (1994). Information technology and transitions <strong>in</strong> the<br />
public service: A comparison of Scand<strong>in</strong>avia and the United States.<br />
Scand<strong>in</strong>avian Journal of Information Systems, 6(1), 3-24.<br />
Anderson, J.D. & Perez-Carballo, J. (2001a). The nature of <strong><strong>in</strong>dex<strong>in</strong>g</strong>: How humans and<br />
mach<strong>in</strong>es analyze messages and texts for retrieval. Part I: Research, and the<br />
nature of human <strong><strong>in</strong>dex<strong>in</strong>g</strong>. Information Process<strong>in</strong>g & Management, 37(2), 231-<br />
254.<br />
Anderson, J.D. & Perez-Carballo, J. (2001b). The nature of <strong><strong>in</strong>dex<strong>in</strong>g</strong>: How humans and<br />
mach<strong>in</strong>es analyze messages and texts for retrieval. Part II: Mach<strong>in</strong>e <strong><strong>in</strong>dex<strong>in</strong>g</strong>,<br />
and the allocation of human versus mach<strong>in</strong>e effort. Information Process<strong>in</strong>g &<br />
Management, 37(2), 255-277.<br />
Anderson, J.D. & Pérez-Carballo, J. (2005). Information Retrieval Design: Pr<strong>in</strong>ciples<br />
and Options for Information Description, Organization, Display, and Access <strong>in</strong><br />
Information Retrieval Databases, Digital Libraries, Catalogs, and Indexes. St.<br />
Petersburg: Ometeca Institute.<br />
Andrews, J. & Duhon, L. (1997). GILS, Government Information Locator Service:<br />
Blend<strong>in</strong>g old and new to access U.S. <strong>government</strong>al <strong>in</strong>formation. The Serials<br />
Librarian, 31(1-2), 327-333.<br />
Apté, C., Damerau, F. & Weiss, S.M. (1994). Automated learn<strong>in</strong>g of decision rules for<br />
text categorization. ACM Transactions on Information Systems, 12(3), 233-251.<br />
Arellano-Gault, D. & del Castillo-Vega, A. (2004). Maturation of public adm<strong>in</strong>istration<br />
<strong>in</strong> a multicultural environment: Lessons from the Anglo-Saxon, Lat<strong>in</strong>, and<br />
Scand<strong>in</strong>avian political traditions. International Journal of Public<br />
Adm<strong>in</strong>istration, 27(7), 519-528.<br />
Askim, J. (2007). How do politicians use performance <strong>in</strong>formation? An analysis of the<br />
Norwegian local <strong>government</strong> experience. International Review of Adm<strong>in</strong>istrative<br />
Sciences, 73(3), 453-472.<br />
Askim, J. (2009). The demand side of performance measurement: Expla<strong>in</strong><strong>in</strong>g<br />
councillors' utilization of performance <strong>in</strong>formation <strong>in</strong> policymak<strong>in</strong>g.<br />
International Public Management Journal, 12(1), 24-47.
<strong>Automatic</strong> <strong><strong>in</strong>dex<strong>in</strong>g</strong> <strong>in</strong> e-<strong>government</strong><br />
Attar, K.E. (2006). Why appo<strong>in</strong>t professionals? A student catalogu<strong>in</strong>g project. Journal<br />
of Librarianship and Information Science, 38(3), 173-185.<br />
Attfield, S., Blandford, A. & Makri, S. (2010). Social and <strong>in</strong>teractional practices for<br />
dissem<strong>in</strong>at<strong>in</strong>g current awareness <strong>in</strong>formation <strong>in</strong> an organisational sett<strong>in</strong>g.<br />
Information Process<strong>in</strong>g & Management, 46(6), 632-645.<br />
Bates, M.J. (1979). Information search tactics. Journal of the American Society for<br />
Information Science, 30(4), 205-214.<br />
Becker, J., Pfeiffer, D. & Räckers, M. (2007). Doma<strong>in</strong> specific process modell<strong>in</strong>g <strong>in</strong><br />
public adm<strong>in</strong>istrations: The PICTURE approach. In: Wimmer, M.A., Scholl,<br />
H.J. & Grönlund, Å., (Eds.), EGOV 2007, (pp. 68-79). Berl<strong>in</strong>: Spr<strong>in</strong>ger.<br />
Beghtol, C. (1986). Bibliographic classification theory and text l<strong>in</strong>guistics: aboutness<br />
analysis, <strong>in</strong>tertextuality and the cognitive act of classify<strong>in</strong>g documents. Journal<br />
of Documentation, 42(2), 84-113.<br />
Bekkers, V. & Homburg, V. (2007). The myths of e-<strong>government</strong>: Look<strong>in</strong>g beyond the<br />
assumptions of a new and better <strong>government</strong>. Information Society, 23(5), 373-<br />
382.<br />
Belk<strong>in</strong>, N.J. & Croft, W.B. (1992). Information filter<strong>in</strong>g and <strong>in</strong>formation retrieval: Two<br />
sides of the same co<strong>in</strong>? Communications of the ACM, 35(12), 29-38.<br />
Belk<strong>in</strong>, N.J., Oddy, R.N. & Brooks, H.M. (1982). ASK for <strong>in</strong>formation retrieval: Part 1.<br />
Background and theory. Journal of Documentation, 38(2), 61-71.<br />
Bellamy, C. (2002). From automation to knowledge management: Moderniz<strong>in</strong>g British<br />
<strong>government</strong> with ICTS. International Review of Adm<strong>in</strong>istrative Sciences, 68(2),<br />
213-230.<br />
Bellamy, C. & Taylor, J.A. (1998). Govern<strong>in</strong>g <strong>in</strong> the Information Age. Buck<strong>in</strong>gham:<br />
Open University Press.<br />
Berrios, D.C., Cuc<strong>in</strong>a, R.J. & Fagan, L.M. (2002). Methods for semi-automated<br />
<strong><strong>in</strong>dex<strong>in</strong>g</strong> for high precision <strong>in</strong>formation retrieval. Journal of the American<br />
Medical Informatics Association, 9(6), 637-652.<br />
Bertot, J.C., Jaeger, P.T. & Grimes, J.M. (2010). Us<strong>in</strong>g ICTs to create a culture of<br />
transparency: E-<strong>government</strong> and social media as openness and anti-corruption<br />
tools for societies. Government Information Quarterly, 27(3), 264-271.<br />
Beynon-Davies, P. (2007). Models for e-<strong>government</strong>. Transform<strong>in</strong>g Government:<br />
People, Process and Policy, 1(1), 7-28.<br />
Bigdeli, Z. (2007). Iranian eng<strong>in</strong>eers' <strong>in</strong>formation needs and seek<strong>in</strong>g habits: An agro<strong>in</strong>dustry<br />
company experience. Information Research, 12(2).<br />
Blair, D.C. (2002). The challenge of commercial document retrieval, Part I: Major<br />
issues, and a framework based on search exhaustivity, determ<strong>in</strong>acy of<br />
representation and document collection size. Information Process<strong>in</strong>g &<br />
Management, 38(2), 273-291.<br />
Blair, D.C. & Maron, M.E. (1985). An evaluation of retrieval effectiveness for a fulltext<br />
document-retrieval system. Communications of the ACM, 28(3), 289-299.<br />
Blomgren, L., Vallo, H. & Byström, K. (2004). Evaluation of an <strong>in</strong>formation system <strong>in</strong><br />
an <strong>in</strong>formation seek<strong>in</strong>g process. In: Heery, R. & Lyon, L. (Eds.), ECDL 2004<br />
(pp. 57-68). Berl<strong>in</strong>: Spr<strong>in</strong>ger.<br />
Bloomfield, M. (2002). Index<strong>in</strong>g: Neglected and poorly understood. Catalog<strong>in</strong>g &<br />
Classification Quarterly, 33(1), 63-75.<br />
Bloor, M., Frankland, J., Thomas, M. & Robson, K. (2001). Focus Groups <strong>in</strong> Social<br />
Research. London: Sage.<br />
Borko, H. (1977). Toward a theory of <strong><strong>in</strong>dex<strong>in</strong>g</strong>. Information Process<strong>in</strong>g &<br />
Management, 13, 355-365.<br />
Borlund, P. (2000). Experimental components for the evaluation of <strong>in</strong>teractive<br />
<strong>in</strong>formation retrieval systems. Journal of Documentation, 56(1), 71-90.<br />
224
225<br />
References<br />
Borlund, P. (2003a). The concept of relevance <strong>in</strong> IR. Journal of the American Society<br />
for Information Science and Technology, 54(10), 913-925.<br />
Borlund, P. (2003b). The IIR evaluation model: A framework for evaluation of<br />
<strong>in</strong>teractive <strong>in</strong>formation retrieval systems. Information Research, 8(3).<br />
Borlund, P. & Ingwersen, P. (1997). The development of a method for the evaluation of<br />
<strong>in</strong>teractive <strong>in</strong>formation retrieval systems. Journal of Documentation, 53(3), 225-<br />
250.<br />
Borlund, P. & Schneider, J.W. (2010). Reconsideration of the simulated work task<br />
situation: A context <strong>in</strong>strument for evaluation of <strong>in</strong>formation retrieval<br />
<strong>in</strong>teraction. In: Belk<strong>in</strong>, N.J. & Kelly, D. (Eds.), IIiX 2010. New Brunswick, New<br />
Jersey: ACM.<br />
Bountouri, L., Papatheodorou, C., Soulikias, V. & Stratis, M. (2009). Metadata<br />
<strong>in</strong>teroperability <strong>in</strong> public sector <strong>in</strong>formation. Journal of Information Science,<br />
35(2), 204-231.<br />
Box, R.C. (1999). Runn<strong>in</strong>g <strong>government</strong> like a bus<strong>in</strong>ess: Implications for public<br />
adm<strong>in</strong>istration theory and practice. The American Review of Public<br />
Adm<strong>in</strong>istration, 29(1), 19-43.<br />
Brown, D. (2005). Electronic <strong>government</strong> and public adm<strong>in</strong>istration. International<br />
Review of Adm<strong>in</strong>istrative Sciences, 71(2), 241-254.<br />
Buck<strong>in</strong>gham, A. & Saunders, P. (2004). The Survey Methods Workbook: From Design<br />
to Analysis. Cambridge: Polity Press.<br />
Byström, K. (1997). Municipal adm<strong>in</strong>istrators at work: Information needs and seek<strong>in</strong>g<br />
(IN&S) <strong>in</strong> relation to task complexity: A case-study amongst municipal officials,<br />
Information Seek<strong>in</strong>g <strong>in</strong> Context. Tampere, F<strong>in</strong>land: Taylor Graham.<br />
Byström, K. (1999). Task Complexity, Information Types and Information Sources.<br />
Unpublished Doctoral dissertation, University of Tampere, Tampere.<br />
Byström, K. (2002). Information and <strong>in</strong>formation sources <strong>in</strong> tasks of vary<strong>in</strong>g<br />
complexity. Journal of the American Society for Information Science and<br />
Technology, 53(7), 581-591.<br />
Byström, K. & Hansen, P. (2005). Conceptual framework for tasks <strong>in</strong> <strong>in</strong>formation<br />
studies. Journal of the American Society for Information Science and<br />
Technology, 56(10), 1050-1061.<br />
Byström, K. & Järvel<strong>in</strong>, K. (1995). Task complexity affects <strong>in</strong>formation seek<strong>in</strong>g and<br />
use. Information Process<strong>in</strong>g & Management, 31(2), 191-213.<br />
Carey, M.A. & Smith, M.W. (1994). Captur<strong>in</strong>g the group effect <strong>in</strong> focus groups: A<br />
special concern <strong>in</strong> analysis. Qualitative Health Research, 4(1), 123-127.<br />
Carm<strong>in</strong>es, E.G. & Woods, J.A. (2005). Reliability assessment. In: Encyclopedia of<br />
Social Measurement (Vol. 3, pp. 361-365).<br />
Case, D.O. (2006). Information behavior. Annual Review of Information Science and<br />
Technology, 40, 293-327.<br />
Case, D.O. (2007). Look<strong>in</strong>g for Information: A Survey of Research on Information<br />
Seek<strong>in</strong>g, Needs, and Behavior. Amsterdam: Elsevier.<br />
Center for effektiviser<strong>in</strong>g og digitaliser<strong>in</strong>g (2002). Prospekt for FESD (Fællesoffentlig<br />
Elektronisk Sags- og Dokumenthåndter<strong>in</strong>g). Retrieved 13-03, 2011, from<br />
http://moderniser<strong>in</strong>g.dk/fileadm<strong>in</strong>/user_upload/documents/Projekter/FESD/Bagg<br />
rund/FESD-prospekt.pdf.<br />
Chau, M., Fang, X. & Sheng, O.R.L. (2007). What are people search<strong>in</strong>g on <strong>government</strong><br />
web sites? A study of search activity on the Utah.gov web site. Communications<br />
of the ACM, 50(4), 87-92.<br />
Chaudhry, A.S. (2010). Assessment of taxonomy build<strong>in</strong>g tools. The Electronic<br />
Library, 28(6), 769-788.
<strong>Automatic</strong> <strong><strong>in</strong>dex<strong>in</strong>g</strong> <strong>in</strong> e-<strong>government</strong><br />
Chen, H. (1995). Mach<strong>in</strong>e learn<strong>in</strong>g for <strong>in</strong>formation retrieval: Neural networks, symbolic<br />
learn<strong>in</strong>g, and genetic algorithms. Journal of the American Society for<br />
Information Science, 46(3), 194-216.<br />
Choi, Y. (2010a). Enhanc<strong>in</strong>g access to the Web: Vocabulary analysis on users' tags and<br />
professionals' <strong>in</strong>dex terms, iConference. University of Ill<strong>in</strong>ois at Urbana-<br />
Champaign, Ill<strong>in</strong>ois, U.S.A.<br />
Choi, Y. (2010b). Traditional versus emerg<strong>in</strong>g knowledge organization systems:<br />
Consistency of subject <strong><strong>in</strong>dex<strong>in</strong>g</strong> of the Web by <strong>in</strong>dexers and taggers, ASIST<br />
2010. Pittsburgh, PA, USA.<br />
Choo, C.W. (2006). The Know<strong>in</strong>g Organization: How Organizations Use Information<br />
to Construct Mean<strong>in</strong>g, Create Knowledge, and Make Decisions (2. ed.). New<br />
York: Oxford University Press.<br />
Choo, C.W., Furness, C., Paquette, S., van den Berg, H., Detlor, B., Bergeron, P. &<br />
Heaton, L. (2006). Work<strong>in</strong>g with <strong>in</strong>formation: Information management and<br />
culture <strong>in</strong> a professional services organization. Journal of Information Science,<br />
32(6), 491-510.<br />
Chowdhury, G.G. (2003). Natural language process<strong>in</strong>g. Annual Review of Information<br />
Science and Technology, 37, 51-89.<br />
Chowdhury, G.G. (2004). Introduction to Modern Information Retrieval (2. ed.).<br />
London: Facet.<br />
Christian, E. (1999). Experiences with <strong>in</strong>formation locator services. Journal of<br />
Government Information, 26(3), 271-285.<br />
Christian, E. (2001). A metadata <strong>in</strong>itiative for global <strong>in</strong>formation discovery.<br />
Government Information Quarterly, 18(3), 209-221.<br />
Clark, H.H. & Schober, M.F. (1992). Ask<strong>in</strong>g questions and <strong>in</strong>fluenc<strong>in</strong>g answers. In:<br />
Tanur, J.M. (Ed.), Questions about Questions: Inquiries <strong>in</strong>to the Cognitive Bases<br />
of Surveys (pp. 15-48). New York: Russel Sage Foundation.<br />
Cleverdon, C. (1967). The Cranfield tests on <strong>in</strong>dex language devices. Aslib<br />
Proceed<strong>in</strong>gs, 19(6), 173-194.<br />
Cleverdon, C. & Keen, M. (1966). Aslib Cranfield research project. Factors<br />
determ<strong>in</strong><strong>in</strong>g the performance of <strong><strong>in</strong>dex<strong>in</strong>g</strong> systems. Volume 2: Test results.<br />
Cranfield: College of Aeronautics.<br />
Cleverdon, C.W. (1960). ASLIB Cranfield Research Project: Report on the first stage of<br />
an <strong>in</strong>vestigation <strong>in</strong>to the comparative efficiency of <strong><strong>in</strong>dex<strong>in</strong>g</strong> systems. Cranfield:<br />
College of Aeronautics.<br />
Codagnone, C. & Wimmer, M.A., (Eds.). (2007). Roadmapp<strong>in</strong>g eGovernment<br />
Research: Visions and Measures towards Innovative Governments <strong>in</strong> 2020.<br />
[Koblentz]: eGovRTD2020 Project Consortium.<br />
Cole, C. & Leide, J. (2006). A cognitive framework for human <strong>in</strong>formation behavior:<br />
The place of metaphor <strong>in</strong> human <strong>in</strong>formation organiz<strong>in</strong>g behavior. In: Sp<strong>in</strong>k, A.<br />
& Cole, C. (Eds.), New Directions <strong>in</strong> Human Information Behavior (Vol. 8, pp.<br />
171-202). Netherlands: Spr<strong>in</strong>ger.<br />
Cong, X. & Pandya, K.V. (2003). Issues of knowledge management <strong>in</strong> the public sector.<br />
Electronic Journal of Knowledge Management, 1(2), 25-33.<br />
Connaway, L.S., Dickey, T.J. & Radford, M.L. (2011). "If it is too <strong>in</strong>convenient I'm not<br />
go<strong>in</strong>g after it": Convenience as a critical factor <strong>in</strong> <strong>in</strong>formation-seek<strong>in</strong>g<br />
behaviors. Library & Information Science Research, 33(3), 179-190.<br />
Cook, C., Heath, F. & Thompson, R. (2000). A meta-analysis of response rates <strong>in</strong> Web-<br />
or <strong>in</strong>ternet-based surveys. Educational and psychological measurement, 60(6),<br />
821-836.<br />
Cooper, W.S. (1969). Is <strong>in</strong>ter<strong>in</strong>dexer consistency a hobgobl<strong>in</strong>?' American<br />
Documentation, 20(3), 268-279.<br />
226
227<br />
References<br />
Courtright, C. (2007). Context <strong>in</strong> <strong>in</strong>formation behavior research. Annual Review of<br />
Information Science and Technology, 41, 273-306.<br />
Cous<strong>in</strong>s, S.A. (1992). Enhanc<strong>in</strong>g subject access to opacs: Controlled vocabulary vs.<br />
natural language. Journal of Documentation, 48(3), 291-309.<br />
Coyle, K. (2008). Mach<strong>in</strong>e <strong><strong>in</strong>dex<strong>in</strong>g</strong>. The Journal of Academic Librarianship, 34(6),<br />
530-531.<br />
Crawford, J. & Irv<strong>in</strong>g, C. (2009). Information literacy <strong>in</strong> the workplace: A qualitative<br />
exploratory study. Journal of Librarianship and Information Science, 42(1), 29-<br />
38.<br />
Croft, W.B., Turtle, H.R. & Lewis, D.D. (1991, October 13-16.). The use of phrases and<br />
structured queries <strong>in</strong> <strong>in</strong>formation retrieval. In: Bookste<strong>in</strong>, A., Chiaramella, Y.,<br />
Salton, G. & Raghavan, V.V., (Eds.), Proceed<strong>in</strong>gs of the 14th Annual<br />
International ACM SIGIR Conference on Research and Development <strong>in</strong><br />
Information Retrieval, (pp. 32-45). Chicago, Ill<strong>in</strong>ois, USA: New York: ACM.<br />
Cuillier, D. & Piotrowski, S.J. (2009). Internet <strong>in</strong>formation-seek<strong>in</strong>g and its relation to<br />
support for access to <strong>government</strong> records. Government Information Quarterly,<br />
26(3), 441-449.<br />
Cunn<strong>in</strong>gham, S.J., Litt<strong>in</strong>, J. & Witten, I.H. (1997). Applications of Mach<strong>in</strong>e Learn<strong>in</strong>g <strong>in</strong><br />
Information Retrieval (Work<strong>in</strong>g Paper 97/6). Hamilton, New Zealand: The<br />
University of Waikato, Department of Computer Science.<br />
Davies, K. (2007). The <strong>in</strong>formation-seek<strong>in</strong>g behaviour of doctors: A review of the<br />
evidence. Health Information and Libraries Journal, 24(2), 78-94.<br />
Dawes, S.S. (2009). Governance <strong>in</strong> the digital age: A research and action framework for<br />
an uncerta<strong>in</strong> future. Government Information Quarterly, 26(2), 257-264.<br />
de Groot, D. (2003). Vigorous knowledge management <strong>in</strong> the Dutch public sector. In:<br />
Wimmer, M.A. (Ed.), 4th IFIP International Work<strong>in</strong>g Conference, KMGov 2003<br />
(pp. 94-99): Spr<strong>in</strong>ger.<br />
de Jong, M. & Lentz, L. (2006). Municipalities on the Web: User-Friendl<strong>in</strong>ess of<br />
Government Information on the Internet. In: Wimmer, M., Scholl, H., Grönlund,<br />
Å. & Andersen, K. (Eds.), Electronic Government, 5th International<br />
Conference, EGOV 2006 (pp. 174-185). Berl<strong>in</strong>: Spr<strong>in</strong>ger.<br />
De Mey, M. (1977). The cognitive viewpo<strong>in</strong>t: Its development and its scope. In: De<br />
Mey, M., P<strong>in</strong>xten, R., Poriau, m. & Vandamme, F. (Eds.), International<br />
Workshop on the Cognitive Viewpo<strong>in</strong>t (pp. xvi-xxxii). Ghent, Belgium:<br />
University of Ghent.<br />
de Vaus, D. (2002b). Surveys <strong>in</strong> Social Research (5. ed.). London: Routledge.<br />
Del Fiol, G., Haug, P.J., Cim<strong>in</strong>o, J.J., Narus, S.P., Norl<strong>in</strong>, C. & Mitchell, J.A. (2008).<br />
Effectiveness of topic-specific <strong>in</strong>fobuttons: A randomized controlled trial.<br />
Journal of the American Medical Informatics Association, 15(6), 752-759.<br />
Dempsey, L. & Heery, R. (1998). Metadata: A current view of practice and issues.<br />
Journal of Documentation, 54(2), 145-172.<br />
Dias, C. (2001). Corporate portals: A literature review of a new concept <strong>in</strong> Information<br />
Management. International Journal of Information Management, 21(4), 269-<br />
287.<br />
Dietterich, T.G. (1997). Mach<strong>in</strong>e-learn<strong>in</strong>g research: Four current directions. AI<br />
Magaz<strong>in</strong>e, 18(4), 97-136.<br />
du Plessis, T. & du Toit, A.S.A. (2006). Knowledge management and legal practice.<br />
International Journal of Information Management, 26(5), 360-371.<br />
Dubois, C.P.R. (1987). Free text versus controlled vocabulary. Onl<strong>in</strong>e Review, 11(4),<br />
243-253.<br />
Dumais, S., Platt, J., Heckerman, D. & Sahami, M. (1998). Inductive learn<strong>in</strong>g<br />
algorithms and representations for text categorization. In: Makki, K. &<br />
Bouganim, L. (Eds.), CIKM '98 Proceed<strong>in</strong>gs of the seventh <strong>in</strong>ternational
<strong>Automatic</strong> <strong><strong>in</strong>dex<strong>in</strong>g</strong> <strong>in</strong> e-<strong>government</strong><br />
conference on Information and knowledge management (pp. 148-155). New<br />
York: ACM.<br />
Dörfler, A. (2003). Bus<strong>in</strong>ess process modell<strong>in</strong>g and help systems as part of KM <strong>in</strong> e<strong>government</strong>.<br />
In: Wimmer, M.A., (Ed.), KMGov, (pp. 297-303).<br />
Edmiston, K.D. (2003). State and local e-<strong>government</strong>: Prospects and challenges. The<br />
American Review of Public Adm<strong>in</strong>istration, 33(1), 20-45.<br />
Edmunds, A. & Morris, A. (2000). The problem of <strong>in</strong>formation overload <strong>in</strong> bus<strong>in</strong>ess<br />
organisations: a review of the literature. International Journal of Information<br />
Management, 20(1), 17-28.<br />
Efron, M., Elsas, J., Marchion<strong>in</strong>i, G. & Zhang, J. (2004). Mach<strong>in</strong>e learn<strong>in</strong>g for<br />
<strong>in</strong>formation architecture <strong>in</strong> a large <strong>government</strong>al website. In: Proceed<strong>in</strong>gs of the<br />
4th ACM/IEEE-CS jo<strong>in</strong>t conference on Digital libraries, (pp. 151-159). Tuscon,<br />
AZ, USA.<br />
El-Sherb<strong>in</strong>i, M. & Klim, G. (2004). Metadata and catalog<strong>in</strong>g practices. The Electronic<br />
Library, 22(3), 238-248.<br />
Ellis, D. (1989). A behavioural approach to <strong>in</strong>formation retrieval system design. Journal<br />
of Documentation, 45(3), 171-212.<br />
Elwood, S. (2008). Grassroots groups as stakeholders <strong>in</strong> spatial data <strong>in</strong>frastructures:<br />
Challenges and opportunities for local data development and shar<strong>in</strong>g.<br />
International Journal of Geographical Information Science, 22(1), 71-90.<br />
Ely, M. (1991). Do<strong>in</strong>g Qualitative Research: Circles With<strong>in</strong> Circles. London:<br />
Routledge.<br />
Eppler, M.J. & Mengis, J. (2004). The concept of <strong>in</strong>formation overload: A review of<br />
literature from organization science, account<strong>in</strong>g, market<strong>in</strong>g, MIS, and related<br />
discipl<strong>in</strong>es. The Information Society, 20(5), 325-344.<br />
Evans, J.R. & Mathur, A. (2005). The value of onl<strong>in</strong>e surveys. Internet Research, 15(2),<br />
195 -219.<br />
Fag<strong>in</strong>, R., Kumar, R., McCurley, K.S., Novak, J., Sivakumar, D., Toml<strong>in</strong>, J.A. &<br />
Williamson, D.P. (2003, May 20–24, 2003). Search<strong>in</strong>g the workplace web. In:<br />
WWW2003: Proceed<strong>in</strong>gs of the 12th <strong>in</strong>ternational conference on World Wide<br />
Web, (pp. 366-375). Budapest, Hungary.<br />
Fang, Z. (2002). E-<strong>government</strong> <strong>in</strong> digital era: Concept, practice, and development.<br />
International Journal of The Computer, The Internet and Management, 10(2), 1-<br />
22.<br />
Fangmeyer, H. (1974). Semi <strong>Automatic</strong> Index<strong>in</strong>g: State of the Art. Neuilly Sur Se<strong>in</strong>e,<br />
France: North Atlantic Treaty Organization.<br />
Feldman, S. & Sherman, C. (2001). The High Cost of Not F<strong>in</strong>d<strong>in</strong>g Information.<br />
Retrieved 21-03, 2010, from<br />
http://www.ejitime.com/materials/IDC%20on%20The%20High%20Cost%20Of<br />
%20Not%20F<strong>in</strong>d<strong>in</strong>g%20Information.pdf.<br />
Fidel, R. (1994). User-centred <strong><strong>in</strong>dex<strong>in</strong>g</strong>. Journal of the American Society for<br />
Information Science, 45(8), 572-576.<br />
Floropoulos, J., Spathis, C., Halvatzis, D. & Tsipouridou, M. (2010). Measur<strong>in</strong>g the<br />
success of the Greek Taxation Information System. International Journal of<br />
Information Management, 30(1), 47-56.<br />
Ford, F.N. (1985). Decision support systems and expert systems: A comparison.<br />
Information & Management, 8(1), 21-26.<br />
Foster, A. & Ford, N. (2003). Serendipity and <strong>in</strong>formation seek<strong>in</strong>g: An empirical study.<br />
Journal of Documentation, 59(3), 321-340.<br />
Fourie, I. (2009). Learn<strong>in</strong>g from research on the <strong>in</strong>formation behaviour of healthcare<br />
professionals: A review of the literature 2004–2008 with a focus on emotion.<br />
Health Information and Libraries Journal, 26(3), 171-186.<br />
228
229<br />
References<br />
Fox, C. (1989). A stop list for general text. Newsletter ACM SIGIR Forum, 24(1-2), 19-<br />
35.<br />
Fox, C. (1992). Lexical analysis and stoplists. In: Frakes, W.B. & Baeza-Yates, R.<br />
(Eds.), Information Retrieval: Data Structures & Algorithms (pp. 102-130).<br />
Englewood Cliffs, New Jersey: Prentice Hall.<br />
Frakes, W.B. (1992). Stemm<strong>in</strong>g algorithms. In: Frakes, W.B. & Baeza-Yates, R. (Eds.),<br />
Information Retrieval: Data Structures & Algorithms (pp. 131-160). Englewood<br />
Cliffs, New Jersey: Prentice Hall.<br />
Frankfort-Nachmias, C. & Nachmias, D. (1996). Research Methods <strong>in</strong> the Social<br />
Sciences (5. ed.). London: Arnold.<br />
Freund, L., Toms, E.G. & Waterhouse, J. (2005). Model<strong>in</strong>g the <strong>in</strong>formation behaviour<br />
of software eng<strong>in</strong>eers us<strong>in</strong>g a work - task framework. Proceed<strong>in</strong>gs of the<br />
American Society for Information Science and Technology, 42(1).<br />
Fu, J.-R., Farn, C.-K. & Chao, W.-P. (2006). Acceptance of electronic tax fil<strong>in</strong>g: A<br />
study of taxpayer <strong>in</strong>tentions. Information & Management, 43(1), 109-126.<br />
Fugmann, R. (1993). Subject Analysis and Index<strong>in</strong>g: Theoretical Foundation and<br />
Practical Advice. Frankfurt/Ma<strong>in</strong>: Indeks Verlag.<br />
Galvez, C., de Moya-Anegon, F. & Solana, V.H. (2005). Term conflation methods <strong>in</strong><br />
<strong>in</strong>formation retrieval: Non-l<strong>in</strong>guistic and l<strong>in</strong>guistic approaches. Journal of<br />
Documentation, 61(4), 520-547.<br />
Garcia, A.C., Dawes, M.E., Kohne, M.L., Miller, F.M. & Groschwitz, S.F. (2006).<br />
Workplace studies and technological change. Annual Review of Information<br />
Science and Technology, 40(1), 393-437.<br />
Gil-Garcia, J.R. & Mart<strong>in</strong>ez-Moyano, I.J. (2007). Understand<strong>in</strong>g the evolution of e<strong>government</strong>:<br />
The <strong>in</strong>fluence of systems of rules on public sector dynamics.<br />
Government Information Quarterly, 24, 266-290.<br />
Gilchrist, A. (2001). Corporate taxonomies: Report on a survey of current practice.<br />
Onl<strong>in</strong>e Information Review, 25(2), 94-103.<br />
Gilchrist, A. (2003). Tesauri, taxonomies and ontologies: An etymological note. Journal<br />
of Documentation, 59(1), 7-18.<br />
Gilliland-Swetland, A. (2005). Electronic records management. Annual Review of<br />
Information Science and Technology, 39, 219-253.<br />
Gilliland, A.J. (2008). Sett<strong>in</strong>g the stage. In: Baca, M. (Ed.), Introduction to Metadata<br />
(Onl<strong>in</strong>e version, ver. 3.0 ed.).<br />
Glassey, O. (2002). A one-stop <strong>government</strong> prototype based on use cases and scenarios.<br />
In: Traunmüller, R. & Lenk, K. (Eds.), EGOV 2002 (pp. 116-123).<br />
Glassey, O. (2004). Develop<strong>in</strong>g a one-stop <strong>government</strong> data model. Government<br />
Information Quarterly, 21(2), 156-169.<br />
Glazer, R. (1993). Measur<strong>in</strong>g the value of <strong>in</strong>formation: The <strong>in</strong>formation-<strong>in</strong>tensive<br />
organization. IBM Systems Journal, 32(1), 99-110.<br />
Goh, D.H.-L., Chua, A.Y.-K., Luyt, B. & Lee, C.S. (2008). Knowledge access, creation<br />
and transfer <strong>in</strong> e-<strong>government</strong> portals. Onl<strong>in</strong>e <strong>in</strong>formation review, 32(3), 348-369.<br />
Golder, S.A. & Huberman, B.A. (2006). Usage patterns of collaborative tagg<strong>in</strong>g<br />
systems. Journal of Information Science, 32(2), 198-208.<br />
Golub, K. (2006). Automated subject classification of textual web documents. Journal<br />
of Documentation, 62(3), 350-371.<br />
Golub, K. (2007). Automated Subject Classification of Textual Documents <strong>in</strong> the<br />
Context of Web-Based Hierarchical Brows<strong>in</strong>g. Unpublished PhD thesis, Lund<br />
University, Lund.<br />
Gomez, L.M., Lochbaum, C.C. & Landauer, T.K. (1990). All the right words: F<strong>in</strong>d<strong>in</strong>g<br />
what you want as a function of richness of <strong><strong>in</strong>dex<strong>in</strong>g</strong> vocabulary. Journal of the<br />
American Society for Information Science, 41(8), 547-559.<br />
Gouscos, D., Lambrou, M., Mentzas, G. & Georgiadis, P. (2003). A methodological<br />
approach for def<strong>in</strong><strong>in</strong>g one-stop e-<strong>government</strong> service offer<strong>in</strong>gs. In: Traunmüller,<br />
R. (Ed.), Electronic Government (pp. 173-176). Berl<strong>in</strong>: Spr<strong>in</strong>ger.<br />
Grant, G. & Chau, D. (2005). Develop<strong>in</strong>g a generic framework for e-<strong>government</strong>.<br />
Journal of Global Information Management, 13(1), 1-30.<br />
Greenbaum, T.L. (1993). The Handbook for Focus Group Research (Revised and<br />
expanded ed.). New York: Lex<strong>in</strong>gton.<br />
Gross, T. & Taylor, A.G. (2005). What have we got to lose? The effect of controlled<br />
vocabulary on keyword search<strong>in</strong>g results. College & Research Libraries, 66(3),<br />
212-230.<br />
Grundén, K. (2009). A social perspective on implementation of e-<strong>government</strong>: A<br />
longitud<strong>in</strong>al study at the County Adm<strong>in</strong>istration of Sweden. Electronic Journal<br />
of e-Government, 7(1), 65-76.<br />
Grönlund, Å. (2003). Emerg<strong>in</strong>g electronic <strong>in</strong>frastructures: Explor<strong>in</strong>g democratic<br />
components. Social Science Computer Review, 21(1), 55-72.<br />
Grönlund, Å. (2005). What's <strong>in</strong> a field: Explor<strong>in</strong>g the eGovernment doma<strong>in</strong>,<br />
Proceed<strong>in</strong>gs of the 38th Hawaii International Conference on System Sciences.<br />
Grönlund, Å. (2010). Ten years of e-<strong>government</strong>: The 'end of history' and new<br />
beg<strong>in</strong>n<strong>in</strong>g. In: Wimmer, M.A. et al. (Eds.), Electronic Government (pp. 13-24):<br />
Spr<strong>in</strong>ger.<br />
Grönlund, Å. & Horan, T.A. (2004). Introduc<strong>in</strong>g e-gov: history, def<strong>in</strong>ition, and issues.<br />
Communications of the Association for Information Systems, 15, 713-729.<br />
Gunnlaugsdottir, J. (2008). Register<strong>in</strong>g and search<strong>in</strong>g for records <strong>in</strong> electronic records<br />
management systems. International Journal of Information Management, 28(4),<br />
293-304.<br />
Ha, L. & Zenebe, A. (2008). Knowledge management <strong>in</strong> <strong>government</strong>, The 2nd<br />
International Conference <strong>in</strong> Knowledge Generation,<br />
Communication and Management. Orlando, Florida: International Institute of<br />
Informatics and Systemics.<br />
Halcomb, E.J. & Davidson, P.M. (2006). Is verbatim transcription of <strong>in</strong>terview data<br />
always necessary? Applied Nurs<strong>in</strong>g Research, 19(1), 38–42.<br />
Halkier, B. (2008). Fokusgrupper (2. ed.). Frederiksberg: Samfundslitteratur.<br />
Hammarström, H. (2006). A naive theory of affixation and an algorithm for extraction.<br />
In: Wicentowski, R. & Kondrak, G. (Eds.), SIGPHON '06: Proceed<strong>in</strong>gs of the<br />
Eighth Meet<strong>in</strong>g of the ACL Special Interest Group on Computational Phonology<br />
and Morphology (pp. 79-88). Stroudsburg: Association for Computational<br />
L<strong>in</strong>guistics.<br />
Harman, D.K. & Voorhees, E.M. (2006). TREC: An overview. Annual Review of<br />
Information Science and Technology, 40, 113-155.<br />
Hawk<strong>in</strong>g, D. (2004). Challenges <strong>in</strong> enterprise search. In: Proceed<strong>in</strong>gs of the 15th<br />
Australasian database conference, (pp. 15-24). Duned<strong>in</strong>, New Zealand.<br />
Hayes, P.J. & We<strong>in</strong>ste<strong>in</strong>, S.P. (1990). Construe-TIS: A system for content-based<br />
<strong><strong>in</strong>dex<strong>in</strong>g</strong> of a database of news stories. In: Rappaport, A. & Smith, R. (Eds.),<br />
The Second Conference on Innovative Applications of Artificial Intelligence<br />
(IAAI), Wash<strong>in</strong>gton, DC. Menlo Park, California: AAAI Press.<br />
Haynes, D. (2004). Metadata for Information Management and Retrieval. London:<br />
Facet.<br />
Hazlett, S.A., McAdam, R. & Beggs, V. (2008). An exploratory study of knowledge<br />
flows: A case study of Public Sector Procurement. Total Quality Management,<br />
19(1-2), 57-66.<br />
He, J., Shu, B., Li, X. & Yan, H. (2010). Effective Time Ratio: A Measure for Web<br />
Search Eng<strong>in</strong>es with Document Snippets.<br />
Information Retrieval Technology. In: Cheng, P.-J., Kan, M.-Y., Lam, W. & Nakov, P.<br />
(Eds.), 6th Asia Information Retrieval Societies Conference, AIRS 2010, Taipei,<br />
Taiwan, December 1-3, 2010. Proceed<strong>in</strong>gs (Vol. 6458, pp. 73-84). Berl<strong>in</strong>:<br />
Spr<strong>in</strong>ger.<br />
Healey, J.F. (2007). The Essentials of Statistics: A Tool for Social Research. Belmont,<br />
CA: Thomson Higher Education.<br />
Hedlund, T. (2002). Compounds <strong>in</strong> dictionary-based cross-language <strong>in</strong>formation<br />
retrieval. Information Research, 7(2).<br />
Heeks, R. & Bailur, S. (2006). Analyz<strong>in</strong>g eGovernment Research: Perspectives,<br />
Philosophies, Theories, Methods and Practice (Vol. 16, iGovernment Work<strong>in</strong>g<br />
Paper Series). Manchester: University of Manchester, Institute for Development<br />
Policy and Management.<br />
Heeks, R. & Bailur, S. (2007). Analyz<strong>in</strong>g e-<strong>government</strong> research: Perspectives,<br />
philosophies, theories, methods, and practice. Government Information<br />
Quarterly, 24(2), 243-265.<br />
Helbig, N., Dawes, S.S., Mulki, F.H., Hrd<strong>in</strong>ova, J.L. & Cook, M.E. (2008).<br />
International Digital Government Research: A Reconnaissance Study: Center<br />
for Technology <strong>in</strong> Government, University at Albany, SUNY.<br />
Henriksen, H.Z. & Damsgaard, J. (2006). The rise and descent of visions for e-<strong>government</strong>.<br />
In: Donnellan, B., Larsen, T.J., Lev<strong>in</strong>e, L. & DeGross, J.I. (Eds.),<br />
The Transfer and Diffusion of Information Technology for Organizational<br />
Resilience: IFIP TC8 WG 8.6 International Work<strong>in</strong>g Conference, June 7-10,<br />
2006, Galway, Ireland (pp. 275-289). New York: Spr<strong>in</strong>ger.<br />
Hertzum, M., Andersen, H.H.K., Andersen, V. & Hansen, C.B. (2002). Trust <strong>in</strong><br />
<strong>in</strong>formation sources: Seek<strong>in</strong>g <strong>in</strong>formation from people, documents, and virtual<br />
agents. Interact<strong>in</strong>g with Computers, 14(5), 575-599.<br />
Hertzum, M. & Pejtersen, A.M. (2000). The <strong>in</strong>formation-seek<strong>in</strong>g practices of eng<strong>in</strong>eers:<br />
search<strong>in</strong>g for documents as well as for people. Information Process<strong>in</strong>g &<br />
Management, 36(5), 761-778.<br />
Hjørland, B. (2002). Doma<strong>in</strong> analysis <strong>in</strong> <strong>in</strong>formation science: Eleven approaches<br />
traditional as well as <strong>in</strong>novative. Journal of Documentation, 58(4), 422-462.<br />
Hjørland, B. & Albrechtsen, H. (1995). Toward a new horizon <strong>in</strong> <strong>in</strong>formation science:<br />
Doma<strong>in</strong> analysis. Journal of the American Society for Information Science,<br />
46(6), 400-425.<br />
Höchstötter, N. & Koch, M. (2009). Standard parameters for search<strong>in</strong>g behaviour <strong>in</strong><br />
search eng<strong>in</strong>es and their empirical evaluation. Journal of Information Science,<br />
35(1), 45-65.<br />
Hodge, G.M. (1994). Computer-assisted database <strong><strong>in</strong>dex<strong>in</strong>g</strong>: The state-of-the-art. The<br />
Indexer, 19(1), 23-27.<br />
Homburg, V. (2004). E-<strong>government</strong> and NPM: a perfect marriage? In: Janssen, M., Sol,<br />
H.G. & Wagenaar, R.W. (Eds.), ICEC '04 Proceed<strong>in</strong>gs of the 6th <strong>in</strong>ternational<br />
conference on Electronic commerce, (pp. 547-555). New York: ACM.<br />
Hovy, E. (2008a). An outl<strong>in</strong>e for the foundations of digital <strong>government</strong> research. In:<br />
Chen, H., Brandt, L., Gregg, V., Traunmüller, R., Dawes, S., Hovy, E.,<br />
Mac<strong>in</strong>tosh, A. & Larson, C.A. (Eds.), Digital Government: E-<strong>government</strong><br />
Research, Case studies, and Implementation (pp. 43-59). New York: Spr<strong>in</strong>ger.<br />
Hu, G., Pan, W. & Wang, J. (2010). The dist<strong>in</strong>ctive lexicon and consensual conception<br />
of e-Government: an exploratory perspective. International Review of<br />
Adm<strong>in</strong>istrative Sciences, 76(3), 577-597.<br />
Hu, P.J.-H., Brown, S.A., Thong, J.Y.L., Chan, F.K.Y. & Tam, K.Y. (2008).<br />
Determ<strong>in</strong>ants of service quality and cont<strong>in</strong>uance <strong>in</strong>tention of onl<strong>in</strong>e services: The<br />
case of eTax. Journal of the American Society for Information Science and<br />
Technology, 60(2), 292-306.<br />
Hu, P.J.-H., Hsu, F.-M., Hu, H.-f. & Chen, H. (2010). Agency satisfaction with<br />
electronic record management systems: A large-scale survey. Journal of the<br />
American Society for Information Science and Technology, 61(12), 2559-2574.<br />
Huang, J. & Efthimiadis, E.N. (2009). Analyz<strong>in</strong>g and evaluat<strong>in</strong>g query reformulation<br />
strategies <strong>in</strong> web search logs, Proceed<strong>in</strong>gs of the 18th ACM conference on<br />
Information and knowledge management. Hong Kong, Ch<strong>in</strong>a: ACM.<br />
Humphrey, S.M. (1989). Research on <strong>in</strong>teractive knowledge-based <strong><strong>in</strong>dex<strong>in</strong>g</strong>: The<br />
MedIndEx prototype. Proceed<strong>in</strong>gs of the Annual Symposium on Computer<br />
Application <strong>in</strong> Medical Care, 527-533.<br />
Hunter, J. (2009). Collaborative semantic tagg<strong>in</strong>g and annotation systems. Annual<br />
Review of Information Science and Technology, 43, 187-239.<br />
Iivonen, M. (1995). Consistency <strong>in</strong> the selection of search concepts and search terms.<br />
Information Process<strong>in</strong>g & Management, 31(2), 173-190.<br />
Ingwersen, P. (1986a). Cognitive analysis and the role of the <strong>in</strong>termediary <strong>in</strong><br />
<strong>in</strong>formation retrieval. In: Davies, R. (Ed.), Intelligent Information Systems:<br />
Progress and Prospects (pp. 206-237). Chichester: Horwood.<br />
Ingwersen, P. (1992). Information Retrieval Interaction. London: Taylor Graham.<br />
Ingwersen, P. (1994). Systemudvikl<strong>in</strong>g i et <strong>in</strong>-house miljø: Folket<strong>in</strong>gets<br />
emneordssystem som case-studie. Biblioteksarbejde, 41, 5-23.<br />
Ingwersen, P. (1996). Cognitive perspectives of <strong>in</strong>formation retrieval <strong>in</strong>teraction:<br />
elements of a cognitive IR theory. Journal of Documentation, 52(1), 3-50.<br />
Ingwersen, P. (1999). Cognitive <strong>in</strong>formation retrieval. Annual Review of Information<br />
Science and Technology, 34, 3-52.<br />
Ingwersen, P. (2000). Users <strong>in</strong> context. In: Agosti, M., Crestani, F. & Pasi, G. (Eds.),<br />
Lectures on Information Retrieval (pp. 157-178): Spr<strong>in</strong>ger.<br />
Ingwersen, P. & Järvel<strong>in</strong>, K. (2005). The Turn: Integration of Information Seek<strong>in</strong>g and<br />
Retrieval <strong>in</strong> Context. Dordrecht: Spr<strong>in</strong>ger.<br />
Ingwersen, P. & Järvel<strong>in</strong>, K. (2007). On the holistic cognitive theory for <strong>in</strong>formation<br />
retrieval: Drift<strong>in</strong>g outside the cave of the laboratory framework. In: Dom<strong>in</strong>ich,<br />
S. & Kiss, F. (Eds.), International Conference on the Theory of Information<br />
Retrieval (pp. 135-147). Budapest, Hungary: Foundation for Information<br />
Society.<br />
Ingwersen, P. & Wormell, I. (1989). Modern <strong><strong>in</strong>dex<strong>in</strong>g</strong> and retrieval techniques<br />
match<strong>in</strong>g different types of <strong>in</strong>formation needs. In: Koskiala, S. & Launo, R.<br />
(Eds.), Proceed<strong>in</strong>gs of the forty-fourth FID Congress held <strong>in</strong> Hels<strong>in</strong>ki, F<strong>in</strong>land,<br />
28 August-1 September, 1988 (pp. 79-90). Amsterdam: Elsevier.<br />
ISO. (1985). Documentation: Methods for Exam<strong>in</strong><strong>in</strong>g Documents, Determ<strong>in</strong><strong>in</strong>g Their<br />
Subjects and Select<strong>in</strong>g Index<strong>in</strong>g Terms (ISO 5963-1985). Geneva: International<br />
Organization for Standardization.<br />
Israel, G.D. (1992). Determ<strong>in</strong><strong>in</strong>g Sample Size (Fact Sheet PEOD-6). Ga<strong>in</strong>esville, FL:<br />
University of Florida.<br />
Jaeger, P.T. (2003). The endless wire: E-<strong>government</strong> as global phenomenon.<br />
Government Information Quarterly, 20, 323-331.<br />
Jaeger, P.T. & Thompson, K.M. (2004). Social <strong>in</strong>formation behavior and the democratic<br />
process: Information poverty, normative behavior, and electronic <strong>government</strong> <strong>in</strong><br />
the United States. Library & Information Science Research, 26(1), 94-107.<br />
Ja<strong>in</strong>, A.K., Murty, M.N. & Flynn, P.J. (1999). Data cluster<strong>in</strong>g: A review. ACM<br />
Comput<strong>in</strong>g Surveys, 31(3), 264-323.<br />
Jansen, B.J. (2006). Search log analysis: What it is, what's been done, how to do it.<br />
Library & Information Science Research, 28(3), 407-432.<br />
Jansen, B.J. & Pooch, U. (2001). A review of web search<strong>in</strong>g studies and a framework<br />
for future research. Journal of the American Society for Information Science and<br />
Technology, 52(3), 235-246.<br />
Jansen, B.J., Sp<strong>in</strong>k, A. & Saracevic, T. (2000). Real life, real users, and real needs: A<br />
study and analysis of user queries on the web. Information Process<strong>in</strong>g &<br />
Management, 36(2), 207-227.<br />
Johansen, H.C. (2007). Dansk skattehistorie: Indkomstskatter og offentlig vækst 1903-<br />
2005 (Vol. 6): Told- og Skattehistorisk Selskab.<br />
Johnson, J.D., Donohue, W.A., Atk<strong>in</strong>, C.K. & Johnson, S. (1995). A comprehensive<br />
model of <strong>in</strong>formation seek<strong>in</strong>g: Tests focus<strong>in</strong>g on a technical organization.<br />
Science Communication, 16(3), 274-303.<br />
Johnston, J. (2004). Public adm<strong>in</strong>istration: Organizational aspects. In: International<br />
Encyclopedia of the Social & Behavioral Sciences (pp. 12507-12512).<br />
Johnston, J. & Callender, G. (1997). Vulnerable <strong>government</strong>s: Inadvertent de-skill<strong>in</strong>g <strong>in</strong><br />
the new global economic and managerialist paradigm? International Review of<br />
Adm<strong>in</strong>istrative Sciences, 63(1), 41-56.<br />
Jones, W.P. & Furnas, G.W. (1987). Pictures of relevance: A geometric analysis of<br />
similarity measures. Journal of the American Society for Information Science,<br />
38(6), 420-442.<br />
Järvel<strong>in</strong>, K. (2007). An analysis of two approaches <strong>in</strong> <strong>in</strong>formation retrieval: From<br />
frameworks to study designs. Journal of the American Society for Information<br />
Science and Technology, 58(7), 971-986.<br />
Järvel<strong>in</strong>, K. & Kekälä<strong>in</strong>en, J. (2002). Cumulated ga<strong>in</strong>-based evaluation of IR<br />
techniques. ACM Transactions on Information Systems, 20(4), 422-446.<br />
Kavadias, G. & Tambouris, E. (2003). GovML: A markup language for describ<strong>in</strong>g<br />
public services and life events. In: Wimmer, M.A. (Ed.), Knowledge<br />
Management <strong>in</strong> Electronic Government. Proceed<strong>in</strong>gs of the 4th IFIP<br />
International Work<strong>in</strong>g Conference, KMGov 2003, Rhodes, Greece, May 26–28,<br />
2003. Berl<strong>in</strong>: Spr<strong>in</strong>ger.<br />
Kelly, D. (2009). Methods for evaluat<strong>in</strong>g <strong>in</strong>teractive <strong>in</strong>formation retrieval systems with<br />
users. Foundations and Trends <strong>in</strong> Information Retrieval, 3(1-2), 1-224.<br />
Kent, A., Berry, M.M., Luehrs, F.U. & Perry, J.W. (1955). Mach<strong>in</strong>e literature search<strong>in</strong>g<br />
VIII. Operational criteria for design<strong>in</strong>g <strong>in</strong>formation retrieval systems. American<br />
Documentation, 6(2), 93-101.<br />
Kettunen, K. & Henttonen, P. (2010). Miss<strong>in</strong>g <strong>in</strong> action? Content of records<br />
management metadata <strong>in</strong> real life. Library & Information Science Research,<br />
32(1), 43-52.<br />
Kipp, M.E.I. (2005). Complementary or discrete contexts <strong>in</strong> onl<strong>in</strong>e <strong><strong>in</strong>dex<strong>in</strong>g</strong>: A<br />
comparison of user, creator, and <strong>in</strong>termediary keywords. Canadian Journal of<br />
Information and Library Science, 29(4), 419-436.<br />
Klischewski, R. (2006). Ontologies for e-document management <strong>in</strong> public<br />
adm<strong>in</strong>istration. Bus<strong>in</strong>ess Process Management Journal, 12(1), 34-47.<br />
Kopackova, H., Michalek, K. & Cejna, K. (2010). Accessibility and f<strong>in</strong>dability of local<br />
e-<strong>government</strong> websites <strong>in</strong> the Czech Republic. Universal Access In The<br />
Information Society, 9(1), 51-61.<br />
Korfhage, R.R. (1997). Information Storage and Retrieval. New York: Wiley.<br />
Koshman, S., Sp<strong>in</strong>k, A. & Jansen, B.J. (2006). Web search<strong>in</strong>g on the Vivisimo search<br />
eng<strong>in</strong>e. Journal of the American Society for Information Science and<br />
Technology, 57(14), 1875-1887.<br />
Kotsiantis, S.B., Zaharakis, I.D. & P<strong>in</strong>telas, P.E. (2006). Mach<strong>in</strong>e learn<strong>in</strong>g: A review of<br />
classification and comb<strong>in</strong><strong>in</strong>g techniques. Artificial Intelligence Review, 26(3),<br />
159-190.<br />
Kraemer, K.L. & Dedrick, J. (1997). Comput<strong>in</strong>g and Public Organizations. Journal of<br />
Public Adm<strong>in</strong>istration Research and Theory, 7(1), 89-112.<br />
Kraemer, K.L. & K<strong>in</strong>g, J.L. (1986). Comput<strong>in</strong>g and public organizations. Public<br />
Adm<strong>in</strong>istration Review, 46(6), 488-496.<br />
Krippendorff, K. (2004). Content Analysis: An Introduction to Its Methodology (2. ed.).<br />
Thousand Oaks: Sage.<br />
Krueger, R.A. (1998). Develop<strong>in</strong>g Questions for Focus Groups (Vol. 3). Thousand<br />
Oaks: Sage.<br />
Kuhlthau, C.C. & Tama, S.L. (2001). Information search process of lawyers: A call for<br />
'just for me' <strong>in</strong>formation services. Journal of Documentation, 57(1), 25-43.<br />
Kules, B. & Shneiderman, B. (2004). Categorized graphical overviews for web search<br />
results: An exploratory study us<strong>in</strong>g U. S. <strong>government</strong> agencies as a mean<strong>in</strong>gful<br />
and stable structure, Proceed<strong>in</strong>gs of the Third Annual Workshop on HCI<br />
Research <strong>in</strong> MIS. Wash<strong>in</strong>gton, D.C.<br />
Kules, B. & Shneiderman, B. (2005). Us<strong>in</strong>g mean<strong>in</strong>gful and stable categories to support<br />
exploratory web search: Two formative studies (HCIL Technical Report 2005-<br />
31). Maryland: Human-Computer Interaction Laboratory, University of<br />
Maryland.<br />
Kvale, S. & Br<strong>in</strong>kmann, S. (2009). Interviews: Learn<strong>in</strong>g the Craft of Qualitative<br />
Research Interview<strong>in</strong>g (2. ed.). Los Angeles: Sage.<br />
Käki, M. (2005a). Enhanc<strong>in</strong>g Web Search Result Access with <strong>Automatic</strong> Categorization.<br />
Unpublished Doctoral Dissertation, Department of Computer Sciences,<br />
University of Tampere, Tampere, F<strong>in</strong>land, from http://acta.uta.fi/pdf/951-44-6490-7.pdf.<br />
Käki, M. (2005b). F<strong>in</strong>dex: Search result categories help users when document rank<strong>in</strong>g<br />
fails. In: Proceed<strong>in</strong>gs of the SIGCHI conference on Human factors <strong>in</strong><br />
comput<strong>in</strong>g systems, (pp. 131-140). Portland, Oregon: ACM.<br />
Käki, M. & Aula, A. (2005). F<strong>in</strong>dex: Improv<strong>in</strong>g search result use through automatic<br />
filter<strong>in</strong>g categories. Interact<strong>in</strong>g with Computers, 17(2), 187-206.<br />
Lancaster, F.W. (2003). Index<strong>in</strong>g and Abstract<strong>in</strong>g <strong>in</strong> Theory and Practice (3. ed.).<br />
London: Facet.<br />
Landsforen<strong>in</strong>gen af Kommunale Servicecentre, A.o.I. (2005). LKS: Projekt<br />
Borgerbetjen<strong>in</strong>g 2007: Rapport fra arbejdsgruppen om IT.<br />
Large, A., Tedd, L.A. & Hartley, R.J. (2001). Information Seek<strong>in</strong>g <strong>in</strong> the Onl<strong>in</strong>e Age:<br />
Pr<strong>in</strong>ciples and Practice. München: K. G. Saur.<br />
Lau, E.P. & Goh, D.H.-L. (2006). In search of query patterns: A case study of a<br />
university OPAC. Information Process<strong>in</strong>g & Management, 42, 1316-1329.<br />
Layne, K. & Lee, J. (2001). Develop<strong>in</strong>g fully functional E-<strong>government</strong>: A four stage<br />
model. Government Information Quarterly, 18, 122-136.<br />
Leckie, G.J., Pettigrew, K.E. & Sylva<strong>in</strong>, C. (1996). Model<strong>in</strong>g the <strong>in</strong>formation seek<strong>in</strong>g of<br />
professionals: A general model derived from research on eng<strong>in</strong>eers, health care<br />
professionals, and lawyers. Library Quarterly, 66(2), 161-193.<br />
Lev<strong>in</strong>e, M.M. (1974). Information Needs <strong>in</strong> Milwaukee: Agencies and Groups (Ed-089<br />
769). Milwaukee: Milwaukee Urban Observatory.<br />
Levy, P.S. & Lemeshow, S. (2008). Sampl<strong>in</strong>g of Populations: Methods and<br />
Applications (4. ed.). Hoboken, New Jersey: Wiley.<br />
Lips, M. (1998). Reorganiz<strong>in</strong>g public service delivery <strong>in</strong> an <strong>in</strong>formation age. In:<br />
Snellen, I.T.M. & van de Donk, W.B.H.J. (Eds.), Public Adm<strong>in</strong>istration <strong>in</strong> an<br />
Information Age (pp. 325-339). Amsterdam: IOS.<br />
Liu, Y., Zhu, L. & Gorton, I. (2007). Performance Assessment for e-Government<br />
Services: An Experience Report. In: Schmidt, H.W., Crnkovic, I., He<strong>in</strong>eman,<br />
G.T. & Stafford, J.A. (Eds.), Component-Based Software Eng<strong>in</strong>eer<strong>in</strong>g. 10th<br />
International Symposium, CBSE 2007, Medford, MA, USA, July 9-11, 2007 (pp.<br />
74-89). Berl<strong>in</strong>: Spr<strong>in</strong>ger.<br />
Lov<strong>in</strong>s, J.B. (1968). Development of a stemm<strong>in</strong>g algorithm. Mechanical Translation<br />
and Computational L<strong>in</strong>guistics, 11(1-2), 22-31.<br />
Lu, L. & Yuan, Y.C. (2011). Shall I google it or ask the competent villa<strong>in</strong> down the<br />
hall? The moderat<strong>in</strong>g role of <strong>in</strong>formation need <strong>in</strong> <strong>in</strong>formation source selection.<br />
Journal of the American Society for Information Science and Technology, 62(1),<br />
133-145.<br />
Luhn, H.P. (1957). A statistical approach to mechanized encod<strong>in</strong>g and search<strong>in</strong>g of<br />
literary <strong>in</strong>formation. IBM Journal of Research and Development, 1(4), 309-317.<br />
Luhn, H.P. (1958a). The automatic creation of literature abstracts. IBM Journal of<br />
Research and Development, 2(2), 159-165.<br />
Luhn, H.P. (1961). The automatic derivation of <strong>in</strong>formation retrieval encodements from<br />
mach<strong>in</strong>e-readable texts. In: Kent, A. (Ed.), Information Retrieval and Mach<strong>in</strong>e<br />
Translation (Vol. 3, pt. 2, pp. 1021-1028). New York: Interscience.<br />
Lykke, M., Price, S. & Delcambre, L. (2012). How doctors search: A study of query<br />
behaviour and the impact on search results. Information Process<strong>in</strong>g &<br />
Management(0).<br />
MacMull<strong>in</strong>, S.E. & Taylor, R.S. (1984). Problem dimensions and <strong>in</strong>formation traits. The<br />
Information Society, 3(1), 91-111.<br />
Mahler, J.G. & Regan, P.M. (2005). Agency <strong>in</strong>ternets and the chang<strong>in</strong>g dynamics of<br />
congressional oversight. In: Garson, G.D. (Ed.), Handbook of Public<br />
Information Systems (2. ed., pp. 559-568). Boca Raton: Taylor & Francis.<br />
Mai, J.-E. (2004b). The future of general classification. Catalog<strong>in</strong>g & Classification<br />
Quarterly, 37(1 & 2), 3-12.<br />
Mai, J.E. (2000). Deconstruct<strong>in</strong>g the <strong><strong>in</strong>dex<strong>in</strong>g</strong> process. Advances <strong>in</strong> Librarianship, 23,<br />
269-298.<br />
Mai, J.E. (2005). Analysis <strong>in</strong> <strong><strong>in</strong>dex<strong>in</strong>g</strong>: document and doma<strong>in</strong> centered approaches.<br />
Information Process<strong>in</strong>g & Management, 41, 599-611.<br />
Makri, S., Blandford, A. & Cox, A.L. (2008a). Investigat<strong>in</strong>g the <strong>in</strong>formation-seek<strong>in</strong>g<br />
behaviour of academic lawyers: From Ellis' model to design. Information<br />
Process<strong>in</strong>g & Management, 44(2), 613-634.<br />
Mandersloot, W.G.B., Douglas, E.M.B. & Spicer, N. (1970). Thesaurus control: The<br />
selection, group<strong>in</strong>g, and cross-referenc<strong>in</strong>g of terms for <strong>in</strong>clusion <strong>in</strong> a coord<strong>in</strong>ate<br />
<strong>in</strong>dex word list. Journal of the American Society for Information Science, 21(1),<br />
49-57.<br />
Marcella, R., Baxter, G., Davies, S. & Toornstra, D. (2007). The <strong>in</strong>formation needs and<br />
<strong>in</strong>formation-seek<strong>in</strong>g behaviour of the users of the European Parliamentary<br />
Documentation Centre: A customer knowledge study. Journal of<br />
Documentation, 63(6), 920-934.<br />
Marchion<strong>in</strong>i, G., Samet, H. & Brandt, L. (2003). Digital <strong>government</strong>. Communications<br />
of the ACM, 46(1), 25-27.<br />
Mar<strong>in</strong>i, F. (2000). Public adm<strong>in</strong>istration. In: Shafritz, J.M. (Ed.), Def<strong>in</strong><strong>in</strong>g Public<br />
Adm<strong>in</strong>istration: Selections from the International Encyclopedia of Public Policy<br />
and Adm<strong>in</strong>istration (pp. 3-16). Jaipur: Rawat.<br />
Markey, K. (2007a). Twenty-five years of end-user search<strong>in</strong>g, part 1: Research f<strong>in</strong>d<strong>in</strong>gs.<br />
Journal of the American Society for Information Science and Technology, 58(8),<br />
1071-1081.<br />
Mart<strong>in</strong>, B. (2008). Knowledge management. Annual Review of Information Science and<br />
Technology, 42, 371-424.<br />
Mart<strong>in</strong>ez, C., Lucey, J. & L<strong>in</strong>der, E. (1987). An expert system for mach<strong>in</strong>e-aided<br />
<strong><strong>in</strong>dex<strong>in</strong>g</strong>. Journal of Chemical Information and Computer Sciences, 27(4), 158-<br />
162.<br />
Meijer, A.J. & Homburg, V.M.F. (2008). Introduction: Zoom<strong>in</strong>g <strong>in</strong> and zoom<strong>in</strong>g out on<br />
electronic <strong>government</strong>. International Journal of Public Adm<strong>in</strong>istration, 31(7),<br />
707-710.<br />
Miles, M.B. & Huberman, A.M. (1994). Qualitative Data Analysis: An Expanded<br />
Sourcebook (2. ed.). Thousand Oaks: Sage.<br />
Millard, J. (2003). ePublic services <strong>in</strong> Europe: Past, present and future. Research<br />
f<strong>in</strong>d<strong>in</strong>gs and new challenges. Aarhus: Danish Technological Institute.<br />
Milstead, J.L. (1992). Methodologies for subject analysis <strong>in</strong> bibliographic databases.<br />
Information Process<strong>in</strong>g & Management, 28(3), 407-431.<br />
Milstead, J.L. (1994). Needs for research <strong>in</strong> <strong><strong>in</strong>dex<strong>in</strong>g</strong>. Journal of the American Society<br />
for Information Science, 45(8), 577-582.<br />
M<strong>in</strong>istry of F<strong>in</strong>ance. (2001). IT, the Internet, and the Public Sector. Copenhagen:<br />
M<strong>in</strong>istry of F<strong>in</strong>ance.<br />
Moen, W.E. (2001). The metadata approach to access<strong>in</strong>g <strong>government</strong> <strong>in</strong>formation.<br />
Government Information Quarterly, 18(3), 155-165.<br />
Moen, W.E. & McClure, C.R. (1997). An Evaluation of the Federal Government's<br />
Implementation of the Government Information Locator Service (GILS): F<strong>in</strong>al<br />
Report. Wash<strong>in</strong>gton, DC: General Services Adm<strong>in</strong>istration Office of<br />
Information Technology Integration.<br />
Moens, M.-F. (2000). <strong>Automatic</strong> Index<strong>in</strong>g and Abstract<strong>in</strong>g of Document Texts. Boston:<br />
Kluwer.<br />
Morgan, D.L. (1996). Focus groups. Annual Review of Sociology, 22, 129-152.<br />
Mukherjee, R. & Mao, J. (2004). Enterprise search: Tough stuff. Queue, 2(2), 36-46.<br />
Nakash, R.A., Hutton, J.L., Lamb, S.E., Gates, S. & Fisher, J. (2008). Response and<br />
non-response to postal questionnaire follow-up <strong>in</strong> a cl<strong>in</strong>ical trial: A qualitative<br />
study of the patient's perspective. Journal of Evaluation <strong>in</strong> Cl<strong>in</strong>ical Practice, 14,<br />
226-235.<br />
National Archives of Australia (2010). Development history. Retrieved 01-04-2011,<br />
from http://www.agls.gov.au/about/.<br />
National IT and Telecom Agency. (2009). Overordnede Pr<strong>in</strong>cipper og Best Practice:<br />
Version 1.0. Copenhagen: National IT and Telecom Agency.<br />
Neuendorf, K.A. (2002). The Content Analysis Guidebook. Thousand Oaks: Sage.<br />
Nicholas, D. & Colgrave, K. (1996). Councillors and <strong>in</strong>formation: A study of<br />
<strong>in</strong>formation needs and <strong>in</strong>formation provision. Aslib Proceed<strong>in</strong>gs, 48(2), 37-46.<br />
Nielsen, J.A., Kræmmergaard, P., Nielsen, P.A. & Bjørnholt, B. (2009). Det kommunale<br />
digitaliser<strong>in</strong>gslandskab 2009: Status og udfordr<strong>in</strong>ger. <strong>Aalborg</strong>: <strong>Aalborg</strong><br />
University.<br />
Nielsen, M.L. (2001). A framework for work task based thesaurus design. Journal of<br />
Documentation, 57(6), 774-797.<br />
Nielsen, M.L. (2004). Task-based evaluation of associative thesaurus <strong>in</strong> real-life<br />
environment. Proceed<strong>in</strong>gs of the 67th ASIS&T Annual Meet<strong>in</strong>g, 41, 437-447.<br />
Nikoi, S.K. (2008). Information needs of NGOs: A case study of NGO development<br />
workers <strong>in</strong> the northern region of Ghana. Information Development, 24(1), 44-<br />
52.<br />
NISO (2004). Understand<strong>in</strong>g Metadata. Retrieved 23-03, 2011, from<br />
http://www.niso.org/publications/press/Understand<strong>in</strong>gMetadata.pdf.<br />
OECD. (2010). Denmark: Efficient E-<strong>government</strong> For Smarter Public Service Delivery:<br />
Prelim<strong>in</strong>ary Copy. Paris, France: OECD.<br />
Oh, C.H. (1996). Information search<strong>in</strong>g <strong>in</strong> <strong>government</strong>al bureaucracies: An <strong>in</strong>tegrated<br />
model. The American Review of Public Adm<strong>in</strong>istration, 26(1), 41-70.<br />
Olsen, H. (1997). Tal taler ikke uden ord. Politica, 29(3), 295-310.<br />
Orton, R., Marcella, R. & Baxter, G. (2000). An observational study of the <strong>in</strong>formation<br />
seek<strong>in</strong>g behaviour of Members of Parliament <strong>in</strong> the United K<strong>in</strong>gdom. Aslib<br />
Proceed<strong>in</strong>gs, 52(6), 207-217.<br />
Palkovits, S., Woitsch, R. & Karagiannis, D. (2003). Process-based knowledge<br />
management and modell<strong>in</strong>g <strong>in</strong> e-<strong>government</strong>: An <strong>in</strong>evitable comb<strong>in</strong>ation. In:<br />
Wimmer, M.A. (Ed.), KMGov 2003 (pp. 213-218): Spr<strong>in</strong>ger.<br />
Pedersen, B.S., Navarretta, C. & Hansen, D.H. (2005). Ontologibaseret teksthåndter<strong>in</strong>g:<br />
Med sprogteknologi (VID-rapport no. 6). Copenhagen: Center for<br />
Sprogteknologi.<br />
Pedersen, B.S., Navarretta, C. & Henriksen, L. (2004). Build<strong>in</strong>g bus<strong>in</strong>ess ontologies<br />
with language technology techniques: The VID project. In: OntoLex 2004<br />
Proceed<strong>in</strong>gs (pp. 30-35). Paris: European Language Resources Association.<br />
Peel, M. & Rowley, J. (2010). Information shar<strong>in</strong>g practice <strong>in</strong> multi-agency work<strong>in</strong>g.<br />
Aslib Proceed<strong>in</strong>gs, 62(1), 11-28.<br />
Peres, M., Guzmán, F. & Valbuena, T. (2009). Onl<strong>in</strong>e <strong>government</strong> strategy<br />
development model for <strong>in</strong>teractional and transactional phases <strong>in</strong> the territorial<br />
order, The 3rd International Conference on Theory and Practice of Electronic<br />
Governance. Bogota, Colombia: ACM.<br />
Peristeras, V., Tatabanis, K. & Goudos, S.K. (2009). Model-driven eGovernment<br />
<strong>in</strong>teroperability: A review of the state of the art. Computer Standards &<br />
Interfaces, 31(4), 316-328.<br />
Personalestyrelsen (2010). Forhandl<strong>in</strong>gsdatabasen. Retrieved 26-01, 2010, from<br />
http://perst.dk/Arbejdspladsen/Ledelses<strong>in</strong>formation%20og%20statistik/Ledelsesi<br />
nformation%20og%20lonstyr<strong>in</strong>g/Forhandl<strong>in</strong>gsdatabasen.aspx.<br />
Philipson, K.B. (2008). Indekser<strong>in</strong>gsprocessen: Konsistensmål til sammenlign<strong>in</strong>g af<br />
tilgange til emnebestemmelse og emnebeskrivelse. Dansk Biblioteksforskn<strong>in</strong>g,<br />
4(3), 57-71.<br />
Poland, B.D. (2003). Transcription quality. In: Holste<strong>in</strong>, J.A. & Gubrium, J.F. (Eds.),<br />
Inside Interview<strong>in</strong>g: New Lenses, New Concerns (pp. 267-287). Thousand Oaks:<br />
Sage.<br />
Porter, M.F. (1980). An algorithm for suffix stripp<strong>in</strong>g. Program: Electronic Library and<br />
Information Systems, 14(3), 130-137.<br />
Porter, M.F. (2001). Snowball: A language for stemm<strong>in</strong>g algorithms. Retrieved 19-08,<br />
2011, from http://snowball.tartarus.org/texts/<strong>in</strong>troduction.html.<br />
Price, S.L., Nielsen, M.L., Delcambre, L.M.L. & Vedsted, P. (2007). Semantic<br />
components enhance retrieval of doma<strong>in</strong>-specific documents. In: CIKM '07:<br />
Proceed<strong>in</strong>gs of the sixteenth ACM conference on Conference on <strong>in</strong>formation and<br />
knowledge management (pp. 429-438). New York: ACM.<br />
Price, S.L., Nielsen, M.L., Delcambre, L.M.L., Vedsted, P. & Ste<strong>in</strong>hauer, J. (2009).<br />
Us<strong>in</strong>g semantic components to search for doma<strong>in</strong>-specific documents: An<br />
evaluation from the system perspective and the user perspective. Information<br />
Systems, 34, 724-752.<br />
Project Digital Government & The Digital Taskforce (2002). Towards E-Government:<br />
Vision and Strategy for the Public Sector <strong>in</strong> Denmark. Retrieved 13-07, 2010,<br />
from http://www.epractice.eu/files/media/media_362.pdf.<br />
Quam, E. (2001). Inform<strong>in</strong>g and evaluat<strong>in</strong>g a metadata <strong>in</strong>itiative: Usability and<br />
metadata studies <strong>in</strong> M<strong>in</strong>nesota's Foundations Project. Government Information<br />
Quarterly, 18(3), 181-194.<br />
Quirchmayr, G. & Traunmüller, R. (1991). Expert systems <strong>in</strong> law and public<br />
adm<strong>in</strong>istration: Recent developments and future prospects. In: Traunmüller, R.<br />
(Ed.), Governmental and Municipal Information Systems, II: Proceed<strong>in</strong>gs of the<br />
2nd IFIP TC8/WG8.5 Work<strong>in</strong>g Conference on Governmental and Municipal<br />
Information Systems, Balatonfüred, Hungary, 3-6 June (pp. 145-163).<br />
Amsterdam: Elsevier.<br />
Rafferty, P. & Hidderley, R. (2007). Flickr and Democratic Index<strong>in</strong>g: Dialogic<br />
approaches to <strong><strong>in</strong>dex<strong>in</strong>g</strong>. Aslib Proceed<strong>in</strong>gs, 59(4/5), 397-410.
Ra<strong>in</strong>s, S.A. (2008). Health at high speed: Broadband <strong>in</strong>ternet access, health<br />
communication, and the digital divide. Communication Research, 35(3), 283-<br />
297.<br />
Rasmussen, E. (1992). Cluster<strong>in</strong>g algorithms. In: Frakes, W.B. & Baeza-Yates, R.<br />
(Eds.), Information Retrieval: Data Structures & Algorithms (pp. 419-442).<br />
Englewood Cliffs, New Jersey: Prentice Hall.<br />
Rasmussen, E.M. (2003). Index<strong>in</strong>g and retrieval for the Web. Annual Review of<br />
Information Science and Technology, 37, 91-124.<br />
Reddick, C.G. (2005). Citizen <strong>in</strong>teraction with e-<strong>government</strong>: From the streets to<br />
servers? Government Information Quarterly, 22(1), 38-57.<br />
Ren, W.-H. (1999). Self-efficacy and the search for <strong>government</strong> <strong>in</strong>formation. Reference<br />
& User Services Quarterly, 38(3), 283-291.<br />
Robb<strong>in</strong>, A., Courtright, C. & Davis, L. (2004). ICTs and political life. Annual Review of<br />
Information Science and Technology, 38(1), 411-482.<br />
Robertson, S.E. & Hancock-Beaulieu, M.M. (1992). On the evaluation of IR systems.<br />
Information Process<strong>in</strong>g & Management, 28(4), 457-466.<br />
Roitblat, H.L., Kershaw, A. & Oot, P. (2010). Document categorization <strong>in</strong> legal<br />
electronic discovery: Computer classification vs. manual review. Journal of the<br />
American Society for Information Science and Technology, 61(1), 70-80.<br />
Roll<strong>in</strong>g, L. (1981). Index<strong>in</strong>g consistency, quality and efficiency. Information<br />
Process<strong>in</strong>g & Management, 17(2), 69-76.<br />
Rouse, W.B. & Rouse, S.H. (1984). Human <strong>in</strong>formation seek<strong>in</strong>g and design of<br />
<strong>in</strong>formation systems. Information Process<strong>in</strong>g & Management, 20(1-2), 129-138.<br />
Rowley, J. (1988). Abstract<strong>in</strong>g and Index<strong>in</strong>g (2. ed.). London: Clive B<strong>in</strong>gley.<br />
Rowley, J. (1994). The controlled versus natural <strong><strong>in</strong>dex<strong>in</strong>g</strong> language debate revisited: A<br />
perspective on <strong>in</strong>formation retrieval practice and research. Journal of<br />
Information Science, 20(2), 108-118.<br />
Rowley, J. (2011). e-Government stakeholders: Who are they and what do they want?<br />
International Journal of Information Management, 31(1), 53-62.<br />
Rowley, J. & Hartley, R. (2008). Organiz<strong>in</strong>g Knowledge: An Introduction to Manag<strong>in</strong>g<br />
Access to Information (4. ed.). Hampshire: Ashgate.<br />
Rub<strong>in</strong>, H.J. & Rub<strong>in</strong>, I.S. (2005). Qualitative Interview<strong>in</strong>g: The Art of Hear<strong>in</strong>g Data (2.<br />
ed.). Thousand Oaks: Sage.<br />
Sabucedo, L.Á. & Rifón, L.A. (2006). Semantic Service Oriented Architectures for<br />
eGovernment Platforms. Retrieved 08-01-2010.<br />
Salton, G. (1970). <strong>Automatic</strong> text analysis. Science, 168(3929), 335-343.<br />
Salton, G. (1986a). Another look at automatic text-retrieval systems. Communications<br />
of the ACM, 29(7), 648-656.<br />
Salton, G. (1988). <strong>Automatic</strong> <strong><strong>in</strong>dex<strong>in</strong>g</strong> and abstract<strong>in</strong>g. In: Willett, P. (Ed.), Document<br />
Retrieval Systems (pp. 42-80). London: Taylor Graham.<br />
Salton, G. (1989). <strong>Automatic</strong> Text Process<strong>in</strong>g: The Transformation, Analysis, and<br />
Retrieval of Information by Computer. Read<strong>in</strong>g, Massachusetts: Addison-<br />
Wesley.<br />
Salton, G. (1991). Developments <strong>in</strong> automatic text retrieval. Science, 253(5023), 974-<br />
980.<br />
Salton, G. & Buckley, C. (1988). Term-weight<strong>in</strong>g approaches <strong>in</strong> automatic text<br />
retrieval. Information Process<strong>in</strong>g & Management, 24(5), 513-523.<br />
Salton, G. & McGill, M.J. (1983). Introduction to Modern Information Retrieval. New<br />
York: McGraw-Hill.<br />
Salton, G., Wong, A. & Yang, C.S. (1975). A vector space model for automatic<br />
<strong><strong>in</strong>dex<strong>in</strong>g</strong>. Communications of the ACM, 18(11), 613-620.<br />
Salton, G., Yang, C.S. & Yu, C.T. (1975). A theory of term importance <strong>in</strong> automatic<br />
text analysis. Journal of the American Society for Information Science, 26(1),<br />
33-44.<br />
Saracevic, T. (1996). Relevance reconsidered '96. In: Ingwersen, P. & Pors, N.O. (Eds.),<br />
Colis 2 - Second International Conference On Conceptions Of Library And<br />
Information Science: Integration In Perspective, Proceed<strong>in</strong>gs (pp. 201-218).<br />
Copenhagen S: Royal School of Librarianship.<br />
Saracevic, T., Kantor, P., Chamis, A. & Trivison, D. (1987). Experiments on the<br />
Cognitive Aspects of Information Seek<strong>in</strong>g and Information Retriev<strong>in</strong>g. F<strong>in</strong>al<br />
Report and Appendices. Wash<strong>in</strong>gton, D.C.: National Science Foundation, Div.<br />
of Information Science and Technology.<br />
Savola<strong>in</strong>en, R. (1995). Everyday life <strong>in</strong>formation seek<strong>in</strong>g: Approach<strong>in</strong>g <strong>in</strong>formation<br />
seek<strong>in</strong>g <strong>in</strong> the context of "way of life". Library & Information Science<br />
Research, 17(3), 259-294.<br />
Savola<strong>in</strong>en, R. (2006). Time as a context of <strong>in</strong>formation seek<strong>in</strong>g. Library & Information<br />
Science Research, 28(1), 110-127.<br />
Savoy, J. (2005). Bibliographic database access us<strong>in</strong>g free-text and controlled<br />
vocabulary: An evaluation. Information Process<strong>in</strong>g & Management, 41(4), 873-<br />
890.<br />
Saxena, K.B.C. & Aly, A.M.M. (1995). Information technology support for<br />
reeng<strong>in</strong>eer<strong>in</strong>g public adm<strong>in</strong>istration: A conceptual framework. International<br />
Journal of Information Management, 15(4), 271-293.<br />
Schamber, L., Eisenberg, M.B. & Nilan, M.S. (1990). A re-exam<strong>in</strong>ation of relevance:<br />
Toward a dynamic, situational def<strong>in</strong>ition. Information Process<strong>in</strong>g &<br />
Management, 26(6), 755-776.<br />
Schellong, A. (2007). Cross<strong>in</strong>g the boundary: Why putt<strong>in</strong>g the e <strong>in</strong> <strong>government</strong> is the<br />
easy part. In: PNG Work<strong>in</strong>g Paper Series, PNG07-002. Retrieved 18-01, 2010,<br />
from<br />
http://www.hks.harvard.edu/netgov/files/png_work<strong>in</strong>gpaper_series/PNG07-<br />
002_Work<strong>in</strong>gPaper_cross<strong>in</strong>g_the_boundary_schellong.pdf.<br />
Schultz, C.K. (1970). Cost-effectiveness as a guide <strong>in</strong> develop<strong>in</strong>g <strong><strong>in</strong>dex<strong>in</strong>g</strong> rules.<br />
Information Storage and Retrieval, 6(4), 335-340.<br />
Schwartz, D.G., Divit<strong>in</strong>i, M. & Brasethvik, T. (2000). Internet-Based Organizational<br />
Memory and Knowledge Management. Hershey, USA: Idea Group.<br />
Sebastiani, F. (1999). A tutorial on automated text categorisation. In: Amandi, A. &<br />
Zun<strong>in</strong>o, A. (Eds.), Proceed<strong>in</strong>gs of the 1st Argent<strong>in</strong>ian Symposium on Artificial<br />
Intelligence (ASAI'99) (pp. 7-35). Buenos Aires, AR.<br />
Sebastiani, F. (2002). Mach<strong>in</strong>e learn<strong>in</strong>g <strong>in</strong> automated text categorization. ACM<br />
Comput<strong>in</strong>g Surveys, 34(1), 1-47.<br />
Serola, S. (2006). City planners' <strong>in</strong>formation seek<strong>in</strong>g behavior: Information channels<br />
used and <strong>in</strong>formation types needed <strong>in</strong> vary<strong>in</strong>g types of perceived work tasks. In:<br />
Ruthven, I. (Ed.), IIiX: Proceed<strong>in</strong>gs of the 1st International Conference on<br />
Information Interaction <strong>in</strong> Context (pp. 42-45). New York: ACM.<br />
Shropshire, K.O., Hawdon, J.E. & Witte, J.C. (2009). Web survey design: Balanc<strong>in</strong>g<br />
measurement, response, and topical <strong>in</strong>terest. Sociological Methods & Research,<br />
37(3), 344-370.<br />
Siegel, S. & Castellan, N.J. (1988). Nonparametric Statistics for the Behavioral<br />
Sciences (2. ed.). New York: McGraw Hill.<br />
Silvester, J.P., Genuardi, M.T. & Kl<strong>in</strong>gbiel, P.H. (1994). Mach<strong>in</strong>e-aided <strong><strong>in</strong>dex<strong>in</strong>g</strong> at<br />
NASA. Information Process<strong>in</strong>g & Management, 30, 631-645.<br />
Silvester, J.P. & Kl<strong>in</strong>gbiel, P.H. (1993). An operational system for subject switch<strong>in</strong>g<br />
between controlled vocabularies. Information Process<strong>in</strong>g & Management, 29(1),<br />
47-59.
S<strong>in</strong>ghal, A., Salton, G., Mitra, M. & Buckley, C. (1996). Document length<br />
normalization. Information Process<strong>in</strong>g & Management, 32(5), 619-633.<br />
SKAT (2009). Årsrapport 2008. Retrieved 04-02, 2010, from<br />
http://www.skat.dk/SKAT.aspx?oId=1809360&vId=0.<br />
SKAT (2010). About us. Retrieved 26-01, 2010, from<br />
http://www.skat.dk/SKAT.aspx?oId=1826783&vId=0.<br />
Skov, M. (2009). The Re<strong>in</strong>vented Museum: Explor<strong>in</strong>g Information Seek<strong>in</strong>g Behaviour <strong>in</strong><br />
a Digital Museum Context. Unpublished Doctoral dissertation, Research<br />
Programme Information Interaction and Information Architecture, Royal School<br />
of Library and Information Science, Copenhagen.<br />
Snellen, I.T.M. (2002). Electronic governance: Implications for citizens, politicians and<br />
public servants. International Review of Adm<strong>in</strong>istrative Sciences, 68(2), 183-<br />
198.<br />
Soergel, D. (1985). Organiz<strong>in</strong>g Information: Pr<strong>in</strong>ciples of Data Base and Retrieval<br />
Systems. San Diego, CA: Academic Press.<br />
Soergel, D. (1994). Index<strong>in</strong>g and retrieval performance: The logical evidence. Journal<br />
of the American Society for Information Science, 45(8), 589-599.<br />
Soergel, D. (1999). The rise of ontologies or the re<strong>in</strong>vention of classification. Journal of<br />
the American Society for Information Science, 50(12), 1119-1120.<br />
Solomon, P. (1997a). Discover<strong>in</strong>g <strong>in</strong>formation behavior <strong>in</strong> sense mak<strong>in</strong>g.1. Time and<br />
tim<strong>in</strong>g. Journal of the American Society for Information Science, 48(12), 1097-<br />
1108.<br />
Solomon, P. (1997b). Discover<strong>in</strong>g <strong>in</strong>formation behavior <strong>in</strong> sense mak<strong>in</strong>g.2. The social.<br />
Journal of the American Society for Information Science, 48(12), 1109-1126.<br />
Solomon, P. (1997c). Discover<strong>in</strong>g <strong>in</strong>formation behavior <strong>in</strong> sense mak<strong>in</strong>g.3. The person.<br />
Journal of the American Society for Information Science, 48(12), 1127-1138.<br />
Sormunen, E. (2002). Liberal relevance criteria of TREC - count<strong>in</strong>g on negligible<br />
documents? In: SIGIR '02: Proceed<strong>in</strong>gs of the 25th annual <strong>in</strong>ternational ACM<br />
SIGIR conference on Research and development <strong>in</strong> <strong>in</strong>formation retrieval, (pp.<br />
324-330). August 11-15, 2002, Tampere, F<strong>in</strong>land: ACM.<br />
Southon, F.C.G., Todd, R.J. & Seneque, M. (2002). Knowledge management <strong>in</strong> three<br />
organizations: An exploratory study. Journal of the American Society for<br />
Information Science and Technology, 53(12), 1047-1059.<br />
Sparck Jones, K. (1973). Index term weight<strong>in</strong>g. Information Storage and Retrieval,<br />
9(11), 619-633.<br />
Sparck Jones, K. (1981). The Cranfield tests. In: Sparck Jones, K. (Ed.), Information<br />
Retrieval Experiment (pp. 256-284). London: Butterworths.<br />
Sprehe, J.T., McClure, C.R. & Zellner, P. (2002). The role of situational factors <strong>in</strong><br />
manag<strong>in</strong>g U.S. federal recordkeep<strong>in</strong>g. Government Information Quarterly,<br />
19(3), 289-305.<br />
Ste<strong>in</strong>mark, C. (2005). EDM <strong>in</strong> the Danish public sector: The FESD project. Aslib<br />
Proceed<strong>in</strong>gs, 57(4), 369-377.<br />
Stenmark, D. (2005). How Intranets differ from the Web: Organisational culture's effect<br />
on technology. In: Bartmann, D., Rajola, F., Kall<strong>in</strong>ikos, J., Avison, D.E.,<br />
W<strong>in</strong>ter, R., E<strong>in</strong>-Dor, P., Becker, J., Bodendorf, F. & We<strong>in</strong>hardt, C. (Eds.),<br />
European Conference on Information Systems ECIS 05.<br />
Stewart, D.W., Shamdasani, P.N. & Rook, D.W. (2007). Focus Groups: Theory and<br />
Practice (2. ed.). Thousand Oaks: Sage.<br />
Strader, C.R. (2009). Author-assigned keywords versus Library of Congress Subject<br />
Head<strong>in</strong>gs: Implications for the catalog<strong>in</strong>g of electronic theses and dissertations.<br />
Library Resources & Technical Services, 53(4), 243-250.<br />
Strzalkowski, T., L<strong>in</strong>, F., Wang, J. & Perez-Carballo, J. (1999). Evaluat<strong>in</strong>g natural<br />
language process<strong>in</strong>g techniques <strong>in</strong> <strong>in</strong>formation retrieval. In: Strzalkowski, T.<br />
(Ed.), Natural Language Information Retrieval (pp. 113-145). Dordrecht:<br />
Kluwer.<br />
Suomela, S. & Kekälä<strong>in</strong>en, J. (2005). Ontology as a search-tool: A study of real users'<br />
query formulation with and without conceptual support. In: Losada, D.E. &<br />
Fernandez-Luna (Eds.), ECIR proceed<strong>in</strong>gs 2005 (pp. 315-329): Spr<strong>in</strong>ger.<br />
Suomela, S. & Kekälä<strong>in</strong>en, J. (2006). User evaluation of ontology as query construction<br />
tool. Information Retrieval, 9, 455-475.<br />
Svenonius, E. (1986). Unanswered questions <strong>in</strong> the design of controlled vocabularies.<br />
Journal of the American Society for Information Science, 37(5), 331-341.<br />
Talja, S., Tuom<strong>in</strong>en, K. & Savola<strong>in</strong>en, R. (2005). "Isms" <strong>in</strong> <strong>in</strong>formation science:<br />
Constructivism, collectivism and constructionism. Journal of Documentation,<br />
61(1), 79-101.<br />
Tambouris, E., Manouselis, N. & Costopoulou, C. (2007). Metadata for digital<br />
collections of e-<strong>government</strong> resources. The Electronic Library, 25(2), 176-192.<br />
Taylor, R.S. (1968). Question-negotiation and <strong>in</strong>formation seek<strong>in</strong>g <strong>in</strong> libraries. College<br />
& Research Libraries, 29(3), 178-194.<br />
Taylor, R.S. (1991). Information use environments. In: Derv<strong>in</strong>, B. & Voigt, M.J. (Eds.),<br />
Progress <strong>in</strong> Communication Sciences (Vol. 10, pp. 217-255). Norwood, NJ:<br />
Ablex.<br />
Tenopir, C. (1985). Full text database retrieval performance. Onl<strong>in</strong>e Information<br />
Review, 9(2), 149-164.<br />
The Danish Government, Local Government Denmark & Danish Regions. (2010).<br />
Mandate: New Common Public Strategy for Digitalization 2011-2015: Local<br />
Government Denmark.<br />
The Danish Government, Local Government Denmark, Danish Regions, Copenhagen<br />
Municipality & Frederiksberg Municipality (2004). The Danish eGovernment<br />
Strategy 2004-2006: Realis<strong>in</strong>g the Potential. Retrieved 13-07, 2010, from<br />
http://www.epractice.eu/files/media/media_275.pdf.<br />
The Danish Government, Local Government Denmark (LGDK) & Danish Regions<br />
(2007). The Danish E-Government Strategy 2007-2010: Towards Better Digital<br />
Service, Increased Efficiency and Stronger Collaboration. Retrieved from<br />
http://www.moderniser<strong>in</strong>g.dk/fileadm<strong>in</strong>/user_upload/documents/Projekter/digita<br />
liser<strong>in</strong>gsstrategi/Danish_E-<strong>government</strong>_strategy_2007-2010.pdf.<br />
Thomas, M., Caudle, D.M. & Schmitz, C.M. (2009). To tag or not to tag? Library Hi<br />
Tech, 27(3), 411-434.<br />
Trant, J. (2009). Study<strong>in</strong>g social tagg<strong>in</strong>g and folksonomy: A review and framework.<br />
Journal of Digital Information, 10(1).<br />
Turp<strong>in</strong>, A., Scholer, F., Järvel<strong>in</strong>, K., Wu, M. & Culpepper, J.S. (2009). Includ<strong>in</strong>g<br />
summaries <strong>in</strong> system evaluation. In: Allan, J. (Ed.), Proceed<strong>in</strong>gs of the 32nd<br />
<strong>in</strong>ternational ACM SIGIR conference on Research and development <strong>in</strong><br />
<strong>in</strong>formation retrieval. New York: ACM.<br />
United Nations. (2012). E-Government Survey 2012: E-Government for the People.<br />
New York: United Nations.<br />
Vakkari, P. (1999). Task complexity, problem structure and <strong>in</strong>formation actions:<br />
Integrat<strong>in</strong>g studies on <strong>in</strong>formation seek<strong>in</strong>g and retrieval. Information Process<strong>in</strong>g<br />
& Management, 35, 819-837.<br />
Vakkari, P. (2003). Task-based <strong>in</strong>formation search<strong>in</strong>g. Annual Review of Information<br />
Science and Technology, 37, 413-464.<br />
van de Donk, W.B.H.J. & Snellen, I.T.M. (1989). Knowledge-based systems <strong>in</strong> public<br />
adm<strong>in</strong>istration: Evolv<strong>in</strong>g practices and norms. In: Snellen, I.T.M., van de Donk,<br />
W.B.H.J. & Baquiast, J.-P. (Eds.), Expert Systems <strong>in</strong> Public Adm<strong>in</strong>istration:<br />
Evolv<strong>in</strong>g Practices and Norms (pp. 3-22). Amsterdam: Elsevier.
van Deursen, A. & van Dijk, J. (2010). Civil servants' <strong>in</strong>ternet skills: Are they ready for<br />
e-<strong>government</strong>? In: Wimmer, M.A., Chappelet, J.-L., Janssen, M. & Scholl, H.J.<br />
(Eds.), Electronic Government. 9th IFIP WG 8.5 International Conference,<br />
EGOV 2010, Lausanne, Switzerland, August 29 - September 2, 2010.<br />
Proceed<strong>in</strong>gs (pp. 132-143). Berl<strong>in</strong>: Spr<strong>in</strong>ger.<br />
Veal, D.C. (2001). Techniques of document management: A review of text retrieval and<br />
related technologies. Journal of Documentation, 57(2), 192-217.<br />
Veenema, F. (1996). To <strong>in</strong>dex or not to <strong>in</strong>dex. Canadian Journal of Information and<br />
Library Science, 21(2), 1-22.<br />
Vellucci, S.L. (1998). Metadata. Annual Review of Information Science and<br />
Technology, 33, 187-222.<br />
Voorhees, E. & Pazienza, M. (1999). Natural language process<strong>in</strong>g and <strong>in</strong>formation<br />
retrieval. In: Lecture Notes <strong>in</strong> Computer Science: Information Extraction (Vol.<br />
1714, pp. 32-48). Berl<strong>in</strong>: Spr<strong>in</strong>ger.<br />
Voorhees, E.M. (2000). Variations <strong>in</strong> relevance judgments and the measurement of<br />
retrieval effectiveness. Information Process<strong>in</strong>g & Management, 36, 697-716.<br />
Wacholder, N., Kelly, D., Kantor, P., Rittman, R., Sun, Y., Bai, B., Small, S., Yamrom,<br />
B. & Strzalkowski, T. (2007). A model for quantitative evaluation of an end-to-end<br />
question-answer<strong>in</strong>g system. Journal of the American Society for Information<br />
Science and Technology, 58(8), 1082-1099.<br />
Walden, G.R. (2006). Focus group <strong>in</strong>terview<strong>in</strong>g <strong>in</strong> the library literature: A selective<br />
annotated bibliography 1996-2005. Reference Services Review, 34(2), 222-241.<br />
Wang, P. (1999). Methodologies and methods for user behavioral research. Annual<br />
Review of Information Science and Technology, 34, 53-99.<br />
Wang, Y.-S. & Shih, Y.-W. (2009). Why do people use <strong>in</strong>formation kiosks? A<br />
validation of the unified theory of acceptance and use of technology.<br />
Government Information Quarterly, 26(1), 158-165.<br />
Weibel, S. (1997). The Dubl<strong>in</strong> Core: A simple content description model for electronic<br />
resources. Bullet<strong>in</strong> of the American Society for Information Science, 24(1), 9-11.<br />
White, M. (2005). The Content Management Handbook. London: Facet.<br />
Wilbur, W.J. & Sirotk<strong>in</strong>, K. (1992). The automatic identification of stop words. Journal<br />
of Information Science, 18(1), 45-55.<br />
Willett, P. (2006). The Porter stemm<strong>in</strong>g algorithm: Then and now. Program: Electronic<br />
Library and Information Systems, 40(3), 219-223.<br />
Wilson, T.D. (1980). Information system design implications of research <strong>in</strong>to the<br />
<strong>in</strong>formation behaviour of social workers and social adm<strong>in</strong>istrators. In: Harbo,<br />
O. & Kajberg, L. (Eds.), Theory and application of <strong>in</strong>formation research: Proceed<strong>in</strong>gs of<br />
the Second International Research Forum on Information Science, 3-6 August,<br />
1977, (pp. 198-213). Royal School of Librarianship, Copenhagen: London, UK:<br />
Mansell.<br />
Wilson, T.D. (1981). On user studies and <strong>in</strong>formation needs. Journal of Documentation,<br />
37(1), 3-15.<br />
Wilson, T.D. (1999). Models <strong>in</strong> <strong>in</strong>formation behaviour research. Journal of<br />
Documentation, 55(3), 249-270.<br />
Wilson, T.D. & Streatfield, D.R. (1977). Information needs <strong>in</strong> local authority social<br />
services departments: an <strong>in</strong>terim report on project INISS. Journal of<br />
Documentation, 33(4), 277-293.<br />
Wimmer, M.A. (2007). eGovernment as a multidiscipl<strong>in</strong>ary research field. In:<br />
Codagnone, C. & Wimmer, M.A. (Eds.), Roadmapp<strong>in</strong>g eGovernment Research:<br />
Visions and Measures towards Innovative Governments <strong>in</strong> 2020 (pp. 12-14).<br />
[Koblenz]: eGovRTD2020 Project Consortium.<br />
Woudstra, L. & van den Hooff, B. (2008). Inside the source selection process: Selection<br />
criteria for human <strong>in</strong>formation sources. Information Process<strong>in</strong>g & Management,<br />
44(3), 1267-1278.<br />
Xu, Y.C., Tan, C.Y.B. & Yang, L. (2006). Who will you ask? An empirical study of<br />
<strong>in</strong>terpersonal task <strong>in</strong>formation seek<strong>in</strong>g. Journal of the American Society for<br />
Information Science and Technology, 57(12), 1666-1677.<br />
Yang, D., Tong, L., Ye, Y. & Wu, H. (2006). Support<strong>in</strong>g effective operation of e-<strong>government</strong>al<br />
services through workflow and knowledge management. In:<br />
Aberer, K., Peng, Z., Rundenste<strong>in</strong>er, E.A., Zhang, Y. & Li, X., (Eds.), Web<br />
Information Systems: WISE 2006, 7th International Conference on Web<br />
Information Systems Eng<strong>in</strong>eer<strong>in</strong>g, Wuhan, Ch<strong>in</strong>a, October 23-26, (pp. 102-113).<br />
Berl<strong>in</strong>: Spr<strong>in</strong>ger.<br />
Yildiz, M. (2007). E-<strong>government</strong> research: Review<strong>in</strong>g the literature, limitations, and<br />
ways forward. Government Information Quarterly, 24, 646-665.<br />
Zamir, O. & Etzioni, O. (1999). Grouper: A dynamic cluster<strong>in</strong>g <strong>in</strong>terface to Web search<br />
results. Computer Networks, 31(11-16), 1361-1374.<br />
Zeng, M.L. (2008). Knowledge Organization Systems (KOS). Knowledge Organization,<br />
35(2/3), 160-182.<br />
Zikmund, W.G. (2000). Bus<strong>in</strong>ess Research Methods (6. ed.). Fort Worth: Harcourt.<br />
Zipf, G.K. (1949). Human Behavior and the Pr<strong>in</strong>ciple of Least Effort. Cambridge:<br />
Addison-Wesley.<br />
Zunde, P. & Dexter, M.E. (1969). Index<strong>in</strong>g consistency and quality. American<br />
Documentation, 20(3), 259-267.<br />
Østergaard, M. & Olesen, J.D. (2004). Digital forkalkn<strong>in</strong>g: En debatbog om digital<br />
forvaltn<strong>in</strong>g i Danmark. Frederikshavn: Dafolo.<br />
Åström, F. (2007). Changes <strong>in</strong> the LIS research front: Time-sliced cocitation analyses of<br />
LIS journal articles, 1990-2004. Journal of the American Society for Information<br />
Science and Technology, 58(7), 947-957.
List of abbreviations<br />
ARIST Annual Review of Information Science and Technology<br />
ICT Information and Communication Technology<br />
IDF Inverse document frequency<br />
IIR Interactive Information Retrieval<br />
IR Information Retrieval<br />
KOS Knowledge Organizing Systems<br />
LCSH Library of Congress Subject Headings<br />
LIS Library and Information Science<br />
MAI Machine aided/assisted indexing<br />
RSLIS Royal School of Library and Information Science<br />
TF Term frequency
Appendices<br />
List of abbreviations<br />
Appendices<br />
Appendix 1: Generic work tasks at SKAT<br />
Appendix 2: Distribution of employees across main processes in the business model<br />
Appendix 3: E-mail invitation to employees<br />
Appendix 4: Questions contained in questionnaire<br />
Appendix 5: Questionnaire pilot test data<br />
Appendix 6: Link to questionnaire<br />
Appendix 7: Dates for the conduct of focus group interviews<br />
Appendix 8: Example of the slides guiding a focus group interview<br />
Appendix 9: Focus group interview guide<br />
Appendix 10: Transcription conventions<br />
Appendix 11: Verbatim Danish versions of quotes used in the thesis<br />
Appendix 12: E-mail invitation to participate in search test<br />
Appendix 13: Questionnaire for recruiting test persons for the search test<br />
Appendix 14: Simulated search tasks<br />
Appendix 15: Test persons’ insight into simulated search tasks<br />
Appendix 16: E-mail concerning naturalistic information needs<br />
Appendix 17: Instructions for search test persons<br />
Appendix 18: Rotation of search tasks<br />
Appendix 19: Search test interview guide<br />
Appendix 20: Judgement of the relevance of retrieved documents in search test<br />
Appendix 21: Completeness degree of questionnaire responses<br />
Appendix 22: Respondents’ experience with work tasks<br />
Appendix 23: Age distribution of population, respondents and test persons<br />
Appendix 24: Respondents’ length of service in the organization<br />
Appendix 25: Focus group participants work tasks<br />
Appendix 26: Additional sources mentioned by respondents<br />
Appendix 27: Test persons’ background data<br />
Appendix 28: Supplementary search test tables
Appendix 1: Generic work tasks at SKAT<br />
This appendix summarizes and explains the content of the 19 generic work tasks<br />
constituting the version of the business model that formed the basis of the survey<br />
questionnaire.
Main process | Work task | Description<br />
Instruction | Common | Answering requests, whether written, in person, or by phone. Marketing, guidance, and outgoing service.<br />
Settlement | Common | Handling payments, settlements, certifications, access to records or registration, expenditures, reimbursements or dealing with complaints.<br />
Settlement | Preliminary assessment of income/personal taxes | Preliminary income assessments, annual tax statements and returns of personal taxes, family allowance, gift taxes, taxation of estate of deceased persons, undivided estate, and settlements regarding personal taxes.<br />
Settlement | Business relations | Handling and making decisions about businesses regarding VAT settlements, excise duties, retirement benefits, taxes on labor costs, A tax, and differences of income.<br />
Settlement | Corporation taxes | Preliminary income assessments, annual tax statements, dealing with applications and making decisions – all regarding foundations, associations, and companies.<br />
Settlement | Customs | Registration of imports and exports, customs procedures for private persons and companies, making decisions about areas of customs and dealing with applications for customs licenses and customs permissions.<br />
Settlement | Vehicles | Expedition of vehicles and license plates, handling procedures concerning duty exemption, assessments, and monthly specifications.<br />
Settlement | Estate | Assessments (depreciations) of estate, handling assessment communications, taxation on the basis of the law of assessed valuation, recalculation of taxes, and registration of property.<br />
Inspection | Common | Handling criminal cases and cases of liability, including the right to operate, divided estates, and inspections.<br />
Inspection | Customs | Inspection of customs (goods and means of transportation) towards citizens and businesses.<br />
Collection | Common | Collection tasks, including enforced payments, administration of estates, and handling complaints about collection.<br />
Processes of support | Legal support | Dissemination of rules, instructions, information, and interpretation of practice, rules, and laws.<br />
Processes of support | Secretary service | Preparation of draft ministerial replies to the parliament and citizens, of memos and analyses, and submission of hearing statements for legislation and ministerial responses for the Fiscal Affairs Committee.<br />
Processes of support | IT service and administration | Maintaining processes to ensure and document the best IT support of SKAT's processes through appointments and contact with the business. System ownership, platform ownership, and change management.<br />
Processes of support | HR and education | Administration to ensure proper staff conditions and the treatment of employees according to current rules. Examples include recruitment, hiring, training and development, payroll, and absenteeism.<br />
Processes of support | Internal activities | Procurement and administration of goods, services and buildings, accounting, communications, press, and secretarial service.<br />
Management and development | Strategy development | Processes through which strategies for SKAT are planned by means of sight lines, overall objectives, development of strategic initiatives, and their prioritization.<br />
Management and development | Business management | Administration of grants and contracts, production planning, management of contracts and vendors, and IT architecture management.<br />
Management and development | Development | Maintaining tasks that support the legislative process or participation in development projects.
Appendix 2: Distribution of employees across main processes in the business model<br />
The figures in the table below originate from an e-mail correspondence with SKAT. The<br />
figures reflect the distribution of full-time equivalents across the six main processes<br />
from the business model of SKAT.<br />
Main process | Full-time equivalents | Share<br />
Instruction | 881 | 11.1%<br />
Settlement | 2355 | 29.8%<br />
Inspection | 2321 | 29.3%<br />
Collection | 1020 | 12.9%<br />
Processes of support | 819 | 10.4%<br />
Management and development | 516 | 6.5%<br />
Total | 7912 | 100%
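
The shares in the table can be re-derived from the full-time equivalent counts. The following is a minimal Python sketch and not part of the original study: only the six counts are taken from the table, while the names `fte`, `shares`, `total_fte` and `percentages` are illustrative assumptions.

```python
# Full-time equivalents per main process, copied from the table in Appendix 2.
fte = {
    "Instruction": 881,
    "Settlement": 2355,
    "Inspection": 2321,
    "Collection": 1020,
    "Processes of support": 819,
    "Management and development": 516,
}


def shares(counts):
    """Return each count as a percentage of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


total_fte = sum(fte.values())
percentages = shares(fte)

if __name__ == "__main__":
    for name, pct in percentages.items():
        print(f"{name}: {fte[name]} FTE, {pct}%")
    print(f"Total: {total_fte} FTE")
```

Rounding each share to one decimal reproduces the percentages reported in the table, and the six counts sum to the stated total.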
Appendix 3: E-mail invitation to employees<br />
Subject: Invitation to answer a questionnaire<br />
Dear SKAT employee<br />
I am a PhD student at Danmarks Biblioteksskole (the Royal School of Library and<br />
Information Science). As part of a larger research project carried out in cooperation<br />
with IT & Telestyrelsen, I am currently studying how employees at SKAT use<br />
different information sources in connection with their various work tasks.<br />
The purpose is to examine how employees' searching for information can be improved<br />
when they carry out different work tasks.<br />
Among other things, I use a questionnaire to collect data. I am therefore writing<br />
to ask whether you would contribute to the study by completing the questionnaire.<br />
It takes about 10 minutes to complete the questionnaire, which is available on the Internet.<br />
Your response will of course be treated confidentially. This means that the results will only<br />
be reported in such a way that the responses of individuals cannot be identified<br />
in the results.<br />
I hope you will help with the project by completing the questionnaire. The survey<br />
runs until 18 December 2008 at 18:00.<br />
You get started by clicking the following link:<br />
http://kalus3.kalus.dk/l?d=3LPNCV2EeE3E<br />
You are of course welcome to contact me if you have comments, questions,<br />
or the like. Many thanks in advance for your time and help.<br />
Kind regards<br />
Tanja Svarre<br />
Tanja Svarre, PhD student<br />
Danmarks Biblioteksskole, Aalborg-afdelingen, Frederik Bajers Vej 7K, 9220 Aalborg<br />
Øst<br />
Tel. 9815 7922, fax 9815 1042<br />
E-mail: tas@db.dk
Appendix 4: Questions contained in questionnaire<br />
Question number | Title of question | Page of reference in web questionnaire<br />
1 | Age | 1a<br />
2 | Gender | 1b<br />
3 | Education | 2a<br />
4 | Title of education | 2b<br />
5 | Place of employment | 3a<br />
6 | Further comments to place of employment | 3b<br />
7 | Departmental affiliation | 4-10<br />
8 | Length of service in organization | 11<br />
9 | Work tasks within instruction | 12<br />
10 | Work tasks within settlement | 16<br />
11 | Work tasks within inspection | 38<br />
12 | Work tasks within collection | 45<br />
13 | Work tasks within processes of support | 49<br />
14 | Work tasks within management and development | 65<br />
15 | Frequency of work task | 13a, 17a, 20a, 23a, 26a, 29a, 32a, 35a, 39a, 42a, 46a, 50a, 53a, 56a, 59a, 62a, 66a, 69a, 72a<br />
16 | Work task experience | 13b, 17b, 20b, 23b, 26b, 29b, 32b, 35b, 39b, 42b, 46b, 50b, 53b, 56b, 59b, 62b, 66b, 69b, 72b<br />
17 | Need for information to solve work task | 14a, 18a, 21a, 24a, 27a, 30a, 33a, 36a, 40a, 43a, 47a, 51a, 54a, 57a, 60a, 63a, 67a, 70a, 73a<br />
18 | Information sources | 14b, 18b, 21b, 24b, 27b, 30b, 33b, 36b, 40b, 43b, 47b, 51b, 54b, 57b, 60b, 63b, 67b, 70b, 73b<br />
19 | Information needs | 15a, 19a, 22a, 25a, 28a, 31a, 34a, 37a, 41a, 44a, 48a, 52a, 55a, 58a, 61a, 64a, 68a, 71a, 74a<br />
20 | Metadata | 15b, 19b, 22b, 25b, 28b, 31b, 34b, 37b, 41b, 44b, 48b, 52b, 55b, 58b, 61b, 64b, 68b, 71b, 74b<br />
21 | Closure and further contact | 75
Appendix 5: Questionnaire pilot test data<br />
Pilot recipients | Logged into pilot questionnaire | Finished pilot questionnaire<br />
89 (100%) | 41 (46%) | 26 (29%)
Appendix 6: Link to questionnaire<br />
The full version of the questionnaire can be found at the following link:<br />
http://kalus3.kalus.dk/l?d=zTBK24SAF6ep
Appendix 7: Dates for the conduct of focus group interviews<br />
Date | Work task<br />
9 June 2009 | Settlement<br />
11 June 2009 | Instruction<br />
22 June 2009 | Processes of support<br />
22 June 2009 | Inspection: Customs<br />
23 June 2009 | Inspection: Common<br />
29 June 2009 | Management and development<br />
1 July 2009 | Collection
Appendix 8: Example of the slides guiding a focus group interview<br />
The following slides guided the focus group for management and development. Fifteen<br />
slides were presented to the participants, introducing the purpose of the interview and<br />
presenting results from the questionnaire. The slides for the remaining focus groups<br />
followed the same form as the slides in the present appendix.
Appendix 9: Focus group interview guide<br />
The focus group interview guide followed the structure of the questionnaire and the<br />
succession of the focus group slide shows. Below, the interview guide questions are<br />
listed according to the slides they supported (cf. the previous appendix).<br />
Corresponding slides | Interview guide questions<br />
Slide 5 | The interview started out with a short presentation of each participant as to:<br />
- Their concrete work task within the main process of the business model,<br />
- Their experience with the work task,<br />
- Their educational background,<br />
- How often they carry out the work task, and<br />
- Whether they carry out other work tasks than the one discussed today<br />
Slides 6-7 |<br />
- Does the frequency of information seeking depend on the concrete work tasks? How? Why?<br />
- Is it possible that the answers from the survey express average frequencies? If so, what is the real frequency? What is the actual variation?<br />
- How often do you seek information?<br />
- Are you seeking information for certain work tasks?<br />
- Is there a difference in the way you seek information depending on the work task in question?<br />
Slides 8-9 |<br />
- What sources are used when, and why?<br />
- And for which work tasks?<br />
- What is the frequency of use of concrete information sources?<br />
Slides 10-11 |<br />
- Do the results reflect your everyday information needs?<br />
- How?<br />
- Are there any differences?<br />
Slides 12-14 |<br />
- Can you try to explain when which metadata could be of use?<br />
o Is it in certain situations?<br />
o For certain work tasks?<br />
o For certain information needs?<br />
o For certain types of documents?<br />
- How do you think information seeking on the intranet is working at present?
Appendix 10: Transcription conventions<br />
Verbatim transcription took place in connection with focus group interviews in the<br />
domain study and individual interviews in the search test. To promote<br />
transcription consistency, a set of guidelines was developed ahead of the transcription<br />
process (cf. Poland, 2003). Some guidelines applied to the transcriptions of both focus<br />
group and individual interviews. In both cases, topics that were irrelevant to the theme<br />
of the interview were omitted from the transcription along with laughter, interjections,<br />
and the like. Whenever a passage was left out, the omission was marked with “...”.<br />
Since the two forms of interview differ at some points with respect<br />
to transcription issues, some type-specific guidelines supplemented the common<br />
recommendations mentioned above. Verbatim transcription is a challenging task<br />
(Halcomb & Davidson, 2006). Transcription of focus group interviews is particularly<br />
challenging. Apart from identifying and typing single words and statements, the<br />
transcriber must identify who said what and when. In addition, the participants occasionally<br />
spoke all at once. All this considered, it was decided to transcribe the focus group<br />
interviews without outside assistance. To keep the focus on the content of the<br />
conversations taking place, affirmative remarks from fellow participants were omitted<br />
from the transcriptions.<br />
For the individual search test interviews, an external transcriber was<br />
hired. In addition to the common omissions, further elements were systematically<br />
left out during the transcriptions. These elements comprise:<br />
- Introductions to the search test (see Appendix 17 for a description of the<br />
introduction delivered to all test persons ahead of the search test), whether at the<br />
beginning of the search test or in the middle, introducing the part referring to the<br />
categorization.<br />
- Conversations during the search test that were considered irrelevant to the<br />
content of the thesis. These pieces of conversation took place especially when<br />
the system had a long response time to a request.<br />
- Clarifying comments, questions and related responses concerning the execution<br />
of the test.<br />
All transcriptions were set up with line numbers in order to enable accurate reference.
Appendix 11: Verbatim Danish versions of quotes used in the thesis<br />
This appendix reports the Danish quotes used in the thesis. The quotes originate from<br />
the focus group and search test interviews. The full transcriptions have been enclosed<br />
for the assessment committee. Other interested readers are referred to the author for<br />
details on the interviews.<br />
The identification of quotes follows the structure below:<br />
Transcription | Reference in text | Explanation<br />
Focus group transcriptions | (R1, p. 3) | R1 refers to the focus group participant, p. 3 to the pages of the transcription referred to.<br />
Test person transcriptions | (TP1, line 2-6) | TP refers to the test person delivering the quote.
Quote (Id.) | Original Danish wording of questions and responses in questionnaire and focus group data<br />
R16, p. 5 | Jeg bruger elektroniske opslagsværker rigtig meget. Jeg synes jeg finder langt det meste på de elektroniske opslagsværker. Hvis jeg søger rigtigt, så får jeg det. Men det krydshenviser jo også til både alt det, der ligger på intranettet og det skulle jo også gerne fange det, der ligger på Internettet, nemlig via folketingets hjemmeside og den slags ting.<br />
R23, p. 11 | Men så længe, man har et trykt opslagsværk, så er det jo nemmere at slå op i. Hvis man lige ved, hvor man skal lede.<br />
R11, p. 5 | For mit vedkommende hvis jeg skal bruge nogle afgørelser, så bruger jeg Google, selvom jeg ved jeg kan gå ind i retsinformation og Thomson også. Men jeg søger på Google, for jeg synes de der elektroniske opslagsværker, som vi har, de er simpelthen for dårlige. Så finder jeg afgørelsen inde på Google og så kan det godt være jeg bliver henvist til en af de sider, som vi måske egentlig ret beset burde bruge, men jeg synes deres søgefunktioner er simpelthen for dårlige.<br />
R33, p. 1 | Så det informationsbehov, jeg har, det er jo måske mere målrettet på det, som ændrer sig i nye retsafgørelser, ny lovgivning, og det får vi jo normalt via intranettet og det vil jo sige, at så går jeg ind hver morgen og ser, er der kommet noget, der relaterer sig til inddrivelse, og det er den måde, jeg holder mig ajour på.<br />
R7, p. 5 | Så intranettet det er jo vores alle sammens opslagstavle og så er søgeresultatet jo altså også derefter. Du får jo bageopskrifterne også, hvis de er lagt ind.<br />
XX, afregning, p. 3 | Det er tit når vi sidder på agenttelefonerne, f. eks da e-indkomst var nyt, så kunne de spørge os ”hvordan laver man en efterangivelse” og vi var også i tvivl om mange af spørgsmålene, så kunne vi gå ind og søge på intranettet, men vi opgav. Vi blev nødt til at stille dem videre til nogle af dem, der sad med det, for det tog for lang tid og det var uoverskueligt at søge på intranettet. Vi kunne ikke finde de svar, vi havde brug for. Fordi du fik side op og side ned og alt der stod bare med den mindste om e-indkomst, det kommer jo med.<br />
xx, guidance, p. 11 | Jamen hvis det er opgaver indenfor specielle problemstillinger, hvor vi ved at vi har kolleger, der har nogle spidskompetencer der, så er det jo fristende at gå hen og spørge, fordi vedkommende mange gange også kender måske de sidste afgørelser, der ligger på det område. Frem for at begynde at… der er også en tidsfaktor i det. Man kan spare en del tid ved at…<br />
R19 (0:33:27):, p. 5 | Jo, men der har jeg det også lidt sådan at, jeg kan egentlig lidt bedst lide at slå op i toldvejledningen i første omgang og så… hvis det ikke rigtigt jeg synes at jeg er sikker på om der nu er kommet noget nyt, så går jeg ind og roder lidt og ser den elektroniske og sådan noget. Og så går jeg altid over og spørger…<br />
R1, p. 10 | Det kommer jo helt an på hvor god man er til at beskrive det emne. Hvad er det for ord, man bruger? Hvem er det, der deler det op i de hovedemner, der kan søges? Det kommer helt an på kvaliteten af det, der ligger der. Og dem, der har lagt det ind.<br />
R7, p. 8 | Mange gange i forbindelse med sagsbehandling så går du jo også ind og leder efter jamen er der afgørelser, kendelser eller domme på tilsvarende område. Og så går du jo positivt ind og søger i domme og kendelser, så det er udelukkende dokumenttypen i første omgang, som at du ved at det er sådan en, du vil have fat i. men det er ikke fordi det er det vigtigste, men det er en del af det, vi bruger i lige præcis den salgsbehandling.<br />
R7, p. 2-3 | Hvis der kommer en kunde herude, og henvender sig ved skranken, så beder du om at få vedkommendes cpr-nummer og går ind på deres oplysninger. Det er den først information. Du kan ikke ekspedere en kunde andet end at du søger information mindst en gang. Og så er spørgsmålet at hvis folk har troet, at informationen var først på det tidspunkt, at der blev stillet et spørgsmål, at man så gik ind og brugte det. Men information er jo allerede, når vi henter data frem på skatteyderen. Når vi skifter billede, så henter vi en ny information.<br />
R10, p. 4 | Det er fordi vores… da vi var kommunalt ansatte, der gik vores opgave ud på at ligne så mange folk som muligt, altså gennemse deres selvangivelse og se, om de gjorde det rigtigt eller forkert. Det er så lavet om efter vi er kommet til staten, og det vil sige dengang der fik vi en erfaring hele tide og holdt ved lige med hvad sker der på det område og det område. Men efter vi er kommet til staten, der er det ikke første prioritet, det er tvært imod nok lavest prioriteret, nu der er det at vi skal sørge for at få folk til at bruge tast selv og lave fejllister, så derfor mister vi hele tiden noget af det, vi engang bare kunne på rygraden. Jeg kan i hvert fald mærke med mig selv, at mange af de spørgsmål, jeg førhen bare havde sådan der, det skal du altså ind og læse om nu her. For lige at ajourføre og se er der kommet noget nyt siden.<br />
R14, p. 3 M<strong>in</strong> umiddelbare forklar<strong>in</strong>g på m<strong>in</strong>isterbetjen<strong>in</strong>g vil være, at jamen der er der så meget<br />
mere på spil, når man betjener m<strong>in</strong>isteren, at man skal være så meget mere sikker i s<strong>in</strong><br />
sag. Det er m<strong>in</strong> umiddelbare vurder<strong>in</strong>g af det, hvor imod jamen altså den paratviden vi har<br />
som juridiske eksperter på hvert vores område gør, at vi meget ofte kan klare et spørgsmål<br />
eller et problem med et skud fra hoften med den viden, vi har og så nogle gange, jamen så<br />
har man brug for lige at slå t<strong>in</strong>g efter. Men altså med m<strong>in</strong>isterbetjen<strong>in</strong>g, der skal man være<br />
100 % sikker, det skal man selvfølgelig også i andre sager, men der er bare mere på spil
med m<strong>in</strong>isterbetjen<strong>in</strong>g.<br />
R32, p. 4 Nu nede i vores gruppe, der er det mere erfar<strong>in</strong>g. Der skal vi vide, hvad den afdel<strong>in</strong>g laver<br />
næsten, og den afdel<strong>in</strong>g laver og det prøver vi også. Nu skal vi have møde igen på fredag,<br />
hvor det skal gøres yderligere bemærket med, hvad de enkelte afdel<strong>in</strong>ger laver. Men det er<br />
jo altså hvad man kan huske hele tiden. Det har den der noget med at gøre og det har den<br />
der noget med at gøre. Du kan næsten ikke slå det op nogen steder.<br />
R23, p. 12 Jeg bruger det ikke bare til at søge ud i den blå luft. Det ville jeg gøre på google. Ikke<br />
ellers. Brugt eller set før. Det er ikke sikkert, man har brugt det, men man har i hvert fald<br />
set det.<br />
R28, p. 10-11<br />
TP1, line 243-244<br />
TP6, line 49-52<br />
TP3, line 73-76<br />
FOC 3, Line 200-202<br />
(FOC 6,<br />
L<strong>in</strong>e 145-<br />
Det er jo der også at styresignaler og t<strong>in</strong>g og sager kommer. Det vi skal rette os efter<br />
<strong>in</strong>denfor forretn<strong>in</strong>gen. Og også... vejledn<strong>in</strong>gerne, de juridiske vejledn<strong>in</strong>ger, når de bliver<br />
opdateret, kommer det jo også ud der. Så egentlig er der jo rigtig meget, man følger med i.<br />
Man kan ikke undgå det. Det ville være uhyggeligt, hvis den ikke var på 100 %, vores<br />
<strong>in</strong>tranet. På en eller anden måde er man ligesom der<strong>in</strong>de for at kunne passe sit arbejde.<br />
Altså jeg har ikke fundet noget, hvor der står decideret for, hvordan man gør, men jeg har<br />
fundet noget, der måske <strong>in</strong>dikerer, at der kan jeg f<strong>in</strong>de reglerne.<br />
IP: Men det er stadig en to’er for dig?<br />
TP06: Ja, det synes jeg, det er, fordi man får alligevel lidt at vide om, hvordan<br />
beskatn<strong>in</strong>gsreglerne er… Men man skal selvfølgelig niveauet længere ned for at ramme en<br />
treer på det.<br />
Jeg ville ikke give den en treer. Jeg ville nok faktisk give en etter til begge to, fordi jeg<br />
først kan vide, om det er det korrekte, når jeg kommer <strong>in</strong>d i og ser, om det egentlig er det,<br />
jeg har brug for. Men det er dem, jeg ville vælge - med m<strong>in</strong>dre jeg kan se, at jeg kan gå<br />
videre.<br />
...det er fuldstændigt ubrugeligt. Man kan ikke f<strong>in</strong>de noget. Ja, det kan du godt, du kan<br />
f<strong>in</strong>de 5.000 hits på et eller andet. Man kan ikke bruge det til noget. Det er også derfor jeg<br />
tror, der er mange, der gerne vil have bøger. Det er fordi de er rimeligt sikre på de der<br />
stikordsregistre...<br />
Det er en høj, høj, høj frekvens af <strong>in</strong>formationssøgn<strong>in</strong>g. Det er jo pibende nødvendigt og<br />
vigtigt, at alt det, vi sender ud herfra, det er bare rigtigt. Om så det er en sats eller en<br />
paragrafhenvisn<strong>in</strong>g eller hvad dælen det er, så skal det bare være i orden.<br />
... man kan jo ikke huske alle reglerne udenad, så derfor går man <strong>in</strong>d og læser på dem.<br />
146<br />
FOC 7, Line 120-145<br />
FOC 1, Line 216-218<br />
FOC 2, Line 285-288<br />
FOC 6, Line 159-173<br />
FOC 2, Line 273-277<br />
FOC 5, Line 281-294<br />
Hvis der kommer en kunde herude, og henvender sig ved skranken, så beder du om at få<br />
vedkommendes cpr-nummer og går <strong>in</strong>d på deres oplysn<strong>in</strong>ger. Det er den første<br />
<strong>in</strong>formation. Du kan ikke ekspedere en kunde andet end at du søger <strong>in</strong>formation m<strong>in</strong>dst en<br />
gang.... Men er der en, der kommer og stille en et fagligt spørgsmål, så er behovet ikke<br />
nær så stort. Fordi så sidder der noget på rygraden, du svarer ud fra... De eneste<br />
henvendelser, der ikke kræver <strong>in</strong>formation er dem, der spørger om vej til motorkontoret,<br />
de får en vejledn<strong>in</strong>g udleveret. Alle andre er der opslag i forb<strong>in</strong>delse med.<br />
Vi kan jo <strong>in</strong>gent<strong>in</strong>g uden at vi har edb mulighed for at gå <strong>in</strong>d og spørge på en virksomhed,<br />
krav, hvad skylder den her virksomhed, den her person, hvad skylder han eller hun. Vi<br />
skal <strong>in</strong>d over nettet hele tiden.<br />
Det er jo kun, synes jeg, hvis man skal ekspedere en helt ny sag. Så bliver jeg selvfølgelig<br />
nødt til at søge noget mere om den her virksomhed, og vis det er en virksomhed, jeg<br />
kender i forvejen, så går jeg måske bare lige <strong>in</strong>d og tjekker, hvad er der angivet og hvad er<br />
der betalt. Men uanset hvad, så går jeg jo altid <strong>in</strong>d og søger <strong>in</strong>den jeg skal snakke med en<br />
virksomhed.<br />
M<strong>in</strong> umiddelbare forklar<strong>in</strong>g på m<strong>in</strong>isterbetjen<strong>in</strong>g vil være, at jamen der er der så meget<br />
mere på spil, når man betjener m<strong>in</strong>isteren, at man skal være så meget mere sikker i s<strong>in</strong><br />
sag... med m<strong>in</strong>isterbetjen<strong>in</strong>g, der skal man være 100 % sikker, det skal man selvfølgelig<br />
også i andre sager, men der er bare mere på spil med m<strong>in</strong>isterbetjen<strong>in</strong>g.... Man skal være<br />
100 % sikker på det man skriver og yder og bidrager med, det er rigtigt.<br />
Nu nede i vores gruppe, der er det mere erfar<strong>in</strong>g. Der skal vi vide, hvad den afdel<strong>in</strong>g laver<br />
næsten, og den afdel<strong>in</strong>g laver og det prøver vi også... Men det er jo altså hvad man kan<br />
huske hele tiden. Det har den der noget med at gøre og det har den der noget med at gøre.<br />
Du kan næsten ikke slå det op nogen steder.<br />
Jo, det synes jeg, fordi jeg synes også man bruger det til at orientere sig om nogle t<strong>in</strong>g, før<br />
man møder op eller <strong>in</strong>den man skriver mailen... det er jo også i forhold til hvordan man<br />
betragter en opgave, for jeg tænker lidt at <strong>in</strong>tranet og søgn<strong>in</strong>g er jo hele tiden en del af mit<br />
arbejde, også bare det at holde mig orienteret om, jamen både for SKAT som forretn<strong>in</strong>g<br />
men også det fagområde, jeg sidder med. Så man på en eller anden måde enten søger<br />
<strong>in</strong>formation eller har tilmeldt sig en nyhedsmail... Og alle de <strong>in</strong>formationer er jo med til,<br />
hvordan man kan løse en opgave på en eller anden facon.<br />
FOC 4, L<strong>in</strong>e ...de søger generelt ikke ret meget. De r<strong>in</strong>ger eventuelt, hvis der er et eller andet.
67-68<br />
FOC 3, Line 877-878)<br />
FOC 4, Line 371-373<br />
FOC 7, Line 208-216<br />
FOC 5, Line 289-294<br />
FOC 5, Line 554-558<br />
TP10, line 10-11<br />
TP05, line 113-117<br />
Jeg bruger det ikke bare til at søge ud i den blå luft. Det ville jeg gøre på google. Ikke<br />
ellers. Brugt eller set før. Det er ikke sikkert, man har brugt det, men man har i hvert fald<br />
set det.<br />
Jeg ved der ligger et eller andet dokument, det skal jeg lige bruge nu. Eller jeg ved at der<br />
f<strong>in</strong>des denne her dom, den skal jeg lige f<strong>in</strong>de frem. Eller et eller andet. Typisk nok noget,<br />
jeg har set før, men som jeg nu skal bruge igen.<br />
...da vi var kommunalt ansatte, der gik vores opgave ud på at ligne så mange folk som<br />
muligt, altså gennemse deres selvangivelse og se, om de gjorde det rigtigt eller forkert...<br />
det vil sige dengang der fik vi en erfar<strong>in</strong>g hele tide og holdt ved lige med hvad sker der på<br />
det område og det område... nu der er det at vi skal sørge for at få folk til at bruge tast<br />
selv og lave fejllister, så derfor mister vi hele tiden noget af det, vi engang bare kunne på<br />
rygraden. Jeg kan i hvert fald mærke med mig selv, at mange af de spørgsmål, jeg førhen<br />
bare havde sådan der, det skal du altså <strong>in</strong>d og læse om nu her. For lige at ajourføre og se<br />
er der kommet noget nyt siden.<br />
...for jeg tænker lidt at <strong>in</strong>tranet og søgn<strong>in</strong>g er jo hele tiden en del af mit arbejde, også bare<br />
det at holde mig orienteret om, jamen både for SKAT som forretn<strong>in</strong>g men også det<br />
fagområde, jeg sidder med. Så man på en eller anden måde enten søger <strong>in</strong>formation eller<br />
har tilmeldt sig en nyhedsmail, hvor man så får det <strong>in</strong>d på den måde. Og alle de<br />
<strong>in</strong>formationer er jo med til, hvordan man kan løse en opgave på en eller anden facon.<br />
”Det er jo der også at styresignaler og t<strong>in</strong>g og sager kommer. Det vi skal rette os efter<br />
<strong>in</strong>denfor forretn<strong>in</strong>gen. Og også... vejledn<strong>in</strong>gerne, de juridiske vejledn<strong>in</strong>ger, når de bliver<br />
opdateret, kommer det jo også ud der. Så egentlig er der jo rigtig meget, man følger med i.<br />
Man kan ikke undgå det. Det ville være uhyggeligt, hvis den ikke var på 100 %, vores<br />
<strong>in</strong>tranet. På en eller anden måde er man ligesom der<strong>in</strong>de for at kunne passe sit arbejde.”<br />
Men første gang, jeg søgte, der kom der en håndbog om e-handel. Den ville jeg hellere<br />
have valgt end at gå derned.<br />
Det er ligeså r<strong>in</strong>ge, for der står restance. Og arbejdsgivere, og det er <strong>in</strong>gen af delene. Så<br />
skal vi se med arbejdsgivere… fordi der står arbejdsgivere og A-skat. Og det er <strong>in</strong>deholdt<br />
af A-skat, ligesom vores arbejdsgiver <strong>in</strong>deholder vores skat. Det kan jeg simpelthen ikke<br />
f<strong>in</strong>de. Jeg ved, den ligger der<strong>in</strong>de. Men ud fra det her kommer jeg aldrig der<strong>in</strong>d. For når<br />
jeg ved, hvor det ligger, så ville jeg gå direkte efter den der i stedet for.<br />
TP15, l<strong>in</strong>e Ja, men omvendt kunne den jo også give… fritekst… så skulle de jo alle sammen komme.<br />
306<br />
TP32, line 295-301<br />
TP21, line 257-260<br />
TP02, line 625-633<br />
TP09, line 553-555<br />
TP06, line 392-395<br />
Der står lige nøjagtig, at… Altså, omkostn<strong>in</strong>ger til EU's grænse skal medregnes i<br />
toldværdien. Den anden, der vedrører transporten, der kan jeg se, at den her<strong>in</strong>de forklarer<br />
det helt præcist her. Men der har jeg heller ikke været <strong>in</strong>d og søge på ”told” hernede. Den<br />
kom på bare på, at jeg søgte på ”fragt og toldværdi” og ”sider med alle ord”. Og så kom<br />
jeg <strong>in</strong>d på toldvejledn<strong>in</strong>g, som også er den, der henviser til toldkodeks, som behandler de<br />
der regler om, hvor meget fragt der skal lægges til. Så denne her er jo en treer. Men jeg<br />
kom ikke <strong>in</strong>d til den ved at søge på ”erhvervsmæssig import” eller ”forsendelse” eller<br />
”eksport”.<br />
TP21: Der hjalp den ikke så meget, for der var ikke så mange dokumenter alligevel. Der<br />
kunne du overskue de dokumenter, der var der, om du havde haft den eller ej. Der var kun<br />
14 dokumenter, der kom frem. Dem ville du kunne overskue. Den vil nok hovedsageligt<br />
være en hjælp, når du kommer op på de store mængder, altså 1000 dokumenter og den<br />
slags.<br />
Jeg sad her til sidste og kunne gå tænke mig at gå over. For uanset hvad jeg gjorde, kunne<br />
jeg ikke f<strong>in</strong>de det. Og så må jeg have et andet søgested, hvor jeg kan have en mulighed for<br />
at se nogle andre underpunkter, så jeg måske ad den vej kan gå <strong>in</strong>d. Så i den sidste synes<br />
jeg, jeg manglede det.<br />
IP: Sådan til at generere ideer til, hvad man kunne søge på, eller?<br />
TP02: Ja, fordi jeg synes, at det, jeg satte <strong>in</strong>d… Det hedder måske noget andet i<br />
momsloven, end det jeg satte <strong>in</strong>d. Det skal jeg lige have fundet ud af. I forhold til det, der<br />
manglede jeg den her. Der irriterede det mig, at der var en seddel. For uanset hvad jeg<br />
gjorde, kunne jeg ikke få den op.<br />
TP09: Der fungerede det jo godt, for der fandt jeg jo lige pludselig et overemne, som jeg<br />
så kunne klikke <strong>in</strong>d på. Og det gav mig… hov, ja, det har noget med selskabsbeskatn<strong>in</strong>g at<br />
gøre. Så det hjalp mig lidt på mig, også med at tænke, hvad det er for noget, det her.<br />
TP06: Ja, det havde jeg. Jeg vidste, at hvis jeg skulle gå <strong>in</strong>d at kigge på noget med<br />
beskatn<strong>in</strong>gen, så vidste jeg også noget om selvstændig virksomhed. Og så kunne jeg<br />
hurtigere gå <strong>in</strong>d der… Så vidste jeg skulle gå <strong>in</strong>d under personlig <strong>in</strong>dkomst og ikke<br />
kapital<strong>in</strong>dkomst. Jeg kender de skattemæssige regler. Så er det nemmere at gå <strong>in</strong>d i<br />
kategorierne, når man sådan set kender svaret på forhånd.
TP20, line 339-344<br />
TP14, line 493-495<br />
Men jeg ved ikke om jeg nogens<strong>in</strong>de ville begynde at løbe alt det igennem. For jeg synes det<br />
for mig tager længere tid, fordi jeg ikke har kendskab nok til, hvad der ligger bag. Hvis jeg nu<br />
var fagmenneske i Skat og vidste alt om virksomhedsskatteordn<strong>in</strong>ger e.l., så kan det godt<br />
være, at den var genial for mig. For jeg ville vide, at jeg lige præcis kan godt <strong>in</strong>d og så trykke<br />
på det der og så få dokumenterne frem. Men jeg ved ikke, om den måske ville glemme nogle<br />
dokumenter, som jeg har brug for. Om den begrænser for meget.<br />
Når man får det første spadestik i , hvad det er for nogle kategorier, hvad de står for og dækker<br />
over, sådan at… Så fumler man, <strong>in</strong>dtil man f<strong>in</strong>der ud, hvad det er. Er der flere veje til Rom,<br />
eller hvordan er den hurtigste, eller… ja. Det er en tilvænn<strong>in</strong>g med nogle t<strong>in</strong>g. Hvad er det<br />
smarteste at gøre…<br />
Appendix 12: E-mail invitation to participate in search test<br />
The test persons for the search test were contacted by e-mail. The e-mail informed the<br />
employees about the purpose and procedure of the search test and about how their<br />
privacy would be protected. The e-mail is reproduced below (in Danish).<br />
Subject: Vil du bidrage til at forbedre SKATs intranet?<br />
Kære medarbejder hos SKAT<br />
Som en del af et større forskningsprojekt vedr. søgemuligheder på SKATs intranet,<br />
foretager vi i den kommende tid en evaluering af SKATs intranetløsning. Formålet med<br />
evalueringen er at undersøge, hvordan man kan forbedre medarbejderes søgning efter<br />
information, når de løser forskellige arbejdsopgaver.<br />
I den forbindelse har vi brug for din hjælp. Intranettet bliver testet af et udvalg af<br />
medarbejdere i SKAT. Testen udføres hhv. på ”location 1” og ”location 2”. Det tager<br />
ca. 1 1/2 time og består i, at du får nogle søgeopgaver udleveret, som er dit<br />
udgangspunkt for søgetesten. Der søges i både den nuværende og den nye<br />
intranetløsning. Søgetesten afsluttes med et kort interview. Din deltagelse vil naturligvis<br />
blive behandlet fortroligt og resultaterne formidlet på en måde, så du ikke vil kunne<br />
identificeres.<br />
Hvis du vil være med, beder vi dig udfylde, hvornår du har mulighed for at deltage samt<br />
hvordan vi kan komme i kontakt med dig ved at trykke på dette link: [logindata]<br />
Du vil desuden blive stillet nogle enkelte spørgsmål omkring din arbejdsfunktion og<br />
brug af intranettet. Besvarelsen tager omkring 3 minutter.<br />
Vi vil meget gerne have din tilkendegivelse hurtigst muligt og senest torsdag den<br />
20/5-2010 kl. 18.<br />
Forskningsprojektet udføres som et samarbejde mellem Danmarks Biblioteksskole, IT<br />
& Telestyrelsen og SKAT. Hos SKAT er projektet forankret i Projektenheden (Ebbe<br />
Tor Andersen). Søgetesten er godkendt af viceskattedirektør Kaj Kirkegaard. Hvis du<br />
har kommentarer, spørgsmål eller lignende, er du velkommen til at kontakte Tanja<br />
Svarre (kontaktoplysninger nedenfor).<br />
På forhånd mange tak for din tid og hjælp.<br />
Med venlig hilsen<br />
Ebbe Tor Andersen (Kommunikation, SKAT) og<br />
Tanja Svarre (Danmarks Biblioteksskole)<br />
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,<br />
Tanja Svarre, ph.d.-studerende<br />
Danmarks Biblioteksskole, Aalborg-afdelingen, Fredrik Bajers Vej 7K, 9220 Aalborg Øst<br />
Tlf. 9815 7922, fax 9815 1042<br />
E-mail: tas@db.dk<br />
Ebbe Tor Andersen, specialkonsulent<br />
Projektenheden, SKAT<br />
Tlf. 7 17 02<br />
E-mail: Ebbe.Tor@Skat.dk
Appendix 13: Questionnaire for recruiting test persons for the search test<br />
This appendix presents the questionnaire used to recruit test persons for the<br />
search test. The questionnaire was prepared in Kalus and is available at:<br />
http://kalus3.kalus.dk/l?d=RmMBCvq24teH<br />
Appendix 14: Simulated search tasks<br />
This appendix presents the three simulated search tasks that formed the basis for the<br />
controlled searches in the search test.<br />
SIM 1: Salg af forældrekøbt lejlighed<br />
Søgecase:<br />
Kirsten har solgt en lejlighed købt som forældrekøb. Hun har haft tab på salget og i<br />
samme forbindelse haft udgifter til ejendomsmægler og renovering af lejligheden. Kan<br />
hun nu trække tab og udgifter fra i skat?<br />
Søgeopgave:<br />
Find dokumenter, der angiver de skattemæssige forhold omkring et forældrekøb.<br />
SIM 2: Beskatning af e-handel<br />
Søgecase:<br />
Et personligt ejet enkeltmandsforlag ønsker at sælge egne, engelsksprogede bøger ved<br />
hjælp af e-handel på hjemmesider i USA og andre lande, eksempelvis Amazon.com og<br />
Smashwords.com. Der er fast driftssted i Danmark. Hvordan skal indehaveren forholde<br />
sig i forhold til beskatning af indtægten på salget?<br />
Søgeopgave:<br />
Find dokumenter, der angiver, hvordan man beskatter e-handel, der har fast driftssted i<br />
Danmark.<br />
SIM 3: Freelancer<br />
Søgecase:<br />
Jens underviser freelance for en virksomhed, men er på vej til at udvide med flere<br />
kunder. Lykkes alle forhåndsaftaler, vil han komme til at tjene omkring 100.000 årligt.<br />
Nu er han blevet i tvivl om, om han i givet fald kan fortsætte som lønmodtager eller om<br />
han skal starte erhvervsmæssig virksomhed op og momsregistreres.<br />
Søgeopgave:<br />
Find dokumenter, der angiver reglerne for, hvornår man skal momsregistreres.<br />
Appendix 15: Test persons’ insight into simulated search tasks<br />
Each time a search task had been completed, the test persons answered a short<br />
on-screen questionnaire capturing their insight into the subject of the task. The<br />
questionnaire was embedded in Morae and contained the following questions:<br />
1. Min indsigt i arbejdsopgavens emne:<br />
Ingen indsigt 1 2 3 4 5 Stor indsigt<br />
2. Hvor svær var opgaven?<br />
Meget let 1 2 3 4 5 Meget svær<br />
3. Hvor meget mindede opgaven om de arbejdsopgaver, du sidder med til dagligt?<br />
Intet sammenfald 1 2 3 4 5 Stort sammenfald<br />
Appendix 16: E-mail concerning naturalistic information needs<br />
A few days before the search test, the test persons received an e-mail asking them to<br />
bring a search task to the test session. We intentionally sent the e-mail shortly before<br />
the test to make sure that the test persons actually remembered to bring the task.<br />
An alternative would have been to mention it in the e-mail confirming the appointment.<br />
However, some test persons received the confirmation e-mail weeks before the<br />
appointment, which could have caused them to forget about the extra task. Another<br />
benefit was that the test persons were reminded of their upcoming appointment. The<br />
text of the e-mail appears below:<br />
Fra: Tanja Svarre Jonasen<br />
Sendt: on 16-06-2010 10:17<br />
Emne: Vedr. søgetest<br />
Kære testperson,<br />
Når du møder op til søgetesten en af de kommende dage bedes du medbringe et problem eller<br />
en søgeopgave, som du for nyligt har løst ved at søge på det nuværende intranet. Opgaven<br />
skal helst bære præg af at være typisk for din brug af intranettet.<br />
Vel mødt i lokale F-1-46 (lokalet ved siden af videokonferencen).<br />
Mange hilsner,<br />
Tanja Svarre<br />
Tlf. 9877 3025
Appendix 17: Instructions for search test persons<br />
This appendix presents the elements contained in the instructions given to the test<br />
persons before the search test.<br />
Instructions for test persons<br />
Test procedure:<br />
You will receive a set of search tasks. The search tasks are divided into two<br />
groups, each to be searched using its own search functionality.<br />
Please find the documents you need to solve the search task you are working on.<br />
Maybe you know the answer to the task in advance, but please search until you<br />
have the document or documents that can answer the task.<br />
Documents are assessed on a 4-point relevance scale (the description is<br />
presented and handed out to the test person in print to enable brush-up during<br />
the test).<br />
When you make a search, please print out the search results. They will be used for<br />
relevance judgments afterwards.<br />
After completing each task, you fill out an on-screen questionnaire concerning<br />
the task. Finally, I have some general questions about the system you have been<br />
searching.<br />
Presentation of the prototype:<br />
System functionalities<br />
o Automatic truncation<br />
o Search type: Explanation of the different possibilities<br />
o Document types contained: The types are presented and a printed<br />
overview is handed out.<br />
o Categorization (presented when the test person starts using it, either<br />
initially or along the way)<br />
o Time of publishing<br />
The system is a prototype. This means that you need to be aware of the following:<br />
o Please disregard the numbers stated after the categories – they are<br />
not exact at the present time.<br />
o The database was generated in the fall of 2009, which means that<br />
the latest documents are not included. If you are looking for<br />
instructions or similar documents that have been updated recently, it will<br />
be sufficient to find the latest document in the collection that can answer<br />
your request.<br />
o At present the system does not offer correspondence between result lists<br />
and the full text of the documents. Therefore, please consider the<br />
relevance of your search results on the basis of the hit lists.<br />
o You need to use your mouse to click “search”. Pressing the “Enter” key<br />
will direct you to simple search.<br />
Appendix 18: Rotation of search tasks<br />
A set of principles was set up for constructing the rotation. The first set of<br />
unique sequences was generated by listing the work tasks by number in<br />
ascending and descending order, respectively. Next, all work tasks were moved one<br />
position to the right. The following rotation moved the last task to position number two.<br />
Lastly, a rotation was generated by moving the task at position number three to position<br />
number one. All rotations were generated twice, starting with and without<br />
categorization, respectively. By these means, 42 unique rotations were formed. Of these,<br />
32 were needed for the search test. These rotations are listed in the table below.<br />
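The size of the design space can be sketched in code. The following Python sketch does not reproduce the manual procedure described above; it simply enumerates every ordering of four tasks and pairs each with the two possible system orders, assuming (as in the rotation table) that the first two positions are searched in one system and the last two in the other. The function name `enumerate_rotations` and the labels are illustrative only.

```python
from itertools import permutations


def enumerate_rotations(tasks=(1, 2, 3, 4), systems=("Sys A", "Sys B")):
    """Return every task order combined with both system orders.

    Assumption: positions 1-2 use one system and positions 3-4 the other.
    """
    rotations = []
    for order in permutations(tasks):
        # Each task order is used twice: once per starting system.
        for first, second in (systems, systems[::-1]):
            assignment = (first, first, second, second)
            rotations.append(tuple(zip(order, assignment)))
    return rotations


all_rotations = enumerate_rotations()
print(len(all_rotations))  # 4! task orders x 2 system orders
```

Under this assumption there are 48 candidate rotations, so both the 42 rotations formed and the 32 rotations used fit within that space.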
Rotation 1. position 2. position 3. position 4. position<br />
1 1 (Sys B) 2 (Sys B) 3 (Sys A) 4 (Sys A)<br />
2 1 (Sys A) 2 (Sys A) 3 (Sys B) 4 (Sys B)<br />
3 1 (Sys B) 4 (Sys B) 2 (Sys A) 3 (Sys A)<br />
4 1 (Sys A) 4 (Sys A) 2 (Sys B) 3 (Sys B)<br />
5 1 (Sys A) 3 (Sys A) 4 (Sys B) 2 (Sys B)<br />
6 1 (Sys A) 3 (Sys A) 2 (Sys B) 4 (Sys B)<br />
7 1 (Sys B) 3 (Sys B) 4 (Sys A) 2 (Sys A)<br />
8 1 (Sys B) 3 (Sys B) 2 (Sys A) 4 (Sys A)<br />
9 2 (Sys A) 4 (Sys A) 3 (Sys B) 1 (Sys B)<br />
10 2 (Sys A) 3 (Sys A) 4 (Sys B) 1 (Sys B)<br />
11 2 (Sys A) 1 (Sys A) 4 (Sys B) 3 (Sys B)<br />
12 2 (Sys A) 4 (Sys A) 1 (Sys B) 3 (Sys B)<br />
13 2 (Sys B) 4 (Sys B) 3 (Sys A) 1 (Sys A)<br />
14 2 (Sys B) 3 (Sys B) 4 (Sys A) 1 (Sys A)<br />
15 2 (Sys B) 1 (Sys B) 4 (Sys A) 3 (Sys A)<br />
16 2 (Sys B) 4 (Sys B) 1 (Sys A) 3 (Sys A)<br />
17 3 (Sys A) 1 (Sys A) 2 (Sys B) 4 (Sys B)<br />
18 3 (Sys A) 2 (Sys A) 4 (Sys B) 1 (Sys B)<br />
19 3 (Sys A) 2 (Sys A) 1 (Sys B) 4 (Sys B)<br />
20 3 (Sys A) 4 (Sys A) 2 (Sys B) 1 (Sys B)<br />
21 3 (Sys B) 1 (Sys B) 2 (Sys A) 4 (Sys A)<br />
22 3 (Sys B) 2 (Sys B) 4 (Sys A) 1 (Sys A)<br />
23 3 (Sys B) 2 (Sys B) 1 (Sys A) 4 (Sys A)<br />
24 3 (Sys B) 4 (Sys B) 2 (Sys A) 1 (Sys A)<br />
25 4 (Sys A) 2 (Sys A) 1 (Sys B) 3 (Sys B)<br />
26 4 (Sys A) 3 (Sys A) 2 (Sys B) 1 (Sys B)<br />
27 4 (Sys A) 1 (Sys A) 2 (Sys B) 3 (Sys B)<br />
28 4 (Sys A) 3 (Sys A) 1 (Sys B) 2 (Sys B)<br />
29 4 (Sys B) 2 (Sys B) 1 (Sys A) 3 (Sys A)<br />
30 4 (Sys B) 3 (Sys B) 2 (Sys A) 1 (Sys A)<br />
31 4 (Sys B) 1 (Sys B) 2 (Sys A) 3 (Sys A)<br />
32 4 (Sys B) 3 (Sys B) 1 (Sys A) 2 (Sys A)<br />
Legend: Sys A and Sys B designate the two parts of the test system (see section<br />
6.4.1). The columns list the search tasks by their position in the sequence.<br />
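The balance of the table can be checked mechanically. The sketch below is a reader's aid rather than part of the original test setup: it encodes each row as a task order plus the system used for the first two positions, then checks that no rotation is repeated and counts how often each task and each system comes first. The names `ROWS` and `check_rotations` are hypothetical.

```python
from collections import Counter

# Task orders copied from the table; the letter is the system used in positions 1-2.
ROWS = [
    ("1234", "B"), ("1234", "A"), ("1423", "B"), ("1423", "A"),
    ("1342", "A"), ("1324", "A"), ("1342", "B"), ("1324", "B"),
    ("2431", "A"), ("2341", "A"), ("2143", "A"), ("2413", "A"),
    ("2431", "B"), ("2341", "B"), ("2143", "B"), ("2413", "B"),
    ("3124", "A"), ("3241", "A"), ("3214", "A"), ("3421", "A"),
    ("3124", "B"), ("3241", "B"), ("3214", "B"), ("3421", "B"),
    ("4213", "A"), ("4321", "A"), ("4123", "A"), ("4312", "A"),
    ("4213", "B"), ("4321", "B"), ("4123", "B"), ("4312", "B"),
]


def check_rotations(rows):
    """Return (all rows unique?, first-task counts, first-system counts)."""
    unique = len(set(rows)) == len(rows)
    first_task = Counter(order[0] for order, _ in rows)
    first_system = Counter(system for _, system in rows)
    return unique, dict(first_task), dict(first_system)


unique, first_task, first_system = check_rotations(ROWS)
print(unique, first_task, first_system)
```

For the table as printed, the check finds 32 unique rows, with each task opening eight rotations and each system opening sixteen.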
Appendix 19: Search test interview guide<br />
This appendix presents the interview guide used to conclude the test sessions. The<br />
questions are grouped into three overall categories.<br />
Perception of the test system<br />
When was the categorization (not) helpful to you during searching?<br />
In which way?<br />
What was it about the categorization that made it (un)helpful to you?<br />
In which situations did you not need the categorization?<br />
Present use of the intranet<br />
What characterizes your use of the present intranet (situations where it is not used,<br />
documents you look for on the intranet, and the like)?<br />
Categorization in your daily work<br />
I would like you to describe a typical situation from your daily work where you make<br />
use of the intranet.<br />
In that situation, how would categorization be useful to you?<br />
In that situation, how would categorization not be relevant to you?<br />
<strong>Automatic</strong> <strong><strong>in</strong>dex<strong>in</strong>g</strong> <strong>in</strong> e-<strong>government</strong><br />
306
307<br />
Appendices<br />
Appendix 20: Judgement of the relevance of retrieved documents in search test<br />
This appendix presents the four degrees of relevance that the test persons could use<br />
during the assessment of retrieval sets. The explanations of the distinct degrees are<br />
based on Sormunen (2002). In the test situation, the content of the appendix was<br />
explained to the test persons. Further, a printout of the explanations was placed next to the<br />
test machine so that the test persons could consult it whenever needed.<br />
Relevance of documents<br />
0: The document contains no information about the topic.<br />
1: The document points to the topic but contains neither more nor other<br />
information than the topic description, typically one sentence or fact.<br />
2: The document contains more information than the topic description, but not in<br />
an exhaustive way. In the case of a multi-faceted topic, only<br />
some of the sub-themes or viewpoints are covered. Typically one paragraph, or a few<br />
sentences or facts.<br />
3: The document discusses the themes of the topic exhaustively. In the case of a<br />
multi-faceted topic, all or nearly all facets or viewpoints are covered by<br />
the document. Typically several paragraphs, or a number of sentences or facts.
Appendix 21: Completeness degree of questionnaire responses<br />
Degree of completeness | # | % of 798<br />
Completed the questionnaire | 340 | 42.6%<br />
Answered beyond Inspection (page 45 in the questionnaire) but quit somewhere thereafter | 27 | 3.4%<br />
Stopped when the questions regarding Inspection (page 44 in the questionnaire) had finished | 66 | 8.3%<br />
Stopped when the work tasks started (before page 12 in the questionnaire) | 13 | 1.6%<br />
Started the questionnaire but quit before answering their place of employment (before page 3 in the questionnaire) | 14 | 1.8%<br />
Signed into the questionnaire but did not answer any questions | 36 | 4.5%<br />
Did not log in at all | 302 | 37.8%<br />
Total | 798* | 100%<br />
Legend: *The questionnaire was sent to 799 respondents. However, one response could not be included due to<br />
errors and was deleted. Therefore, the number of respondents adds up to 798.
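The percentage column of the completeness table can be recomputed from the counts. The following is a minimal Python sketch, assuming each share is the count divided by the 798 respondents and rounded to one decimal; the short labels are illustrative and are not the wording used in the thesis.

```python
# Counts per completeness level, copied from the table in Appendix 21.
# The dictionary keys are shortened, illustrative labels.
counts = {
    "completed": 340,
    "quit after Inspection": 27,
    "stopped at end of Inspection": 66,
    "stopped at work tasks": 13,
    "quit before place of employment": 14,
    "signed in, no answers": 36,
    "never logged in": 302,
}

total = sum(counts.values())  # number of respondents included

# Share of each level as a percentage of all respondents, one decimal.
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

for label, share in shares.items():
    print(f"{label}: {counts[label]} ({share}%)")
print(f"total: {total}")
```

Under this assumption the counts sum to 798 and the recomputed shares agree with the percentages reported in the table.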
Appendix 22: Respondents’ experience with work tasks<br />
Work task | 0-6 months | 7-11 months | 1-2 years | 3-5 years | More than 5 years<br />
Instruction | 6 (3%) | 13 (7%) | 33 (18%) | 21 (12%) | 108 (60%)<br />
Settlement: common | - | 1 (5%) | 4 (20%) | 3 (15%) | 12 (60%)<br />
Settlement: preliminary assessment of income/personal taxes | 2 (4%) | 2 (4%) | 6 (11%) | 4 (7%) | 43 (75%)<br />
Settlement: business relations | - | 4 (7%) | 13 (23%) | 2 (4%) | 38 (67%)<br />
Settlement: corporation taxes | 1 (4%) | 3 (12%) | 7 (28%) | 2 (8%) | 12 (48%)<br />
Settlement: customs | - | 3 (25%) | 1 (8%) | 2 (17%) | 6 (50%)<br />
Settlement: vehicles | 1 (6%) | 6 (33%) | 8 (44%) | 1 (6%) | 2 (11%)<br />
Settlement: estate | 2 (14%) | - | 1 (7%) | 3 (21%) | 8 (57%)<br />
Inspection: common | - | 1 (2%) | 5 (8%) | 5 (8%) | 49 (80%)<br />
Inspection: customs | 1 (6%) | 2 (13%) | 1 (6%) | 1 (6%) | 11 (69%)<br />
Collection | 4 (10%) | - | 7 (18%) | 9 (23%) | 19 (49%)<br />
Processes of support: legal support | - | 4 (9%) | 7 (16%) | 10 (22%) | 24 (53%)<br />
Processes of support: minister service | 3 (30%) | - | - | 2 (20%) | 5 (50%)<br />
Processes of support: IT service and administration | - | 3 (21%) | 3 (21%) | 4 (29%) | 4 (29%)<br />
Processes of support: HR and education | - | 1 (7%) | 2 (14%) | 5 (36%) | 6 (43%)<br />
Processes of support: internal activities | 4 (27%) | - | 3 (20%) | 5 (33%) | 3 (20%)<br />
Management and development: strategy | 2 (13%) | 1 (6%) | 1 (6%) | 7 (44%) | 5 (31%)<br />
Management and development: business management | 1 (7%) | 3 (21%) | 1 (7%) | 5 (36%) | 4 (29%)<br />
Management and development: development | 3 (11%) | 3 (11%) | 7 (26%) | 6 (22%) | 8 (30%)
Appendix 23: Age distribution of population, respondents and test persons<br />
For comparison, the total figures for SKAT at the time were:<br />
Age group | 17-18 | 19-25 | 26-35 | 36-45 | 46-55 | 56-68<br />
Total numbers | 3 | 91 | 950 | 2180 | 2972 | 2473<br />
% | 0% | 1% | 11% | 25% | 34% | 28%<br />
Statistic | Questionnaire respondent distribution | Population distribution | Test person distribution<br />
N Valid | 340 | 8681 | 31<br />
N Missing | 0 | 0 | 1<br />
Mean | 47.29 | 48.44 | 46.45<br />
Median | 47.00 | 49.00 | 48.00<br />
Mode | 44 | 58 | 48<br />
Std. Deviation | 9.537 | 9.779 | 8.532<br />
Skewness | -.354 | -.420 | -.250<br />
Std. Error of Skewness | .132 | .026 | .421
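The Std. Error of Skewness values reported above depend only on the number of valid cases. As a check, the sketch below reproduces them from N, assuming the conventional formula for the standard error of sample skewness. The thesis does not state which formula SPSS applied, and the function name is illustrative.

```python
import math


def skewness_standard_error(n: int) -> float:
    """Standard error of sample skewness for n valid cases.

    Assumes the conventional formula sqrt(6n(n-1) / ((n-2)(n+1)(n+3))).
    """
    if n < 3:
        raise ValueError("at least three cases are required")
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


# Valid N for respondents, population and test persons, as listed in the appendix.
for label, n in [("respondents", 340), ("population", 8681), ("test persons", 31)]:
    print(f"{label}: {skewness_standard_error(n):.3f}")
```

With the three N values from the appendix, this gives .132, .026 and .421, which agrees with the reported figures. The much larger standard error for the test persons reflects the small group (N=31), not a difference in the skewness itself.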
Appendix 24: Respondents’ length of service in the organization<br />
Appendix 25: Focus group participants’ work tasks<br />
This appendix shows the distribution of participants across the 19 generic work<br />
tasks. In the introductory part of the focus group interviews, the participants were asked<br />
to introduce themselves. This personal introduction served as the basis for<br />
the table below. We compared the participants’ descriptions of their work areas with the<br />
work task descriptions from the questionnaire. On this basis, we placed the respondents<br />
according to their primary work task. In some cases, in particular in the interview concerning<br />
Instruction, other more important areas of responsibility emerged during the interview.<br />
In those cases, we assessed which work task was more important and let that<br />
assessment guide the placement of the specific participant.<br />
Main process | Work task | Participants<br />
Instruction | Instruction | R7, R8, R9, R10, R11, R12<br />
Settlement | Common | -<br />
Settlement | Preliminary assessment of income/personal taxes | R4<br />
Settlement | Business relations | R3, R6<br />
Settlement | Corporation taxes | R2<br />
Settlement | Customs | R18, R19, R20, R21, R22<br />
Settlement | Vehicles | -<br />
Settlement | Estate | -<br />
Inspection | Common | R23, R24, R25, R26, R27<br />
Inspection | Customs | -<br />
Collection | Collection | R33, R34, R35<br />
Processes of support | Legal support | R1, R13, R14, R15, R16, R17<br />
Processes of support | Minister service | -<br />
Processes of support | IT service and administration | R28<br />
Processes of support | HR and education | R1<br />
Processes of support | Internal activities | R32, R28<br />
Management and development | Strategy | R29, R30<br />
Management and development | Business management | -<br />
Management and development | Development | R29, R31<br />
Legend: Identifiers such as R20 refer to individual focus group participants.
Appendix 26: Additional sources mentioned by respondents<br />
This appendix reports the sources listed by the respondents to supplement the<br />
predefined list of sources used for information seeking in relation to certain work tasks.<br />
The sources are listed by the work task in connection with which they were mentioned.<br />
Additions to predefined sources, by work task<br />
Work task: Instruction<br />
BIQ systems<br />
Executive orders, including the registration order (registreringsbekendtgørelsen) and the Registration Tax Act (registreringsafgiftsloven)<br />
SKAT's ordinary systems, e.g. DR, TStele, KMD, etc.<br />
Database of detained consignments<br />
DetailCOR (= income information for the current year); Google Maps for driving directions<br />
www.europa.eu<br />
www.skat.dk<br />
Learning from neighbouring colleagues (nabolæring) in cases of doubt<br />
kmd-skat<br />
Best practice guidelines<br />
Various internal search systems<br />
skat.dk<br />
The debt collection guide (Inddrivelsesvejledningen)<br />
skat ligning, ts tele, business register (virksomhedsreg.), remedy<br />
Google search<br />
The police's computer programs<br />
BIQ and SØS<br />
Colleagues<br />
In general I use all of them, but the intranet and Captia are the most used<br />
Internal systems in SKAT.<br />
Tele, KMD, etc.<br />
In particular websites with maps and aerial photos<br />
Remedy, KMD, Dipsy, DR-sys, TP-sys and others<br />
Various course materials<br />
Printed law collections. TfS. Textbooks and the like<br />
Remedy<br />
CVR.dk<br />
SKAT's internal IT systems<br />
TP, TS-tele, Remedy, SAP, KMD, etc.<br />
SKAT's website: not the intranet, but the publicly accessible website www.skat.dk<br />
At times I use Captia; otherwise I use none apart from SKAT's systems.<br />
SKAT's own computer systems<br />
SKAT's general systems<br />
Programming specifications, programs, etc.<br />
I use SKAT's lawyers and otherwise read bills/acts/explanatory notes and the tax assessment guide (ligningsvejledning)<br />
SharePoint<br />
EU databases, sites of other countries' tax authorities, websites of interest organizations and companies<br />
Depends on the specific task<br />
Work task: Settlement: common<br />
The police's computer systems<br />
KMD's and the state's systems<br />
SKAT's DR system, SAP and other tax systems<br />
It depends on the situation. I mostly handle foreign income, and that often requires further investigation.<br />
Work tasks: Settlement: preliminary assessment of income/personal taxes; Settlement: business relations; Settlement: corporation taxes<br />
KMD SkatLigning (case-handling system). Captia is a piece of junk to work with for retrieving documents (unless I have not yet figured out what is clever about Captia)<br />
skat.dk<br />
Google search<br />
Colleagues<br />
My own folders of information/guidelines that I have collected over time or that we have agreed on ourselves in the department.<br />
Memory and many years of experience with tax are the most important sources in everyday work.<br />
The state's systems and KMD Skatligning<br />
Remedy<br />
It depends very much on the type of income or situation I am handling. I mostly work with foreign income and relocation abroad.<br />
Colleagues<br />
Sparring with colleagues<br />
Skattenyt – Schultz<br />
Google<br />
SKAT's internal computer systems: Business Object, KMD Skat Ligning, KMD Skat Forskud, the TP system, Remedy, the CPR system, Dipsy, Erhvervssystemet<br />
Newspapers<br />
I am head of department for about 20 employees; I spend virtually all of my working time on personnel management.<br />
I am head of department, so I do not do case handling directly<br />
The VAT manual (Momsmanual)<br />
None of them. Once in a while I help with keying in VAT returns. On a daily basis I work with list declarations concerning EU sales<br />
SKAT's general systems<br />
BIQ and SØS<br />
Colleagues<br />
Work task: Settlement: customs<br />
www.europa.eu<br />
EU regulations concerning consignments<br />
The EU's electronic reference works<br />
The customs system (Toldsystemet)<br />
EUR-lex<br />
Work task: Settlement: vehicles<br />
Own notes<br />
Work task: Settlement: estate<br />
BBR, Kort- og Matrikelstyrelsen, electronic notifications<br />
In particular websites with maps and aerial photos<br />
Danmarks Arealinfo, Danmarks Statistikbank, Kort & Matrikelstyrelsen, OIS, BBR, the municipalities' websites with local plans, GEO, Plansystem<br />
Planning systems, map lookups, Realkreditrådet<br />
Work task: Inspection: common<br />
BIQ<br />
Asking colleagues<br />
Learning from neighbouring colleagues (nabolæring)<br />
SRF: course materials, etc.<br />
KMD-SkatLigning, Remedy, etc.<br />
Specific information from SKAT's own systems: TS-tele, KMD skat/ligning, Remedy. That is, system-generated, keyed-in, internal documents, working papers, etc. Control information on R-75 and more.<br />
Various other internal search systems<br />
Amadeus database, Kob, Biq<br />
Colleagues<br />
Newspapers<br />
I am not a caseworker<br />
Remedy, KMD, Dipsy, DR-sys, TP-sys and others<br />
Work task: Inspection: customs<br />
FødevareErhverv's website.<br />
FødevareErhverv's website; the Official Journal of the EU (EU-tidende)<br />
Captia is used only for the department's case file<br />
The EU's electronic reference works<br />
The FEOGA handbook. Regulations from the EU<br />
The customs system<br />
Work task: Collection<br />
VAT programs<br />
Teaching material from my studies and relevant books from my studies<br />
Domstole.dk<br />
Sparring with colleagues<br />
Own systems<br />
Work tasks: Processes of support: legal support; Processes of support: IT service and administration; Processes of support: internal activities<br />
Skattemappen<br />
kmd-skat<br />
Skattemappen<br />
The EU's electronic reference works<br />
One word: Google (by the way, I have no financial interest in promoting Google over others... only that Google works, every time)<br />
None<br />
Own SharePoint solution (Sysmod phase 1), documents in a file structure<br />
Microsoft, also as printed media<br />
Programs and programming specifications<br />
SharePoint as the most important tool<br />
Old e-mails, programming specifications, etc.<br />
KMD SKAT LIGNING<br />
The CVR register<br />
Work task: Management and development: strategy<br />
Google<br />
The department's shared drive<br />
Work task: Management and development: development<br />
KMD-SKAT LIGNING<br />
Tools and guidelines located on the H drive<br />
Data warehouse, KMD Skat Ligning (case management), Dipsy<br />
www.skat.dk<br />
Our SAP system, extracts of various reports<br />
The department's shared drive
Appendix 27: Test persons’ background data<br />
This appendix contains tables displaying background data for the test persons of the<br />
search test as regards gender, age, length of service, and education. The tables were<br />
generated in SPSS and are listed in the order just mentioned. For all three tables in the<br />
appendix, one person did not respond to these particular questions. This explains the<br />
difference in N (in the search test N=32, in this appendix N=31).<br />
Gender distribution<br />
Gender | Frequency | Percent | Valid Percent | Cumulative Percent<br />
Valid: Male | 10 | 31.3 | 32.3 | 32.3<br />
Valid: Female | 21 | 65.6 | 67.7 | 100.0<br />
Valid: Total | 31 | 96.9 | 100.0 | -<br />
Missing: System | 1 | 3.1 | - | -<br />
Total | 32 | 100.0 | - | -<br />
Legend: The table displays the distribution of men and women in the group of test persons. N=31.<br />
The test persons’ year of birth and length of service<br />
Statistic | Year of birth | Length of service<br />
N Valid | 31 | 31<br />
N Missing | 1 | 1<br />
Mean | 1963.23 | 21.68<br />
Median | 1962.00 | 24.00<br />
Minimum | 1949 | 4<br />
Maximum | 1980 | 43<br />
Legend: Mean, median, minimum and maximum year of birth and length of service of the test<br />
persons. The length of service column denotes the number of years the test persons have been<br />
working in the organization. N=31.
Latest education of the test persons<br />
Education | Frequency | Percent | Valid Percent | Cumulative Percent<br />
Valid: Internal clerk programme | 6 | 18.8 | 19.4 | 19.4<br />
Valid: Administrative assistant | 4 | 12.5 | 12.9 | 32.3<br />
Valid: Other vocational education and training | 2 | 6.3 | 6.5 | 38.7<br />
Valid: Bachelor degree | 1 | 3.1 | 3.2 | 41.9<br />
Valid: Medium-cycle higher education | 1 | 3.1 | 3.2 | 45.2<br />
Valid: Long-cycle higher education | 9 | 28.1 | 29.0 | 74.2<br />
Valid: Master's programme | 8 | 25.0 | 25.8 | 100.0<br />
Valid: Total | 31 | 96.9 | 100.0 | -<br />
Missing: System | 1 | 3.1 | - | -<br />
Total | 32 | 100.0 | - | -
Appendix 28: Supplementary search test tables<br />
Table 1: Reformulations in sessions<br />
System | Sim1 | Sim2 | Sim3 | NWT | Total<br />
System A | Min: 0, Max: 5, SD=1.5 | Min: 0, Max: 11, SD=2.8 | Min: 0, Max: 27, SD=6.6 | Min: 0, Max: 9, SD=2.7 | Min: 0, Max: 27, SD=4.0<br />
System B | Min: 0, Max: 18, SD=5.2 | Min: 2, Max: 10, SD=2.9 | Min: 0, Max: 15, SD=3.6 | Min: 0, Max: 10, SD=3.4 | Min: 0, Max: 18, SD=3.9<br />
Total | Min: 0, Max: 18, SD=3.9 (n=32) | Min: 0, Max: 11, SD=3.3 (n=32) | Min: 0, Max: 27, SD=5.3 (n=32) | Min: 0, Max: 10, SD=3.1 (n=32) | Min: 0, Max: 27, SD=4.0 (n=128)<br />
Table 2: Correlations of the number of search terms in queries and number of hits<br />
Correlations<br />
Variable and statistic | No. of terms in query | No. of hits<br />
No. of terms in query: Pearson Correlation | 1 | .200**<br />
No. of terms in query: Sig. (2-tailed) | - | .002<br />
No. of terms in query: N | 229 | 229<br />
No. of hits: Pearson Correlation | .200** | 1<br />
No. of hits: Sig. (2-tailed) | .002 | -<br />
No. of hits: N | 229 | 229<br />
**. Correlation is significant at the 0.01 level (2-tailed).<br />
Legend: The table displays the correlations for system A only, because the number of<br />
hits in system B is much lower due to categorization. Therefore, N=229.<br />
Table 3: Correlations between number of terms in queries and the succession of search tasks<br />
Correlations<br />
Variable and statistic | No. of terms in query | Succession of search task<br />
No. of terms in query: Pearson Correlation | 1 | .037<br />
No. of terms in query: Sig. (2-tailed) | - | .386<br />
No. of terms in query: N | 564 | 564<br />
Succession of search task: Pearson Correlation | .037 | 1<br />
Succession of search task: Sig. (2-tailed) | .386 | -<br />
Succession of search task: N | 564 | 564<br />
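The two-tailed significance values in the correlation tables can be approximated from the Pearson coefficient and N alone. The sketch below is an illustration under stated assumptions, not the SPSS procedure used in the thesis: it converts r to a t statistic and then uses a normal approximation to the t distribution, which is reasonable for samples of this size. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def pearson_two_tailed_p(r: float, n: int) -> float:
    """Approximate two-tailed p-value for a Pearson correlation r over n pairs."""
    # Test statistic t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom.
    t = r * math.sqrt((n - 2) / (1 - r * r))
    # Assumption: for large n the t distribution is close to the standard normal.
    return 2 * (1 - NormalDist().cdf(abs(t)))


# Table 2: number of terms in query vs. number of hits (system A).
print(round(pearson_two_tailed_p(0.200, 229), 3))
```

For Table 2 (r = .200, N = 229) this gives roughly .002, in line with the reported significance. Applied to Table 3 (r = .037, N = 564), the result is well above .05; small deviations from the reported value are to be expected because r is rounded to three decimals and the normal approximation is used.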
Table 4: Number of documents retrieved in system A and system B using the possible search<br />
operators (averages)<br />
System | FT | AW | ES | OW | Total<br />
Number of documents retrieved: System A | 548.3 (n=102) | 120.8 (n=110) | 27.5 (n=13) | 332.6 (n=4) | 309.6 (N=229)<br />
Number of documents retrieved: System B | 24.7 (n=133) | 10.2 (n=78) | 1.5 (n=2) | 18 (n=2) | 19.2 (N=215)<br />
Legend: FT=Free text, AW=Pages containing all words, ES=This exact sentence, OW=At least one<br />
of the words. For system B searches, N designates the number of searches actually carried<br />
out in system B (cf. section 8.2.3).<br />
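The Total column of Table 4 is consistent with an average weighted by the number of searches per operator. The sketch below checks this under that assumption; the thesis does not state how the totals were computed, and the function name is illustrative.

```python
def weighted_mean(averages_and_counts):
    """Combine per-operator averages into one average, weighting by search count."""
    total_count = sum(n for _, n in averages_and_counts)
    weighted_sum = sum(avg * n for avg, n in averages_and_counts)
    return weighted_sum / total_count


# (average number of documents retrieved, number of searches) for FT, AW, ES, OW.
system_a = [(548.3, 102), (120.8, 110), (27.5, 13), (332.6, 4)]
system_b = [(24.7, 133), (10.2, 78), (1.5, 2), (18.0, 2)]

print(round(weighted_mean(system_a), 1))
print(round(weighted_mean(system_b), 1))
```

Rounded to one decimal, the weighted means are 309.6 for system A and 19.2 for system B, matching the Total column.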
Table 5: Combinations of category reformulations with other types of reformulations in system B<br />
queries (percentages in parentheses)<br />
Reformulations | Query terms | Document type | Search operator<br />
Share of N=83 | 66 (79.5) | 18 (21.7) | 17 (20.5)<br />
Legend: The queries included in the table are system B queries containing category type<br />
reformulations in combination with the remaining types of reformulations. N=83.