
Automatic indexing in e-government - VBN - Aalborg Universitet



Automatic indexing in e-government:
Improved access to administrative documents for professional users?

Tanja Svarre

PhD thesis from Department of Communication

Aalborg University, Denmark




CIP – Cataloguing in Publication

Svarre, Tanja

Automatic indexing in e-government: Improved access to administrative documents for professional users? / Tanja Svarre. – Aalborg: Aalborg University, 2012. xiv, 328 p.

© Copyright Tanja Svarre 2012

All rights reserved


Automatisk indeksering indenfor e-government:
Forbedret adgang til administrative dokumenter for professionelle brugere?

Tanja Svarre

Ph.d.-afhandling fra Institut for Kommunikation, Aalborg Universitet


Acknowledgments

Finishing this thesis has been possible due to continuous support from colleagues, family, and friends. For this, I thank you all.

First and foremost I want to thank my supervisor, Professor Marianne Lykke. You have readily shared valuable knowledge, insight, and experiences, from which I have learned a lot. Your constructive feedback on written and oral work has been invaluable, and always moved the project forward. You have been an enthusiastic, flexible and helpful supervisor. For this I am very grateful.

Also, I am indebted to a number of present and former colleagues. First I want to thank former heads of research programme Pia Borlund and Jette Hyldegaard, former head of department Jesper W. Schneider, head of department Jack Andersen (all from RSLIS), director of doctoral school Ann Bygholm and head of department Christian Jantzen for professionally supporting me during the different phases of this project. My gratitude also goes to Professor Pia Borlund for planting the first seeds of my interest in research and for always readily discussing theoretical and empirical matters with great enthusiasm. To associate professor Jesper W. Schneider for being a persistent discussion partner and inspiration on statistical matters and questionnaires, and for good companionship on the IC3. To associate professors Haakon Lund and Birger Larsen for your flexible and patient support in various technical matters. And to former PhD student Charles Seger: despite the distance you have been a valuable support in good times and in bad, and a great travel companion. To assistant professor Mette Skov, thank you for your encouragement, for proofreading chapters, and for being a good colleague. PhD Brian Kirkegaard Lunn, you undertook extended responsibilities as my “buddy” and were an excellent partner in teaching. Thank you for your always open door, at the office and at home. Lastly, I want to thank all my colleagues at Friis for your warm welcome. It has been a pleasure to join you.

Further, I am indebted to Associate Professors Katriina Byström and Tom Nyvang, and Professor Gunilla Widén for joining the assessment committee and for making time for reading and commenting on the thesis. I highly appreciate your effort.

I also want to thank people outside the research community of Aalborg University and the Royal School of Library and Information Science. I am grateful to Professor Susanne Bødker for making my research visit to the Department of Computer Science, Aarhus University, possible, and to PhD Niels Mathiassen for good times during my stay.

I am also very thankful to the former National IT and Telecom Agency (IT & Telestyrelsen) for providing the topical frame for the project, and for supporting the project. Special thanks go to senior consultant Palle Aagaard, my contact person in the agency, for your interest in the project, for making room for practice-oriented perspectives on empirical matters, both during planning and analysis, for providing competent input for the project, and for always being available. I am very grateful to SKAT too for making the collaboration possible, and for making employees, office and IT facilities available for my empirical somersaults. In particular I want to express my gratitude to my contact person in SKAT, special consultant Ebbe Tor Andersen. You have been an enthusiastic source of inspiration, always seeing possibilities rather than limitations. I also want to thank all the participants of the project: the 340 questionnaire respondents, the 35 focus group participants and the 42 people participating in the search test, either as pilot testers or test persons. Thank you for all your input, your time, and your goodwill. And to my transcriber, Timo Iwersen, I am grateful for your effort in transforming the interviews into text.

Last, but by no means least, I am grateful to my boyfriend Sune, and my dear family and friends for your continuous patience with me during my writing and working on this project. Thank you for your persistence, your help and support, both mentally and in practical matters, for believing in me, and for still being there. Without you I could not have succeeded in finishing this thesis. Your dedication is highly treasured. To my girls, Annika and Maja, I am blessed to have you in my life. Your love carried me all the way through this project.



Abstract

The overall purpose of the present thesis is to investigate whether automatic assigned indexing methods can improve professional users’ access to work-based documents in the domain of e-government. The problem is investigated by means of a case study in the Danish tax authority, SKAT. An experimental comparative test was designed on the basis of a preceding domain study clarifying the seeking behaviour in e-government.

The introduction of e-government has arisen from a desire for effectiveness, efficiency and greater transparency in public administration. Today public-sector employees commonly carry out manual indexing of government documents. This thesis investigates whether automatic indexing can replace, and perhaps even improve, the current manual procedures in order to support efficiency and effectiveness.

An employee perspective guides the thesis. That involves a user group with great knowledge of the topic they are working with. In contrast to citizens and other e-government stakeholders, not much is known about the seeking behaviour of employees in the domain. In addition, the introduction of e-government is expected to change employees’ work tasks, and with that their information needs. That calls for an investigation of the present information seeking behaviour of e-government employees. In the thesis this is done by means of a domain study. The study is based on a questionnaire distributed to employees in SKAT and subsequent focus group interviews.

The domain study shows that the employees frequently use a number of primarily online information sources to solve their work tasks. The employees primarily have verificative and conscious topical information needs. In addition, they are experienced information searchers who request more extensive metadata in the system forming the basis of the search test: their intranet.

The knowledge gained from the domain study was incorporated into the search test design. The test was an experimental test comparing automatic extracted indexing (free text indexing) and automatic assigned indexing (categorization). In the assigned indexing a domain-specific taxonomy formed the basis of the categories. The test system was a prototype of a future version of SKAT’s intranet. Thirty-two test persons carried out searches with the two indexing types in two systems that were separate in an experimental sense. Three simulated search tasks and one genuine search task guided the searches. The simulated search tasks were designed in accordance with the findings from the domain study regarding the information needs of the employees. The test showed that the two automatic types of indexing are each useful to the employees in their own way. At a general level extracted indexing had the best performance measured in terms of the average number of terms and concepts in queries, the number of sessions with reformulations, and the number of reformulations in sessions. This showed that the system with categorization demanded more from the test persons than the free text indexing.

It turned out that the test persons had difficulties using the categorization in some respects. It was not relevant to them if they had already retrieved a highly relevant document at a high rank before using the categorization. Nor did they find it relevant if the initial search retrieved very few results. In those cases it was easier for them to go through the results manually. In contrast, the categorization was helpful in identifying new facets of a search task and in suggesting new search terms for reformulations. For future e-government indexing guidelines this resulted in the recommendation that both assigned and extracted indexing should be available as search facilitators, as each supports its own aspects of the information needs arising for employees in e-government.

The thesis contributes by providing new insights into the information seeking behaviour of employees in e-government and the way in which this behaviour can be supported by automatic indexing.



Abstract in Danish (English translation)

The purpose of the present thesis is to uncover whether automatic indexing can improve employees’ access to work-based information seeking within the domain of e-government. In the PhD project the problem is investigated as a case study at SKAT. More precisely, a comparative search test is carried out. The test is designed on the basis of a preceding domain study that clarifies the seeking behaviour within e-government.

The introduction of e-government has arisen from a desire for greater efficiency and increased openness in public administration. Today it is common for public employees to index the documents of administrations manually. Since one of the reasons for digitising administrations is precisely a desire for increased efficiency, this thesis investigates whether automatic indexing of documents can replace, and perhaps even improve, the manual indexing in the domain.

In the thesis the problem is viewed from an employee perspective. This involves a user group with great knowledge of the subject they work with. In contrast to citizens, for example, not much is known in the e-government literature about the information seeking behaviour of employees. Since the digitisation of administrations is at the same time expected to affect employees’ work tasks, and since work tasks are known to influence the information needs that information seekers develop, a need arises to uncover what characterises the seeking behaviour of employees in public administration today. In the thesis this has been done by means of a domain study. The domain study is based on a questionnaire survey among employees in a public administration, together with follow-up focus group interviews. The domain study showed that the employees use a number of different information systems in their work, and that they do so frequently when solving their tasks. They primarily have verificative and conscious topical information needs. In addition, they are experienced searchers who request far more metadata in the system that forms the basis of the search test, especially content-related metadata.

The experiences from the domain study were incorporated into the design of the search test. The test is a comparative test that compares automatic extracted indexing (free text indexing) with automatic assigned indexing (categorization) based on a domain-specific taxonomy. The test system is a prototype of the employees’ forthcoming intranet. Thirty-two test participants searched in the two systems on the basis of three supplied search tasks and one of their own. The constructed search tasks were designed in accordance with what the domain study had shown about the employees’ information needs. The test showed that the two forms of indexing are each useful in their own way. Overall, the extracted indexing had the best performance measured by the number of words and concepts used in queries, the number of sessions containing reformulations, and the number of reformulations in those sessions. This showed that the system with categorization demanded more of the users, both in terms of the number of searches and in terms of the number of terms and concepts entered.

It turned out that the test persons had problems using the categorization in some contexts. It was not relevant to them if a highly relevant document appeared among the first search results without their having used the categorization. Nor did they find it relevant if the search itself returned so few results that it was quicker to look through them manually. On the other hand, the categorization could help them identify new facets in search tasks and suggest new search terms in connection with reformulations. For the further work on indexing guidelines, this result led to a recommendation that both types should be present in the indexing of digital administrations, since they cover different aspects of the information needs that arise among employees in e-government.

As a whole, the thesis contributes new knowledge about the relationship between the information seeking behaviour of employees in digital administrations and the way in which the identified behaviour can be supported by means of automatic indexing.


Table of contents

1 INTRODUCTION
1.1 RESEARCH OBJECTIVE
1.2 EMPIRICAL ASSUMPTIONS
1.3 MOTIVATIONS FOR THE THESIS
1.4 RESEARCH QUESTIONS
1.5 STRUCTURE OF THE THESIS

2 METHODOLOGICAL FRAMEWORK
2.1 A COGNITIVE FRAMEWORK FOR INFORMATION RESEARCH
2.1.1 Towards a holistic cognitive framework
2.1.2 The role of work tasks
2.2 THE COGNITIVE FRAMEWORK AND THE THESIS
2.3 OVERALL RESEARCH METHOD: CASE STUDY
2.4 THE CASE: SKAT
2.4.1 The intranet
2.4.2 The intranet taxonomy
2.5 SUMMARY

3 THE E-GOVERNMENT DOMAIN
3.1 DEFINITION AND PURPOSE
3.2 SUBJECT AREAS IN E-GOVERNMENT RESEARCH & DEVELOPMENT (R&D)
3.3 STAKEHOLDERS IN E-GOVERNMENT
3.4 LIS PERSPECTIVES ON E-GOVERNMENT
3.4.1 Information systems
3.4.2 Knowledge management
3.4.3 ICT tools: Metadata initiatives
3.5 SUMMARY

4 SEEKING BEHAVIOUR IN E-GOVERNMENT
4.1 INFORMATION SEEKING AND RELATED CONCEPTS
4.2 THE PURPOSE OF SEEKING STUDIES
4.3 ENTITIES OF E-GOVERNMENT: STUDIES OF SEEKING BEHAVIOR
4.4 E-GOVERNMENT EMPLOYEE INFORMATION SEEKING
4.4.1 Project INISS
4.4.2 System development in the Danish Parliament
4.4.3 Information behavior of employees in an engineering and technical service government office
4.4.4 Federal, state, and local policy makers’ selection of information sources
4.4.5 Finnish municipal employees
4.4.6 Users of the European Parliamentary Documentation Centre
4.4.7 Information literacy of Scottish government civil service staff
4.4.8 Civil servants’ internet skills
4.5 RELATED STUDIES OF INFORMATION SEEKING AND SEARCHING
4.5.1 Legal seeking behavior
4.5.2 Information behaviour of software engineers
4.5.3 Professional seeking behaviour
4.6 SUMMARY

5 INDEXING OF ELECTRONIC DOCUMENTS
5.1 THE PROCESS OF INDEXING
5.2 QUALITY OF INDEXING
5.2.1 Specificity
5.2.2 Exhaustivity
5.2.3 Consistency
5.2.4 Performance measures
5.3 APPROACHES TO INDEXING
5.3.1 Document, user, and domain oriented indexing
5.3.2 Controlled vs. uncontrolled indexing
5.3.3 Intellectual vs. automatic indexing
5.4 APPROACHES TO AUTOMATIC INDEXING
5.4.1 Automatic extracted indexing
5.4.1.1 Lexical analysis and stop word lists
5.4.1.2 Stemming
5.4.1.3 Weighting factors
5.4.1.4 Compound nouns as index terms
5.4.1.5 Extracted indexing
5.4.2 Automatic assigned indexing
5.5 HYBRID TYPES OF INTELLECTUAL AND AUTOMATIC INDEXING
5.6 SUMMARY

6 EMPIRICAL FRAMEWORK ............................................................................................. 111<br />

6.1 DOMAIN STUDY ........................................................................................................................ 111<br />

6.2 QUESTIONNAIRE DESIGN, COLLECTION, AND ANALYSIS ........................................................... 113<br />

6.2.1 Technique and structure .......................................................................................................... 114<br />

6.2.2 Content .................................................................................................................................... 115<br />

6.2.2.1 Background data ...................................................................................................................................... 116<br />

6.2.2.2 Work tasks ............................................................................................................................................... 116<br />

6.2.2.3 Elaboration of work tasks ........................................................................................................................ 117<br />

6.2.3 Data collection ........................................................................................................................ 121<br />

6.2.4 Pilot testing .............................................................................................................................. 121<br />

6.2.5 Data analysis ........................................................................................................................... 122<br />

6.2.6 Methodical reflections ............................................................................................................. 124<br />

6.3 FOCUS GROUP METHOD ............................................................................................................ 125<br />

6.3.1 Purpose and design ................................................................................................................. 125<br />

6.3.2 Data collection: Interview guide ............................................................................................. 127<br />

6.3.3 Execution and documentation ................................................................................................. 127<br />

6.3.4 Data analysis ........................................................................................................................... 128<br />

6.3.5 Limitations ............................................................................................................................... 130<br />

6.4 SEARCH TEST DESIGN ............................................................................................................... 130<br />

6.4.1 Test system ............................................................................................................................... 130<br />

6.4.2 Test persons ............................................................................................................................. 134<br />

6.4.3 Search tasks ............................................................................................................................. 135<br />

6.4.4 Test procedure ......................................................................................................................... 138<br />

6.4.5 Pilot test ................................................................................................................................... 140<br />

6.4.6 Techniques for data collection and preparation ...................................................................... 141<br />

6.4.7 Data analysis ........................................................................................................................... 142<br />

6.5 LIMITATIONS ............................................................................................................................ 148<br />

6.6 RELATION BETWEEN RESEARCH METHOD AND RESEARCH QUESTIONS ..................................... 149<br />

7 DOMAIN STUDY RESULTS .............................................................................................. 151<br />

7.1 QUESTIONNAIRE RESPONDENTS, THEIR BACKGROUND AND WORK TASKS ................................ 151<br />

7.2 CHARACTERISTICS OF FOCUS GROUP PARTICIPANTS ................................................................. 155<br />

7.3 RESULTS REGARDING PROFESSIONAL E-GOVERNMENT SEEKING BEHAVIOR ............................. 156<br />

7.3.1 Use of information sources ...................................................................................................... 157<br />

7.3.1.1 Reference works ...................................................................................................................................... 157<br />

7.3.1.2 Web sites ................................................................................................................................................. 158<br />

7.3.1.3 Internal systems ....................................................................................................................................... 159<br />

7.3.2 Colleagues as sources of information ...................................................................................... 165<br />

7.4 SEEKING RESULTS REGARDING DEMANDS FOR INDEXING IN E-GOVERNMENT ........................... 167<br />

7.4.1 The frequency on information seeking ..................................................................................... 167<br />

7.4.2 Types of information needs ...................................................................................................... 173<br />

7.4.3 Preferred metadata .................................................................................................................. 177<br />

7.5 SUMMARY AND IMPLICATIONS FOR INDEXING .......................................................................... 182<br />

8 SEARCH TEST RESULTS .................................................................................................. 185<br />

8.1 THE TEST PERSONS ................................................................................................................... 185<br />

8.2 OVERALL SEARCHING BEHAVIOUR AND PERFORMANCE ........................................................... 188<br />

8.2.1 The search situation ................................................................................................................ 191<br />

8.2.1.1 Sessions ................................................................................................................................................... 191<br />

8.2.1.2 Queries .................................................................................................................................................... 192<br />

8.2.1.3 Search operators ...................................................................................................................................... 196<br />

8.2.1.4 Filtering by metadata ............................................................................................................................... 198<br />

8.2.2 Reformulations ........................................................................................................................ 202<br />

8.2.3 Combined system B sessions and queries ................................................................................ 206<br />

8.3 SUMMARY AND PERFORMANCE IMPLICATIONS FOR FUTURE INDEXING IN E-GOVERNMENT ...... 211<br />

9 CONCLUSION AND RECOMMENDATIONS FOR FUTURE WORK ........................ 215<br />

9.1 SUMMARY OF EMPIRICAL FINDINGS .......................................................................................... 215<br />

9.2 CONTRIBUTIONS OF THE THESIS ............................................................................................... 219<br />

9.3 RECOMMENDATIONS FOR FUTURE WORK ................................................................................. 220<br />

10 REFERENCES ...................................................................................................................... 223<br />

List of abbreviations ................................................................................................................................................... 245<br />

Appendices 247<br />

Appendix 1: Generic work tasks at SKAT ................................................................................................................. 249<br />

Appendix 2: Distribution of employees across main processes in the business model .............................................. 253<br />

Appendix 3: E-mail invitation to employees .............................................................................................................. 255<br />

Appendix 4: Questions contained in questionnaire .................................................................................................... 257<br />

Appendix 5: Questionnaire pilot test data .................................................................................................................. 259<br />

Appendix 6: Link to questionnaire ............................................................................................................................. 261<br />

Appendix 7: Dates for the conduct of focus group interviews ................................................................................... 263<br />

Appendix 8: Example of the slides guiding a focus group interview ......................................................................... 265<br />

Appendix 9: Focus group interview guide ................................................................................................................. 275<br />

Appendix 10: Transcription conventions ................................................................................................................... 277<br />

Appendix 11: Verbatim Danish versions of quotes used in the thesis ........................................................................ 279<br />

Appendix 12: E-mail invitation to participate in search test ...................................................................................... 287<br />

Appendix 13: Questionnaire for recruiting test persons for the search test ................................................................ 291<br />

Appendix 14: Simulated search tasks ......................................................................................................................... 293<br />

Appendix 15: Test persons’ insight into simulated search tasks ................................................................................ 295<br />

Appendix 16: E-mail concerning naturalistic information needs ............................................................................... 297<br />

Appendix 17: Instructions for search test persons ...................................................................................................... 299<br />

Appendix 18: Rotation of search tasks ....................................................................................................................... 303<br />

Appendix 19: Search test <strong>in</strong>terview guide .................................................................................................................. 305<br />

Appendix 20: Judgement of the relevance of retrieved documents in search test ...................................................... 307<br />

Appendix 21: Completeness degree of questionnaire responses ................................................................................ 309<br />

Appendix 22: Respondents’ experience with work tasks ........................................................................................... 311<br />

Appendix 23: Age distribution of population, respondents and test persons .............................................................. 313<br />

Appendix 24: Respondents’ length of service in the organization ............................................................................. 315<br />

Appendix 25: Focus group participants work tasks .................................................................................................... 317<br />

Appendix 26: Additional sources mentioned by respondents .................................................................................... 319<br />

Appendix 27: Test persons’ background data ............................................................................................................ 325<br />

Appendix 28: Supplementary search test tables ......................................................................................................... 327<br />

List of figures<br />

Figure 2.1 The participating actors in context. Model adapted from Ingwersen & Järvelin (2005, p.<br />

261) with minor corrections. ........................................................................................................... 12<br />

Figure 2.2 Extension of the cognitive view, the interactive process of IR and affecting factors.<br />

Adapted from Ingwersen & Järvelin (2005, p. 274) with minor corrections. .................................. 14<br />

Figure 2.3 Information behaviour and the influence from job- or non-job related tasks. Adapted<br />

from Ingwersen & Järvelin (2005, p. 198). .................................................. 16<br />

Figure 2.4 SKAT's revised business model ................................................................................................. 18<br />

Figure 2.5 Screen dump from existing intranet interface ........................................................................... 20<br />

Figure 3.1 Disciplines integrated in the multidisciplinary research field of e-government. Adapted<br />

from Wimmer (2007, p. 14) ............................................................................................................ 27<br />

Figure 3.2 Basic elements and relations in governmental systems (Grönlund, 2003, p. 56) ...................... 28<br />

Figure 3.3 E-government hype cycle (Schellong, 2007) ............................................................................ 30<br />

Figure 3.4 Dimensions and stages in e-government (from Layne & Lee, 2001, p. 124) ............................ 32<br />

Figure 4.1 Nested model of information seeking and information searching (Wilson, 1999, p. 263) ........ 48<br />

Figure 4.2 Comprehensive model of information seeking. Adapted from Johnson et al. (1995). ............. 56<br />

Figure 4.3 Model of cognitive factors affecting information seeking in the domain of software<br />

engineering. Adapted from Freund, Toms & Waterhouse (2005). ................................................. 66<br />

Figure 4.4 The process of information seeking of professionals. Adapted from Leckie, Pettigrew &<br />

Sylvain (1996, p. 180) ..................................................................................................................... 68<br />

Figure 5.1 Illustration of the subject indexing process (Mai, 2000, p. 279). ............................................. 73<br />

Figure 5.2 Document and domain oriented approaches to indexing. Adapted from Mai (2005, p.<br />

607) ................................................................................................................................................. 81<br />

Figure 5.3 Types of vocabularies and their relationships. Adapted from Morville & Rosenfeld<br />

(2007, p. 195) .................................................................................................................................. 82<br />

Figure 5.4 Generalized characteristics of intellectual indexing. Accumulated on the basis of<br />

Rafferty & Hidderley (2007). .......................................................................................................... 89<br />

Figure 5.5 The resolving power of significant index terms. Adapted from Luhn (1958a, p. 161) ............. 97<br />

Figure 6.1 Screen dump from atlas.ti coding of focus group interviews .................................................. 129<br />

Figure 6.2 Screen dump of the test system: Search fields ........................................................................ 131<br />

Figure 6.3 Screen dump of the test system: Categorization...................................................................... 133<br />

Figure 6.4 Relevance types in IR evaluation adapted from Borlund (2003a, p. 915). .............................. 139<br />

List of tables<br />

Table 1.1 Timeline for data collection in the PhD project ............................................................................ 7<br />

Table 3.1 Stakeholders in e-government. Adapted from Rowley (2011, p. 56) ......................................... 35<br />

Table 3.2 Knowledge management processes and the potential role of IT. Adapted from Alavi &<br />

Leidner (2001, p. 125) ..................................................................................................................... 41<br />

Table 4.1 Examples of studies that have examined information seeking and/or searching of various<br />

stakeholder roles.............................................................................................................................. 52<br />

Table 5.1 Possible factors affecting consistency. From Lancaster (2003, p. 71). ...................................... 77<br />

Table 5.2 Summary of strengths and weaknesses of controlled vocabularies and free text. Adapted<br />

from Dubois (1987, p. 249). ............................................................................................................ 84<br />

Table 6.1 Indicators of information needs in questionnaire and corresponding theoretical<br />

descriptions ................................................................................................................................... 119<br />

Table 6.2 List of respondents' preferred metadata listed in questionnaire ................................................ 120<br />

Table 6.3 Cross tabulations carried out on the basis of variables in questionnaire data ........................... 123<br />

Table 6.4 Overview of participants in focus groups ................................................................................. 126<br />

Table 6.5 Examples of genuine search tasks ............................................................................................ 137<br />

Table 6.6 Search test variables, their definition and measurement ........................................................... 144<br />

Table 6.7 Simulated search task facets ..................................................................................................... 146<br />

Table 6.8 Outline of the relation between research questions and empirical data .................................... 150<br />

Table 7.1 Distribution of respondents as to their education (percentages) ............................................... 152<br />

Table 7.2 Number of work tasks selected by respondents ........................................................................ 153<br />

Table 7.3 Ranked frequency of work tasks in questionnaire results ......................................................... 154<br />

Table 7.4 Focus group participants' educational background ................................................................... 156<br />

Table 7.5 Respondents' use of predefined information sources (percentages) (to be continued on<br />

the succeeding page) ..................................................................................................................... 160<br />

Table 7.6 Questionnaire results regarding the frequency of information seeking .................................... 166<br />

Table 7.7 Distribution of indicators of information needs ........................................................................ 172<br />

Table 7.8 Average percentage distribution of verificative needs (VN), conscious topical needs<br />

(CTN), and muddled topical needs (MTN). .................................................................................. 175<br />

Table 7.9 Metadata preferences distributed across work tasks ................................................................. 178<br />

Table 8.1 Frequency of test persons' intranet use ..................................................................................... 185<br />

Table 8.2 Ranking of test persons' most important information sources .................................................. 186<br />

Table 8.3 General evaluation of simulated search tasks in system a, system b, and total (averages) ....... 187<br />

Table 8.4 Evaluation of simulated search tasks specified to single simulated search tasks<br />

(averages) ...................................................................................................................................... 187<br />

Table 8.5 General findings of variables in search test .............................................................................. 188<br />

Table 8.6 Session success (percentages) .................................................................................................. 189<br />

Table 8.7 Query success (percentages) ..................................................................................................... 190<br />

Table 8.8 Number of queries in sessions at task level (averages) ............................................................ 191<br />

Table 8.9 Number of queries in sessions as to success or failure (averages) ............................................ 192<br />

Table 8.10 Number of search terms in queries (averages) ........................................................................ 193<br />

Table 8.11 Number of search keys in queries (averages) ......................................................................... 193<br />

Table 8.12 Number of search terms in queries as to success or failure (averages) ................................... 194<br />

Table 8.13 Number of search keys in queries as to success or failure (averages) .................................... 194<br />

Table 8.14 Distribution of search operator in queries (percentages) ........................................................ 195<br />

Table 8.15 Number of search terms used with search operators in queries (averages) ............................ 196<br />

Table 8.16 Success of search operators (percentages) .............................................................................. 198<br />

Table 8.17 Document type filter used in queries (percentages) ................................................................ 200<br />

Table 8.18 Search success for the document type filter in system A and system B queries<br />

(percentages) ................................................................................................................................. 201<br />

Table 8.19 Number of sessions with query reformulations (percentages) ................................................ 202<br />

Table 8.20 Number of reformulations in sessions .................................................................................... 203<br />

Table 8.21 Types of reformulations for all queries (percentages) ............................................................ 204<br />

Table 8.22 Query success on the basis of types of reformulations (percentages) ..................................... 205<br />

Table 8.23 Sessions carried out in system B, or in a combination of system B and system A:<br />

Frequency and success (percentages) ............................................................................................ 207<br />

Table 8.24 System of successful queries in combined system B sessions ................................................ 208<br />

Table 8.25 System B queries: Frequency of category use and query success (percentages) .................... 208<br />

1 Introduction<br />

Chapter 1<br />

Indexing has been carried out for centuries, starting with manual indexing. In the middle<br />

of the last century, automatic methods were introduced as a counterpart. Though both<br />

manual and automatic indexing have been studied both theoretically and empirically,<br />

researchers are still able to identify gaps in our knowledge of indexing in terms of<br />

quality, issues of cost-effectiveness, our understanding of the effect of indexers' and<br />

information users’ cognitive processes, and the like (e.g., Milstead, 1994; Anderson &<br />

Perez-Carballo, 2001a, 2001b). The present PhD project explores indexing in the<br />

context of a specific domain: e-government. Specifically, we investigate the<br />

performance of two methods for subject indexing in the domain of e-government. The<br />

purpose of the investigation is to work out a set of recommendations for<br />

indexing practice in e-government.<br />

Information overload is a widely recognized problem today (e.g., Edmunds &<br />

Morris, 2000; Eppler & Mengis, 2004; Codagnone & Wimmer, 2007). Information<br />

overload is a challenge in both private and public organizations. At the same time, the<br />

importance of information in e-government should not be underestimated. According to<br />

Klischewski, “[d]ocument processing is at the core of administrative performance in<br />

several respects: Documents are the basis for almost all of the administrative<br />

processes, they are the most valuable resources to exploit as they are the main carriers<br />

of information and represent a large portion of the overall administrative knowledge<br />

base” (2006, p. 34). Thus, in democracies, documental support is a key issue for<br />

operations undertaken in public administrations (Kraemer & Dedrick, 1997;<br />

Klischewski, 2006; Sabucedo & Rifón, 2006). The consequences of not being able to<br />

find the needed documents for a given task have previously been considered. The<br />

calculations carried out by Feldman & Sherman suggest that supporting corporate<br />

users’ searching for information is one step towards efficiency and effectiveness<br />

(Glazer, 1993; Feldman & Sherman, 2001). In addition, public administrations are<br />

expected to offer security to the public. Not being able to find needed information can<br />

have severe costs (Kraemer & Dedrick, 1997). Studies have indicated that the facilities<br />

of e-government systems still leave room for improvement, for instance in terms of<br />

searching (e.g., Goh et al., 2008), navigation (e.g., de Jong & Lentz, 2006), and the extent of<br />

metadata adoption (e.g., Kopackova, Michalek & Cejna, 2010). In sum, the support of<br />

access to information should be given high priority if the aim is effective, efficient, and<br />

secure governments.<br />

Edmunds & Morris (2000) mention different methods to reduce information<br />

overload in organizations, e.g., value-added information. Value-added information<br />

limits information overload while increasing users’ access to relevant<br />

information. A concrete way of adding value to information in e-government documents<br />

is the assignment of metadata. Assignment of metadata in the domain serves several<br />

purposes, namely allowing interoperability between systems and enabling users to<br />

retrieve better and more precise search results (Moen, 2001; Tambouris, Manouselis &<br />

Costopoulou, 2007). Further, metadata ease knowledge sharing between employees in<br />

e-government internally in organizations as well as externally (Schwartz, Divitini &<br />

Brasethvik, 2000; Choo, 2006). The multiplicity of metadata standards developed<br />

specifically for e-government reflects that governments are very well aware of the<br />

importance of metadata (cf. Tambouris, Manouselis & Costopoulou, 2007). Metadata<br />

can be assigned either manually by humans or automatically on the basis of a<br />

machine-generated analysis of the words constituting the documents. In e-government, the<br />

predominant approach is manual assignment. With the present thesis we want to<br />

investigate whether the use of automatic indexing can be a means to effective and<br />

efficient governments that concurrently can support the important process of<br />

information seeking in the domain.<br />

The concept of e-<strong>government</strong> designates <strong>government</strong>s that utilize ICT <strong>in</strong> order<br />

to communicate with and allow access to <strong>in</strong>formation for external parties such as<br />

citizens, bus<strong>in</strong>esses and other <strong>government</strong>s (e.g., Fang, 2002; Jaeger, 2003; Grant &<br />

Chau, 2005). A variety of purposes for e-<strong>government</strong> can be identified <strong>in</strong> the literature.<br />

The more important ones are openness, improved and more flexible services for citizens<br />

and bus<strong>in</strong>esses, and <strong>in</strong>creased coherence and efficiency of <strong>government</strong>al processes (e.g.,<br />

Grönlund & Horan, 2004). The concept of e-<strong>government</strong> has emerged worldwide<br />

over the past decade. The Scandinavian countries have been pioneers in the process<br />

of digitalizing governments. As a result, Scandinavia has appeared favourably<br />

in the various international e-government indexes (Andersen et al., 2005; Henriksen &<br />

Damsgaard, 2006). In Denmark, three successive strategies have formed the basis for the<br />

development of e-<strong>government</strong> with<strong>in</strong> the framework of Project Digital Government. In<br />

2002 “Towards e-<strong>government</strong>: vision and strategy for the public sector <strong>in</strong> Denmark”<br />

(Project Digital Government & The Digital Taskforce, 2002) was published. In 2004,<br />

“The Danish eGovernment Strategy 2004-2006: realis<strong>in</strong>g the potential” (The Danish<br />

Government et al., 2004) followed. The latest strategy, “The Danish E-Government<br />

Strategy 2007-2010: Towards Better Digital Service, Increased Efficiency, and Stronger<br />

Collaboration” (The Danish Government, Local Government Denmark (LGDK) &<br />

Danish Regions, 2007) appeared in 2007. The strategies have been carried out as a<br />

cooperation between the most important actors in the Danish governmental system: the<br />

government, the regions, and the municipalities. Altogether, the strategies cover the<br />

period 2001-2010. During the decade in which they have been in effect, the strategies have<br />

become increasingly specific as knowledge of e-government has increased.<br />

In the two latest strategies, automation of employees’ work processes<br />

has been specifically addressed as a means of reducing the use of resources. The<br />

principle of effectiveness is carried on in the recent mandate for a new strategy that<br />

replaces the exist<strong>in</strong>g strategy <strong>in</strong> 2011 (The Danish Government, Local Government<br />

Denmark & Danish Regions, 2010). Automation of <strong><strong>in</strong>dex<strong>in</strong>g</strong> procedures may thus<br />

support the e-<strong>government</strong> strategy <strong>in</strong> terms of reduc<strong>in</strong>g the resources spent on carry<strong>in</strong>g<br />

out <strong><strong>in</strong>dex<strong>in</strong>g</strong> and search<strong>in</strong>g for <strong>in</strong>formation.<br />

1.1 Research objective<br />

The PhD project has been financed by the National IT and Telecom Agency,<br />

the Royal School of Library and Information Science, and the Department of<br />

Communication, Aalborg University. The overall project idea originated from the<br />

National IT and Telecom Agency. The Agency requested a set of guidelines for the<br />

application of automatic indexing methods that could be used for the Agency’s work<br />

with standardization and interoperability in the Danish public sector. We have met this<br />

assignment by focusing on two indexing methods: automatically extracted indexing<br />

(full text indexing) and automatically assigned indexing (automatic categorization).<br />

Thus, the objective is to evaluate whether automatic categorization as an approach to<br />

automatic indexing can improve retrieval performance in e-government in a<br />

professional context. We use the case study as our general methodical<br />

approach. The Danish tax authority, SKAT, has willingly agreed to serve as our<br />

case. We investigate employees at SKAT, that is, professional users of information.<br />

Compared to e-government customers (e.g., citizens and businesses), our target group<br />

constitutes a homogeneous user group.<br />

S<strong>in</strong>ce e-<strong>government</strong> represents a specific doma<strong>in</strong>, we carry out the empirical<br />

investigation of the overall research problem in two parts. First, we analyse the specific<br />

characteristics of the domain. For this purpose we use a questionnaire to gain an<br />

overview of the organization. Subsequently, focus group interviews are employed in<br />

order to explain and expand on the results of the questionnaire survey. The questionnaire<br />

is used to collect data on the employees’ frequency of <strong>in</strong>formation seek<strong>in</strong>g, the types of<br />

<strong>in</strong>formation needs developed, use of <strong>in</strong>formation sources, and metadata preferences <strong>in</strong><br />

relation to specific work tasks in the organization. The assumption is that the importance of<br />

<strong>in</strong>formation may depend on the work task <strong>in</strong> question. We refer to this first part of the<br />

empirical foundation for the thesis as the doma<strong>in</strong> study.<br />

The second part of the data collection consists of a search test specifically<br />

investigating the performance of the two indexing methods mentioned above. We use<br />

knowledge gained from the domain study to<br />

qualify the design of the search test. The search test investigates the performance of two test<br />

systems. Both test systems employ automatic indexing: one extracted (free text<br />

indexing) and one assigned (automatic categorization). Three simulated search jobs and one real<br />

search job form the basis of the test persons’ evaluation of the performance of the test<br />

systems. The relevance of the search results is evaluated by the test persons. The test<br />

sessions conclude with a short interview.<br />

1.2 Empirical assumptions<br />

The empirical design of the PhD project has been guided by our<br />

methodological start<strong>in</strong>g po<strong>in</strong>t: the cognitive view of <strong>in</strong>formation seek<strong>in</strong>g and retrieval<br />

(cf., Ingwersen & Järvel<strong>in</strong>, 2005). The cognitive viewpo<strong>in</strong>t is methodologically<br />

considered with<strong>in</strong> the research tradition of cognitive constructivism (Talja, Tuom<strong>in</strong>en &<br />

Savolainen, 2005). The cognitive viewpoint emerged as a reaction to a biased focus<br />

on users in the user-oriented research tradition and on systems in the system-oriented<br />

research tradition. Thus, the cognitive viewpoint aims at a holistic view of the process<br />

of IR interaction in order to achieve integration between the user-oriented and the<br />

system-driven research traditions (e.g., Ingwersen, 1992, 1996; Ingwersen & Järvelin,<br />

2005). The cognitive view emphasizes the cognitive actors <strong>in</strong>teract<strong>in</strong>g <strong>in</strong> <strong>in</strong>formation<br />

seek<strong>in</strong>g and retrieval. With this view of <strong>in</strong>formation seek<strong>in</strong>g and retrieval, the users and<br />

the <strong>in</strong>formation system must be taken <strong>in</strong>to account when test<strong>in</strong>g performance of an<br />

<strong>in</strong>formation system. As a consequence we test the performance of <strong><strong>in</strong>dex<strong>in</strong>g</strong> methods by<br />

<strong>in</strong>volv<strong>in</strong>g real, potential users <strong>in</strong> the search test. Further, we apply an established<br />

evaluation method for the search test, namely simulated search tasks, which have been<br />

suggested by Borlund (Borlund & Ingwersen, 1997; Borlund, 2000, 2003b). The<br />

purpose of simulated search tasks is to be able to evaluate IR systems <strong>in</strong> a way that<br />

ensures both realism and experimental control.<br />

In the latest presentations of the cognitive view the importance of context <strong>in</strong><br />

information seeking and retrieval has received greater emphasis (e.g., Ingwersen &<br />

Järvel<strong>in</strong>, 2005). The cognitive structures of the <strong>in</strong>dividual still constitute the core of the<br />

viewpo<strong>in</strong>t, but the context is considered an <strong>in</strong>fluential component <strong>in</strong> <strong>in</strong>formation<br />

seek<strong>in</strong>g and retrieval. Accord<strong>in</strong>g to Ingwersen & Järvel<strong>in</strong> (2005, p. 19): “...actors and<br />

other components function as context to one another <strong>in</strong> the <strong>in</strong>teraction process. There<br />

are social, organizational, cultural as well as systemic contexts, which evolve over<br />

time.” The distinct presence of the concept of context in the literature emphasizes that<br />

context must be considered a factor in the interaction process. The definition of what<br />

constitutes context has been discussed and operationalized in relation to information<br />

behaviour (cf., Courtright, 2007). In the present work we are concerned with a<br />

work-based, organizational context. This calls for a consideration of the influence of that<br />

specific context on the results of the search test. This is the main reason for carrying<br />

out the first part of the empirical data collection: the doma<strong>in</strong> study. We are not guided<br />

by the theoretical foundation of the doma<strong>in</strong> analysis as formulated by Hjørland and<br />

Albrechtsen (1995), since it is primarily concerned with scientific domains. Rather, we<br />

are inspired by studies similar to the present domain study. Examples include Leckie,<br />

Pettigrew & Sylva<strong>in</strong> (1996), Nielsen (2001) and Freund, Toms & Waterhouse (2005).<br />

1.3 Motivations for the thesis<br />

The present research is motivated by different conditions. We have already<br />

mentioned one of the basic premises of e-<strong>government</strong>, namely effectiveness and<br />

efficiency. With the present study we want to investigate whether automatic indexing<br />

in the form of automatic categorization can contribute to this premise. The motivation<br />

for the study relates to two different aspects: indexing and the target group in question.<br />

The seeking behaviour of e-government employees is, to our knowledge, not<br />

very well explored. By comparison, numerous studies have been made of the customers<br />

of e-government (e.g., citizens and businesses) in order to evaluate their use of e-government<br />

solutions. Reviews can be found in Robbin, Courtright & Davis (2004) and<br />

Case (2006). A basic premise for the thesis is that we need to know what characterizes<br />

e-<strong>government</strong> employees’ seek<strong>in</strong>g behaviour and the role of <strong>in</strong>formation <strong>in</strong> the doma<strong>in</strong>


<strong>in</strong> order to be able to tailor the <strong><strong>in</strong>dex<strong>in</strong>g</strong> to the seek<strong>in</strong>g behaviour and <strong>in</strong>formation needs<br />

actually experienced by the employees. We will present the studies that do<br />

inform us about e-government users’ seeking behaviour in chapter 3.<br />

As for automatic <strong><strong>in</strong>dex<strong>in</strong>g</strong>, there are different motivations for suggest<strong>in</strong>g<br />

automatic categorization <strong>in</strong> the present context. Manual assignment of metadata is a<br />

costly and time-consuming process for administrative employees. If automatic<br />

categorization proves to support and improve the information seeking of the thesis<br />

target group, it would at the same time support the intentions of increased<br />

effectiveness and efficiency in e-government. Also, the literature has demonstrated that<br />

ensuring quality and consistency in manually added metadata can be difficult (Anderson<br />

& Perez-Carballo, 2001a; Lancaster, 2003). Thus, manual indexing tends to vary<br />

both across indexers (inter-indexer consistency) and over time (intra-indexer<br />

consistency). Further, in the field of US federal records management, Sprehe, McClure<br />

& Zellner (2002) found that different situational factors affected the quality of federal<br />

employees’ record keeping, causing the quality of records management to diverge across<br />

governments. Factors like availability of resources and guidance, the motivation of the<br />

employees, and efficiency of access to records appeared to affect the quality of<br />

records management in the study. In a recent study of metadata assignment in a Finnish<br />

government setting, the researchers found that employees prefer not to assign metadata when<br />

they have the option. Also, the employees tend to accept default values whenever they<br />

are available (Kettunen & Henttonen, 2010). The results suggest that e-government<br />

indexing might benefit from an automatic solution to indexing in a number of ways.<br />

The literature has already demonstrated that the assignment of metadata is one among<br />

several prerequisites for retrieval and sharing of knowledge in organizations (e.g., Choo,<br />

2006). If automatic indexing can improve subject metadata, then there is reason to<br />

assume that the retrieval and sharing of knowledge in the domain is also positively<br />

influenced.<br />

To our knowledge, not much is known about how automatically extracted and<br />

automatically assigned <strong><strong>in</strong>dex<strong>in</strong>g</strong> methods supplement each other. The theory of<br />

polyrepresentation suggests that the more different types of representation, the more<br />

cognitive overlap there will be between the representations (Ingwersen, 1996;<br />

Ingwersen & Järvel<strong>in</strong>, 2005). Further, the comb<strong>in</strong>ation of approaches enables tak<strong>in</strong>g<br />

advantage of the strengths of each approach (Anderson & Perez-Carballo, 2001a).<br />

Table 1.1 Timel<strong>in</strong>e for data collection <strong>in</strong> the PhD project<br />

Period of time | Data type

December 2008 | Survey questionnaire

June-July 2009 | Focus group interviews

May 2010 | Recruitment questionnaire for search test

May-June 2010 | Search test

One f<strong>in</strong>al motivation for the thesis concerns automatic categorization.<br />

Categorization represents a structured way of offer<strong>in</strong>g users a subject based overview of<br />

search results. Categorization has been developed in different prototypes during the<br />

2000s, though rarely for intranets (Käki, 2005a). Thus, we want to investigate whether<br />

the use of categorization <strong>in</strong> e-<strong>government</strong> is consistent with exist<strong>in</strong>g studies of<br />

categorization.<br />

1.4 Research questions<br />

Our overall research question concerns the performance of indexing methods<br />

<strong>in</strong> the doma<strong>in</strong> of e-<strong>government</strong>. The overall research methodology is a s<strong>in</strong>gle case<br />

study. The specific research questions address the doma<strong>in</strong> study (research question 1)<br />

and the search test (research question 2) respectively.<br />

1. What characterizes e-government employees’ information seeking behaviour in<br />

relation to:<br />

1.1. Their use of <strong>in</strong>formation sources?<br />

1.2. Their frequency of <strong>in</strong>formation seek<strong>in</strong>g?<br />

1.3. Their <strong>in</strong>formation needs?<br />

1.4. Their metadata preferences?<br />

1.5. How does the seek<strong>in</strong>g behaviour affect demands for <strong><strong>in</strong>dex<strong>in</strong>g</strong>?<br />

The first research question and related sub questions are answered on the basis of the<br />

doma<strong>in</strong> study. The question and sub questions are answered by the quantitative data<br />

collected from the questionnaire and the qualitative follow up focus group <strong>in</strong>terviews<br />

(see timel<strong>in</strong>e of the data collection <strong>in</strong> Table 1.1). Thus the responses aim at provid<strong>in</strong>g a


quantitative answer, but also seek to offer explanations for the patterns identified <strong>in</strong> the<br />

questionnaire data.<br />

2. How do automatically extracted indexing and automatic categorization perform in<br />

relation to the identified domain characteristics with respect to:<br />

2.1. Number of queries <strong>in</strong> sessions?<br />

2.2. Number of terms <strong>in</strong> queries?<br />

2.3. Number of concepts <strong>in</strong> queries?<br />

2.4. The type of search operator applied?<br />

2.5. The use of document type filters?<br />

2.6. Number of reformulations?<br />

2.7. Types of reformulations?<br />

2.8. Degree of search success <strong>in</strong> queries and sessions?<br />

2.9. Overall performance measured by performance measures?<br />

2.10. What implications does the performance of different indexing methods have<br />

for future <strong><strong>in</strong>dex<strong>in</strong>g</strong> and <strong><strong>in</strong>dex<strong>in</strong>g</strong> guidel<strong>in</strong>es <strong>in</strong> the doma<strong>in</strong> of e-<strong>government</strong>?<br />

The empirical basis for the second research question and related sub questions is the<br />

data collected <strong>in</strong> connection with the search test (see Table 1.1). The search test<br />

consists of an experimental comparison test of two <strong><strong>in</strong>dex<strong>in</strong>g</strong> methods. The test was<br />

carried out in a realistic setting on a real-life governmental intranet. As the purpose of<br />

the test is to form a basis for ensur<strong>in</strong>g and develop<strong>in</strong>g <strong><strong>in</strong>dex<strong>in</strong>g</strong> <strong>in</strong> terms of effectiveness<br />

and efficiency, variables measur<strong>in</strong>g search time and effort were important factors <strong>in</strong> the<br />

test design. Questions 2.1-2.7 are answered on the basis of the search log generated<br />

dur<strong>in</strong>g the course of the test. Questions 2.8-2.9 are based on the test persons’<br />

assessment of retrieved outcomes. Post-search interviews are included to understand<br />

and explain test person behaviour during the test. In question 2.10 we sum up the<br />

findings of the search test and provide the perspective of indexing guidelines for e-government.<br />

1.5 Structure of the thesis<br />

The thesis reports two <strong>in</strong>terconnected empirical studies. The first study is the<br />

doma<strong>in</strong> study, which is followed by the second study: the search test. The reason for<br />

the succession is that the doma<strong>in</strong> study forms the basis for the search test. The thesis is<br />

introduced by a theoretical part, which is followed by the empirical part. The theoretical part<br />

consists of chapters 2, 3, 4, and 5. Chapter 2 gives a more thorough<br />

presentation of the empirical assumptions <strong>in</strong>troduced above. Here the methodological<br />

frame guid<strong>in</strong>g both the theoretical parts and the data collection for both doma<strong>in</strong> study<br />

and search test is outl<strong>in</strong>ed. As the case study comprises a part of the methodological<br />

frame, it is also presented here along with a thorough <strong>in</strong>troduction to the specific case:<br />

SKAT.<br />

Chapters 3, 4, and 5 constitute the theoretical basis for the domain study.<br />

Chapter 3 introduces the research area of e-government. The purpose of the chapter is to<br />

outline the domain in which the present thesis operates. In chapter 4 the focus is<br />

narrowed down to analys<strong>in</strong>g what is known about the seek<strong>in</strong>g behaviour of professional<br />

e-<strong>government</strong> users. The theoretical foundation for the search test is presented <strong>in</strong><br />

chapter 5. The chapter conta<strong>in</strong>s a review of manual and automatic <strong><strong>in</strong>dex<strong>in</strong>g</strong>. The first<br />

part <strong>in</strong>troduces core concepts and understand<strong>in</strong>gs of <strong><strong>in</strong>dex<strong>in</strong>g</strong> and categorization, and<br />

establishes the connection between the two concepts. The second part presents exist<strong>in</strong>g<br />

knowledge on the performance of <strong><strong>in</strong>dex<strong>in</strong>g</strong> methods and categorization.<br />

The empirical part of the thesis comprises chapters 6, 7, and 8. In chapter 6<br />

the applied methods and underlying considerations are presented, first for the domain<br />

study and then for the search test. The chapter concludes by connecting the empirical<br />

elements to the research questions of the thesis. Chapter 7 presents the results of the<br />

domain study. First, a questionnaire was carried out. The questionnaire was followed<br />

up by seven focus group interviews. The purpose of the focus groups was to validate and<br />

elaborate on the questionnaire results. The results are reported when relevant to<br />

research question 1 and connected sub questions. Chapter 8 contains the results of the<br />

search test. The overall aim of chapter 8 is to answer the questions raised in<br />

research question 2. Chapter 9 summarizes and discusses the empirical results. The<br />

thesis concludes with suggestions for further research.<br />


2 Methodological framework<br />

Chapter 2 presents the methodological framework of the thesis. We beg<strong>in</strong> here, as the<br />

theory of scientific method guides the remainder of the thesis. The research<br />

literature suggests distinguishing methodology from methods. Methodology is a<br />

superordinate concept that describes, explains, and justifies the methods used in empirical<br />

studies. Methodology may thus be considered a concept from the theory or<br />

philosophy of science that addresses epistemological concerns. Conversely, method is<br />

subordinate to methodology and designates the specific methods and techniques applied<br />

in empirical studies (Wang, 1999). We follow this division to structure the methodical parts of the<br />

thesis. Therefore, in the present chapter we present the<br />

methodological issues that have guided the research design and the collection of data.<br />

In a later chapter (Chapter 6), we account for the specific methods applied to collect the<br />

data that constitute the empirical basis of the thesis.<br />

2.1 A cognitive framework for <strong>in</strong>formation research<br />

As mentioned in the introduction, we have been working within the<br />

cognitive framework of information science. The cognitive view was first proposed<br />

in 1977 (De Mey, 1977; Ingwersen & Järvelin, 2005). Here, the cognitive<br />

viewpoint was proposed as a reaction to the two predominant research traditions at the<br />

time: the system-driven and the user-oriented research traditions. Within the system-driven<br />

research tradition, significant results have been achieved regarding, for instance,<br />

best-match retrieval models, Boolean logic, question answering, and cross-language<br />

retrieval. The user-oriented research tradition, on the other hand, has obtained<br />

equally essential results, though in relation to increasing our understanding of end-user<br />

searching, domain-oriented information behaviour, and the like (Ingwersen, 1996;<br />

Ingwersen & Järvelin, 2005). Despite the respective importance of their findings, the<br />

two research traditions have been criticized for being one-sided in their methodological<br />

approaches. Thus, the system-driven tradition has followed the principle of test<br />

collections, a pr<strong>in</strong>ciple that arose from the Cranfield model. The Cranfield model<br />

measured retrieval performance on the basis of a test collection, a set of queries, and a


Figure 2.1 The participat<strong>in</strong>g actors <strong>in</strong> context. Model adapted from Ingwersen & Järvel<strong>in</strong> (2005, p.<br />

261) with m<strong>in</strong>or corrections.<br />

set of relevance assessments (Borlund, 2003b). This laboratory-like approach is the<br />

counterpoint to the user-oriented tradition. As in the system-driven tradition, the<br />

user-oriented tradition is to a large extent based on empirical investigations, but the<br />

perspective is operational. Furthermore, information users, and not IR systems, are the<br />

focus of attention (Ingwersen, 1996). This leads to a contrast between the traditions,<br />

which has been summed up by Robertson & Hancock-Beaulieu to comprise “…on the<br />

one hand, control over experimental variables, observability, and repeatability, and on<br />

the other hand, realism.” (1992, p. 460).<br />
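
To make the Cranfield-style setup concrete, the following minimal sketch scores a system against a set of queries and a set of relevance assessments. It is an illustration only and is not taken from the thesis: the measures used (precision and recall) are the ones commonly associated with this evaluation tradition but are an assumption here, and the function name, query identifiers, and document identifiers are all hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: document ids returned by the system (hypothetical ids).
    relevant:  document ids judged relevant in the relevance assessments.
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set
    # Guard against empty sets so the sketch never divides by zero.
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Toy test collection: queries with relevance assessments (assumed data).
relevance_assessments = {"q1": {"d1", "d3"}, "q2": {"d2"}}
# Toy system output for the same queries (assumed data).
system_output = {"q1": ["d1", "d2", "d3", "d4"], "q2": ["d4"]}

scores = {
    query: precision_recall(system_output[query], relevant)
    for query, relevant in relevance_assessments.items()
}
```

Because the queries and assessments are fixed in advance, the same computation can be repeated for any number of systems, which is the experimental control the laboratory tradition values; what the sketch leaves out is the searcher, which is the realism the user-oriented tradition emphasizes.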

It was as a reaction to precisely this contrast that the cognitive view emerged.<br />

The pioneers of the cognitive viewpoint reacted against what was considered a one-sided<br />

focus on either IR systems or users. Instead, an alternative approach was<br />

suggested that offered a holistic picture of the IR process. It was recognized that, in<br />

order to gain a comprehensive picture of the process of IR interaction, the cognitive<br />

structures of all cognitive actors in the process of interaction needed to be acknowledged<br />

and taken <strong>in</strong>to consideration (cf. Figure 2.1). Five dimensions represent and summarize<br />

the cognitive view. They comprise:<br />

1. “Information processing takes place in senders and recipients of messages;

2. Processing takes place at different levels;

3. During communication of information any actor is influenced by its past and present experiences (time) and its social, organizational and cultural environment;

4. Individual actors influence the environment or domain;

5. Information is situational and contextual.” (Ingwersen & Järvelin, 2005, p. 25).

Thus, in the cognitive view, senders and recipients of messages encompass not only information users, but any actor contributing to or participating in an aspect of the IR process (Ingwersen & Järvelin, 2005, p. 27). In this way the framework supported the integration of IR techniques and IR systems, including their underlying cognitive structures, with human information users and their information behaviour. In sum, it was emphasized that the approach was not solely user-oriented, but rather offered a framework for all human actors and their cognitive structures involved in IR interaction (Ingwersen & Järvelin, 2007, p. 141).

However, the attention to all cognitive actors did not reduce interest in the information user. The information need of the user functioned as the benchmark for measuring the success of IR systems. The understanding of users’ information needs and their formation has been captured by the ASK hypothesis. The hypothesis states that an information need arises from an anomaly in a user’s state of knowledge concerning a topic or situation. Thus, in preparation for IR, users should be asked to describe the anomaly rather than to state a request representing the information need to an IR system (Belkin, Oddy & Brooks, 1982, p. 62). To summarize, the cognitive view allowed for a more detailed representation of information users than was previously known from the system-driven and the user-oriented research traditions.

2.1.1 Towards a holistic cognitive framework

From the very beginning, researchers within the cognitive view were mainly concerned with individual variances of cognitive structures. However, in the early 1990s developments in surrounding research areas caused a corresponding change within the cognitive framework towards increased attention to contextual matters. Ingwersen highlights two papers as landmarks for this change of focus (Ingwersen, 1999, p. 11 ff.). One is Schamber, Eisenberg & Nilan’s (1990) paper on the concept of situational relevance. On the basis of a thorough review the authors characterize situational relevance as a:

1. “[…] multidimensional cognitive concept whose meaning is largely dependent on users’ perceptions of information and their own information need situations[…]

2. […] dynamic concept that depends on users’ judgments of the quality of the relationship between information and information need at a certain point in time[…]

3. […] complex but systematic and measurable concept if approached conceptually and operationally from the user’s perspective.” (Schamber, Eisenberg & Nilan, 1990, p. 774).

With Schamber, Eisenberg & Nilan’s paper, the discussion of relevance was re-opened. The other paper highlighted by Ingwersen is Robertson & Hancock-Beaulieu’s (1992) account of the relevance revolution, the cognitive revolution, and the interactive revolution. The relevance revolution addresses the change towards seeing stated requests and information needs as two separate phenomena. The implication is that relevance should be assessed on the basis of the information need, and not the request. The cognitive revolution is closely connected to the relevance revolution and reflects the growing tendency towards including cognitive perspectives in the process of IR.

Lastly, the interactive revolution articulates the increased interactivity of IR systems. This development necessitates a move away from the principle of evaluating IR systems in terms of “one request in, one set of results out”. Instead, time and situation need to be taken into consideration in order to do justice to the special characteristics of interactive IR (IIR) (Robertson & Hancock-Beaulieu, 1992, pp. 458-459). The three revolutions challenge the simplified conception of the IR process presented in the system-driven research tradition and point out that far more factors influence the process. The outcome of these developments was an increased focus on context and interaction in the process of IR (see Figure 2.2).

Figure 2.2 Extension of the cognitive view, the interactive process of IR and affecting factors. Adapted from Ingwersen & Järvelin (2005, p. 274) with minor corrections.

With the shift of focus, a corresponding change in potential research areas emerged. To illustrate, five categories of variables appear from Figure 2.2: 1) organizational task dimensions; 2) actor dimensions; 3) document dimensions; 4) algorithmic dimensions; and 5) access and interaction dimensions (Ingwersen & Järvelin, 2005, pp. 313-314). The intention of the model is to illustrate the influences and interactions taking place during IR interaction. A study need not incorporate all elements in order to fall within the framework. Rather, the elements serve as possible explanations for patterns identified in empirical findings.

2.1.2 The role of work tasks

Along with the increased inclusion of context and interaction in the cognitive framework, work tasks (or daily-life tasks) have become more central. The work task methodology was introduced to LIS in the early 1990s (Vakkari, 2003). The basic assumption behind using tasks as the foundation of information seeking and retrieval studies is that an information-intensive task involves information-related actions. Thus, the task becomes a framework for the analysis of IR systems (Byström & Hansen, 2005). The work task methodology has mainly been applied to professional work tasks. Lately, however, non-job-related tasks have also been investigated within the context of the task methodology (e.g., Savolainen, 1995; Skov, 2009).

Tasks are important to the cognitive view, because the task is considered “the central element of the context” (Ingwersen & Järvelin, 2005, p. 29). A work task arises from an incident outside the user and triggers an information need within the user, which in turn triggers seeking behaviour (see Figure 2.3). As a result, to understand seeking behaviour and IR interaction, we must understand the composition of tasks and their contextual origin. For evaluation purposes within a cognitive frame, building on genuine tasks may be challenging, as their extent and usefulness may vary considerably. As a consequence, comparison between results is impeded. Therefore, to ensure both experimental control and realism, simulated work tasks have been proposed as a methodical tool. Here cover stories are handed out to information users to form the basis for information searching. On the basis of the story, information needs are formed within the user that serve as an equal point of departure for interaction with the IR system under evaluation (Borlund, 2000, 2003b). A consequence of using tasks as the baseline for evaluation is the application of situational relevance for measurement of performance (cf. Saracevic, 1996; Borlund, 2003a).

Figure 2.3 Information behaviour and the influence from job- or non-job related tasks. Adapted from Ingwersen & Järvelin (2005, p. 198).

2.2 The cognitive framework and the thesis

The cognitive framework was chosen as the methodological frame of reference in the present work. How widely the framework has been adopted may be discussed: some argue that it is widespread (Cole & Leide, 2006, p. 175), while others argue the opposite (e.g., Järvelin, 2007). Regardless of its prevalence, we have applied it to guide the empirical part of the project. The overall reason was the nature of the task set by the National IT and Telecom Agency: to produce a foundation for giving guidelines for automatic indexing within the particular domain of e-government. To be able to give guidelines, we needed to discover the actual use of the IR technique among e-government employees, as they are the target user group of the project. That required a methodological framework allowing for a search test with a contextual perspective. For this purpose the cognitive framework was found suitable. It enabled us to discover the domain-specific characteristics of indexing methods and add to the general and very extensive body of knowledge regarding the performance of indexing methods.

The methodology is mirrored throughout the research design of the thesis. The initial domain study serves the purpose of uncovering contextual characteristics of the domain in question, and of providing domain knowledge and insight. In Figure 2.2 this corresponds to the right-hand side of the model. Different methods have been combined for the domain study. Initially, existing studies on seeking behaviour within the domain and adjacent domains were reviewed. As the number of existing studies turned out to be fairly limited, the review was followed up by an empirical domain study consisting of a survey questionnaire and 7 focus group interviews. In a similar manner, the search test also reflects the methodology. In Figure 2.2 the search test comprises the center and left-hand side components. Here employees are asked to evaluate a test system on the basis of a number of simulated work tasks. As called for in the cognitive framework, situational relevance is applied for the assessment of search results.

2.3 Overall research method: Case study

The method applied in the thesis is a single case study (Yin, 2003, pp. 39-40) of a large Danish governmental organization: SKAT. Different motivations exist for doing case studies. The predominant rationale in the present study is that the organization in question constitutes a unique case in Denmark due to its pioneering position within e-government (see e.g., Østergaard & Olesen, 2004). The strength of case studies is their ability to draw on multiple sources of data. Further, case studies cover contextual aspects of the case in question (Yin, 2003). The research design reported here consists of two main parts: a domain study and a search test. The domain study employs a survey questionnaire and focus group interviews as data sources. The search test aims at a controlled environment. As in the domain study, we document the search test with both quantitative and qualitative data.

2.4 The case: SKAT

The primary task of SKAT is to collect the major part of taxes in Denmark. The organization handles all administration related to taxes, duties, customs, debt collection, tax assessment of real estate and cars, and gaming activities (SKAT, 2010). SKAT is among the largest administrations of the Danish state in terms of employees, when compared to similar administrations (Personalestyrelsen, 2010). The organization has approximately 8,500 employees located at different offices across Denmark. SKAT has grown over the years through several mergers of formerly separate, smaller organizations (e.g., Johansen, 2007). As a result, the organization handles highly diverse work tasks. Snellen (1989, cited from Lips, 1998, p. 326) has identified three levels in governments’ service environment: the macro level, the meso level, and the micro level. SKAT operates at all three levels, serving the parliament, businesses, and citizens. SKAT is organized by tasks rather than geography. In practice this means that specialized work tasks have been consolidated at certain geographic locations. The purpose of the sub-departments is to serve at the national level (SKAT, 2010). This organizational structure allows for highly specialized knowledge among the employees.

Some years ago SKAT developed a business model for internal use. The purpose of the business model was to cover all work tasks carried out by the organization. The work identified 19 condensed work tasks distributed across 6 main processes. The main processes are: instruction, settlement, inspection, collection, processes of support, and management and development. The two latter main processes are internal processes or aimed at servicing the parliamentary part of Denmark, while the former four have citizens and companies as their target group of service. Work tasks carried out across the organization had been described centrally in the organization, while department-specific work tasks were described by the responsible departments. A description of the main processes and condensed work tasks appears in Appendix 1. Between the data collection for the domain study and the search test a slight correction of the business model was made. The six main processes remained intact, but the condensed work tasks were extended to apply across all main processes. The revised business model is depicted in Figure 2.4. The size and importance of the main processes is, at least quantitatively, mirrored by the distribution of employees. Thus, settlement and inspection are the largest of the main processes, covering approximately 60 percent of the entire workforce. The remaining 40 percent are divided between the four remaining main processes (see Appendix 2). Translated to the terminology of Byström & Hansen (2005), the condensed work tasks are at task description level. The main processes represent the lowest level of granularity compared to the condensed work tasks (cf. Vakkari, 2003). But the condensed work tasks are also fairly coarse-grained. In the business model the generic work tasks contain more specific sub-task descriptions. In Freund, Toms & Waterhouse’s (2005) terminology this way of operationalizing work tasks is content-based. As a result it is specifically directed towards tax employees in the case organization.

Figure 2.4 SKAT’s revised business model

2.4.1 The intranet

The intranet of SKAT functions as the test system for the search test. The intranet is a CMS-based solution accessible to all employees within the organization (White, 2005). The intranet mirrors the official web portal of SKAT, which is open to the public on the web (see http://www.skat.dk). The public portal communicates information directed towards citizens, businesses and legal advisors. Specifically, the portal contains legal directions, citizen and business directions and brochures, legal documents, forms, news, etc. Further, the portal contains a self-service section for both citizens and businesses. On the intranet additional documents are available to the employees. Examples include minutes, job postings, reports from finished internal projects, HR information and other internal information from the organization and departments to the remaining employees. The intranet contains documents from June 25, 1998 onwards. By June 2010 the number of documents in the database was 681,640. The intranet further facilitates personalization of the interface in order to optimize which information is offered to individual employees. In sum, we may characterize the intranet as a knowledge portal in the terms of Dias (2001). Thus, the intranet is a corporate portal enabling decision support and collaborative processing. In addition, the “Find colleague” function (“Find kollega”) assists in locating colleagues on the basis of organizational affiliation, physical location or expertise, which corresponds to an integrated expertise portal.

Applying the intranet for the search test has a number of empirical implications. With this choice of test system the search test belongs to the research area of enterprise search. Enterprise search includes organizations with electronic text content, and search of the organization’s intranet, Internet, or other digitalized text (cf. Hawking, 2004). Furthermore, a number of characteristics are shared between corporate intranets and the web. Both are based on web technology. They demonstrate great heterogeneity in the document collection and a dynamic nature, and both enable hyperlinking between documents (cf. Fagin et al., 2003; Rasmussen, 2003). However, the two system types also differ in several respects. Firstly, the premises of the two system types differ. The web functions as a democratic instrument allowing everyone to express anything. Intranets, by contrast, are organizational tools communicating information of relevance for maintaining enterprise work tasks (Fagin et al., 2003; Mukherjee & Mao, 2004; Stenmark, 2005). Fagin et al. (2003) stated four axioms compiling further differences between the web and intranets. In short, the axioms state that, in contrast to Internet documents, intranet documents are mainly created for distribution of information, not for attracting the attention of potential users. In addition, a large proportion of intranet queries have a small set of correct answers, if not a unique answer. Also, intranets are most likely spam-free due to limitations on publishing access. Lastly, intranets are not expected to be search-engine friendly due to the lack of interlinking between documents. By denoting the characteristics as axioms, Fagin et al. indicate that we do not have empirical evidence for the correctness of the differences. The lack of empirical confirmation may be explained by the difficulty of gaining systematic access to perform data collection at corporate intranets (cf. Stenmark, 2005). In the present investigation we will account for the specific characteristics of intranets whenever we have empirical evidence as support.

Figure 2.5 Screen dump from existing intranet interface

2.4.2 The intranet taxonomy

The process of indexing on the Internet is obviously far more extensive than on an intranet due to the disparity between numbers of documents. However, the need for organizing documents on corporate intranets also increases along with the number of documents stored (cf. Gilchrist, 2001). This is mirrored by the differences between the former and the current taxonomy used on SKAT’s intranet. As mentioned above, a new and enlarged taxonomy was introduced on the intranet at the beginning of 2008. The main functions of a taxonomy are to eliminate uncertainty, control synonyms, and establish hierarchical relationships (Zeng, 2008). The preceding taxonomy corresponded to these characteristics apart from the last. Thus, the taxonomy had a flat structure with a one-level hierarchy, represented by 25 subject terms. The succeeding taxonomy was expanded in several respects, resulting in a more detailed presentation of corporate, controlled terms. One change was the introduction of a second level in the hierarchy that enabled an increase of specificity in topic representations. The number of terms included also increased. As of March 2010, the taxonomy incorporated 169 terms across both levels of the hierarchy. Lastly, the controlled terms of the taxonomy had been supplied with mouse-over texts, which basically had the form of scope notes as known from thesauri. By these means further reduction of ambiguity and increased control of synonyms are gained.
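As an illustration only, a two-level taxonomy with scope notes and synonym control might be represented as follows. This is a sketch under assumptions: the `Term` class, the `build_lookup` and `count_terms` helpers, and the example terms are hypothetical and do not reproduce SKAT's actual taxonomy or its implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Term:
    """A controlled term; top-level terms may hold second-level children."""
    label: str
    scope_note: str = ""  # plays the role of the mouse-over text
    synonyms: List[str] = field(default_factory=list)
    children: List["Term"] = field(default_factory=list)


def build_lookup(top_terms: List[Term]) -> Dict[str, str]:
    """Map every label and synonym (lower-cased) to its preferred label."""
    lookup: Dict[str, str] = {}
    for top in top_terms:
        for term in [top] + top.children:
            lookup[term.label.lower()] = term.label
            for synonym in term.synonyms:
                lookup[synonym.lower()] = term.label
    return lookup


def count_terms(top_terms: List[Term]) -> int:
    """Count terms across both levels of the hierarchy."""
    return sum(1 + len(top.children) for top in top_terms)


# Hypothetical example terms, invented for illustration.
taxonomy = [
    Term("Taxes", scope_note="Direct taxation of persons and companies.", children=[
        Term("Income tax", synonyms=["personal tax"]),
        Term("VAT", synonyms=["value added tax"]),
    ]),
    Term("Customs", scope_note="Import and export duties."),
]

lookup = build_lookup(taxonomy)
total_terms = count_terms(taxonomy)
```

In this sketch the second level carries the added specificity, the synonym lookup maps variant wordings to a single preferred term, and the scope note is the place where ambiguity about a term's coverage would be resolved.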



Hitherto the indexing of intranet documents has been carried out manually by a large group of indexers distributed across the organization (between 1,000 and 1,500 indexers). A corporate taxonomy has formed the basis for the controlled indexing. It is a common practice in e-governments in general that employees attach subject terms to administrative documents. In section 5.3.3, we presented three different kinds of intellectual indexing, namely expert-led, author-based, and user-based indexing. The manual assignment of subject terms carried out by the employees in the organization is not easily characterized as one or the other. The expert-led type is represented in that not all employees handle the assignment. Rather, a group of employees carries out the task, though the number is quite large. On the one hand, when a group of employees has been appointed to the task, it is reasonable to expect that they have a more detailed insight into the taxonomy than their non-indexing colleagues. On the other hand, the large number of indexers could mean that indexing is not a very frequent task, which in turn results in a limited insight into the taxonomy. One thing is certain about the group of indexers: the typical indexer is not a professional indexer in the sense that he or she holds an LIS degree. The indexing at SKAT also contains elements of author-based indexing in the sense that the indexers occasionally will be the authors of the indexed documents. Lastly, the indexing may also be characterized as user-based in the sense that the indexers, apart from being indexers, are also users of the system.

The document collection on the existing intranet can be divided into two groups:<br />

documents published on or before December 31, 2007, and documents published from January<br />

1, 2008 onwards. January 1, 2008 marks the day when a revised taxonomy was<br />

put into use in the case organization. The implementation of the revised taxonomy<br />

had several implications. The manual assignment of index terms continued after that<br />

date, now following the structure of the revised taxonomy. However, at the same<br />

time the index terms assigned to the former group of documents were deleted from the<br />

database. Therefore, documents published before January 1, 2008 could only be<br />

searched through free-text indexing. When our cooperation with SKAT started, the<br />

organization was already working on a new portal solution encompassing its internet<br />

and intranet. The new portal comprises various changes and improvements, including<br />

automatic categorization of search results, which is brought into focus in the present<br />

thesis.<br />
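As an illustration only, the split of the collection around January 1, 2008 can be sketched in a few lines of Python. The record layout, the field names, and the example titles and index terms are hypothetical and do not describe SKAT's actual database. The sketch also assumes that documents from the later group remain searchable by free text in addition to their index terms.

```python
from dataclasses import dataclass, field
from datetime import date

# Day on which the revised taxonomy was put into use (taken from the text).
TAXONOMY_CUTOVER = date(2008, 1, 1)


@dataclass
class Document:
    """Hypothetical intranet document record; field names are assumptions."""
    title: str
    published: date
    index_terms: list[str] = field(default_factory=list)


def access_routes(doc: Document) -> set[str]:
    """Return the search routes through which a document can be reached.

    Documents published before the cutover have lost their index terms and
    can only be found by free text. Later documents can additionally be
    found through terms from the revised taxonomy (assumption: free-text
    search stays available for them as well).
    """
    routes = {"free_text"}
    if doc.published >= TAXONOMY_CUTOVER and doc.index_terms:
        routes.add("index_terms")
    return routes


# Invented example records, for illustration only.
old_doc = Document("Guidance note", date(2007, 6, 15))
new_doc = Document("Guidance note, revised", date(2008, 3, 1), ["tax", "guidance"])

print(sorted(access_routes(old_doc)))
print(sorted(access_routes(new_doc)))
```

The point of the sketch is that the publication date alone decides whether controlled index terms are available as an access route, which is why the two groups of documents have to be treated separately.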



2.5 Summary<br />


The present chapter has presented and argued for the overall research<br />

methodology applied in the PhD project. We have reviewed the cognitive framework<br />

and its development from an individualistic towards a contextual methodological<br />

foundation. The choice of methodological standpoint enables the collection and<br />

analysis of data that supplement the existing general knowledge on the performance of<br />

automatic indexing methods. Within the cognitive framework, the case study<br />

methodology has been applied as the overall frame for the specific collection and<br />

analysis of data reported later. Specifically, we carry out a case study of a Danish<br />

organization, a pioneer in terms of e-government: SKAT.


3 The e-government domain<br />


Chapter 3<br />

During the past century, governments all over the world have experienced a continuous<br />

increase in demands for efficiency of procedures and work routines, along<br />

with expectations of accuracy and quality in public servants’ handling of work tasks<br />

(e.g., Homburg, 2004). Increased transparency of governments towards citizens has<br />

been another predominant demand on governments during the period (e.g., Bertot,<br />

Jaeger & Grimes, 2010). The demands for transparency have resulted in numerous<br />

technical solutions for citizen access, e.g., self-service, and subsequent user<br />

evaluations. However, the citizen perspective on e-government will not be covered in<br />

further detail here due to our focus on employees.<br />

The development of governments has taken place at local, national, and<br />

international levels. It is in this light that the concept of e-government has emerged.<br />

Thus, digitalization of governments has been an important step towards resolving the<br />

challenges of increasing the efficiency and quality of governmental processes. The<br />

examination of e-government as a research area started to grow in the late 1990s (e.g.,<br />

Grönlund & Horan, 2004; Helbig et al., 2008). Since then, the increasing number of<br />

emerging journals and conferences has in its own way underlined the importance of<br />

the research field. However, e-government is a complex construction due to its roots<br />

in a number of related research fields. Public administration, management science,<br />

organization science, information technology, computer science, and library and<br />

information science are among the fields contributing to the development<br />

of e-government. With the present chapter we introduce the e-government domain.<br />

The purpose is to provide an overview and understanding of the domain framing the<br />

PhD project. Further, the presentation enables a characterization and placing of the<br />

thesis in the domain. The chapter forms the first of two parts of the domain study<br />

review. We begin the chapter by defining the concept of e-government and related<br />

concepts along with the purpose of digitalizing governments. This is followed by an<br />

overview of the steps that have characterized, and still characterize, the development within the<br />

domain. Models are included here for a graphical presentation of different authors’<br />

perception and interpretation of the development of the field. The chapter ends with a<br />

presentation and discussion of the research field of e-government. In this closing<br />

section, we focus on subject matters relevant to the PhD project as a thorough review



of the entire field of e-government is outside our scope. Specifically, we address<br />

information systems, knowledge management, and metadata initiatives.<br />

3.1 Definition and purpose<br />

Numerous suggestions of what defines the concept of e-government exist. A<br />

fairly general definition is put forward by Gil-Garcia & Martinez-Moyano (2007, p.<br />

266), who see e-government as:<br />

“The use of information and communication technologies in government<br />

settings.”<br />

However, more detailed definitions have also been formulated, e.g., by Fang (2002, pp.<br />

3-4):<br />

“- the ability to obtain government services through nontraditional electronic<br />

means, enabling access to government information and to completion of<br />

government transaction on an anywhere, any time basis and in conformance<br />

with equal access requirement.<br />

- offers potential to reshape the public sector and build relationships between<br />

citizens and the government.”<br />

Defining the concept of e-government is not a straightforward task. A number of<br />

researchers have collected and compared several definitions (e.g., Grönlund & Horan,<br />

2004; Robbin, Courtright & Davis, 2004; Grant & Chau, 2005; Yildiz, 2007; Hu, Pan<br />

& Wang, 2010). These examples illustrate the lack of a common understanding of the<br />

definition. Today, after more than a decade of research, researchers still call for an<br />

unambiguous definition (e.g., Grönlund, 2010; Hu, Pan & Wang, 2010). Overall, the<br />

difficulties are related to the content and the designation of the concept. As regards the<br />

content, a number of factors complicate the task. One factor is the lack of<br />

agreement as to the definition of central concepts (Robbin, Courtright & Davis, 2004).<br />

Thus, e-government is defined and referred to differently, depending on the actual<br />

scope of research papers (Fang, 2002; Grönlund & Horan, 2004; Grant & Chau, 2005;<br />

Grönlund, 2005). In addition, the multidisciplinary nature of the research field<br />

increases the disagreements (Grönlund & Horan, 2004; Hovy, 2008a). The disciplines<br />

considered as contributing to the field also vary. Wimmer presents the most<br />

comprehensive number of contributing disciplines in her model (see Figure 3.1).<br />

When analysed on the basis of e-government researchers’ home departments,<br />

Wimmer’s model is supported (Heeks & Bailur, 2007).<br />

Figure 3.1 Disciplines integrated in the multidisciplinary research field of e-government. Adapted<br />

from Wimmer (2007, p. 14)<br />

The continuing development of the concept is a third factor (cf. Jaeger, 2003).<br />

Secondly, e-government is taking place at two different levels: the micro level,<br />

which concerns the technological changes taking place within governments, including<br />

ICT; and the macro level, which refers to the institutional changes that are usually also a<br />

part of e-government research. The two levels are often separated, which complicates<br />

the understanding of the concept (Meijer & Homburg, 2008). At the micro level,<br />

Grönlund (2003) distinguishes between two fields within e-government, one with an<br />

internal focus and one with an external focus, organizationally speaking. Both fields imply<br />

changes in line with Meijer & Homburg’s (2008) two levels. The internal field concerns<br />

the internal changes in governments that follow from employing ICT for different<br />

professional operations. This field has been developing for some decades already. The<br />

external field concerns the increasing availability of internet services aimed at external<br />

parties, e.g., citizens or enterprises (Grönlund, 2003). The ICT systems supporting the<br />

two fields are referred to as back office and front office systems respectively (e.g.,<br />

Meijer & Homburg, 2008). In this thesis we are concerned with the micro level,<br />

namely internal governmental changes following from ICT, and more specifically information<br />

seeking and retrieval in relation to employees’ work task fulfilment.<br />

Figure 3.2 Basic elements and relations in governmental systems (Grönlund, 2003, p. 56)<br />

What can be inferred from the above is that the definition to some degree depends<br />

on the particular references consulted. The number of related concepts does not ease the<br />

definition task. Consequently, we define the concept and the related terms below the way<br />

they are used in the present work. We use Figure 3.2 to illustrate the concepts.<br />

The figure is a simplified model of the democratic system, which in practice is far more<br />

complex. The figure outlines three zones (civil society, formal politics, and<br />

administration) and their reciprocal interactions.<br />

Government is considered the overarching notion for the concepts to follow. The<br />

concept government “covers several aspects of managing a country, ranging from the<br />

very form of government to strategic management to daily operations” (Grönlund, 2003,<br />

p. 56). Others suggest that government is more focused on the political aspect, yet<br />

without leaving out the administrative field. According to Beynon-Davies, government<br />

“connotes a political organization, which is comprised of the individuals and institutions<br />

that are authorised to formulate public policies and conduct affairs of state.<br />

Governments are normally tasked with establishing and regulating the interrelationships<br />

of individuals, groups and organisations within the boundaries of some territory” (2007,<br />

p. 11). In Figure 3.2, government covers the two areas of formal politics and<br />

administration. Public administration denotes the sector, enterprises, and activities<br />

necessary in order to serve a government (Marini, 2000; Johnston, 2004). Here serving<br />


involves formulating, advising on, and implementing governmental policy, and<br />

managing resources. Thus, public administration deals with all aspects of government<br />

matters apart from the political, democratic issues. In Figure 3.2, public administration<br />

covers the field referred to as administration. Moreover, the thesis belongs to the<br />

administration subfield, as we do not account for either formal politics or civil society.<br />

As for the designation of the concept, the literature does not offer a unique<br />

label for e-government. Examples of synonyms are digital government (e.g.,<br />

Marchionini, Samet & Brandt, 2003), one-stop government (e.g., Glassey, 2002),<br />

eGovernment (e.g., Schellong, 2007), and online government (e.g., Peres, Guzmán &<br />

Valbuena, 2009). Digital government appears to be the predominant term in the<br />

United States, while electronic government is the preferred term elsewhere (Grönlund<br />

& Horan, 2004). Grönlund & Horan (2004) differentiate between e-government and<br />

e-governance. To illustrate the difference, they draw on Figure 3.2. In their definition,<br />

e-government covers administration and perhaps formal politics, while e-governance<br />

embraces all three spheres. Though e-governance in this manner appears to be a<br />

broader concept, e-government is the more dominant term in the research field.<br />

Further, since e-government in the definition of Grönlund & Horan (2004) suits the<br />

scope of the present thesis well, given our focus on administrative governmental<br />

employees, we will refer to the concept as e-government throughout the thesis. Our<br />

focus also means that the operationalization of the concept is placed solely in the<br />

administrative part of Figure 3.2. Given the lack of agreement as to the terminology,<br />

this choice also follows the predominant European usage. However, in light of the<br />

diversity of definitions of the concept demonstrated above, we will draw on literature<br />

working with other definitions as long as it falls within the definition applied here.<br />

3.2 Subject areas in e-government research & development (R&D)<br />

The use of information technology in governmental administrations is not a<br />

new phenomenon. Rather, it has been going on for decades (e.g., Kraemer &<br />

King, 1986; Andersen & Kraemer, 1994; Bellamy & Taylor, 1998). However, the term<br />

e-government was not introduced until the late 1990s. The two eras have been<br />

considered separate for some time. Whether they still are, or whether they are becoming more<br />

integrated, remains a matter of differing opinions (Grönlund & Horan, 2004; Andersen et<br />

al., 2005).



E-government may be seen as a natural consequence of historical and<br />

technological development. Historically speaking, public administration has from the<br />

late 1970s become increasingly market-oriented (Johnston & Callender, 1997; Box,<br />

1999; Johnston, 2004). In the wake of this change, the focus for public administration<br />

has been on “organizational efficiency, the creation of internal market-style<br />

competitive conditions and the more purposive application of private-sector business<br />

techniques” (Johnston, 2004, p. 12510). Concurrently, information technology has<br />

developed rapidly, providing possibilities for technological support of the change of<br />

focus in public administration.<br />

Schellong (2007) presents a modified model of the e-government hype cycle<br />

(see Figure 3.3). The model presents 2002-2003 as the point in time where<br />

e-government peaked. The period before the peak lasted approximately seven years. Those<br />

years introduced information sites, single-agency online services, and portals, among<br />

other things. In the period after the peak, some problematic issues needed to be dealt<br />

with, for instance security issues and a low citizen uptake. However, this does not<br />

mean that e-government has come to an end. Rather, the peak has<br />

been replaced by a more stable plateau of productivity with more advanced and<br />

technically demanding solutions. Examples are interoperability, enterprise architecture,<br />

and integrated data management (Schellong, 2007). The optimism identified by Heeks<br />

& Bailur (2006) indicates a continued belief in the potential of e-government.<br />

Investigating automatic indexing methods is potentially relevant to many of the<br />

subareas mentioned in the model.<br />

Figure 3.3 E-government hype cycle (Schellong, 2007)<br />

Though implementing e-government is in focus across the world and across<br />

types of governments, the degree of implementation varies. In order to identify the<br />

stage of development, Layne & Lee (2001) have developed a four-stage model (see<br />

Figure 3.4) encompassing the technological and organizational complexity (ranging<br />

from simple to complex) and the integration (sparse to complete). The model suggests<br />

that the relation between the two variables is proportional, that is, as the technological<br />

and organizational complexity increases, so does the degree of integration. The<br />

model expresses the technological level of public administrations, allowing for different<br />

degrees of services to citizens. Layne & Lee take their point of departure in the first<br />

websites created by governments. Thus, the use of ICT before then is not reflected in<br />

the model.<br />

The four steps in Layne & Lee’s model are 1) Catalogue; 2)<br />

Transaction; 3) Vertical integration; and 4) Horizontal integration. The step<br />

“Catalogue” refers to the introductory stage of e-government, where governments create<br />

websites with information about the government. At this step citizens and other<br />

stakeholders are helped with fact finding. At this point in time there are different<br />

motivations for going online. One reason is the possibility to provide external<br />

stakeholders with information that would otherwise have to be handled by front office<br />

employees. Another reason is the pressure and expectations from outside that<br />

information about the government can be found on the internet. At the second step,<br />

“Transaction”, we see the beginning of online transactions for government stakeholders.<br />

Thus, it becomes possible to carry out transactions in order to report one’s taxes and the<br />

like. The step is characterized by automation and digitalization of existing processes.<br />

“Vertical integration” is defined by a renovation of existing processes and an increased<br />

degree of connection between government systems in order to enhance the services<br />

towards stakeholders. Also, vertical integration allows for exchange of transaction<br />

data across systems. At the final step, “Horizontal integration”, additional integration is<br />

developed. Aside from exchanging data between governments, horizontal integration<br />

offers integration across government functions, e.g., in the form of one-stop services<br />

that are able to meet the range of administrative service needs following from a life or<br />

business incident (cf. Gouscos et al., 2003). It should be noted that the placement within<br />

Layne & Lee’s model most likely differs considerably across countries. For instance, the<br />

latest report from the United Nations (2012) demonstrates geographical differences as to<br />

e-government implementation levels. Also, Gil-Garcia & Martinez-Moyano (2007) hypothesize that<br />

the evolution of e-government depends on whether the context is at national, state, or<br />

local level, indicating that e-government initiatives start at national level and are<br />

subsequently followed up at state and local levels of government. The geographical location and level<br />

of government will probably not have significant influence on the succession of the<br />

steps but may result in different positions in the model. As one among a number of<br />

e-government forerunners, Denmark is placed in the upper right corner of Layne &<br />

Lee’s model. To exemplify, a recent investigation among Danish municipal IT<br />

managers showed some prevalence of horizontal integration between information<br />

systems (Nielsen et al., 2009). Investigating automatic indexing is in principle useful at<br />

all stages of the model.<br />

Figure 3.4 Dimensions and stages in e-government (from Layne & Lee, 2001, p. 124)<br />

The expected outcome of digitalizing governments is impressive. Two main<br />

potentials recur: changes in the communication between government and civil<br />

society, and more efficient work processes internally in governments. Or, put more<br />

simply, the purpose of e-government is to deliver “government that works better<br />

and costs less” (Office of the Vice President, 1993, cited in Bellamy, 2002, p. 214).<br />

Thus, by introducing e-government, governments aim to offer improved access to their<br />

services, make the most of their resources or perhaps even be able to reduce costs, and<br />

enhance democracy by improving the access to government employees and offering<br />

e-democracy (Edmiston, 2003). With the introduction of e-government, a shift may be<br />

observed from government-centric services towards services centred more on citizens (or other<br />

stakeholders). At the same time, the transparency of governmental<br />

work is intended to increase along with the level of service. Thus, allowing citizens to<br />

have access to government day and night is considered one way of increasing the level<br />

of service towards citizens (Bellamy, 2002). As a consequence, researchers have<br />

started to investigate, for instance, applications, changes in the administrations, and<br />

interaction between government and civil society.<br />

The development of government processes, organization and technologies has<br />

been expected to change the work tasks of government employees. Before the dawn of<br />

e-government, the concern about information technology and computerization of<br />

governments to a large extent concerned employment (e.g., Kraemer & Dedrick, 1997).<br />

Changes are still expected as a consequence of digitalizing governments. However,<br />

today the use of information technology is rather expected to affect the composition of<br />

work tasks for governmental employees (Snellen, 2002; Dörfler, 2003; Marchionini,<br />

Samet & Brandt, 2003; Brown, 2005; Landsforeningen af Kommunale Servicecentre,<br />

2005; Mahler & Regan, 2005). This is supported by research-based suggestions for<br />

process models that can support governments’ way of handling work tasks (e.g.,<br />

Palkovits, Woitsch & Karagiannis, 2003; Becker, Pfeiffer & Räckers, 2007).<br />

In 2005, the Danish National Association of Municipal Service Centres<br />

predicted a change in the work tasks of municipal e-government employees. Thus, due<br />

to self-service solutions the number of complex situations was expected to increase,<br />

because the citizens are taking care of more simple tasks themselves. Further, the<br />

share of tasks related to assisting citizens who are not able to use self-service was<br />

also expected to increase (Landsforeningen af Kommunale Servicecentre, 2005). This<br />

expectation is consistent with the results Grundén (2009) found when interviewing employees<br />

in a Swedish County Administration. Here the need for assistance was<br />

explained by the digital divide among the customers of the government. Also, Mahler &<br />

Regan (2005) expect changes due to digitalization of federal government agencies.<br />

Their conclusions are based on qualitative interviews with agency staff in agencies<br />

with either strong or weak internet presence. They find that the expected increase in<br />

complaints from citizens does not actually occur. Further, some, but not all, citizens are<br />

able to find needed information on the agency websites and avert casework for the<br />

agency. In relation to the present work, this means that we cannot assume government<br />

work tasks to have remained unchanged. The possible change of tasks may also<br />

influence the information needs developed in terms of complexity (cf. Byström’s<br />

findings, see section 4.4.5). As a consequence, we cannot design a search test based on<br />

older seeking studies in the domain without further ado, at least as regards information<br />

needs. A validation of their continued relevance will be needed.<br />

3.3 Stakeholders in e-government<br />

The amount of e-<strong>government</strong> research constantly <strong>in</strong>creases. Reviews of the<br />

literature have proposed different ways of categoriz<strong>in</strong>g the research <strong>in</strong> order to<br />

systematize the research conducted. A common way of characterising studies of e-government<br />

is to divide the research according to the relation it expresses. Thus, a relation<br />

between one (or several) governments and a stakeholder is predominantly articulated.<br />

The literature has suggested a number of different relations (e.g., Fang, 2002; Beynon-<br />

Davies, 2007). The primary emphasis <strong>in</strong> the e-<strong>government</strong> literature has been on<br />

citizens, bus<strong>in</strong>esses, and <strong>government</strong>s. The relations <strong>in</strong>dicate the <strong>government</strong> as the<br />

key communicator towards different recipient groups. This is stressed by the common<br />

way of denoting the relations as G2C (government-to-citizen), G2B (government-to-business),<br />

G2G (government-to-government) and so forth. This way of referring to the<br />

relations is <strong>in</strong>spired by the field of e-commerce, where B2B and B2C are common<br />

designations for bus<strong>in</strong>ess-to-bus<strong>in</strong>ess and bus<strong>in</strong>ess-to-consumer.<br />



1 People as service users<br />

2 People as citizens<br />

3 Bus<strong>in</strong>esses<br />

4 Small-to-medium sized enterprises<br />

5 Public adm<strong>in</strong>istrators (employees)<br />

6 Other <strong>government</strong> agencies<br />

7 Non-profit organizations<br />

8 Politicians<br />

9 E-Government project managers<br />

10 Design and IT developers<br />

11 Suppliers and partners<br />

12 Researchers and evaluators<br />

Table 3.1 Stakeholders <strong>in</strong> e-<strong>government</strong>. Adapted from Rowley (2011, p. 56)<br />


The underly<strong>in</strong>g thought about e-<strong>government</strong> stakeholders is that their<br />

respective relations to <strong>government</strong>s differ as to their characteristics. Thus,<br />

<strong>government</strong>s cannot necessarily communicate the same way across different<br />

stakeholders. In her literature review of relations, Rowley proposes a thorough<br />

typology of stakeholders (see Table 3.1). It is stressed that stakeholders must be<br />

characterized as to the roles they play rather than as to the groups they form.<br />

Highlighting roles rather than groups allows individuals and organizations to take<br />

different roles depending on the situation they engage in. The purpose of<br />

elaborat<strong>in</strong>g a typology of stakeholders is to be able to identify characteristics of<br />

specific stakeholders and allow for comparisons (Rowley, 2011). Further, the typology<br />

enables a more specific address<strong>in</strong>g of stakeholders, when their specific characteristics<br />

are described. In the present work we are concerned with one particular type of<br />

stakeholders, namely public adm<strong>in</strong>istrators (<strong>government</strong> employees). Rowley’s<br />

division of stakeholders emphasizes precisely that stakeholder groups differ. As a<br />

consequence, seeking behaviour identified in other stakeholder groups is not<br />

necessarily representative of the behaviour taking place among employees.



3.4 LIS perspectives on e-<strong>government</strong><br />

Above we made a general <strong>in</strong>troduction to the concept of e-<strong>government</strong>.<br />

However, three core LIS areas within the government context form an important frame<br />

of reference for our further work. The areas comprise information systems, knowledge<br />

management, and metadata schemas and standards. Further, since our overall<br />

perspective is on employees, this perspective will also guide the presentation of LIS<br />

subject areas.<br />

3.4.1 Information systems<br />

The number of <strong>in</strong>formation systems <strong>in</strong> e-<strong>government</strong> is impressive. Bekkers &<br />

Homburg refer to the amount as “myriad registration functions” (2007, p. 374). The<br />

<strong>in</strong>formation systems are highly diverse <strong>in</strong> their nature and content (cf. Veal, 2001; Liu,<br />

Zhu & Gorton, 2007). Content <strong>in</strong>cludes for <strong>in</strong>stance statistical <strong>in</strong>formation,<br />

geographical <strong>in</strong>formation, legal materials, and <strong>in</strong>formation related to specific cases (e.g.,<br />

Bountouri et al., 2009). In addition, information systems are often designed with a<br />

specific administration in mind, and perhaps even developed by the administration<br />

itself (Ministry of finance, 2001). The designations used to refer to types of<br />

information systems are also remarkably numerous. A thorough presentation of all system types is a<br />

comprehensive task and beyond the scope of the thesis. Instead we will present<br />

different ways of typologiz<strong>in</strong>g e-<strong>government</strong> <strong>in</strong>formation systems below. The purpose<br />

is to identify exist<strong>in</strong>g types of systems and to provide a context for the characterization<br />

of the system that is the subject for the search test <strong>in</strong> the empirical part of our work.<br />

The types of systems may be divided as to different characteristics. One way<br />

of characteriz<strong>in</strong>g <strong>in</strong>formation systems is as to whether they are back or front office<br />

systems. Front office systems are systems directed towards the customers of e-government:<br />

citizens, businesses, and organizations external to the government<br />

(Millard, 2003). Examples are citizen portals such as www.borger.dk or<br />

www.direct.gov.uk or bus<strong>in</strong>ess portals like www.virk.dk. A front end service <strong>in</strong> the<br />

form of a front end system is a product of the <strong>in</strong>troduction of e-<strong>government</strong>. Obviously,<br />

<strong>government</strong>s have always been communicat<strong>in</strong>g with citizens and bus<strong>in</strong>esses but<br />

previous to the <strong>in</strong>troduction of portals and other front end systems the communication<br />

took place <strong>in</strong> contact offices or through call centres (Codagnone & Wimmer, 2007).<br />

Back office processes are processes <strong>in</strong>ternal to the <strong>government</strong> <strong>in</strong> question. Back office<br />

processes comprise general management and accounting, but also processing of<br />


customers’ applications (Codagnone & Wimmer, 2007). Back office systems, then, is<br />

the designation for systems that support internal processes of very diverse kinds.<br />

Further, the systems deliver the data communicated through front end systems. Back<br />

office systems themselves are commonly not visible to the <strong>government</strong> customers.<br />

Back end systems have been applied <strong>in</strong> <strong>government</strong>al adm<strong>in</strong>istrations for decades<br />

already. As we are test<strong>in</strong>g a back office system <strong>in</strong> the search test, we will focus on this<br />

type of systems below.<br />

Van de Donk & Snellen (1989) have presented a typology of knowledge based<br />

systems that is usable for discrim<strong>in</strong>at<strong>in</strong>g back office systems further. The suggested<br />

typology has been developed with<strong>in</strong> the doma<strong>in</strong> of public adm<strong>in</strong>istration. The<br />

background for the typologization is based on the elements that make up expertise <strong>in</strong><br />

comparison to laymen:<br />

“1. encyclopedic knowledge of facts and relationships concern<strong>in</strong>g a certa<strong>in</strong><br />

field;<br />

2. proficient reason<strong>in</strong>g as the basis of a diagnosis;<br />

3. practical short-circuit reason<strong>in</strong>g to arrive at a diagnosis;<br />

4. proficient reason<strong>in</strong>g as the basis for a solution;<br />

5. practical short-circuit reason<strong>in</strong>g to arrive at a solution” (van de Donk &<br />

Snellen, 1989, p. 4).<br />

In particular 3 and 5 differentiate the expert from the layman. On the basis of these<br />

characteristics three types of knowledge systems are suggested: Handl<strong>in</strong>g systems,<br />

advisory systems, and expert systems. Handl<strong>in</strong>g systems embrace items 1, 2, and 4<br />

above. Handl<strong>in</strong>g systems conta<strong>in</strong> facts related to specific cases. Cases are handled by<br />

be<strong>in</strong>g placed <strong>in</strong> a category, of which solutions are known or diagnoses can be made (van<br />

de Donk & Snellen, 1989). A core example of handling systems is electronic records<br />

management systems (ERMS) (also known as electronic document management<br />

systems (EDMS)), that support creation, captur<strong>in</strong>g, process<strong>in</strong>g, shar<strong>in</strong>g, and manag<strong>in</strong>g<br />

organizations’ records or documents (Gunnlaugsdottir, 2008; Hu et al., 2010). Advisory<br />

systems embrace items 1, 2, 3, and 4 above. Thus, compared to handl<strong>in</strong>g systems the<br />

possibility of arriv<strong>in</strong>g at a diagnosis for a problem makes the difference between the<br />

two system types. Advisory systems are useful, e.g., when there is uncerta<strong>in</strong>ty about the<br />

facts of a case or when the needed qualifications for reaching a decision are vague<br />

(van de Donk & Snellen, 1989). Expert systems in principle contain all five elements<br />

mentioned above. Thus, they are also able to help users to arrive at solutions for



problems. Several characteristics distinguish advisory systems from expert systems. One<br />

main difference between the two system types is that advisory systems support users’<br />

own decisions by providing access to data and models, while expert systems offer<br />

decisions and conclusions. This is what leads Ford (1985, p. 26) to characterize<br />

advisory systems as more flexible than expert systems. In terms of Van de Donk &<br />

Snellen, the present system of <strong>in</strong>vestigation may be characterized as an expert system,<br />

as it conta<strong>in</strong>s documents support<strong>in</strong>g the professional and legal basis for the employees.<br />

Organizational information is also contained, while information related to specific cases<br />

is stored in other systems.<br />
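
The division into handling, advisory, and expert systems can be read as a mapping from the five<br />

elements of expertise quoted above to system types. The following is a minimal sketch of that reading in<br />

Python. The names ELEMENTS, SYSTEM_TYPES, classify, and added_elements are our own illustrative<br />

assumptions and do not stem from van de Donk & Snellen (1989).<br />

```python
# Illustrative sketch only: all names below are assumptions made for this
# example and are not part of van de Donk & Snellen's (1989) typology.

# The five elements of expertise, numbered as in the quotation above.
ELEMENTS = {
    1: "encyclopedic knowledge of facts and relationships",
    2: "proficient reasoning as the basis of a diagnosis",
    3: "practical short-circuit reasoning to arrive at a diagnosis",
    4: "proficient reasoning as the basis for a solution",
    5: "practical short-circuit reasoning to arrive at a solution",
}

# Elements embraced by each system type, as stated in the text.
SYSTEM_TYPES = {
    "handling system": frozenset({1, 2, 4}),
    "advisory system": frozenset({1, 2, 3, 4}),
    "expert system": frozenset({1, 2, 3, 4, 5}),
}


def classify(elements):
    """Return the system type whose element set equals `elements`, or None."""
    wanted = frozenset(elements)
    unknown = wanted - set(ELEMENTS)
    if unknown:
        raise ValueError(f"unknown element numbers: {sorted(unknown)}")
    for name, embraced in SYSTEM_TYPES.items():
        if embraced == wanted:
            return name
    return None


def added_elements(simpler, richer):
    """Return the elements that `richer` embraces beyond those of `simpler`."""
    return sorted(SYSTEM_TYPES[richer] - SYSTEM_TYPES[simpler])


if __name__ == "__main__":
    print(classify([1, 2, 4]))
    print(added_elements("handling system", "advisory system"))
    print(added_elements("advisory system", "expert system"))
```

Read this way, item 3 is what separates advisory systems from handling systems, and item 5 is what<br />

separates expert systems from advisory systems.<br />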

Saxena & Aly (1995) embrace a wider variety of systems <strong>in</strong> their typology.<br />

The context of their work is public adm<strong>in</strong>istration <strong>in</strong>clud<strong>in</strong>g policy plann<strong>in</strong>g, policy<br />

implementation, and policy adm<strong>in</strong>istration. The typology counts:<br />

1. Adm<strong>in</strong>istrative process<strong>in</strong>g systems (APS): are able to process large amounts<br />

of data <strong>in</strong> order to support adm<strong>in</strong>istrative rout<strong>in</strong>es, typically <strong>in</strong> the form of<br />

statistical compilation systems or transaction process<strong>in</strong>g systems.<br />

2. Management report<strong>in</strong>g systems (MRS): offer <strong>in</strong>formation for management for<br />

routine, structured, and expected decisions. Compared to APS, which are more<br />

oriented towards data and efficiency, MRS is rather characterized by<br />

<strong>in</strong>formation and effectiveness.<br />

3. Decision support systems (DSS): assist users <strong>in</strong> decision mak<strong>in</strong>g by offer<strong>in</strong>g<br />

technological support <strong>in</strong> order for users to become able to develop <strong>in</strong>dividual<br />

decision models, databases, and report formats.<br />

4. Group decision support systems (GDSS): are the group equivalent to DSS.<br />

GDSS are commonly used to refer to systems that support group work such as<br />

communication, <strong>in</strong>formation shar<strong>in</strong>g, generation of ideas and so forth.<br />

5. Executive support systems (ESS): As indicated by the name, these systems<br />

offer top executives direct access to management reports, information and<br />

mail services without the connect<strong>in</strong>g l<strong>in</strong>k of an <strong>in</strong>termediary of some sort.<br />

6. Expert systems (ES): imitate human processes of reasoning in a form that<br />

could also be handled by human experts. In other words, ES are able to<br />

supplement or even replace human experts. Expert systems take the form of<br />

either handl<strong>in</strong>g systems or advisory systems (cf. van de Donk & Snellen,<br />

1989) (Saxena & Aly, 1995, p. 280-281).<br />


One may question Saxena & Aly’s <strong>in</strong>terpretation of handl<strong>in</strong>g systems and advisory<br />

systems as examples of ES. In the <strong>in</strong>troduction made by van de Donk & Snellen<br />

(1989) we rather see handl<strong>in</strong>g systems as an example of APS and advisory systems as<br />

equivalent to either DSS or GDSS. This is the reason for our overall plac<strong>in</strong>g of the test<br />

system as an ES, also <strong>in</strong> terms of Saxena & Aly, though on the basis of their<br />

description of the system type.<br />

The application of systems depends on whether the context of use is policy<br />

plann<strong>in</strong>g, implementation, or adm<strong>in</strong>istration. Here we are concerned with public<br />

adm<strong>in</strong>istration <strong>in</strong> the form of policy adm<strong>in</strong>istration. Accord<strong>in</strong>g to Saxena & Aly, the<br />

relevant systems for this sub area are APS: transaction process<strong>in</strong>g systems (TPS),<br />

transaction summary <strong>in</strong>formation (TPS-TSI) and detailed transaction lists (TPS-DTI);<br />

DSS, and ES. However, one must keep in mind that it is a complex undertaking to put<br />

forward an unequivocal typology due to the great variety of tasks carried out by public<br />

administrations even within policy administration. The actual system use in a real-life<br />

administration may thus deviate from the typology. To draw a parallel to our<br />

characterization of the test system above, the system also contains information that is<br />

not necessarily ES oriented, as just mentioned.<br />
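
For reference, Saxena & Aly's six system types and the subset reported as relevant for policy<br />

administration can be written down as a small data structure. The sketch below is illustrative only. The<br />

names SystemType, RELEVANT_FOR_POLICY_ADMINISTRATION, and is_relevant_for_policy_administration are our<br />

own assumptions, and the sub-types of APS (TPS, TPS-TSI, TPS-DTI) are not modelled.<br />

```python
from enum import Enum


class SystemType(Enum):
    """System types in Saxena & Aly's (1995) typology, keyed by abbreviation."""

    APS = "Administrative processing systems"
    MRS = "Management reporting systems"
    DSS = "Decision support systems"
    GDSS = "Group decision support systems"
    ESS = "Executive support systems"
    ES = "Expert systems"


# System types the text reports as relevant for policy administration.
# Policy planning and policy implementation are not detailed in the text,
# so they are deliberately left out here rather than guessed.
RELEVANT_FOR_POLICY_ADMINISTRATION = frozenset(
    {SystemType.APS, SystemType.DSS, SystemType.ES}
)


def is_relevant_for_policy_administration(abbreviation):
    """Return True if the abbreviated type is listed for policy administration."""
    system_type = SystemType[abbreviation.upper()]  # KeyError if unknown
    return system_type in RELEVANT_FOR_POLICY_ADMINISTRATION


if __name__ == "__main__":
    for system_type in SystemType:
        relevant = is_relevant_for_policy_administration(system_type.name)
        print(system_type.name, system_type.value, relevant)
```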

In accordance with the focus on efficiency and effectiveness in e-government,<br />

initiatives and systems obviously need to be evaluated with the purpose of justification.<br />

Thus, <strong>in</strong>formation systems need to function as <strong>in</strong>tended <strong>in</strong> order to be able to support<br />

efficiency and effectiveness. Evaluation may help discover <strong>in</strong>expediencies <strong>in</strong> the<br />

system, and also inform the developers about the strengths and weaknesses of the<br />

system as regards users’ use of the system. Evaluation consequently constitutes a<br />

rather <strong>in</strong>evitable direction <strong>in</strong> the e-<strong>government</strong> literature on <strong>in</strong>formation systems. The<br />

literature on evaluation takes two forms. One is concerned with evaluation of specific<br />

systems. The other represents a methodological perspective, support<strong>in</strong>g researchers<br />

with tools for evaluat<strong>in</strong>g either prototypes or systems already <strong>in</strong> use. Evaluation of<br />

specific systems is either carried out when a new system is proposed or when the<br />

system has been <strong>in</strong> function for some time. Examples are Floropoulos et al.’s (2010)<br />

evaluation of the Greek Tax Information system (TAXIS) from an employee<br />

perspective, Hu et al.’s (2010) evaluation of agency satisfaction with an ERMS, and



Quam’s (2001) exam<strong>in</strong>ation of citizens’ use of Bridges 1 . The LIS literature has<br />

outl<strong>in</strong>ed directions for and analyses of system evaluation (e.g., Robertson & Hancock-<br />

Beaulieu, 1992; Kelly, 2009). But core e-<strong>government</strong> also offers methods for<br />

evaluation. For <strong>in</strong>stance Goh et al. (2008) have developed a checklist that can be used<br />

to evaluate the degree of knowledge management <strong>in</strong> e-<strong>government</strong> portals. The<br />

evaluation carried out <strong>in</strong> the search test to follow has been designed with established<br />

LIS evaluation methods as the foundation. A prototype is tested, that is, the system<br />

had not been <strong>in</strong> function among the employees at SKAT at the time of the test<strong>in</strong>g.<br />

3.4.2 Knowledge management<br />

Knowledge management designates the process of identify<strong>in</strong>g and controll<strong>in</strong>g<br />

organizational knowledge <strong>in</strong> order to support the competitiveness of bus<strong>in</strong>esses (de<br />

Groot, 2003). The attempts to manage knowledge in organizations have arisen from<br />

problems with ma<strong>in</strong>ta<strong>in</strong><strong>in</strong>g, locat<strong>in</strong>g and apply<strong>in</strong>g knowledge <strong>in</strong> a systematic manner<br />

(Alavi & Leidner, 2001). Competitiveness may not be a core issue <strong>in</strong> public<br />

organizations as such. However, a clear parallel exists between the measurements of<br />

private sector outcome <strong>in</strong> the form of competitiveness and public sector measurements<br />

of effectiveness. This is reflected in the literature on knowledge management in e-government.<br />

Thus, though originating from private sector businesses, knowledge<br />

management has been widely adopted in the public sector. In spite of fundamental<br />

differences between the goals of private and public organizations, knowledge<br />

management also has the potential of improv<strong>in</strong>g effectiveness, efficiency, and consumer<br />

satisfaction <strong>in</strong> a <strong>government</strong> context (Ha & Zenebe, 2008). These benefits are very<br />

much <strong>in</strong> l<strong>in</strong>e with the desired outcome of e-<strong>government</strong> (cf. Goh et al., 2008; Ha &<br />

Zenebe, 2008). Includ<strong>in</strong>g knowledge management as one of the future oriented themes<br />

po<strong>in</strong>ted out by eGovRTD2020 2 reflects the relevance of the concept to e-<strong>government</strong><br />

(cf. Dawes, 2009).<br />

However, <strong>government</strong>s are usually organized <strong>in</strong> a more complex manner than<br />

bus<strong>in</strong>esses. This larger degree of complexity may affect the realization of knowledge<br />

management (Ha & Zenebe, 2008). Conversely, the complexity may underp<strong>in</strong> the need<br />

1 M<strong>in</strong>nesota’s Gateway to Environmental Information (http://www.bridges.state.mn.us/, accessed on 19-<br />

06-2012).<br />

2 eGovRTD2020 is a research project funded by the EU with the purpose of<br />


for a systematic way of handling organizational knowledge by making visible knowledge<br />

that would otherwise be hidden. In this respect, work tasks that cross <strong>government</strong><br />

boundaries constitute a particular challenge (cf. Peel & Rowley, 2010). De Groot (2003,<br />

p. 95) summarizes the consequences of not being able to access employees’ knowledge as:<br />

“...knowledge is available only to small group of people, [k]nowledge is often not<br />

available to the people who need certa<strong>in</strong> knowledge, [and] [e]mployees are overloaded<br />

with irrelevant <strong>in</strong>formation”. Also more tangible factors like f<strong>in</strong>ancial and time<br />

constraints may hamper the realization of knowledge management in governments<br />

(Hazlett, McAdam & Beggs, 2008). However, as Southon, Todd & Seneque’s (2002)<br />

study of two private and one public organization shows, the management of knowledge<br />

can also be challenged <strong>in</strong> private organizations.<br />

Knowledge management is fundamentally a construct of organization theory.<br />

Knowledge management concerns both tacit knowledge and explicit knowledge<br />

Table 3.2 Knowledge management processes and the potential role of IT. Adapted from Alavi &<br />

Leidner (2001, p. 125)<br />

KM process: Knowledge creation<br />

Supporting information technologies: Data mining; Learning tools<br />

IT enables: Combining new sources of knowledge; Just in time learning<br />

KM process: Knowledge storage and retrieval<br />

Supporting information technologies: Electronic bulletin boards; Knowledge repositories; Databases<br />

IT enables: Support of individual and organizational memory; Inter-group knowledge access<br />

KM process: Knowledge transfer<br />

Supporting information technologies: Electronic bulletin boards; Discussion forums; Knowledge directories<br />

IT enables: More extensive internal network; More communication channels available; Faster access to knowledge sources<br />

KM process: Knowledge application<br />

Supporting information technologies: Expert systems; Workflow systems<br />

IT enables: Knowledge can be applied in many locations; More rapid application of new knowledge through workflow automation<br />



manifested in some kind of physical form, usually as documents (Cong & Pandya, 2003).<br />

As regards the latter type, <strong>in</strong>formation systems <strong>in</strong> the form of knowledge management<br />

systems are commonly applied to support the process (Abecker et al., 1998; Alavi &<br />

Leidner, 2001; Martin, 2008). Both employees in governments and government<br />

“customers”, that is, citizens (cf. Yang et al., 2006) but also businesses, interest groups and the<br />

like, may reap the benefits of government knowledge management.<br />

Groupware, communication technologies, and specifically <strong>in</strong>tranets are<br />

important ICT based tools <strong>in</strong> terms of mediat<strong>in</strong>g knowledge management (Alavi &<br />

Leidner, 2001, p. 125). Four processes constitute knowledge management: knowledge<br />

creation, knowledge storage and retrieval, knowledge transfer, and knowledge<br />

application (see Table 3.2). The search test system takes the form of an <strong>in</strong>tranet with<br />

different functions. First of all, <strong>in</strong> terms of Alavi & Leidner, the system is a repository<br />

of knowledge. Both organizational and specialist knowledge is conta<strong>in</strong>ed. However,<br />

sub portals on topics of relevance to the employees are also contained. Therefore,<br />

storage and retrieval, transfer and application of knowledge are supported by the<br />

system. However, <strong>in</strong> the search test we are primarily concerned with the support of<br />

retrieval of knowledge, <strong>in</strong>clud<strong>in</strong>g how both <strong>in</strong>dividual and organizational memory are<br />

supported. F<strong>in</strong>d<strong>in</strong>gs may be made as to knowledge transfer and knowledge application,<br />

but they are not the object of our <strong>in</strong>vestigations.<br />
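
The relation between the four knowledge management processes and their supporting technologies in<br />

Table 3.2 can also be expressed as a simple lookup. The sketch below is illustrative only. The names<br />

KM_PROCESSES and processes_supported_by are our own assumptions, and the entries are limited to the<br />

technologies listed in the table.<br />

```python
# Illustrative sketch: the four KM processes and their supporting information
# technologies as listed in Table 3.2 (adapted from Alavi & Leidner, 2001).
# The names KM_PROCESSES and processes_supported_by are assumptions.
KM_PROCESSES = {
    "knowledge creation": {"data mining", "learning tools"},
    "knowledge storage and retrieval": {
        "electronic bulletin boards",
        "knowledge repositories",
        "databases",
    },
    "knowledge transfer": {
        "electronic bulletin boards",
        "discussion forums",
        "knowledge directories",
    },
    "knowledge application": {"expert systems", "workflow systems"},
}


def processes_supported_by(technology):
    """Return, sorted by name, the KM processes that list `technology`."""
    wanted = technology.strip().lower()
    return sorted(
        process
        for process, technologies in KM_PROCESSES.items()
        if wanted in technologies
    )


if __name__ == "__main__":
    print(processes_supported_by("Knowledge repositories"))
    print(processes_supported_by("Electronic bulletin boards"))
```

A technology that the table does not list, such as an intranet as a whole, yields an empty result. This<br />

is in line with the observation that the test system supports several of the processes through its<br />

different functions rather than as one single technology.<br />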

3.4.3 ICT tools: Metadata <strong>in</strong>itiatives<br />

One way of supporting information retrieval is to mark up pieces of information<br />

by means of metadata. The principle of describing and representing information units in<br />

order to be able to retrieve known items, explore new ones, and establish relations<br />

between items reaches far back <strong>in</strong> LIS (cf., Haynes, 2004). Referr<strong>in</strong>g to the assigned<br />

data as metadata came <strong>in</strong>to play along with the <strong>in</strong>troduction of electronic resources (El-<br />

Sherbini & Klim, 2004). Thus, one of the first occurrences of the term metadata appears<br />

at the beginning of the 1990s (Gilliland-Swetland, 2005).<br />

Information units can be characterized as to their content, context, and<br />

structure. The content expresses what the <strong>in</strong>formation is about. The content is<br />

considered <strong>in</strong>tr<strong>in</strong>sic to the <strong>in</strong>formation unit. The context on the other hand is considered<br />

extrinsic to the information and is associated with the creation of the information. Wh-questions<br />

may help map the contextual issues of the information. The structure of<br />

the <strong>in</strong>formation may be either <strong>in</strong>tr<strong>in</strong>sic or extr<strong>in</strong>sic or both and expresses formal<br />

associations <strong>in</strong>side one <strong>in</strong>formation unit or across several units (Gilliland, 2008). The<br />


purpose of add<strong>in</strong>g metadata is to be able to “arrange, describe, track and otherwise<br />

enhance access to <strong>in</strong>formation objects” (Gilliland, 2008, p. 2). NISO (2004) applies a<br />

slightly different tripartition to characterize metadata. NISO divides metadata <strong>in</strong>to<br />

descriptive, structural, and adm<strong>in</strong>istrative metadata. Here descriptive metadata<br />

describes the <strong>in</strong>formation unit <strong>in</strong> order to support discovery and identification.<br />

Structural metadata has the purpose of <strong>in</strong>dicat<strong>in</strong>g the relation between compound<br />

objects such as the order<strong>in</strong>g of pages to form chapters. Adm<strong>in</strong>istrative metadata<br />

supports the management of resources by <strong>in</strong>form<strong>in</strong>g about creation, file type, technical<br />

<strong>in</strong>formation and access <strong>in</strong>formation. The difference of perspective between Gilliland<br />

and NISO is caused by their difference of application. Gilliland’s tripartition is aimed<br />

at the LIS sector, while NISO rather is applied for <strong>in</strong>teroperability and other technically<br />

oriented contexts. Haynes (2004) suggests a further elaboration on the purpose of<br />

metadata and identifies five core areas of application: 1) resource description, 2)<br />

<strong>in</strong>formation retrieval, 3) management of <strong>in</strong>formation, 4) rights management, ownership<br />

and authenticity, and 5) <strong>in</strong>teroperability and e-commerce. The extent of Haynes’<br />

identification thus appears more thorough <strong>in</strong> that it comprises the perspectives of<br />

Gilliland and NISO at the same time.<br />
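
NISO's division into descriptive, structural, and administrative metadata can be illustrated with a<br />

small record structure. The following Python sketch is an assumption-laden illustration: the class<br />

MetadataRecord, its method category_of, and all example values are invented for this purpose and are<br />

not taken from NISO (2004) or from the test system.<br />

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """Metadata for one information unit, grouped by NISO's three types."""

    # Supports discovery and identification of the unit.
    descriptive: dict = field(default_factory=dict)
    # Indicates relations within compound objects, e.g. page order.
    structural: dict = field(default_factory=dict)
    # Supports management: creation, file type, technical and access data.
    administrative: dict = field(default_factory=dict)

    def category_of(self, element):
        """Return the metadata type under which `element` is stored, or None."""
        for category in ("descriptive", "structural", "administrative"):
            if element in getattr(self, category):
                return category
        return None


# Hypothetical example values, invented for illustration only.
record = MetadataRecord(
    descriptive={"title": "Guideline on case handling", "subject": "administration"},
    structural={"page_order": [1, 2, 3]},
    administrative={"file_type": "application/pdf", "access": "internal"},
)

if __name__ == "__main__":
    for element in ("title", "page_order", "file_type"):
        print(element, "->", record.category_of(element))
```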

Metadata formats differ as to their level of complexity. At the lowest level of<br />

complexity we f<strong>in</strong>d full text <strong>in</strong>dexes based on the documents conta<strong>in</strong>ed <strong>in</strong> the <strong>in</strong>dexed<br />

information system. Full text indexes are placed at the lowest level due to the lack of structure in<br />

the metadata. The next level of complexity conta<strong>in</strong>s simple, structured formats. This<br />

medium level does not necessarily require professionals for metadata assignment. An<br />

example is the Dubl<strong>in</strong> Core metadata standard designed for mark-up of <strong>in</strong>ternet<br />

resources. The highest level of complexity contains more detailed and<br />

structured standards. Examples are doma<strong>in</strong> specific standards that aim at characteriz<strong>in</strong>g<br />

the <strong>in</strong>formation units <strong>in</strong> a more detailed manner as for example the MARC format<br />

(Dempsey & Heery, 1998). In the most complex group of standards the assignment of<br />

metadata requires a thorough knowledge of the format. Hence it cannot be carried out<br />

by novices.<br />
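
To indicate what a simple, structured format at the medium level looks like in practice, the sketch<br />

below describes the present thesis with a handful of Dublin Core elements and renders them as HTML<br />

meta elements, a common way of embedding Dublin Core in web pages. The function name to_meta_tags<br />

and the choice of elements are our own assumptions. The values are taken from the title page of the thesis.<br />

```python
from html import escape

# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def to_meta_tags(record):
    """Render a flat Dublin Core record (element -> value) as HTML meta tags."""
    lines = []
    for element, value in record.items():
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        content = escape(value, quote=True)  # keep the attribute value well-formed
        lines.append(f'<meta name="DC.{element}" content="{content}">')
    return "\n".join(lines)


# Illustrative record; the selection of elements is an assumption.
record = {
    "title": "Automatic indexing in e-government",
    "creator": "Tanja Svarre",
    "publisher": "Aalborg University",
    "date": "2012",
    "language": "en",
}

if __name__ == "__main__":
    print(to_meta_tags(record))
```

The flat list of element-value pairs shows why assignment at this level does not necessarily require<br />

professionals: each element is filled in independently, without the field and subfield structure that<br />

characterizes a format like MARC.<br />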

Metadata developed for specific domains are referred to as domain-specific metadata. Domain-specific metadata have been developed for various fields such as museums, archives, and moving pictures (e.g., Vellucci, 1998; Haynes, 2004). Within e-government, too, metadata have received considerable attention as a means of improving access to governmental information. Metadata are considered particularly challenging in e-government due to the heterogeneity of the user group (Alasem, 2009). A number of influential nations have developed metadata standards with the aim of supporting e-government.

The forerunner of government metadata initiatives was the Global Information Locator Service (GILS) initiative presented in the early 1990s. The intention behind GILS was to outline a standard for locating information that was applicable to different domains, including governments. GILS is based on a set of metadata in order to support semantic mapping, locator records, and interoperability. GILS is to a large extent inspired by the principles of bibliographic cataloguing (Christian, 1999, 2001). In the United States a large project, the Government Information Locator Service (also referred to as GILS³), was initiated in the mid-1990s (Andrews & Duhon, 1997; Moen, 2001). The starting point for the project was a political decision about paper reduction in the United States. The Government Information Locator Service was heavily based on GILS. The service was thoroughly evaluated in 1996-1997 with a number of purposes, among them understanding how GILS worked as a tool for information resources management and how GILS served different user groups (Moen & McClure, 1997). The evaluation indicated that the level of implementation at the time still left room for improvement. In particular, the implementation was uneven and diverse across the administrations selected as evaluation units, which is hardly surprising considering the time span. Thus, the problems were not necessarily caused by shortcomings in the service itself but rather by local characteristics of the administrations.

³ To avoid confusion, we use the acronym GILS when designating the Global Information Locator Service. The Government Information Locator Service is designated by its full name.

The development of specific e-government metadata has continued across the world throughout the 2000s (Tambouris, Manouselis & Costopoulou, 2007; Alasem, 2009). In many cases the Dublin Core metadata standard (Weibel, 1997) has constituted a central element (Alasem, 2009). Dublin Core contains 15 data elements and may thus be considered a simple metadata format compared with, for instance, the highly detailed MARC format. In Australia, the standard for government metadata, the Australian Government Locator Service (AGLS), was initiated in 1997. Instead of following GILS, Australia developed a standard based on Dublin Core (Haynes, 2004; National Archives of Australia, 2010). The European Union has also developed a mark-up language (GovML) in a two-year project funded by the European Commission. GovML is based on an open XML document structure (Kavadias & Tambouris, 2003; Glassey, 2004).
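To make the simplicity of Dublin Core concrete, the following minimal Python sketch represents a metadata record as a flat mapping and checks its field names against the 15 elements of the Dublin Core Metadata Element Set. The sample record, its values, and the helper name `unknown_elements` are hypothetical illustrations; they are not taken from AGLS, GovML, or any other standard discussed above.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})


def unknown_elements(record):
    """Return the keys of `record` that are not Dublin Core elements, sorted."""
    return sorted(set(record) - DUBLIN_CORE_ELEMENTS)


# Hypothetical record for an administrative document (illustrative values only).
sample_record = {
    "title": "Guidelines for electronic document management",
    "creator": "Example Agency",
    "date": "2009-01-01",
    "language": "da",
    "case_number": "hypothetical-local-field",  # assumed local extension, not Dublin Core
}

print(len(DUBLIN_CORE_ELEMENTS))        # size of the element set
print(unknown_elements(sample_record))  # field names outside the element set
```

A flat set of optional, repeatable elements like this is what allows non-professionals to assign metadata; a format such as MARC, with its many fields and subfields, could not be captured in a comparably short sketch.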

In Denmark, the National IT and Telecom Agency has functioned as an advisor for governmental offices within the framework of the FESD project. The purpose was to give directions and recommendations for digitalizing government, with a specific focus on implementing electronic document management systems (EDMS) (Center for effektivisering og digitalisering, 2002; Steinmark, 2005). However, applying the guidelines was not mandatory, as the choice of terminology also indicates. Likewise, applying the Government Information Locator Service profile in the United States was voluntary. Some American states have adopted it while others have applied alternative solutions (Moen, 2001). More recently, the Danish initiatives have concerned the standardization of data by means of OIOXML, an XML standard developed with the specific purpose of exchanging and reusing data across administrations (National IT and Telecom Agency, 2009). Obviously, interoperability is an important precondition for enabling the exchange of data between administrations.

Fewer initiatives have been taken to standardize descriptive metadata in e-government. The initiatives are commonly proposed as a component of enterprise architecture and take the form of ontologies (Peristeras, Tatabanis & Goudos, 2009). Ontologies are considered a type of KOS (see section 5.3.2), though with different characteristics compared to, e.g., taxonomies and thesauri (Soergel, 1999; Gilchrist, 2003; Haynes, 2004; Zeng, 2008). In Denmark, FORM has been developed, which contains a common language for exchange between Danish government bodies. FORM is the Danish acronym for Joint Cross Governmental Business Reference Model (cf. OECD, 2010). Abecker et al. (1998) outline three levels for characterizing ontologies: information, domain, and enterprise. FORM is characterized as an enterprise ontology by virtue of its identification of work tasks carried out across the entire Danish public sector (cf. Gilchrist, 2003; OECD, 2010). At present FORM is applied in borger.dk, the national portal to the public domain. A number of similar initiatives and tools have been developed in other countries (cf. Peristeras, Tatabanis & Goudos, 2009). As appears from the above, the undertakings regarding metadata have to a large extent been concerned with the development of standards. The evaluation of initiatives has received less attention in the research literature. In this sense, the present project can increase our understanding of the role of metadata when professional users seek information.

3.5 Summary

E-government is a fairly new interdisciplinary research field drawing on fields such as the social sciences, computer science, public administration, and organization studies. The field started out some 20 years ago and has since been consolidated with academic journals and conferences. The point of departure of the research field was an increased focus on the effectiveness and efficiency of governments along with demands for transparency of public administrations. To some extent the development of e-government has been inspired by e-commerce, that is, the business world. However, the two worlds differ in a number of characteristics, for example the number and types of stakeholders and the complexity of organizations. Thus experiences cannot be directly transferred between the two areas.

The present PhD project is placed within information science, which also shapes the approach to e-government. A variety of system types exist that support e-government. The system that hosts the comparative test of categorization in the PhD project is an intranet. Overall, we characterize it as an expert system based on the definitions above. However, as it is an intranet, other objects are contained in the database too. Nevertheless, the character of the system places it as a tool for knowledge management. Here, we are mainly concerned with the retrieval of knowledge. We test the system with professional employees. From Rowley we have learned that many stakeholders exist within e-government and that they do not necessarily act the same as regards information seeking. Further, we have seen that the introduction of e-government has most likely meant a change of work tasks for employees. Together, these points place demands on the design of the search test. Lastly, the investigations of metadata in e-government have to a large extent been concerned with metadata formats and to a lesser degree with descriptive metadata. Further, the existing knowledge of the significance of metadata in e-government information seeking is limited. Therefore it is our aim to add to this knowledge by means of the project.


4 Seeking behaviour in e-government

Information seeking constitutes a core research field in the user-oriented research tradition (e.g., Ingwersen, 1996; Åström, 2007). Further, information seeking has been studied in LIS for decades (Ingwersen & Järvelin, 2005). Thus, ARIST started out with annual reviews on information needs and uses in 1966. Though the reviews on the subject only had an annual frequency until 1972, the ever-increasing number of research articles and reviews on the subject attests to the importance of the research field.

Studies of information seeking in the context of e-government serve different purposes. One purpose is the evaluation of (digital) information services. Are they being used, and how? Does the use reflect the intentions behind the service? Another purpose is to characterize the use of information and information services so that new initiatives can be designed to accommodate this use (e.g., Wilson, 1999).

The purpose of the present chapter is to supplement the preceding theoretical chapter with a review of information seeking studies within e-government. With the present chapter we want to provide an overview of the current state of knowledge concerning professional users of information in the context of e-government. We introduce the chapter with a definition of the concept of information seeking, as it serves as the frame of reference for reviewing studies of information seeking. Next follows a presentation of the coverage of different e-government stakeholders’ seeking behaviour. The purpose of this subsection is to compare the amount of knowledge about other stakeholders with that about the particular group in question here: employees. The brief comparison is followed by a review of the current state of knowledge about the information seeking of e-government employees. We finish the chapter with a summary.

4.1 Information seeking and related concepts

Information seeking designates “the conscious effort to acquire information in response to a need or gap in your knowledge” (Case, 2007, p. 5). Further, information seeking describes “the variety of methods people employ to discover, and gain access to information resources…” (Wilson, 1999, p. 263). Two concepts are closely related to information seeking: information behavior and information searching. Information behavior is superordinate to information seeking. The concept is considered a part of general human communication behavior and may be defined as “the more general field of investigation…” (Wilson, 1999, p. 263). Information searching, on the other hand, is subordinate to information seeking and represents the situation in which a user interacts with an information system in order to resolve a need for information. Since consulting an IR system is one of several possible ways to resolve an information need, information searching must be characterized as potentially contained in information seeking. The relation between the three concepts is illustrated in Figure 4.1. The figure is based on analyses of a number of existing models and therefore serves as a metamodel.

Figure 4.1 Nested model of information seeking and information searching (Wilson, 1999, p. 263)

It is implicit in the concept of information seeking that it occurs when a subject has experienced some sort of gap in their knowledge and a need for information has arisen. Information needs as the triggering element have been a common point of departure for studies of information seeking and searching. An information need arises from a problematic situation, unless the problem is very simple or routine (MacMullin & Taylor, 1984, p. 93). Different theories of the nature of the information need have been presented. Taylor (1968) has outlined four stages to characterize an information need (Q1-Q4). Libraries can apply the stages in order to help the user at whichever stage their information need is. Belkin, Oddy & Brooks’ (1982) contribution is concerned with the background of the information need. They have put forward the ASK hypothesis, which states that an information need arises from an anomaly in the user’s state of knowledge. The idea behind the hypothesis is that it will be easier for the user to describe the anomaly than to describe the information need in the language of the information system (Belkin, Oddy & Brooks, 1982, p. 62). Ingwersen (1986a) has also offered an empirically based typology of users’ information needs. The typology comprises three types: verificative information needs, conscious topical information needs, and muddled topical information needs. Originally the identification of the three types was based on empirical results from library users. With a verificative information need, the user wants to locate or verify an item and possesses characteristic bibliographic data on the item wanted. The conscious topical information need refers to a situation where “the user wants to clarify, review or pursue aspects of known subject matter”. Finally, the muddled topical information need describes a user wanting to explore new concepts outside the subject matter already known to the user. More recently, Ingwersen & Järvelin (2005, pp. 289-293) have added further specification to the theories of the information need. Here, three dimensions characterizing the information need have been identified, namely the user’s intentionality behind the search task (whether searching for source contents or data entities), the type of knowledge known by the user (whether declarative and/or procedural domain knowledge), and the quality of the user’s current knowledge (whether well- or ill-defined). Combinations of the three dimensions lead the authors to specify eight different information need types, ranging from different known item searches to muddled types.

From the 1990s the concept of task has gained attention as a means of explaining information seeking and information searching (Vakkari, 2003). Information needs and seeking strongly depend on the underlying task, which explains the relevance of the concept in seeking studies. Tasks may have been implicit in earlier theories of information need formation (cf. Byström & Järvelin, 1995). However, it is the empirically based identification of types of tasks and their consequences for information seeking actions that has proven the value of tasks as a qualified methodological alternative to information needs as the point of departure for studies of information seeking (see e.g., Byström & Järvelin, 1995; Byström, 1997, 2002).

4.2 The purpose of seeking studies

The study of information seeking is important because it provides knowledge about users of information. This knowledge is essential when developing information services regardless of the choice of channel. Thus, studies of information seeking may inform the design process, since they are able to specify the navigational structure and data needed by a particular user group in order to locate specific information (cf. Rouse & Rouse, 1984; Wilson, 1999). But, as pointed out by Wilson (1981, p. 7), studies of information seeking can also stand alone as basic research, not necessarily with any practical applications or implications but rather increasing our knowledge of why users act the way they do. This second type of study in particular expresses the change of approach in information seeking studies. The focus of information seeking studies has moved away from examining the artifacts of information seeking in what is referred to as system-centered research. Recent studies instead emphasize the information user, in the user-centered research tradition of information seeking studies (e.g., Case, 2007; Courtright, 2007; Vakkari, 1999). Along with this change of emphasis towards the users of information, the context for information seeking has received more attention. Taylor’s (1991) paper on information use environments points out the differences in use and perception of information among different groups of users, suggesting that information seeking must be studied with a point of departure in specific user groups.

4.3 Entities of e-government: studies of seeking behavior

Information seeking behaviour in general has been thoroughly explored within library and information science. One area of seeking studies has focused on seeking behaviour in relation to work contexts, e.g., engineers and lawyers (e.g., Case, 2007). The area of e-government has also been the subject of investigation. We have previously outlined the stakeholders of e-government (see section 3.3). Rowley’s (2011) typology was presented in order to be able to, among other things, identify the differences between the needs or demands that characterize different stakeholders. In this section we apply the typology as a tool for categorizing studies of seeking behavior within e-government. The purpose of introducing the typology in the present chapter is to outline the research coverage of the different stakeholders as to their patterns of information seeking. Obviously, since the typology has not been developed with this particular purpose in mind, not all roles are necessarily relevant as objects of investigation in the present framework. For instance, we do not expect to find seeking studies investigating roles that are not subject to government services. By this, we particularly mean the meta actors comprising the last four roles in Rowley’s typology, namely project managers, design and IT developers, suppliers and partners, and researchers and evaluators. Further, since this chapter serves the function of setting the stage for our empirical domain study, we limit the following review to geographic locations that share a level of development with Denmark, which is the geographic location of our case organization.

We have already mentioned (section 3.3) that citizens, businesses, and governments have received much attention in the e-government research literature. This is also reflected in seeking studies of e-government stakeholders; in particular, the seeking and searching behavior of citizens is well explored. Politicians elected for office have also been rather well explored. Table 4.1 presents examples of seeking studies of different stakeholders. Employees have been left out of the table since we go into more detail with this particular stakeholder role from section 4.4 onwards. The division of the typology does have some influence on how seeking studies can be placed. For instance, citizens are divided according to whether they are general citizens or users of a particular service. This means that the studies that can be placed in the latter group are mainly searching studies reflecting the use, and often also the evaluation, of a particular service. The evaluative character of the latter type of study also means that they do not necessarily include searching behavior per se, such as selection of search terms or modification of queries.

4.4 E-government employee information seeking

A number of selection criteria have guided the <strong>in</strong>clusion and exclusion of studies <strong>in</strong><br />

this review. We have previously mentioned the diverg<strong>in</strong>g maturity levels of e<strong>government</strong><br />

at national levels. In our review we are focus<strong>in</strong>g on countries that have a<br />

maturity level similar to Denmark. It would be reasonable to argue that an even<br />

narrower geographical delimitation would be required due to the specific<br />

characteristics <strong>in</strong> the Scand<strong>in</strong>avian adm<strong>in</strong>istrative tradition (cf. Arellano-Gault & del<br />

Castillo-Vega, 2004). However, s<strong>in</strong>ce we are concerned with seek<strong>in</strong>g behaviour <strong>in</strong><br />

relation to carry<strong>in</strong>g out adm<strong>in</strong>istrative work tasks and not the adm<strong>in</strong>istrative tradition


<strong>Automatic</strong> <strong><strong>in</strong>dex<strong>in</strong>g</strong> <strong>in</strong> e-<strong>government</strong><br />

Table 4.1 Examples of studies that have exam<strong>in</strong>ed <strong>in</strong>formation seek<strong>in</strong>g and/or search<strong>in</strong>g of various<br />

stakeholder roles.<br />

Stakeholder Author(s) Object of study Methods applied<br />

People: Service<br />

users<br />

Fu, Farn &<br />

Chao (2006)<br />

Hu et al.<br />

(2008)<br />

Wang &<br />

Shih (2009)<br />

People: Citizens Jaeger &<br />

Thompson<br />

(2004)<br />

Bus<strong>in</strong>esses and<br />

Small and<br />

Medium sized<br />

Enterprises<br />

Reddick<br />

(2005)<br />

Chau, Fang<br />

& Sheng<br />

(2007)<br />

Citizens’ acceptance of e-tax<br />

fil<strong>in</strong>g<br />

Determ<strong>in</strong>ants of service<br />

quality <strong>in</strong> e-tax fil<strong>in</strong>g<br />

Factors <strong>in</strong>fluenc<strong>in</strong>g citizens’<br />

use of <strong>government</strong><br />

<strong>in</strong>formation kiosks<br />

E-<strong>government</strong> non-users<br />

among citizens<br />

Citizens’ <strong>in</strong>teraction with e<strong>government</strong><br />

Citizens’ use of a particular<br />

website: Utah.gov<br />

Ra<strong>in</strong>s (2008) Citizens’ <strong>in</strong>formation<br />

behavior when look<strong>in</strong>g for<br />

health related <strong>in</strong>formation<br />

Cuillier &<br />

Piotrowski<br />

(2009)<br />

College students’, <strong>in</strong>ternet<br />

volunteers’ and citizens’<br />

general seek<strong>in</strong>g behavior and<br />

perceptions of access to<br />

<strong>government</strong> <strong>in</strong>formation<br />

Ren (1999) SME executives’ use of<br />

<strong>government</strong> <strong>in</strong>formation<br />

sources<br />

52<br />

Survey questionnaire<br />

Two-stage onl<strong>in</strong>e<br />

survey of citizens<br />

Survey questionnaire<br />

Literary study<br />

Citizen telephone<br />

surveys<br />

Log analysis<br />

Survey questionnaire<br />

In-class paper<br />

questionnaires,<br />

onl<strong>in</strong>e surveys, and<br />

phone surveys<br />

Survey questionnaire


Stakeholder Author(s) Object of study Methods applied
Non-profit organizations
Elwood (2008)
Politicians Nicholas & Colgrave (1996)
The particular challenges connected to meeting the data needs of grass root organizations
Nikoi (2008) NGO-workers’ information needs
Orton, Marcella & Baxter (2000)
Askim (2007; 2009)
Information needs of British local councilors
Parliamentary members in the United Kingdom


Chapter 4<br />

Observation of 2 organizations and semi-structured interviews of respondents surrounding the organizations
Interviews, observation, and analyses of the content of information already gathered by the respondents
Interviews and subsequent survey questionnaire
Case study of two parliamentary members including observation and log analysis

per se, we find it reasonable to include studies from other administrative traditions as
well. Finally, we have not limited the included studies by their time of publication.
The rationale is that ICT has been present in governments for a long time, so employees
have been using ICT as part of their work for decades. We therefore assume that studies
carried out before the concept of e-government was introduced may offer valuable
insights into the seeking behaviour of our target group.

As the following sections will show, the amount of research on the information
seeking of employees within e-government is limited. We therefore supplement the
review with studies of related and relevant user groups that can enrich our
understanding of e-government employees. One area we draw on is seeking studies of
professions with characteristics similar to the context in question here. We also
consult studies of e-government employees that are not core seeking or searching
studies, but still contribute to our knowledge of the target group. Not all studies
strictly investigate employees. Some studies inform us about employees but at the same
time include other stakeholders such as politicians or citizens (e.g., Marcella et al.,
2007). These studies are included in the review provided that they add to the knowledge
about employees. Numerous studies exist on information needs and seeking behaviour
among medical health professionals (a review of recent studies can be found in Fourie,
2009). Though medical health may be considered a subcategory of e-government, these
studies are not considered in the review, since the nature of the information applied
diverges considerably from the information employed by the user group in focus here.

4.4.1 Project INISS

In the late seventies, Wilson et al. (e.g., Wilson & Streatfield, 1977; Wilson,
1980) performed a large observational study of social workers and social
administrators: Project INISS. The project (Information Needs and Information Services
in local authority social services departments) had the purpose of examining
information needs and information behaviour among social workers and social
administrators (Wilson, 1980, p. 199). The results were intended to be used for
improving and developing information system organization and information service
delivery (Wilson, 1980, p. 199). The project was carried out in a selected set of
British local authority departments representing both urban and rural areas.
Furthermore, the participants reflected different categories of employees (Wilson,
1980, p. 203). 22 subjects were observed using structured observation, providing 6,000
records of communication events (Wilson & Streatfield, 1977, p. 282). The study is
primarily a study of information behaviour. Hence, it is concerned with multiple
aspects of the work situation of the subjects being studied. Still, there are elements
of information seeking in the results, e.g. when referring to the role of current
awareness bulletins (Wilson & Streatfield, 1977, p. 285).

The study shows that 74% of all sessions last 5 minutes or less (Wilson &
Streatfield, 1977, p. 284). The issue of limited time is still noted years after
Wilson & Streatfield’s study (see e.g., Quirchmayr & Traunmüller, 1991). In addition,
the social service staff members stress the importance of clearly and succinctly
presented texts, preferably in a format that makes key elements easy to identify
(Wilson, 1980, p. 211). As regards information needs, the study indicates that
information needs among the participants are more complex than just verificative needs.
The study finds that topical needs, whether conscious or muddled, are present among the
participants. However, this does not exclude the presence of verificative information
needs in the social services departments. Put another way, the work tasks of the
employees generate both verificative and topical information needs. In the personal
files observed in the study, a number of different information types are found, e.g.
committee papers, pamphlets, reports, and statistics (Wilson, 1980, p. 211). This means
that the length of the individual units of information varies.

4.4.2 System development in the Danish Parliament

From 1989 onwards, Ingwersen (1994) worked as a consultant on a project
regarding the introduction of a new information system in the Danish Parliament. The
project included the design and development of a thesaurus. As an introductory part of
the project, the information structure and working processes of people employed in the
Parliament were analysed. For this purpose a user and domain analysis was carried out.
The empirical basis for the analysis comprised interviews with 32 respondents on the
basis of a structured questionnaire. In some cases a respondent was in fact a group of
people, so the number of individuals interviewed exceeds 32. The respondents comprise
members of parliament (MPs), assistants, and secretariat employees. The questionnaire
consisted of four parts: 1) Characteristics of respondents; 2) Quality of information
(critical incident); 3) Quality of information (in general); and 4) Types of subject
terms and subject levels.

The results of the user and domain analysis primarily inform us about the
searching behaviour of the participants. Thus, the study gives directions for the
functionalities required of the future information system. The documents of the
organization are characterized by being connected in a highly complex manner,
reflecting the different stages in the law-making process. An important requirement for
the system is that it allows for high-precision searches. The respondents consider it
important that they are able to identify a specific document when descriptive data or
subject data are known in advance. This demand is connected to the participants’ need
to limit search results as much as possible. The paper outlines different ways of
reaching this goal. One way is to assign several controlled index terms to each
document; the need to discriminate between documents is worth noting here. Another
parameter is the short window of currency of the documents expressed by the
respondents: the participants consider the documents of the information system outdated
after two years or even less. Thirdly, exhaustive descriptive metadata need to be
available in order to support searches when one or several descriptive characteristics
of the document are known.
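
To make the point about discrimination between documents concrete, the following minimal sketch shows how requiring several controlled index terms narrows a result set and thereby raises precision. It is an illustration only and not the system described by Ingwersen (1994): the document ids, the index terms, and the functions `search` and `precision` are hypothetical, and simple Boolean AND matching is assumed.

```python
# Hypothetical mini-collection: document id -> set of controlled index terms.
INDEX = {
    "doc1": {"taxation", "bill", "first reading"},
    "doc2": {"taxation", "committee report"},
    "doc3": {"taxation", "bill", "committee report"},
    "doc4": {"health", "bill"},
}

# Assumption for the example: the user is looking for exactly this document.
RELEVANT = {"doc3"}


def search(index, required_terms):
    """Return ids of documents indexed with every required term (Boolean AND)."""
    required = set(required_terms)
    return sorted(doc for doc, terms in index.items() if required <= terms)


def precision(retrieved, relevant):
    """Share of retrieved documents that are relevant; 0.0 for an empty result."""
    if not retrieved:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(retrieved)


if __name__ == "__main__":
    broad = search(INDEX, ["taxation"])
    narrow = search(INDEX, ["taxation", "bill", "committee report"])
    print(broad, precision(broad, RELEVANT))
    print(narrow, precision(narrow, RELEVANT))
```

With a single term the query retrieves three documents, of which one is relevant. With three terms only the sought document remains, which is the kind of limitation of search results the respondents ask for.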

Another result of the study concerns potential terms for the future thesaurus.
Here differences of opinion were expressed as to what makes a synonym. In the fourth
part of the interview the respondents were presented with different synonyms, which
they marked as either related or not related terms. It appears that different
employment functions did not agree on the relations of specific terms. In several cases
MPs seemed more certain than the remaining two employment groups as to when synonyms
were related and when they were not. Thus, when applying thesauri or other controlled
indexing languages, one must take into account that subgroups of an organization may
differ in their perception of relations between concepts even though they work with the
same subject areas. On the other hand, all three groups represented in the study share
the opinion that popular expressions for laws and political concepts must be present in
the thesaurus.

Figure 4.2 Comprehensive model of information seeking. Adapted from Johnson et al. (1995).


4.4.3 Information behavior of employees in an engineering and technical service government office

Johnson et al.’s (1995) study takes its point of departure in the employees of a
governmental agency concerned with engineering and technical services. The intention
behind the study is to investigate the background factors that affect information
seeking actions. The dependent variable of the study, information seeking actions,
involves two aspects: scope, that is, the range of people consulted in order to access
information, and depth, that is, the amount of information sought. In this sense the
study is not a seeking study per se. Rather, the study informs us about the factors
that affect seeking behavior. The model tested in the study is shown in Figure 4.2.

The empirical basis of the study comprises 380 responses to a survey
questionnaire; 26 percent of those surveyed did not respond. The respondents are
characterized by fairly long seniority in the organization. Communication in the
organization is extensive, as are interpersonal and group interdependence. The tests of
the model show that the strongest paths exist between characteristics and actions, and
between cultural beliefs and characteristics. The former path means that, among the
tested independent variables, the respondents’ assessment of the quality of
communication channels guides the amount of information sought and the number of people
approached in order to resolve an information need. Though not specifically expressed
in the paper, we assume that the relation between the two is inverse, meaning that with
high-quality channels less information and fewer people need to be approached. The
latter path moves a step backwards in the model and expresses a strong relation between
cultural beliefs and characteristics. Thus, for the respondent group, the cultural
conception of a channel shapes the subsequent assessment of the channel. The study also
finds that some modifications need to be made to the proposed model. Some of the
variables in the left column of Figure 4.2 do not take the path through utility.
Instead, they have a direct effect on both characteristics (in the middle column) and
actions (right column). This finding suggests that the variables in the left column are
important to several stages of the process outlined in the model. In other words, we
can see demographics, direct experience, beliefs, and salience as direct indicators of
consulted channels in information seeking.



4.4.4 Federal, state, and local policy makers’ selection of information sources

The purpose of Oh’s (1996) study is to investigate which factors influence the
selection of information in bureaucracies. The relation between the identified
influencing factors is also under study. Thus, the study is similar to the study by
Johnson et al. (1995). Oh’s study informs us about the seeking behaviour of government
employees. On the basis of existing theory and results, a theoretical path model is
created to explain information selection. One assumption behind the study is that the
policy area affects the selection of information. This is the reason for testing the
path model in two different policy areas within mental health, namely delivery and
financing of mental health service. The two policy areas are selected on the assumption
that the former primarily comprises generalists while the latter mainly includes
specialists. The method applied in the study is twofold. First, a series of open-ended
interviews were carried out and subsequently coded. The purpose of this first step was
to discover basic information about the policy making process of the policy areas
concerned. Second, a series of questionnaires were administered in order to test the
path model.

The results demonstrate that the generalist and the specialist groups have some
features in common while they differ on other points. Common to both generalists and
specialists is that internal sources are preferred regardless of prior knowledge of the
problem at hand. It also appears that the education of the employees is more likely
than age to affect the selection of sources. The type of information sought also
influences the selection process. The influence concerns not just which sources are
selected, but also the number of sources selected: some information types require
searching more sources than others. As mentioned above, the two groups differ on some
points. The specialists are more likely than the generalists to compare different
sources when searching for information. One reason for this is the specialists’ need to
ensure the reliability and validity of the collected information.

It is the differences between specialists and generalists that lead Oh to conclude
that “the factors influencing selection of information sources strongly differ between
the two policy areas”, suggesting that future studies must take this difference into
account. However, since this finding has only been verified for employees’ selection of
sources, we do not make the same distinction in our domain study and search test. The
major reason is that our field of study comprises more general seeking and searching
behaviour, which results in a slightly different focus compared to Oh’s study.

4.4.5 Finnish municipal employees

As a part of her dissertation work, Byström (1999) conducted a study of two
Finnish local (municipal) governments. The study has been presented with different foci
across a number of works (Byström, 1997, 1999, 2002). We therefore base the present
section on a combination of these three publications. Using diaries, interviews,
organizational document review, and observation (Byström, 1997, p. 132), 54 cases (80
of the cases from the pilot are included) handled by 19 officials are analyzed. Data on
the cases were collected from the moment they arrived at the registrar’s office
(Byström, 1999, p. 67-68). In the study Byström focuses on information seeking, and she
is not specifically concerned with the actual information searching. Among other
things, Byström analyzes information needs and the relation between the complexity of
work tasks and the subject expertise of the participants (1999, p. 85). As expected
with a group of fairly experienced participants, the subject expertise is in many cases
rather high.

The results of the study are interpreted in terms of a theoretical framework of
types of work tasks and types of information needed. Five types of work tasks of
increasing complexity were defined for the study, namely automatic information
processing tasks, normal information processing tasks, normal decision tasks,
known-genuine decision tasks, and unknown-genuine decision tasks. The level of
complexity is expressed in terms of how far a subject can determine a priori the
information needed, the information seeking process, and the expected outcome of the
seeking process. Three information types are also specified for the information needed.
These include task information (or single task related), domain information (or multi
task related), and task-solving information (or instructional) (Byström, 1997, 2002).

From the analysis of the collected data it turns out that the tasks with the
highest degree of a priori determinability (automatic and normal information processing
tasks) are by far the most frequent among the participants. Next follow, with
decreasing frequency, normal decision tasks and known-genuine decision tasks.
Unknown-genuine decision tasks are not present in the data material and must thus be
expected to be the rarest task for the participants. Thus, the participants most often
handle tasks with a low degree of uncertainty as to what information is needed and what
the process of getting hold of the information involves (Byström, 1997). Further, it
seems that different types of information are needed for the particular work tasks. The
results indicate that as uncertainty about the task in question increases, the number
of information types increases. Automatic information processing tasks have the largest
share of cases in which no information is acquired. This share decreases as the
complexity of tasks increases. Task information is most common in normal information
processing tasks, while a combination of task and domain information, or even task,
domain, and task-solving information, is most common in decision tasks. Combined with
the frequency of work tasks, task information becomes the information type with the
highest frequency, while domain information has a medium frequency and task-solving
information the lowest. A similar distribution is indicated in Serola (2006). Likewise,
the type of source applied changes as uncertainty in the work task at hand increases:
the share of documentary sources decreases with increasing uncertainty, while the share
of people as information sources increases (Byström, 2002). In sum, Byström’s results
indicate that the complexity of work tasks is connected to some extent to the type and
amount of information acquired. Also, the use of documentary information is more
widespread in tasks of low uncertainty and complexity, which suggests that these are
the types of tasks that information systems in particular should support.

4.4.6 Users of the European Parliamentary Documentation Centre

In 2004 Marcella et al. (2007) examined users of the European Parliamentary
Documentation Centre (PDC) regarding information needs and information seeking
behaviour. The main purpose of the study is to make recommendations for service
development in the PDC on the basis of a study of its users. Semi-structured interviews
were conducted with 72 persons of different types: administrative staff, MEP
assistants, legal service administrators, and MEPs. Since only 5 of the 72 persons are
MEPs, we have decided to include the study in the present review, as it reflects
seeking behaviour from the employees’ point of view. In addition, 11 PDC staff were
interviewed. In order to ensure experienced participants, the data were collected prior
to the 2004 election for the European Parliament.

The study explores elements of information behaviour, information seeking
behaviour and information searching. 90% of the interviewees use information at least
daily. The information originates from both internal and external sources. It is
applied for a wide range of activities and takes the form of both raw and analysed
data. The interviewees express difficulties in locating relevant information. The
reasons for the difficulties include transparency, lack of digitalization (of older
materials), representation of different points of view, and objectivity of data. In
accordance with prior studies, a key point is that the time available is limited. The
time pressure intensifies the difficulties of locating information. This is indicated
by the fact that the participants use other people to perform their searches and that
an important relevance criterion is the size of the information.

In line with Byström’s (1997) results presented in the previous section, the
participants have information needs of varying complexity. Marcella et al. do not
estimate the relative extent of different types of information needs, but the paper
indicates the presence of varying complexity in several places. Searching by entering
complete citations points to information needs of low complexity. On the other hand,
the information seeking connected to the legislative process, where the information
need starts out wide-ranging and later becomes more focused, points to more complex
information needs.

4.4.7 Information literacy of Scottish government civil service staff

The overall purpose of Crawford & Irving’s (2009) study is to investigate the
nature of civil service employees’ information literacy in order to direct improvement
initiatives more specifically towards actual practice. The research method applied is
structured interviews that are allowed to vary slightly depending on the specific type
of staff being interviewed. The 20 interviews conducted covered different types of
government employees: care home staff, civil service staff, and social work staff. The
paper reports neither the wording of the interview questions nor how the respondents
are distributed across the different employee types.

The most recurrent f<strong>in</strong>d<strong>in</strong>g of the study is the importance of humans as sources<br />

of <strong>in</strong>formation. People are used <strong>in</strong> <strong>in</strong>formation seek<strong>in</strong>g at different levels. Thus, other<br />

people are used as sources of <strong>in</strong>formation, but also <strong>in</strong> order to support the selection of<br />

websites for <strong>in</strong>formation search<strong>in</strong>g. The employees evaluate the sources employed for<br />

<strong>in</strong>formation seek<strong>in</strong>g, whether they are human or ICT-based sources. In this sense they<br />

appear very information literate. At the same time the information environment seems<br />

introverted. The authors do not explain what this introversion covers. However, the<br />

scope of the paper suggests that it refers to a lack of openness towards changing<br />

information practice in line with information literacy courses. The paper also<br />

investigates aspects of searching behaviour. Thus, in connection with the electronic<br />

resource data management system of the administration, it is mentioned that the


specificity of subject terms assigned is not sufficient. This f<strong>in</strong>d<strong>in</strong>g suggests that the<br />

employees request high precision when search<strong>in</strong>g for <strong>in</strong>formation. The employees also<br />

demonstrate a high understand<strong>in</strong>g of the value of the <strong>in</strong>formation sought and applied.<br />

On the other hand, the authors point out that the quality of internet searches varies<br />

across the employees, leaving room for improvement, e.g., through information literacy<br />

courses.<br />

4.4.8 Civil servants’ <strong>in</strong>ternet skills<br />

In a recent study by van Deursen & van Dijk (2010) the <strong>in</strong>ternet skills of civil<br />

servants were the subject of <strong>in</strong>vestigation. The purpose of the study was to f<strong>in</strong>d out the<br />

strength of skills at different levels, namely operational, formal, <strong>in</strong>formation, and<br />

strategic levels. The levels are specified to comprise:<br />

1. Operational: operat<strong>in</strong>g browsers, onl<strong>in</strong>e search eng<strong>in</strong>es, and complet<strong>in</strong>g<br />

onl<strong>in</strong>e forms,<br />

2. Formal: navigate the <strong>in</strong>ternet and ma<strong>in</strong>ta<strong>in</strong> a sense of location while<br />

navigat<strong>in</strong>g the <strong>in</strong>ternet,<br />

3. Information: locate required <strong>in</strong>formation, and<br />

4. Strategic: take advantage of the <strong>in</strong>ternet for specific goals.<br />

98 civil servants from different Dutch executive policy agencies and<br />

municipalities served as the empirical basis of the study. The four levels were<br />

operationalized into two search assignments each, a total of eight assignments. For every<br />

assignment a maximum time allowed for solving the task was specified. The paper does<br />

not explain how the allowed search times were determined. The assignments were<br />

used to test the participants’ ability to complete the assignment within the period. The<br />

degree of accomplishment was taken to express the skills of the participants. This<br />

measure was subsequently controlled for different background variables.<br />
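
The measure described above can be sketched roughly as follows. This is a minimal illustration under our own assumptions, not the scoring procedure of van Deursen & van Dijk (2010): the record layout, the participant labels, the time limits, and the function name `completion_rate_by_level` are all hypothetical, and the rule that an assignment only counts when solved within the time limit is our reading of the description.

```python
from collections import defaultdict

# Hypothetical records, invented for illustration:
# (participant, skill level, seconds used, time limit in seconds, solved correctly?)
attempts = [
    ("p1", "operational", 120, 300, True),
    ("p1", "information", 400, 300, True),   # solved, but after the time limit
    ("p2", "operational", 200, 300, True),
    ("p2", "information", 250, 300, False),  # within the limit, but not solved
]

def completion_rate_by_level(attempts):
    """Share of assignments per skill level that were solved within the time limit."""
    solved = defaultdict(int)
    total = defaultdict(int)
    for _participant, level, seconds, limit, correct in attempts:
        total[level] += 1
        if correct and seconds <= limit:
            solved[level] += 1
    return {level: solved[level] / total[level] for level in total}

rates = completion_rate_by_level(attempts)
```

A rate per skill level of this kind could then be compared across background variables such as age or type of employment.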

The general f<strong>in</strong>d<strong>in</strong>gs of the study are that the participants’ operational and<br />

formal skills are stronger than <strong>in</strong>formation and strategic skills. Also, it seems that age<br />

affects the skills <strong>in</strong> the sense that younger participants perform better <strong>in</strong> solv<strong>in</strong>g the<br />

assignments than do older participants. Another difference <strong>in</strong> performance was<br />

identified, namely as to the type of employment. Thus, the executive employees had a<br />

lower degree of performance compared to policy advisors and adm<strong>in</strong>istrators.<br />

Unfortunately, the paper does not report in a more qualitative manner how the<br />

different assignments have been solved by the participants. However, we can use the<br />

results of the study to make clear that different characteristics of the respondents<br />


may affect respondents’ skills and that the skills to some degree depend on the type of<br />

employment.<br />

4.5 Related studies of <strong>in</strong>formation seek<strong>in</strong>g and search<strong>in</strong>g<br />

As appears from the sections above, there are not many clear-cut studies of the<br />

seeking behaviour of government employees. For this reason we present<br />

below some studies of professional employees sharing some common features with the<br />

domain in question. The studies present the seeking behaviour of professional<br />

information users for whom the use of information constitutes a core activity in<br />

solving daily work tasks as a part of their job. Further, we have included professional<br />

legal seeking behaviour in the review, considering that legal sources are expected to play<br />

an important role for government employees, since their job is to administer the law.<br />

4.5.1 Legal seek<strong>in</strong>g behavior<br />

Different authors have <strong>in</strong>vestigated seek<strong>in</strong>g behavior of both academic lawyers<br />

and attorneys. Parts of the f<strong>in</strong>d<strong>in</strong>gs are <strong>in</strong>terest<strong>in</strong>g <strong>in</strong> this review because both the legal<br />

profession and e-<strong>government</strong> employees take their po<strong>in</strong>t of departure <strong>in</strong> the legal<br />

framework constituted by the law. As a consequence it is to be expected that they to a<br />

certa<strong>in</strong> degree share <strong>in</strong>formation sources and seek<strong>in</strong>g behavior.<br />

Kuhlthau & Tama (2001) have conducted a study that <strong>in</strong>vestigates the seek<strong>in</strong>g<br />

behavior of practicing lawyers. Eight lawyers from different small and medium-sized<br />

enterprises were interviewed following a semi-structured interview guide. The study<br />

comprises both routine and complex tasks, though most attention is paid to the complex<br />

tasks in the analysis. The interviewees prefer printed over electronic sources. It is<br />

expressed that the searching possibilities in electronic sources do not support<br />

serendipity (cf. Foster & Ford, 2003), which is often needed in the lawyers’ work with<br />

complex cases. When carry<strong>in</strong>g out rout<strong>in</strong>e tasks the <strong>in</strong>terviewees are more will<strong>in</strong>g to<br />

apply electronic sources. A number of electronic sources are applied for stay<strong>in</strong>g up to<br />

date, e.g., e-mail and listserv. The <strong>in</strong>terviewees stress the need to be able to filter the<br />

information in order to avoid information overload. Here, time pressure makes the<br />

difference. Thus, the interviewees do not have time to go through all the information<br />

and are concerned about missing important information. In addition to printed and electronic<br />

sources the lawyers use persons as information sources, in accordance with other user


groups presented above. A similar f<strong>in</strong>d<strong>in</strong>g was made by Choo et al. (2006), who<br />

reported that the employees of a Canadian law firm regularly exchanged information<br />

with the people they worked with. A final interesting result for the present work is the<br />

lawyers’ expressed need to have uniform and well-organized access to their documents.<br />

In a later, related study by Makri, Blandford & Cox (2008a) legal <strong>in</strong>formation<br />

seek<strong>in</strong>g was <strong>in</strong>vestigated for academic lawyers. The purpose of the study is to be able<br />

to make recommendations for the development of established law databases based on<br />

the users’ seeking and searching behavior. 27 participants, ranging from first-year<br />

undergraduates to professors, performed searches to find information for their work while<br />

th<strong>in</strong>k<strong>in</strong>g aloud. The frame for the analysis is Ellis’ (1989) model for <strong>in</strong>formation<br />

seek<strong>in</strong>g. Dur<strong>in</strong>g the course of the analysis different sub processes to Ellis’ model are<br />

identified. The academic background of the participants means that search<strong>in</strong>g for<br />

scientific articles is <strong>in</strong> focus throughout the paper at the expense of legal sources. As<br />

regards e-<strong>government</strong> employees we expect the use of scientific articles to be close to<br />

nothing. Still, some of the results should be emphasized. Thus, the authors find that<br />

staying updated is particularly important in legal matters in order to avoid basing one’s<br />

work on materials that have been overruled, that have been affected by changes in the law, or that are,<br />

more generally, no longer current. Updating behavior takes place in connection with Ellis’ level<br />

“monitor<strong>in</strong>g” and at a new level identified by the authors, namely “access<strong>in</strong>g”.<br />

Monitor<strong>in</strong>g is def<strong>in</strong>ed as “ma<strong>in</strong>ta<strong>in</strong><strong>in</strong>g awareness of developments of an area through<br />

regularly follow<strong>in</strong>g particular sources” (Ellis, 1989, p. 177). In the study of Makri,<br />

Blandford & Cox monitor<strong>in</strong>g takes place at source, document, and content level. Active<br />

monitor<strong>in</strong>g is carried out by the participants by conduct<strong>in</strong>g searches <strong>in</strong> different law<br />

databases, by brows<strong>in</strong>g particular sources, and by follow<strong>in</strong>g previously bookmarked<br />

web pages. Passive monitor<strong>in</strong>g takes the form of subscrib<strong>in</strong>g to e-mail alert lists. The<br />

“updat<strong>in</strong>g” behavior identified by the authors is a behavior subord<strong>in</strong>ate to “access<strong>in</strong>g”<br />

def<strong>in</strong>ed as “ga<strong>in</strong><strong>in</strong>g access to resources, sources or documents/content” (Makri,<br />

Blandford & Cox, 2008a, p. 625). Updat<strong>in</strong>g differs from monitor<strong>in</strong>g <strong>in</strong> that it<br />

designates the behavior of <strong>in</strong>vestigat<strong>in</strong>g the current understand<strong>in</strong>g of a document or the<br />

content of a document. Updat<strong>in</strong>g is primarily direct <strong>in</strong> the form of searches <strong>in</strong> a law<br />

database or check<strong>in</strong>g footnotes to legal texts. The importance of stay<strong>in</strong>g updated is also<br />

verified <strong>in</strong> other studies of legal <strong>in</strong>formation behavior (e.g., du Plessis & du Toit, 2006).<br />


4.5.2 Information behaviour of software eng<strong>in</strong>eers<br />


The overall purpose of Freund, Toms & Waterhouse’s (2005) study of software<br />

eng<strong>in</strong>eers was to identify which contextual factors have an effect on <strong>in</strong>formation<br />

seeking in a work context. The first study (phase I) was based on a combination of four methods:<br />

focus group, semi-structured interviews, observation, and analysis of documents<br />

and digital information. In the second study (phase II) reported in the paper,<br />

14 software services consultants were interviewed on the basis of a semi-structured<br />

<strong>in</strong>terview. The purpose of the study was to <strong>in</strong>vestigate <strong>in</strong>formation behavior on the<br />

basis of a work task framework.<br />

An essential result of phase I is that information needs depend on the<br />

type of work task at hand. Work tasks can range from short-term to long-term<br />

commitments, and the development of information needs is highly dependent on this<br />

work task context. Phase II of the study reveals that information is extremely important<br />

to the <strong>in</strong>terviewees’ work. Thus, on average they use approximately 20-30% of their<br />

work<strong>in</strong>g hours search<strong>in</strong>g for and consult<strong>in</strong>g <strong>in</strong>formation sources. The results of phase<br />

II’s study have been summed up in Figure 4.3. The figure illustrates the influence of<br />

work context on access constraints and information characteristics, which in turn affect<br />

the strategies applied for searching and selecting information. Specifically, the figure<br />

shows that different characteristics of the work context to a large extent affect the<br />

seeking process. Influential elements of the work context comprise the employee’s<br />

characteristics, such as their existing knowledge about the task at hand. Also the type of<br />

task (whether consulting or engineering), and the specific problem at hand (e.g.,<br />

learning, collecting advice, or finding facts), seem to affect information seeking for the<br />

<strong>in</strong>terviewees. The work contextual factors affect the selection of sources <strong>in</strong> terms of the<br />

time available and the availability of sources, but also the characteristics of <strong>in</strong>formation<br />

and knowledge of the subject. This aga<strong>in</strong> has an effect on the type of channel, source,<br />

and genre selected. The seeking process mirrored in the model reflects a linear<br />

conception of information seeking. The strength of the model lies rather in its<br />

enumeration of factors that influence information seeking.<br />

Figure 4.3 Model of the factors affecting information seeking in the domain of software engineering. Adapted from Freund, Toms & Waterhouse (2005).<br />

4.5.3 Professional seeking behaviour<br />

The purpose of Leckie, Pettigrew & Sylvain’s (1996) paper is to model the<br />

information seeking of not just one specific profession but to identify what characterizes<br />

the information seeking that takes place across professionals. A review of existing<br />

studies of engineers’, lawyers’, and health care professionals’ seeking behaviour, and of<br />

seek<strong>in</strong>g models forms the basis of the general model developed by the authors. The<br />

model is depicted <strong>in</strong> Figure 4.4. The model as such <strong>in</strong>forms us about the major role that<br />

work roles play for the subsequent <strong>in</strong>formation seek<strong>in</strong>g of professionals. What can be<br />

discovered from the model is that work roles should be taken <strong>in</strong>to account, when<br />

<strong>in</strong>vestigat<strong>in</strong>g the seek<strong>in</strong>g behaviour of professionals. Compared to the model of Freund,<br />

Toms & Waterhouse (2005), the present model to a greater extent reflects the<br />

<strong>in</strong>teractivity of <strong>in</strong>formation seek<strong>in</strong>g <strong>in</strong> that it <strong>in</strong>cludes feedback loops. On the other<br />

hand, the model <strong>in</strong> itself is not very thorough as to the specific steps <strong>in</strong> <strong>in</strong>formation<br />

seek<strong>in</strong>g. Thus, the dist<strong>in</strong>ct steps of <strong>in</strong>formation seek<strong>in</strong>g are not mirrored <strong>in</strong> the model.<br />

However, the authors compensate for this <strong>in</strong> their presentation of the model.<br />

4.6 Summary<br />

The present review has made different perspectives on <strong>government</strong> employee<br />

seek<strong>in</strong>g behaviour clear. A diversity of <strong>in</strong>formation needs is present. Thus, <strong>in</strong>formation<br />

needs range from simple <strong>in</strong>formation needs to far more complex <strong>in</strong>formation needs.<br />

However, apparently simple <strong>in</strong>formation needs are the most common <strong>in</strong> the doma<strong>in</strong>.<br />

The diversity of <strong>in</strong>formation needs <strong>in</strong> the doma<strong>in</strong> requires the presence of different<br />

types of <strong><strong>in</strong>dex<strong>in</strong>g</strong>. The <strong>in</strong>formation conta<strong>in</strong>ed <strong>in</strong> <strong>in</strong>formation systems needs to be<br />

represented by sufficient descriptive metadata <strong>in</strong> order to support verificative searches<br />

(Ingwersen & Wormell, 1989). In addition, <strong>in</strong> order to meet more complex <strong>in</strong>formation<br />

needs an adequate amount of subject metadata also needs to be present <strong>in</strong> e-<strong>government</strong><br />

<strong>in</strong>formation systems. Further, the assignment of both descriptive and topic metadata is<br />

required <strong>in</strong> order for the employees to be able to discrim<strong>in</strong>ate between large sets of<br />

documents conta<strong>in</strong>ed <strong>in</strong> <strong>in</strong>formation systems. The diversity of work tasks and<br />

<strong>in</strong>formation needs also affects the amount and types of <strong>in</strong>formation applied by the<br />

employees. The review has shown that the amount of information and the information<br />

types applied depend on the work task at hand. With simple work tasks, task<br />

information is the most dominant type of information applied. When the complexity of<br />

tasks increases, so do the amount of information collected and the types of<br />

information applied. Thus, domain information and task solving information are<br />

primarily used for solving complex work tasks.


Figure 4.4 The process of <strong>in</strong>formation seek<strong>in</strong>g of professionals. Adapted from Leckie, Pettigrew &<br />

Sylva<strong>in</strong> (1996, p. 180)<br />

Time is an issue to the employees at different levels. In general the time<br />

available for handl<strong>in</strong>g tasks is limited. The same f<strong>in</strong>d<strong>in</strong>g has been made for other user<br />

groups (cf. Savola<strong>in</strong>en, 2006). The time pressure of the employees calls for the<br />

possibility of carry<strong>in</strong>g out effective searches with high precision <strong>in</strong> <strong>in</strong>formation systems.<br />

One way of meeting this requirement is again by assigning topic metadata at a sufficient<br />

level of specificity. Time is also significant to the respondents as regards the importance of<br />

staying updated on the subject of their work. The topics of e-government are dynamic<br />

and the employees need to keep updated on the latest developments. The updating<br />

is carried out by active information seeking and in a more passive manner by following<br />

newsletters and other forms of updating. Finally, the development of subjects means<br />

that documents become obsolete. Being able to sort documents by their currency and<br />

to indicate their news value is thus important in information systems.<br />

A wide range of information sources is applied by employees in order to meet<br />

information needs. Information is collected from printed, digital, and human<br />


<strong>in</strong>formation sources. The assessment of sources as to the <strong>in</strong>formation need <strong>in</strong> question<br />

is highly qualified. The preferences for <strong>in</strong>formation sources depend on a number of<br />

characteristics of the employees. Among other th<strong>in</strong>gs, the policy areas and the type of<br />

employment <strong>in</strong>fluence the selection of sources. Also it seems that the number and type<br />

of sources <strong>in</strong>crease along with the complexity of work tasks and <strong>in</strong>formation needs.<br />

Persons as sources <strong>in</strong> general are very frequent and the importance of this particular<br />

source also <strong>in</strong>creases with the complexity of the work task at hand.<br />

In sum, we do get some <strong>in</strong>sight <strong>in</strong>to the seek<strong>in</strong>g behaviour of e-<strong>government</strong><br />

employees from the presented studies. However, what has also become clear from the<br />

review is that the body of knowledge on the seek<strong>in</strong>g behaviour of <strong>government</strong><br />

employees is limited. Firstly, the number of studies specifically <strong>in</strong>vestigat<strong>in</strong>g the<br />

seek<strong>in</strong>g behaviour of <strong>government</strong> employees is not impressive. Secondly, some of the<br />

studies mentioned above are of an earlier date. This becomes problematic s<strong>in</strong>ce we<br />

have previously stated that the work tasks of employees are expected to change with the<br />

digitalization of <strong>government</strong>s. With a change of work tasks we might also see a change<br />

<strong>in</strong> the character of <strong>in</strong>formation needs and as a consequence also a change <strong>in</strong> seek<strong>in</strong>g<br />

behaviour. The behaviour mirrored <strong>in</strong> the older studies may therefore not reflect the<br />

current situation for <strong>government</strong> employees. Thirdly, several of the studies above do<br />

not provide direct <strong>in</strong>sight <strong>in</strong>to the seek<strong>in</strong>g behaviour of the user group <strong>in</strong> question. This<br />

does not have to do with the quality of the studies. Rather it is an expression of the fact<br />

that the studies were carried out with another purpose than <strong>in</strong>vestigat<strong>in</strong>g specific<br />

seek<strong>in</strong>g behaviour. A core assumption of the empirical foundation of the present work<br />

is that the evaluation of <strong>in</strong>formation systems needs to take its po<strong>in</strong>t of departure <strong>in</strong><br />

potential users. On this basis, we estimate that there is a need for a more thorough<br />

<strong>in</strong>vestigation of the current state of civil servants’ seek<strong>in</strong>g behaviour with particular<br />

emphasis on tax <strong>government</strong>s. This <strong>in</strong>vestigation serves the purpose of qualify<strong>in</strong>g the<br />

design of the search test. This is the primary reason for carry<strong>in</strong>g out the empirical<br />

doma<strong>in</strong> study of the thesis.


5 Index<strong>in</strong>g of electronic documents<br />


The concept of <strong><strong>in</strong>dex<strong>in</strong>g</strong> has different mean<strong>in</strong>gs. In LIS, the widest sense of the concept<br />

designates <strong>in</strong>dex terms as a set of labels that <strong>in</strong>formation searchers can apply <strong>in</strong><br />

<strong>in</strong>formation search<strong>in</strong>g <strong>in</strong> order to denote authors, subjects, journal names etc. (cf.,<br />

Rowley, 1994). Here, we are <strong>in</strong>vestigat<strong>in</strong>g the subject of documents. Hence, we employ<br />

a narrower definition of the term. The understanding of the concept of indexing that<br />

guides the present work is that it designates the act of creating representations of<br />

the subject of information in order to enable inclusion and retrieval of documents in a<br />

database (Lancaster, 2003; Rowley & Hartley, 2008). As such, indexing supports the<br />

purpose of subject retrieval systems, namely “...to retrieve documents, whose aboutness<br />

suggest that a user may find in them meaning(s) expedient to a certain need of the<br />

moment” (Beghtol, 1986, p. 85). The subject representation can take the form of, for<br />

instance, descriptors, subject headings, or classification codes (Mai, 2005).<br />

Indexing has three main purposes: to facilitate easy location of documents by<br />

topic, to enable the identification of relations between documents, and to predict the<br />

relevance of a document to information needs (Korfhage, 1997). In other words,<br />

indexing is a highly important factor in the process of information retrieval. Or as<br />

Soergel puts it: Indexing “...sets an upper limit for retrieval performance...” (1985, p.<br />

327). When seen in relation to the IR process, indexing represents the input to a system<br />

and the retrieval of documents represents the output (Milstead, 1992, p. 408).<br />

Accord<strong>in</strong>gly, the dist<strong>in</strong>ction between <strong>in</strong>put and output stresses the close relation<br />

between <strong><strong>in</strong>dex<strong>in</strong>g</strong> and retrieval as part of the IR process. Also, the applied <strong><strong>in</strong>dex<strong>in</strong>g</strong><br />

practice <strong>in</strong>fluences the results of <strong>in</strong>formation retrieval and retrieval should affect how<br />

<strong><strong>in</strong>dex<strong>in</strong>g</strong> is carried out.<br />
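
The input and output relation described above can be illustrated with a small sketch. It is only a toy example under our own assumptions: the documents, the assigned index terms, and the function names `build_index` and `retrieve` are hypothetical, and exact term matching stands in for whatever matching a real system applies.

```python
from collections import defaultdict

def build_index(assignments):
    """Input side: map each index term to the set of documents it was assigned to."""
    index = defaultdict(set)
    for doc_id, terms in assignments.items():
        for term in terms:
            index[term.lower()].add(doc_id)
    return index

def retrieve(index, query_term):
    """Output side: return the documents whose assigned terms match the query term."""
    return sorted(index.get(query_term.lower(), set()))

# Hypothetical documents and index terms, invented for illustration.
assignments = {
    "doc1": ["income tax", "deductions"],
    "doc2": ["taxation"],      # indexed with a broad term only
    "doc3": ["income tax"],
}
index = build_index(assignments)
hits = retrieve(index, "income tax")
```

In this sketch `doc2` cannot be found under the specific term, however the query is phrased, because that term was never assigned to it. This is one simple way of reading the statement that indexing sets an upper limit for retrieval performance.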

The relation between subject <strong><strong>in</strong>dex<strong>in</strong>g</strong>, subject catalogu<strong>in</strong>g, and classification<br />

is close. All three concepts are used to designate aspects of labell<strong>in</strong>g and describ<strong>in</strong>g<br />

documents accord<strong>in</strong>g to their content, whether it is <strong>in</strong> the form of classification codes,<br />

subject terms or other <strong>in</strong>dicators (Anderson & Pérez-Carballo, 2005; Lancaster, 2003).<br />

This is reflected <strong>in</strong> the literature, where the concepts are used <strong>in</strong> an ambiguous way.<br />

Turn<strong>in</strong>g to automated acts of <strong><strong>in</strong>dex<strong>in</strong>g</strong> and classification, the situation is the same. Here<br />

the process of decid<strong>in</strong>g the content of documents and group<strong>in</strong>g them accord<strong>in</strong>gly may<br />

be referred to as automatic <strong><strong>in</strong>dex<strong>in</strong>g</strong> (Lancaster, 2003; Moens, 2000; Salton & McGill,


1983) or automatic classification (e.g., Golub, 2007). In the present work, we def<strong>in</strong>e<br />

<strong><strong>in</strong>dex<strong>in</strong>g</strong> <strong>in</strong> a broad sense, tak<strong>in</strong>g the similarities of subject <strong><strong>in</strong>dex<strong>in</strong>g</strong> and classification<br />

<strong>in</strong>to account. This def<strong>in</strong>ition allows for a broad view on the literature on automated<br />

approaches to <strong><strong>in</strong>dex<strong>in</strong>g</strong> and classification as well. We apply the term automatic<br />

<strong><strong>in</strong>dex<strong>in</strong>g</strong>, s<strong>in</strong>ce our focus is on the labell<strong>in</strong>g and group<strong>in</strong>g of documents.<br />

In order to set out the context of indexing, we first introduce the<br />

purpose of indexing. Second, the indexing process is presented, followed by<br />

approaches to and core concepts in indexing. Afterwards, approaches to automatic<br />

indexing are discussed. We finish the chapter by looking at hybrid indexing types that<br />

combine elements of human and automatically based indexing approaches.<br />

5.1 The process of <strong><strong>in</strong>dex<strong>in</strong>g</strong><br />

The process of indexing refers to the act of assigning subject terms to documents or other types of information in order to enable retrieval. According to Philipson (2008), the process starts when the indexer begins to familiarize himself or herself with a document and ends when the subject description has been completed. The literature presents the indexing process with different numbers of steps. In its simplest form, the process consists of two steps. In the first step, the document is analyzed in order to decide on its subject; here, the aboutness of the document is identified. This step may be referred to as conceptual analysis (Lancaster, 2003), but the literature shows alternative designations as well (Mai, 2000, p. 281). In the second step, the document subject is translated into a set of index terms (Mai, 2005). This part of the process is denoted translation (Lancaster, 2003).
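The translation step can be pictured with a minimal sketch. It assumes that the conceptual analysis has already been carried out by the indexer and that translation amounts to a lookup in a small controlled vocabulary. The vocabulary, its entry terms, and the function name `translate` are hypothetical and serve only as illustration; they are not taken from the cited models.

```python
# Hypothetical controlled vocabulary: preferred index term -> accepted entry terms.
VOCABULARY = {
    "income tax": {"income tax", "income taxes", "tax on income"},
    "travel allowance": {"travel allowance", "allowance for travel expenses"},
}


def translate(concepts, vocabulary=VOCABULARY):
    """Step 2 (translation): map concepts from the conceptual analysis to index terms.

    Returns the matched index terms and the concepts the vocabulary cannot express.
    """
    index_terms = []
    unmatched = []
    for concept in concepts:
        key = concept.strip().lower()
        match = next(
            (term for term, entries in vocabulary.items() if key in entries), None
        )
        if match is None:
            unmatched.append(concept)
        elif match not in index_terms:
            index_terms.append(match)
    return index_terms, unmatched


# Step 1 (conceptual analysis) is the indexer's work; its outcome is given by hand here.
concepts = ["Income taxes", "allowance for travel expenses", "commuting"]
terms, leftovers = translate(concepts)
print(terms)      # ['income tax', 'travel allowance']
print(leftovers)  # ['commuting']
```

The unmatched concept shows where information from the conceptual analysis may be lost in translation when the vocabulary has no corresponding term.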

The two-step conception of the indexing process has been challenged by other scholars (Mai, 2000), and the process has been presented with up to five steps. A larger number of steps allows for a more differentiated presentation of the process. However, the two steps of the simplified model can always be identified as underlying the more detailed presentations. Rowley (1988, p. 50) presents the indexing process as containing three steps: 1) familiarization, 2) analysis, and 3) conversion of concepts into index terms. In the first step, the indexer becomes acquainted with the content of the document. Among other things, the indexer should be aware of the structure of the subject. The familiarization forms the basis of the second phase: the analysis of the document. The second phase can to some degree be guided by instructions, but experience and intuition are also important here. In the third phase, concepts from the document are matched with index terms from an index vocabulary. Compared to Lancaster’s (2003) two-step model, Rowley expands the first step into her first two phases, while her third phase corresponds to Lancaster’s second step. Chowdhury (2004, p. 74) operates with a five-step model of subject indexing containing the following steps: 1) analysis of subject; 2) identification of keywords; 3) standardization of keywords; 4) choice of indexing system, whether pre- or post-coordinate, and preparation of entries; and 5) filing of entries. Again we see Lancaster’s two steps underlying Chowdhury’s five. Chowdhury’s steps 1-3 correspond to Lancaster’s first step. In Chowdhury’s first step, the indexer analyses the subject of the document, while the second step involves deciding which of perhaps several subjects should be represented in the indexing. Whether the third step is mainly oriented towards conceptual analysis or translation is difficult to state due to Chowdhury’s limited description. However, we consider it part of the conceptual analysis, since it is positioned before the introduction of the controlled vocabulary and consists of a standardization of the keywords selected on the basis of the conceptual analysis. Chowdhury’s fourth and fifth steps match Lancaster’s translation step. Here the entries in the controlled vocabulary are generated and filed into the system. In sum, varying levels of detail may be identified in presentations of the indexing process. A more detailed presentation has several advantages. Mai (2000) mentions its usefulness when carrying out analyses of the process. An additional advantage is that it allows for more specificity when indexing guidelines are developed.

Figure 5.1: Illustration of the subject indexing process (Mai, 2000, p. 279).



Although a number of presentations of the indexing process exist, scholars within the field agree that not much is known about the subject indexing process. In particular, the indexer’s determination of the subject of a document is not well explored in the literature. Despite available indexing policies, standards, and guidelines, it is difficult to determine what takes place in the initial step: identifying the subject of a document (Mai, 2000; 2005). This is a problem, because the initial step may be considered the most important one, as it forms the basis for the steps that follow. Moreover, the entire process of indexing is associated with a reduction, or perhaps even a loss, of information compared to the full text of the document, as Figure 5.1 illustrates. A reduction of information is needed because it limits the amount of information end users have to keep track of. On the other hand, documents represented by wrong or misleading index terms could cause severe problems. Therefore, ensuring the quality of indexing is essential to successful retrieval.

5.2 Quality of indexing

Indexing quality is closely connected to the retrieval of documents: if the quality of the indexing is low, it will be reflected in the quality of search results (Mai, 2000). Indexing quality may be expressed in different terms, and two overall perspectives exist on how to measure it. One perspective considers quality in terms of retrieval effectiveness. That is, quality is measured as the ability of the indexing to discriminate relevant documents from irrelevant documents with respect to search requests (e.g., Schultz, 1970; Borko, 1977; Lancaster, 2003). The other perspective considers quality in terms of the degree of consistency of the indexing, that is, the accuracy in the indexing of a document (e.g., Rolling, 1981). In addition, two other concepts contribute to the characterization of indexing quality, namely specificity and exhaustivity. These concepts are important because they help characterize the indexing and are known to affect indexing quality. The concepts are introduced below.

5.2.1 Specificity

Specificity expresses the generic level of assigned index terms (Soergel, 1994). The concept of specificity is inherently connected to the vocabulary applied for indexing, in the sense that the specificity of the vocabulary determines the possible level of specificity in indexing. The generic levels of vocabularies thus differ according to the scope of the vocabulary. For instance, the same document content will most likely be assigned index terms of a different depth in a general vocabulary than in a special vocabulary (cf. Mai, 2004b).

It is common indexing practice to choose index terms at the most specific level possible within the frame of the indexing language (e.g., Bates, 1979; Lancaster, 2003). This supports the “force of discrimination” (Blair, 2002, p. 280): by assigning the most specific index terms possible, the database allows for discrimination between its documents, in particular between general and specific documents. Placing documents at the most specific level in the indexing language ensures that documents at the same level of description will be retrieved in the same search session. For instance, documents dealing with income taxes in general are at a higher generic level than documents dealing with allowance for travel expenses. The principle of assigning the most specific index term possible is beneficial when carrying out specific searches. However, if the search is broader, the system needs to allow for inclusion of narrower descriptors in order to avoid excluding possibly relevant documents of greater specificity (Soergel, 1994).
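The need to include narrower descriptors in broad searches can be sketched as follows. The example assumes a tiny hierarchical indexing language and an index in which each document carries only its most specific descriptor. The hierarchy, the document identifiers, and the function names `expand` and `search` are hypothetical; only the contrast between income taxes in general and allowance for travel expenses comes from the text above.

```python
# Hypothetical fragment of an indexing language: descriptor -> narrower descriptors.
NARROWER = {
    "income taxes": ["deductions", "tax rates"],
    "deductions": ["allowance for travel expenses"],
}

# Hypothetical index: document id -> the most specific descriptor assigned to it.
INDEX = {
    "doc1": "income taxes",
    "doc2": "allowance for travel expenses",
    "doc3": "tax rates",
    "doc4": "property taxes",
}


def expand(descriptor, narrower=NARROWER):
    """Return the descriptor together with every descriptor below it in the hierarchy."""
    result = [descriptor]
    for child in narrower.get(descriptor, []):
        result.extend(expand(child, narrower))
    return result


def search(descriptor, include_narrower=False):
    """Retrieve documents indexed with the descriptor, optionally with narrower ones."""
    wanted = set(expand(descriptor)) if include_narrower else {descriptor}
    return sorted(doc for doc, term in INDEX.items() if term in wanted)


print(search("income taxes"))                         # ['doc1']
print(search("income taxes", include_narrower=True))  # ['doc1', 'doc2', 'doc3']
```

Without the expansion, a broad search on income taxes misses the more specifically indexed documents, even though they may be relevant to the request.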

5.2.2 Exhaustivity

Exhaustivity deals with how fully the index terms cover the content of a document (Salton, 1986; Soergel, 1994; Lancaster, 2003; Anderson & Pérez-Carballo, 2005). Are only the core aspects of the document covered by index terms, or are sub-aspects represented as well? Generally, the larger the number of terms assigned, the greater the exhaustivity of the indexing of the document. The counterpart to exhaustive indexing is selective indexing, where only the central subjects of a document are covered by the indexing (Lancaster, 2003).

Soergel (1994) distinguishes between viewpoint exhaustivity and importance exhaustivity. Importance exhaustivity addresses the threshold at which an aspect of a document is important enough to be represented in the indexing. That is, how important must an element of a document be in order to be included in the description of the document? Viewpoint exhaustivity, on the other hand, points to the depth or range of the indexing language applied. It designates the degree to which facets and viewpoints expressed in a document are represented in the indexing language. One could say that the level of viewpoint exhaustivity is defined by the limits of the indexing language. In this way the two types of exhaustivity complement each other. In the case of importance exhaustivity, the level is set by the indexer or the indexing rules. In the case of viewpoint exhaustivity, the indexing language sets the upper limit. In practice the two types interact. Importance exhaustivity will be restricted by the nature of the indexing language. At the same time, there is no need for a highly exhaustive indexing language if the indexing policy prescribes a low level of importance exhaustivity. Distinguishing between the two types nevertheless allows for identification of the factors affecting indexing exhaustivity.

The level of exhaustivity has economic implications (Lancaster, 2003): a high level of exhaustivity requires more effort from indexers than a low level. It is not necessarily useful to estimate exhaustivity quantitatively in terms of the number of assigned terms, since other factors also have an impact on exhaustivity. One such factor is the size of the documents: a few index terms added to a short document may be just as exhaustive as more index terms added to a longer document (Anderson & Pérez-Carballo, 2005). The indexing approach is another factor: a single controlled term may represent the content of a document more exhaustively than a number of uncontrolled terms added by an indexer (Fugmann, 1993). In terms of recall and precision (see section 5.2.4), high exhaustivity of indexing tends to lower the precision of search results, in the sense that documents dealing only partially with the searched subject will be retrieved along with documents whose main focus is on that subject. At the same time, recall is improved by high exhaustivity, because documents with a more peripheral mention of the searched subject can be found (Rowley, 1988). The ability to discriminate between documents must also be considered in relation to exhaustivity: if the same term is assigned to many documents, the discrimination value of the term decreases (Lancaster, 2003).

5.2.3 Consistency

Consistency becomes an issue when dealing with human indexing. The consistency problem arises from the subjective process that takes place when indexers decide on the aboutness of a document. Consistency hence refers to the level of agreement between two or more indexers on which index terms to use for the representation of a document (Zunde & Dexter, 1969). This type of consistency is also known as inter-indexer consistency (Lancaster, 2003). In other words: do two or more indexers agree on what the subject of a document is? And do they select the same index term to represent the subject? The deviation between indexers may take place at different levels. Lancaster (2003) lists seven factors that may influence the degree of consistency between indexers. The factors appear in Table 5.1. A related concept, intra-indexer consistency, refers to one indexer’s level of agreement with himself or herself (Lancaster, 2003). Here the question would be: does the same indexer interpret the subject of a document in the same way at different times? In this sense, the concept of consistency takes the subjective nature of human indexers into account and deals with the fact that indexing is a highly subjective process when performed by human beings.

1. Number of terms assigned
2. Controlled vocabulary versus free text indexing
3. Size and specificity of vocabulary
4. Characteristics of subject matter and its terminology
5. Indexer factors
6. Tools available to indexer
7. Length of item to be indexed

Table 5.1 Possible factors affecting consistency. From Lancaster (2003, p. 71).
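Agreement between indexers can be given a number in several ways. The sketch below assumes one simple overlap measure: the number of index terms both indexers assigned, divided by the number of distinct terms either of them assigned. This choice is an assumption made for illustration, and the cited studies may operationalize consistency differently. The term lists and the function name `consistency` are hypothetical.

```python
def consistency(terms_a, terms_b):
    """Share of index terms two indexers agree on, out of all terms either assigned."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 1.0  # nothing assigned by either indexer: treated here as full agreement
    return len(a & b) / len(a | b)


# Hypothetical term assignments by two indexers for the same document.
indexer_1 = ["income taxes", "deductions", "travel expenses"]
indexer_2 = ["income taxes", "travel expenses", "commuting", "employees"]

print(consistency(indexer_1, indexer_2))  # 0.4
```

The same function could be applied to two term sets produced by one indexer at different times, which would give a rough indication of intra-indexer consistency.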

We have briefly mentioned that consistency could be one way to express the quality of indexing. Rolling (1981, p. 71) even defines indexing quality in terms of consistency. The assumption is that the similarity of documents in an IR system cannot be properly expressed if the indexers do not demonstrate a sufficient level of consistency when assigning index terms. However, expressing indexing quality in terms of consistency has been disputed by other scholars. Cooper (1969) challenges consistency as a measure of quality, because consistency does not necessarily imply good indexing. Instead, he emphasizes the need to carry out indexing in accordance with the requests users make to an IR system in order to ensure successful retrieval. As a consequence, Cooper suggests indexer-requester consistency as highly relevant to indexing quality. Indexer-requester consistency is implicitly relevant whenever indexing quality is expressed in terms of retrieval effectiveness. The assumption is that consistency between indexers might very well be high, but if users apply other search terms than the ones consistently assigned by indexers, search performance will be poor. Achieving a high degree of indexer-requester consistency is made difficult by the different conditions characterizing the indexer and the requester, respectively. However, the findings of Gomez, Lochbaum & Landauer (1990) suggest that the richer the applied vocabulary, the more likely it is that indexers’ index terms and searchers’ search terms correspond. A related finding of the study is that the more names an information object is allowed to have in an information system, the more likely it is to be retrieved by searchers.

5.2.4 Performance measures

An alternative way of measuring indexing quality is to investigate retrieval effectiveness. An important instrument here is the application of performance measures, which give an indication of indexer-requester consistency as suggested by Cooper (1969). Performance measures provide a macro analysis of system performance and should preferably be supplemented by microanalysis in the form of specific investigations of retrieval success and failure (Soergel, 1985). Using performance measures for IR evaluation has been common practice since the 1950s. Kent et al. (1955) were among the first to propose different measures of performance, in the shape of a number of factors expressing system performance. Two performance measures, recall and precision, have traditionally been employed to measure the quality of indexing. They are quantitative measures expressing, respectively:

\[
\text{Recall} = \frac{\text{Number of relevant documents retrieved}}{\text{Total number of relevant documents in the collection}}
\]

\[
\text{Precision} = \frac{\text{Number of relevant documents retrieved}}{\text{Total number of documents retrieved from the collection}}
\]
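As a minimal sketch of the two ratios, the following function computes recall and precision from a set of retrieved documents and a set of documents judged relevant. The document identifiers and the relevance judgements are hypothetical, and returning 0.0 when a denominator is empty is a convention chosen for this example only.

```python
def recall_precision(retrieved, relevant):
    """Compute (recall, precision) for one search request."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually retrieved
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical judgements: four relevant documents exist in the whole collection.
relevant = {"d1", "d2", "d3", "d4"}
# Hypothetical result list: five documents retrieved, two of them relevant.
retrieved = ["d1", "d2", "d7", "d8", "d9"]

print(recall_precision(retrieved, relevant))  # (0.5, 0.4)
```

Note that computing precision only requires judging the five retrieved documents, whereas computing recall presupposes that the relevant documents in the entire collection are known.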

Technically speaking, precision is easier to measure, since the evaluator only needs to know which documents in a list of retrieved documents are actually relevant. As for recall, one needs to know the relevance of all documents in the collection. In other words, recall challenges the setup of IR evaluation. Further, it becomes clear that the concept of relevance is highly important for the outcome of IR evaluation due to its core position in the equations above. The concept of relevance represents a large and independent research area. Since the concept as such is beyond the scope of the present work, we will not explore it further here; a thorough review can be found in Borlund (2003a). Since the first introductions of recall, precision, and related performance measures, additional measures have been introduced that take the characteristics of large-scale IR systems into account. Examples are mean average precision, interactive recall, and relative relevance (Kelly, 2009). We will not go into further detail with these measures here.

Different elements of indexing languages can help increase recall, precision, or both. According to Lancaster (2003), exhaustive indexing will increase recall and lower precision, since exhaustivity increases the number of items retrieved in searching. Further, vocabulary control and the presence of different relationships in the vocabulary will increase recall. Conversely, specificity of indexing, scope notes, and relationships in the indexing language are examples of precision devices (Aitchison, 1992). In sum, it is possible to adjust the indexing quality according to the expected use of the indexing.
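The trade-off attributed to Lancaster (2003) can be illustrated with a toy comparison of selective and exhaustive indexing. The sketch assumes a five-document collection in which each document has central and peripheral subjects, and in which two documents have been judged relevant to one request. All documents, subjects, relevance judgements, and function names are hypothetical.

```python
# Hypothetical collection: document -> (central subjects, peripheral subjects).
DOCS = {
    "d1": ({"travel expenses"}, {"income taxes"}),
    "d2": ({"income taxes"}, {"travel expenses"}),
    "d3": ({"pensions"}, {"travel expenses"}),
    "d4": ({"property taxes"}, {"travel expenses"}),
    "d5": ({"pensions"}, set()),
}

# Hypothetical relevance judgements for the request "travel expenses".
RELEVANT = {"d1", "d2"}


def build_index(exhaustive):
    """Selective indexing keeps central subjects only; exhaustive indexing keeps all."""
    return {
        doc: (central | peripheral) if exhaustive else set(central)
        for doc, (central, peripheral) in DOCS.items()
    }


def evaluate(request, exhaustive):
    """Return (recall, precision) for the request under the chosen indexing policy."""
    index = build_index(exhaustive)
    retrieved = {doc for doc, terms in index.items() if request in terms}
    hits = retrieved & RELEVANT
    recall = len(hits) / len(RELEVANT)
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision


print(evaluate("travel expenses", exhaustive=False))  # (0.5, 1.0)
print(evaluate("travel expenses", exhaustive=True))   # (1.0, 0.5)
```

In this constructed case, exhaustive indexing finds the relevant document that mentions the subject only peripherally, but it also retrieves two documents that are not relevant, so recall rises while precision falls.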

5.3 Approaches to indexing

Indexing can be divided and characterized in a number of different ways, depending on the scope. In the following sections, we present the perspectives needed to introduce the Ph.D. project. The approaches presented below have been empirically tested in a variety of ways. Since some of the approaches are usually closely related (e.g., intellectual and controlled indexing), empirical comparisons of the approaches may be relevant to several of the sections below. We therefore present the empirical studies where we consider them most relevant.

5.3.1 Document, user, and domain oriented indexing

The approach to indexing may be defined by its point of departure: whether it is document, user, or domain oriented. The orientation of the indexing captures the focus of the subject analysis that takes place ahead of the assignment of index terms, that is, in the initial step of the indexing process.

Document oriented indexing (or entity oriented indexing) seeks to represent the content of documents (Soergel, 1985; Fidel, 1994; Mai, 2005). The analysis of the document carried out in the first step of the indexing process is thus based solely on the content of the document and does not take its potential use into account. The purpose of document oriented indexing is to produce a description that is faithful to the content of the document. In principle, it may be carried out without any prior knowledge of the users expected to benefit from the indexing. The strength of the document centred approach is that indexing is kept stable due to the static nature of the document, so indexers do not need to consider potential future use of the document (Mai, 2005). Further, indexers do not need extensive knowledge about the context of the document, whether that context is the users or the domain in question (Fidel, 1994). As pointed out by Mai (2005), the document oriented approach is supported by the international standard for indexing (ISO, 1985).

User (or request) oriented indexing designates indexing aimed at meeting the requests expected from a particular audience (Fidel, 1994; Soergel, 1994; Lancaster, 2003). Here, users’ anticipated requests form the basis of the index terms assigned to a document. The indexer thus considers whether or not a document should be retrieved for a certain request. Soergel (1985, p. 233) equates descriptors with queries: by working through (parts of) an indexing language, the indexer checks whether each descriptor is relevant to the document in question. This sort of indexing is also referred to as checklist indexing (Soergel, 1985; Fidel, 1994). By reflecting anticipated requests, user oriented indexing seeks to increase indexer-requester consistency (cf. Cooper, 1969).

Domain oriented indexing may be considered an extension of user oriented indexing. The notion of domain is commonly associated with Hjørland’s (2002; Hjørland & Albrechtsen, 1995) concept of domain analysis, which is primarily concerned with scientific disciplines. However, Mai (2005, p. 605) considers the term domain in a broader sense and defines it as “a group of people who share common goals.” In this way, professional groupings and interest communities, for example, are also potential recipients of the indexing. The assumption behind domain oriented indexing is that the subject of a document is to a large extent determined by the contextual use of the document. Domain oriented indexing extends user oriented indexing in the sense that it takes the context of users into account. It is thus assumed that the domain that users are members of has a significant influence on the role of the document within that particular domain (Mai, 2005). It may be discussed whether users or domains change the most over time and, as a consequence, which of the two approaches is the more durable. However, both approaches need regular updates in order to remain current for their users (cf. Lancaster, 2003; Mai, 2005).


Chapter 5<br />

Figure 5.2 Document and domain oriented approaches to indexing. Adapted from Mai (2005, p. 607)

The approaches mentioned above may be summed up by Mai’s illustration (see
Figure 5.2). It is important to note that the three approaches are to a certain degree
condensed constructions that serve the purpose of identifying tendencies in subject
identification. In practice, the approaches will in some cases be difficult to perform in a
clean-cut manner. As an example of this, Mai (2005) mentions the difficulty indexers
may have in refraining from using, for instance, contextual knowledge when interpreting
the subject of a document solely on the basis of the document.

5.3.2 Controlled vs. uncontrolled indexing

Controlled and uncontrolled indexing refer to the indexing languages used to
perform indexing. In its basic form, a controlled vocabulary is an authority list
specifying the index terms that indexers can assign when performing indexing. In
addition, however, controlled indexing languages commonly express some sort of
semantic structure in order to, for instance, control synonyms, differentiate
between homographs, and link related terms (Lancaster, 2003). Controlled indexing
languages belong to the type of systems referred to as knowledge organization systems,
or KOS. KOS may be characterized by their structure (or the relationships expressed)
and function (cf. Zeng, 2008). As a consequence, a number of different KOS exist.
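The three functions of semantic structure named above (synonym control, separation of homographs, and links between related terms) can be illustrated with a minimal Python sketch. The vocabulary, its terms, and the function names `normalize` and `expand` are hypothetical and chosen purely for illustration; they are not taken from any of the indexing languages discussed in this chapter.

```python
# Hypothetical toy vocabulary; all terms are illustrative assumptions.
PREFERRED = {
    # Synonym control: several entry terms lead to one preferred term.
    "automobile": "cars",
    "motor car": "cars",
    "cars": "cars",
    # Homographs are kept apart by parenthetical qualifiers.
    "mercury (planet)": "mercury (planet)",
    "mercury (metal)": "mercury (metal)",
}

# Related-term links between preferred terms.
RELATED = {"cars": ["road traffic"]}


def normalize(term):
    """Return the preferred term for an entry term, or None if it is not authorized."""
    return PREFERRED.get(term.strip().lower())


def expand(term):
    """Return the preferred term followed by its related terms (empty list if unknown)."""
    preferred = normalize(term)
    if preferred is None:
        return []
    return [preferred] + RELATED.get(preferred, [])
```

In this sketch an indexer or searcher entering "Automobile" or "motor car" ends up at the same preferred term, whereas a term outside the authority list is rejected rather than silently accepted.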



Figure 5.3 Types of vocabularies and their relationships. Adapted from Morville & Rosenfeld (2007, p. 195)

Subject heading lists are alphabetically ordered lists of controlled terms and related
subheadings. Thesauri, on the other hand, differ by having fully organized terms
elaborating relations between concepts (Aitchison, 1992). Thesauri and subject heading
lists have two features in common. When in use, they control the use and form of
indexing terms, and they enable relations between terms in the indexing language (Rowley,
1988, p. 68). Additionally, taxonomies, ontologies, and classification schemes are
variants of controlled indexing languages (Aitchison, 1992; Gilchrist, 2003). Controlled
indexing languages can be characterized by their degree of complexity. Morville &
Rosenfeld have illustrated this graphically (see Figure 5.3). According to the model, the
lowest level of complexity represents equivalence relationships, while the highest level
represents associative relationships as, for instance, expressed in thesauri.

In controlled indexing languages “both the terms used to represent subjects,
and the process whereby terms are assigned to particular documents, are controlled or
executed by a person” (Rowley, 1994, p. 109). This is the main reason for the close
relation between controlled and manual indexing mentioned in section 5.3 (see footnote 4).
However, as will be seen later in the present chapter, automatic methods for controlled
indexing have been developed. In other words, the relation between controlled and manual
indexing is not unequivocal.

Footnote 4: We elaborate further on manual indexing in the section to follow (section 5.3.3).


Uncontrolled indexing extracts indexing words from the document itself or
from another source outside of the controlled indexing language. Two generic types of
uncontrolled indexing exist. Free indexing languages assign to documents terms
that do not necessarily originate from the document itself. Natural language
indexing, on the other hand, applies terms from the document for representation, and is
usually employed when performing automatic indexing (Rowley, 1988). Indexing by
natural language forms a subordinate field of research to the general field of natural
language processing (NLP) (Chowdhury, 2003). Uncontrolled indexing may be carried
out by humans or machines. However, natural language indexing is commonly
associated with automatic indexing (see section 5.4).

Dubois (1987, p. 249) has summarized the strengths and weaknesses of
controlled vocabularies and free text indexing. The key points appear in Table 5.2.
To a large extent, Blair & Maron’s (1985) study empirically supports
Dubois’ summary concerning free text indexing. Blair & Maron tested the retrieval
effectiveness of a full text retrieval system, that is, indexing by natural language.
Involving two test persons within the legal domain, Blair & Maron found that the level
of recall in the searches carried out was surprisingly low. A number of different reasons
explained the results regarding recall. For one thing, the test persons had difficulties
predicting the exact wording applied in the documents searched for. It turned out that
the test persons’ selection of words was decided by their point of view on the problem
in question. Also, misspellings in the documents contained in the retrieval system
resulted in lack of retrieval. Both these findings illustrate how searchers are
challenged when searching for information, as pointed out by Dubois (1987). Further, it
was found that search terms rated important by the searchers did not occur in documents
relevant to given requests. In some cases the terms were simply not included in the
documents. In other cases the terms occurred, but were expressed in terms of narrower
or broader concepts. This problem is also addressed by Dubois. On the other hand, this
can be considered the strength of natural language indexing. For instance, Tenopir
(1985) found that the use of synonyms in natural language indexing was able to
compensate for users’ incomplete queries.

The performance of controlled versus uncontrolled indexing languages has been a
core subject of investigation in the LIS research literature. One of the first
investigations comparing the retrieval effectiveness of different indexing languages was
the Cranfield tests. The tests took place for approximately a decade beginning in the



Controlled vocabularies
  Advantages: Solves many semantic problems; Permits generic relationships to be identified; Maps areas of knowledge
  Disadvantages: High cost; Possible inadequacies of coverage; Human error; Possible out of date vocabulary; Difficulty of systematically incorporating all relevant relationships between terms

Free text
  Advantages: Low cost; Simplified searching; Full information content searchable; Every word has equal retrieval value; No human indexing errors; No delay in incorporating new terms
  Disadvantages: Greater burden on searcher; Information implicitly but not overtly included in text may be missed; Absence of specific to generic linkage; Vocabulary of discipline must be known

Table 5.2 Summary of strengths and weaknesses of controlled vocabularies and free text. Adapted from Dubois (1987, p. 249).

mid-1950s. The overall purpose of the Cranfield tests was to carry out comparative
evaluations of a number of different controlled and uncontrolled indexing languages.
However, the tests have become at least equally known for their pioneering
contribution to the methodical body of knowledge on evaluation of IR systems (cf.
Sparck Jones, 1981). The Cranfield tests comprised two tests: Cranfield I and Cranfield
II. Cranfield I identified the complexity of isolating a single indexing language in a test
situation, since the tested indexing languages were found to be interacting as to their
functions as precision and recall devices respectively (Cleverdon, 1967). Criticism was
put forward by different authors, mainly concerning methodical issues (Sparck Jones,
1981). Next followed Cranfield II with a slightly enlarged test collection compared to
Cranfield I. Cranfield II built upon Cranfield I and served the purpose of carrying out a
closer investigation of the effect single indexing languages had on performance. As in
Cranfield I, a number of different indexing languages were tested against each other.
The languages were distributed across three main types: 1) Single term indexing



languages, 2) Simple concept indexing languages, and 3) Controlled term indexing
languages. Furthermore, indexing languages representing keywords in titles and
abstracts were included in the test. Among the results of the test, the inverse relation
between recall and precision was found. By this is meant that when recall is high,
precision tends to be low and vice versa (Cleverdon, 1967). An ordered list of the
performance of the tested indexing languages, measured in terms of normalized
recall, further showed that applying single terms (the first group of indexing languages
mentioned above) was superior to the remaining two groups. Simple concepts (the
second group of indexing languages tested) had the lowest performance, while
controlled terms and keywords from titles and abstracts had a medium score (Cleverdon
& Keen, 1966, p. 253; Cleverdon, 1967, p. 189) (see footnote 5). Thus, the results suggest
that uncontrolled indexing is certainly a valuable tool for retrieval purposes, but that it
should preferably take the form of single term languages rather than simple concept
languages.
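The inverse relation between recall and precision can be made concrete with a small Python sketch. It uses the plain set-based definitions of the two measures (not the normalized recall reported in the Cranfield tests), and the document identifiers and the two result sets are invented for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, using set-based definitions."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: four relevant documents in the collection.
relevant = {"d1", "d2", "d3", "d4"}

# A narrow search retrieves few documents, all of them relevant.
narrow = ["d1", "d2"]

# A broad search retrieves every relevant document, plus four irrelevant ones.
broad = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"]
```

With these assumed result sets, the narrow search has a precision of 1.0 and a recall of 0.5, while the broad search has a precision of 0.5 and a recall of 1.0, which is the trade-off described above.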

In another study, Cousins (1992) compared the performance of basic MARC
records and records enriched with either natural language index terms or controlled
index terms. Performance was measured in terms of recall. The natural language terms
of the study originated from the tables of contents and back-of-the-book indexes of the
indexed units. PRECIS represented the controlled vocabulary for the test. The choice of
PRECIS was based on a preceding investigation, where it was found that out of three
indexing languages, PRECIS was the most suitable for the queries guiding the test. Eleven
queries of varying themes were applied for the test. In her test, Cousins found that the
retrieval performance of the enriched records exceeded that of the basic records. However,
it was also found that the relative retrieval performance of the enriched records depended
on whether the queries applied for the test were truncated or not. Thus, it turned out
that PRECIS had a better performance when queries were not truncated. Conversely,
when test queries were truncated, the retrieval performance of the natural language
indexing was superior. Overall, truncated queries applied for natural language indexing
had the best retrieval performance of the test. In her discussion, Cousins mentions the
influence of the test queries on the test results. Thus, the formulation of some of the
queries turned out to have quite an effect on the test result due to their choice of terms

Footnote 5: A thorough presentation of the results of the Cranfield tests is given in Cleverdon (1960)
(Cranfield I) and Cleverdon & Keen (1966) (Cranfield II).



and subsequent potential for truncation. Apart from the search test results, Cousins’
study emphasizes the importance of the number and nature of queries applied in
retrieval tests. This is particularly the case when the test setup does not include real
users, but is carried out in an experimental setting like Cousins’.
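Why truncation can change the outcome of a retrieval test is easy to see in a minimal Python sketch of right-hand truncation. The records, their terms, and the `*` truncation symbol are assumptions made for illustration; they do not reproduce the records or the search system used by Cousins.

```python
def matches(query, term):
    """Match a term exactly, or by prefix if the query ends with the truncation symbol '*'."""
    if query.endswith("*"):
        return term.startswith(query[:-1])
    return term == query


def search(query, records):
    """Return the sorted ids of records with at least one term matching the query."""
    return sorted(
        record_id
        for record_id, terms in records.items()
        if any(matches(query, term) for term in terms)
    )


# Hypothetical records with natural language terms.
records = {
    "r1": ["library", "catalogues"],
    "r2": ["libraries", "history"],
    "r3": ["librarians", "education"],
    "r4": ["museums"],
}
```

In this toy collection the untruncated query `library` retrieves only `r1`, while the truncated query `librar*` also retrieves `r2` and `r3`. Word-form variation of this kind is typical of natural language terms, which is one plausible reading of why truncated queries favored the natural language indexing in the test.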

In a study with a slightly different focus, Gross & Taylor (2005) investigated
the number of relevant records that would be missed if controlled index terms were
removed from records in a library catalogue. Thus, though not explicated by the authors,
recall was used to measure the performance difference between records including and
records excluding controlled subject data. A sample of 227 queries drawn from a log of
the library catalogue functioned as the information needs of the study. The study found
that approximately one third of records would not have been retrieved without the
assignment of controlled subject data. The study supports the general perception that
controlled subject data support recall. Also, obviously, controlled subject data need to
supplement the natural language appearing in records. In a similar study, Veenema
(1996) evaluated the performance of controlled index terms and natural language in a
small test collection (553 documents of highly varying content and form) compiled
from a Canadian embassy. The indexing policy guiding the manual indexing is far from
aiming at exhaustivity. This results in an average of 2 assigned terms per document in
the test collection. The comparison of the two indexing languages shows that the nature
of the information need affects the performance of the respective languages. Thus, due
to the highly restrictive indexing policy on the controlled indexing language, the natural
language performed better in information needs concerning locations, while the
controlled index terms performed better on information needs regarding a certain sector.
Though the empirical basis of the study is rather limited, the study illustrates the
implications of indexing policies for test results, and also how specific characteristics of
information requests may affect the outcomes of comparisons of indexing languages.

Savoy (2005) has compared manual, assigned indexing and automatic,
extracted indexing in a study of a French database named Amaryllis. Here, we will
present the results relevant to the performance of controlled versus uncontrolled
indexing. However, implicitly the differences between manual and automatic indexing
are also illustrated by the study. In the study, manual indexing was mainly carried out
using a controlled vocabulary. The indexers were allowed to supplement with
uncontrolled index terms. In practice, uncontrolled terms occurred rarely, though the
share was not specified in the paper. Automatic indexing was represented in the study
by ten different indexing models such as the Okapi probabilistic model, a binary model


where a term either occurs or does not occur, and a number of weighted approaches. The
test collection contained approximately 145,000 documents. Thus, the results of the
study cannot necessarily be transferred to real life databases, which commonly contain
millions of documents. Twenty-five queries represented the information needs of the study.
Concerning controlled versus uncontrolled indexing, the study found the best
performance to be achieved by using a combination of the two general indexing
languages. Similar findings have been made by Tenopir (1985) regarding controlled
indexing and natural language indexing. A comparison between controlled and
uncontrolled indexing slightly favored controlled indexing when measured as to mean
average precision. However, the results were not statistically significant. Going
through the results manually revealed rather comprehensive variations at query level.
This result emphasizes the influence of test queries on test results and the importance of
validation.
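Mean average precision, the measure referred to above, averages a per-query score, which is why large variations at query level can hide behind a small overall difference. The following Python sketch uses the standard textbook definition of the measure; it is not claimed to reproduce Savoy's evaluation setup, and the rankings and relevance judgments are invented.

```python
def average_precision(ranking, relevant):
    """Average of the precision values at each rank where a relevant document appears."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    # Relevant documents that are never retrieved contribute zero.
    return total / len(relevant)


def mean_average_precision(runs):
    """Mean of the average precision over (ranking, relevant) pairs, one pair per query."""
    return sum(average_precision(ranking, relevant) for ranking, relevant in runs) / len(runs)


# Hypothetical results for two queries.
runs = [
    (["a", "b", "c", "d"], {"a", "c"}),  # relevant documents at ranks 1 and 3
    (["x", "y"], {"y"}),                 # single relevant document at rank 2
]
```

For these assumed runs the per-query scores are 5/6 and 1/2, giving a mean of 2/3. Two indexing languages could reach similar means with very different per-query scores, which is the kind of variation that a manual inspection of the results would reveal.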

The studies above have compared controlled and uncontrolled indexing
languages. Recently, Price et al. (2007; 2009) have introduced the notion of semantic
components that allow for a simultaneous combination of controlled and free text
indexing. Semantic component indexing provides a supplementary, enriched
description of document contents by manually marking up segments of text in a
document (i.e., semantic component instances) with labels (semantic component
names). Domain-specific documents tend to contain characteristic types of information
(semantic components). With semantic components, a searcher can search for query
terms within specific semantic components, or specify a preference for documents
containing particular semantic components. Hereby, the searcher can combine the
advantages of uncontrolled full text search and domain-oriented controlled indexing that
emphasizes topics or components of the documents. Semantic components have been
empirically evaluated (e.g., Price et al., 2007; Price et al., 2009). The results suggest
that this particular type of indexing can be a valuable improvement of full text indexing.
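The idea of searching within labeled segments can be sketched in a few lines of Python. The documents, the component names ("purpose", "procedure"), and the substring matching are hypothetical simplifications for illustration only; they do not reproduce the system evaluated by Price et al.

```python
# Hypothetical documents: each maps semantic component names to marked-up text segments.
documents = {
    "doc1": {
        "purpose": "guidance on tax deductions",
        "procedure": "submit the form before 1 May",
    },
    "doc2": {
        "procedure": "tax deductions are entered in box 12",
    },
}


def search(term, component=None):
    """Return ids of documents containing the term, optionally only within one component."""
    hits = []
    for doc_id, components in documents.items():
        for name, text in components.items():
            if component is not None and name != component:
                continue
            if term.lower() in text.lower():
                hits.append(doc_id)
                break  # one matching segment is enough for this document
    return hits
```

An unrestricted search for "tax deductions" finds both documents, as a plain full text search would, whereas restricting the same query to the assumed "procedure" component finds only `doc2`.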

As appears from the studies presented above, precision and in particular recall
have been applied several times. However, the results do not point to an unambiguous
relation between types of indexing languages and the mentioned performance measures.
Rather, it seems that Svenonius’ (1986, p. 335) perception that both free text and
controlled vocabularies contribute to recall and precision, but in different ways, is
validated. Apparently a combination of controlled and uncontrolled indexing may be
advisable, taking into account the strengths and weaknesses of the respective
indexing languages. Rowley (1994) concludes her paper by outlining a number of



factors that may help decide on the optimal combination of the indexing languages: 1)
the searching environment, 2) the searchers, 3) available retrieval facilities and
strategies, and 4) the nature of the search. On the basis of these factors and on the basis
of this section in general, we can conclude that the selection of indexing languages
should reflect the actual area of function.

5.3.3 Intellectual vs. automatic indexing

Overall, indexing may be carried out by humans (intellectual indexing), by
machines (automatic indexing), or by a combination of the two (semi-automatic
indexing). As indicated by the name, intellectual indexing is the indexing carried out by
humans, that is, indexers assign index words to documents, usually on the basis of a
controlled vocabulary. The literature also applies the terms human or manual indexing
to designate intellectual indexing.

Rafferty & Hidderley (2007) identify three approaches to intellectual indexing:
expert-led indexing, author-based indexing, and user-based indexing. Traditionally,
intellectual indexing has been carried out by professional indexers (expert-led
indexing). The purpose is to establish a connection between user and document on the
basis of a controlled vocabulary, by using free text identifiers, or a combination of the
two. In scientific databases it is also common that authors attach keywords to their
contributions. This is referred to as author-based indexing. These keywords are not
selected from a controlled vocabulary. Rather, they represent the authors’ perception of
the content of their document in the form of uncontrolled index terms. With the amount
of information produced today, e.g., on the Internet, supplementing or perhaps even
replacing professional indexers with other indexers can be a means to ensure subject
representation of information objects. Thus, in the last decade, we have seen the
emergence of online sources that allow users to assign tags to information sources (e.g.,
Hunter, 2009; Trant, 2009). User tags broaden the conception of indexing due to the
supplementary functions that tags also have (Golder & Huberman, 2006). Thus, tags allow
for more fine-grained access to information sources than is usually possible through
professional indexing using a controlled vocabulary (Kipp, 2005). The latter type is
known as user-based indexing.

          | Controlled indexing   | Uncontrolled indexing<br />
Monologic | Professional indexers | Author-based indexing<br />
Dialogic  | (empty)               | User-based indexing<br />

Figure 5.4 Generalized characteristics of intellectual indexing. Accumulated on the basis of<br />

Rafferty & Hidderley (2007).<br />

Rafferty & Hidderley (2007) characterize the three types of intellectual<br />

indexing according to their communicative potential. Both expert-led and author-based<br />

indexing are characterized as monologic, because they express a kind of indexing that does<br />

not communicate with the potential users of the indexing. User-based indexing on the<br />

other hand represents a dialogic type of <strong><strong>in</strong>dex<strong>in</strong>g</strong>, because it allows for the users of<br />

documents to express their <strong>in</strong>dividual <strong>in</strong>terpretation of an <strong>in</strong>formation unit. This is<br />

illustrated in Figure 5.4. One must keep in mind that the figure presents a<br />

generalized view of the intellectual indexing types. In other words, exceptions to the<br />

figure do exist. For instance, in some cases professional indexers also carry out<br />

uncontrolled indexing. Professional indexers are nevertheless connected with controlled<br />

indexing because this is the type of indexing most frequently<br />

performed by this particular group of indexers. As appears from the figure, the box<br />

representing dialogic, controlled indexing is empty. The reason is that this type of<br />

indexing has not yet been fully developed. Several authors have addressed the<br />

problem, and different solutions to the lack of control in folksonomies have been<br />

proposed (cf. Trant, 2009).<br />

Several studies have investigated the characteristics of the different types of<br />

indexers mentioned above. Kipp (2005) compared users’, authors’ and professional<br />

indexers’ assignment of index terms and tags to 165 scientific papers from core LIS<br />

journals. The analysis mostly concerned the terms assigned by professional<br />

indexers and users. Kipp found that there was some overlap in the terms assigned, but<br />

that the overlap often consisted of narrower terms, broader terms, related terms and<br />

synonyms. However, a considerable number of terms were unrelated across<br />

the indexer groups. Kipp suggested that one explanation could be that users could<br />

apply one specific term to address a new concept, whereas indexers needed to express<br />

new concepts in a controlled vocabulary by a combination of controlled terms already<br />

exist<strong>in</strong>g <strong>in</strong> the vocabulary.<br />

In a later comparative study, Strader (2009) investigated the<br />

degree of overlap between author-assigned keywords and Library of Congress Subject<br />

Head<strong>in</strong>gs (LCSH), that is, controlled <strong>in</strong>dex terms assigned by professional <strong>in</strong>dexers.<br />

The subject of <strong>in</strong>vestigation was bibliographic records represent<strong>in</strong>g doctoral students’


publications in an online catalogue. In all, 285 theses and dissertations containing a total of<br />

1,681 author keywords and 1,181 LCSH terms were analyzed. The study showed that<br />

there was a certain overlap between author-assigned keywords and LCSH. However,<br />

approximately half of the author-assigned keywords did not match LCSH. A number of<br />

reasons can expla<strong>in</strong> the lack of overlap between subject terms. One reason may be that<br />

LCSH are not updated frequently enough to reflect current research. Another reason is<br />

that the authors use a different term<strong>in</strong>ology to represent similar concepts. Strader also<br />

found that about one-tenth of author-assigned subject terms and one-third of LCSH<br />

supplements data could be found elsewhere <strong>in</strong> the bibliographic record. In other words,<br />

LCSH to a larger degree supply users with unique access po<strong>in</strong>ts to the <strong>in</strong>vestigated<br />

records. However, it was concluded that both types of indexing enrich the retrieval<br />

environment for users.<br />
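The kind of overlap reported in comparisons such as Strader's can be pictured as a simple comparison of two term sets. The following minimal Python sketch is an illustration only and not the procedure used in the cited studies: the function names, the normalization (lower-casing and whitespace collapsing), the exact-match criterion and the sample terms are all assumptions made for the example.

```python
def normalize(term):
    """Lower-case a term and collapse internal whitespace."""
    return " ".join(term.lower().split())


def overlap_report(author_keywords, controlled_terms):
    """Compare two term lists by exact match after normalization."""
    authors = {normalize(t) for t in author_keywords}
    controlled = {normalize(t) for t in controlled_terms}
    shared = authors & controlled
    union = authors | controlled
    return {
        "shared": sorted(shared),
        # Share of author keywords that also occur as controlled terms.
        "author_match_rate": len(shared) / len(authors) if authors else 0.0,
        # Jaccard coefficient: shared terms over all distinct terms.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Invented example record, not data from the studies cited above.
report = overlap_report(
    ["Information Retrieval", "tagging", "folksonomies", "e-government"],
    ["Information retrieval", "Subject headings", "Folksonomies"],
)
print(report)
```

An exact-match criterion of this kind misses the partial matches (narrower, broader and related terms, synonyms) that the reviewed studies also report, so it should be read as a lower bound on the relatedness of two term sets.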

Thomas, Caudle & Schmitz (2009) also exam<strong>in</strong>ed LCSH, but compared it to<br />

user tags in LibraryThing. Ten books were selected to form the basis of the<br />

investigation. The criteria for selection were that the books were popular and that they<br />

represented weak LCSH areas. Both criteria must be taken into account when applying<br />

the results of the investigation, since they favor user tags, which potentially limits the<br />

generalizability of the study. On the basis of the investigation, the authors found that<br />

users tag for their own purposes. Also, there was a certain overlap between LCSH<br />

subject terms and user tags, but user tags were stronger than LCSH terms with<br />

regard to task organization. Users of information systems will get the richest<br />

indexing when the system applies a combination of user tags and LCSH terms, but the<br />

benefits are greater when the number of tags is large.<br />

As a part of her Ph.D. work, Choi (2010a; 2010b) carried out two studies<br />

investigating user-based indexing and expert-led indexing. Though preliminary in<br />

nature, the studies compared index terms assigned to web documents at the websites<br />

Intute, BUBL and Delicious. The first study took into account both controlled and<br />

uncontrolled keywords from Intute. It showed that the subject perspectives<br />

expressed at the three examined websites differed, even between the two sites<br />

representing professional indexers (Choi, 2010a). The second study left out subjective<br />

and personal tags from Delicious. It found that the level of similarity between<br />

indexers and users varied with the subject of the indexed websites. Thus, subjects<br />

with a larger intake of new words (e.g., technology) tend to generate less consistency<br />

between <strong>in</strong>dexers (Choi, 2010b).<br />


Attar (2006) carried out a study evaluat<strong>in</strong>g the <strong><strong>in</strong>dex<strong>in</strong>g</strong> performance of student<br />

<strong>in</strong>dexers. Unlike the studies just mentioned, Attar’s study is not comparative <strong>in</strong> nature.<br />

The study investigated subject indexing and the formal description of information units<br />

in a library catalogue. Thirty-seven undergraduate and graduate students catalogued and indexed<br />

a full library collection with very diverse document types after having received two<br />

days of detailed training. The students came from different fields of study, but none were LIS<br />

students. When possible, the students indexed information within the subject area they<br />

were familiar with from their studies. In the subsequent evaluation of the indexing, Attar found<br />

that the problems related in particular to inconsistent and<br />

incorrect use of subject headings. For literary works, the use of genre in particular<br />

caused trouble. The problems were caused by lack of training and lack of familiarity<br />

with LCSH. The study thus stresses the importance of proper training for<br />

indexing at a professional level.<br />

The empirical studies presented above to a large extent <strong>in</strong>form us about the<br />

characteristics of different types of <strong>in</strong>tellectual <strong><strong>in</strong>dex<strong>in</strong>g</strong>. However, as reflected <strong>in</strong><br />

Figure 5.4, the results of the comparisons also elucidate the pros and cons of controlled<br />

vs. uncontrolled indexing languages presented in Section 5.3.2. Given<br />

that user-based indexing appears to follow a power law distribution, in which a few information<br />

units receive most of the assigned tags while most units receive few (cf. Thomas, Caudle & Schmitz,<br />

2009), user-based indexing should not be the only type of indexing, at least in systems<br />

that are also used for high-precision searches. Further, the studies<br />

presented above have one thing in common: the method applied is to analyze the<br />

product of the indexing, namely the assigned tags or index terms. Several authors draw<br />

conclusions about the intentions of the indexers on the basis of the indexing product. A<br />

study <strong>in</strong>vestigat<strong>in</strong>g <strong>in</strong>dexer <strong>in</strong>tentions for <strong><strong>in</strong>dex<strong>in</strong>g</strong> <strong>in</strong> a more qualitative manner would<br />

be an <strong>in</strong>terest<strong>in</strong>g supplement to the exist<strong>in</strong>g and highly enlighten<strong>in</strong>g studies.<br />

Automatic indexing stands in contrast to intellectual indexing, since it<br />

designates indexing carried out solely through mechanical identification of index<br />

terms on the basis of word occurrences in documents. We will explain the concept more<br />

thoroughly from Section 5.4 onwards. According to Albrechtsen (1993), automatic<br />

<strong><strong>in</strong>dex<strong>in</strong>g</strong> represents the most simplistic conception of subject analysis, s<strong>in</strong>ce the subject<br />

of the document is solely based on the frequency of terms. However, automatic<br />

<strong><strong>in</strong>dex<strong>in</strong>g</strong> may take the form of either extracted or assigned <strong><strong>in</strong>dex<strong>in</strong>g</strong> (see Sections 5.4.1<br />

and 5.4.2 below). As far as automatically assigned indexing is concerned,<br />

Albrechtsen’s statement is open to discussion. Here the assignment of index terms is


carried out on the basis of a set of rules direct<strong>in</strong>g the occurrence of certa<strong>in</strong> words to<br />

specific po<strong>in</strong>ts <strong>in</strong> a controlled vocabulary. S<strong>in</strong>ce these rules have been formulated by<br />

humans, some sort of <strong>in</strong>tellectual <strong>in</strong>terpretation of the subject relation between the<br />

document and the controlled vocabulary has been established.<br />
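The idea of rules that direct the occurrence of certain words to specific points in a controlled vocabulary can be sketched in a few lines of Python. The example is a deliberately simplified illustration: the rule table, the descriptors and the `min_hits` threshold are hypothetical and do not reproduce any particular system discussed in this chapter.

```python
import re

# Hypothetical rules: trigger words mapped to descriptors in an invented
# controlled vocabulary. In practice such rules are formulated by humans.
RULES = {
    "tax": "Taxation",
    "taxes": "Taxation",
    "vat": "Taxation",
    "school": "Education",
    "pupil": "Education",
    "hospital": "Health services",
}


def assign_descriptors(text, rules=RULES, min_hits=1):
    """Return the controlled descriptors whose trigger words occur in the text."""
    hits = {}
    for word in re.findall(r"[a-z]+", text.lower()):
        descriptor = rules.get(word)
        if descriptor is not None:
            hits[descriptor] = hits.get(descriptor, 0) + 1
    # Keep descriptors supported by at least min_hits trigger occurrences.
    return sorted(d for d, n in hits.items() if n >= min_hits)


print(assign_descriptors("New VAT rules change how school meals are taxed."))
```

Note that the descriptors assigned need not occur in the document at all; the intellectual effort lies in the rule table rather than in the processing of the individual document.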

If manual <strong><strong>in</strong>dex<strong>in</strong>g</strong> is taken to represent controlled <strong><strong>in</strong>dex<strong>in</strong>g</strong> and automatic<br />

<strong><strong>in</strong>dex<strong>in</strong>g</strong> to represent uncontrolled <strong><strong>in</strong>dex<strong>in</strong>g</strong>, the differences between the two<br />

approaches will to a large extent be reflected <strong>in</strong> Table 5.2. However, additional<br />

differences exist between manual and automatic methods. One obvious difference<br />

between human and automatic indexing relates to economy. Manual indexing is costly<br />

in terms of both money and time, at least<br />

in the case of expert-led indexing. This can explain some of the efforts put into<br />

develop<strong>in</strong>g automatic methods. Accord<strong>in</strong>gly, the low costs connected with automatic<br />

<strong><strong>in</strong>dex<strong>in</strong>g</strong> are implicitly considered a strength. Here it is important to keep <strong>in</strong> m<strong>in</strong>d the<br />

costs related to not be<strong>in</strong>g able to f<strong>in</strong>d <strong>in</strong>formation (cf. Feldman & Sherman, 2001).<br />

Ineffective retrieval may be caused by both manual and automatic methods. Thus, for<br />

both indexing methods, Feldman & Sherman’s calculations emphasize the need to carry<br />

out evaluations <strong>in</strong> order to ensure the functionality and quality of <strong><strong>in</strong>dex<strong>in</strong>g</strong>.<br />

However, the two approaches differ with regard to more qualitative aspects as<br />

well. We have previously mentioned consistency, which is a highly relevant concept<br />

here. As appears from the papers reviewed above on manual <strong><strong>in</strong>dex<strong>in</strong>g</strong>, human <strong>in</strong>dexers<br />

undertake an <strong>in</strong>terpretation of the content of a piece of <strong>in</strong>formation ahead of the<br />

assignment of index terms, whether they are controlled or uncontrolled. This<br />

interpretation most likely leads to inconsistencies due to differences in the indexers’<br />

conceptions of what the document is about (Anderson & Perez-Carballo, 2001a).<br />

At the same time, the human interpretation allows documents to be represented by terms<br />

not present in the document, which potentially enriches the indexing. In automatic<br />

indexing, the interpretation rests on statistical calculations of term occurrence,<br />

which increases consistency considerably. But, as stated by Bloomfield (2002),<br />

indexing may be consistently good or consistently bad. As a consequence, it could be<br />

added that indexing can be inconsistently good. By this is meant that consistently bad<br />

indexing is not necessarily preferable to inconsistent indexing that contains very<br />

good elements along with very bad elements. Accord<strong>in</strong>g to Mandersloot, Douglas &<br />

Spicer (1970, p. 50), “[human...] <strong><strong>in</strong>dex<strong>in</strong>g</strong> may have <strong>in</strong>consistencies, but it is flexible.<br />

Mach<strong>in</strong>e <strong><strong>in</strong>dex<strong>in</strong>g</strong> may be consistent, but it is rigid.” Whether this op<strong>in</strong>ion is also<br />

reflected <strong>in</strong> empirical comparisons of the two approaches will be <strong>in</strong>vestigated below.<br />


Different authors have referred to the difficulties of isolat<strong>in</strong>g <strong><strong>in</strong>dex<strong>in</strong>g</strong> as a<br />

variable when measur<strong>in</strong>g the performance of an IR system (e.g., Anderson & Perez-<br />

Carballo, 2001a). However, numerous studies exist that have compared the<br />

performance of the two approaches. Salton’s (1986a) review of early studies argues for<br />

the potential of automatic <strong><strong>in</strong>dex<strong>in</strong>g</strong> <strong>in</strong> various evaluative sett<strong>in</strong>gs. As mentioned<br />

previously, several of the studies reviewed in Section 5.3.2 are also relevant in the<br />

present section, since they compare manual, controlled indexing on the one hand with<br />

automatic, uncontrolled indexing on the other. Examples are the Cranfield<br />

experiments (e.g., Cleverdon, 1967), which demonstrated promising results for<br />

automatic indexing in the form of single terms compared to manually assigned<br />

controlled index terms, and Savoy’s (2005) study, which found that the best performance<br />

was achieved by a combination of manual and automatic methods. TREC<sup>6</sup> has also<br />

carried out experiments regarding automatic indexing and retrieval. Here the automatic methods<br />

have not been compared to human indexing, but have been tested in isolation. In<br />

particular, the tracks testing term weighting are relevant to automatic indexing (Harman<br />

& Voorhees, 2006). In sum, apart from expected lower expenditures on automatic<br />

<strong><strong>in</strong>dex<strong>in</strong>g</strong>, what can also be expected from automation of <strong><strong>in</strong>dex<strong>in</strong>g</strong> procedures is an<br />

<strong>in</strong>creased level of consistency. In the sections below a more detailed presentation of the<br />

characteristics of automatic <strong><strong>in</strong>dex<strong>in</strong>g</strong> will follow.<br />

5.4 Approaches to automatic <strong><strong>in</strong>dex<strong>in</strong>g</strong><br />

Automatic indexing designates the situation in which machines replace human<br />

indexers and carry out the indexing of documents (Lancaster, 2003, p. 283). Given the<br />

broad definition of automatic indexing outlined in the<br />

introduction to the present chapter, automatic indexing is covered by a number of<br />

diverse research communities. Golub (2006) differentiates between four approaches (text<br />

categorization, document clustering, document classification, and mixed approaches)<br />

originating from different research communities such as machine learning, information<br />

retrieval, and library science. Many of the automatic approaches to indexing<br />

are based on techniques and principles that date back several decades. The most significant<br />

difference between the time when the techniques were developed and the present is that the<br />

power and capacity of hardware have increased along with the amount of digitized<br />

documents. As a consequence, the automatic indexing achieved today performs<br />

better, though there is still room for improvement (Lancaster, 2003, pp. 330-331).<br />

6 Short for Text REtrieval Conference. TREC first started out in 1992 (Harman & Voorhees, 2006).

We divide the automatic approaches according to whether they represent extracted or<br />

assigned methods. For the present work, this division makes sense because it<br />

reflects the manual approaches that the automatic methods mirror.<br />

Moreover, this is the categorization employed by Lancaster (2003) and Moens (2000).<br />

We will use this division in the sections to follow. In practice, algorithms for automatic<br />

indexing usually make use of more than one method at a time (Coyle, 2008). In our<br />

review, however, we will present and discuss the methods <strong>in</strong> their pure form with<br />

whatever characteristics they may have.<br />

5.4.1 <strong>Automatic</strong> extracted <strong><strong>in</strong>dex<strong>in</strong>g</strong><br />

In automatic extracted indexing, terms are drawn from the document itself to<br />

represent the content of the document, in line with the natural language indexing mentioned<br />

previously. The most basic kind of automatic extracted indexing is indexing all<br />

words occurring in a collection of documents (Anderson & Perez-Carballo, 2001b, p.<br />

258). However, not all natural language terms appearing in documents make<br />

good descriptors of a document. Therefore, a number of techniques are necessary in<br />

order to increase the quality of descriptors when extracting index terms from<br />

documents. The basic procedure consists of five steps, of which some or all may be<br />

included in order to improve the automatic indexing: 1) identification of words<br />

appearing in the document collection (lexical analysis); 2) removal of function words<br />

using a stop word list; 3) stemming in order to reduce words to their basic<br />

form; 4) computation of a weighting factor for the remaining words, taking into account<br />

term frequency and inverse document frequency; and 5) representation of documents by the<br />

calculated values on the basis of the previous steps (cf. Salton, 1989, p. 304; Salton &<br />

McGill, 1983). Others supplement the five-step procedure with additional steps such as<br />

formation of phrases. In some cases this results in a different order of steps<br />

(Moens, 2000, p. 78). In the sections to follow, we will elaborate on the individual steps.<br />
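To make the succession of steps concrete, the five steps can be sketched as a small Python pipeline. This is a minimal sketch under stated assumptions and not the procedure of any of the cited authors: the stop word list is a tiny invented one, `crude_stem` is a naive stand-in for a real stemmer, and the weight is assumed to be the raw term frequency multiplied by the logarithm of N divided by the document frequency, which is only one of several common tf-idf variants.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop word list (an assumption, not a published list).
STOP_WORDS = {"the", "a", "of", "and", "in", "is", "are", "to", "for"}


def tokenize(text):
    """Step 1: lexical analysis, here simply runs of letters in lower case."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(word):
    """Step 3: naive suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_collection(documents):
    """Steps 1-5: return one {stem: weight} dictionary per document."""
    term_counts = []
    for text in documents:
        words = [w for w in tokenize(text) if w not in STOP_WORDS]  # steps 1-2
        term_counts.append(Counter(crude_stem(w) for w in words))   # step 3
    n_docs = len(documents)
    # Document frequency: the number of documents in which each stem occurs.
    doc_freq = Counter(stem for counts in term_counts for stem in counts)
    # Steps 4-5: weight = term frequency * log(N / document frequency).
    return [
        {stem: tf * math.log(n_docs / doc_freq[stem]) for stem, tf in counts.items()}
        for counts in term_counts
    ]


docs = [
    "Indexing of documents in the archive",
    "The archive is indexing tax documents",
    "Tax law for schools",
]
weights = index_collection(docs)
for doc_weights in weights:
    print({stem: round(w, 2) for stem, w in sorted(doc_weights.items())})
```

In this toy collection, stems that occur in only one document (such as "law") receive a higher weight than stems shared by several documents (such as "tax"), which is the intended effect of the inverse document frequency component.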

5.4.1.1 Lexical analysis and stop word lists<br />

Lexical analysis converts a stream of characters into a stream of words.<br />

Single words are identified when they are separated by spaces or punctuation (Moens, 2000).<br />


Challenges that may occur in the process include abbreviations, hyphenated<br />

terms, punctuation in general, and digits. A machine-readable dictionary may help<br />

solve the problems of abbreviations and, in some cases, hyphenated terms. These<br />

cases do not necessarily cause problems. However, they need to be considered<br />

when performing lexical analysis (Fox, 1992; Moens, 2000).<br />
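The decisions involved can be illustrated with a small tokenizer written in Python. The sketch is only one possible way of handling the cases mentioned; the abbreviation set stands in for a machine-readable dictionary and is invented for the example, and the two switches (`keep_digits`, `split_hyphens`) are hypothetical parameters that make the choices explicit.

```python
import string

# Hypothetical abbreviation list standing in for a machine-readable dictionary.
ABBREVIATIONS = {"e.g.", "i.e.", "ph.d."}


def lexical_analysis(text, keep_digits=False, split_hyphens=False):
    """Turn a character stream into a list of lower-cased word tokens."""
    tokens = []
    for chunk in text.lower().split():
        # Check for known abbreviations before stripping periods.
        candidate = chunk.strip(",;:!?()\"'")
        if candidate not in ABBREVIATIONS:
            candidate = candidate.strip(string.punctuation)
        parts = candidate.split("-") if split_hyphens else [candidate]
        for part in parts:
            if not part:
                continue
            if part.isdigit() and not keep_digits:
                continue  # digits are dropped unless explicitly kept
            tokens.append(part)
    return tokens


sample = "Automatic indexing, e.g. in e-government, started before 1960."
print(lexical_analysis(sample))
print(lexical_analysis(sample, keep_digits=True, split_hyphens=True))
```

With the default settings the hyphenated term "e-government" is kept as one token and the year is discarded; with both switches turned on, the term is split into "e" and "government" and the year is retained. Which behavior is preferable depends on the domain.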

Stop word lists are lists of the most common words, which are removed from<br />

full-text documents in order to reduce the number of possible index terms. Alternative<br />

designations are stop lists and negative vocabularies (Fox, 1989). The assumption<br />

about stop words is that they are not good candidates for index terms. In particular, it is<br />

desirable to eliminate function words from the list of potential index terms (e.g., Luhn,<br />

1957; Salton, 1989). Further, stop word lists limit the space needed in indices (Wilbur<br />

& Sirotkin, 1992). Stop word lists for English text commonly contain between 50 and 400 words<br />

(Moens, 2000). For both lexical analysis and stop word<br />

lists, the domain in question should be taken into consideration, since<br />

the usefulness of a potential index term may differ depending on the application area<br />

(Fox, 1992). When preparing a stop word list, different choices must be made. The size<br />

of the list, whether large or small, is initially decided by a cut-off<br />

level based on the frequency of terms. For instance, Fox (1989) set the cut-off at<br />

more than 300 occurrences for a large stop word list. In addition, different qualitative<br />

steps can be taken in order to refine the stop word list further. Examples are<br />

taking account of alternative spellings and of variants of stop words with different prefixes<br />

and suffixes. Examining potentially relevant and irrelevant words with an<br />

occurrence close to the cut-off limit is also likely to improve the stop word list (cf. Fox,<br />

1989).<br />
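The two stages described above, a frequency cut-off followed by qualitative adjustment, can be sketched as follows in Python. The sketch is an illustration under assumptions: the function name and the `keep` and `extra` parameters are hypothetical, and the toy collection and the cut-off of 1 bear no relation to the corpus or the threshold used by Fox (1989).

```python
from collections import Counter


def build_stop_list(tokenized_docs, cutoff, keep=(), extra=()):
    """Return the words whose collection frequency exceeds the cut-off.

    keep  : frequent words judged useful as index terms in the domain
    extra : less frequent words (e.g. variants) added after manual review
    """
    counts = Counter(word for doc in tokenized_docs for word in doc)
    frequent = {word for word, n in counts.items() if n > cutoff}
    return (frequent - set(keep)) | set(extra)


# Invented toy collection; real cut-offs are set on far larger corpora.
docs = [
    ["the", "tax", "on", "the", "income"],
    ["the", "tax", "form", "of", "the", "state"],
    ["a", "form", "on", "the", "web"],
]
stop_list = build_stop_list(docs, cutoff=1, keep={"tax", "form"}, extra={"of", "a"})
print(sorted(stop_list))
```

Here "tax" and "form" exceed the cut-off but are kept as potential index terms because they are meaningful in the assumed domain, while the function words "of" and "a" are added although they fall below the cut-off.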

5.4.1.2 Stemm<strong>in</strong>g<br />

Stemming identifies morphologically related terms by reducing variants of a<br />

word to a common stem or root (Moens, 2000; Salton & McGill, 1983; Anderson & Perez-Carballo,<br />

2001b). Specifically, affixes, that is, prefixes and suffixes, are removed from<br />

natural language words in order to identify stems (cf. Hammarström, 2006). The assumption<br />

is that “when stems are used as index terms, a greater number of potentially relevant<br />

items can be identified than when one of the orig<strong>in</strong>al full text words is <strong>in</strong> use” (Salton &<br />

McGill, 1983, p. 72). Us<strong>in</strong>g a stemmer is likely to <strong>in</strong>crease recall, as documents with<br />

morphological variations of the same stem are merged to be represented by the same<br />

<strong>in</strong>dex term. Further, like stop word lists, the use of stemm<strong>in</strong>g reduces the need for


space in the index, since the number of potential index terms is reduced during the<br />

process (Salton & McGill, 1983; Moens, 2000; Willett, 2006).<br />

Stemming can be divided into manual and automatic methods. Manual methods<br />

employ some type of regular expression. Automatic stemming, on the other hand, uses<br />

for instance affix removal, successor varieties, table look-ups or n-grams (Frakes, 1992).<br />

Automated stemming is carried out by a stemming algorithm that removes prefixes,<br />

suffixes or both. Two potential problems challenge the performance of stemming:<br />

under-stemming and over-stemming. Under-stemming removes too little of the term,<br />

while over-stemming removes too much of the term; the two correspond to what is<br />

known as under-truncation and over-truncation in retrieval (cf. Chowdhury, 2004).<br />

However, what causes over-stemming and under-stemming differs between languages<br />

due to the differences in morphological structure (Moens, 2000; Willett, 2006).<br />
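Of the automatic methods mentioned, the n-gram approach differs from the others in that it does not produce a stem but measures how many letter sequences two words share. A common way of illustrating it, assumed here, is the Dice coefficient computed on the unique bigrams of two words; the function names below are our own and the sketch is not taken from any of the cited implementations.

```python
def bigrams(word):
    """Return the set of distinct adjacent letter pairs in a word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def dice(word_a, word_b):
    """Dice coefficient: 2 * shared bigrams / (bigrams in a + bigrams in b)."""
    a, b = bigrams(word_a), bigrams(word_b)
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


print(dice("statistics", "statistical"))
print(dice("statistics", "archive"))
```

Words whose similarity exceeds a chosen threshold can then be grouped and treated as variants of the same term. The threshold is a tuning decision: set too low it merges unrelated words, set too high it leaves related words apart, which parallels over-stemming and under-stemming.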

A number of stemmers have been proposed. However, two algorithms <strong>in</strong><br />

particular stand out, namely Lov<strong>in</strong>s’ stemmer (1968) and Porter’s stemmer (1980).<br />

Both algorithms are aimed at suffix removal, which is the most common type of<br />

stemming. Further, both stemmers are aimed at single-word terms (Galvez, de Moya-Anegon<br />

& Solana, 2005). Lovins’ stemmer involves two steps. In the first step the<br />

stemm<strong>in</strong>g is carried out. In the second step spell<strong>in</strong>g exceptions are handled by a set of<br />

rules <strong>in</strong> order to avoid the merg<strong>in</strong>g of stems with differ<strong>in</strong>g spell<strong>in</strong>gs. Examples are<br />

collide and collision (Lov<strong>in</strong>s, 1968). Like Lov<strong>in</strong>s, Porter specifies a set of suffixes to be<br />

removed from stems. However, spell<strong>in</strong>g exceptions are not <strong>in</strong>corporated <strong>in</strong> the Porter<br />

stemmer (Porter, 1980). Recently, Porter has developed a new generic stemmer,<br />

Snowball, which provides stemmers for a number of different European languages<br />

<strong>in</strong>clud<strong>in</strong>g Danish (Porter, 2001).<br />

5.4.1.3 Weight<strong>in</strong>g factors<br />

When terms are weighted, it is implied that some terms, even after lexical<br />

analysis, stop word lists, and stemm<strong>in</strong>g have been applied, are more important than<br />

others. In other words, when differentiat<strong>in</strong>g the weights of the rema<strong>in</strong><strong>in</strong>g terms, it is<br />

implied that the first three steps are not sufficient for the identification of good <strong>in</strong>dex<br />

terms. Luhn (e.g., 1961) was a pioneer in suggesting that terms occurring in documents<br />

could substitute for controlled vocabularies <strong>in</strong> respect to <strong><strong>in</strong>dex<strong>in</strong>g</strong>. The assumption was<br />

that the subject of a document is reflected by the occurrence of terms designat<strong>in</strong>g that<br />

subject (Moens, 2000; Salton, 1989; Salton & McGill, 1983). The higher the frequency<br />

of a term (TF), the higher is the probability that the document is concerned with the<br />


Figure 5.5 The resolv<strong>in</strong>g power of significant <strong>in</strong>dex terms. Adapted from Luhn (1958a, p. 161)<br />

subject referred to by the term. Obviously, this is only true up to a certa<strong>in</strong> po<strong>in</strong>t. Stop<br />

words and other high-frequency words do not make good index terms. According to<br />

Luhn (1958a), good index terms should be found among terms with a medium<br />

frequency in the document. Luhn’s thoughts are a continuation of Zipf’s findings.<br />

Approximately a decade earlier, Zipf (1949) discovered that the frequency of a term in a<br />

document is inversely proportional to its rank position. The principles are illustrated<br />

below (see Figure 5.5).<br />
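
As a rough illustration of the two ideas, the sketch below ranks terms by frequency and keeps only those in a middle frequency band. The function names, the example tokens, and the two cut-off values are assumptions made for this example; the source does not prescribe specific cut-offs.

```python
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, term, frequency, rank * frequency) rows, most frequent first."""
    counts = Counter(tokens)
    return [
        (rank, term, freq, rank * freq)
        for rank, (term, freq) in enumerate(counts.most_common(), start=1)
    ]


def mid_frequency_terms(tokens, lower=2, upper=4):
    """Keep terms whose frequency lies between two cut-offs (both are assumed values)."""
    counts = Counter(tokens)
    return sorted(term for term, freq in counts.items() if lower <= freq <= upper)


# Invented token stream: "the" and "of" are too frequent, "appeal" too rare.
TOKENS = ["the"] * 6 + ["of"] * 5 + ["permit"] * 3 + ["tax"] * 2 + ["appeal"]

if __name__ == "__main__":
    for row in rank_frequency(TOKENS):
        print(row)
    print(mid_frequency_terms(TOKENS))
```

Under a strict inverse relationship the product of rank and frequency in the last column would be roughly constant; in a sample this small it only approximates that pattern.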

The early ideas by Luhn have been ref<strong>in</strong>ed and expanded <strong>in</strong> the years to follow<br />

due to different issues; among others lack of uniform application and empirical support<br />

(Salton, 1970, 1988). Thus, it has turned out that fundamental problems arise, if TF is<br />

used as the only basis for extracted <strong><strong>in</strong>dex<strong>in</strong>g</strong>. The reason is that mere TF does not take<br />

<strong>in</strong>to account the comparable occurrence of terms across documents. The result will be<br />

low precision <strong><strong>in</strong>dex<strong>in</strong>g</strong>, s<strong>in</strong>ce a term with a high frequency <strong>in</strong> a large collection of<br />

documents is not able to dist<strong>in</strong>guish the documents from each other, which is the<br />

implicit purpose of precision (Salton & Buckley, 1988; Salton, 1989). One way of<br />

correct<strong>in</strong>g for the limitations of TF is to add the <strong>in</strong>verse document frequency (IDF) <strong>in</strong>to<br />

the calculation of term weights. IDF expresses the occurrence of terms in a collection<br />

of documents. The assumption is that a term with a high frequency across a collection<br />

is less able to discriminate between the documents containing that particular term than<br />

a term that is highly frequent in just a few documents (Salton, 1989). The formula for<br />

IDF takes just this <strong>in</strong>to account. The formula is (Moens, 2000; Salton & Buckley,<br />

1988):<br />

$$\mathrm{IDF}_t = \log \frac{N}{n_t}$$<br />

where<br />

$\log$ = common logarithm<br />

$N$ = number of documents in the collection<br />

$n_t$ = number of documents in the collection containing the term $t$<br />

By comb<strong>in</strong><strong>in</strong>g TF and IDF, high weights are allocated to terms that<br />

simultaneously have a high frequency <strong>in</strong> a document and a low frequency <strong>in</strong> the<br />

document collection. Further, the product of TF*IDF is one way of measur<strong>in</strong>g term<br />

discrim<strong>in</strong>ation values. Thus, term discrim<strong>in</strong>ation is comparable with IDF (Salton, Yang<br />

& Yu, 1975). Term discrimination value expresses a term’s ability to distinguish<br />

documents of a collection from each other. A core concept <strong>in</strong> relation to term<br />

discrim<strong>in</strong>ation value is connectivity. High connectivity is characteristic for bad <strong>in</strong>dex<br />

terms due to their lack of capacity to dist<strong>in</strong>guish between documents <strong>in</strong> a collection. On<br />

the contrary, good index terms have a low connectivity between documents (Jones &<br />

Furnas, 1987; Moens, 2000). The applicability of the TF*IDF factor has been the<br />

subject of different op<strong>in</strong>ions over the years. In a presentation of early empirical retrieval<br />

tests Sparck Jones (1973) concludes that the comb<strong>in</strong>ation of the two frequencies<br />

improves retrieval considerably, when compared to weighting based on TF alone.<br />

However, as po<strong>in</strong>ted out by Salton & Buckley (1988), a major weakness of the TF*IDF<br />

product is the need for cont<strong>in</strong>uous updates of the frequency factor. This is <strong>in</strong> particular<br />

necessary <strong>in</strong> dynamic document collections. Thus, TF*IDF is more suitable for static<br />

collections.<br />
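
The combination of the two factors can be sketched as follows. The code uses the common logarithm of N divided by the number of documents containing the term, as in the IDF formula above, and multiplies it by the raw term frequency. The function names, the tiny collection, and the convention of returning 0.0 for unseen terms are assumptions made for illustration.

```python
import math
from collections import Counter


def idf(term, documents):
    """log10(N / n_t); returns 0.0 if no document contains the term (assumed convention)."""
    n_docs = len(documents)
    n_t = sum(1 for doc in documents if term in doc)
    return math.log10(n_docs / n_t) if n_t else 0.0


def tf_idf(documents):
    """Return one {term: weight} dict per document, with weight = TF * IDF."""
    weights = []
    for doc in documents:
        tf = Counter(doc)
        weights.append({term: freq * idf(term, documents) for term, freq in tf.items()})
    return weights


# Invented collection of three tokenized documents.
DOCS = [
    ["tax", "appeal", "tax", "case"],
    ["building", "permit", "case"],
    ["tax", "case", "case"],
]

if __name__ == "__main__":
    for doc_weights in tf_idf(DOCS):
        print(doc_weights)
```

The term "case" occurs in every document and therefore receives the weight zero, whereas "permit" occurs in one document only and receives the highest IDF. The sketch also shows the weakness noted above: adding a document changes N and possibly the document counts, so all weights have to be recomputed.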

In addition to IDF, the length of documents (or vectors cf., Salton & Buckley<br />

(1988)) could be taken <strong>in</strong>to consideration, when determ<strong>in</strong><strong>in</strong>g term weights. Thus, long<br />

documents contain more terms than short ones, which makes a long document more<br />


retrievable than a short document due to the higher frequency of terms. In retrieval the<br />

problem of the frequency of terms may be reduced by normalizing the term frequency<br />

by the length of the document (Singhal et al., 1996). Evidently, normalization is<br />

particularly necessary <strong>in</strong> document collections conta<strong>in</strong><strong>in</strong>g heterogeneous documents.<br />

Further, a long document usually conta<strong>in</strong>s more synonyms for the same concept, which<br />

also <strong>in</strong>creases retrieval. In this case, obviously, normalization of length will not be<br />

useful for correction. Instead more qualitative tools must be considered. The possible<br />

higher degree of semantic variability <strong>in</strong> longer documents could, at least partly, expla<strong>in</strong><br />

the tendencies observed by S<strong>in</strong>ghal et al. (1996). They f<strong>in</strong>d that <strong>in</strong> spite of<br />

normalization, longer documents still tend to have better retrieval compared to shorter<br />

documents. Similar observations have been made earlier by Sparck Jones (1973), who<br />

concluded that document length normalization has little, if any, effect on retrieval.<br />
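
A minimal sketch of length normalization is given below. It divides each raw term count by the number of tokens in the document, which is the simplest possible scheme and an assumption of this example; it is not the normalization method proposed by Singhal et al. (1996).

```python
from collections import Counter


def normalized_tf(doc):
    """Divide each raw term count by the number of tokens in the document."""
    length = len(doc)
    if length == 0:
        return {}
    return {term: count / length for term, count in Counter(doc).items()}


# Invented documents: the long one repeats "permit" four times as often,
# but is also four times as long.
SHORT_DOC = ["permit"] + ["other"] * 9
LONG_DOC = ["permit"] * 4 + ["other"] * 36

if __name__ == "__main__":
    print(normalized_tf(SHORT_DOC)["permit"], normalized_tf(LONG_DOC)["permit"])
```

With raw frequencies the long document would be favoured for the term "permit"; after normalization both documents receive the same value. The sketch does nothing about the larger number of synonyms in long documents discussed above.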

5.4.1.4 Compound nouns as <strong>in</strong>dex terms<br />

The procedure outl<strong>in</strong>ed above refers to extraction of s<strong>in</strong>gle word <strong>in</strong>dex terms.<br />

However, phrases may also be taken into account when considering weighting factors.<br />

Phrases constitute a particular challenge s<strong>in</strong>ce the occurrence of two or more words <strong>in</strong> a<br />

phrase frequently has a quite different mean<strong>in</strong>g than the s<strong>in</strong>gle terms <strong>in</strong>cluded <strong>in</strong> the<br />

phrase itself. This is the case concern<strong>in</strong>g noun phrases and proper names. Usually,<br />

phrases bear more meaning and specificity as index terms than the single terms<br />

constitut<strong>in</strong>g the phrase (Croft, Turtle & Lewis, 1991; Lancaster, 2003). A classic<br />

example is the phrase “venetian bl<strong>in</strong>ds”. When <strong>in</strong> a phrase, the concept refers to a<br />

certain kind of blinds. When divided into single terms, it refers to people from a<br />

specific city and something used to cover windows respectively, that is, a completely<br />

different meaning. When combined, but not as a phrase, it may refer to Venetians who<br />

cannot see. On the other hand, the probability that the two words occur in the same<br />

sentence without appearing as a phrase is rather low (Salton & McGill, 1983, p.<br />

86). A number of techniques, whether simple (e.g., simple collocations, statistically<br />

validated N-grams, syntactic structures) or advanced (e.g., extended n-grams, or<br />

syntactic pars<strong>in</strong>g), may be used <strong>in</strong> order to identify phrases <strong>in</strong> documents (Strzalkowski<br />

et al., 1999, p. 117). At present, the methods are expensive and time consuming to<br />

perform, and many questions remain unanswered, such as whether the investments<br />

undertaken pay off in different contexts (Voorhees & Pazienza, 1999; Anderson &<br />

Perez-Carballo, 2001b). This may also expla<strong>in</strong> why at present the weight<strong>in</strong>g functions<br />

of s<strong>in</strong>gle terms are to some extent employed when weight<strong>in</strong>g phrases as well (Moens,<br />

2000). Phrases may be considered either as a set of words or as separate concepts when<br />

weighted (Croft, Turtle & Lewis, 1991). Three scenarios may be outlined for the<br />

calculation of phrase weights: 1) the phrase is considered as one unit and the assigned<br />

weight is <strong>in</strong>dependent of the components constitut<strong>in</strong>g the phrase; 2) the phrase weight is<br />

calculated on the basis of the s<strong>in</strong>gle terms compris<strong>in</strong>g the phrase; or 3) a comb<strong>in</strong>ation of<br />

1) and 2), where the results of both approaches are considered, when the weight is<br />

computed (Moens, 2000).<br />
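
The three scenarios can be sketched in a single function. The averaging of component weights in scenario 2 and the mixing parameter `alpha` in scenario 3 are assumptions made for this example; the source does not specify how the values are to be combined.

```python
def phrase_weight(phrase_w, component_ws, mode="combined", alpha=0.5):
    """Weight a phrase under one of the three scenarios.

    phrase_w      weight computed for the phrase treated as a single unit
    component_ws  weights of the single terms that make up the phrase
    alpha         assumed mixing parameter, used only in the combined scenario
    """
    component_avg = sum(component_ws) / len(component_ws)
    if mode == "unit":          # scenario 1: the phrase as one unit
        return phrase_w
    if mode == "components":    # scenario 2: averaging is an assumption
        return component_avg
    if mode == "combined":      # scenario 3: linear mix of 1) and 2)
        return alpha * phrase_w + (1 - alpha) * component_avg
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    for mode in ("unit", "components", "combined"):
        print(mode, phrase_weight(1.0, [0.25, 0.75], mode))
```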

The challenges of phrases mentioned here particularly concern the English<br />

language. In the present empirical work the test collection consists of primarily Danish<br />

texts. The Danish language belongs to the Germanic family of languages along with<br />

German, Swedish, and others. Although English is itself a Germanic language, these<br />

languages differ from it in a number of ways. The way compound nouns are created is<br />

particularly pertinent to automatic indexing. Thus, where English compound nouns are<br />

typically created as phrases, these languages create compounds by joining the components<br />

together in one word (Hedlund, 2002).<br />

This means that results of IR and <strong><strong>in</strong>dex<strong>in</strong>g</strong> studies cannot be transferred to Germanic<br />

languages as a matter of course (Ahlgren & Kekälä<strong>in</strong>en, 2007). Thus, the challenges<br />

related to English basically consist of identify<strong>in</strong>g when two or more s<strong>in</strong>gle terms are <strong>in</strong><br />

fact a noun phrase. The purpose is to enable an <strong>in</strong>crease <strong>in</strong> precision. In the Germanic<br />

languages the challenges are if anything the opposite. Here techniques need to be able<br />

to identify the components of compound nouns <strong>in</strong> order to <strong>in</strong>crease recall (cf. Pedersen,<br />

Navarretta & Hansen, 2005). Techniques for identify<strong>in</strong>g the components of compound<br />

nouns have been developed and tested for a number of languages other than English.<br />

However, Danish is not among the most thoroughly investigated languages in this<br />

respect. A 2-year research project, the VID-project, was carried out in the mid-2000s by<br />

the Centre for Language Technology, University of Copenhagen. The overall purpose of the<br />

project was to <strong>in</strong>vestigate the potential of human language technologies as regards<br />

acquisition and representation of <strong>in</strong>formation (Pedersen, Navarretta & Henriksen, 2004).<br />

Among other things, the project contributed knowledge about how marking up texts with<br />

word classes would affect recall and precision in IR. The study found that precision<br />

was very satisfying (=0.9), whereas recall surprisingly was lower (=0.6). The lower<br />

recall was explained by errors in the recognition of terms, and by a<br />

general lack of complexity <strong>in</strong> the recognition (Pedersen, Navarretta & Hansen, 2005, p.<br />

28). The results of the study emphasize the need for language tools to allow for a high<br />

degree of complexity, when aimed at Danish and similar languages.<br />
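
To illustrate what identifying the components of a compound noun involves, the following toy splitter looks up components in a small lexicon and allows a linking element between them. The lexicon, the list of linking elements, and the function name are assumptions made for this example; the sketch is not the method used in the VID-project or in any of the studies cited here.

```python
LEXICON = {"sag", "behandler", "bygge", "tilladelse"}  # assumed toy lexicon
LINKERS = ("", "s", "e")  # assumed linking elements between components


def split_compound(word, lexicon=LEXICON):
    """Return the components of `word`, or [word] if no complete split is found."""
    word = word.lower()
    if word in lexicon:
        return [word]
    for i in range(2, len(word) - 1):
        head = word[:i]
        if head not in lexicon:
            continue
        rest = word[i:]
        for linker in LINKERS:
            if not rest.startswith(linker):
                continue
            tail = split_compound(rest[len(linker):], lexicon)
            if all(part in lexicon for part in tail):
                return [head] + tail
    return [word]


if __name__ == "__main__":
    for w in ("sagsbehandler", "byggetilladelse", "ukendt"):
        print(w, "->", split_compound(w))
```

Indexing the components "sag" and "behandler" in addition to "sagsbehandler" lets a query on either component retrieve the document, which is how splitting is expected to raise recall. A lexicon-based sketch of this kind fails on every compound whose parts are missing from the lexicon.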

To our knowledge no other research supplements the VID-project as regards<br />

the investigation of the Danish language. Swedish, on the other hand, has been<br />


<strong>in</strong>vestigated <strong>in</strong> different studies. Due to the large share of similarities we may<br />

reasonably transfer Swedish results to the Danish language. Ahlgren & Kekälä<strong>in</strong>en<br />

(2007) have tested a number of different techniques <strong>in</strong> a comparative study of Swedish<br />

text. Four indexing methods ranging from raw text over inflection (see footnote 7) to two variants of<br />

compound splitting were compared in the study. The indexing methods were evaluated<br />

and compared on a collection of Swedish newspaper articles using topics from CLEF.<br />

To set the scene for evaluations six user profiles were set up. The profiles varied as to<br />

their degree of patience and their perception of what makes a relevant document.<br />

Normalized discounted cumulated ga<strong>in</strong> (nDCG) (Järvel<strong>in</strong> & Kekälä<strong>in</strong>en, 2002) was<br />

used to measure the performance of <strong><strong>in</strong>dex<strong>in</strong>g</strong> methods. The study found that, compared<br />

to the rema<strong>in</strong>der of the tested methods, <strong>in</strong> general the simplest <strong><strong>in</strong>dex<strong>in</strong>g</strong> method had the<br />

lowest performance when orig<strong>in</strong>al words from the topics were used as query terms. On<br />

the contrary, when the topic terms were truncated for queries, the same method had the<br />

best performance compared to the rema<strong>in</strong>der. In sum, <strong>in</strong>flection and compound<br />

splitting did not improve retrieval compared to simple truncation. It appears that the lessons<br />

learned in phrase indexing, namely that it is time consuming and that the payoff is<br />

questionable, also seem to be the case for methods applicable for more complex<br />

languages than English.<br />
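
The measure used in the study can be sketched as follows. The code follows one common formulation of discounted cumulated gain, in which gains at ranks below the logarithm base are not discounted and later gains are divided by the logarithm of their rank. The default base of 2 and the choice to derive the ideal ranking from the retrieved gains only are assumptions of this example, not parameters reported for the study.

```python
import math


def dcg(gains, base=2):
    """Cumulate gains; a gain at rank >= base is divided by log_base(rank)."""
    total = 0.0
    for rank, gain in enumerate(gains, start=1):
        total += gain if rank < base else gain / math.log(rank, base)
    return total


def ndcg(gains, base=2):
    """DCG of the ranking divided by the DCG of the same gains in descending order."""
    ideal = dcg(sorted(gains, reverse=True), base)
    return dcg(gains, base) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    print(ndcg([3, 2, 1, 0]))  # graded relevance already in the best order
    print(ndcg([0, 1, 2, 3]))  # the same documents in the worst order
```

The `base` parameter controls how sharply documents late in the ranking are discounted, and the graded gain values express how much a document of a given relevance level is worth.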

5.4.1.5 Extracted <strong><strong>in</strong>dex<strong>in</strong>g</strong><br />

Extracted <strong><strong>in</strong>dex<strong>in</strong>g</strong> refers to the automatic extraction of <strong>in</strong>dex terms based on<br />

various techniques. The techniques included here correspond to what Golub<br />

designates as document clustering (Golub, 2006). However, in order to be able to<br />

separate the overall concept from the specific technique clustering, we apply the term<br />

extracted indexing below. As noted by Golub (2006), this approach lies within<br />

the IR-tradition. The close relation between extracted <strong><strong>in</strong>dex<strong>in</strong>g</strong> and <strong>in</strong>formation<br />

retrieval is constituted by the common use of advanced IR techniques for mark<strong>in</strong>g up<br />

documents and match<strong>in</strong>g queries with documents.<br />

Extracted <strong><strong>in</strong>dex<strong>in</strong>g</strong> <strong>in</strong> its most simple form is based on the steps described <strong>in</strong><br />

the preced<strong>in</strong>g sections (lexical analysis, removal of stop words, stemm<strong>in</strong>g, and term<br />

weighting) (Salton, 1991). However, more advanced techniques also exist. One such<br />

technique is the vector space model. The vector space model may be<br />

applied for comparison between documents (indexing) or between documents and<br />

queries (IR). For indexing purposes documents are represented by vectors on the basis<br />

of terms occurring in the documents of a collection. The steps comprised by simple<br />

extracted indexing mentioned above are followed in order to generate a vector<br />

representing each document. Subsequently the vectors are processed using cluster<br />

analysis (Salton, Wong & Yang, 1975).<br />

7 Inflection designates the different forms a word can take, whether it is due to mutation caused by plural<br />

form, strong verbs, compounding use of glue morphemes and others (cf. Ahlgren & Kekäläinen, 2007).<br />

Two main steps constitute the process of advanced extracted indexing: 1) documents<br />

are represented as vectors, and the vectors are subsequently compared<br />

using a similarity measure, and 2) clusters are formed by means of clustering<br />

algorithms (cf. Golub, 2006, p. 356). Cluster analysis designates a method for data<br />

analysis with numerous areas of application. By means of cluster analysis unlabelled<br />

patterns with<strong>in</strong> a set of items can be grouped <strong>in</strong>to mean<strong>in</strong>gful clusters (Ja<strong>in</strong>, Murty &<br />

Flynn, 1999). Applied to documents, cluster analysis groups the documents in a collection<br />

according to features they have in common.<br />
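
The two steps can be sketched with raw term frequency vectors, cosine similarity, and a simple single-pass clustering procedure. The choice of cosine as the similarity measure, the single-pass algorithm, the threshold of 0.3, and the example documents are all assumptions made for illustration; the literature describes many alternative measures and algorithms.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term vectors represented as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass_clusters(vectors, threshold=0.3):
    """Put each vector in the first cluster whose seed is similar enough, else open a new one."""
    clusters = []  # each cluster is a list of document indices; the first index is the seed
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Invented documents; step 1 represents each of them as a term frequency vector.
DOCS = [
    "tax appeal tax case",
    "tax case ruling",
    "building permit application",
    "permit for building extension",
]
VECTORS = [Counter(doc.split()) for doc in DOCS]

if __name__ == "__main__":
    print(single_pass_clusters(VECTORS))
```

In a fuller pipeline the vectors would be built after stop word removal and stemming and would carry TF*IDF weights rather than raw counts.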

Extracted <strong><strong>in</strong>dex<strong>in</strong>g</strong> is an unsupervised way of organiz<strong>in</strong>g documents, because<br />

no pre-categorized documents are used as tra<strong>in</strong><strong>in</strong>g documents. Document cluster<strong>in</strong>g<br />

may be based on terms occurring in the documents, or on co-occurring citations. Terms<br />

themselves can also be clustered. In that case co-occurrence in the document<br />

collection constitutes the unit of analysis (Rasmussen, 1992). Document cluster<strong>in</strong>g is<br />

characterized as an extracted k<strong>in</strong>d of <strong><strong>in</strong>dex<strong>in</strong>g</strong>, s<strong>in</strong>ce the clusters are not matched aga<strong>in</strong>st<br />

a controlled vocabulary.<br />

The performance of extracted <strong><strong>in</strong>dex<strong>in</strong>g</strong> based on various cluster<strong>in</strong>g techniques<br />

and/or on simple extracted <strong><strong>in</strong>dex<strong>in</strong>g</strong> has been exam<strong>in</strong>ed <strong>in</strong> different comparative studies.<br />

We have already mentioned the Cranfield tests, one of the first attempts to evaluate<br />

automatic extracted <strong><strong>in</strong>dex<strong>in</strong>g</strong> (see section 5.3.2).<br />

An early attempt to carry out a cluster<strong>in</strong>g <strong>in</strong>terface for post retrieval, Grouper,<br />

was presented by Zamir & Etzioni (1999) in the late 1990s. The functionality was made<br />

for a meta search eng<strong>in</strong>e, HuskySearch. The technology beh<strong>in</strong>d Grouper consisted of<br />

three steps: 1) stemming, 2) identification of basic clusters, and 3) merging of clusters<br />

with a high degree of overlap between conta<strong>in</strong>ed documents. Further, Grouper had<br />

technology built <strong>in</strong>, which allowed for correct<strong>in</strong>g redundant titles of clusters along with<br />

technology allowing for fast processing of search results. The interface was evaluated<br />

using search logs. 3,183 queries had been logged at the Grouper interface within 2<br />

months, while 19,330 queries had been logged from the HuskySearch interface<br />

(represent<strong>in</strong>g ranked search results with no cluster<strong>in</strong>g of results). The data material<br />


does not allow for an explanation of the patterns identified in the search logs due to the<br />

lack of qualitative data. Another limit to the study, which was po<strong>in</strong>ted out by the<br />

authors, was the lack of control of who used the test system (Grouper) and the basel<strong>in</strong>e<br />

system (HuskySearch). From the data it appeared that users explored several clusters <strong>in</strong><br />

order to locate relevant documents in the Grouper interface. The authors explained this<br />

undesired situation either by users searching for different perspectives on<br />

their information need or simply by the generated clusters not matching user<br />

needs sufficiently. When compared to the baseline system it was found that Grouper<br />

users found more documents, perhaps suggest<strong>in</strong>g that the cluster<strong>in</strong>g made it easier to<br />

locate relevant documents. A qualitative follow up on the study would provide a more<br />

thorough understand<strong>in</strong>g of the hypotheses put forward by the authors <strong>in</strong> the light of their<br />

f<strong>in</strong>d<strong>in</strong>gs.<br />

A later study based on extracted <strong><strong>in</strong>dex<strong>in</strong>g</strong> is Käki’s (2005a; 2005b; Käki &<br />

Aula, 2005) <strong>in</strong>vestigation of categorization of web documents for his dissertation work.<br />

Here, two algorithms for extracting category candidates were applied. One allowed<br />

for single terms along with phrases, while the other required phrases consisting of at<br />

least 2 terms. The algorithms were used in order to build a list of categories for<br />

organizing web results. A result was added to a category if it contained the name of<br />

the category <strong>in</strong> its result summary text (Käki, 2005a). It is the extraction of candidate<br />

terms from the documents themselves and the lack of supervision that cause us to<br />

classify Käki’s work with<strong>in</strong> extracted <strong><strong>in</strong>dex<strong>in</strong>g</strong>.<br />
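
The assignment rule described here, where a result joins a category if its summary text contains the category name, can be sketched as follows. The function name, the case-insensitive substring matching, and the example data are assumptions made for this illustration and are not taken from Käki's implementation.

```python
def categorize_results(summaries, categories):
    """Map each category name to the indices of results whose summary contains it."""
    assignment = {category: [] for category in categories}
    for index, summary in enumerate(summaries):
        text = summary.lower()
        for category in categories:
            if category.lower() in text:  # assumed matching rule: plain substring
                assignment[category].append(index)
    return assignment


# Invented result summaries and candidate categories.
SUMMARIES = [
    "Jaguar cars: dealers and prices",
    "The jaguar is a big cat native to the Americas",
    "Classic cars for sale",
]
CATEGORIES = ["cars", "big cat"]

if __name__ == "__main__":
    print(categorize_results(SUMMARIES, CATEGORIES))
```

A result may end up in several categories or in none, since no training data or controlled vocabulary constrains the assignment.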

On the basis of the two extraction algorithms two interfaces were set up for<br />

test<strong>in</strong>g. Different evaluations have been reported from the study. Käki & Aula (2005)<br />

made a comparative study of an <strong>in</strong>terface compris<strong>in</strong>g the algorithm and categorized<br />

search <strong>in</strong>terface with the World Wide Web (hereafter: the web) as the test base. The<br />

basel<strong>in</strong>e was a Google web page display<strong>in</strong>g the results as a ranked list. 20 test persons<br />

participated <strong>in</strong> the test, where 9 predef<strong>in</strong>ed queries <strong>in</strong> general topic areas formed the<br />

start<strong>in</strong>g po<strong>in</strong>t of the searches. The test persons were allowed 1 m<strong>in</strong>ute for each task <strong>in</strong><br />

order to reflect a faster behavior that supposedly would be more realistic. The<br />

performance of the experimental system and the basel<strong>in</strong>e system were measured <strong>in</strong><br />

terms of 1) time to accomplish a task, 2) number of results selected for a task, 3)<br />

relevance of selected results measured on a 3-po<strong>in</strong>t scale (relevant, related, and not<br />

relevant), and 4) subjective attitude concern<strong>in</strong>g both experimental and basel<strong>in</strong>e systems<br />

(Käki & Aula, 2005, p. 199). In addition recall and precision were measured on the<br />

basis of the relevance judgments carried out by the test persons. The study found that<br />

the categorized interface had a better average performance in precision (62%, sd=13<br />

against 49%, sd=15) and recall (33%, sd=4 against 19%, sd=7). The results of the test<br />

persons’ attitudes towards the two systems demonstrated a somewhat more positive attitude<br />

towards the test system compared to the basel<strong>in</strong>e system. The test did not f<strong>in</strong>d<br />

substantial differences as to the time applied, most likely due to the very short time<br />

w<strong>in</strong>dow applied <strong>in</strong> the test.<br />
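
For reference, precision and recall over a set of selected results can be computed as in the sketch below. The function name and the example identifiers are assumptions; the sketch shows the standard set-based definitions and says nothing about how the relevant set was established in the study.

```python
def precision_recall(selected, relevant):
    """Return (precision, recall) for a set of selected items and a set of relevant items."""
    selected, relevant = set(selected), set(relevant)
    hits = len(selected & relevant)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Invented example: four results selected, six relevant results exist, two overlap.
    print(precision_recall([1, 2, 3, 4], [2, 4, 5, 6, 7, 8]))
```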

The highly controlled test just referred to was followed up by a longitudinal study<br />

that was considerably less experimental (Käki, 2005b). 16 people participated <strong>in</strong> the<br />

study. The participants did not receive any <strong>in</strong>struction for the use of the test system<br />

besides us<strong>in</strong>g it any way they would like. This reflected the purpose of the study,<br />

namely to reflect the participants’ real behavior. This time no comparison was made to<br />

a basel<strong>in</strong>e system. Like <strong>in</strong> the previous study, the test system was applied to the web.<br />

The data collection lasted for three months <strong>in</strong>clud<strong>in</strong>g one month of compensation for a<br />

holiday period. Two types of data were collected; search logs and questionnaires. One<br />

questionnaire was distributed a week or two after the launch of the data collection, the<br />

other at the end of the study. 3099 queries were logged, while 3232 result pages were<br />

accessed and 1915 categories were selected. The relevance of retrieved documents was<br />

not registered. The study found that categories were used to select 26% of the accessed<br />

result pages. In the qualitative part of the first questionnaire, participants <strong>in</strong>dicated that<br />

categories were useful, when “…the orig<strong>in</strong>al query was vague, broad, general, or<br />

conta<strong>in</strong>ed words that have multiple mean<strong>in</strong>gs...” (Käki, 2005b, p. 138). The ability of<br />

the categories to help <strong>in</strong>crease the focus of a less precise query was also expressed <strong>in</strong> the<br />

second questionnaire. Further, categories were found useful, when result rank<strong>in</strong>gs were<br />

deficient. The results of the study are interesting, because they demonstrate that<br />

categorizing results is not necessarily useful in all information searching situations.<br />

From the analysis we do get some indication of when categories may be useful.<br />

However, a more systematic <strong>in</strong>vestigation would clearly be relevant.<br />

5.4.2 <strong>Automatic</strong> assigned <strong><strong>in</strong>dex<strong>in</strong>g</strong><br />

<strong>Automatic</strong> assigned <strong><strong>in</strong>dex<strong>in</strong>g</strong> is the automatic equivalent to human controlled<br />

<strong><strong>in</strong>dex<strong>in</strong>g</strong>. The major difference between automatic extracted and automatic assigned<br />

<strong><strong>in</strong>dex<strong>in</strong>g</strong> is that a coupl<strong>in</strong>g is established between terms occurr<strong>in</strong>g <strong>in</strong> a collection of<br />

documents and a controlled vocabulary. The apparent advantage of coupl<strong>in</strong>g natural<br />

language index terms with a controlled vocabulary is that it enables relations<br />

between documents that share one or more controlled <strong>in</strong>dex terms.<br />


Chapter 5<br />

Different approaches exist for performing automatic assigned indexing. Two<br />

methods can be identified within the text categorization literature (see footnote 8). One is based on<br />

knowledge eng<strong>in</strong>eer<strong>in</strong>g, the other on mach<strong>in</strong>e learn<strong>in</strong>g (Sebastiani, 2002). Text<br />

categorization is considered an assignment type of automatic <strong><strong>in</strong>dex<strong>in</strong>g</strong> due to its<br />

categorization of documents <strong>in</strong>to a predef<strong>in</strong>ed set of categories. To compare,<br />

<strong>in</strong>formation filter<strong>in</strong>g represents another means of categoriz<strong>in</strong>g documents, though with<br />

dynamic categories (Belk<strong>in</strong> & Croft, 1992).<br />

Initially, text categorization was carried out using a rule based approach (or<br />

knowledge engineering approach, cf. Sebastiani (2002)). Typically, a set of rules was<br />

built on the if-then principle, meaning that if a document met certain criteria it<br />

would be assigned to a specific category (Sebastiani, 1999). In practice, a profile<br />

was created for each term to be assigned, containing words and phrases with a high<br />

frequency in the documents that would usually be assigned the controlled index<br />

term (cf. Lancaster, 2003, p. 287-288). The full set of rules is referred to as a classifier.<br />

Hayes & Weinstein (1990) are frequently mentioned in the literature as an example of<br />

this approach. They reported a system for categorization developed for news stories at<br />

Reuters. The categorization of documents is based on two steps: 1) concept recognition,<br />

and 2) categorization rules. In the first phase both single terms and phrases are included<br />

in the recognition. Further, the system relies on a certain degree of human<br />

intervention to either limit or extend the context of terms if necessary. Thus, the system<br />

may basically be considered a hybrid (cf. section 5.5). The rules have also been extended<br />

compared to plain if-then rules: the context of a term is included in order to<br />

decide on the strength of the term in relation to a specific category. Further, when generating<br />

the rules, the developers may take into account terms’ specific position in a news story,<br />

just as the length of the document may be considered. 674 rules were created in order<br />

to meet the needs of the document collections at Reuters. Hayes & Weinstein (1990)<br />

report an evaluation in their presentation of the system. However, due to the very<br />

concise presentation we will not go further into the results here. Further, we have not<br />

been able to locate supplementary studies performing more systematic evaluations of<br />

the system. The authors do, however, estimate the savings from introducing the system<br />

to be approximately $752,000 in the first full year of deployment despite the expenses<br />

of 6.5 person-years for the development of the system.<br />

8<br />

Here we see an example of inconsistent terminology. In section 5.4.1.5 we have been referring to Käki<br />

(e.g., Käki, 2005a), who applies the term categorization to denote an extracted type of automatic<br />

indexing. In the present section text categorization denotes an assigned type of automatic indexing.<br />

To avoid confusion we will distinguish between the two by referring to the former as categorization or<br />

extracted categorization and to the latter as text categorization or assigned categorization.<br />

The American Petroleum Institute also exemplifies the knowledge engineering<br />

approach. Here, the document collection consists of abstracts containing detailed<br />

technical information. The units of analysis were abstracts that were subjected to<br />

stemming and analysis at phrase level due to the large proportion of phrases within the<br />

chemical domain. The set of rules was to a large extent built around the thesaurus<br />

applied to the collection. For instance, cross references in the thesaurus functioned as<br />

rules pointing natural language terms to the preferred terms in the thesaurus (Martinez,<br />

Lucey & Linder, 1987). As in Hayes & Weinstein’s (1990) study, the evaluation in<br />

this study is limited. In the paper by Martinez et al. the performance of the<br />

knowledge engineering approach is evaluated in terms of percentages of hits and noise. The<br />

evaluation reports a hit rate of approximately 50%, which cannot be considered impressive. In<br />

addition, noise is reported to comprise about 15% of the retrieved documents.<br />
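To make the if-then principle and the notion of term profiles concrete, the following is a minimal Python sketch. It is not a reconstruction of the Reuters or American Petroleum Institute systems: the profiles, the controlled terms, and the threshold `min_hits` are invented for illustration only.

```python
# Hypothetical term profiles: for each controlled index term, a set of
# words and phrases assumed to be frequent in documents indexed with it.
PROFILES = {
    "petroleum refining": {"refinery", "crude oil", "distillation"},
    "corporate finance": {"merger", "acquisition", "dividend"},
}


def categorize(text, profiles=PROFILES, min_hits=2):
    """Assign every controlled term whose profile matches at least min_hits times."""
    lowered = text.lower()
    assigned = []
    for term, profile in profiles.items():
        # Count how many profile entries occur in the document text.
        hits = sum(1 for phrase in profile if phrase in lowered)
        # The if-then rule: IF enough profile entries occur THEN assign the term.
        if hits >= min_hits:
            assigned.append(term)
    return sorted(assigned)


print(categorize("The refinery processes crude oil by distillation."))
print(categorize("A merger was announced."))
```

In this sketch the full set of profiles plays the role of the classifier. Every profile and threshold has to be maintained by hand, which hints at why collections of several hundred or several thousand rules become costly.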

The manual and at times rigid elaboration of rules <strong>in</strong> the knowledge<br />

eng<strong>in</strong>eer<strong>in</strong>g approach has turned out to be expensive and time consum<strong>in</strong>g. To illustrate,<br />

the solution of the American Petroleum Institute conta<strong>in</strong>ed about 14,000 rules<br />

(Mart<strong>in</strong>ez, Lucey & L<strong>in</strong>der, 1987, p. 162). In addition the preparation of term profiles<br />

has been somewhat challeng<strong>in</strong>g. Further, decid<strong>in</strong>g on the relation between document<br />

terms and controlled terms has resulted <strong>in</strong> weak results <strong>in</strong> early studies (Apté, Damerau<br />

& Weiss, 1994; Lancaster, 2003, p. 288). As a consequence of these challenges<br />

alternative solutions were sought and the mach<strong>in</strong>e learn<strong>in</strong>g approach emerged dur<strong>in</strong>g<br />

the 1990s (Sebastiani, 1999; 2002). In the machine learning approach a classifier is also<br />

built for each category. Essentially, the process consists of three stages. First, a<br />

number of documents are categorized manually into a set of predefined categories. The<br />

selected documents serve as training documents. Preferably, the training<br />

documents already exist in the collection to be classified. Alternatively, artificial<br />

documents may be constructed. Next, a classifier is constructed on the basis of<br />

characteristics of the training documents. A learner forms the basis for building the<br />

classifier. The learner will usually be available in advance. If a learner does not exist,<br />

some effort must be put into constructing one, since the learner to a large extent<br />

determines the ultimate effectiveness. It is in this second step of the categorization process that<br />

the manual production of rules in the knowledge engineering approach is replaced by<br />


machine learning. A number of techniques exist for building the classifier. Examples<br />

include multivariate regression models, nearest neighbour classifiers, probabilistic<br />

Bayesian models, neural networks, symbolic rule learning, and Support Vector<br />

Machines (Dumais et al., 1998). Explaining each technique in detail is beyond the<br />

scope of the present work, but thorough reviews can be found in Dietterich (1997) and<br />

Kotsiantis, Zaharakis & Pintelas (2006). The third and final step of the machine<br />

learning approach to categorization consists of applying the classifier to the full<br />

collection of documents (cf. Sebastiani, 2002; Golub, 2006, p. 352-353).<br />
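The three stages can be illustrated with a small Python sketch. We assume a naive Bayes learner with add-one smoothing as one possible choice among the techniques listed above; the training documents and category names are invented, and the code is not taken from any of the cited studies.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    return text.lower().split()


def train(training_docs):
    """Stage 2: the learner builds a classifier from manually categorized documents."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocabulary = set()
    for text, category in training_docs:
        tokens = tokenize(text)
        word_counts[category].update(tokens)
        doc_counts[category] += 1
        vocabulary.update(tokens)
    return {"word_counts": word_counts, "doc_counts": doc_counts, "vocabulary": vocabulary}


def score(classifier, text):
    """Return a log-probability score per category (add-one smoothing)."""
    total_docs = sum(classifier["doc_counts"].values())
    vocab_size = len(classifier["vocabulary"])
    scores = {}
    for category, n_docs in classifier["doc_counts"].items():
        counts = classifier["word_counts"][category]
        total_words = sum(counts.values())
        log_prob = math.log(n_docs / total_docs)
        for token in tokenize(text):
            log_prob += math.log((counts[token] + 1) / (total_words + vocab_size))
        scores[category] = log_prob
    return scores


def classify(classifier, text):
    scores = score(classifier, text)
    return max(scores, key=scores.get)


# Stage 1: a handful of manually categorized training documents (invented).
TRAINING = [
    ("tax return deadline income", "taxation"),
    ("income tax rate change", "taxation"),
    ("school teacher pupils curriculum", "education"),
    ("curriculum reform for primary school", "education"),
]

classifier = train(TRAINING)

# Stage 3: apply the classifier to a document outside the training set.
print(classify(classifier, "new income tax rules"))
```

The manual effort is confined to stage 1. Replacing the learner in `train` and `score` with another technique would leave stages 1 and 3 unchanged.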

Text categorization is either hard or ranked. Hard text<br />

categorization basically denotes a fully automated procedure, while ranked text<br />

categorization involves approval by a human indexer (Sebastiani, 2002). Thus, ranked<br />

text categorization is basically a semiautomatic approach, which will be explored<br />

further in section 5.5.<br />
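The difference between the two modes can be sketched in Python as follows. The scores, the category names, and the `approve` callback standing in for the human indexer are assumptions made for illustration.

```python
def hard_categorization(scores):
    """Fully automated: take the top-scoring category without human review."""
    return max(scores, key=scores.get)


def ranked_categorization(scores, approve, top_n=3):
    """Semiautomatic: present the top candidates in rank order and keep the approved ones."""
    ranked = sorted(scores, key=scores.get, reverse=True)[:top_n]
    return [category for category in ranked if approve(category)]


# Invented classifier scores for a single document.
scores = {"taxation": 0.7, "education": 0.2, "health": 0.1}

print(hard_categorization(scores))
# A stand-in for the indexer, who here rejects one of the suggestions.
print(ranked_categorization(scores, approve=lambda category: category != "education"))
```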

Machine learning as an approach to categorization has been thoroughly tested<br />

in different studies (see e.g., Cunningham, Littin & Witten, 1997). The tests have<br />

investigated one or several of the techniques mentioned above from a system<br />

oriented perspective. Core examples include Apté, Damerau & Weiss (1994), Chen<br />

(1995), and Dumais et al. (1998). However, in the present work we are concerned with<br />

the usefulness of automatic indexing from a user perspective. Therefore, our review of<br />

studies below will include studies that have incorporated users in their evaluation. The<br />

aim is to establish a frame of reference for the results found in the search test.<br />

Some authors have investigated particular government Web pages. The<br />

GovStat Project (http://www.ils.unc.edu/govstat) has given rise to a number of studies<br />

relevant to the title of the present section. The project is concerned with a specific kind<br />

of information within the public domain: US governmental statistical information.<br />

However, the main project focus is on user access to and use of governmental statistical<br />

information, which is why some of the studies provide valuable insight into the domain<br />

of e-government. We present three studies from the GovStat project below,<br />

namely the studies by Efron et al. (2004) and Kules & Shneiderman (2004; 2005). We<br />

finish this section with a review of Roitblat, Kershaw & Oot (2010).<br />

Efron et al. (2004) have carried out a study of machine learning within the<br />

context of the GovStat project. The purpose of the study was to compare three<br />

representations of documents: keyword, title, and the full text of the documents. The<br />

study consisted of two phases. The first phase clustered 1279 content rich documents<br />

using k-means. The clusters were generated on the basis of either the full text of the<br />

documents, the documents’ titles, or human generated keyword<br />

metadata. Each document could appear in only one cluster. The purpose was to identify<br />

the topic of the documents. The quality of the three approaches was evaluated as part of<br />

the first phase. 10 categories were generated on the basis of phase 1.<br />
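For readers unfamiliar with k-means, the following Python sketch shows the basic procedure on word-count vectors. It is a generic illustration, not the setup used by Efron et al.: the four toy documents, the fixed choice of initial centroids, and all function names are assumptions.

```python
from collections import Counter


def vectorize(docs):
    """Represent each document as a vector of word counts over a shared vocabulary."""
    vocabulary = sorted({word for doc in docs for word in doc.lower().split()})
    vectors = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        vectors.append([counts[word] for word in vocabulary])
    return vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(vectors, initial_indices, max_iterations=20):
    """Assign each vector to exactly one cluster; return the list of cluster indices."""
    centroids = [list(vectors[i]) for i in initial_indices]
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: each document goes to its nearest centroid.
        new_assignments = [
            min(range(len(centroids)), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_assignments == assignments:
            break  # No document changed cluster, so the procedure has converged.
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:
                centroids[c] = [sum(column) / len(members) for column in zip(*members)]
    return assignments


# Invented toy documents; documents 0 and 1 seed the two clusters.
DOCS = [
    "tax income tax",
    "school pupils school",
    "income tax rate",
    "pupils teacher school",
]

print(k_means(vectorize(DOCS), initial_indices=[0, 1]))
```

Swapping the input of `vectorize` between full text, titles, and keywords would correspond to the three document representations compared in the study.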

In the second phase, the remaining 14,000 documents of the collection were<br />

labelled using automatic classification. Ahead of the classification of documents, a<br />

classifier had been trained on the basis of the topics identified in phase 1. Four models<br />

formed the basis for the classifier: probabilistic Rocchio, naive Bayes (below: limited<br />

model), support vector machines, and an augmented model that applied naive Bayes to<br />

a training set extended with supplementary documents from the Web domain in<br />

question (below: extended model). Based on an analysis of the accuracy of the four<br />

models’ classification, the second phase compared the two versions of the naive Bayes<br />

classifier. 11 human judges working on the GovStat project tested the generality of the<br />

two remaining classifiers.<br />

The analysis demonstrated that if the success of the classifiers was measured<br />

by their ability to classify documents correctly in either first or second place, the extended<br />

model performed better than the limited model. However, if success was measured by<br />

the two models’ ability to classify documents correctly the first time, the limited model<br />

performed better. Further, when compared to human assignments to the classes,<br />

naive Bayes tended to distribute documents more evenly between the classes.<br />
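The two success measures can be stated precisely with a short Python sketch. The rankings and the gold labels below are invented solely to show how one model can lead on the first measure and the other on the second; they are not data from Efron et al.

```python
def top_k_accuracy(rankings, gold, k):
    """Share of documents whose correct class appears among the first k ranked classes."""
    correct = sum(1 for ranked, label in zip(rankings, gold) if label in ranked[:k])
    return correct / len(gold)


# Invented gold labels and ranked class suggestions for four documents.
GOLD = ["A", "B", "C", "A"]
LIMITED = [["A", "B"], ["B", "C"], ["C", "A"], ["C", "B"]]
EXTENDED = [["B", "A"], ["B", "A"], ["A", "C"], ["A", "B"]]

for name, rankings in (("limited", LIMITED), ("extended", EXTENDED)):
    print(name, top_k_accuracy(rankings, GOLD, k=1), top_k_accuracy(rankings, GOLD, k=2))
```

With this toy data the limited model is ahead when only the first suggestion counts, while the extended model is ahead when the first two suggestions count.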

5.5 Hybrid types of <strong>in</strong>tellectual and automatic <strong><strong>in</strong>dex<strong>in</strong>g</strong><br />

Above we have presented what are considered prototypical approaches to<br />

automatic indexing. In reality, however, hybrid forms of indexing<br />

also appear. Computer assisted manual indexing refers to the process where elements of<br />

manual and automatic methods are combined in order to handle indexing. In the<br />

literature, computer assisted manual indexing may also be referred to as computer aided<br />

indexing (e.g., Lancaster, 2003), semiautomatic indexing (e.g., Fangmeyer, 1974),<br />

machine aided indexing (e.g., Milstead, 1992), or simply MAI.<br />

Two basic approaches to computer assisted manual indexing exist. One<br />

approach is labelled candidate term systems. In essence, candidate term systems<br />

suggest terms for assignment that are subsequently approved by human indexers<br />

(Milstead, 1994, p. 579; Lancaster, 2003, p. 292). This kind of MAI is represented in a<br />

system for indexing at NASA (the NASA Lexical Dictionary (NLD)) (Silvester,<br />

Genuardi & Klingbiel, 1994). In NLD an automatic indexing procedure is carried out,<br />

which subsequently presents controlled index terms to indexers for manual approval.<br />

The indexing system was tested for its effect on the indexing process of human<br />

indexers. One result of the implementation of MAI was that the average number of<br />

index terms assigned to documents had been reduced, resulting in an increased<br />

uniformity in the indexing. Also, most of the indexers were able to save<br />

time due to the suggestions for index terms provided by the system. This corresponds<br />

to the results of a later study made by Berrios, Cucina & Fagan (2002). They found that<br />

the number of indexed documents increased along with the degree of automation in the<br />

test system. Lastly, Silvester & Klingbiel’s (1993) work indicated that the selection of index<br />

terms had improved in quality, since the indexers did not need to spend their time<br />

looking up terms in the controlled indexing language.<br />

The second approach supplements human indexing by means of some sort of<br />

automatic procedure. Here, indexing terms assigned by humans (or similar human<br />

inputs) are taken as the point of departure for subsequently adding index terms (Milstead,<br />

1994, p. 579-80; Lancaster, 2003, p. 291). In this sense, the approach corresponds to<br />

text categorization mentioned above. When text categorization takes the form of<br />

semiautomatic indexing, a ranked ordering of potentially relevant categories is presented<br />

to the indexer for approval (Sebastiani, 2002). One example of this approach is the<br />

MedIndEx project presented by Humphrey (1989). The project has been carried out<br />

within the National Library of Medicine and is based on Medical Subject Headings<br />

(MeSH) and the literature found in Medline. In MedIndEx a detailed system of<br />

predefined frames, facts, and rules guides the automatic analyses of documents in the<br />

system. These tools form the human input to the automatic part of the indexing<br />

procedure. However, though mentioned as an example of supplementing human<br />

indexing by Milstead (1994, p. 580), MedIndEx also shares some characteristics<br />

with candidate term systems by involving indexers to approve or reject the suggestions<br />

provided after automatic procedures have been carried out.<br />

Hodge (1994) characterizes MAI along a continuum according to the degree of<br />

support provided by an indexing aid, ranging from no computer support (basically<br />

manual indexing) to fully automatic indexing. At the lowest level of machine<br />

support, we find support of clerical activities. Examples are location of index terms and<br />

entries of terms in machine-readable form. Tools for this type of support comprise<br />

thesauri and other kinds of controlled indexing languages. Next follows support for<br />

quality control. The quality control may take different forms. In general this type of<br />

support checks the manual input of indexers, ranging from spelling corrections to<br />

suggestions for candidate preferred terms in the case of invalid terms. The last step of<br />

the continuum supports intellectual activities regarding the selection of terms as well.<br />

One way would be to prompt the indexer for index terms, e.g. in relation to other terms<br />

entered by the indexer. Another type would be reminding the indexer of required<br />

elements in the indexing process. Basically, semiautomatic indexing can be useful in<br />

the introductory stages of fully automatic indexing for a manual review of suggested<br />

index terms. Also, the hybrid between manual and automatic indexing can be applied<br />

with the purpose of enhancing manual indexing.<br />
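The quality control level of the continuum can be sketched in Python as a check of indexer input against a controlled vocabulary. The preferred terms and the use-references are invented, and the use of `difflib.get_close_matches` from the Python standard library for spelling suggestions is our own assumption, not a technique described by Hodge.

```python
import difflib

# Hypothetical controlled vocabulary and use-references (non-preferred -> preferred).
PREFERRED_TERMS = ["taxation", "education", "public health"]
USE_REFERENCES = {"taxes": "taxation", "schooling": "education"}


def check_term(entered):
    """Validate one term entered by an indexer and return (status, suggestion)."""
    term = entered.strip().lower()
    if term in PREFERRED_TERMS:
        return ("valid", term)
    if term in USE_REFERENCES:
        # Invalid term with a known preferred equivalent.
        return ("use", USE_REFERENCES[term])
    # Possible misspelling: suggest the closest preferred term, if any.
    close = difflib.get_close_matches(term, PREFERRED_TERMS, n=1)
    if close:
        return ("suggest", close[0])
    return ("unknown", None)


for entered in ("Taxation", "taxes", "taxaton", "zzz"):
    print(entered, check_term(entered))
```

The final decision remains with the indexer in every case; the aid only flags and suggests.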

5.6 Summary<br />

The present chapter has presented the concept and process of indexing. What<br />

we have seen is a number of ways to characterize indexing. The number of variables<br />

outlined throughout the chapter stresses a basic premise of empirical evaluations of<br />

indexing, namely the challenge of controlling variables.<br />

We have outlined the characteristics of manual and automatic indexing, but<br />

also hybrid types of indexing. We have seen that irrespective of the indexing carried<br />

out, pros and cons can be deduced. Automatic and semiautomatic methods for indexing<br />

have been tested in a variety of settings. The introduction of automatic, extracted<br />

methods has allowed for an automatic production of controlled indexing, which is<br />

highly desirable with the amounts of documents produced today. Automatic, extracted<br />

indexing allows for a controlled indexing free of the consistency challenges of<br />

manual indexing.<br />

As it appears from the reviews in the automatic part of the chapter, the web or<br />

parts of it have been the subject of investigation in many studies. In the search test of<br />

the present Ph.D. study we investigate the applicability of automatic, assigned indexing in a<br />

particular test setting, namely an intranet. This equals a considerably smaller amount of<br />

documents compared to the Web. Further, automated approaches have been tested on<br />

e-government subfields with promising results. However, we do not have knowledge of<br />

studies testing the methods in a collection of documents embracing the entire range of<br />

document types applied in e-government. This will be the aim of the search test that<br />

follows.<br />



6 Empirical framework<br />


Chapter 6<br />

We have previously presented the methodological standpoint of the thesis. In the<br />

present chapter, we move on to report on the empirical methods applied in the two<br />

studies constituting the research project, the domain study and the search test. In the<br />

presentation we follow the sequence of the actual execution of the individual studies.<br />

This means that we begin with the domain study, including questionnaire and focus<br />

group interview designs. Next follows the design of the search test. The chapter<br />

closes with a section explaining the relation between the collected empirical data and the<br />

research questions put forward in the introductory chapter.<br />

6.1 Doma<strong>in</strong> study<br />

The case study is initiated by a domain study. As explained, the purpose of<br />

the domain study is to identify and account for the contextual framework of the search<br />

test as regards the e-government domain. The domain study consists of two separate<br />

parts: an analytical and an empirical part. The analytical part has been reported in the<br />

information seeking review (Chapter 4). The empirical part comprises two elements: a<br />

survey questionnaire followed by focus group interviews. To distinguish between them,<br />

we will use the term respondent to denote a questionnaire participant and the term<br />

participant to denote a focus group participant in the remainder of the thesis.<br />

The aim of combining different types of data collection methods is to be able to<br />

compensate for the inherent individual limitations of the methods involved, since the<br />

weaknesses and strengths of methods have an effect on the outcome of the data<br />

collection. The combination of different research methods in order to explore a research<br />

problem is commonly known as method triangulation. The order and types of methods<br />

applied for triangulation may vary. Miles & Huberman have identified four overall<br />

successions of research methods. The succession may either start out with quantitative<br />

methods followed by qualitative methods, with qualitative methods followed by<br />

quantitative methods, or may employ both methods <strong>in</strong> a parallel manner. The choice of<br />

succession depends on the purpose of the study carried out (Miles & Huberman, 1994,<br />

p. 41). In the domain study we followed the first type of succession, quantitative data<br />

followed by qualitative. This succession helps the researcher gain an overview of<br />

important phenomena in the first part of the data collection (in the present work through<br />

a questionnaire). The quantitative data collection is subsequently followed up by<br />

qualitative data collection (in the present work through focus group interviews) to<br />

provide insight into and understanding of the patterns identified in the quantitative data.<br />

The survey questionnaire was distributed to a sample of the employees in the<br />

case organization. The purpose of the questionnaire was to gain insight into the<br />

distribution of work tasks, information needs and metadata preferences across the case<br />

organization. Further we wanted to investigate whether there was a dependency<br />

between work tasks and seeking behaviour, as it could influence the choice of test<br />

persons for the search test. A number of advantages and disadvantages are connected<br />

with questionnaires. Starting with the strengths, the costs are lower compared to other<br />

methods, as the researcher is not present during data collection.<br />

Also the analysis of data is less time consuming (Frankfort-Nachmias &<br />

Nachmias, 1996). This is particularly the case when using Kalus (see section 6.2.1) for<br />

data collection, due to the feature in the system allowing the researcher to extract the<br />

results of the survey directly into an Excel spreadsheet, which can subsequently be<br />

imported into analysis software, e.g., SPSS. Further, bias may be reduced due to the<br />

lack of <strong>in</strong>teraction between <strong>in</strong>terviewer and <strong>in</strong>terviewee, and due to the high degree of<br />

anonymity (Frankfort-Nachmias & Nachmias, 1996). In both cases reduction of bias is<br />

ascribed to the non-present interviewer. The presence of an interviewer may result in<br />

bad communication between interviewer and interviewee. Also the skills of an<br />

interviewer may influence the results when conducting interviews. The presence of an<br />

interviewer can also make the answers of the respondent less honest, because<br />

the respondent’s feeling of anonymity is low. However, questionnaires also have<br />

weaknesses. In questionnaires it is highly important that questions are understandable<br />

to the respondents, since the researcher is not present to explain the meaning of<br />

questions to the respondents. In this manner the absence of the researcher becomes<br />

a strength and a weakness of questionnaires at the same time. Thus, in<br />

questionnaires, the importance of understandable and unambiguous questions cannot be<br />

emphasized enough. Further, a common problem in questionnaires is low response<br />

rates (Frankfort-Nachmias & Nachmias, 1996).<br />

The second and qualitative part of the doma<strong>in</strong> study consisted of seven focus<br />

group <strong>in</strong>terviews. Focus groups are associated with a number of characteristics<br />

imply<strong>in</strong>g strengths or weaknesses <strong>in</strong> terms of their function as a tool for data collection.<br />

A dist<strong>in</strong>ctive feature of focus groups is the synergy aris<strong>in</strong>g from the <strong>in</strong>teraction between<br />

the participants. This is a strength, when result<strong>in</strong>g <strong>in</strong> a more thorough discussion than<br />

can be achieved <strong>in</strong> an <strong>in</strong>dividual <strong>in</strong>terview. On the other hand the presence of other<br />

participants may cause censoring and conformity among the participants, which is not<br />

desirable (Carey & Smith, 1994, p. 124). Hav<strong>in</strong>g more <strong>in</strong>terviewees present at the same<br />

time further enables the <strong>in</strong>terviewer to ask participants to compare experiences and<br />

understand<strong>in</strong>gs, which aga<strong>in</strong> enriches the understand<strong>in</strong>g of the <strong>in</strong>dividual <strong>in</strong> the group<br />

(Morgan, 1996, p. 139). In quantitative terms, the method provides data <strong>in</strong> a quick<br />

manner and at lower costs compared to <strong>in</strong>dividual <strong>in</strong>terviews (Walden, 2006, p. 224).<br />

The comb<strong>in</strong>ation of these features made us choose focus group <strong>in</strong>terviews over<br />

<strong>in</strong>dividual <strong>in</strong>terviews for the qualitative exploration of the survey results.<br />

The focal point of the focus group interviews was the results of the<br />

questionnaire. We wanted to introduce the participants to the patterns identified in<br />

the results in order to encourage them to elaborate on and discuss these patterns in the group.<br />

In the sections that follow, we elaborate on the methods applied for the domain study.<br />

6.2 Questionnaire design, collection, and analysis<br />

A questionnaire was used as the quantitative part of the doma<strong>in</strong> study to get an<br />

overall view of the distribution of employees, work tasks, and seek<strong>in</strong>g behaviour across<br />

the case organization. Data collection for the survey lasted one week and took<br />

place between 11 and 18 December 2008. We kept a rather short time window for<br />

the <strong>in</strong>vestigation, because we hypothesized that most people, if they respond to a<br />

questionnaire, respond rather quickly, while they still remember hav<strong>in</strong>g received an<br />

<strong>in</strong>vitation to participate. As expected, the majority of responses were received with<strong>in</strong><br />

the first two days after the launch of the survey. An <strong>in</strong>vitation to participate was<br />

distributed by e-mail (see Appendix 3) to a stratified sample of the employees. The e-mail<br />

explained the background of the investigation. Following a link in the e-mail, the<br />

respondents were taken to the online survey. After five days an e-mail was sent to remind<br />

the potential respondents of the survey. We settled for one reminder in order to avoid<br />

annoy<strong>in</strong>g the respondents (cf. Cook, Heath & Thompson, 2000, p. 831).<br />

In Chapter 2 we introduced SKAT's business model, which comprises and<br />

describes all work tasks handled by the employees, external as well as <strong>in</strong>ternal (see<br />

Appendix 1). The condensed work tasks and the ma<strong>in</strong> processes of the bus<strong>in</strong>ess model


have formed the basis for the questionnaire and the recruitment for the focus groups<br />

respectively. As emphasized earlier, the diversity of tasks handled by SKAT is large as<br />

regards the topics and the form of the tasks. In response to this, we hypothesized<br />

that different work tasks might generate differences in seeking behaviour. To test<br />

this hypothesis, we used the work tasks from the business model as the focal point of the<br />

questionnaire. Thus, each respondent answered clarifying questions about the work tasks<br />

relevant to them. The questionnaire consists of two overall parts: one, common to all<br />

respondents, identifying background data, and a second identifying the work tasks<br />

handled by the respondents. In the second part of the questionnaire, the respondents<br />

answered a number of questions exploring the seeking behaviour triggered by their work<br />

tasks. We will elaborate further on this below.<br />

6.2.1 Technique and structure<br />

The questionnaire was developed us<strong>in</strong>g the software Kalus<br />

(http://www.kalus.dk). The questionnaire mainly consisted of pre-coded (or closed-choice)<br />

questions, which have a finite range of answers for the respondent to choose from<br />

when responding to a question. The questionnaire also contains a few<br />

open-ended questions. These were included in order to allow for<br />

clarification or supplementation of a preceding pre-coded question. The strength of using pre-coded<br />

questions is that responses are not subject to potential misinterpretation before<br />

they can be compared and calculated. This is the ma<strong>in</strong> reason for the prevail<strong>in</strong>g role of<br />

these particular questions in the questionnaire. However, it should be kept<br />

in mind that a common problem with this type of question is that respondents have<br />

no opportunity to elaborate on their answers (Buckingham & Saunders, 2004; de Vaus,<br />

2002b). The choice of primarily pre-coded questions and mandatory answers was<br />

closely tied to the purpose of the questionnaire and its relation to the overall research<br />

questions. The overall purpose of the questionnaire was to provide an overview of<br />

the distribution of work tasks and information seeking across the organization. The<br />

more detailed elaboration and explanation of the questionnaire results was left to<br />

the focus groups. With this in mind, it was reasonable to support the<br />

overview function of the questionnaire with mainly pre-coded questions and prompted<br />

answers. When this approach is applied to a questionnaire, pilot testing becomes<br />

even more important (de Vaus, 2002b). Further, research shows that the wording of<br />

questions may have an impact on the outcome of surveys (e.g., Olsen, 1997). Also the<br />

<strong>in</strong>troduction to a question affects how a question is answered (Clark & Schober, 1992).<br />

When design<strong>in</strong>g the questionnaire, we wanted to take <strong>in</strong>to account the considerable<br />

sensitivity towards use of language that respondents have. One way of do<strong>in</strong>g this is to<br />

aim at mak<strong>in</strong>g the questions as precise as possible, for <strong>in</strong>stance by us<strong>in</strong>g probes or<br />

<strong>in</strong>corporat<strong>in</strong>g cognitive reliefs <strong>in</strong>to the questions (Olsen, 1997, p. 300). What we<br />

wanted to achieve was to reduce the degree of uncerta<strong>in</strong>ty by elaborat<strong>in</strong>g on the<br />

questions and the reply options offered to the respondents. The wording of the<br />

questionnaire was tested ahead of the data collection. We elaborate on the pilot and<br />

pretest<strong>in</strong>g <strong>in</strong> section 6.2.4.<br />

Cont<strong>in</strong>gency questions were used to direct respondents to questions relevant to<br />

them (de Vaus, 2002b). In the questionnaire, all questions about work tasks worked as<br />

cont<strong>in</strong>gency questions <strong>in</strong> order to guide the respondents to the questions relevant to the<br />

work task <strong>in</strong> question. The purpose was to make sure respondents only reported seek<strong>in</strong>g<br />

behaviour regarding their actual work tasks. Further, contingency questions increase<br />

the likelihood that the respondent finishes the survey, as the cognitive complexity is<br />

reduced (Shropshire, Hawdon & Witte, 2009, p. 356). In most questions we prompted<br />

for answers. This choice is open to debate. Optional answers have the advantage<br />

of not forcing the respondent to respond to a question. At the same time, optional<br />

answers tend to be skipped when respondents work through the questionnaire (cf.<br />

Evans & Mathur, 2005, p. 200). Prompted answers, on the other hand, may cause<br />

respondents to give up and leave the questionnaire unfinished. After careful<br />

consideration we chose to prompt for answers in order to make sure that the important<br />

questions were answered and to avoid having to discard too many responses due to<br />

incompleteness.<br />
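The routing performed by the contingency questions can be illustrated with a minimal sketch. It is not the Kalus implementation; the task labels and function names are hypothetical, and only the figures of six questions per work task and nineteen work tasks are taken from the questionnaire.

```python
# Hypothetical sketch of contingency (skip-logic) routing.
# Task labels and function names are illustrative assumptions.
QUESTIONS_PER_TASK = 6  # six follow-up questions per work task


def tasks_to_show(all_tasks, selected_tasks):
    """Return the work tasks whose question blocks a respondent is routed to.

    Only the work tasks the respondent has ticked trigger follow-up questions.
    """
    selected = set(selected_tasks)
    unknown = selected - set(all_tasks)
    if unknown:
        raise ValueError(f"unknown work tasks: {sorted(unknown)}")
    # Keep the questionnaire order so that routing is deterministic.
    return [task for task in all_tasks if task in selected]


def question_count(all_tasks, selected_tasks):
    """Number of work-task questions actually presented to one respondent."""
    return QUESTIONS_PER_TASK * len(tasks_to_show(all_tasks, selected_tasks))


# Nineteen condensed work tasks, labelled generically here.
tasks = [f"task_{i:02d}" for i in range(1, 20)]
print(tasks_to_show(tasks, ["task_11", "task_03"]))
print(question_count(tasks, ["task_11", "task_03"]))  # 12
```

A respondent who ticks two work tasks thus answers 12 work-task questions instead of the 114 that all nineteen tasks would generate, which is the reduction in complexity the contingency design aims for.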

6.2.2 Content<br />

The questionnaire conta<strong>in</strong>s six questions for each work task. The six questions<br />

are replicated for all of the n<strong>in</strong>eteen condensed work tasks <strong>in</strong>cluded <strong>in</strong> the questionnaire,<br />

resulting in a 75-page questionnaire. Due to the contingency character of the questions<br />

regarding work tasks, not all pages were presented to each respondent. Before<br />

elaborating on the work tasks, the respondents were asked to provide a number<br />

of background data. The questionnaire finished by thanking the respondents for their<br />

participation, time and contribution.


6.2.2.1 Background data<br />

The questionnaire opened with a number of demographic questions. We<br />

refer to these as background data. The questions cover the respondents':<br />

year of birth (1a),<br />

gender (1b),<br />

most recent f<strong>in</strong>ished education (2a),<br />

title of education (2b),<br />

place of employment (3a),<br />

departmental affiliation (4-10), and<br />

length of service <strong>in</strong> the organization (11) 9<br />

The purpose of the questions is to enable test<strong>in</strong>g for the impact of demographic<br />

characteristics on seeking behaviour. Further, the background data are needed in order to<br />

check the degree to which the sample reflects the population from which it was<br />

drawn.<br />

6.2.2.2 Work tasks<br />

Ahead of the design of the questionnaire, we hypothesized that seek<strong>in</strong>g<br />

behaviour could differ according to the work task in question, both when it comes to the subject<br />

and in particular the complexity of the work task. The assumptions were based on<br />

Byström's findings on the correspondence between work task complexity and seeking<br />

behaviour (e.g., reported in Byström & Järvelin, 1995; Byström, 1997) (see section<br />

4.4.5). The generic work tasks described by SKAT do not address complexity in<br />

Byström's terms as such. Rather, they are topical descriptions of the areas of<br />

responsibility of the organization. Despite the difference in definitions, SKAT's<br />

descriptions of work tasks were used as the basic foundation of the questionnaire.<br />

Further, in combination with the information need types (see section 6.2.2.3), they<br />

give an impression of the complexity of the work task. This decision had several<br />

reasons. The large diversity of tasks, which has been mentioned previously, is a core<br />

characteristic of the organization. Identify<strong>in</strong>g <strong>in</strong>formation seek<strong>in</strong>g characteristics on the<br />

basis of work tasks allowed for data that could <strong>in</strong>form us about potential (and expected)<br />

differences <strong>in</strong> seek<strong>in</strong>g behaviour among the work tasks. We needed this knowledge for<br />

two purposes. The ma<strong>in</strong> purpose was to be able to answer the research questions<br />

9 The numbers in parentheses refer to the question numbers (see Appendix 4).

concerned with seek<strong>in</strong>g behaviour. Secondly, we wanted to use the data for the<br />

selection of test persons for the search test. For this secondary purpose we wanted to<br />

explore the use of the <strong>in</strong>tranet for different work purposes. Specifically we wanted to<br />

identify potential variations <strong>in</strong> the <strong>in</strong>tensity of use of the <strong>in</strong>tranet. Lastly, the work task<br />

descriptions allowed for a standardized framework of the work areas covered by SKAT<br />

<strong>in</strong> a language familiar to the respondents. The work tasks are represented on pages 12,<br />

16, 38, 45, 49, and 65 <strong>in</strong> the questionnaire (see Appendix 4). In sum, 19 work tasks are<br />

<strong>in</strong>cluded distributed among six ma<strong>in</strong> processes. Probes were considered particularly<br />

important <strong>in</strong> the questions regard<strong>in</strong>g work tasks. Thus, s<strong>in</strong>ce the selection or<br />

deselection of work tasks strongly influences the results, it was important that the<br />

respondents were able to recognize their work tasks in the generic descriptions. For that<br />

reason we used probes to give examples of the subtasks contained in the overall work<br />

task.<br />

6.2.2.3 Elaboration of work tasks<br />

The work tasks were elaborated on through six questions. The first, frequency<br />

of work tasks, was considered relevant in order to explore the relation between work<br />

tasks and information seeking (see question 15, Appendix 4). For the case organization,<br />

this question was particularly relevant due to the share of work tasks that are highly<br />

seasonal. The frequency also gave insight into the relative importance of the work<br />

task in question, and thereby made it possible to examine whether the frequency affects other<br />

aspects of seeking behaviour. Next followed the respondents' experience with the work<br />

task (see question 16, Appendix 4). The question was <strong>in</strong>cluded s<strong>in</strong>ce this was expected<br />

to have an <strong>in</strong>fluence on their seek<strong>in</strong>g behaviour, e.g., as to selection of sources and<br />

frequency of <strong>in</strong>formation seek<strong>in</strong>g.<br />

The third question regarded the frequency of <strong>in</strong>formation seek<strong>in</strong>g (see question<br />

17, Appendix 4). The rationale for ask<strong>in</strong>g this question was that some of the work tasks<br />

might have a tendency to generate information seeking more often than others. By asking<br />

this question we wanted to explore whether the outlined work tasks differed in how<br />

information-demanding they were. We defined the answer<br />

categories by frequency (every time, every second time, every 3rd or 4th time, or<br />

practically never) instead of using less exact alternatives like almost always, sometimes,<br />

once in a while, and the like. We are aware that information seeking in practice does<br />

not occur at such fixed intervals as suggested by the listed answer categories, which<br />

may have confused the respondents. Further, the responses were expected to express<br />

average frequencies. On the other hand, we considered the alternative (e.g., often,


rarely etc.) as too semantically open. The challenge of semantically open categories is<br />

the interpretation of results, which becomes less exact.<br />
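The four answer categories are ordered but not equally spaced, so one cautious way of handling them in an analysis is as ordinal ranks. The sketch below only illustrates this idea: the English labels, the rank coding, and the helper `median_category` are assumptions and do not describe the coding actually used in the thesis.

```python
# Illustrative ordinal coding of the frequency categories (assumed labels).
FREQUENCY_ORDER = [
    "practically never",
    "every 3rd or 4th time",
    "every second time",
    "every time",
]
# Rank 0 is the least frequent category, rank 3 the most frequent.
RANK = {label: rank for rank, label in enumerate(FREQUENCY_ORDER)}


def median_category(responses):
    """Return the (lower) median category of a non-empty list of responses.

    The median only relies on the order of the categories, not on the
    distances between them, which suits ordinal data.
    """
    ranks = sorted(RANK[response] for response in responses)
    return FREQUENCY_ORDER[ranks[(len(ranks) - 1) // 2]]


answers = ["every time", "practically never", "every second time"]
print(median_category(answers))
```

Treating the categories as ranks avoids assigning them numeric distances that the wording of the categories does not warrant.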

Fourth came <strong>in</strong>formation sources (see question 18, Appendix 4). The selection<br />

of <strong>in</strong>formation sources reflects aspects of how a work task is dealt with. This was the<br />

ma<strong>in</strong> reason for <strong>in</strong>clud<strong>in</strong>g the question <strong>in</strong> the questionnaire. Further, the question about<br />

<strong>in</strong>formation sources had the purpose of identify<strong>in</strong>g the relative importance of the<br />

<strong>in</strong>tranet compared to other <strong>in</strong>formation sources. F<strong>in</strong>ally, with this question, we wanted<br />

to identify whether the importance of the intranet differed depending on the<br />

work task in question. This could point to work tasks of particular relevance when<br />

identifying test persons for the search test. Being aware that the work tasks handled in<br />

the case organization are quite diverse, we listed some information sources but also<br />

allowed the respondents to add missing sources in an open field. This way we were<br />

able to get a comprehensive picture of the use of information sources without having to<br />

list too many sources that might not be relevant to the majority of respondents. The<br />

question was constructed in a way that allowed the respondents to choose the<br />

sources relevant to them, whether one or more. The question was measured in terms of<br />

dichotomous variables, since this enables us to compare the results with prior results. In<br />

the light of the changes of direction <strong>in</strong> <strong>in</strong>formation seek<strong>in</strong>g studies mentioned <strong>in</strong> section<br />

4.2, one could dispute the relevance of the <strong>in</strong>formation sources questions <strong>in</strong> the<br />

questionnaire. On the other hand, seek<strong>in</strong>g studies that <strong>in</strong>clude sources of <strong>in</strong>formation<br />

cont<strong>in</strong>ue to f<strong>in</strong>d their legitimacy (e.g., Davies, 2007; Makri, Blandford & Cox, 2008a;<br />

Connaway, Dickey & Radford, 2011; Lu & Yuan, 2011). In addition, more recent<br />

studies and models of information seeking also involve the aspect of information sources,<br />

yet with another focal point (e.g., Byström & Järvelin, 1995; Byström, 1999). In the<br />

present study, investigating the use of sources had one significant reason: we wanted to<br />

map the intranet of the case organization, since it is the object of the evaluation of<br />

indexing methods later in the thesis. By mapping the intranet along with other<br />

information sources, we wanted to show the relative importance of the intranet with regard to<br />

its function, strengths, and weaknesses as experienced by the organization's employees. A<br />

side effect of the question was the possibility of depicting the landscape of information<br />

seeking in the organization.<br />
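Measuring a multiple-choice question in terms of dichotomous variables means that each listed source becomes its own 0/1 variable. The following is a minimal sketch of that coding, assuming a hypothetical list of sources (only the intranet is named in the text) and hypothetical function names; sources added in the open field are collapsed into a single "other" variable here.

```python
# Hypothetical source list; only "intranet" is taken from the text.
LISTED_SOURCES = ["intranet", "colleagues", "legal databases"]


def to_dichotomous(selected, listed=LISTED_SOURCES):
    """Code one respondent's selected sources as 0/1 variables.

    Sources outside the listed options (open-field answers) set "other" to 1.
    """
    chosen = {source.strip().lower() for source in selected}
    row = {source: int(source in chosen) for source in listed}
    row["other"] = int(bool(chosen - set(listed)))
    return row


def share_using(rows, source):
    """Proportion of respondents who selected the given source."""
    return sum(row[source] for row in rows) / len(rows)


rows = [
    to_dichotomous(["Intranet", "colleagues"]),
    to_dichotomous(["printed manual"]),
]
print(rows[0])
print(share_using(rows, "intranet"))  # 0.5
```

With this coding the share of respondents using the intranet can be compared directly with the share using any other source, per work task or overall.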

The fifth question measured the <strong>in</strong>formation needs that emerge when deal<strong>in</strong>g<br />

with a work task (see question 19, Appendix 4). With the variable 'information need',<br />

we wanted to discover whether the identified work tasks of the organization have a tendency<br />

to generate certain types of information needs. However, it may be complicated<br />

to represent the variable by the theoretical concepts themselves. This is particularly<br />

difficult when considering the problem of respondents' sensitivity towards the<br />

formulation of questions discussed above. Therefore we represented the <strong>in</strong>formation<br />

needs with seven indicators of different information needs. The rationale is that it is<br />

easier for the respondents to relate to an <strong>in</strong>dicator compared to a theoretical concept.<br />

The decision about which theoretical basis to use for operationalization of the<br />

<strong>in</strong>formation needs was highly <strong>in</strong>fluenced by the method selected for data collection. An<br />

obvious choice would have been to use the recent proposal for types of <strong>in</strong>formation<br />

needs suggested by Ingwersen & Järvelin (2005). However, the proposal contains eight<br />

different types of <strong>in</strong>formation needs, which would be difficult to operationalize <strong>in</strong> a<br />

form understandable to the respondents. Instead we used the trichotomy suggested by<br />

Ingwersen (1992, pp. 116-117). Here the suggested <strong>in</strong>formation needs are: 1)<br />

verificative needs (VN), 2) conscious topical needs (CTN), and 3) muddled topical<br />

needs (MTN).<br />

Indicator | Description | Corresponding information need<br />
1 | I know exactly which documents I need in order to solve the work task | VN<br />
2 | I need to find a document I have used before | VN<br />
3 | I pretty much know which documents exist on the subject | CTN<br />
4 | I am working with a new project within a subject area well known to me. I would like to acquaint myself with the part that is new to me | MTN<br />
5 | I am looking for documents for a new work task within a subject area that is familiar to me | CTN<br />
6 | I am working with a subject area that I have not been working with before | MTN<br />
7 | I know the subject well but need a specific piece of information | CTN<br />

Table 6.1 Indicators of information needs in questionnaire and corresponding theoretical descriptions<br />


When a user has a verificative information need, he wants to locate an<br />

item or piece of information for which some kind of bibliographic information is known.<br />

Conscious topical needs cover <strong>in</strong>formation needs, <strong>in</strong> which the user wants to discover<br />

aspects of a subject matter known to her. Both verificative and conscious topical needs<br />

are associated with strong cognitive structures. That is, the uncerta<strong>in</strong>ty of the<br />

<strong>in</strong>formation user is low. The muddled topical needs cover a situation, where a user<br />

wants to discover concepts and relations with<strong>in</strong> a subject area not well known to him.<br />

In this latter type of <strong>in</strong>formation need the cognitive structures are weaker as to the topic<br />

in question. Ingwersen's (1992) trichotomy of information needs allowed each of<br />

the distinct information needs to be represented by more than one indicator without at the<br />

same time overloading the respondents with statements to relate to. Two or<br />

three indicators were developed to represent each information need. We restricted the<br />

number of indicators out of concern for the length of the questionnaire, in line with de<br />

Vaus' (2002b) recommendations. The indicators used in the questionnaire are shown in Table<br />

6.1.<br />
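The operationalization can be sketched as a simple lookup from indicator to information need type. The mapping follows Table 6.1; the dictionary, the function name, and the idea of counting selected indicators per need type are illustrative assumptions rather than the analysis procedure of the thesis.

```python
from collections import Counter

# Indicator number -> information need type, following Table 6.1.
# VN = verificative, CTN = conscious topical, MTN = muddled topical.
INDICATOR_TO_NEED = {
    1: "VN",
    2: "VN",
    3: "CTN",
    4: "MTN",
    5: "CTN",
    6: "MTN",
    7: "CTN",
}


def need_profile(selected_indicators):
    """Count how many selected indicators point to each need type."""
    return Counter(INDICATOR_TO_NEED[i] for i in selected_indicators)


# Each need type is represented by two or three indicators.
indicators_per_need = Counter(INDICATOR_TO_NEED.values())
print(indicators_per_need)

# A respondent ticking indicators 1, 2 and 7 for a work task:
print(need_profile([1, 2, 7]))
```

Because each need type is covered by more than one indicator, a respondent's answers can point towards a need type even if a single statement is misread.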

The sixth and last question concerned preferred metadata (see question 20,<br />

Appendix 4). The overall rationale for asking the respondents about preferred metadata<br />

was to investigate the elements of an ideal search situation. Further, we wanted to use<br />

the results of this question to encourage the participants to explain their present<br />

searching behaviour when presenting the results to the focus groups. The question was<br />

designed as a list of 13 different metadata that the respondents could choose from. The<br />

list comprised both intrinsic and extrinsic metadata, that is, metadata that<br />

can be found directly or indirectly in the document as well as metadata that<br />

designate something external but still relevant to the understanding of the document.<br />

Metadata are usually divided into three types depending on whether they refer to the<br />

content, the context, or the structure of the document (Gilliland, 2008). The thirteen<br />

metadata included were intended to represent all three types (the included<br />

metadata are listed in Table 6.2). The question finished with the possibility to suggest<br />

metadata missing from the list.<br />

Table 6.2 List of respondents' preferred metadata listed in questionnaire<br />

Metadata<br />

1 Target audience (e.g., accountants, employers, divorced, exporters)<br />

2 Superior subjects (from the taxonomy)<br />

3 Subject (description of the specific topic of the document)<br />

4 Name/title of legal text/ruling (e.g., LBK no. 931 as of 18/09/2008)<br />

5 Object of the document (e.g., car, property, stays abroad)<br />

6 Activity (e.g., deposits, assessments, billing, imports)<br />

7 Geographical data (e.g., name of city, country, region)<br />

8 Responsible institution or department (who published the document?)<br />

9 Project (is the document connected to a specific project?)<br />

10 Document type (e.g., ruling, form, guidance)<br />

11 Document number (e.g., journal number, number of rulings, ISBN)<br />

12 Document ID (continuous number attached to documents on the intranet)<br />

13 Work task (searching for colleagues engaged in a particular service or task, regardless of location)<br />

6.2.3 Data collection<br />

At the time of the launch of the questionnaire SKAT had 8679 employees, who<br />

comprise the population of the survey. We distributed the questionnaire to a sample of<br />

this population. A number of advantages are associated with sampling. One is the<br />

reduction of time and costs when collecting and analysing results. In addition, samples<br />

for the most part reflect the population sufficiently well (Zikmund, 2000). In the present<br />

investigation a sample was preferred over including the entire population in order to reduce<br />

the total amount of time employees spent responding. The questionnaire was<br />

distributed to a stratified random sample of the employees within the organization<br />

(Levy & Lemeshow, 2008). The strata were constructed on the basis of the<br />

departmental affiliation of the employees. Within each stratum a random sample was<br />

drawn, reflecting the relative size of the departments. The sample size was set to 799.<br />

The sample was thus well above the size required for a precision (margin of error) of<br />

less than 5% (cf. Israel, 1992).<br />
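The relation between population, precision, and sample size can be illustrated with a rough calculation. The sketch assumes the common simplified formula n = N / (1 + N e^2), often attributed to Yamane; whether this is the exact calculation behind the figures above is an assumption. The department names and sizes are hypothetical, and the largest-remainder rounding is one possible way of keeping the stratum samples proportional while summing exactly to 799.

```python
import math


def simplified_sample_size(population, precision):
    """Simplified sample size formula n = N / (1 + N * e^2), rounded up."""
    return math.ceil(population / (1 + population * precision ** 2))


def proportional_allocation(strata_sizes, sample_size):
    """Allocate a sample across strata in proportion to stratum size.

    Uses largest-remainder rounding so the allocations sum to sample_size.
    """
    total = sum(strata_sizes.values())
    exact = {name: sample_size * size / total for name, size in strata_sizes.items()}
    allocation = {name: math.floor(value) for name, value in exact.items()}
    leftover = sample_size - sum(allocation.values())
    # Hand the remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda name: exact[name] - allocation[name], reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation


POPULATION = 8679  # employees at the time of the survey
SAMPLE = 799       # sample size used in the survey

required = simplified_sample_size(POPULATION, 0.05)
print(required)  # 383

# Hypothetical departments whose sizes sum to the population.
departments = {"dept_a": 4000, "dept_b": 3000, "dept_c": 1679}
allocation = proportional_allocation(departments, SAMPLE)
print(allocation)
```

Under these assumptions a sample of 799 is roughly twice the size the simplified formula asks for at 5% precision, which leaves room for non-response.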

6.2.4 Pilot test<strong>in</strong>g<br />

In order to reduce the risk of errors (e.g., Buck<strong>in</strong>gham & Saunders, 2004, p.<br />

84), the questionnaire was pilot tested and pretested before the survey was initiated. The<br />

questionnaire was discussed with our contact person at SKAT and presented at a<br />

research meet<strong>in</strong>g with colleagues at RSLIS. Next, a pilot was carried out among a<br />

number of SKAT employees. The selection criteria reflected the stratified sample, yet


<strong>Automatic</strong> <strong><strong>in</strong>dex<strong>in</strong>g</strong> <strong>in</strong> e-<strong>government</strong><br />

with fewer participants. The purpose of the pilot test was twofold. Firstly, we wanted<br />

to get an impression of the recipients’ perceived understand<strong>in</strong>g of the questionnaire and<br />

allow for feedback to potential ambiguities. Secondly, we wanted an <strong>in</strong>dication of what<br />

could be an expected response rate <strong>in</strong> the actual survey. This was needed <strong>in</strong> order to<br />

calculate the size of the sample. Further we wanted to test the questionnaire on a group<br />

of people resembl<strong>in</strong>g the ones, that would be answer<strong>in</strong>g the f<strong>in</strong>al version of the<br />

questionnaire as recommended by de Vaus (2002b). The pilot questionnaire<br />

corresponded to the f<strong>in</strong>al questionnaire. However, <strong>in</strong> the pilot questionnaire text boxes<br />

had been <strong>in</strong>serted <strong>in</strong> order to welcome the pilot respondents’ comments for the<br />

questions. The pilot was distributed to 89 respondents. Of these 29% f<strong>in</strong>ished the<br />

questionnaire (see Appendix 5 for further details on the pilot). The feedback from the<br />

pilot- and pre-tests was <strong>in</strong>corporated <strong>in</strong>to the questionnaire before it was distributed to<br />

the respondents. Corrections comprised add<strong>in</strong>g of options, word<strong>in</strong>g of probes,<br />

simplification of the layout, and the like.<br />

6.2.5 Data analysis<br />

The questionnaire data were measured on scales ranging from categorical to interval<br />

level. The categorical data obviously appear when asking about the gender of the<br />

respondent. However, the questionnaire also contains a number of questions allowing<br />

the respondents to select one or more predefined answers, e.g., regarding information<br />

sources (question 18, see Appendix 4). In this case every option also constitutes a<br />

categorical variable, which has either been selected (=1) or not selected (=0) by<br />

the respondent. The amount of categorical data in the data set<br />

determines the analysis of the questionnaire data. Thus, we used nonparametric<br />

statistics to analyse the data, since parametric tests require data at<br />

interval level (Siegel & Castellan, 1988, p. 33). The questionnaire data were analysed<br />

using descriptive statistics. Inferential statistics were carried out too, but did not<br />

yield results of adequate significance to report here.<br />
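The coding of multiple-choice answers into binary variables can be sketched as follows. The option labels and the three example answers are hypothetical and are not taken from question 18; the function name is likewise an assumption.<br />

```python
# Hypothetical answer options for a "select one or more" question.
OPTIONS = ["intranet", "colleague", "legal database"]


def encode_selection(selected: set[str], options: list[str]) -> dict[str, int]:
    """Turn one respondent's selections into one 0/1 variable per option."""
    return {option: int(option in selected) for option in options}


# Three hypothetical respondents; the last one selected nothing.
answers = [{"intranet", "colleague"}, {"legal database"}, set()]
coded = [encode_selection(answer, OPTIONS) for answer in answers]

# Each option is now a categorical (binary) variable that can be counted.
frequencies = {option: sum(row[option] for row in coded) for option in OPTIONS}
```

Treating every option as its own binary variable is what allows frequencies to be reported per option even though a respondent may tick several of them.<br />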

The descriptive, univariate analysis of the questionnaire data consists of<br />

frequency distributions describing the respondents and their seeking behaviour. Frequencies<br />

are usually reported as percentages, because they are easier to read than raw<br />

frequencies. Further, percentages are easier to compare than raw frequencies,<br />

because the figures have been normalized. However, the<br />

normalization rescales the frequencies to a base of 100. The smaller the sample is, the<br />

more impact a single unit has when results are reported as percentages (Healey, 2007).<br />

A predominant part of the work tasks reported in the questionnaire part of the domain<br />

study have fewer than 100 answers. In order to avoid comparing figures in the<br />

univariate part of the analysis that are not true to the actual responses within the single<br />

work task, we report the frequencies for all responses. Yet, raw frequencies are<br />

difficult to compare across two or more groups that do not have the same number of<br />

responses. With the comparisons of frequencies across work tasks in mind, we also<br />

report the percentages in the relevant tables.<br />
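The sensitivity of percentages to small group sizes can be illustrated with a few lines of arithmetic. The group size of 12 is a hypothetical work task with few answers, and the function names are assumptions; 340 is the number of completed questionnaires.<br />

```python
def percentage(count: int, total: int) -> float:
    """Frequency rescaled to a base of 100."""
    return 100 * count / total


def shift_from_one_response(total: int) -> float:
    """Percentage points by which a result moves when one answer changes."""
    return percentage(1, total)


def format_cell(count: int, total: int) -> str:
    """Report the raw frequency together with its percentage."""
    return f"{count} ({percentage(count, total):.1f}%)"


small_group_shift = shift_from_one_response(12)    # hypothetical small work task
large_group_shift = shift_from_one_response(340)   # all completed questionnaires
example_cell = format_cell(5, 12)
```

In the small group a single respondent moves the result by more than eight percentage points, against less than a third of a point for the full set of respondents, which is why reporting the raw frequency next to the percentage is the safer choice.<br />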

Table 6.3 Cross tabulations carried out on the basis of variables in questionnaire data<br />

Independent variables (rows): Education; Department; Length of service; Periodicity of<br />

occurrence of work task; Experience with work task; Frequency of information seeking;<br />

Use of information sources<br />

Dependent variables (columns): Frequency of information seeking; Use of information<br />

sources; Indicators of information needs; Preferred metadata<br />

Further, whenever relevant, we provide the average percentages in the univariate<br />

statistics tables. Two rationales lie behind this decision. One reason is that some tables<br />

are rather comprehensive because of the number of reported values. In these cases the<br />

average percentages can help to gain an overview of the content of the table. Also,<br />

average percentages can help clarify whether a certain work task deviates upwards or<br />

downwards from the average distribution. Whenever reported, the average percentages are<br />

reported at the bottom of the tables. As for the descriptive, bivariate analyses, we<br />

carried out cross tabulations of central variables. The exact cross tabulations appear<br />

in Table 6.3. In the table, the columns represent dependent variables while the rows<br />

represent independent variables. That is, we examined the degree of influence<br />

on the four dependent variables from the independent variables listed in the rows of the<br />

table. The results are reported in Chapter 7.<br />
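A cross tabulation of one independent variable against one dependent variable can be sketched as below. The five respondents and their category labels are hypothetical, and the helper names are assumptions; the thesis does not say which software produced its cross tabulations.<br />

```python
from collections import Counter

# Hypothetical respondents: (education, frequency of information seeking).
responses = [
    ("academic", "daily"),
    ("academic", "weekly"),
    ("clerical", "daily"),
    ("clerical", "daily"),
    ("internal", "weekly"),
]


def cross_tabulate(pairs):
    """Count respondents per (independent, dependent) combination.

    Rows are values of the independent variable, columns of the dependent one.
    """
    counts = Counter(pairs)
    rows = sorted({row for row, _ in pairs})
    cols = sorted({col for _, col in pairs})
    return {row: {col: counts[(row, col)] for col in cols} for row in rows}


def row_percentages(table):
    """Normalize each row to a base of 100 so rows of unequal size compare."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: 100 * n / total for col, n in cells.items()}
    return result


table = cross_tabulate(responses)
percent = row_percentages(table)
```

Percentages are taken row-wise here because the rows hold the independent variable, so each row shows how the dependent variable is distributed within one group.<br />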

6.2.6 Methodical reflections<br />

340 respondents completed the questionnaire, resulting in a response rate of<br />

42.6%. 302 of the invited employees (37.8%) did not log into the questionnaire at all. Of<br />

the 799 employees who constituted the sample, 156 (19.5%) began the questionnaire<br />

but did not finish it (see Appendix 5). In the latter group of respondents, one part is<br />

of particular interest: 66 respondents stopped responding after finishing the questions<br />

concerning the work task Inspection. Further, 10.9% of the respondents completing the<br />

questionnaire did not choose any work tasks (see Table 7.2). Different motives may be<br />

detected for non-response in surveys (see e.g., Nakash et al., 2008). We do not know<br />

the exact reasons for non-response in the present survey. Both internal and external<br />

motives can be identified. Internal causes could be that the respondents got bored with<br />

the questionnaire, because the same questions were repeated when more than one<br />

work task was chosen. This was indicated by some respondents. Another motive could<br />

be that the respondents could not relate to the description of work tasks, and therefore<br />

ended up choosing none. An external reason could be that the employees in the<br />

organization are presented with questionnaires from time to time. It might be the case<br />

that this particular questionnaire was deselected because the invited employees felt<br />

overloaded with questionnaire surveys. Whether the reasons are internal or external, the<br />

number of respondents quitting the questionnaire before finishing it and the number of<br />

respondents not selecting any work tasks stress the importance of questionnaire<br />

design.<br />
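The response figures reported at the start of this section follow directly from the counts and the sample size of 799. A minimal sketch of the arithmetic is given below; the dictionary keys are labels chosen for illustration.<br />

```python
SAMPLE_SIZE = 799  # invited employees

# Counts as reported in the text.
outcomes = {
    "completed": 340,
    "started_not_finished": 156,
    "never_logged_in": 302,
}

# Share of the sample in each group, in percent with one decimal.
rates = {name: round(100 * count / SAMPLE_SIZE, 1) for name, count in outcomes.items()}
```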

Another methodical challenge to the questionnaire data is caused by the design<br />

of the questionnaire itself. A central feature of the questionnaire guides the<br />

respondents to the specific work tasks they carry out. This allows insight into<br />

the characteristics of specific work tasks in the organization. At the same time, however,<br />

this particular feature has had the effect that some work tasks received<br />

very few answers (see Table 7.3). Consequently, the reliability of the results for<br />

work tasks with few respondents must be considered. We will report frequencies and<br />

percentages for the univariate statistics, but will be cautious with the results<br />

from work tasks with few respondents.<br />

Despite the methodical challenges, the data report answers from 340 people<br />

regarding their seeking behaviour. The respondents represent a stratified sample drawn from<br />

a population of 8679 employees, ensuring that many types of employees are represented.<br />

The purpose of the questionnaire data was to gain an overview of the seeking behaviour<br />

across work tasks. This purpose has been met by the questionnaire. The subsequent<br />

focus groups counterbalance the limitations of the questionnaire.<br />

6.3 Focus group method<br />

Focus group interviews were included as the qualitative counterpart in the<br />

domain study. We refer to the group interviews as focus group interviews in order to<br />

mirror Morgan’s (1996) definition. He defines focus groups as interviews with a<br />

composite group of people who are controlled by a moderator while discussing a topic<br />

defined by the moderator or researcher. The overall purpose of the focus groups was to<br />

validate and elaborate on the survey results. In the following sections, we will account<br />

for the research method applied in this part of the data collection. We begin by<br />

presenting the data collection as regards purpose and design of the focus groups<br />

(Section 6.3.1), the questions guiding the focus groups (Section 6.3.2), and the conduct<br />

and documentation (Section 6.3.3). We finish by explicating the methods used for data<br />

analysis (Section 6.3.4).<br />

6.3.1 Purpose and design<br />

The general intention behind the focus group interviews was to keep the<br />

restrictions around them to a minimum and allow for elaborations by the participants,<br />

since elaboration was precisely the purpose of the interviews. On the other hand, we aimed<br />

for a fairly tight form of the focus groups to make sure that all subareas were covered by<br />

the discussions (cf. Halkier, 2008, pp. 38-41). A slide show and an interview guide<br />

were applied to retain structure. The slide show was presented to the participants during<br />

the interview sessions. The intention behind the slide show was to prompt discussions<br />

of the questionnaire results among the participants and encourage them to explain and<br />

clarify the underlying information behaviour and meaning of information in their daily<br />

work. An example of the focus group slideshows appears in Appendix 8. In<br />

addition, a semi-structured interview guide was applied to support and guide the group<br />

discussions. The intention behind the interview guide was not to force the questions on<br />

the participants. Thus, if the participants had other relevant issues they wanted to<br />

discuss in relation to the presented results, they were allowed to do so. Rather, the interview<br />

guide had the function of supporting the focus group moderator in case discussions<br />

strayed too far from the subject in question, or in case the conversation stalled. In this<br />

sense the interview guide served as a supportive tool to ensure that discussions<br />

would develop. This also meant that not all questions necessarily needed answers from<br />

the participants.<br />

Main process | Number of participants<br />

Settlement | Participant 1-6 (6 persons)<br />

Instruction | Participant 7-12 (6 persons)<br />

Processes of support | Participant 13-17 (5 persons)<br />

Customs inspection | Participant 18-22 (5 persons)<br />

Common inspection | Participant 23-27 (5 persons)<br />

Management and development | Participant 28-31 (4 persons)<br />

Collection | Participant 32-35 (4 persons)<br />

Table 6.4 Overview of participants in focus groups<br />

7 workshops were conducted, each representing one of the main processes of<br />

the business model of the case organization. One process, Inspection, was represented<br />

by two workshops, since the two work tasks contained in the main process were<br />

considered so diverse that merging them into one workshop might have affected the<br />

outcome. The workshops took place in June 2009 in four different locations across<br />

Denmark (see specifications in Appendix 7). Each workshop lasted approximately 2<br />

hours and had between 4 and 6 participants and may therefore be characterized as a<br />

mini group, compared to full groups, which usually have between 8 and 10<br />

participants (Greenbaum, 1993). In total, 35 persons were interviewed. The<br />

distribution between the main processes appears in Table 6.4.<br />

Each workshop represents one of the main processes in the business model.<br />

The recruiting of participants consisted of two steps. Firstly, a number of managers<br />

were asked by e-mail to identify approximately five participants in their department.<br />

Secondly, the managers reported back a list of names, and these employees were then<br />

contacted directly by e-mail. Different locations were used in order to allow for representation of all six<br />

main processes of the business model in the workshops. The workshops took place in<br />

four different physical locations. Collecting the data in different locations<br />

had the benefit of representing different types of offices. Also, different types of<br />

employees were represented in the focus groups. Thus the participants represented<br />

employees with an academic background, employees educated within the case<br />

organization, and employees with a clerical background.<br />

6.3.2 Data collection: Interview guide<br />

The interview guide appears in Appendix 9. The function of the interview guide<br />

was to have a set of questions to bring into play in case the participants had trouble<br />

discussing the presented slides without triggering questions. The literature suggests that<br />

interviews should start out with general questions followed by more specific<br />

questions (e.g., Stewart, Shamdasani & Rook, 2007, p. 61). We decided to apply this<br />

sequence throughout the interview, starting out with an introduction to the participants’<br />

background. Bloor et al. (2001) recommend that demographic data are collected<br />

ahead of the focus group, e.g., by using a short questionnaire. However, we found that<br />

letting the participants start out by introducing themselves worked well as a way of<br />

getting everyone into play from the beginning of the interview on a topic comfortable to<br />

them. This is precisely the function of opening questions (Krueger, 1998, p. 23). Next, the<br />

second part of the interview followed, concerning the findings of the survey. Here, the<br />

questionnaire results relevant to the current focus group were introduced in the slide<br />

show. The questions asked followed the four themes of the questionnaire, starting with the<br />

frequency of information seeking, use of information sources, and developed<br />

information needs. The focus groups finished with a discussion of the fourth theme,<br />

preferred metadata when seeking information on the intranet.<br />

6.3.3 Execution and documentation<br />

The interview guide was accompanied by the slide show as an object for discussion and<br />

explanation. This meant that the slide show came to serve as a probe for the questions<br />

asked by the interviewer, helping to keep the interview on topic (e.g., Rubin & Rubin,<br />

2005, p. 164). The workshops were opened with an introduction of the interviewer,<br />

the workshop purpose, and the agenda. Hand-outs of the slide show were distributed to<br />

the participants to enable them to go back in the slides if they had additional comments<br />

later in the interview. Some goodies were offered to the participants in order to show<br />

our appreciation of their efforts. A Dictaphone recorded the interview for<br />

documentation purposes. The Dictaphone was started when the participants started<br />

introducing themselves. The group interviews ended when the participants had<br />

discussed the slides contained in the slideshow. We finished the session by thanking the<br />

participants for their time, input, and contributions, and invited them to contact us in<br />

case they recalled topics of relevance after the end of the interview.<br />

The interviews were subsequently transferred to the transcription software<br />

Express Scribe and transcribed. For the transcription, we developed a list of criteria for<br />

what to include in and what to exclude from the transcription (see Appendix 10). Bloor et<br />

al. (2001) suggest that all speech is transcribed, including passages where other<br />

participants agree with a single person’s statements. Since we are not performing<br />

content analysis of the focus groups on other passages than the ones concerning the<br />

participants’ background, we are not going to calculate the degree of agreement.<br />

This is the main reason why we have not transcribed these supporting “mm”s and<br />

“yeah”s. Finally, we anonymized the participants’ names before converting the<br />

interviews into the rtf format required for importing files into atlas.ti.<br />

6.3.4 Data analysis<br />

The focus group transcriptions were analysed in two parts. The analysis<br />

software atlas.ti (version 5.6.2) was used to support the analysis (see Figure 6.1). The<br />

first analysis concerns the introductory part of the interviews, where the participants<br />

presented themselves. The purpose was to discover the distribution of the participants<br />

as to their work tasks, education and length of service. In this introductory analysis we<br />

were inspired by the principles of content analysis, which is a quantitatively oriented<br />

type of analysis with the purpose of summarizing a complete set of data or parts of it<br />

(Neuendorf, 2002; Krippendorff, 2004). In the present analysis, we used the principles<br />

of content analysis to get a quantitative overview of the distribution of the participants.<br />

The second part of the analysis concerns the elaboration and validation of the<br />

survey results in preparation for answering the research questions. This second part of<br />

the analysis was guided by Halkier’s (2008) three steps in focus group analysis: 1)<br />

coding; 2) categorization; and 3) conceptualization. In the coding process, passages of<br />

text are marked up with preliminary labels. Categorization designates the process<br />

in which the initial codes are related to each other, identifying subordinate, superior, and<br />

co-ordinate codes among the initial codes attached. The categorization can imply a<br />

reduction of the data, when codes are combined into superior categories, but also further<br />

complication of the data, if codes are expanded and supplemented with more detailed<br />

sub codes. Identification of relations and contradictions between codes is inherent in<br />

the process of categorization. Finally, conceptualization designates the part of the analysis<br />

in which the categories and codes are related not only to the data but also to the theoretical<br />

concepts underlying the data, whether through similar studies, theoretical concepts, or other<br />

empirical parts of the research project.<br />

Figure 6.1 Screen dump from atlas.ti coding of focus group interviews<br />

We started out by coding the interviews with free codes, corresponding to<br />

Halkier’s first step, coding. Next, we used the function in atlas.ti that allows grouping of<br />

the initial codes into coding families. In this way we were able to categorize the codes<br />

according to Halkier’s second step, categorization. The third step, conceptualization,<br />

was represented by the analysis of the codes and coding families and by relating these to<br />

other studies, to the questionnaire data and to theoretical concepts. Quotes from the<br />

questionnaire, the focus group interviews and the search test are presented throughout the<br />

thesis. The applied quotes have been translated into English, but they appear in their<br />

original Danish wording in Appendix 11. The results of the seven focus group<br />

interviews are reported in Chapter 7 along with the questionnaire results.<br />
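The first two steps, coding and categorization, can be pictured as a simple data structure: free codes attached to passages, and a mapping from each code to a coding family. The sketch below is only an illustration of that idea. It does not use or imitate atlas.ti, and the codes, families, and passages are hypothetical.<br />

```python
from collections import defaultdict

# Step 1, coding: hypothetical passages labelled with free codes.
coded_passages = [
    ("Participant 1", "asks colleague"),
    ("Participant 2", "searches intranet"),
    ("Participant 3", "asks colleague"),
    ("Participant 3", "title metadata"),
]

# Step 2, categorization: each free code is assigned to a (hypothetical) family.
code_families = {
    "asks colleague": "information sources",
    "searches intranet": "information sources",
    "title metadata": "preferred metadata",
}


def group_by_family(passages, families):
    """Collect coded passages under the family their code belongs to."""
    grouped = defaultdict(list)
    for speaker, code in passages:
        grouped[families[code]].append((speaker, code))
    return dict(grouped)


passages_by_family = group_by_family(coded_passages, code_families)
```

Grouping in this way reduces the data, since several free codes collapse into one family; the third step, conceptualization, is interpretive and is not captured by such a structure.<br />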


6.3.5 Limitations<br />

The focus group interviews were based on a convenience sample of<br />

employees. We acknowledge that a random sample perhaps could have been more<br />

representative of SKAT as such. However, the educational level of the organization<br />

was reflected in the participants along with the majority of the organization’s work tasks.<br />

Further, the focus groups were carried out in four different locations across Denmark in<br />

order to reflect the geographical distribution of the organization. 35 people<br />

participated in 7 focus groups, providing valuable insight into their daily information<br />

seeking patterns.<br />

6.4 Search test design<br />

The search test compares full text indexing, which is an extracted type of automatic indexing,<br />

with automatically assigned indexing in the form of text categorization. The search test was<br />

set up as an experimental test. The test took place in June 2010 in two different office<br />

locations of SKAT, referred to below as locations 1 and 2. In accordance with our methodological<br />

standpoint we asked employees at SKAT to participate in the search test. In the<br />

remainder of the thesis we will use the term test person to denote a search test<br />

participant.<br />

6.4.1 Test system<br />

According to the first draft of the search test design, the test was to be carried out once the<br />

revised intranet had been implemented and the employees had had some time to adjust<br />

to the system. However, the process of implementing the new portal at SKAT’s pages<br />

was delayed. This meant that it was not possible to execute the test in the portal<br />

environment in operation. Instead we used a prototype of the future intranet as our test<br />

base. At the time of the search test the categorization was still being trained. In<br />

addition, the prototype had some functional shortcomings. We explained these to the<br />

test persons as a part of the introduction to the test system, but nevertheless the course<br />

of the test was in some cases challenged. In order to avoid changes in the system across<br />

the individual test sessions, the test system was not updated during the search test period.<br />

From a technical perspective the test system was embedded in a separate test<br />

environment. The test database was generated in August 2009 and was not updated<br />

in the intervening period up to the search test in June 2010. Thus, the newest<br />

documents contained in the test base at the time of the search test were from August<br />

2009. The test base contained a sample of the documents contained in the current<br />

intranet. The test base contained 188,600 documents that had been randomly drawn<br />

from the intranet. By comparison, at the time of the search test the intranet in use<br />

contained 681,640 documents. That is, the test base contained approximately 28% of<br />

the full version of the intranet.<br />

As with the intranet in operation, the prototype was based on CMS technology.<br />

Autonomy’s (www.autonomy.com) search software IDOL provided the search<br />

functionalities of the search interface. The interface is depicted in Figure 6.2. Though<br />

more fields were available, the test persons solely used the fields “Søgetekst” (Query<br />

box), “Søgetype” (Search operator), and “Dokumenttype” (Document type) during<br />

testing. The option to restrict searches to forms (“Blanket”), information, or self-service<br />

(“Selvbetjening”) just below the grey bar (in the middle of the interface, see<br />

Figure 6.2) was set to “Information” by default and was not changed during the test.<br />

Nor was the default setting of ranking search results by relevance.<br />
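The search operator field (“Søgetype”) combines the query terms in one of four ways, which the text that follows describes. A minimal sketch of those semantics is given here to make the differences concrete. It is not IDOL’s implementation: the tokenizer, the prefix-based truncation, the function names, the sample document, and in particular the “at least half of the terms” threshold used for free text are all assumptions.<br />

```python
import re


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into words."""
    return re.findall(r"\w+", text.lower())


def term_matches(term: str, words: list[str]) -> bool:
    """Truncated match: a term matches any word that starts with it."""
    return any(word.startswith(term) for word in words)


def matches(query: str, document: str, operator: str) -> bool:
    """Decide whether a document is retrieved under FT, AW, ES or OW."""
    terms = tokenize(query)
    words = tokenize(document)
    if not terms:
        return False
    hits = sum(term_matches(term, words) for term in terms)
    if operator == "AW":  # all words, exact or truncated (Boolean AND)
        return hits == len(terms)
    if operator == "OW":  # at least one word, truncated (Boolean OR)
        return hits >= 1
    if operator == "FT":  # most, but not necessarily all; threshold is assumed
        return hits >= (len(terms) + 1) // 2
    if operator == "ES":  # exact form and order, no truncation
        n = len(terms)
        return any(words[i:i + n] == terms for i in range(len(words) - n + 1))
    raise ValueError(f"unknown operator: {operator}")


# Hypothetical document text used only to exercise the four operators.
document = "Guidance on VAT for import of goods"
```

With this sketch, the query “vat import customs” is retrieved under OW and FT but not under AW or ES, while “vat for import” is retrieved under all four, which mirrors the ordering from most to least restrictive operator.<br />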

The query box was used for enter<strong>in</strong>g query terms. The box supported the use<br />

of quotation marks for phase searches. Search terms entered were automatically<br />

Figure 6.2 Screen dump of the test system: Search fields


<strong>Automatic</strong> <strong><strong>in</strong>dex<strong>in</strong>g</strong> <strong>in</strong> e-<strong>government</strong><br />

truncated. The search operator field specified how search terms were comb<strong>in</strong>ed. One<br />

of four options could be chosen. “Free text” (FT) retrieved documents conta<strong>in</strong><strong>in</strong>g most,<br />

but not necessarily all, entered search terms. “Pages conta<strong>in</strong><strong>in</strong>g all words” (AW)<br />

retrieved documents conta<strong>in</strong><strong>in</strong>g all search terms <strong>in</strong> their exact or truncated form. Thus,<br />

the operator corresponds to us<strong>in</strong>g Boolean “AND” (Large, Tedd & Hartley, 2001, p.<br />

148 ff.). “This exact sentence” (ES) retrieved documents that conta<strong>in</strong>ed the search<br />

terms <strong>in</strong> the exact form and order entered <strong>in</strong>to the query box. The operator corresponds<br />

to entering the search terms in quotation marks, so that the system treats the<br />

search terms as a single term (Large, Tedd & Hartley, 2001, p. 167 ff.). Lastly, the “At<br />

least one of the words” (OW) operator retrieved documents conta<strong>in</strong><strong>in</strong>g at least one of the<br />

entered search terms <strong>in</strong> truncated form. The operator corresponds to apply<strong>in</strong>g Boolean “OR”<br />

(Large, Tedd & Hartley, 2001, p. 148 ff.). Of the four, the ES operator is the most<br />

restrictive. Next follows the AW operator. The FT and the OW operators <strong>in</strong><br />

comparison retrieve larger sets of documents. The last field available to the test persons<br />

was the metadata field “Document type”. The field made it possible to limit search<br />

results to specific document types. The test persons could choose between 12 document types in<br />

a drop-down menu. The default setting was an empty field at the top of the<br />

menu, which enabled a search with no limitation as to document type. Search results<br />

were delivered as a list ranked by the relevance of the documents to the search terms<br />

entered. For each hit, several pieces of information were provided: a document title, a<br />

snippet highlighting the search terms and the surrounding terms, the document type (cf.<br />

the document type field mentioned above), and the date of publication. An example of<br />

a result list is shown in Figure 6.3.<br />
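
The semantics of the four search operators described above can be illustrated with a minimal Python sketch. It is not IDOL's implementation: the function names, the simple prefix-based truncation, and the reading of “most” in the FT operator as “at least half of the terms” (the hypothetical parameter `ft_share`) are assumptions made purely for illustration.

```python
import math
import re


def tokenize(text):
    """Lower-case a text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def matches_truncated(term, tokens):
    """Right-truncated match: some token starts with the search term."""
    return any(token.startswith(term) for token in tokens)


def contains_phrase(terms, tokens):
    """Exact match: the terms occur next to each other in the entered order."""
    n = len(terms)
    return any(tokens[i:i + n] == terms for i in range(len(tokens) - n + 1))


def search(query, documents, operator, ft_share=0.5):
    """Return the ids of the documents that match the query.

    operator is one of "FT", "AW", "ES" or "OW" (the abbreviations used in
    the text). ft_share is an assumed threshold for "most" terms under FT.
    """
    terms = tokenize(query)
    if not terms:
        return []
    hits = []
    for doc_id, text in documents.items():
        tokens = tokenize(text)
        matched = sum(matches_truncated(term, tokens) for term in terms)
        if operator == "AW":      # all words, truncated (Boolean AND)
            ok = matched == len(terms)
        elif operator == "OW":    # at least one word, truncated (Boolean OR)
            ok = matched >= 1
        elif operator == "FT":    # most, but not necessarily all, words
            ok = matched >= math.ceil(ft_share * len(terms))
        elif operator == "ES":    # exact form and order (phrase search)
            ok = contains_phrase(terms, tokens)
        else:
            raise ValueError(f"unknown operator: {operator}")
        if ok:
            hits.append(doc_id)
    return hits
```

Under these assumptions, every document retrieved by AW is also retrieved by FT and OW, which is consistent with the ordering of restrictiveness given in the text.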

A central feature of IDOL is its ability to categorize documents automatically<br />

on the basis of machine learning, as described in section 5.4.2.<sup>10</sup> The IDOL<br />

categorization facilities were applied to categorize the test system search results. The<br />

taxonomy taken into use on January 1, 2008 (see section 2.4.2) formed the basis of the<br />

categories into which search results were automatically placed when presented to the end<br />

users. The categorization training started in November 2008. The first step of the<br />

training consisted of giving IDOL a rough initial understanding of the content of each<br />

subject in the taxonomy. The procedure consisted of selecting 5<br />

terms representative of the subject. The 5 terms were subsequently used to search the<br />

test base. The search result was screened in order to identify candidate documents to<br />

represent each category. The minimum number of candidate documents in each<br />

category was set to 20, whereas IDOL’s manual recommends 40-50<br />

candidate documents. The status of the categorization at the time of the search test was<br />

as follows: If a document had been manually indexed at the time of import to the test<br />

database, the manual mark-up of the document decided the placement of the document in<br />

the portlet. This was the case for documents published after January 1, 2008. However,<br />

older documents did not have any subject terms attached. For this group of documents<br />

the placement in the portlet was based on the training that IDOL had received by the time<br />

of the search test.<br />

<sup>10</sup> For further details on IDOL, white papers on the system can be found at www.autonomy.com.<br />

In addition, Chaudhry (2010) has made a comparison with 11 other similar systems.<br />

Figure 6.3 Screen dump of the test system: Categorization<br />

The categorization is shown in Figure 6.3 (the box on the right-hand side of<br />

the result list). One or more categories could be selected once a search had<br />

been carried out and a result set existed. On the basis of the retrieved documents, the<br />

search result could be limited to subjects present in the search results. The categorization<br />

window showed only those terms from the taxonomy that actually contained documents in the<br />

current result set. If several categories were selected on the basis of the same query, the<br />

first category was not included in the subsequent category choices.<br />
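
The behaviour of the categorization window described above can be sketched as follows. This is an illustration under stated assumptions, not the prototype's code: the function names, the dictionary layout of a hit, and the possibility that a document belongs to more than one category are all hypothetical.

```python
from collections import Counter


def available_categories(results, already_selected=()):
    """Count the documents per taxonomy term in the current result set.

    Only terms that occur in the result set are returned, and categories
    that were already selected for the same query are left out.
    """
    counts = Counter()
    for doc in results:
        for category in doc["categories"]:
            if category not in already_selected:
                counts[category] += 1
    return dict(counts)


def narrow(results, category):
    """Limit the result set to the documents placed in the given category."""
    return [doc for doc in results if category in doc["categories"]]
```

A first call to `available_categories` on the full result list would populate the window; after `narrow` has been applied, a second call with the chosen category passed in `already_selected` would give the remaining category choices.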

In the test situation, when the test persons used the test system without<br />

categorization, the right-hand side of the screen was covered in order to prevent the<br />

test persons from being affected by the controlled terms from the taxonomy when composing<br />

queries for the system. In addition, the test persons were not tempted to use the<br />

categorization when it was not visible to them. The covering of the categorization<br />

window means that two test systems were produced in a methodological sense: one based on<br />

free text <strong><strong>in</strong>dex<strong>in</strong>g</strong> and one based on categorization of search results. S<strong>in</strong>ce the free text<br />

<strong><strong>in</strong>dex<strong>in</strong>g</strong> system functions as the basel<strong>in</strong>e for measur<strong>in</strong>g the effect of categorization, we<br />

refer to this system as System A. Accord<strong>in</strong>gly, the system employ<strong>in</strong>g categorization<br />

will be denoted as System B.<br />

6.4.2 Test persons<br />

32 test persons participated <strong>in</strong> the search test. The test persons were recruited at<br />

location 1 and location 2. Ingwersen (2000) recommends 40-50 test persons for purely<br />

quantitative studies, and fewer for qualitative studies. Since we were carrying out a<br />

qualitative study, we found 32 people to be sufficient. From the results of the domain<br />

study we had found that the frequency of <strong>in</strong>tranet use was high <strong>in</strong> most parts of the<br />

organization. Therefore, we did not f<strong>in</strong>d it necessary to exclude certa<strong>in</strong> work tasks from<br />

the search test. The choice of the two offices was motivated by the fact that the<br />

two departments represent the different educational groups of employees identified <strong>in</strong><br />

the doma<strong>in</strong> study questionnaire.<br />

To locate relevant test persons all employees with<strong>in</strong> the specified offices<br />

received a web questionnaire. In total the questionnaire was sent to 459 employees. In<br />

the questionnaire the employees answered questions about their background, work<br />

tasks, frequency of use of the <strong>in</strong>tranet and frequency of <strong>in</strong>formation seek<strong>in</strong>g (Appendix<br />

4). We refer to this questionnaire as the recruitment questionnaire. Reliability of<br />

research designs is affected by the consistency of measures, among other th<strong>in</strong>gs<br />

(Carmines & Woods, 2005). Keeping the work tasks consistent between<br />

the domain study questionnaire and the recruitment questionnaire posed a special<br />

challenge, since in the intervening time the business model had changed and another<br />

merger had taken place in the organization. In order to capture the modified business<br />

model and still be able to mirror the previous business model, we expanded the reporting<br />

of current work tasks. Yet, the expansion was carried out in a way that allowed for the<br />

current work tasks to be fitted into the work tasks of the previous business model. As<br />

with the doma<strong>in</strong> study questionnaire we aimed at reduc<strong>in</strong>g the semantic openness of the<br />

questionnaire by the use of probes (see Section 6.2.1). As for probes regard<strong>in</strong>g the<br />

work tasks, we used the latest annual report of SKAT as <strong>in</strong>spiration (SKAT, 2009).<br />

In our selection of test persons, we emphasized the test persons’ frequency of<br />

use of the <strong>in</strong>tranet and their general frequency of <strong>in</strong>formation seek<strong>in</strong>g. As for frequency<br />

of use, the most important parameter was that <strong>in</strong>formation needs and derived<br />

<strong>in</strong>formation seek<strong>in</strong>g took place more often than “practically never”. 42 people met<br />

these requirements. Of these, 10 were used as pilot testers. The remaining 32 carried<br />

out the actual search test.<br />

6.4.3 Search tasks<br />

The literature suggests three generic types of search tasks for IR evaluation,<br />

namely natural, simulated, and assigned search tasks (cf., Vakkari, 2003). We used<br />

simulated and genu<strong>in</strong>e work tasks for the test evaluation. Based on the<br />

recommendations put forward by Ingwersen (2000, p. 173), three simulated work tasks<br />

were carried out. The purpose of employ<strong>in</strong>g simulated work tasks was to <strong>in</strong>crease the<br />

degree of experimental control <strong>in</strong> the operational evaluation (cf. Borlund, 2000, p. 72,<br />

2003b). Different recommendations have been given for the development and use of<br />

simulated work tasks. Among other th<strong>in</strong>gs, the recommendations comprise that<br />

simulated work tasks and genu<strong>in</strong>e <strong>in</strong>formation needs are employed <strong>in</strong> the same test, that<br />

the work tasks are tailored to the <strong>in</strong>formation environment and the test persons, and that<br />

search jobs are permuted (Borlund, 2003b). As can be seen below, these<br />

recommendations were <strong>in</strong>corporated <strong>in</strong>to the present test design.<br />

IR evaluations are frequently carried out with graduate or undergraduate<br />

students. This is also the case for the empirical use of simulated work tasks (Borlund &<br />

Schneider, 2010). However, a few studies have applied different versions of simulated<br />

work tasks on professional users (e.g., Nielsen, 2004; Suomela & Kekälä<strong>in</strong>en, 2005,<br />

2006; Wacholder et al., 2007; Blomgren, Vallo & Byström, 2004). On the basis of<br />

their study of professional users, Blomgren, Vallo & Byström (2004) conclude that “…composing a<br />

simulated work task situation that offers a sufficient level of reality for all participants,<br />

must be done with great care” (Blomgren, Vallo & Byström, 2004, p. 66). Obviously,<br />

triggering real information needs in a simulated and professional context is challenging,<br />

not least when participants have different work tasks and backgrounds within the<br />

professional context, which is the case in the present study and in the study by<br />

Blomgren, Vallo & Byström. In the study by Price et al. (2009), a subject expert<br />

participated in the development of simulated work tasks in order to ensure well-functioning<br />

tasks. The importance of realism in simulated work tasks is emphasized by<br />

several authors (e.g., Blomgren, Vallo & Byström, 2004; Borlund, 2000). Different<br />

aspects may be kept <strong>in</strong> m<strong>in</strong>d <strong>in</strong> order to ensure realism. Here we operationalize realism<br />

as a relevant subject comb<strong>in</strong>ed with a level of complexity correspond<strong>in</strong>g to the test<br />

persons’ genu<strong>in</strong>e <strong>in</strong>formation needs.<br />

As regards the subject content of the simulated work tasks we used different<br />

sources as <strong>in</strong>spiration. We went through the fields <strong>in</strong> the doma<strong>in</strong> study questionnaire<br />

that allowed for open responses. Also the focus group <strong>in</strong>terviews were scanned <strong>in</strong> order<br />

to locate ideas for search tasks. Lastly, we consulted web pages communicat<strong>in</strong>g<br />

citizens’ and m<strong>in</strong>or bus<strong>in</strong>esses’ questions about taxes for <strong>in</strong>spiration. To decide on the<br />

level of complexity of the simulated work tasks, we consulted the results of the doma<strong>in</strong><br />

study. The doma<strong>in</strong> study revealed that <strong>in</strong>formation needs of low complexity were far<br />

more common than more complex types. In the questionnaire, the most frequent<br />

<strong>in</strong>dicators were “I need to f<strong>in</strong>d a document I have used before” and “I know the subject<br />

well but need to f<strong>in</strong>d a specific piece of <strong>in</strong>formation”. Saracevic et al. (1987, p. 35)<br />

define the complexity of search tasks by the number of concepts contained. Iivonen<br />

(1995) operationalizes the complexity further by deciding that simple search tasks<br />

consist of up to three concepts, whereas complex search tasks contain more than three concepts.<br />

Taking into account that the employees reported simple information needs as their<br />

predominant type, we developed simulated work tasks that contained three concepts (or<br />

search keys) or fewer. About ten simulated work tasks were developed. We subsequently<br />

carried out a pilot test in order to find out how the work tasks worked in the test<br />

situation. We wanted <strong>in</strong>formation about the understandability of the work tasks for the<br />

test persons, and specifically, if the test persons with<strong>in</strong> a reasonable amount of time<br />

were able to solve the work tasks. Also we wanted to reduce the number of work tasks.<br />

The work resulted <strong>in</strong> three simulated work tasks concern<strong>in</strong>g the sale of an apartment<br />

(SIM 1), taxation of e-bus<strong>in</strong>esses (SIM 2), and tax based issues related to work<strong>in</strong>g as a<br />

freelancer (SIM 3). The last of the three search tasks contained four search keys (see<br />

Table 6.7). However, one is a non-topical facet, which is the reason for still considering<br />

it a simple task in Iivonen’s terms. The final simulated work tasks are listed in<br />

Appendix 14.<br />
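
The complexity rule applied here can be written down as a small Python sketch. The threshold of three concepts follows the operationalization attributed to Iivonen (1995) above; excluding non-topical facets from the count reflects the reasoning used for SIM 3. The function name and the facet labels in the usage note are hypothetical and are not taken from Table 6.7.

```python
def task_complexity(facets, simple_max=3):
    """Classify a search task as "simple" or "complex".

    facets is a list of (label, is_topical) pairs. Only topical facets are
    counted; a task with at most simple_max topical facets is simple.
    """
    topical = [label for label, is_topical in facets if is_topical]
    return "simple" if len(topical) <= simple_max else "complex"
```

For example, a task with the made-up facets `[("freelancer", True), ("tax", True), ("income", True), ("guideline", False)]` has four search keys but only three topical ones, so it is classified as simple.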

To be able to control for the test persons’ insight into the controlled search tasks,<br />

an on-screen questionnaire was filled out every time a task had been completed (see<br />

Appendix 15). We asked the test persons about their <strong>in</strong>sight <strong>in</strong>to the subject of the work<br />

task, their view on the difficulty of the work task and the resemblance of the work task<br />

with their usual work tasks. All questions were graded on a 5-po<strong>in</strong>t Likert scale.<br />

In addition to the simulated work task situations, the test persons were asked to<br />

br<strong>in</strong>g a genu<strong>in</strong>e <strong>in</strong>formation need to the test session. A genu<strong>in</strong>e <strong>in</strong>formation need serves<br />

several purposes (Borlund & Schneider, 2010). We consider the most important ones to<br />

be the function as a basel<strong>in</strong>e for simulated needs and the possibility to ga<strong>in</strong> <strong>in</strong>sight <strong>in</strong>to<br />

the system’s effect on real information needs. It also appears that genuine information<br />

needs may get better scores on different performance measures (cf. Blomgren, Vallo &<br />

Byström, 2004). Specifically, we e-mailed the test persons shortly before their test<br />

session to ask them to br<strong>in</strong>g a genu<strong>in</strong>e <strong>in</strong>formation need. This way, the e-mail served a<br />

second function; namely as a rem<strong>in</strong>der for the test persons to show up. The exact<br />

word<strong>in</strong>g of the e-mail appears from Appendix 16.<br />

The genuine tasks brought by the respondents confirmed the lack of<br />

control over such tasks in controlled test settings. The tasks varied widely in<br />

content and mainly reflected specialist matters, although organisational matters such as the<br />

annual summer party were also represented. The tasks also included examples that<br />

could not be solved using the prototype, as the information sought was not included in<br />

the database. In those cases the test persons made up a new task for themselves. The<br />

character of the tasks corresponded to the simulated search tasks in terms of the number<br />

of facets included. Thus, the genuine tasks contained between one and three facets.<br />

Three examples are listed <strong>in</strong> Table 6.5.<br />

Table 6.5 Examples of genuine search tasks<br />

<table>
<tr><th>Search terms</th><th>Document type</th><th>Category</th><th>Search operator</th><th>Facets included</th></tr>
<tr><td>Ordrenumre store selskaber</td><td>Internal information</td><td>-</td><td>Free text</td><td>3</td></tr>
<tr><td>Bødetakster</td><td>-</td><td>Penalty</td><td>Free text</td><td>2</td></tr>
<tr><td>Skattekvittance</td><td>-</td><td>-</td><td>Free text</td><td>1</td></tr>
</table>


6.4.4 Test procedure<br />

The test procedure consisted of three parts: 1) an introduction to the session, 2) the<br />

search part, where the test persons searched the two systems us<strong>in</strong>g the search tasks and<br />

evaluated retrieved documents, and 3) a post search <strong>in</strong>terview.<br />

The introduction to the test session consisted of different elements. Firstly, the<br />

guidelines for performing the search tasks were presented. Next, the test system was<br />

introduced to the test persons. Due to time constraints the introduction did not include<br />

time for the test persons to try out the system. The presentation covered the<br />

characteristics of the system as to search possibilities and the shortcomings of the<br />

prototype. The elements contained in the introduction are listed in Appendix 17. The<br />

introduction was closed by assuring the test persons of their anonymity in<br />

the test (Kvale & Brinkmann, 2009, p. 63 ff.).<br />

The test persons searched using four search tasks: three simulated and one genuine.<br />

The order of the tasks and the order of the test systems<br />

(System A and B) were rotated in order to control for order effects on the test results (cf. Kelly,<br />

2009) and to meet the recommendations put forward by Borlund (2003b) as to the use of<br />

simulated search tasks. The fact that the test persons could not try out the system beforehand<br />

made the rotation of work tasks even more necessary. The rotations applied, which also<br />

cover the succession of test systems, are listed in Appendix 18.<br />

When searching in System B, it was mandatory that the test<br />

persons made use of the categorization menu on the right-hand side of the screen. This<br />

was necessary since the categorization in the system is visible only<br />

here. Thus, search results are not presented according to the inherent categorization.<br />

This decision also means that searches omitting categorization when it should have been<br />

applied were removed from the results. Whenever a task was completed (or abandoned),<br />

a short questionnaire was completed on the screen.<br />
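
One common way of rotating tasks and systems to control for order effects is a cyclic (Latin-square style) scheme. The sketch below illustrates that general technique only; the rotations actually used are those in Appendix 18 and may differ. The function name, the task label "GEN" for the genuine task, and the assumption that each test person solves the first half of the tasks in one system and the second half in the other are all hypothetical.

```python
def rotation_plan(tasks, n_persons, systems=("A", "B")):
    """Build a counterbalanced task and system order for each test person.

    The task order is shifted cyclically from person to person, and the
    system that comes first is swapped after each full cycle of task orders.
    """
    n = len(tasks)
    half = n // 2
    plan = []
    for person in range(n_persons):
        shift = person % n
        order = tasks[shift:] + tasks[:shift]
        if (person // n) % 2 == 0:
            first, second = systems
        else:
            second, first = systems
        plan.append(
            [(task, first if i < half else second) for i, task in enumerate(order)]
        )
    return plan
```

With four tasks and 32 test persons, this scheme places every task in every position equally often and pairs every task with each system the same number of times.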

The documents were evaluated on the basis of the title and snippets <strong>in</strong>cluded <strong>in</strong><br />

the result lists. The ma<strong>in</strong> reason for this was that it removes the snippet-document<br />

relationship as a variable <strong>in</strong> the results (cf. Turp<strong>in</strong> et al., 2009; He et al., 2010) and<br />

allows for comparison with correspond<strong>in</strong>g studies (e.g., Käki & Aula, 2005). Further,<br />

the prototype had trouble opening documents from links in the result lists for certain document<br />

types. The test persons were asked to assess the relevance of documents as to the work<br />

task <strong>in</strong> question, that is, situational relevance (cf. Figure 6.4).<br />

Figure 6.4 Relevance types in IR evaluation adapted from Borlund (2003a, p. 915).<br />

Legend: W: Work task situation; CW: Cognitive perception of W; SR: Situational relevance;<br />

P: Pertinence relevance; IT: Intellectual topicality; A: Algorithmic relevance;<br />

N: Information need; r/q: request/query version; O: retrieved information object(s).<br />

The relevance of search results was noted when the result lists were shown to<br />

the test persons. This way we received the immediate evaluation of the document while<br />

the test person remembered the document. After the search part of the test, a short post<br />

search <strong>in</strong>terview was conducted. The purpose of the <strong>in</strong>terview was to make the test<br />

persons sum up and reflect on their overall impressions of the test system, on their<br />

present use of the <strong>in</strong>tranet, and how categorization could be useful <strong>in</strong> their daily work.<br />

Due to time constra<strong>in</strong>ts the <strong>in</strong>terview guide was kept rather short. The <strong>in</strong>terview guide<br />

appears from Appendix 19.<br />

Dur<strong>in</strong>g the test the test manager was present <strong>in</strong> the room. There were several<br />

purposes for this. One was that the test persons could be observed dur<strong>in</strong>g their searches.<br />

This made it possible to ask the test persons to elaborate on specific moves in the<br />

subsequent <strong>in</strong>terview. Further, the schedule did not leave time for the test persons to get<br />

acqua<strong>in</strong>ted with the test system before the test started. By lett<strong>in</strong>g the test manager be<br />

present dur<strong>in</strong>g the session, the test persons had the possibility to ask clarify<strong>in</strong>g questions<br />

dur<strong>in</strong>g the session. At the closure of the session, the test persons received a m<strong>in</strong>or<br />

acknowledgement for their <strong>in</strong>volvement.<br />

Physically the test took place at location 1 and 2. At location 1 a test room was<br />

available for the conduct of the test. The test room had a stationary mach<strong>in</strong>e for the test<br />

persons to use and a laptop for the test manager. Morae was <strong>in</strong>stalled on both mach<strong>in</strong>es.<br />

The Morae Observer module displayed the test persons’ actions on the laptop screen<br />

for the test manager to follow. The monitoring was not kept secret from the test persons,<br />

since the purpose of using it was to avoid physically having to look over the test persons’<br />

shoulders. At location 2 a test room was not available. Therefore we brought a<br />

laptop with Morae <strong>in</strong>stalled to enable logg<strong>in</strong>g. Dur<strong>in</strong>g the tests at location 2 the test<br />

manager was obliged to follow the test persons’ moves on the test mach<strong>in</strong>e. The<br />

predom<strong>in</strong>ant part of the tests was carried out at location 1.<br />

6.4.5 Pilot test<br />

Pilot tests were carried out at several stages of the process ahead of the launch of the<br />

search test. Specifically, the recruitment questionnaire, the simulated work tasks, and<br />

the test procedure were tested ahead of the actual collection of data.<br />

The recruitment questionnaire was pretested by a number of colleagues at the<br />

RSLIS. Further, a number of employees at SKAT pilot tested the questionnaire. S<strong>in</strong>ce<br />

the recruitment questionnaire closely resembled the domain study<br />

questionnaire, we could to a certain extent rely on the methodological experience gained<br />

there. However, the changes in the business model necessitated a pilot to ensure that the<br />

modified work tasks were understandable to the recruitment respondents. The<br />

questionnaire was adjusted accord<strong>in</strong>g to the feedback from both RSLIS colleagues and<br />

SKAT employees.<br />

The search task pretest also conta<strong>in</strong>ed different elements. We have already<br />

mentioned the pretest with the purpose of identify<strong>in</strong>g the most relevant work tasks and<br />

reduc<strong>in</strong>g the total number of work tasks. In addition, we tested the work tasks <strong>in</strong> the<br />

test system. In advance of the pilot of the search tasks among employees at<br />

SKAT, the search tasks were tested for their relevance to the test system. That is, we<br />

tested whether the outputs of the search tasks suited the purpose of the search<br />

tasks. We wanted to find out whether the number of documents that would match the<br />

requests was sufficient. In their principles for search result visualization, Kules &<br />

Shneiderman (2004, p. 2) suggest that 100-1000 results are<br />

needed as a minimum for an adequate basis of a categorized overview. However, their<br />

principles are based on the web, where the number of documents far exceeds that of our<br />

test collection. Kules & Shneiderman also qualify the principle with the reservation that<br />

the optimal number of results depends on many factors, such as task domain and<br />

document quality. Due to the size of the test collection we have had to aim at a lower<br />

number of results. Instead we have emphasized the availability of highly relevant<br />

documents to match the simulated work tasks in our final choice of tasks. 11 work tasks<br />

were tested and of these 3 were picked out for the search test.<br />

Also the test situation as such was pilot tested. We needed <strong>in</strong>formation about<br />

how to handle practical matters such as how to document the searches, which<br />

succession of test elements to follow, and how to carry out the evaluations of work tasks<br />

and search results. Also we wanted an approximate estimation of the duration of a test<br />

session. The pilot tests provided very useful <strong>in</strong>sight <strong>in</strong>to these matters and the test<br />

design was corrected accord<strong>in</strong>g to the experiences ga<strong>in</strong>ed <strong>in</strong> the pilot tests. In actual<br />

practice the simulated search tasks and the test procedure were pilot tested<br />

simultaneously. We let the first test persons recruited by the recruitment questionnaire<br />

function as pilot testers and cont<strong>in</strong>ued to pilot until the test design was suitable for data<br />

collection. In total 10 pilot testers participated.<br />

6.4.6 Techniques for data collection and preparation<br />

Dur<strong>in</strong>g the course of the search test different methods for data collection were used <strong>in</strong><br />

order to allow for elaboration of the search process. The test persons’ <strong>in</strong>teraction with<br />

the test system was logged us<strong>in</strong>g the software Morae (see<br />

http://www.techsmith.com/morae.asp). Morae facilitates logging of key and screen<br />

activity. Both options were applied for documentation of the test, though we are<br />

primarily us<strong>in</strong>g the key log for analysis. Search (or transaction) logs have been widely<br />

used <strong>in</strong> order to document and analyze <strong>in</strong>teractions with retrieval systems and search<strong>in</strong>g<br />

behavior. The most significant strength of search logs as to the present test setup is the<br />

unobtrusiveness of the method (Jansen, 2006, p. 424-425). However the data delivered<br />

are descriptive (Jansen & Pooch, 2001, p. 242). That implies that the search log data<br />

should not stand alone, if we want to expla<strong>in</strong> and understand the <strong>in</strong>teraction between<br />

system and user. For that reason the log data were supplemented with qualitative data<br />

<strong>in</strong> order to compensate for the limitations of the search log as a research tool.<br />

Participant observation also took place during the test procedure (cf. Ely, 1991, p. 41 ff.). As the test manager was present during the test, observations were made in order to capture moves, comments, modifications, and other acts of relevance to the search test. The observations are not reported independently here. Rather, the purpose of the observation was to qualify the post-search interview and enable the test manager to ask the test persons specifically about their interaction with the system.

Interviews, both oral and in questionnaire form, were carried out during the course of the search test. The recruitment questionnaire provided background data on the test persons’ demographics, seeking behavior, and the like. During the search test the simulated work tasks were assessed as to the test persons’ knowledge of the subject, their perception of the degree of difficulty, and the extent of similarity with their genuine work tasks. Lastly, after the test persons had carried out the search tasks, a post-search interview was carried out. The purpose of the interview was to ask follow-up questions in order to get a more comprehensive picture of the search situation. For documentation purposes a Dictaphone was set to record the search test and the post-search interview. It was decided to record the full event in case the test persons made comments during searching that would be of value to our understanding of their interaction with the test system. Further, using the Dictaphone reduced the need for note taking during searching and allowed the test manager to focus on the test persons and their actions. The recordings were subsequently transcribed.

The last type of documentation comprises the relevance assessments made during searching. Relevance was captured along two dimensions: the degree of relevance and the criteria applied for the assessment (Borlund, 2003a). The more systematic of the two was the measurement of the degree of relevance. We have already mentioned that the relevance assessments took their point of departure in situational relevance. The degree of situational relevance of the documents retrieved was measured on a 4-point scale. We followed Sormunen’s (2002) four-point scale since it splits partial relevance into two categories, relevant and useful, and relevant and potentially useless (Sormunen, 2002, p. 329). In order to reflect this distinction we followed Sormunen’s description of the respective degrees in our explanation to the test persons. In addition, we asked the test persons about the motivations for their assessments, i.e. the relevance criteria applied. The purpose of including relevance criteria was not to make a systematic investigation of relevance criteria. Rather, the criteria were included as a tool to encourage the test persons to explain the assessments given. The questions appear in the post-search interview guide, though they were asked in connection with the relevance assessments.

6.4.7 Data analysis

The data collected consisted of 1) background data (from the recruitment questionnaire), 2) interview transcriptions (from the search sessions and the post-search interview), 3) search logs, 4) relevance assessments, and 5) assessments of the simulated search tasks. Background data and assessments of tasks were analysed using descriptive statistics in SPSS. As the data were used to gain an insight into the characteristics of the test persons and the appropriateness of the search tasks, we did not find reason to expand this part of the analysis further. Again, the recordings from the search test were transcribed to facilitate structured analysis. In the present case the transcription was carried out by an external transcriber. The procedure is clarified in Appendix 10. As with the focus group interviews, we used atlas.ti for analysis and followed Halkier’s (2008) three steps for analysis of qualitative data.

The search log registered search time and keys applied. From the screen video recorded during the searches, we manually extracted the number of hits retrieved, the selection of subject categories, and the use of information filters and search types. All were registered in SPSS for analysis. Lastly, the relevance assessments of documents were entered into SPSS. This work resulted in the identification of a number of variables, listed in Table 6.6.

At query level we measured the number of terms applied, the search keys applied, the search operators and document type specifications used, the number of hits, the success of queries, and the type of reformulations undertaken. The number of search terms is included to provide information about the number of terms needed in order to achieve a satisfactory number of results. Search keys provide knowledge about the number of search task facets covered in queries. The facets identified in Table 6.7 (the rightmost column) form the basis for interpreting the queries. All elements of a query could add to the facets: query terms, document types, and categories (the latter only in system B queries). When a category was included in a query, it was counted as one concept regardless of the number of terms describing the category. The variable is included for two reasons. One reason is to be able to identify the average number of terms used to represent search keys. This indicates which search keys the test persons consider more important, but also the level of detail in the representation of search keys. The other reason is the option of identifying the optimal number of search keys for obtaining a useful search result.
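As an illustration only, the counting rule for terms per query given in Table 6.6 (words separated by a space, free-standing dashes ignored, hyphenated words such as "e-handel" counted as one term) might be sketched as follows. This is not the procedure used in the study; the function names and the example queries are hypothetical.

```python
def count_terms(query: str) -> int:
    """Count query terms as whitespace-separated tokens.

    A free-standing dash is not counted as a term, while a hyphenated
    word such as "e-handel" is counted as one term.
    """
    tokens = query.split()
    # Tokens consisting only of dashes are dropped; everything else counts.
    return sum(1 for token in tokens if token.strip("-–") != "")


def average_terms_per_query(queries: list[str]) -> float:
    """Mean number of terms over a list of queries (0.0 for an empty list)."""
    if not queries:
        return 0.0
    return sum(count_terms(query) for query in queries) / len(queries)


if __name__ == "__main__":
    # Hypothetical example queries, not taken from the search log.
    example_queries = ["moms e-handel", "skat - bolig", "moms"]
    for query in example_queries:
        print(query, "->", count_terms(query))
    print("average:", average_terms_per_query(example_queries))
```

Search keys per query would require a manual mapping of terms to the facets of Table 6.7 and are therefore not covered by the sketch.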

Search operators and the use of the document type filter express how searchers combine their search terms and whether they aim for a narrow or a broad search result. The number of hits is another indicator of the success of a search. A set of results can be very small (e.g. 0 hits) or very large (e.g. 50,000 hits). Including the number of hits further enables us to compare the quantitative output of different queries. The test system provided an approximate count of the number of results. In very small sets it could be verified that the count was approximated, as the actual number of results sometimes differed slightly from the reported count. To give equal conditions to small and large retrieval sets, the number of search results reported by the system was registered as the result for all searches. Lastly, the type of reformulations was included. Query reformulations (or modifications) designate the actions taken by searchers in order to adjust an inadequate search result. For that reason reformulations are highly informative as to users’ interaction with an IR system. Huang & Efthimiadis (2009, p. 79) have suggested a taxonomy of reformulations that reflects modifications of search terms alone. With the present identification of reformulations we wanted to reflect the changes made in all fields of the search interface, including the categorization window.

Table 6.6 Search test variables, their definition and measurement

Variable | Definition | Measurement

Query level
Terms per query | Number of words separated by a single space. Dashes were not counted as single terms. Terms connected with a dash (e.g. “e-handel”) were counted as one term. | Average number of terms per query
Search keys per query | Number of search keys applied in queries | Average number of search keys per query
Use of search operators in queries | The search operator chosen for a specific query | Distribution of queries using each of the four search types in percentages
Use of the filter “Document type” (DT) in queries | The DT filter chosen (if any) for a specific query | Average number of queries using the DT filter in percentages
Number of hits in queries | The number of hits retrieved in queries | Average number of hits retrieved
Query success | Queries retrieving at least one document with a relevance score of 2 or 3 are considered successful | Percentage of successful queries
Type of reformulations | Reformulations in queries. Registered as the change from the past to the present query. Registered types: category, query terms, document type, search operator, and a combination of the above. | Percentage of reformulations in queries

Session level
Number of sessions with reformulations | Number of sessions containing more than one query. Reformulations comprise changes of queries, search type (or categories in system B), or document type. | Percentage of sessions with reformulations
Number of reformulations per session | Number of times a query has been reformulated in a session | Average number of reformulations per session
Session success | Sessions containing at least one successful query are considered successful | Average number of sessions solved
Test persons’ assessment of their insight into the simulated search tasks | Measured on a scale from 1-5, where 1=No insight, and 5=Great insight | Average score on insight
Test persons’ assessment of simulated search tasks’ level of difficulty | Measured on a scale from 1-5, where 1=Very easy, and 5=Very difficult | Average score on the level of difficulty
Test persons’ assessment of the resemblance between the simulated search task and their daily work tasks | Measured on a scale from 1-5, where 1=No resemblance, and 5=Great resemblance | Average score on resemblance between search task and daily work tasks

Table 6.7 Simulated search task facets

Search task | Description | Facets
Sim1 | Selling an apartment purchased by parents for their children. Can the parent get tax relief for expenses concerning the estate agent, repairs, and the loss incurred when the apartment was sold? Find documents outlining the fiscal conditions concerning apartments purchased by parents for their children. | Topical facets: Business activity: Parents’ purchase; Taxation: Tax relief. Non-topical facets: Information type: Legal guidances, citizen booklets, legislation.
Sim2 | Taxation of e-commerce: An owner-managed one-man publishing house wants to sell books online in the United States and other countries. The permanent establishment is in Denmark. How is the owner taxed on his earnings? Find documents outlining how e-commerce with permanent establishment in Denmark is taxed. | Topical facets: Business activity: E-commerce; Taxation: Permanent establishment DK, foreign income. Non-topical facets: Information type: Legal guidances, business guidances, legislation.
Sim3 | Freelance work: A freelance teacher is about to expand his activities, which will make him earn about 100,000 DKR per year. Now he is not sure whether he can continue as a salaried worker, or if he must start his own business and become registered for VAT. Find documents outlining the rules for when to become registered for VAT. | Topical facets: Business format: Freelance; Business activity: Teaching; Taxation: VAT registering. Non-topical facets: Information type: Legal guidances, business guidances, legislation.

Overall, we may term these variables interaction variables (cf. Kelly, 2009, p. 105 ff.). A related variable, very commonly included in this type of study, is search time. Search time was excluded from the present data, as interaction with the observer took place in many search sessions and affected the time spent. As a result, search time would not have been a valid variable in the present data set.
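To make the interaction variable "type of reformulations" from Table 6.6 concrete, the following sketch compares two consecutive queries field by field and reports which part of the search interface changed. It is a hypothetical illustration, not the coding procedure used in the study: the `Query` record, the operator names, and the example values are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Query:
    """One logged query. Field names and values are illustrative assumptions."""
    terms: str
    operator: str
    document_type: Optional[str] = None
    category: Optional[str] = None  # categories exist only in system B


def reformulation_type(previous: Query, current: Query) -> str:
    """Name the change from the previous to the present query.

    Returns one of "category", "query terms", "document type",
    "search operator", "combination" (several fields changed),
    or "none" (nothing changed).
    """
    comparisons = [
        ("category", previous.category, current.category),
        ("query terms", previous.terms, current.terms),
        ("document type", previous.document_type, current.document_type),
        ("search operator", previous.operator, current.operator),
    ]
    changed = [name for name, before, after in comparisons if before != after]
    if not changed:
        return "none"
    if len(changed) == 1:
        return changed[0]
    return "combination"


if __name__ == "__main__":
    first = Query(terms="moms", operator="AND")
    second = Query(terms="moms registrering", operator="AND")
    third = Query(terms="moms registrering", operator="OR",
                  document_type="legislation")
    print(reformulation_type(first, second))
    print(reformulation_type(second, third))
```

Counting the returned labels over all consecutive query pairs would give the percentage distribution listed as the measurement in Table 6.6.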

Performance is another prevalent variable type in IR evaluation studies (cf. Kelly, 2009, p. 106 ff.). Commonly established performance measures are used to quantify and compare the performance of IR systems. We have already mentioned precision and recall (section 5.2.4). Other examples include discounted cumulative gain (DCG), a measure taking into account the ranking of documents (Järvelin & Kekäläinen, 2002), and mean average precision, a measure that calculates the mean of precision after all relevant documents have been retrieved (Voorhees, 2000). The form of the log file did not enable these calculations, however, as it did not store the documents retrieved. Instead, we measured query success in terms of the query’s ability to retrieve relevant documents, as outlined in the previous section. For the purpose of performance measurement, we defined a successful search as a query retrieving at least one document with a relevance score of 2 or 3 (on a scale from 0-3, where 3 is the score of full relevance). A score of 2 was included in the measurement of success, as it turned out that the test persons on several occasions stopped searching when a level 2 document had been retrieved. To exemplify, two test persons stopped with the following statements:

“Well, I didn’t find anything that states exactly how to do it, but I have found something indicating where I might find the rules.” (TP1, line 243-244), and

“[I still think it is a 2..] because I do get some information about the tax rules… But of course you do need to go one level deeper in order to hit a 3” (TP6, line 49-52).

Another reason for including level 2 documents in the definition of a successful query is that documents were assessed from the metadata contained in the result lists, and not from the full document. As one test person puts it:

“...I would not give it a 3. Actually, I would probably give 1 to both of them, because I can’t know if it is what I need, before I get in and see if it is correct. But those are the ones, I would choose. Unless I can see that I can move on...” (TP3, line 73-76).

It appears from the quote that documents might be rated lower because the test persons do not get to assess the full version of documents.
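The definition of query success used here (at least one retrieved document assessed at 2 or 3 on the 0-3 scale) can be expressed compactly. The sketch below is a minimal illustration under the assumption that each query is represented by the list of relevance scores given to its assessed documents; the function names and the example scores are hypothetical.

```python
SUCCESS_THRESHOLD = 2  # scores of 2 or 3 on the 0-3 scale count as success


def is_successful_query(relevance_scores: list[int]) -> bool:
    """True if at least one assessed document scored 2 or 3."""
    return any(score >= SUCCESS_THRESHOLD for score in relevance_scores)


def percentage_successful(queries: list[list[int]]) -> float:
    """Percentage of successful queries (0.0 if there are no queries)."""
    if not queries:
        return 0.0
    successes = sum(1 for scores in queries if is_successful_query(scores))
    return 100.0 * successes / len(queries)


if __name__ == "__main__":
    # Hypothetical relevance assessments for four queries.
    assessed = [[0, 1], [3], [2, 0], [1]]
    print(percentage_successful(assessed))
```

A query without any assessed documents is treated as unsuccessful in this sketch, which is an assumption rather than a rule stated in the text.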

At session level a number of interaction variables were also included: the number of sessions with reformulations, the number of reformulations, and session success. The reformulations provide essentially the same information as at query level, though at session level the impression might change slightly, which is why they were included here too. Likewise, session success is a condensation of the query level that makes it possible to compare across the search tasks at a more overall level. In Kelly’s (2009, p. 104-105) terms, the remaining three variables at session level are characterized as information need variables. Here we measured the test persons’ assessments of the search tasks (solely for simulated search tasks) in terms of the level of difficulty, their insight into the topic, and the similarity of the task with genuine work tasks. Despite the risk of receiving highly subjective answers in this type of assessment, we included them to have some indication of the test persons’ perception of the simulated search tasks.
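The three session-level interaction variables can be derived from the query-level data. The sketch below is an illustration under stated assumptions: each session is a list of per-query success flags in the order the queries were issued, and every query after the first is counted as one reformulation (following the definition of a session with reformulations as one containing more than one query). Session success is reported as a percentage here for readability. The function and key names are hypothetical.

```python
def session_summary(sessions: list[list[bool]]) -> dict[str, float]:
    """Summarize session-level interaction variables.

    `sessions` holds one list per session; each element is True if the
    corresponding query was successful.
    """
    if not sessions:
        raise ValueError("at least one session is required")
    n = len(sessions)
    # Assumption: each query after the first is one reformulation.
    reformulations = [max(len(flags) - 1, 0) for flags in sessions]
    with_reformulations = sum(1 for count in reformulations if count > 0)
    # A session is successful if it contains at least one successful query.
    successful = sum(1 for flags in sessions if any(flags))
    return {
        "pct_sessions_with_reformulations": 100.0 * with_reformulations / n,
        "mean_reformulations_per_session": sum(reformulations) / n,
        "pct_successful_sessions": 100.0 * successful / n,
    }


if __name__ == "__main__":
    # Hypothetical data: four sessions with 2, 1, 1 and 3 queries.
    example = [[False, True], [False], [True], [False, False, False]]
    print(session_summary(example))
```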

Subsequent to the registration of data, statistical analyses were carried out. The analysis consisted of univariate and bivariate statistics: frequencies, means, and correlations. In addition, inferential statistics were carried out where relevant. We used Pearson’s r for interval and scale level data and chi-square (χ²) for data at nominal level.
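For readers unfamiliar with the two statistics, the following sketch computes Pearson's r for two numeric variables and the chi-square statistic for a contingency table of observed counts. The analyses in the study were run in SPSS; this plain-Python version only illustrates the standard formulas, does not compute p-values, and uses hypothetical example data.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson's product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return covariance / math.sqrt(var_x * var_y)


def chi_square(table: list[list[float]]) -> tuple[float, int]:
    """Chi-square statistic and degrees of freedom for observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs a non-zero total")
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom


if __name__ == "__main__":
    # Hypothetical data, not taken from the study.
    print(pearson_r([1, 2, 3, 4], [2, 4, 5, 9]))
    print(chi_square([[10, 20], [20, 10]]))
```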

6.5 Limitations

It is recognized that the search test has limitations. As the test is designed as a controlled laboratory test, it does not necessarily reflect the everyday seeking behaviour of the employees. Also, test persons searched on the basis of three simulated search tasks. The challenges of designing suitable tasks for professional users have been outlined above. From the results presented in Chapter 8, the searchers’ handling of the genuine search tasks differs in some respects from their handling of the simulated search tasks. In most cases, however, the differences are minor. Further, the accordance of facets in simulated and genuine search tasks indicates that the simulated tasks were realistic to the employees in this respect. In addition, the test persons carried out their own interpretations of the search tasks when constructing queries, providing 128 sessions and 564 queries. In this respect the test provides knowledge of the test persons’ understanding of and ability to incorporate elements of a search interface into their queries. Lastly, we want to address the state of the prototype used for the test. As the system included only about a fourth of the documents of the running intranet, documents known to the test persons may not have been included. In addition, the training of the categorization was not final at the time of the test, which at times challenged the test persons and may have affected the search log data. However, the search interviews provided valuable qualitative data to explain and understand the nature of the challenges and the test persons’ use of the prototype.

6.6 Relation between research method and research questions

In the previous sections we have outlined the research methods forming the basis for the collection and analysis of data. We close the present chapter by connecting the research method with the research questions guiding the thesis in order to clarify the purpose of the specific elements of the research method. The relations between research questions and their empirical basis are outlined in Table 6.8. As appears from the table, RQ 1.1-1.4 and RQ 2.1-2.9 are empirically based, while RQ 1.5 and RQ 2.10 put the empirical findings into perspective. Next we will present the results of the domain study.

Table 6.8 Outline of the relation between research questions and empirical data

Research question | Empirical basis | Chapter

RQ1: What characterizes the e-government employee’s information seeking behaviour in relation to: | | Chapter 7
1.1 Their use of information sources? | Survey questionnaire and focus group interviews |
1.2 Their frequency of information seeking? | Survey questionnaire and focus group interviews |
1.3 Their information needs? | Survey questionnaire and focus group interviews |
1.4 Their metadata preferences? | Survey questionnaire and focus group interviews |
1.5 How does the seeking behaviour affect demands for indexing? | The empirical findings of RQ 1.1-1.4 are analysed from an indexing perspective. The response to the question is analytical. |

RQ2: How do automatic extracted indexing and automatic categorization perform in relation to the identified domain characteristics as to | | Chapter 8
2.1 Number of queries in sessions? | Search log supported by search interviews |
2.2 Number of terms in queries? | Search log supported by search interviews |
2.3 Number of concepts in queries? | Search log supported by search interviews |
2.4 The type of search operator applied? | Search log supported by search interviews |
2.5 The use of document type filters? | Search log supported by search interviews |
2.6 Number of reformulations? | Search log supported by search interviews |
2.7 Types of reformulations? | Search log supported by search interviews |
2.8 Degree of search success in queries and sessions? | Search log supported by search interviews |
2.9 Overall performance measured by performance measures | Search log supported by search interviews |
2.10 Which implications does the performance of different indexing methods have for future indexing and indexing guidelines in the domain of e-government? | The empirical findings of RQ 2.1-2.9 are analysed in terms of their implications. The response to the question is analytical. |


7 Domain study results

The purpose of the domain study is to answer the research questions regarding the information seeking behaviour of e-government employees and how the domain characteristics affect demands for indexing within the domain (research question 1), outlined in Chapter 1. The investigation of seeking behavior in the domain served further purposes in the project. Most importantly, the domain study should inform the design of the subsequent search test so that it reflects the behavior within the domain. Secondly, we wanted a validation of the relevance of the system chosen for the search test (a prototype of a future version of the intranet at SKAT, see section 6.4.1).

The results of the questionnaire (see section 6.2) and the focus groups (see section 6.3) form the basis for the domain study. The chapter begins with a presentation of the questionnaire respondents and the focus group participants (sections 7.1 and 7.2). Next follow the results and analysis of the empirical data collection regarding research questions 1.1-1.4. The purpose of this part is to characterize the seeking behaviour of e-government employees in the case study. We have divided the analysis into two parts. The first concerns the findings related to the general seeking behaviour of the employees (section 7.3). The second is concerned with the results generating demands for indexing (section 7.4). The chapter closes with a summary.

7.1 Questionnaire respondents, their background and work tasks

340 respondents completed the questionnaire, resulting in a response rate of 42.6 % (see Appendix 21), which was an increase compared to the pilot test, where the response rate was 29 % (see Appendix 5). The degree of response of the remaining 57 % also appears from Appendix 21. As we only use the full responses as the basis for the data analysis, the 42.6 % are the focal point of the questionnaire part of the remainder of this chapter.

Table 7.1 Distribution of respondents as to their education (percentages)

Education | # | Percentages
Internal clerk programme | 97 | 28.5
Administrative assistant | 95 | 27.9
Other vocational education and training | 26 | 7.6
Upper secondary education | 10 | 2.9
Short-cycle higher education | 10 | 2.9
Bachelor degree | 7 | 2.1
Medium-cycle higher education | 26 | 7.6
Long-cycle higher education | 43 | 12.6
Master’s programme | 26 | 7.6
Total | 340 | 100
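The percentages in Table 7.1 follow directly from the counts. As a quick check, the sketch below recomputes them from the figures in the table; only the variable and function names are introduced for the illustration.

```python
# Counts copied from Table 7.1.
EDUCATION_COUNTS = {
    "Internal clerk programme": 97,
    "Administrative assistant": 95,
    "Other vocational education and training": 26,
    "Upper secondary education": 10,
    "Short-cycle higher education": 10,
    "Bachelor degree": 7,
    "Medium-cycle higher education": 26,
    "Long-cycle higher education": 43,
    "Master's programme": 26,
}


def percentages(counts: dict[str, int]) -> dict[str, float]:
    """Share of each category in percent, rounded to one decimal."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


if __name__ == "__main__":
    print("total respondents:", sum(EDUCATION_COUNTS.values()))
    for label, share in percentages(EDUCATION_COUNTS).items():
        print(f"{label}: {share}")
```

Because each share is rounded separately, the rounded percentages need not add up to exactly 100.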

The age of the respondents ranges between 19 and 68 years. The average age of the respondents is slightly above 47 years with a standard deviation of 9.5 years, which reflects the population figures (see Appendix 23). The respondents overall have quite a long length of service in the organization (see Appendix 24). Accordingly, the respondents’ experience with the individual work tasks is also extensive when measured as the number of years the respondents have been working with the task (see Appendix 22). Thus, the turnover of employees is limited, and the respondents tend to continue carrying out the same work tasks for some time. However, internal circulation of employees in the organization also takes place: both the focus group interviews and the search test revealed employees who have carried out numerous diverse tasks during their time of service. The majority of the respondents are educated within the organization or are administrative assistants (see Table 7.1). Another large group have finished a higher education or master’s programmes. In sum, the respondents may be characterized as employees of a certain age who are expected to have considerable insight into organizational matters and topics, due to the generally long length of service within the organization and due to an educational background that in many cases can be considered organization specific.

The respondents could select among 19 different generic work tasks as their work<br />

tasks in the questionnaire. There were neither upper nor lower limits to the number of<br />

selections. The frequencies are shown in Table 7.2. We have already discussed the<br />

10.9 % of the respondents not selecting any work tasks in section 6.2.6 and will not<br />

elaborate further on this issue here. Respondents most frequently chose one (27.9 %)<br />

or two (35.8 %) work tasks. From three and upwards, the number of respondents<br />

decreases. The number of work tasks selected by the respondents shows that employees<br />

predominantly carry out a few work tasks during their work day. This corresponds to<br />

the task-oriented organization structure mentioned in section 2.4. Further, the size of the<br />

organization allows for highly specialized employees.<br />

Table 7.2 Number of work tasks selected by respondents<br />

Data from web questionnaire: all employees, N=340<br />

Number of work tasks (WT) carried out by employees | # | %<br />

0 WT | 37 | 10.9%<br />

1 WT | 95 | 27.9%<br />

2 WT | 122 | 35.8%<br />

3 WT | 52 | 15.2%<br />

4 WT | 20 | 5.9%<br />

5 WT | 10 | 2.9%<br />

6 WT | 5 | 1.5%<br />

Total | 340 | 100%<br />

Table 7.3 Ranked frequency of work tasks in questionnaire results<br />

Data from web questionnaire: all respondents, n=340<br />

Work task carried out by employees | # | %<br />

Instruction | 181 | 53%<br />

Inspection: common | 61 | 18%<br />

Settlement: preliminary assessment of income/personal taxes | 57 | 17%<br />

Settlement: business relations | 57 | 17%<br />

Processes of support: legal support | 45 | 13%<br />

Collection | 39 | 12%<br />

Management and development: development | 27 | 8%<br />

Settlement: corporation taxes | 25 | 7%<br />

Settlement: common | 20 | 6%<br />

Settlement: vehicles | 18 | 5%<br />

Inspection: customs | 16 | 5%<br />

Management and development: strategy | 16 | 5%<br />

Processes of support: internal activities | 15 | 4%<br />

Settlement: estate | 14 | 4%<br />

Processes of support: IT service and administration | 14 | 4%<br />

Processes of support: HR and education | 14 | 4%<br />

Management and development: business management | 14 | 4%<br />

Settlement: customs | 12 | 4%<br />

Processes of support: minister service | 10 | 3%<br />

Total | 655<br />

It may be discussed how one person can take care of as many as six<br />

generic work tasks. The answer may be found precisely in the word generic. The generic<br />

nature of the description of the work tasks (see Appendix 1) may have caused the<br />

respondents some problems identifying their exact work area when the actual work<br />

area consists of a combination of several work tasks. 11 Further, the respondents were<br />

asked to also pick work tasks that “you carry out elements of” (page 12 of the<br />

questionnaire, see Appendix 4). However, the majority of the respondents selected<br />

between one and three work tasks. The work tasks are represented by the respondents<br />

according to Table 7.3. In total, the respondents answered questions about 655 work<br />

tasks distributed between the 19 generic work tasks. The table demonstrates the relative<br />

extent of the work tasks among the respondents. The most dominant work task is<br />

Instruction. Instruction differs from most of the other work tasks. According to<br />

its definition, it represents a different layer, because it operates at a meta<br />

level basically concerning the contact with clients, whether citizens or businesses.<br />

Instruction does not refer to specific subject areas in the organization, which is the case<br />

for the remainder of the work tasks.<br />
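As a hedged consistency check (not part of the original analysis), the total of 655 can be recomputed from the distribution in Table 7.2, since each respondent answers questions for every work task selected. The Python sketch below is only illustrative: the variable names are our own, and the counts are copied from Table 7.2 as printed.<br />

```python
# Counts copied from Table 7.2: number of tasks selected -> number of respondents.
# Variable names are illustrative and not taken from the thesis.
selected_tasks = {0: 37, 1: 95, 2: 122, 3: 52, 4: 20, 5: 10, 6: 5}

# A respondent who selects k work tasks contributes k task-level answers.
total_task_answers = sum(k * n for k, n in selected_tasks.items())

# Respondents who selected between one and three work tasks.
one_to_three = sum(selected_tasks[k] for k in (1, 2, 3))

print(total_task_answers)  # 655, the total reported in Table 7.3
print(one_to_three)        # 269 respondents
```

The weighted sum agrees with the 655 task-level answers in Table 7.3, which suggests that the two tables describe the same set of selections.<br />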

7.2 Characteristics of focus group participants<br />

The participants in the focus groups were assembled in order to represent the six main<br />

processes in the business model of the organization. As can be seen from Appendix 25,<br />

all six main processes in the business model were represented by the participants. It<br />

turned out, however, that several of the participants covered more than one of the work<br />

tasks. This is in line with the questionnaire results just mentioned. As a consequence,<br />

some participants are placed in several places in the table. Instruction constitutes a special<br />

case, since it is in some sense a part of most of the participants’ daily work alongside their<br />

other primary functions. The six participants placed here are the ones participating in<br />

the focus group specifically concerning Instruction.<br />

The participants represented a number of different educational backgrounds.<br />

When counted by the categories used in the questionnaire, the participants are distributed<br />

according to Table 7.4. Some of the educations mentioned in the questionnaire<br />

are not represented in the focus groups. We do not consider this a problem since<br />

these are the same educations that are less frequent in the questionnaire results (see Table<br />

7.1). Also, the aim of the focus groups was not necessarily to be representative of the<br />

organization as to the level of education. The participants’ length of service within the<br />

organization ranges from a few months up to about 40 years. Thus,<br />

both highly experienced employees and newcomers are represented in the groups.<br />

Table 7.4 Focus group participants' educational background<br />

Data from focus groups: focus group participants, N=35<br />

Title of education | # | %<br />

Internal clerk programme | 19 | 54<br />

Administrative assistant | 3 | 9<br />

Other vocational education and training | - | -<br />

Upper secondary education | - | -<br />

Short-cycle higher education | - | -<br />

Bachelor degree | - | -<br />

Medium-cycle higher education | - | -<br />

Long-cycle higher education | 8 | 23<br />

Master’s programme | 2 | 6<br />

Could not be placed | 3 | 9<br />

Total | 35<br />

11<br />

For instance, one combination in the questionnaire represents three tasks: Settlement: Preliminary<br />

assessment of income/personal taxes, Inspection: Common, and Processes of support: Legal support.<br />

Thus, the work area concerns inspections and legal support in regard to income taxes.<br />

7.3 Results regarding professional e-government seeking behavior<br />

The purpose of section 7.3 is to present the general seeking behavior found in the<br />

questionnaire and the focus group interviews. The section addresses the employees’<br />

seeking behavior in terms of information sources applied.<br />


7.3.1 Use of <strong>in</strong>formation sources<br />


The respondents’ selection of sources appears in Table 7.5. The questionnaire<br />

does not reveal the relative importance of the listed sources for solving certain work tasks,<br />

as this was not incorporated in the design of the questionnaire. Thus, we have asked<br />

which sources are used by the respondents, but not how frequently each source is used.<br />

The content of Table 7.5 therefore expresses the range of information sources. The<br />

questionnaire allowed the respondents to propose additional sources besides the<br />

predefined ones. The focus groups also contributed supplementary sources and<br />

verified the sources mentioned by the respondents. The organization demonstrates a<br />

very broad use of information sources. From the percentages in the bottom<br />

row of Table 7.5, it appears that the average use of the predefined sources varies<br />

widely. The table also shows that the intranet is the predominant source of<br />

information for the employees: on average, the system is applied for problem solving<br />

in 85% of all work tasks. The WWW and reference works are also important to the employees.<br />
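To make the reading of Table 7.5 concrete, the sketch below recomputes some of the percentages in the Instruction row. It rests on an assumption that is not stated explicitly in the text: that each percentage is the number of respondents using a source divided by the number of respondents who selected the work task (181 for Instruction, taken from Table 7.3). The function name is hypothetical.<br />

```python
def usage_percentages(users_per_source, task_respondents):
    """Share of a work task's respondents using each source, in whole percent.

    Assumption: the denominator is the number of respondents who selected
    the work task (Table 7.3), not the full sample of 340.
    """
    return {
        source: round(100 * users / task_respondents)
        for source, users in users_per_source.items()
    }


# Counts for the Instruction row, copied from Table 7.5.
instruction_users = {
    "Intranet": 154,
    "Digital reference works": 102,
    "Printed reference works": 98,
    "The Internet in general": 89,
}

instruction_pct = usage_percentages(instruction_users, task_respondents=181)
print(instruction_pct)
```

Under this assumption the sketch gives 85, 56, 54, and 49 percent, which agrees with the values printed for Instruction in Table 7.5.<br />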

The predefined sources can be arranged in three overall groups: reference<br />

works, various web sites, and internal systems. The groups are not mutually exclusive,<br />

but are used to characterize the systems applied. A fourth group came up during the<br />

open questions of the questionnaire and during the focus groups: colleagues as sources<br />

of information. The results regarding this particular source of information will be<br />

presented in section 7.3.2. The groups guide the analysis of use of sources in the<br />

sections to follow. The additional sources cover internal systems apart from the<br />

predefined sources, other specialized systems, specific websites, and colleagues (see<br />

Appendix 26). The appendix reflects the myriad of sources used in a large, specialized<br />

organization such as SKAT. The additional sources are included in the relevant sections<br />

below where this serves a purpose.<br />

7.3.1.1 Reference works<br />

Due to its area of function, SKAT is to a large extent guided by legislation and rules.<br />

This is mirrored in the importance of reference works appearing in Table 7.5. In this<br />

section, reference works denote both digital and printed reference works.<br />

The table emphasizes the importance of the legal basis of the organization.<br />

In general terms, the employees use reference works to a large extent: both<br />

types were used in about 40% of the work tasks.<br />

The distinction between printed and electronic sources addresses a general<br />

change in organizations whereby printed books are phased out in favor of digital<br />

editions. Focusing on the work tasks with above 50 respondents, the digital versions<br />

have a slightly higher score (see Table 7.5). However, printed versions are still<br />

important to the employees. The participants mentioned different reasons for still<br />

needing printed versions of reference works. The overall label for the reasons was<br />

practical matters, which covers different motivations. One is the nature of the work<br />

task: some employees in the<br />

organization carry out parts of their work away from their desktop due to meetings,<br />

either internal ones or external visits to citizens, businesses, and other<br />

governments. This supports the recommendation given by Garcia et al. (2006) that the<br />

implementation of technology in work places should be closely related to how work<br />

tasks are carried out in particular settings. Some participants also found printed<br />

versions easier to read, and lastly it was mentioned that printed versions are easier to<br />

search. Regarding searching, however, differing opinions were expressed in the focus<br />

groups, which may also explain the even use of the two. To exemplify:<br />

“I use electronic reference works a lot. I believe that I f<strong>in</strong>d far the most here.<br />

If I search right, I will get it. But they also conta<strong>in</strong> cross references to all the th<strong>in</strong>g on<br />

the <strong>in</strong>tranet, and it is supposed to get, what is on the Internet too, through the<br />

Parliament and the like.” (R16, p. 5), and<br />

”…as long as you have a pr<strong>in</strong>ted reference work, they are easier to consult.<br />

That is, if you know where to look.” (R23, p. 11).<br />

The two quotes illustrate how the selection of either electronic or pr<strong>in</strong>ted sources is a<br />

matter of the user’s preferences and experience.<br />

7.3.1.2 Web sites<br />

The predefined websites covered by the heading “Web sites” are the homepage of the<br />

Danish Parliament, ministry homepages, borger.dk 12, “Retsinformation” 13 and the<br />

internet in general. As appears from Table 7.5, the preferred resource of the five<br />

listed is the internet in general. Further, it is the second most used information source of<br />

all the predefined types. Of course, the designation of the source may contribute to<br />

its predominance: in principle, the internet in general could include the remainder<br />

of the web-based sources listed in the questionnaire. Using<br />

Google for searching was brought up several times during the focus groups. The<br />

search engine was used for explorative searches and as a gateway to searching other<br />

systems like the intranet. To exemplify:<br />

“For me, if I need rulings, I use Google even though I know I can access<br />

Retsinformation and Thomson too. But I search Google, because I find the electronic<br />

reference works too bad. Then I find the ruling in Google and then I might get referred<br />

to one of those pages that we are perhaps supposed to use, but I simply find their search<br />

functionalities too bad.” (R11, p. 5)<br />

Further, websites are one example of sources that are closely related to<br />

specific work tasks. Thus, Retsinformation, ministry homepages, and the<br />

homepage of the Danish Parliament are all far more widely used in the main processes<br />

“Processes of support” and “Management and development”. The increased use here<br />

suggests that these employees are engaged in detailed legal matters in the organization<br />

to a larger extent than other employees.<br />

12<br />

Borger.dk is the Danish common portal of communication between citizens and governments. The<br />

portal enables self-service for citizens, but also has an area for public authorities. See<br />

www.borger.dk.<br />

13<br />

Retsinformation is the official Danish website containing the acts, their procedural history, historical<br />

law, and the like. The database is located at: www.retsinfo.dk.<br />

7.3.1.3 Internal systems<br />

Among the predefined sources, the intranet and Captia (an electronic case management<br />

system) represent the internal sources of the organization. Apart from these<br />

two, the questionnaire and focus groups reported a number of additional sources within<br />

the internal group of systems. To a large extent, the internal systems added are<br />

systems equivalent to Captia. Examples are Dipsy, KMD, Remedy, TST, DR, and the<br />

like. The systems serve different purposes in the organization, but have one thing in<br />

common: they are all systems for registration of either cases, requests, or other data.<br />

The systems mentioned reflect local differences as an implication of the mergers, but<br />

also highly specialized systems supporting the professional activities of the employees<br />

(see Appendix 26). As a preparation for the search test, we were interested in finding<br />

out whether intranet use differed between work tasks. Thus, we wanted to find out if some<br />

work tasks were more appropriate for the search test than others. It<br />

turned out that the intranet was listed as the most frequently used source across all work<br />

tasks except for one (Inspection: common) (see Table 7.5), and may thus overall be<br />

considered the most important source of information in the organization.<br />

Table 7.5 Respondents' use of predefined information sources (percentages)<br />

Columns, each given as # and %: Sources used for certain work tasks | Intranet | Digital reference works | Printed reference works | Homepage of the Danish Parliament | Captia | Ministry homepages | Borger.dk | Retsinformation | The Internet in general<br />

Instruction | 154 85 | 102 56 | 98 54 | 17 9 | 47 26 | 31 17 | 21 12 | 42 23 | 89 49<br />

Settlement: common | 16 80 | 11 55 | 9 45 | 2 10 | 6 30 | 2 10 | 2 10 | 3 15 | 8 40<br />

Settlement: preliminary assessment of income/personal taxes | 50 88 | 42 74 | 33 58 | 1 2 | 6 11 | 4 7 | 6 11 | 5 9 | 22 39<br />

Settlement: business relations | 46 81 | 34 60 | 33 58 | 8 14 | 14 25 | 8 14 | 4 7 | 12 21 | 24 42<br />

Settlement: corporation taxes | 21 84 | 21 84 | 17 68 | 4 16 | 10 40 | 4 16 | - - | 6 24 | 16 64<br />

Settlement: customs | 9 75 | 1 8 | 7 58 | - - | 2 17 | - - | - - | - - | 2 17<br />

Settlement: vehicles | 15 83 | 3 17 | 4 22 | 2 11 | 3 17 | 2 11 | 3 17 | 3 17 | 10 56<br />

Settlement: estate | 11 79 | 8 57 | 9 64 | 1 7 | 6 43 | 5 36 | 3 21 | 5 36 | 9 64<br />

Inspection: common | 49 80 | 51 84 | 43 71 | 6 10 | 14 23 | 4 7 | 5 8 | 19 31 | 35 57<br />

Inspection: customs | 11 69 | 1 6 | 5 31 | 1 6 | 4 25 | 1 6 | - - | 1 6 | 4 25<br />

Collection | 32 82 | 16 41 | 11 28 | 2 5 | 16 41 | 3 7 | 6 15 | 8 21 | 13 33<br />

Processes of support: legal support | 43 96 | 36 80 | 32 71 | 17 38 | 17 38 | 18 40 | 3 7 | 22 49 | 20 44<br />

Processes of support: minister service | 9 90 | 5 50 | 3 30 | 8 80 | 4 40 | 6 60 | - - | 5 50 | 5 50<br />

Processes of support: IT service and administration | 11 79 | 1 7 | 2 14 | - - | 3 21 | - - | 1 7 | - - | 9 64<br />

Processes of support: HR and education | 12 86 | 3 21 | 1 7 | - - | 3 21 | 3 21 | - - | 1 7 | 9 64<br />

Processes of support: internal activities | 13 87 | 1 7 | - - | - - | 5 33 | - - | 1 7 | 2 13 | 12 80<br />

Management and development: strategy | 16 100 | 3 19 | 5 31 | 5 31 | 1 6 | 6 38 | 1 6 | 3 19 | 12 75<br />

Management and development: business management | 14 100 | 1 7 | 4 29 | 2 14 | 3 21 | 2 14 | 1 7 | 1 7 | 8 57<br />

Management and development: development | 26 96 | 4 15 | 6 22 | 8 30 | 6 22 | 7 26 | - - | 9 33 | 19 70<br />

Total average percentage (% only) | 85 | 39 | 40 | 15 | 26 | 17 | 7 | 19 | 52<br />

Legend: The table states the percentages of respondents that use a specific information source for a certain work task. Since the table at least for some<br />

information sources reflects a wide variation between work tasks, the last row summarizes the total average percentage across all work tasks reported.<br />

Taking the<br />

focus groups into account, it appears that the intranet serves different functions for the<br />

participants. Key functions are as a library of internal messages and documents, as a<br />

tool for staying updated on topics of interest, and as a library of specialist information.<br />

To illustrate:<br />

“…so the information need that I have [regarding the intranet] is more aimed<br />

towards changes <strong>in</strong> new court decisions, new legislation, and we usually get that from<br />

the <strong>in</strong>tranet. That means that I go there every morn<strong>in</strong>g to see, if anyth<strong>in</strong>g new have<br />

come <strong>in</strong> relation to collection, and that is how I stay updated” (R33, p. 1)<br />

Although the intranet is placed as the most important general source of information,<br />

the participants at the same time consider it a challenging system to use.<br />

The challenges have different overall directions: too much information, irrelevant<br />

information, and trouble locating relevant information. Two quotes exemplify:<br />

“So the <strong>in</strong>tranet, it is our common notice board. And that also decides the<br />

search results. You might even get recipes, if they have been published.” (R7, p. 5), and<br />

“It is very often, when we are answer<strong>in</strong>g agent telephones. For <strong>in</strong>stance when<br />

e-<strong>in</strong>come was new, they would ask us “how do you do a correction of wrongly stated<br />

taxes”, and we also didn’t know many of the questions and then we could search the<br />

<strong>in</strong>tranet, but we gave up. We had to pass them on to someone deal<strong>in</strong>g with it, because it<br />

took us too long, and it was confus<strong>in</strong>g to search the <strong>in</strong>tranet. We couldn’t f<strong>in</strong>d the<br />

answers, we needed, because you got page by page containing the least bit about e-income,<br />

that’s what you get.” (XX, settlement, p. 3)<br />

The problems are solved in different ways. For documents that may also be<br />

found on the official website of the organization, several of the participants mention<br />

that they find the website easier to navigate than the intranet. Others perform a Google<br />

search, either on the whole WWW or limited to the domain www.skat.dk. A third<br />

common way of solving the problem is to ask a colleague for help. Regardless of the<br />

approach applied to solve the intranet search problems, the importance of qualified<br />

indexing is stressed.<br />

To sum up, a variety of sources are used by the employees, along with creativity,<br />

to find information when needed. In terms of the search test, we learned that,<br />

apart from “Settlement: Customs”, the intranet had an extensive use in the organization.<br />

That supported our choice of system for the search test.<br />

7.3.2 Colleagues as sources of <strong>in</strong>formation<br />

One general characteristic throughout all of the focus groups is the importance<br />

of colleagues as information sources. We did not ask about this particular type of<br />

source in the questionnaire. However, a number of respondents mentioned colleagues<br />

and neighbor training as additional sources in the open box below the predefined answer<br />

options for information sources. Colleagues as information sources have been<br />

investigated in LIS research previously, in the public domain (e.g., Hazlett,<br />

McAdam & Beggs, 2008; Woudstra & van den Hooff, 2008) as well as in other<br />

professional contexts (e.g., Herzum et al., 2002; Herzum & Pejtersen, 2000; Xu, Tan &<br />

Yang, 2006). Here, the employees put forward two main reasons for using colleagues:<br />

For efficiency matters:<br />

“Well, if it is tasks with<strong>in</strong> special problem areas, and we know we have<br />

colleagues with knowledge about it, then it is tempt<strong>in</strong>g to go ask, because the person is<br />

likely to know the latest decisions <strong>in</strong> the area. Instead of start<strong>in</strong>g to… It is also a<br />

matter of time. You can save time by…” (XX, Guidance, p. 11),<br />

And for validation matters:<br />

”Well, I do prefer to consult the customs guidance <strong>in</strong> the first place, and then…<br />

if I am not really sure if anyth<strong>in</strong>g new has come, then I will check out the electronic and<br />

stuff. And then I always go ask…” (R19, p. 5)<br />

To sum up, colleagues are important to the employees, here and in related studies.<br />

However, to some extent this is due to ineffective retrieval systems, which emphasizes the<br />

need for an improved indexing practice.


Table 7.6 Questionnaire results regarding the frequency of information seeking<br />

Frequencies, each given as # and %: Work tasks | Every time | Every second time | Every 3rd or 4th time | Practically never<br />

Instruction | 33 18% | 20 11% | 86 48% | 42 23%<br />

Settlement: common | 5 25% | 2 10% | 5 25% | 8 40%<br />

Settlement: preliminary assessment of income/personal taxes | 5 9% | 6 11% | 27 47% | 19 33%<br />

Settlement: business relations | 10 18% | 6 11% | 27 47% | 14 25%<br />

Settlement: corporation taxes | 10 40% | 4 16% | 10 40% | 1 4%<br />

Settlement: customs | 2 17% | 1 8% | 3 25% | 6 50%<br />

Settlement: vehicles | 4 22% | 2 11% | 7 39% | 5 28%<br />

Settlement: estate | 4 29% | 1 7% | 6 43% | 3 21%<br />

Inspection: common | 18 30% | 13 21% | 24 39% | 5 8%<br />

Inspection: customs | 5 31% | - | 4 25% | 7 44%<br />

Collection | 8 21% | - | 14 36% | 17 44%<br />

Processes of support: legal support | 14 31% | 9 20% | 18 40% | 4 9%<br />

Processes of support: minister service | 5 50% | 1 10% | 2 20% | 2 20%<br />


Table 7.6 (continued)

| Work tasks | Every time | Every second time | Every 3rd or 4th time | Practically never |
| --- | --- | --- | --- | --- |
| Processes of support: IT service and administration | 4 (29%) | 1 (7%) | 5 (36%) | 4 (29%) |
| Processes of support: HR and education | 2 (14%) | 3 (21%) | 6 (43%) | 3 (21%) |
| Processes of support: internal activities | 3 (20%) | 1 (7%) | 5 (33%) | 6 (40%) |
| Management and development: strategy | 2 (13%) | 2 (13%) | 7 (44%) | 5 (31%) |
| Management and development: business management | 3 (21%) | 2 (14%) | 5 (36%) | 4 (29%) |
| Management and development: development | 6 (22%) | 5 (19%) | 14 (52%) | 2 (7%) |

7.4 Seeking results regarding demands for indexing in e-government

In the sections to follow, we report on the findings informing about the demands for indexing in e-government.

7.4.1 The frequency of information seeking

The need for information seeking was documented in the questionnaire by question 17 (see Appendix 4) regarding the frequency of information seeking. The question was not aimed specifically at the intranet. Rather, it was formulated broadly in order to investigate information seeking in general. This means that the question maps information seeking regardless of the source applied. The distribution for the individual work tasks appears in Table 7.6. The table shows that the most common frequency of information seeking is every third or fourth time (column 3). Thus, in 12 of 19 work tasks this is the frequency with the highest score. Apparently, the general picture is that information seeking does take place rather frequently, but not necessarily every time a work task is solved.
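The count of 12 out of 19 work tasks can be retraced from the figures in Table 7.6. The following is a minimal sketch, not part of the original analysis: the counts are transcribed from the table, and the names `TABLE_7_6` and `modal_category` are hypothetical. It assumes that a work task with a tie for the highest count (here "Settlement: corporation taxes") has no single most common frequency; the text does not state how ties were treated.

```python
# Column order follows Table 7.6.
CATEGORIES = (
    "Every time",
    "Every second time",
    "Every 3rd or 4th time",
    "Practically never",
)

# Respondent counts per work task, transcribed from Table 7.6 ("-" read as 0).
TABLE_7_6 = {
    "Instruction": (33, 20, 86, 42),
    "Settlement: common": (5, 2, 5, 8),
    "Settlement: preliminary assessment of income/personal taxes": (5, 6, 27, 19),
    "Settlement: business relations": (10, 6, 27, 14),
    "Settlement: corporation taxes": (10, 4, 10, 1),
    "Settlement: customs": (2, 1, 3, 6),
    "Settlement: vehicles": (4, 2, 7, 5),
    "Settlement: estate": (4, 1, 6, 3),
    "Inspection: common": (18, 13, 24, 5),
    "Inspection: customs": (5, 0, 4, 7),
    "Collection": (8, 0, 14, 17),
    "Processes of support: legal support": (14, 9, 18, 4),
    "Processes of support: minister service": (5, 1, 2, 2),
    "Processes of support: IT service and administration": (4, 1, 5, 4),
    "Processes of support: HR and education": (2, 3, 6, 3),
    "Processes of support: internal activities": (3, 1, 5, 6),
    "Management and development: strategy": (2, 2, 7, 5),
    "Management and development: business management": (3, 2, 5, 4),
    "Management and development: development": (6, 5, 14, 2),
}


def modal_category(counts):
    """Return the most selected category, or None if the top count is tied."""
    top = max(counts)
    if counts.count(top) > 1:
        return None  # assumption: a tie means no single highest category
    return CATEGORIES[counts.index(top)]


modes = {task: modal_category(counts) for task, counts in TABLE_7_6.items()}
third_or_fourth = [t for t, m in modes.items() if m == "Every 3rd or 4th time"]
practically_never = [t for t, m in modes.items() if m == "Practically never"]

print(len(third_or_fourth), "of", len(TABLE_7_6))  # 12 of 19
print(practically_never)
```

Under the tie assumption, the remaining seven work tasks are the ones that deviate from the general pattern: five peak at "Practically never", "Minister service" peaks at "Every time", and "Settlement: corporation taxes" is tied.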

In the focus groups, the issue of information seeking received some attention, because some of the frequencies from the questionnaire did not mirror the frequencies of the participants. Thus, the participants discussed what constitutes information seeking. Some participants intuitively understood information seeking as mere look-ups in an information system. One participant says:

“If a client reports to the counter out here, you ask for his civil registration number and log into his information. This is the first information. You cannot serve a client unless you seek information at least once... But if someone asks a specialist question, then the need for information is not nearly as substantial. Because then you answer on the basis of something you know like the back of your hand... The only requests that do not require information are the ones asking for direction to the motor unit. They are handed over an instruction. Everyone else involves look-ups.” (R7, p. 2-3)

Another participant (R33) adds:

“We cannot do anything without having the ICT based possibilities of looking up companies, demands, what does this company owe, this person, what does he or she owe. We need to access the network all the time.”

In other words, if information seeking is understood by the respondents in the sense of mere look-ups in some sort of information system, then information seeking occurs very frequently, if not every time a work task is solved.

Information seeking triggered by an information need occurs less frequently. The frequency is affected by different conditions. One condition is the number of self-service solutions developed in the organization. Self-service implies that citizens handle a range of tasks by themselves. One consequence is that some of the employees' knowledge areas are not maintained, because the work tasks that used to help maintain them are now handled by the citizens themselves. For this kind of information seeking, the frequency therefore increases, because an information need now emerges in situations that used to be dealt with from the employees' memory. The following quote illustrates this:


“...when we were employed by the municipality, our job was to assess as many people as possible, that is going through their tax return to see, whether they did it right or wrong... this means, that back then we gained experience all the time and kept up with what happened in this area and this area... now we need to make people use self-service and make error lists, so we keep losing, what we once used to know by memory. I certainly feel, that many of the questions, I used to answer just like that, now requires reading. Just to be brought up to date and see, if something new has occurred since the last time.” (R10, p. 4)

This discussion may also explain why several work tasks in Table 7.6 have a peak of frequency at both “Every 3rd or 4th time” and “Every time”.

Another condition is prior knowledge of the case handled. According to R35, information seeking only takes place if:

“... you are handling a completely new case. Then, obviously, I need to seek more information about this company. If it is a company, I know in advance, I might just check, what has been declared and what has been paid. But no matter what, I always seek before I am going to talk to a company.”

Some work tasks differ from the general tendency of “Every 3rd or 4th time” being the most common frequency. Using percentage distribution as an indicator, seven work tasks generate noticeably more or less frequent information seeking than the most frequent category.

Within “Processes of support” two work tasks differed from the overall pattern. “Minister service” generated a higher frequency of information seeking, with the largest percentage share of all work tasks on “Every time”. “Internal activities”, on the other hand, had a lower frequency than the general picture and had the majority of the respondents seeking information every third or fourth time or practically never. Neither of the work tasks had many respondents. Still, the focus groups and the description of the work tasks added to our understanding of the respondents’ behaviour in the two particular work tasks.

“Minister service” was not directly represented in the focus group interviews, but was discussed in the focus group for “Processes of support”. The interview supported that “Minister service” is a special case as to the frequency of information seeking due to the content of the work task. R14 compares it to the other work tasks within “Processes of support” this way:

“My spontaneous explanation for “Minister service” is, that, well, so much more is at stake, when servicing the minister. You need to be so much more certain. ...with “Minister service”, you need to be 100% certain. Of course you need to in other cases as well, but more is just at stake with “Minister service”... You need to be 100% sure, that what you write and produce and contribute with is correct.” (R14, p. 3)

Thus, it seems that correct information becomes even more important when it is passed on to the target group of “Minister service”.

“Internal activities”, on the other hand, generated an average frequency of information seeking that was fairly low compared to the general picture. Thus, the majority of respondents selected either “Every 3rd or 4th time” or “Practically never” to describe their frequency. In the focus groups, the two representatives of “Internal activities” were rather different. One (R32) took care of mail that could not be delivered directly to the relevant party. The other (R28) worked with communication. This difference in work tasks may explain the distribution of frequencies of information seeking. To R32, what made the frequency of information seeking decrease was the kind of information needed:

“In our group, experience is more important. We almost need to know, what the different departments are doing, and that is what we try to... But all the time it is what you remember. He has got something to do with this and he has got something to do with that. You can practically not look it up anywhere” (R32, p. 4)

However, when asked about information sources later in the interview, it appeared that a number of sources were considered highly necessary in order to solve the work task at hand.

R28, on the other hand, considered “Every 3rd or 4th time” insufficient when describing her own frequency of information seeking. When asked if she looked for information more often than “Every 3rd or 4th time”, she replied:

“Yes, I think so, because I also use it to orientate myself about some things before I show up or answer an e-mail... But it is also related to how I understand a task because to me the intranet and seeking is a part of my job all the time about, well both SKAT as a business but also the subject area, I am working with. So you somehow either seek information or have signed up for a news mail... And all that information aids to how a task is solved one way or the other.” (R28, p. 6)

It seems that information seeking is in fact rather frequent within “Processes of support”. The high share of “Practically never” for “Internal activities” may be explained by the sub work tasks that are also included in the overall description of “Internal activities”, for instance purchasing and administering goods, services, and buildings. These are not work tasks that necessarily generate a high frequency of information seeking.

Other work tasks generated less information seeking than the general tendency and had the largest share of respondents indicating “Practically never” as the frequency of their information seeking. The specific work tasks are: “Settlement: common”, “Settlement: customs”, “Inspection: customs”, and “Collection”.

“Customs”, whether in the main process of “Settlement” or “Inspection”, was discussed in the fourth focus group. All participants turned out to have their primary function within the main process of “Settlement”, but also had some insight into “Inspection”. The participants had difficulty relating to the fact that the largest group of respondents in “Settlement: customs” had answered “Practically never” to represent their frequency of information seeking. They did provide examples of work tasks that did not require information seeking, either because the information needed was well known to them or because it was already part of the papers provided for the case. But a large part of the work tasks carried out needed some kind of information seeking. The participants of the focus group did not come to a full agreement on what the correct frequency was, but the group agreed that “Practically never” did not provide an adequate picture of the actual frequency.

A hypothesis for the domain study was that seeking behaviour would differ depending on the work task at hand. Looking at Table 7.6, this hypothesis is to some extent confirmed. A few work tasks stand out with a more frequent behaviour, others with a less frequent behaviour. The general impression, though, is that the employees look for information regularly, but not every time they are engaged with a work task. However, disagreements as to the general figures could be traced in the focus groups, indicating that the numbers in the table are aggregate percentages and that individual differences occur. In the recruitment of test persons for the search test we wanted to reflect this individuality. To do this, it was decided to let the individual frequency of intranet use guide who was selected as test persons for the test.

Table 7.7 Distribution of indicators of information needs

Each cell shows the number of respondents, with the share of the work task's respondents in parentheses. The number of respondents per work task is given in parentheses after the work task.

| Work task | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Instruction (181) | 38 (21%) | 84 (46%) | 46 (25%) | 26 (14%) | 48 (27%) | 51 (28%) | 117 (65%) |
| Settlement: common (20) | 5 (25%) | 11 (55%) | 6 (30%) | 4 (20%) | 5 (25%) | 4 (20%) | 13 (65%) |
| Settlement: preliminary assessment of income/personal taxes (57) | 16 (28%) | 26 (46%) | 17 (30%) | 4 (7%) | 19 (33%) | 20 (35%) | 36 (63%) |
| Settlement: business relations (57) | 17 (30%) | 27 (47%) | 15 (26%) | 8 (14%) | 13 (23%) | 15 (26%) | 27 (47%) |
| Settlement: corporation taxes (25) | 3 (12%) | 13 (52%) | 6 (24%) | 3 (12%) | 5 (20%) | 6 (24%) | 18 (72%) |
| Settlement: customs (12) | - | 3 (25%) | 4 (33%) | 1 (8%) | 2 (17%) | 2 (17%) | 6 (50%) |
| Settlement: vehicles (18) | 8 (44%) | 9 (50%) | 5 (28%) | 3 (17%) | 4 (22%) | 3 (17%) | 11 (61%) |
| Settlement: estate (14) | 5 (36%) | 8 (57%) | 4 (29%) | 1 (7%) | 3 (21%) | 4 (29%) | 9 (64%) |
| Inspection: common (61) | 13 (21%) | 37 (61%) | 18 (30%) | 16 (26%) | 24 (39%) | 26 (43%) | 42 (69%) |
| Inspection: customs (16) | 4 (25%) | 5 (31%) | 8 (50%) | 3 (19%) | 3 (19%) | 4 (25%) | 5 (31%) |
| Collection (39) | 6 (15%) | 13 (33%) | 7 (18%) | 3 (8%) | 6 (15%) | 12 (31%) | 23 (59%) |
| Processes of support: legal support (45) | 14 (31%) | 26 (58%) | 16 (36%) | 5 (11%) | 16 (36%) | 14 (31%) | 34 (76%) |
| Processes of support: minister service (10) | - | 4 (40%) | 3 (30%) | 5 (50%) | 5 (50%) | 4 (40%) | 5 (50%) |
| Processes of support: IT service and administration (14) | 4 (29%) | 7 (50%) | 1 (7%) | 1 (7%) | 3 (21%) | 6 (43%) | 7 (50%) |
| Processes of support: HR and education (14) | 2 (14%) | 8 (57%) | 5 (36%) | 1 (7%) | 4 (29%) | 6 (43%) | 9 (64%) |
| Processes of support: internal activities (15) | 3 (20%) | 8 (53%) | 5 (33%) | 4 (27%) | 2 (13%) | 4 (27%) | 11 (73%) |
| Management and development: strategy (16) | 3 (19%) | 8 (50%) | 5 (31%) | 8 (50%) | 8 (50%) | 9 (56%) | 10 (63%) |
| Management and development: business management (14) | 3 (21%) | 6 (43%) | 3 (21%) | 5 (36%) | 4 (29%) | 7 (50%) | 7 (50%) |
| Management and development: development (27) | 5 (19%) | 13 (48%) | 5 (19%) | 9 (33%) | 10 (37%) | 15 (56%) | 17 (63%) |

Legend:

1) I know exactly which documents I need in order to solve the work task
2) I need to find a document I have used before
3) I pretty much know which documents exist on the subject
4) I am working with a new project within a subject area well known to me. I would like to acquaint myself with the part that is new to me
5) I am looking for documents for a new work task within a subject area that is familiar to me
6) I am working with a subject area that I have not been working with before
7) I know the subject well but need a specific piece of information

7.4.2 Types of information needs

Types of information needs were investigated in the questionnaire by means of a number of indicators of each of the three information needs employed in the thesis. It is important to keep in mind that the question about information needs was formulated specifically towards the intranet because of the search test. If the range and diversity of sources applied by the employees is taken into account (see section 7.3.1), it is possible that specific sources are used for certain information needs. What we report on in the present section is therefore the information needs that are met using the intranet.

Information needs were represented in the questionnaire as a number of indicators representing the information needs suggested by Ingwersen (1992). The distribution of the respondents across work tasks and information need indicators appears in Table 7.7. Two indicators in particular describe the situation of the respondents across the work tasks, namely indicator 2 (I need to find a document I have used before) and indicator 7 (I know the subject well but need to find a specific piece of information). Thus, in most of the 19 work tasks, these are the most frequently occurring situations triggering an information need.

This distribution corresponds well with the focus group results. Several participants express that seeking carried out on the intranet is usually focused and that more open searches are carried out elsewhere. According to R23:

“I do not use [the intranet] to seek without a specific goal. I would at Google, otherwise not. Used or seen before… It is possible, that you have not used the document before, but you have seen it before at the least.” (R23, p. 12)

R18 agrees:

“I know this document is in there and I need to use it now. Or: I know this court ruling exists and I need to find it now. Or something else... Typically probably something I have seen before, that I need to use again.” (R18, p. 5)

A third indicator is common in the work tasks belonging to the main process Management and development, namely indicator 6 (I am working with a subject area that I have not been working with before). The focus group on management and development clarifies why: management and development is a main process in which new projects are planned, developed, and launched. Looking for inspiration was thus the participants’ explanation for the higher frequency of indicator 6. For “Inspection: customs”, the most frequent indicator is number 3 (I pretty much know which documents exist on the subject). This may be related to the frequency of information seeking for the work task mentioned above (Section 7.4.1). It seems that this particular work task, which deals with field work related to controlling goods and means of transportation, is routine, and that the employees know the sources needed to solve the work task.

On the other hand, two indicators generally have low frequencies, namely 4 (I am working with a new project within a subject area well known to me. I would like to acquaint myself with the part that is new to me) and 1 (I know exactly which documents I need in order to solve the work task). Indicator 4 may have a low frequency because starting up new projects is less frequent than dealing with routine types of tasks.


We previously outlined the information needs corresponding to the indicators (see Table 6.1). Translating Table 7.7 into the information needs underlying the indicators displays the predominant information needs of the respondents. Table 7.8 displays the average percentage distribution of the three types of information needs underlying the indicators from Table 7.7. Again we see that the verificative and the conscious topical needs are the most common information needs.

Table 7.8 Average percentage distribution of verificative needs (VN), conscious topical needs (CTN), and muddled topical needs (MTN).

| Work tasks | VN | CTN | MTN |
| --- | --- | --- | --- |
| Instruction (181) | 38% | 39% | 21% |
| Settlement: common (20) | 40% | 40% | 20% |
| Settlement: preliminary assessment of income/personal taxes (57) | 37% | 42% | 21% |
| Settlement: business relations (57) | 39% | 32% | 20% |
| Settlement: corporation taxes (25) | 32% | 39% | 18% |
| Settlement: customs (12) | 13% | 33% | 13% |
| Settlement: vehicles (18) | 47% | 37% | 17% |
| Settlement: estate (14) | 46% | 38% | 18% |
| Inspection: common (61) | 41% | 46% | 34% |
| Inspection: customs (16) | 28% | 33% | 22% |
| Collection (39) | 24% | 31% | 19% |
| Processes of support: legal support (45) | 44% | 49% | 21% |
| Processes of support: minister service (10) | 20% | 43% | 45% |
| Processes of support: IT service and administration (14) | 39% | 26% | 25% |
| Processes of support: HR and education (14) | 36% | 43% | 25% |
| Processes of support: internal activities (15) | 37% | 40% | 27% |
| Management and development: strategy (16) | 34% | 48% | 53% |
| Management and development: business management (14) | 32% | 33% | 43% |
| Management and development: development (27) | 33% | 40% | 44% |

Legend: The table displays the mean of occurrences of one or more representatives of the indicators of information needs.
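The step from Table 7.7 to Table 7.8 can be illustrated with a small calculation. The sketch below is an illustration under stated assumptions, not the procedure documented in the thesis: the mapping of indicators to needs is defined in Table 6.1, which is not reproduced here, so the grouping used (indicators 1 and 2 for VN; 3, 5 and 7 for CTN; 4 and 6 for MTN) is inferred from the published figures. The names `NEED_INDICATORS` and `need_shares` are hypothetical.

```python
# Assumed grouping of the seven indicators into the three need types.
# This is inferred from the figures, not quoted from Table 6.1.
NEED_INDICATORS = {
    "VN": (1, 2),      # verificative needs
    "CTN": (3, 5, 7),  # conscious topical needs
    "MTN": (4, 6),     # muddled topical needs
}


def need_shares(indicator_counts, respondents):
    """Average the indicator shares per need type and return whole percentages.

    indicator_counts: respondent counts for indicators 1..7 (Table 7.7 row).
    respondents: number of respondents for the work task.
    """
    shares = {}
    for need, indicators in NEED_INDICATORS.items():
        total = sum(indicator_counts[i - 1] for i in indicators)
        mean_share = total / (len(indicators) * respondents)
        shares[need] = round(100 * mean_share)
    return shares


# Rows transcribed from Table 7.7.
settlement_common = need_shares((5, 11, 6, 4, 5, 4, 13), 20)
strategy = need_shares((3, 8, 5, 8, 8, 9, 10), 16)

print(settlement_common)  # {'VN': 40, 'CTN': 40, 'MTN': 20}
print(strategy)           # {'VN': 34, 'CTN': 48, 'MTN': 53}
```

For these two work tasks the results agree with the corresponding rows of Table 7.8, which suggests, but does not prove, that the table was derived as a simple mean of the indicator percentages.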



Some work tasks have a different ranking of importance as to information needs. Three work tasks in particular stand out, namely the work tasks belonging to the main process Management and development. Here, muddled topical needs are the most frequently occurring needs.

One aspect of information seeking on the intranet at SKAT is not triggered by a specific work task: the seeking carried out by the employees in order to maintain a current state of knowledge as to their work tasks. Thus, besides seeking information in relation to specific work tasks, the intranet is also used to stay updated on recent developments within topics of interest to the employees. The intranet is continuously updated with news and updates. This flow of information is partly caused by the foundation of the organization: the work at SKAT is largely guided by legal rules that constantly evolve. This means that the knowledge of the employees needs to stay updated. Several of the participants state that they consult the intranet on a daily basis for updates within their working areas.

We cannot verify this particular behaviour on the basis of the questionnaire respondents, since the questionnaire did not aim at investigating this kind of behaviour. Instead, this characteristic of the employees’ seeking behaviour was revealed during the focus group interviews:

“...I think that the intranet and seeking is a part of my work all the time, also<br />

just keeping myself updated on, well both on SKAT as a business but also the field I am<br />

working with. So you somehow either seek information or have signed up for a<br />

newsletter, and then you receive the information that way. And all that information<br />

helps you solve the work task one way or the other.”<br />

Another participant (R28) agrees:<br />

“It is also the place where control signals and the like are coming. What we<br />

need to obey within the business. And also… the directions, the legal directions. When<br />

they are updated, they are published there too. So there is a lot to be attentive to there,<br />

really. You cannot avoid it. It would be scary if it was not at 100 %, our intranet. You<br />

sort of need to be in there to be able to do your job.” (R28, p. 10-11)<br />


This behavior is not distinctive for the domain in question here: similar findings<br />

have been made in different domains. Within the domain of engineering, Bigdeli (2007)<br />

found that developing knowledge and expertise was among the most important<br />

motivations to look for information. Further, Del Fiol et al. (2008) include knowledge<br />

update as a criterion for success in their evaluation of an information system for<br />

clinicians. Information needs that are not directly tied to a work task thus occur in other<br />

professional user groups apart from the one in question in the thesis.<br />

To sum up, the most frequently occurring information needs on the intranet are<br />

verificative and conscious topical needs. Again we see “Minister service” stand out<br />

with a high score on all indicators. However, this reflects the high frequency of<br />

information seeking as to the work task reported in the prior section. In terms of<br />

information seeking this work task differs from the remainder.<br />

7.4.3 Preferred metadata<br />

Each group of questions regarding a work task in the questionnaire was closed by<br />

asking the respondents which metadata they would like to be able to apply for<br />

searching the intranet 14 . The distribution of the respondents’ preferences appears from<br />

Table 7.9. Here it is evident that the most desired type of metadata among the<br />

employees is concerned with the topic of the document. Although the percentages<br />

vary, the metadata “subject” has the highest occurrence in 16 out of 19 work tasks.<br />

The importance of a well-functioning description of the subjects of the documents is<br />

obvious (with an emphasis on well-functioning). As addressed by a focus group<br />

participant:<br />

“It all depends on how good you are at describing the subject. Which terms are<br />

used? Who divides it into the superior subjects that can be searched for? It all depends<br />

on the quality of what is there. And the people who uploaded it.” (R1, p. 10).<br />

The orientation towards the subject and content of documents is hardly surprising.<br />

What is interesting, though, is that requests for superior subjects (the upper level of the<br />

taxonomy) are far less frequent. We interpret it as a request for metadata supporting<br />

highly specific searches in a system that tends to overload the users with many<br />

irrelevant documents.<br />

14 The list of preferred metadata and their probes from the questionnaire appears from Table 6.2.

Table 7.9 Metadata preferences distributed across work tasks<br />

Work tasks | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13<br />

Instruction | 38 (21%) | 31 (17%) | 117 (65%) | 65 (36%) | 77 (43%) | 59 (33%) | 20 (11%) | 23 (13%) | 24 (13%) | 60 (33%) | 53 (29%) | 18 (10%) | 83 (46%)<br />

Settlement: common | 3 (15%) | 3 (15%) | 12 (60%) | 8 (40%) | 10 (50%) | 8 (40%) | 2 (10%) | 2 (10%) | 2 (10%) | 8 (40%) | 5 (25%) | 2 (10%) | 5 (25%)<br />

Settlement: preliminary assessment of income/personal taxes | 21 (37%) | 14 (25%) | 36 (63%) | 19 (33%) | 24 (42%) | 12 (21%) | 5 (9%) | 8 (14%) | 9 (16%) | 18 (32%) | 17 (30%) | 7 (12%) | 24 (42%)<br />

Settlement: business relations | 17 (30%) | 10 (18%) | 33 (58%) | 21 (37%) | 22 (39%) | 16 (28%) | 8 (14%) | 4 (7%) | 10 (18%) | 17 (30%) | 14 (25%) | 7 (12%) | 19 (33%)<br />

Settlement: corporation taxes | 4 (16%) | 9 (36%) | 15 (60%) | 14 (56%) | 9 (36%) | 5 (20%) | 2 (8%) | 3 (12%) | 3 (12%) | 12 (48%) | 11 (44%) | 2 (8%) | 9 (36%)<br />

Settlement: customs | 1 (8%) | 1 (8%) | 4 (33%) | 3 (25%) | 1 (8%) | 2 (17%) | - | - | - | 2 (17%) | 2 (17%) | - | 5 (42%)<br />

Settlement: vehicles | 3 (17%) | 5 (28%) | 9 (50%) | 7 (39%) | 11 (61%) | 2 (11%) | 2 (11%) | 2 (11%) | 1 (6%) | 1 (6%) | 1 (6%) | 1 (6%) | 6 (33%)<br />

Settlement: estate | 3 (21%) | 3 (21%) | 12 (86%) | 7 (50%) | 8 (57%) | 10 (74%) | 3 (21%) | 2 (14%) | 1 (7%) | 7 (50%) | 6 (43%) | 1 (7%) | 5 (36%)<br />

Inspection: common | 13 (21%) | 19 (31%) | 44 (72%) | 30 (49%) | 24 (39%) | 17 (28%) | 1 (2%) | 8 (13%) | 11 (18%) | 29 (48%) | 27 (44%) | 5 (5%) | 23 (38%)<br />

Inspection: customs | 4 (25%) | 2 (13%) | 8 (50%) | 5 (31%) | 3 (19%) | 6 (38%) | 3 (19%) | 3 (19%) | 2 (13%) | 6 (38%) | 4 (25%) | 1 (6%) | 9 (56%)<br />

Collection | 6 (15%) | 8 (21%) | 23 (59%) | 8 (21%) | 15 (39%) | 16 (41%) | 3 (8%) | 4 (10%) | 3 (8%) | 12 (31%) | 7 (18%) | 2 (5%) | 18 (46%)<br />

Processes of support: legal support | 12 (27%) | 16 (36%) | 35 (78%) | 29 (64%) | 18 (40%) | 12 (27%) | 5 (11%) | 11 (24%) | 5 (11%) | 24 (53%) | 19 (42%) | 5 (11%) | 22 (49%)<br />

Processes of support: minister service | 4 (40%) | 6 (60%) | 8 (80%) | 6 (60%) | 6 (60%) | 4 (40%) | 3 (30%) | 3 (30%) | 5 (50%) | 8 (80%) | 4 (40%) | 2 (20%) | 6 (60%)<br />

Processes of support: IT service and administration | 3 (21%) | 5 (36%) | 6 (43%) | 4 (29%) | 3 (21%) | 4 (29%) | 4 (29%) | 3 (21%) | 6 (43%) | 3 (21%) | 2 (14%) | 2 (14%) | 5 (36%)<br />

Processes of support: HR and education | 3 (21%) | 4 (29%) | 9 (64%) | 2 (14%) | 3 (21%) | 1 (7%) | 2 (14%) | 5 (36%) | 2 (14%) | 1 (7%) | 2 (14%) | 2 (14%) | 6 (43%)<br />

Processes of support: internal activities | 2 (13%) | 5 (33%) | 9 (60%) | 2 (13%) | 2 (13%) | 2 (13%) | 3 (20%) | 3 (20%) | 4 (27%) | 3 (20%) | 1 (7%) | 3 (20%) | 9 (60%)<br />

Management and development: strategy | 4 (25%) | 6 (38%) | 13 (81%) | 3 (19%) | 6 (38%) | 7 (44%) | 4 (25%) | 10 (63%) | 9 (56%) | 6 (38%) | 3 (19%) | 3 (19%) | 7 (44%)<br />

Management and development: business management | 6 (43%) | 8 (57%) | 10 (71%) | 5 (36%) | 6 (43%) | 8 (57%) | 3 (21%) | 7 (50%) | 4 (29%) | 4 (29%) | 2 (14%) | 2 (14%) | 6 (43%)<br />

Management and development: development | 7 (26%) | 12 (44%) | 18 (67%) | 10 (37%) | 9 (33%) | 11 (41%) | 8 (30%) | 13 (48%) | 15 (56%) | 6 (22%) | 4 (15%) | 4 (15%) | 11 (41%)<br />

Legend: The table displays the total numbers of respondents within a work task choosing a certain type of metadata. The percentages refer to<br />

percentages of all respondents within the work task in the questionnaire. The column numbers represent the metadata contained in the<br />

questionnaire, namely: 1) Target group, 2) Superior subject, 3) Subject, 4) Name of legal text or court decision, 5) Object, 6) Activity, 7) Geographic<br />

data, 8) Responsible institution or department, 9) Project, 10) Document type, 11) Document number, 12) Document ID, 13) Work task.
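As a rough illustration of how Table 7.9 can be read, the following Python sketch picks the most frequently chosen metadata type for three work tasks.<br />
The counts are copied from the table. Reading the cells without a value for “Settlement: customs” as zero is an assumption, and the function names are made up for this example.<br />

```python
# Column order follows the legend of Table 7.9 (columns 1-13).
METADATA = [
    "Target group", "Superior subject", "Subject",
    "Name of legal text or court decision", "Object", "Activity",
    "Geographic data", "Responsible institution or department", "Project",
    "Document type", "Document number", "Document ID", "Work task",
]

# Respondent counts per metadata type, copied from three rows of Table 7.9.
# Assumption: cells without a value are treated as 0.
COUNTS = {
    "Instruction": [38, 31, 117, 65, 77, 59, 20, 23, 24, 60, 53, 18, 83],
    "Settlement: customs": [1, 1, 4, 3, 1, 2, 0, 0, 0, 2, 2, 0, 5],
    "Inspection: customs": [4, 2, 8, 5, 3, 6, 3, 3, 2, 6, 4, 1, 9],
}


def top_metadata(counts):
    """Return the metadata type(s) chosen by most respondents (ties are kept)."""
    best = max(counts)
    return [name for name, count in zip(METADATA, counts) if count == best]


def rank_of(metadata, counts):
    """Return the 1-based rank of a metadata type (tied types share a rank)."""
    own = counts[METADATA.index(metadata)]
    return 1 + sum(count > own for count in counts)


if __name__ == "__main__":
    for task, counts in COUNTS.items():
        print(task, "->", top_metadata(counts),
              "| rank of superior subject:", rank_of("Superior subject", counts))
```

Ranking on the raw counts is enough here, since every percentage in a row shares the same denominator (the respondents within that work task).<br />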

Another type of metadata is frequently requested by the employees: Work task.<br />

Work task metadata is defined as “searching for colleagues engaged in a particular<br />

service or task regardless of location” (from Table 6.2). Three work tasks of the<br />

questionnaire ranked it as the most important metadata (“Settlement: customs”,<br />

“Inspection: customs”, and “Processes of support: internal activities”). The remainder<br />

of the work tasks ranked this particular metadata as being in the middle of the spectrum<br />

of importance. In the focus groups, work task metadata also received quite some<br />

attention. Thus, the participants, regardless of work tasks, required improved<br />

possibilities to locate colleagues across the organization. Above we saw that<br />

colleagues are widely used as information sources across the organization (see section<br />

7.3.2). The focus on work tasks as metadata in the focus groups corresponds well to the<br />

role of colleagues as information sources.<br />

Geographic data, document number, and document ID are found in the lower<br />

end of requirements for metadata. These metadata types were also not mentioned in the<br />

focus groups. This is interpreted as an indication that the employees commonly<br />

would not be using these metadata actively to retrieve information. One thing is<br />

surprising, though. Document types were ranked middle to low among the work tasks<br />

when compared to other metadata types. In the focus groups, document types received<br />

more attention. Here they were assessed as an important type of metadata. To<br />

exemplify:<br />

“Often you go look for, well, decisions, orders or judgments in the equivalent<br />

area. And then you actively go search for judgments or orders, so it is exclusively the<br />

document type in the first place that you know that you want. But it is not because it is<br />

the most important thing, but it is a part of what we use exactly in handling that<br />

case.” (R7, p. 8).<br />

As appears from the search test in the following chapter, document types were used as<br />

an important filter there too. On this basis we must consider it an important type, in<br />

particular within a domain with such a variety of document types as is the case in e-government.<br />

For the remainder of the types a medium or low frequency was traced in<br />

the table, suggesting that all types could be relevant at some point, but that not<br />

necessarily all should be included in a default search interface.



7.5 Summary and implications for indexing<br />

We introduce the present section with two quotes from the focus groups,<br />

emphasizing the role of information in e-government:<br />

“There is a high, high frequency of information seeking. It is indeed necessary<br />

and important that everything going out from here is correct. Whether a rate or a<br />

reference for a paragraph or whatever it just needs to be in order.” (R26, p. ), and:<br />

“...you cannot memorize all the rules. That is why you go in and read them.”<br />

(R16, p. ).<br />

The quotes emphasize two important aspects of information use in SKAT: The<br />

information passed on to customers, such as citizens or governments, must be accurate.<br />

In addition, the area is controlled by so many rules that it is not possible to memorize<br />

everything. The purpose of the present section is to summarize the findings of the<br />

chapter and draw out the implications of the seeking behavior identified above for<br />

indexing requirements.<br />

On the basis of the employees’ preferences for information sources it was<br />

shown that the intranet was an important source of information. However, several<br />

participants in the focus groups expressed dissatisfaction with the system’s ability to<br />

retrieve relevant information. It was also found that, along with the intranet and the<br />

internet, colleagues were important sources of information, used to validate findings and to<br />

save time searching.<br />

The frequency of information seeking varied between work tasks, but the most<br />

common frequency in the questionnaire was every 3rd or 4th time a task was solved.<br />

This indicates a frequent seeking behavior and suggests that the employees are<br />

experienced information searchers. In terms of demands for indexing practice this also<br />

means that the employees are able to perform exact searches, if they have the right<br />

options available, and that they are able to assess the consequences of a query. The<br />

predominant information needs among the employees were verificative and conscious<br />

topical needs. A single work task stood out, but it had few cases and does not change<br />

the general picture. To meet these information needs, indexing must be able to support<br />

verificative searches by adding or drawing metadata from the documents, since<br />

verificative information needs are characterized by being guided by some kind of<br />

known bibliographic information about the document. The conscious topical needs<br />

should be supported by sufficient and high-quality metadata describing the content of<br />

documents. This is supported by the employees’ demands for metadata. However, the<br />

reduced interest in superior subjects indicates that subject metadata must be at a certain<br />

level of specificity in order to match the employees’ large insight into their work areas.<br />

Lastly, the employees made requirements for metadata accessibility. Apart<br />

from subject metadata, work tasks were highly desired by the employees, indicating the<br />

importance of being able to locate topic experts in the national organization. Document<br />

types did not receive much attention in the questionnaire, but in the focus groups the<br />

participants emphasized the document type as an important type of metadata. No<br />

metadata listed in the questionnaire were dismissed. However, the metadata varied in<br />

their importance to the employees, indicating that the particular work area in question<br />

must be explored when developing metadata in e-government.


8 Search test results<br />


In the search test we carried out an experimental test in a prototype of SKAT’s future intranet.<br />

Two systems were tested: system A and system B (for screen dumps, see section 6.4.1).<br />

System A represents a free-text web-based search interface with the possibility of<br />

limiting search results by document type and adjusting search results by means of<br />

search operators. System B extends system A’s search facilities by offering a<br />

subject-based categorization of search results (see section 6.4.1). In the present chapter we<br />

present the findings of the search test.<br />

8.1 The test persons<br />

32 test persons participated in the search test, 11 males and 21 females. The mean age<br />

of the test persons was 47, while the average length of service was approximately<br />

22 years (see Appendix 27). The age distribution corresponds to that of the population<br />

and of the survey questionnaire respondents (see Appendix 23). The majority of the test<br />

persons either had academic educations or were educated within the organization. The<br />

same pattern appeared in the questionnaire results of the domain study, though the share<br />

of persons with an academic education was slightly higher in the search test compared<br />

to the domain study (see sections 7.1 and 7.2). In our selection of test persons we<br />

emphasized that the test persons had a certain frequency of use of the current intranet.<br />

This is mirrored in the frequency of intranet use depicted in Table 8.1. Thus, 25 of the<br />

32 participants estimate their frequency of use to be on a daily basis or even several<br />

times a day. The remaining 7 consult the system on a weekly basis.<br />

Table 8.1 Frequency of test persons' intranet use<br />

Intranet use | Frequency | Percent of N=32<br />

Several times a day | 18 | 56.3<br />

On a daily basis | 7 | 21.9<br />

On a weekly basis | 7 | 21.9<br />

Total | 32 | 100.0<br />

Legend: The intranet use frequency of the test persons participating in the search test. N=32.



Table 8.2 Ranking of test persons' most important information sources<br />

Information sources | Frequency | Percent of N=32<br />

Intranet | 29 | 91%<br />

Internal systems | 19 | 60%<br />

The Internet | 18 | 56%<br />

Electronic reference works | 15 | 47%<br />

Colleagues (including the staff register) | 13 | 41%<br />

Legend: The table depicts the information systems most frequently mentioned by the test persons<br />

as being among the three most important systems in terms of solving daily work tasks. Systems<br />

mentioned by less than 40 % of the respondents have been excluded from the table. N=32.<br />

In the recruitment questionnaire the prospective test persons were asked to point out<br />

their three most important information sources from a predefined list. An open field<br />

provided the option of indicating additional sources.<br />

The sources important to most of the test persons are listed in Table 8.2. As<br />

emerges from the table, the vast majority of the test persons list the intranet as being<br />

among their three most important sources of information. Internal<br />

systems and the Internet follow. To sum up, the test persons are experienced users of<br />

the current intranet and we can expect them to have a good idea of what can be found<br />

there and how. This also opens the possibility of comparing the experimental system of<br />

the search test and the test persons’ use of it to their genuine use of the running intranet.<br />

Three simulated search tasks (sim1, sim2, and sim3) and one genuine<br />

information need (NWT) formed the basis of the search test. Sim1 was concerned with<br />

fiscal conditions when selling an apartment, sim2 with taxation of e-commerce, and<br />

sim3 with VAT registration of freelance teachers. To control for the possible influence of<br />

the simulated search tasks on the test results, the test persons carried out a short<br />

evaluation every time a task had been completed. The evaluation was measured on a<br />

5-point Likert scale. The general scores of the evaluation appear from Table 8.3. Across<br />

all sessions the questions were assessed as just below average. The test persons rate<br />

their insight into the task topics at just below 3. Along with an average resemblance<br />

with daily work tasks of 2.34 and the test persons’ long average length of service, it is<br />

assumed that the test persons have estimated that their knowledge of the work tasks is<br />

general, but not detailed. The average of 2.59 concerning the difficulty of the search<br />

tasks indicates that the tasks have not been either too hard or too easy to solve using the<br />

two systems. However, here we see a fairly large distance in the level of<br />

difficulty between system A (2.19) and system B (3.00). It appears that the system has<br />

had some influence on the test persons’ perception of the level of difficulty.<br />

Table 8.3 General evaluation of simulated search tasks in system A, system B, and total (averages)<br />

Evaluation item | System A (N=48) | System B (N=47, one missing) | All sessions with SWT (N=95, one missing)<br />

Difficulty of search task | 2.19 | 3.00 | 2.59<br />

Insight into the topic of the search task | 2.88 | 2.85 | 2.86<br />

Resemblance with daily tasks | 2.40 | 2.28 | 2.34<br />

Legend: In total, 60 sessions were carried out in each system, including the genuine work tasks.<br />

However, we did not ask the test persons to evaluate their own search tasks. Therefore: N=48,<br />

when calculated for the respective systems.<br />

Table 8.4 is more specific and distinguishes the task assessments between the<br />

three simulated search tasks. Here, minor differences exist as to the insight into the<br />

search tasks and their resemblance with the test persons’ genuine work tasks. Again,<br />

system B differs most markedly from system A regarding the level of difficulty. The<br />

largest distance between the two systems concerns sim2 (e-commerce). The assessment<br />

of sim2 in system B corresponds well with the trouble the test persons experienced when<br />

solving the task. We will explore possible explanations for the differences in<br />

assessments between system A and system B later in this chapter.<br />

Table 8.4 Evaluation of simulated search tasks specified to single simulated search tasks (averages)<br />

Evaluation item | Sim1 SysA (n=16) | Sim1 SysB (n=16) | Sim2 SysA (n=16) | Sim2 SysB (n=16) | Sim3 SysA (n=16) | Sim3 SysB (n=15, 1 missing) | Total SysA | Total SysB<br />

Difficulty | 2.13 | 2.44 | 2.44 | 4.06 | 2.00 | 2.47 | 2.19 | 3.00<br />

Insight | 3.06 | 3.44 | 2.75 | 1.88 | 2.81 | 3.27 | 2.88 | 2.85<br />

Resemblance | 2.31 | 2.25 | 2.38 | 2.00 | 2.50 | 2.60 | 2.40 | 2.28
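The comparison of difficulty scores in Table 8.4 can be retraced with a few lines of Python.<br />
The sketch below is only an illustration: the averages and session counts are copied from the table, while the variable and function names are invented here. It computes the gap between system B and system A per task, and recomputes the totals as session-weighted means of the rounded per-task averages.<br />

```python
# Difficulty averages from Table 8.4: task -> (system A, system B).
DIFFICULTY = {"sim1": (2.13, 2.44), "sim2": (2.44, 4.06), "sim3": (2.00, 2.47)}

# Number of evaluated sessions per task: task -> (system A, system B).
SESSIONS = {"sim1": (16, 16), "sim2": (16, 16), "sim3": (16, 15)}

SYSTEM_A, SYSTEM_B = 0, 1


def gaps(scores):
    """Difference (system B minus system A) per task, rounded to two decimals."""
    return {task: round(b - a, 2) for task, (a, b) in scores.items()}


def weighted_mean(scores, sessions, system):
    """Session-weighted mean of the per-task averages for one system."""
    total = sum(scores[task][system] * sessions[task][system] for task in scores)
    count = sum(sessions[task][system] for task in scores)
    return round(total / count, 2)


if __name__ == "__main__":
    task_gaps = gaps(DIFFICULTY)
    print("gap per task:", task_gaps)
    print("largest gap:", max(task_gaps, key=task_gaps.get))
    print("total A:", weighted_mean(DIFFICULTY, SESSIONS, SYSTEM_A))
    print("total B:", weighted_mean(DIFFICULTY, SESSIONS, SYSTEM_B))
```

Because the inputs are already rounded averages, the recomputed totals are a plausibility check rather than an exact reproduction of the underlying data.<br />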



8.2 Overall searching behaviour and performance<br />

The search test provides data on the searching behaviour in the two test systems, system<br />

A and system B. The empirical data supporting the remainder of the chapter comprises<br />

the search log, search interviews, and relevance assessments. In total, 128 sessions<br />

consisting of 564 queries were undertaken by the 32 test persons, 64 sessions in each of<br />

the two systems. Table 8.5 summarizes the general findings.<br />

The average number of terms used in the queries of the test is 2.25 for system<br />

A and slightly higher in system B: 2.43. This corresponds to the average number of<br />

terms found in similar studies. For instance, Jansen, Spink & Saracevic (2000, p. 214)<br />

measured an average of 2.21 terms in their analysis of search logs in Excite. In a log<br />

analysis of a university OPAC, Lau & Goh (2006, p. 1322) found the average query<br />

length to be 2.86. In a clustering search engine (vivisimo.com), Koshman, Spink &<br />

Jansen (2006, p. 1879) found an average of 3.13, also based on log analysis. Some<br />

years later, Hochstotter & Koch (2009, p. 55) identified a slightly lower average<br />

(between 1.6 and 1.8) in their study based on live tickers in a number of general and<br />

meta Web search engines. Lately, Lykke, Price & Delcambre (2012) showed averages<br />

of 1.5 and 2.0 in their comparative search test of a web-based health portal. Lastly, in a<br />

study comparing categorized searches with non-categorized searches, Käki (2005b, p.<br />

136) found an average of 2.10 for the former type, and 2.04 for the latter. Our findings<br />

thus correspond to the findings of a highly similar study, supporting that on average<br />

more search terms are applied in categorized queries than in non-categorized queries.<br />

Table 8.5 General findings of variables in search test

Variables                                              System A   System B
Sessions (N)                                           64         64
Queries (N)                                            229        335
Number of terms in queries (average)                   2.25       2.43
Number of search keys in queries (average)             1.67       1.90
Search filter “document type” applied (percentage)     43.2       31.6
Number of sessions with reformulations (percentage)    65.6       82.8
Number of reformulations in sessions (average)         2.58       4.23
Query success (percentage)                             30.6       21.5
Session success (percentage)                           89.1       84.4


As regards search keys, the slightly higher average number of terms in system B is reflected in the average number of search keys. System B queries contain 1.90 search keys on average, compared to 1.67 in system A. By comparison, the difference between the average number of terms and the average number of search keys in Lykke, Price & Delcambre’s (2012) study was slightly smaller than in the present results. Thus, the test persons used more terms to represent a search key in the present test.

Both systems offered filtering by document type. The filter was used in 42.3 % of queries in system A and in 31.6 % of queries in system B. This distribution was expected, as system A has fewer query specification options. Reformulations took place in both systems. However, the share of sessions with reformulations was 65.6 % in system A, while 82.8 % of the sessions in system B required reformulations. In addition, the average number of reformulations was notably higher in system B (4.23) than in system A (2.58). This means that an average session in system A contains 3.58 queries, while the corresponding number for system B is 5.23. These averages are slightly above the findings of similar studies of web search engines and web portals. By comparison, Lykke, Price & Delcambre (2012) found averages of 2.5 and 3.2 queries per session. Koshman, Spink & Jansen’s (2006, p. 1879) average was marginally higher: 3.37. To sum up, the present study, and system B in particular, shows a higher number of queries per session than similar studies.
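The relation between reformulations and queries per session can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming that every session consists of one initial query followed by its reformulations; it uses only the counts reported in Table 8.5, and the variable and function names are illustrative.

```python
# Counts and averages as reported in Table 8.5; names are illustrative only.
SESSIONS = {"A": 64, "B": 64}
QUERIES = {"A": 229, "B": 335}
MEAN_REFORMULATIONS = {"A": 2.58, "B": 4.23}


def queries_per_session(system: str) -> float:
    """Average number of queries per session, computed from raw counts."""
    return QUERIES[system] / SESSIONS[system]


def queries_from_reformulations(system: str) -> float:
    """Assumption: one initial query plus the average number of reformulations."""
    return 1 + MEAN_REFORMULATIONS[system]


for system in ("A", "B"):
    print(
        system,
        round(queries_per_session(system), 2),
        round(queries_from_reformulations(system), 2),
    )
```

Under this assumption both routes give the same averages, which suggests that the reported average number of reformulations is calculated over all sessions, including those without reformulations.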

The success of sessions and queries is summed up in Table 8.6 and Table 8.7. The total success at session level slightly favours system A, with relevant documents found in 89.1 % of all sessions. System B succeeded in 84.4 % of sessions. A breakdown by search task reveals a fairly even distribution of successful sessions between the two systems except in sim2. For the remaining search tasks, the systems perform equally, even with a minor advantage for system B. In sim2, 1 session failed in system A, while 7 sessions failed in system B. This may very well explain part of why the test persons assessed the task as markedly more difficult, as we have just seen.

Table 8.6 Session success (percentages)

                   Sim1                    Sim2                    Sim3                    NWT                     Total
                   SysA        SysB        SysA        SysB        SysA        SysB        SysA        SysB        SysA        SysB
Session succeeded  15 (93.8)   16 (100.0)  15 (93.8)   9 (56.3)    16 (100.0)  16 (100.0)  11 (68.8)   13 (81.3)   57 (89.1)   54 (84.4)
Session failed     1 (6.3)     0 (0.0)     1 (6.3)     7 (43.8)    0 (0.0)     0 (0.0)     5 (31.3)    3 (18.8)    7 (10.9)    10 (15.6)
Total              16          16          16          16          16          16          16          16          64          64

Table 8.7 Query success (percentages)

                 Sim1                      Sim2                      Sim3                      NWT                       Total
                 SysA         SysB         SysA         SysB         SysA         SysB         SysA         SysB         SysA         SysB
Query succeeded  18 (58.1)    23 (33.3)    17 (30.4)    11 (9.7)     20 (27.8)    22 (25.6)    15 (21.4)    16 (23.9)    70 (30.6)    72 (21.5)
Query failed     13 (41.9)    46 (66.7)    39 (69.6)    102 (90.3)   52 (72.2)    64 (74.4)    55 (78.6)    51 (76.1)    159 (69.4)   263 (78.5)
Total            31 (100.0)   69 (100.0)   56 (100.0)   113 (100.0)  72 (100.0)   86 (100.0)   70 (100.0)   67 (100.0)   229 (100.0)  335 (100.0)

At query level, the total number of successful queries is fairly even between the two systems. However, the total number of failed queries is markedly higher in system B than in system A, particularly in sim1 and sim2, and consequently also when the systems are compared in the last two columns of Table 8.7. Thus, compared with the more even overall performance at session level, performance at query level widens the gap in favour of system A. In short, the two systems provide approximately the same number of successful queries, but system B requires more failed queries to get there.
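The query-level comparison can be reproduced from the raw counts. The sketch below uses the Total columns of Table 8.7; the helper names are illustrative, and the ratio of failed to successful queries is an additional measure introduced here only to make the last point concrete.

```python
# Query outcomes per system, taken from the Total columns of Table 8.7.
QUERY_OUTCOMES = {
    "A": {"succeeded": 70, "failed": 159},
    "B": {"succeeded": 72, "failed": 263},
}


def success_rate(outcomes: dict) -> float:
    """Share of successful queries, in percent."""
    total = outcomes["succeeded"] + outcomes["failed"]
    return 100.0 * outcomes["succeeded"] / total


def failed_per_success(outcomes: dict) -> float:
    """Illustrative measure: failed queries per successful query."""
    return outcomes["failed"] / outcomes["succeeded"]


for system, outcomes in QUERY_OUTCOMES.items():
    print(
        system,
        round(success_rate(outcomes), 1),
        round(failed_per_success(outcomes), 2),
    )
```

With nearly identical numbers of successful queries (70 and 72), the difference in success rate comes almost entirely from the number of failed queries.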

To sum up, the overall comparison of the two test systems shows a slight advantage for system A at session level in terms of the ability to retrieve relevant documents. The advantage of system A increases when measured at query level. In addition, system A differs from system B in that fewer terms are needed in queries, and the share and number of reformulations are lower. In the sections to follow, we explore the nature and causes of the difference in performance between the two systems. We examine what characterizes the search situation (section 8.2.1), the number and types of reformulations carried out (section 0), and the unintended use of system A in system B searches (section 8.2.3).


8.2.1 The search situation

The search situation is characterized by different components. In the dataset we have identified four components that guide this presentation of results: sessions, queries, search operators, and filtering by document type. We present the results in that order.

8.2.1.1 Sessions

128 sessions were carried out in the search test, 64 in system A and 64 in system B. As appears from the total numbers, more queries were executed in system B than in system A. This is also the case at task level (see Table 8.8). Here, the average number of queries needed to solve a task differs by almost 2 queries between the two systems (the last column). As regards the individual search tasks, the genuine information need (NWT) has a slightly lower average in system B than in system A, indicating that the genuine information need actually benefitted from the categories. In the remaining search tasks, system B is above system A in terms of averages. It has already been shown that sim1 and sim2 in particular contained a significantly higher share of failed queries in system B than in system A. That also appears in the present table, where sim1 and sim2 executed in system B have an average number of queries twice as large as in system A. In terms of variance, the standard deviation of the two systems is practically the same. At task level the two differ more, with the highest maximum of system A in sim3 (27 reformulations) and the highest maximum of system B in sim1 (18 reformulations) (see Table 1, Appendix 28). Thus, the variances within both systems are fairly large. For sim1 the difference is caused by a very high success rate in system A. A further explanation could be that sim1 in system A was assessed as below average as regards difficulty, and that the test persons had rather good knowledge of the topic in advance (cf. Table 8.4).

Table 8.8 Number of queries in sessions at task level (averages)

          Sim1          Sim2          Sim3          NWT           Total
System A  1.94 (n=16)   3.50 (n=16)   4.50 (n=16)   4.38 (n=16)   3.58 (n=64)
System B  4.31 (n=16)   7.06 (n=16)   5.38 (n=16)   4.19 (n=16)   5.23 (n=64)
Total     3.13 (n=32)   5.28 (n=32)   4.94 (n=32)   4.28 (n=32)   4.41 (n=128)


Table 8.9 Number of queries in sessions as to success or failure (averages)

                   Sim1                        Sim2                        Sim3                        NWT                         Total
                   SysA          SysB          SysA          SysB          SysA          SysB          SysA          SysB          SysA          SysB
Session succeeded  2.00 (n=15)   4.31 (n=16)   2.93 (n=15)   6.78 (n=9)    4.50 (n=16)   5.38 (n=16)   3.73 (n=11)   3.46 (n=13)   3.28 (n=57)   4.83 (n=54)
Session failed     1.00 (n=1)    -             12.00 (n=1)   7.43 (n=7)    -             -             5.80 (n=5)    7.33 (n=3)    6.00 (n=7)    7.40 (n=10)
Total              1.94 (n=16)   4.31 (n=16)   3.50 (n=16)   7.06 (n=16)   4.50 (n=16)   5.38 (n=16)   4.38 (n=16)   4.19 (n=16)   3.58 (n=64)   5.23 (n=64)

Table 8.9 shows the number of queries in successful and failed sessions respectively. The majority of the sessions end with the retrieval of one or more relevant documents. From the table it is clear that, with one exception (sim1, system A), sessions with fewer reformulations are more likely to succeed. In the three simulated search tasks, system A is superior to system B, as system A sessions have a lower average number of queries. For the genuine information need (NWT) the average is a little lower for system B sessions than for system A sessions, but not by enough to change the overall impression of system A as the more efficient system in terms of a low average number of queries per session. To sum up, at session level test persons put more effort, in terms of the number of queries, into sessions that remain unsolved in the end. The average number of queries is higher in system B searches, and a session is more likely to succeed if it is solved with fewer queries.
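The totals in Table 8.9 are consistent with the averages for successful and failed sessions when these are weighted by the number of sessions. The sketch below recomputes the Total columns as pooled means; the function name is illustrative, and the input values are the rounded averages from the table, so small rounding differences are to be expected in other columns.

```python
def pooled_mean(groups):
    """Weighted mean of group means; groups is a list of (mean, n) pairs."""
    total_n = sum(n for _, n in groups)
    return sum(mean * n for mean, n in groups) / total_n


# (average number of queries, number of sessions) from the Total columns of Table 8.9.
SYSTEM_A = [(3.28, 57), (6.00, 7)]   # succeeded, failed
SYSTEM_B = [(4.83, 54), (7.40, 10)]  # succeeded, failed

print("System A:", round(pooled_mean(SYSTEM_A), 2))
print("System B:", round(pooled_mean(SYSTEM_B), 2))
```

Because failed sessions are few (7 and 10 of 64), their high averages raise the overall average only moderately.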

8.2.1.2 Queries

The average number of terms has already been summed up as 2.25 for system A and 2.43 for system B. In Table 8.10 the averages are calculated at task level. The table shows that more terms were entered in system B queries than in system A queries for all search tasks except sim3, where the average number of terms is notably lower in system B than in system A. One possible reason could again be found in the test persons’ assessments of the task: sim3 in system B received the highest score of all on resemblance to the test persons’ daily work tasks. Because of sim3, the average number of search terms in queries is not consistently higher in system B than in system A. Nevertheless, when measured as the number of search terms entered in the respective systems, the overall impression is that system A is superior.


Table 8.10 Number of search terms in queries (averages)

          Sim1           Sim2           Sim3           NWT            Total
System A  2.32 (n=31)    2.39 (n=56)    2.42 (n=72)    1.94 (n=70)    2.25 (N=229)
System B  2.54 (n=69)    2.88 (n=113)   1.79 (n=86)    2.39 (n=67)    2.43 (N=335)
Total     2.47 (n=100)   2.72 (n=169)   2.08 (n=158)   2.16 (n=137)   2.36 (N=564)

Table 8.11 outlines the number of search keys used for the individual search tasks of the test. Overall, the average number of search terms is above the average number of search keys in queries. That means that, on average, each search key was represented by more than one term. The figures include synonymous terms for the same concept and phrases such as “when to become VAT registered”. The average number of search keys to some extent reflects the average number of search terms just identified. Thus, the average is higher in system B for sim1 and sim2, while the sim3 average is higher in system A. The low average in sim1, system A reflects that the search task had one very significant word, parents’ purchase (forældrekøb), which, when used as a query term, listed a highly relevant citizen booklet that most test persons assessed as relevant. As can be seen from the table, more concepts were used in system B than in system A. That may, at least partially, be explained by the test persons’ lack of insight into the test system. In a number of cases the test persons composed a full query that was able to retrieve the documents wanted, and then they were asked to filter by a category. In some of these cases, the chosen category represented a search key already represented by the search terms. In other cases, an additional search key was added to the query by the choice of a category. These latter cases explain part of the generally increased number of search keys in system B searches.

Table 8.11 Number of search keys in queries (averages)

          Sim1           Sim2           Sim3           Total
System A  1.29 (n=31)    1.82 (n=56)    1.72 (n=72)    1.67 (N=159)
System B  1.97 (n=69)    2.12 (n=113)   1.56 (n=86)    1.90 (N=268)
Total     1.76 (n=100)   2.02 (n=169)   1.63 (n=158)   1.81 (N=427)

Legend: The table reflects the figures from the simulated search tasks, as we could not perform the query search key analysis for the genuine search tasks. That explains the reduced N compared to Table 8.10.

Table 8.12 Number of search terms in queries as to success or failure (averages)

               Sim1                        Sim2                        Sim3                        NWT                         Total
               SysA          SysB          SysA          SysB          SysA          SysB          SysA          SysB          SysA          SysB
Query success  2.28 (n=18)   2.3 (n=23)    2.35 (n=17)   3.00 (n=11)   2.2 (n=20)    1.77 (n=22)   1.93 (n=15)   2.13 (n=16)   2.20 (n=70)   2.21 (n=72)
Query failure  2.38 (n=13)   2.65 (n=46)   2.41 (n=39)   2.86 (n=102)  2.5 (n=52)    1.8 (n=64)    1.95 (n=55)   2.47 (n=51)   2.28 (n=159)  2.49 (n=263)
Total          2.32 (n=31)   2.54 (n=69)   2.39 (n=56)   2.88 (n=113)  2.42 (n=72)   1.79 (n=86)   1.94 (n=70)   2.39 (n=67)   2.25 (n=229)  2.43 (n=335)

In Table 8.12, the average number of search terms in successful queries and in failed queries is listed. With the exception of sim2, system B, successful queries consistently contain fewer search terms than failed queries. In terms of search keys the overall picture is the same (see Table 8.13). Here successful queries consistently represent fewer search keys than failed queries. Thus, in the present database a query based on few search terms and search keys is more likely to retrieve relevant documents. Part of the explanation could be the relatively small database behind the prototype: the more search terms entered, the fewer documents may match the search terms. This is supported by a correlation analysis showing a statistically significant relation between the number of search terms entered and the number of hits (see table 4, Appendix 28). Further, the succession of search tasks did not have an effect on the number of search terms applied either (see table 4, Appendix 28); there was no significant relation between the succession of search tasks and the number of terms entered in the query field. Another reason for the higher success of queries with fewer search terms and search keys may be the test persons’ professional background: in a number of queries the test persons entered specific and correct search terms that efficiently retrieved documents. When more terms or search keys were entered at the same time, the number of search results became very limited. To conclude, the experience gained during the test did not change how test persons composed their queries, at least in terms of the number of terms entered. Also, queries with fewer terms and concepts were superior, most likely because the test persons’ insight into the general topic made them enter qualified search terms, and because fewer terms and concepts did not restrict the number of results too much. More terms and concepts were applied in system B, partially because categories were added in system B to queries that at times were complete without the category. When the category was added, it occasionally represented a new concept, which increased the average number of concepts in system B.

Table 8.13 Number of search keys in queries as to success or failure (averages)

               Sim1                        Sim2                        Sim3                        Total
               SysA          SysB          SysA          SysB          SysA          SysB          SysA          SysB
Query success  1.28 (n=18)   1.57 (n=23)   1.53 (n=17)   2.09 (n=11)   1.65 (n=20)   1.55 (n=22)   1.49 (n=55)   1.66 (n=56)
Query failure  1.31 (n=13)   2.17 (n=46)   1.95 (n=39)   2.12 (n=102)  1.75 (n=52)   1.56 (n=64)   1.77 (n=104)  1.96 (n=211)
Total          1.29 (n=31)   1.97 (n=69)   1.82 (n=56)   2.12 (n=113)  1.72 (n=72)   1.56 (n=86)   1.67 (N=159)  1.90 (N=267)

Table 8.14 Distribution of search operator in queries (percentages)

                            Sim1                    Sim2                    Sim3                    NWT                     Total
                            SysA        SysB        SysA        SysB        SysA        SysB        SysA        SysB        SysA        SysB
Free text                   16 (51.6)   22 (31.9)   27 (48.2)   61 (54.0)   21 (29.2)   62 (72.1)   38 (54.3)   32 (47.8)   102 (44.5)  177 (52.8)
Pages containing all words  13 (41.9)   45 (65.2)   28 (50.0)   48 (42.5)   46 (63.9)   23 (26.7)   23 (32.9)   29 (43.3)   110 (48.0)  145 (43.3)
This exact sentence         2 (6.5)     -           1 (1.8)     4 (3.5)     5 (6.9)     1 (1.2)     5 (7.1)     6 (9.0)     13 (5.7)    11 (3.3)
At least one of the words   -           2 (2.9)     -           -           -           -           4 (5.7)     -           4 (1.7)     2 (0.6)
Total                       31          69          56          113         72          86          70          67          229         335

Legend: The AW operator retrieves documents containing all search terms. FT retrieves documents that contain most, but not necessarily all, search terms. ES corresponds to applying quotation marks. The OW operator retrieves documents in which at least one of the typed search terms is contained. In the search test all search results were ranked by best match (relevance).



8.2.1.3 Search operators

In the search interface the default setting of the search operator field is “Free text” (FT). As the test persons did not have any prior experience with the test systems, it was plausible that the default FT operator would be the most frequently used in the test, since users tend to use the default settings put forward by the system (Markey, 2007a, p. 1077). As expected, the FT operator had a high frequency across the queries, though along with the “Pages containing all words” (AW) operator. What was unexpected is that system A has a slightly higher frequency of FT searches, while the opposite is the case for system B. Thus, in system B, the AW operator is more frequent than the FT operator (see Table 8.14). We have previously mentioned that the AW operator is the more restrictive of the two (see section 6.4.1). In combination with the mandatory categorization in system B, it is likely to result in large differences between the sizes of search results in the two systems.
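The difference in restrictiveness between the operators can be illustrated with a small sketch. It is a simplified illustration based only on the operator descriptions in the legend of Table 8.14, not the implementation of the test systems: whitespace tokenization, the threshold for “most” terms in FT (more than half), and the sample documents are assumptions, and no relevance ranking is attempted.

```python
def tokenize(text):
    """Assumption: lower-casing and whitespace splitting stand in for real indexing."""
    return text.lower().split()


def matches(operator, query, text):
    """Return True if the text satisfies the query under the given operator."""
    terms = tokenize(query)
    words = tokenize(text)
    hits = sum(1 for term in terms if term in words)
    if operator == "AW":  # pages containing all words
        return hits == len(terms)
    if operator == "OW":  # at least one of the words
        return hits >= 1
    if operator == "ES":  # this exact sentence: terms must occur as a consecutive run
        n = len(terms)
        return any(words[i:i + n] == terms for i in range(len(words) - n + 1))
    if operator == "FT":  # free text; assumption: "most" means more than half of the terms
        return hits > len(terms) / 2
    raise ValueError(f"unknown operator: {operator}")


# Hypothetical mini collection, used for illustration only.
DOCS = {
    "d1": "when to become vat registered",
    "d2": "vat rules for small companies",
    "d3": "parents purchase of flats",
}


def search(operator, query):
    """Return the ids of matching documents in alphabetical order (no ranking)."""
    return [doc_id for doc_id, text in sorted(DOCS.items())
            if matches(operator, query, text)]


for op in ("FT", "AW", "ES", "OW"):
    print(op, search(op, "vat registered companies"))
```

In this toy collection the three-term query retrieves two documents with FT and none with AW, which mirrors the direction of the difference in result set sizes discussed in this section.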

One explanation for the unexpected distribution of the FT and the AW operators between system A and system B is that some test persons had trouble grasping the two operators and separating them from each other. Thus, test persons intermittently wondered why search terms did not occur in their result list when using the FT operator. To exemplify:

“Yes, but on the other hand it could also give… free text… then they all ought to come…” (TP15, line 306)

Table 8.15 Number of search terms used with search operators in queries (averages)

                                 Sim1                        Sim2                        Sim3                        NWT                         Total
                                 SysA          SysB          SysA          SysB          SysA          SysB          SysA          SysB          SysA          SysB
Free text (FT)                   2.25 (n=16)   2.41 (n=22)   2.30 (n=27)   2.64 (n=61)   1.57 (n=21)   1.58 (n=62)   1.76 (n=38)   2.03 (n=32)   1.94 (n=102)  2.13 (n=177)
Pages containing all words (AW)  2.62 (n=13)   2.62 (n=45)   2.46 (n=28)   3.10 (n=48)   2.85 (n=46)   2.17 (n=23)   1.83 (n=23)   2.76 (n=29)   2.51 (n=110)  2.74 (n=145)
This exact sentence (ES)         1.0 (n=2)     -             3.00 (n=1)    3.75 (n=4)    2.00 (n=5)    6.00 (n=1)    2.40 (n=5)    2.50 (n=6)    2.08 (n=13)   3.27 (n=11)
At least one of the words (OW)   -             2.0 (n=2)     -             -             -             -             3.75 (n=4)    -             3.75 (n=4)    2.0 (n=2)
Total                            2.32 (n=31)   2.54 (n=69)   2.39 (n=56)   2.88 (n=113)  2.42 (n=72)   1.79 (n=86)   1.94 (n=70)   2.39 (n=67)

In addition, the test persons consistently used more search terms when applying the AW operator than when the FT operator was used (see Table 8.15), resulting in a gap in the number of documents retrieved by the two preferred operators. To illustrate, an average search in system A using the FT operator retrieved 548 documents, while the AW operator in the same system retrieved 121 documents on average. In system B, average FT searches retrieved 25 documents, while average AW searches retrieved 10 documents (see Table 3, Appendix 28). Thus, the searches carried out in system B were significantly narrower than the broader system A searches, as the search results were additionally filtered by subject. In terms of Boolean logic, adding a category corresponds to combining the query with an additional term, and in some cases an additional concept, using a Boolean “AND”. Again it appears that some test persons had trouble fully understanding the comparative implications of the two search operators. Unfortunately, it has not been possible to deduce causes for the difference in search operator use between the two systems from the search interviews, as the test persons did not address it during their searches.
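The narrowing effect of a mandatory category can be sketched as a set intersection. The document identifiers, the category contents, and the relevance judgments below are hypothetical and serve only to illustrate the Boolean “AND”; they are not taken from the test data.

```python
def filter_by_category(query_hits, category_members):
    """A category filter acts as a Boolean AND: keep only hits that are in the category."""
    return query_hits & category_members


# Hypothetical example data.
query_hits = {"d1", "d2", "d3", "d4"}   # documents matching the search terms
category = {"d2", "d4", "d7"}           # documents assigned to the chosen category
relevant = {"d1", "d2"}                 # documents the searcher would judge relevant

filtered_hits = filter_by_category(query_hits, category)

print("hits without category:", sorted(query_hits))
print("hits with category:", sorted(filtered_hits))
print("relevant retrieved without category:", sorted(query_hits & relevant))
print("relevant retrieved with category:", sorted(filtered_hits & relevant))
```

The filtered result set can never be larger than the unfiltered one, and a relevant document is lost whenever it is not assigned to the chosen category.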

Lastly, the search operator field was rarely used to adjust search results in reformulations (cf. Table 8.21). That indicates that the test persons did not feel sufficiently confident using the operators for reformulations and instead preferred other types of reformulations (we analyse reformulations more closely below (section 0)). That the understanding of Boolean operators challenges end users corresponds to the findings of similar studies (e.g. Markey, 2007a).

As is evident from the above, the use of search operators has resulted in significant differences in the number of hits retrieved in the two test systems. However, the success of the queries by search operator is needed in order to identify which performed better. That appears from Table 8.16. The success rate of system A is higher on a general level when compared to system B. System A queries have a slightly higher success rate in FT searches than in AW searches. As appeared from Table 8.15, the number of search terms was lower in FT searches than in AW searches. This suggests that system A has the best performance in open searches (using the FT operator and fewer terms). As regards system B, the success of FT and AW is fairly even, but below the average of system A. Apparently, the queries in system B have filtered out too many relevant hits. The best performance was thus found in system A using the FT operator. This also indicates a well-functioning relevance ranking within the system. To conclude, system A managed to perform better on the basis of the broader queries applied. The test persons had difficulties applying and understanding the search operators correctly, which led to a weaker performance of system B due to a majority of small result sets.

Table 8.16 Success of search operators (percentages)

          System A                                        System B
          FT          AW          ES          OW          FT          AW          ES          OW
Success   33 (32.4)   30 (27.3)   7 (53.8)    -           38 (21.5)   32 (22.1)   1 (9.1)     1 (50.0)
Failure   69 (67.6)   80 (72.7)   6 (46.2)    4 (100.0)   139 (78.5)  113 (77.9)  10 (90.9)   1 (50.0)
Total     100.0       100.0       100.0       100.0       100.0       100.0       100.0       100.0

Legend: FT=Free text, AW=Pages containing all words, ES=This exact sentence, OW=At least one of the words.

8.2.1.4 Filter<strong>in</strong>g by metadata<br />

The test system documents were marked up with the document type to which each<br />
document belongs. This makes it possible to include the metadata field<br />
“document type” when queries are built. Document type metadata are a powerful<br />
retrieval tool in a collection like the test collection, which holds many heterogeneous document<br />
types: using the document type filter removes many irrelevant documents by<br />
their type. Requesting a specific document type was optional in the prototype. From<br />
the comments given by the test persons during their search sessions it is clear that the<br />
specification of document types is an important option in the search interface. The<br />
test persons did not comment on the option as such, but they<br />
used it as a natural function in their queries. In addition, document types were mentioned<br />
as one among several important metadata in the domain study (see section 7.4.2). In<br />


particular legal guidances were emphasized as important to the employees’ work. For<br />
instance:<br />

“Well, it is the common assessment guidance, the one we refer to as our Bible,<br />
you can say, where you need to go check, if it is right in the legal rules…” (TP02, line<br />
112-113).<br />

And the statement is supported:<br />

“And that is just the problem: Is it a business or is it not? And I know the<br />
assessment guidance so well that I know that all facets are included here. There might<br />
be four or seven sub divisions to that document, but it is in there. It is just a matter of<br />
clicking further and further down until you find it…” (TP05, line 65-68).<br />

From Table 8.17 it appears that overall a larger share of system A queries used<br />
the document type filter than system B queries. When the document type<br />
filter was applied, legal guidances were the preferred document type in both systems.<br />
That emphasizes the importance of this document type in the test persons’ daily work,<br />
which was also expressed during the interviews. Further, legal guidances are listed as<br />
one among several relevant document types in the non-topical facet for all three simulated<br />
search tasks (see Table 6.7).<br />

At search task level it becomes evident that the overall averages (the rightmost<br />
columns in Table 8.17) conceal a large variation. In system A the average of 56.8 %<br />
of “None chosen” ranges from the highest share of 83.9 % in sim1 to the lowest in sim3<br />
(29.2 %). The differences in system B are smaller, but still of interest. Here the highest<br />
share is 86.6 % in the genuine information need and the lowest 55.1 % in sim1.<br />
Overall, the document type filter was used least in sim1, system A and in the<br />
genuine search task, system B. Here, less than a quarter of the queries applied the filter.<br />
The biggest difference between the two systems appears in sim3. Here the filter was<br />
used in approximately 70 % of the system A queries, while only approximately a<br />
quarter of the system B queries applied the filter. The generally higher<br />
use of the document type filter in system A may be explained by the lower number of filtering<br />
possibilities in system A compared to system B.<br />

When compared to the facet analysis of the simulated search tasks (section<br />
6.4.7; Table 6.7), the largest share of correct document filter settings was used in sim1.<br />
Here legal guidances, legislation, and citizen booklets were listed as correct document<br />
types for the task. All queries used either no filter or one of the three just mentioned.<br />
The same was the case for sim3, system B. For sim3, system A and sim2, a greater<br />
variety of document types was applied in the queries. Considering the findings<br />

Table 8.17 Document type filter used in queries (percentages)<br />

Document type | Sim1 SysA | Sim1 SysB | Sim2 SysA | Sim2 SysB | Sim3 SysA | Sim3 SysB | NWT SysA | NWT SysB | Total SysA | Total SysB<br />
None chosen | 26 (83.9) | 38 (55.1) | 33 (58.9) | 71 (62.8) | 21 (29.2) | 62 (72.1) | 50 (71.4) | 58 (86.6) | 130 (56.8) | 229 (68.4)<br />
Legal guidances | 2 (6.5) | 26 (37.7) | 11 (19.6) | 16 (14.2) | 10 (13.9) | 8 (9.3) | 4 (5.7) | 8 (11.9) | 27 (11.8) | 58 (17.3)<br />
Business guidances | - | - | 5 (8.9) | 18 (15.9) | 11 (15.3) | 3 (3.5) | - | - | 16 (7.0) | 21 (6.3)<br />
Citizen booklets | 3 (9.7) | 5 (7.2) | 4 (7.1) | - | 3 (4.2) | - | 4 (5.7) | - | 14 (6.1) | 5 (1.5)<br />
Legislation | - | - | - | 5 (4.4) | 13 (18.1) | 13 (15.1) | - | - | 13 (5.7) | 18 (5.4)<br />
Internal information | - | - | - | - | - | - | 6 (8.6) | - | 6 (2.6) | -<br />
Internal guidances | - | - | 1 (1.8) | 1 (0.9) | 5 (6.9) | - | - | - | 6 (2.6) | 1 (0.3)<br />
Business newsletters | - | - | - | 2 (1.8) | 5 (6.9) | - | - | - | 5 (2.2) | 2 (0.6)<br />
Case law | - | - | - | - | - | - | 4 (5.7) | - | 4 (1.7) | -<br />
Others | - | - | 2 (3.6) | - | 1 (1.4) | - | - | - | 3 (1.3) | -<br />
Legislative materials | - | - | - | - | 2 (2.8) | - | - | - | 2 (0.9) | -<br />
Forms | - | - | - | - | - | - | 2 (2.9) | 1 (1.5) | 2 (0.9) | 1 (0.3)<br />
SKAT circulars | - | - | - | - | 1 (1.4) | - | - | - | 1 (0.4) | -<br />
Total | 31 | 69 | 56 | 113 | 72 | 86 | 70 | 67 | 229 | 335<br />



Table 8.18 Search success for the document type filter in system A and system B queries (percentages)<br />

Document type | System A Success | System A Failure | Total system A | System B Success | System B Failure | Total system B<br />
None chosen | 52 (40.0) | 78 (60.0) | 130 (100.0) | 59 (25.8) | 170 (74.2) | 229 (100.0)<br />
Legal guidances | 2 (7.4) | 25 (92.6) | 27 (100.0) | 6 (10.3) | 52 (89.7) | 58 (100.0)<br />
Legislation | 0 | 13 (100.0) | 13 (100.0) | 2 (11.1) | 16 (88.9) | 18 (100.0)<br />
Business guidances | 7 (43.8) | 9 (56.3) | 16 (100.0) | 3 (14.3) | 18 (85.7) | 21 (100.0)<br />
Business newsletters | 0 | 5 (100.0) | 5 (100.0) | 0 | 2 (100.0) | 2 (100.0)<br />
Internal guidances | 1 (16.7) | 5 (83.3) | 6 (100.0) | 0 | 1 (100.0) | 1 (100.0)<br />
Citizen booklets | 6 (42.9) | 8 (57.1) | 14 (100.0) | 2 (40.0) | 3 (60.0) | 5 (100.0)<br />
Internal information | 1 (16.7) | 5 (83.3) | 6 (100.0) | - | - | -<br />

Legend: Document types that have been applied fewer than 5 times in total across the two systems<br />
have been omitted from the table.<br />

regarding sessions (section 8.2.1.1), where the same two simulated search tasks had<br />
the highest average number of reformulations, it appears that a wrong choice of<br />
document type leads to more reformulations in both systems. For sim2 and sim3 the<br />
correct document types (in terms of the facet analysis) were legal guidances, legislation,<br />
and business guidances. To conclude, the use of the document type filter is overall higher in<br />
system A. Further, the frequency of application very much depends on the specific task<br />
at hand. Lastly, if a wrong document type has been chosen, the query is likely to<br />
result in an unsatisfactory search result and a subsequent reformulation.<br />

The influence of the document type filter on search success appears from<br />
Table 8.18. Table 8.5 showed that system A had the highest share of<br />
successful sessions and queries. The difference between the two systems was slightly<br />
higher for queries than for sessions. These general results are reflected in the breakdown by use<br />
of the document type filter.<br />

For system A searches, business guidances and citizen booklets in particular<br />
were helpful in retrieving relevant documents. Both document types were mentioned as<br />
one among several possibilities in the aforementioned facet analysis of the simulated<br />
search tasks. Queries that did not include the document type filter also performed well.<br />
At least 40 % of the queries using these settings in system A retrieved relevant<br />
documents. At a general level the percentages of successful search results are lower in<br />
system B. The highest share of successful queries (25.8 %) was found in queries that<br />
did not include the document type filter. One exception is the document type “Citizen<br />
booklet”, but the result is less significant as it was used in only 5 queries. Apart from the<br />
“Citizen booklet”, the share of successful queries decreases when the document type<br />
filter is included. We have already mentioned the over-specification of queries in<br />
system B, and the results from Table 8.18 support that impression.<br />
Thus, using the document type filter in system A helps<br />
specify and reduce search results, while the filter in system B tends to limit the search<br />
results too much.<br />
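
The percentages in Table 8.18 are row shares: the number of successful queries divided by all queries that used a given filter setting. A minimal Python sketch of that calculation follows. The dictionary layout and the function name success_share are assumptions made for illustration, and only four rows of the table are copied in.<br />

```python
# Success and failure counts per document type filter, copied from Table 8.18.
# The nested-dict layout is an assumption made for this illustration.
COUNTS = {
    "System A": {
        "None chosen": (52, 78),
        "Legal guidances": (2, 25),
        "Business guidances": (7, 9),
        "Citizen booklets": (6, 8),
    },
    "System B": {
        "None chosen": (59, 170),
        "Legal guidances": (6, 52),
        "Business guidances": (3, 18),
        "Citizen booklets": (2, 3),
    },
}


def success_share(success: int, failure: int) -> float:
    """Return the percentage of successful queries for one filter setting."""
    return 100 * success / (success + failure)


shares = {
    system: {doc_type: success_share(*pair) for doc_type, pair in rows.items()}
    for system, rows in COUNTS.items()
}

for system, rows in shares.items():
    for doc_type, share in rows.items():
        print(f"{system:8}  {doc_type:20} {share:5.1f} %")
```

Comparing the “None chosen” row with the filtered rows in each system makes the pattern discussed above visible at a glance.<br />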

8.2.2 Reformulations<br />

Reformulations are interesting because they can tell us whether and how<br />
searchers try to correct a query after an unsatisfying search result. Table 8.5<br />
previously demonstrated frequent reformulations in both systems, though with a higher<br />
frequency in system B than in system A. System B also accounts for the highest<br />
average number of reformulations. Table 8.19 and Table 8.20 specify the figures at task<br />
level. From the first of the two tables it appears that the general figures are mirrored at<br />

Table 8.19 Number of sessions with query reformulations (percentages)<br />

 | Sim1 SysA | Sim1 SysB | Sim2 SysA | Sim2 SysB | Sim3 SysA | Sim3 SysB | NWT SysA | NWT SysB | Total SysA | Total SysB<br />
Reformulations | 6 (37.5) | 11 (68.8) | 12 (75.0) | 16 (100.0) | 10 (62.5) | 15 (93.8) | 14 (87.5) | 11 (68.8) | 42 (65.6) | 53 (82.8)<br />
No reformulations | 10 (62.5) | 5 (31.3) | 4 (25.0) | 0 (0.0) | 6 (37.5) | 1 (6.3) | 2 (12.5) | 5 (31.3) | 22 (34.4) | 11 (17.2)<br />
Total | 16 | 16 | 16 | 16 | 16 | 16 | 16 | 16 | 64 | 64<br />


Table 8.20 Number of reformulations in sessions<br />

 | Sim1 | Sim2 | Sim3 | NWT | Total<br />
SysA | 2.50 (n=6) | 3.33 (n=12) | 5.60 (n=10) | 3.86 (n=14) | 3.93 (n=42)<br />
SysB | 4.82 (n=11) | 6.06 (n=16) | 4.67 (n=15) | 4.64 (n=11) | 5.11 (n=53)<br />
Total | 4.00 (n=17) | 4.89 (n=28) | 5.04 (n=25) | 4.20 (n=25) | 4.59 (N=95)<br />

Legend: Sessions without reformulations have been excluded from the present table, which makes<br />
N=95. That implies that of the total of 128 sessions in the search test, 95 had reformulations.<br />

session level with one exception. The genuine search task is the only task with fewer<br />
sessions containing reformulations in system B than in system A. However, the number of reformulations<br />
is still high. Further, sim1, system A is the only example of a task where the number of<br />
sessions without reformulations surpasses the number of sessions with reformulations.<br />
In sessions with reformulations the general average number of reformulations was 4.59,<br />
somewhat lower for system A searches (3.93) and somewhat higher for system B searches (5.11) (see Table<br />
8.20, rightmost column). Table 8.20 also shows that sim3 had<br />
a higher average of reformulations in system A. For the remainder of the tasks, system<br />
B had the highest average number of reformulations. As concluded in section 8.2.1.1, it<br />
required more queries to retrieve relevant documents in system B.<br />
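
The “Total” figures in Table 8.20 can be reproduced as session-weighted means of the two system averages. The sketch below checks this for sim1 and for the overall total. The function name pooled_mean is an assumption, and small deviations are possible because the inputs are already rounded to two decimals.<br />

```python
from typing import Iterable, Tuple


def pooled_mean(groups: Iterable[Tuple[float, int]]) -> float:
    """Combine (mean, n) pairs into one mean weighted by the number of sessions."""
    groups = list(groups)
    total_sessions = sum(n for _, n in groups)
    return sum(mean * n for mean, n in groups) / total_sessions


# (average number of reformulations, sessions with reformulations) from Table 8.20
SIM1 = [(2.50, 6), (4.82, 11)]      # system A, system B
OVERALL = [(3.93, 42), (5.11, 53)]  # system A total, system B total

print(f"sim1:    {pooled_mean(SIM1):.2f}")
print(f"overall: {pooled_mean(OVERALL):.2f}")
```

Weighting by sessions matters here because the two systems contribute different numbers of sessions with reformulations (42 versus 53).<br />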

Types of reformulations add to our understanding of the search moves carried<br />
out by the test persons. We have analysed each reformulation as to whether the category,<br />
the search terms, the document type, or the search operator was changed, whether several<br />
parameters were changed, or whether no reformulation occurred (mostly in the first query of a<br />
session) (see Table 8.21). As mentioned before, changes of search operators are rare in<br />
both systems. In system A the overall preferred reformulation is a change of search<br />
terms. Next follow a change of the document type and a simultaneous change of two or<br />
more parameters. As discussed in section 8.2.1.3, the search operator is rarely used as a<br />
single reformulation move. The document type filter<br />
is used far more in system A than in system B, most likely because it is the only way of<br />
reducing search results in system A apart from changing the search terms or the search<br />
operator. Thus, the test persons actually used the available options for modifying<br />
their search results. Further, the regular use of the document type filter emphasizes the<br />
importance and relevance of the filter.<br />
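
The coding scheme described above compares each query with its predecessor and records which parameter changed. A minimal Python sketch of such a classifier is given here. The thesis does not describe the coding procedure at this level of detail, so the query representation (a dict with the hypothetical keys category, terms, document_type and operator) and the function name are assumptions, not the procedure actually used on the search log.<br />

```python
from typing import Optional

# Query parameters compared between consecutive queries (hypothetical field
# names), mapped to the row labels used in Table 8.21.
PARAMETERS = {
    "category": "Category",
    "terms": "Query terms",
    "document_type": "Document type",
    "operator": "Search operators",
}


def classify_reformulation(previous: Optional[dict], current: dict) -> str:
    """Label the move from `previous` to `current` with a Table 8.21 row name."""
    if previous is None:
        return "No reformulations"  # first query of a session
    changed = [
        label
        for key, label in PARAMETERS.items()
        if previous.get(key) != current.get(key)
    ]
    if not changed:
        return "No reformulations"
    if len(changed) > 1:
        return ">1 types simultaneously"
    return changed[0]


# Invented example session, not taken from the search log.
first = {"terms": ("freight", "customs value"), "operator": "AW", "document_type": None}
second = {"terms": ("freight", "customs value"), "operator": "AW", "document_type": "Legal guidances"}
third = {"terms": ("carriage",), "operator": "FT", "document_type": "Legal guidances"}

print(classify_reformulation(None, first))
print(classify_reformulation(first, second))
print(classify_reformulation(second, third))
```

System A queries simply never carry a category value, so the same function would cover both systems under these assumptions.<br />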

In system B the preferred reformulation was a change of categories, closely<br />
followed by a combination of two or more parameters. Next in terms of frequency<br />

Table 8.21 Types of reformulations for all queries (percentages)<br />

Reformulation type | Sim1 SysA | Sim1 SysB | Sim2 SysA | Sim2 SysB | Sim3 SysA | Sim3 SysB | NWT SysA | NWT SysB | Total SysA | Total SysB<br />
No reformulations | 16 (51.6) | 16 (23.2) | 16 (28.6) | 15 (13.3) | 20 (27.8) | 15 (17.4) | 17 (24.3) | 16 (23.9) | 69 (30.1) | 62 (18.5)<br />
Category | - | 15 (21.7) | - | 46 (40.7) | - | 41 (47.7) | - | 12 (17.9) | - | 114 (34.0)<br />
Query terms | 11 (35.5) | 15 (21.7) | 28 (50.0) | 6 (5.3) | 23 (31.9) | 8 (9.3) | 35 (50.0) | 18 (26.9) | 97 (42.4) | 47 (14.0)<br />
Document type | - | - | 4 (7.1) | 6 (5.3) | 18 (25.0) | 1 (1.2) | 6 (8.6) | 1 (1.5) | 28 (12.2) | 8 (2.4)<br />
Search operators | 1 (3.2) | 1 (1.4) | 3 (5.4) | 2 (1.8) | - | - | 4 (5.7) | 2 (3.0) | 8 (3.5) | 5 (1.5)<br />
>1 types simultaneously | 3 (9.7) | 22 (31.9) | 5 (8.9) | 38 (33.6) | 11 (15.3) | 21 (24.4) | 8 (11.4) | 18 (26.9) | 27 (11.8) | 99 (29.6)<br />
Total | 31 (100) | 69 (100) | 56 (100) | 113 (100) | 72 (100) | 86 (100) | 70 (100) | 67 (100) | 229 (100) | 335 (100)<br />

followed a change of query terms, while document type and search operators were<br />
rarely used as query modifiers. Here it is evident that categories are important, which is<br />
to be expected as they were mandatory in system B. In addition, categories were to a<br />
large extent combined with other parameters. Most commonly a change of category<br />
was combined with a change of search terms (see table 6, Appendix 28). This reflects<br />
the design of the system, where only categories with content were shown to the<br />
searchers. Thus, when search terms were changed, the available categories were<br />
likely to change as well, as the categories reflected the list of retrieved documents. This also<br />
explains the importance of a change of query terms as a reformulation.<br />

The division of search tasks in Table 8.21 shows some individual<br />
characteristics. One characteristic is the use of categories across system B queries.<br />
Categories were used approximately twice as much in sim2 and sim3 as in<br />
sim1 and the genuine work task. As the categories were not combined with other<br />
modification tools here, the number refers to queries where the test persons clicked<br />
different categories on the basis of the same query terms to find relevant documents.<br />


Table 8.22 Query success on the basis of types of reformulations (percentages)<br />

Reformulation type | System A Success | System A Failure | Total system A | System B Success | System B Failure | Total system B<br />
Category | - | - | - | 24 (21.1) | 90 (78.9) | 114 (100.0)<br />
Query terms | 22 (22.7) | 75 (77.3) | 97 (100.0) | 5 (10.6) | 42 (89.4) | 47 (100.0)<br />
Document type | 9 (32.1) | 19 (67.9) | 28 (100.0) | 1 (12.5) | 7 (87.5) | 8 (100.0)<br />
Search operators | 1 (12.5) | 7 (87.5) | 8 (100.0) | 1 (20.0) | 4 (80.0) | 5 (100.0)<br />
>1 types simultaneously | 11 (40.7) | 16 (59.3) | 27 (100.0) | 19 (19.2) | 80 (80.8) | 99 (100.0)<br />

The success of the respective types of reformulations has been summed up in<br />
Table 8.22. Overall, system A has a higher share of successful reformulations than<br />
system B. At the level of types of reformulations the best performance is<br />
achieved in system A by changing several parameters simultaneously. Here, about 40 % of queries<br />
manage to retrieve relevant documents. Next follows a change of the<br />
document type filter settings. In system B the variance in performance was smaller than in<br />
system A. Here the test persons had less success in improving their outputs by<br />
changing query terms or document types (10.6 % and 12.5 % successful queries), while the two most frequent<br />
reformulation types, categories and combined changes, accounted for roughly the same share of successful queries (21.1 % and 19.2 %).<br />
Categories, search operators, and a combination of query modifiers had the best<br />
performance within the system, but their performance was below the percentages gained<br />
in system A. Thus, within system B we may conclude that reformulations based on a<br />
change of categories perform better than the remaining modification<br />
tools. System A reformulations were most successful when they consisted of a<br />
combination of several parameters simultaneously. However, the share of successful<br />
reformulations leaves room for improvement in both systems.<br />

8.2.3 Combined system B sessions and queries<br />

During the course of the search test, test persons occasionally ended up assessing<br />
documents before choosing a category in system B queries. The behavior had different<br />
causes. One cause was the speed of the system: while waiting for the<br />
system to categorize search results, some test persons began to review the documents<br />
found on the basis of the initial query. On other occasions the test persons saw<br />
the document they were looking for in the results list before even deciding on a category<br />
to reduce search results by, and ended up assessing the initial search results without<br />
filtering by a category. We denote these searches as combined system B searches. The<br />
following quote serves as an illustration of combined system B searches:<br />

“But the first time I searched, I got an e-commerce handbook. I would have<br />
preferred that to going down there [“down there” refers to the categorization window<br />
on the right hand side of the screen]” (TP10, line 10-11).<br />

In several cases where a highly relevant document had been discovered before the choice<br />
of a category in system B, the test persons could not locate the document in the<br />
categories, which occasionally led to frustration. To exemplify:<br />

“It is just as bad, because it says “Arrears”. And “Employers”, and it is<br />
neither of them. So let’s see about “Employers”… Because it says “Employers and A-taxes”.<br />
And it is withheld by the A-taxes, just like our employers withhold our taxes. I<br />
simply can’t find it. I know it is in there. But on the basis of this, I can’t get in there.<br />
Because when I know where it is at, I would go directly for it instead.” (TP05, line 113-<br />
117).<br />

A third type of behavior also triggered combined system B queries. It has previously<br />
been observed that system B searches tended to be narrow. When the initial query<br />
resulted in very few search results, it did not seem natural to the test persons to further<br />
reduce an already limited result set. Some test persons undertook the<br />
categorization despite the few results, while others omitted the categorization and<br />
assessed the results retrieved on the basis of the remaining search possibilities.<br />

“It says just that... Well, the costs to the European border should be included<br />
in the customs value. The other one regarding transportation, I can see that it is<br />


explained with great precision. But in this case I did not search for “Customs” down<br />
here [in the categories]. I got it by searching for freight and customs value and “pages<br />
with all words”. And then I got the customs guidance, which is also the one referring to<br />
the customs codes treating the rules about the amount of carriage to add. So this<br />
[document] is a three then. But I didn’t get it by searching for “Business imports” or<br />
“Shipping” or “Exports” [referring to categories]” (TP32, line 295-301)<br />

The quote illustrates, in a combined system B search with just two retrieval results, how<br />
the test person ends up assessing the documents retrieved without categorization.<br />

The combined system B queries and sessions were coded as system B searches<br />
inasmuch as the test persons had access to the taxonomy and could be influenced by it.<br />
For methodological reasons, however, an overview of the extent of these queries must be provided.<br />
To make this possible, additional codes were added that separate them from<br />
the regular system B queries. Reporting on the extent of combined system B sessions is<br />
the purpose of the present section. Table 8.23 lists the share of combined system B<br />
sessions. The table shows that about 60 % of the system B sessions contained one or<br />
more queries omitting categories. It is also evident from the table that approximately 60<br />
% of the successful sessions in system B had at least one query that did not include the<br />
choice of a category. The extent of sessions that to some degree pass over the<br />
categorization is thus substantial.<br />
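
The session-level coding described above can be sketched in a few lines of Python: a system B session counts as combined as soon as at least one of its queries was assessed without a category. The representation of a session as a list of chosen categories, with None for an omitted category, and the function name code_session are assumptions made for illustration.<br />

```python
from typing import Optional, Sequence


def code_session(categories: Sequence[Optional[str]]) -> str:
    """Code one system B session from the category chosen in each of its queries.

    None stands for a query whose results were assessed without a category.
    """
    if any(category is None for category in categories):
        return "Combined system B"
    return "System B"


# Invented example sessions, not taken from the search log.
sessions = [
    ["Customs", "Business imports"],
    [None, "Employers and A-taxes"],
    [None],
]
codes = [code_session(session) for session in sessions]
print(codes)

# Share of combined sessions reported in Table 8.23: 38 of 64 sessions.
print(f"{100 * 38 / 64:.1f} %")
```

Under this rule a single query without a category is enough to move a session into the combined group, which is one reason the combined share is as large as it is.<br />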

Table 8.24 elaborates on combined system B sessions. The table shows which<br />
system delivered successful results for the queries contained in these sessions. In that way the<br />

Table 8.23 Sessions carried out in system B, or in a combination of system B and system A:<br />
Frequency and success (percentages)<br />

 | Number of sessions in system B | Number of successful sessions in system B<br />
System B | 26 (40.6) | 22 (40.7)<br />
Combined system B sessions | 38 (59.4) | 32 (59.3)<br />
Total | 64 (100.0) | 54 (100.0)<br />

Legend: “System B” denotes sessions that have been carried out in system B exclusively. “Combined<br />
system B sessions” refers to the sessions that should have been carried out in system B, but where<br />
test persons have assessed the relevance of documents found in system A and in system B.<br />

Table 8.24 System of successful queries in combined system B sessions<br />

 | Frequency | Percent<br />
Task not solved | 6 | 15.8<br />
System A | 13 | 34.2<br />
System B | 15 | 39.5<br />
Both systems applied | 4 | 10.5<br />
Total | 38 | 100.0<br />

Legend: The table lists the systems that have provided documents with a relevance score of 2 or 3 in<br />
combined system B sessions. That explains why N=38.<br />

table addresses the sessions based on a combination of the two test systems. It shows<br />
that although a combined system B session has included queries conducted in<br />
system A and system B, both systems have not necessarily provided useful search<br />
results. The share of successful sessions is fairly even between the two systems: 13<br />
sessions were solved by omitting categories, and 15 sessions had success by including the<br />
categories in their queries. Only 4 sessions found relevant documents by means of both<br />
systems. It also means that the test persons may have omitted the categorization in<br />
some queries of a session, while it may still have been by means of categorization that the relevant<br />
documents were found.<br />

Table 8.25 extends the prior table and presents the share of successes at query<br />
level. The table presents all queries carried out in system B, both distinct system B<br />
queries and combined system B queries. Though the test persons in a number of cases<br />
found the categorization irrelevant, it was still used in approximately two thirds of the<br />
queries (see rightmost column). In addition, when calculated in terms of the<br />

Table 8.25 System B queries: Frequency of category use and query success (percentages)<br />

 | Success | Failure | Total<br />
Queries with categories | 52 (24.2) | 163 (75.8) | 215 (100.0)<br />
Queries without categories | 20 (16.7) | 100 (83.3) | 120 (100.0)<br />
Total | 72 | 263 | 335<br />

Legend: The table contains all queries processed in system B, both regular system B queries and<br />
combined system B queries (N=335).<br />



share of successful queries, queries including categories had a better performance (24.2<br />
% of queries with success) than queries omitting categorization (16.7 % of queries were<br />
successful). Summing up on combined system B searches, more than half of the system B<br />
sessions included system A queries to some extent. However, at query level for all<br />
system B queries, queries including a category had a larger chance of succeeding than<br />
queries that basically corresponded to system A queries.<br />

In the post-search interviews the test persons were asked to assess system B<br />
(see interview guide in Appendix 19). In the responses we found answers as to when the<br />
categorization was useful and when it was not. The answers are analysed in the present<br />
section in order to elaborate further on the results gained from the search log presented<br />
above.<br />

There was an overall agreement between the test persons that the<br />
categorization was mainly useful when they had a large set of results. TP21 said on the<br />
basis of a query with 14 results:<br />

“It did not help me so much there, because the query didn’t have that many<br />
results. It was possible to cope with the documents there, whether the categorization<br />
had been there or not. Only 14 documents were retrieved. You could cope with that. It<br />
is mainly helpful, when you get large results, a thousand documents or so” (TP21, line<br />
257-260)<br />

The retrieval set size at which the categorization became useful varied. Some<br />

mentioned 40 documents; others, like TP21, mentioned far more. Categorization was also found<br />

useful for generating new perspectives on the composition of a query and for<br />

understanding the facets of the search task. This supports the decision to code<br />

combined system B queries and sessions as system B queries and sessions in the overall<br />

coding of the search log. One example is TP02, who would have liked to have access to<br />

the categorization in a system A session:<br />

“At the end I would have liked to be able to go over there [into the<br />

categorization], because no matter what I did, I could not find anything. And then I need<br />

somewhere else to search, where I have the option of seeing other sub-topics, in order<br />

to perhaps access it that way.” (TP02, line 625-633).<br />

TP09 supports the statement of TP02 in discussing a system B session:


“It worked well there, because suddenly I found a principal topic that I could<br />

click on. And that gave me that… Hey! Yes! That has to do with company taxation. So it<br />

also helped me thinking what this is at all” (TP09, line 553-555)<br />

The findings confirm Käki’s findings (though those were based on extracted categorization, see<br />

section 5.4.1.5) that categorization is helpful when “…the original query was vague, broad, general, or<br />

contained words that have multiple meanings...” (Käki, 2005b, p. 138). Still, the test<br />

persons of the present search test discussed whether the categorization was more useful to<br />

people with some or no insight into the topic of the tasks. TP06 knew what to look for<br />

in one of the tasks:<br />

“I knew that if I was to look for something about the taxation, then I would<br />

also know something about independent businesses. And then I could go in there faster.<br />

So I knew that I should choose “Personal incomes” over “Capital income” [examples<br />

of categories]. I know the tax rules. So it is easier to choose between the categories,<br />

when the answer is known in advance” (TP06, line 392-395)<br />

TP20, on the other hand, did not find much help in the categorization:<br />

“But I don’t know, if I would ever start going through all this [the categories]. I<br />

think it takes more time, because I don’t know what is behind. If I was a specialist in<br />

SKAT and knew all about company tax settlements or the like, then it [the<br />

categorization] might be perfect for me. Because then I would know that I can go in<br />

there exactly, click that, and get the documents out. But I don’t know if it would forget<br />

about some documents that I need, if it limits the results too much”. (TP20, line 339-344)<br />

TP24 sums up the usefulness both for users with extensive knowledge of the task topic and<br />

for users with less knowledge:<br />

“If I know what I am looking for, or at least think I know where to go [in the<br />

categories], then it is really good. But when I don’t know, it might also be good,<br />

because you get to try out different keywords [taxonomy terms]. But if you have the<br />

wrong keyword, you will definitely not find it that way.” (TP24, line 320-323)<br />

The difference of opinion may be due to a lack of insight into the<br />

functionalities of the system and into the structure and content of the taxonomy. Thus,<br />

a considerable number of the test persons mentioned lack of experience with the test<br />

system as an important reason when they experienced difficulties locating relevant<br />

documents. The difficulties can be read from Table 8.21 above. Here 34 % of all system<br />

B reformulations consist of changing the category, meaning that test persons clicked<br />

around between categories with no simultaneous changes to the remainder of the search<br />

options. In other cases the trouble experienced by the test persons was caused by<br />

apparently curious categorizations offered by system B. One example was the presence<br />

of the taxonomy term “Tonnage taxes” in a query regarding property gain taxes (TP13).<br />

We have already mentioned the varying sizes of the documents of the collection and the<br />

importance of the document type directions to the employees, a very large document<br />

type. The finding suggests that in collections with large documents, the documents<br />

should be indexed in smaller units to obtain precision of search results. On the other<br />

hand, when categorizing search results that are already very limited, as<br />

was the case in many system B searches, the results of the categorization may also be<br />

skewed. Be it lack of experience with the categorization in system B, too narrow<br />

queries or odd suggestions for categories, we consider all three as explanations for the<br />

generally increased number of queries in system B sessions described in section 8.2.1.1.<br />

TP14 summarizes the discussion by saying:<br />

“Once you begin to get an idea, what the categories are, what they stand for…<br />

Then you fumble, until you find out what it is. Are there more roads leading to Rome,<br />

or which is the fastest, or… Well, it is an adaptation with some things. What is the<br />

wisest thing to do…” (TP14, line 493-495)<br />

8.3 Summary and performance implications for future indexing in e-government<br />

The purpose of the present chapter was to answer research questions 2 and 3 regarding<br />

the comparative performance of automatic indexing in terms of extracted versus<br />

assigned indexing, and the implications for future indexing guidelines in e-government.<br />

Above, the focus has been on research question 2 and the results of the search test. In<br />

this summary we will unify the conclusions drawn in the respective sections of the


chapter in order to be able to state the implications of the results for e-government<br />

indexing guidelines (research question 2.10). The general figures of the test (section<br />

8.2) demonstrated a better performance of system A in terms of fewer terms and<br />

concepts in queries, fewer sessions with reformulations, fewer queries in sessions with<br />

reformulations, and a higher share of success in sessions and queries. However, a more<br />

detailed analysis of figures and interviews provided a more differentiated picture. Thus,<br />

when broken down by search task, system B was equal to or above system A in some tasks. This<br />

was the case for the genuine work tasks in terms of session success, query success, the<br />

number of queries in sessions and the number of queries in successful sessions. In<br />

sim3, system B outperformed system A in terms of a lower average number of search<br />

terms and search keys in queries, both as regards total numbers and successful queries.<br />

The analysis also detected a higher use of the document type filter in system A, which<br />

was explained by the reduced number of query composition tools in system A. In<br />

addition, the search log revealed that more than half of system B sessions included one or<br />

more queries omitting categorization. The search log and the search interviews revealed<br />

different reasons for the omissions: when search results were too few, when a relevant<br />

document was discovered in the list of results while waiting for the system to categorize<br />

results, or, related to the previous, when a highly relevant document was found among the<br />

first results before a category was chosen.<br />

Different causes were found for the lower general performance of system B.<br />

One reason was the test persons’ difficulties in handling the search operators available<br />

in the prototype. Significantly more restrictions were applied in system B queries,<br />

resulting in at times very few search results, and also reducing the assessment of the<br />

documents retrieved. Another reason was found in the post-search interviews. Here<br />

lack of experience with the categorization features of system B was a frequent<br />

explanation for the difficulties experienced. Furthermore, some test persons found it<br />

difficult to identify from the label which documents were contained in the respective<br />

categories. The findings emphasize the importance of users’ familiarity with the design<br />

and functionality of retrieval systems. The outcome of the difficulties could be detected<br />

in the types of reformulations carried out. To explain, about one third of all reformulations<br />

carried out in system B were based on a change of categories alone.<br />

Opposite understandings also existed among the test persons, though. Overall,<br />

categorization was useful when there was a certain number of documents to categorize.<br />

With few results it was easier to look through the documents manually. System B was<br />

also useful when the employees had some knowledge of the search task topic. Then it<br />

was considered easier to assess the relevance of the categories, as the labels of the<br />

categories made sense. However, the categorization of system B was also beneficial<br />

when test persons had limited knowledge of the search task at hand. In those cases<br />

categories helped the test persons discover and understand facets contained in the task.<br />

Here it is important to make clear that limited knowledge should be understood as<br />

generalist knowledge of the organization’s topics.<br />

As appears from the concluding remarks, the use and omission of categorization<br />

in solving search tasks is not the same for all users, even though they may find themselves<br />

within the same domain, as in the case study carried out in the thesis. On the basis of<br />

the search test it is concluded that at times free text indexing as represented in system A<br />

is preferred by users. This is in particular the case when they know precisely what to<br />

look for. In these situations metadata like the type of the document is helpful in<br />

composing queries of high precision. When few documents of high precision are the<br />

result of a query, the employees prefer searching by metadata. What has also become<br />

evident during the test is the employees’ emphasis on document types, both when composing<br />

queries and when assessing query outputs. The employees had extensive insight into the<br />

range of documents on the intranet, as specific document types were often the outcome<br />

of a work process. That stresses the importance of metadata in e-government, when<br />

composing queries, but also in document snippets of search results.<br />

The overall implication of the search test for indexing guidelines in e-government<br />

is that both extracted and assigned automatic indexing types should be present in<br />

professional e-government. As appeared from chapter 5, categorization has primarily<br />

been tested on the www, but as demonstrated in the search test, smaller and more<br />

specialized systems may also benefit from the indexing approach. From the domain study<br />

we learned that verificative information needs and conscious topical information needs<br />

are prevalent among e-government employees. Extracted indexing in terms of free text<br />

indexing has proven to be particularly useful for the verificative type of information needs,<br />

while categorization was used more when the test persons needed ideas for search terms<br />

or perspectives on the work task at hand, that is, for the conscious topical needs.<br />

Assigned indexing in the form of categorization assisted the users in their<br />

information seeking when they needed ideas for query reformulation or when they had<br />

difficulties interpreting the concepts contained in a search task. In future e-government<br />

indexing both indexing approaches should be represented in order to meet the diversity<br />

of information need types identified in the domain study. The search test has also<br />

emphasized the importance of users’ familiarity with the KOS (in this case the


taxonomy). When the employees could not relate to the categories, it was easier to<br />

go manually through a number of results than it was to click through a number of categories to<br />

find something relevant to the task at hand.<br />


9 Conclusion and recommendations for future work<br />

The purpose of the thesis was to investigate if and how automatic indexing can improve<br />

professional e-government users’ access to digitalized, work based information. To do<br />

this, the preceding chapters have reviewed, investigated, and analysed indexing,<br />

information seeking and searching in e-government from a professional, user based<br />

perspective. Chapter 2 explained the methodological standpoint of the thesis. Chapters<br />

3, 4, and 5 reviewed the e-government domain, e-government seeking behaviour, and<br />

indexing methods respectively. The review chapters served the purpose of guiding the<br />

empirical investigations. In chapter 6 we outlined and accounted for the empirical<br />

designs, data collection and analysis of the two overall empirical studies of the thesis,<br />

the domain study and the search test. The results of the studies were reported and<br />

analysed in Chapters 7 and 8. Chapter 7 addressed research question 1 concerning<br />

professional e-government seeking behaviour and the related indexing demands by<br />

accounting for the results of the domain study. Chapter 8 was concerned with the<br />

search test. In doing so, it answered research questions 2 and 3 concerning the domain specific<br />

performance of two indexing methods and the related implications for indexing<br />

guidelines within the domain. The purpose of the present chapter is to<br />

unify the thesis’ threads in order to answer the research questions put forward in<br />

Chapter 1. In section 9.1 we summarize the empirical findings of the thesis. Section<br />

9.2 makes recommendations for future work.<br />

9.1 Summary of empirical findings<br />

The domain study found that the e-government employees applied a myriad<br />

of mainly electronic information sources in their daily work. The predominant source<br />

was the intranet. It had the highest use across all work tasks, while the use of other types of<br />

sources depended on the work task at hand. The general prevalence of the intranet<br />

supports our choice of it as test system for the search test. Apart from direct<br />

information sources, both the open field of the questionnaire and the focus group<br />

participants indicated extensive use of colleagues as sources of information.<br />

The employees had extensive work experience within SKAT. With a long length<br />

of service in the organization, information seeking predominantly took


place at regular intervals, though not all the time. The employees demonstrated a<br />

good basic insight into their work topics on the basis of their experience. However,<br />

particularly the employees engaged in citizen service had experienced that their work<br />

tasks had changed with the introduction of self-service. The change had reduced<br />

the memorizing of rules and regulations, as the employees were less in contact with<br />

citizens. The result was an increased need for verifying and updating information.<br />

Beyond that, the general frequency of information seeking of approximately every<br />

3rd or 4th time a work task is handled is attributable to employee routine and topic insight. To<br />

conclude, the study confirms the expected changes of employees’ work tasks with the<br />

introduction of e-government, at least regarding employees occupied with servicing<br />

citizens.<br />

The main reasons for consulting the intranet were verificative and conscious<br />

topical information needs. A few work tasks from the administrative parts of the<br />

organization stood out with a high share of more complex information needs in terms of<br />

muddled topical needs, but they were exceptions to the general picture. It must be taken<br />

into account, though, that the information needs questions of the<br />

questionnaire specifically concerned the intranet, and that other information needs may<br />

occur in relation to other information sources. However, the results correspond well to<br />

the experience and insight of the employees and to the conclusion drawn above that<br />

employees often check up on information and rules to make sure they are updated. In<br />

addition, the results regarding information needs were verified by the focus groups.<br />

Concerning metadata, the domain study found an extensive need for metadata<br />

among the employees. Part of the reason for requiring more and higher quality<br />

metadata originated from a general difficulty of locating relevant documents in the<br />

current intranet. The difficulties often made the employees consult a colleague instead<br />

of the intranet. In particular, content metadata in terms of subject metadata were<br />

requested by the employees for the intranet, but other types were also requested by many<br />

employees. Thus, a general interest in metadata existed among the employees.<br />

The findings emphasize the importance of high quality markup of documents for<br />

effective sharing of knowledge in e-government.<br />

On the basis of the findings of the domain study, the following demands for<br />

indexing were deduced. As both verificative and conscious topical needs were<br />

dominant among the employees when consulting the intranet, both contextual and<br />

content metadata should be represented in the indexing. Part of the definition of a<br />

verificative information need is that the user wants to locate a document on the basis of<br />

some kind of known bibliographic information. This calls for contextual metadata.<br />

Conscious topical needs, in turn, are solved by exploring aspects of a known<br />

subject matter. Here both content and contextual metadata are appropriate. To conclude, e-government<br />

employees can gain from both types of metadata in terms of their information needs,<br />

which is why both should be represented in the indexing. In addition, the dissatisfaction<br />

with the search outcomes of the present intranet was remarkable, and in many cases it<br />

resulted in giving up and consulting colleagues instead. For indexing guidelines it is<br />

emphasized that not only are metadata needed, they also need to be carefully added in<br />

order to ensure quality. Quality is a prerequisite for employees to be able to carry out<br />

effective and efficient information seeking.<br />

The search test had its main focus on content metadata in terms of the subject<br />

categorization tested. However, metadata in terms of document types also turned out to<br />

be important to the test persons during the test. On the basis of the domain study<br />

findings regarding information needs, three low complexity simulated search tasks<br />

guided the test searches along with one genuine information need brought by each test<br />

person. Both simulated and genuine search tasks were simple in terms of the number of<br />

concepts included. Hence, all tasks consisted of three topical concepts or fewer.<br />

At a general level the search test found system B (comprising categorization) to<br />

have more terms per query on average (2.43 versus 2.25 in system A), more<br />

concepts per query on average (1.90 versus 1.67), and a lower share of queries applying the<br />

document type filter (31.6 % versus 43.2 %). Furthermore, it required more work from the test<br />

persons to achieve success in system B. Here the share of sessions with reformulations was<br />

82.8 % versus 65.6 % in system A, and the average number of reformulations was higher (4.23 versus<br />

2.58 in system A). At session level system B was equal to or above system A in 3 of<br />

the 4 tasks in terms of the number of successful sessions. In terms of queries the total<br />

number of successful queries was fairly even between the two systems, though the<br />

number of failed queries was significantly higher in system B than in system A.<br />

To conclude, the effort required to locate relevant documents in system B was<br />

significantly higher.<br />

Further, a general finding of the study was that queries with fewer terms were<br />

more likely to succeed. That indicates that the test persons are very good at finding<br />

relevant search terms. It also means that combining the terms with a category carries a risk<br />

of over-restricting results. This could indicate that a less specific taxonomy could be<br />

useful to the employees, at least in a relatively small database like the one tested here.


Different causes were found for the increased effort required to retrieve relevant<br />

documents in system B. The test persons consistently used more search terms with the<br />

more restrictive search operator “Pages containing all words” and fewer search terms<br />

with the less restrictive search operator “Free text”. This shows that the test persons<br />

had difficulties understanding the meaning of the two predominant operators of the<br />

system. However, in terms of search results, system B further reduced the results in the<br />

categorization in order to complete the query, resulting in very limited search results,<br />

while the same operation in system A resulted in faster retrieval of relevant documents,<br />

because no further restrictions were added to the query. Further, in terms of system B,<br />

some test persons expressed trouble finding suitable categories in the categorization to<br />

match their queries due to lack of knowledge of the taxonomy. The trouble was<br />

also identified in the analysis of types of reformulations in system B, where a change of<br />

category alone accounted for 40 % of all reformulations in sim2, 47 % in<br />

sim3, and 34 % in total. The results stress the importance of an appropriate and<br />

meaningful level of detail in controlled vocabularies. However, the results also stress<br />

that though the employees are considered experienced information searchers, they may<br />

be confused by the meaning of Boolean operators. To compare, the average number of<br />

queries using the document type filter was higher in system A, though with large<br />

variations at task level. However, the use reflected a better understanding of the use of<br />

the document type filter as a query tool in the two systems.<br />
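
The share of reformulations that changed only the category can be derived from a coded list of reformulations. A minimal sketch in Python, assuming a hypothetical coding in which each reformulation records its task and the set of query elements that changed; the labels and the toy records are assumptions made for illustration, not the coding scheme or data of the thesis:

```python
def category_only_share(reformulations, task=None):
    """Percentage of reformulations whose only change was the category.

    Each reformulation is a (task, changed_elements) pair, where
    changed_elements is a set of labels such as {"category", "terms"}.
    Pass `task` to restrict the count to one search task.
    """
    selected = [changed for t, changed in reformulations
                if task is None or t == task]
    if not selected:
        return 0.0
    only_category = sum(1 for changed in selected if changed == {"category"})
    return 100.0 * only_category / len(selected)


# Invented toy coding, not data from the thesis.
toy = [
    ("sim2", {"category"}), ("sim2", {"terms"}),
    ("sim3", {"category"}), ("sim3", {"category", "terms"}),
    ("sim3", {"operator"}), ("sim3", {"category"}),
    ("sim1", {"terms"}), ("sim1", {"terms", "operator"}),
]

print(category_only_share(toy, task="sim2"))
print(category_only_share(toy))
```

A reformulation that changes the category together with the search terms is deliberately not counted, which mirrors the distinction drawn above between clicking around between categories and reworking the remainder of the query.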

The omission of categorization in one third of system B queries was a result of<br />

the test persons’ difficulties. Analyses of the queries carried out in system B showed a<br />

fairly even distribution of successful sessions as to whether the session had been solved<br />

by means of categorization or not. At query level, queries including a category were<br />

successful in 24.2 % of cases, while queries that omitted categories had a<br />

success rate of 16.7 %. The interviews explained the omissions of<br />

categories. Categorization was not supportive in queries where a highly relevant result<br />

came out among the first results. Neither was it relevant if a very small set of results<br />

was retrieved. In those cases the categorization was considered inconvenient to the<br />

retrieval process, as it was easier to look through the results manually instead of<br />

deciding on the correct category. On the other hand, categorization was useful in<br />

suggesting new search terms for a query.<br />

Overall, it is concluded that there is a basis for implementing categorization in information systems supporting professional e-government users. Metadata based extracted indexing is important for successful retrieval in the domain too, in order to support verificative information needs.

9.2 Contributions of the thesis

The contributions of the thesis in terms of the theoretical and empirical framework are identified as follows:

A confirmation of the previously unverified assumption in the e-government domain that the work tasks of e-government employees change as a result of increased self-service among external stakeholders (Snellen, 2002; Dörfler, 2003; Marchionini, Samet & Brandt, 2003; Brown, 2005; Landsforeningen af Kommunale Servicecentre, 2005; Mahler & Regan, 2005). The results have shown that, at least for employees engaged in servicing citizens, the need to verify information has increased, as less is memorized when tasks are less routine. For LIS, the consequences of the increased information seeking needed to remain updated are important.

As regards the information seeking of professional e-government users, it has been outlined that the user group has not been well explored. In the light of the changes in work tasks just mentioned, updated knowledge of e-government employees was needed. The thesis has added to our knowledge of the user group in terms of their:

use of information sources

frequency of information seeking

metadata preferences

predominant types of information needs developed and how these needs are met by means of contextual and content metadata

searching behavior

Regarding the performance of automatic categorization in the domain, several things have been learned:

Categorization supports users in tasks where a new perspective on the task is needed, either by suggesting new search terms or by offering an understanding of the facets contained in the search task. In addition, categorization supports users in reducing large search results. In verificative searches categorization is less useful if highly relevant documents are retrieved quickly. In those cases categorization reduces efficiency.

Categorization has primarily been tested in larger collections than the present test collection. The present results show that categorization is also useful in smaller document collections.

However, in order to support the user group, the KOS must express an appropriate level of specificity. In addition, the categorization of documents must be correct and match the employees' understanding of the domain.

9.3 Recommendations for future work

In continuation of the conclusions drawn above, the present section suggests recommendations for future work. Although the thesis has added to our knowledge about professional e-government information seeking and the ability of automatic categorization to support this behavior, new questions have arisen that remain to be answered. The suggestions are divided in two: suggestions regarding the empirical setting and suggestions regarding the tools applied in the study.

The empirical setting of the thesis was a case study of SKAT, the largest government agency in Denmark. We have touched upon the employees' long length of service and its implications for information needs and seeking behavior. This is not necessarily a general tendency. It would therefore be interesting to investigate whether the behavior is different in smaller government organizations.

As a consequence of the information needs characterized in the domain study, low complexity simulated search tasks were used as the point of departure of the search test, along with one genuine search task. It was found that the genuine search task was the only task for which system B performed better on some variables. For that reason it would be enriching to explore the performance of e-government categorization in a study designed to reflect genuine search tasks to a larger extent. In this connection another question arises. It was not within the aim of the test design to establish the performance of categorization for more complex information needs. A study investigating only complex information needs in e-government would add further to our knowledge of the performance of categorization in the domain. Lastly, in relation to the empirical setting, the search test provided an insight into employees' use, for a short amount of time, of a system that was new to them. This has advantages, which have been outlined in the empirical framework. However, a study investigating categorization in a more natural setting could add other perspectives to our understanding of the field.

The search test applied different tools, and these tools have also raised questions for future work. The categorization made use of a two-level taxonomy for arranging search results. The test persons put forward different opinions as to the specificity of the taxonomy. As it was not part of the purpose and design of the search test to address this question, we have not been able to validate the cause of the differences. For professional users it would therefore be interesting to gain more knowledge of the appropriate specificity and choice of concepts within KOS such as taxonomies. In addition, the project has investigated automatically assigned indexing in terms of automated categorization and found it supportive of e-government seeking behavior. Investigations of other types of assigned indexing would increase our knowledge of the relative performance of different assigned indexing methods in the domain.

This thesis adds to our knowledge about professional e-government seeking behavior and increases our understanding of how this behavior can be supported by automatic indexing.


10 References


Abecker, A., Bernardi, A., Hinkelmann, K., Kühn, O. & Sintek, M. (1998). Toward a technology for organizational memories. IEEE Intelligent Systems, 13(3), 40-48.

Ahlgren, P. & Kekäläinen, J. (2007). Indexing strategies for Swedish full text retrieval under different user scenarios. Information Processing & Management, 43(1), 81-102.

Aitchison, J. (1992). Indexing languages and indexing. In: Dossett, P. (Ed.), Handbook of Special Librarianship and Information Work (6. ed., pp. 191-233). London: Aslib.

Alasem, A. (2009). An overview of e-Government metadata standards and Initiatives based on Dublin Core. Electronic Journal of e-Government, 7(1), 1-10.

Alavi, M. & Leidner, D.E. (2001). Review: Knowledge management and knowledge management systems: Conceptual foundations and research issues. MIS Quarterly, 25(1), 107-136.

Albrechtsen, H. (1993). Subject analysis and indexing: from automated indexing to domain analysis. The Indexer, 18(4), 219-224.

Andersen, K.V., Grönlund, Å., Moe, C.E. & Sein, M.K. (2005). Introduction to the special issue. Scandinavian Journal of Information Systems, 17(2), 3-10.

Andersen, K.V. & Kraemer, K.L. (1994). Information technology and transitions in the public service: A comparison of Scandinavia and the United States. Scandinavian Journal of Information Systems, 6(1), 3-24.

Anderson, J.D. & Perez-Carballo, J. (2001a). The nature of indexing: How humans and machines analyze messages and texts for retrieval. Part I: Research, and the nature of human indexing. Information Processing & Management, 37(2), 231-254.

Anderson, J.D. & Perez-Carballo, J. (2001b). The nature of indexing: How humans and machines analyze messages and texts for retrieval. Part II: Machine indexing, and the allocation of human versus machine effort. Information Processing & Management, 37(2), 255-277.

Anderson, J.D. & Pérez-Carballo, J. (2005). Information Retrieval Design: Principles and Options for Information Description, Organization, Display, and Access in Information Retrieval Databases, Digital Libraries, Catalogs, and Indexes. St. Petersburg: Ometeca Institute.

Andrews, J. & Duhon, L. (1997). GILS, Government Information Locator Service: Blending old and new to access U.S. governmental information. The Serials Librarian, 31(1-2), 327-333.

Apté, C., Damerau, F. & Weiss, S.M. (1994). Automated learning of decision rules for text categorization. ACM Transactions on Information Systems, 12(3), 233-251.

Arellano-Gault, D. & del Castillo-Vega, A. (2004). Maturation of public administration in a multicultural environment: Lessons from the Anglo-Saxon, Latin, and Scandinavian political traditions. International Journal of Public Administration, 27(7), 519-528.

Askim, J. (2007). How do politicians use performance information? An analysis of the Norwegian local government experience. International Review of Administrative Sciences, 73(3), 453-472.

Askim, J. (2009). The demand side of performance measurement: Explaining councillors' utilization of performance information in policymaking. International Public Management Journal, 12(1), 24-47.

Attar, K.E. (2006). Why appoint professionals? A student cataloguing project. Journal of Librarianship and Information Science, 38(3), 173-185.

Attfield, S., Blandford, A. & Makri, S. (2010). Social and interactional practices for disseminating current awareness information in an organisational setting. Information Processing & Management, 46(6), 632-645.

Bates, M.J. (1979). Information search tactics. Journal of the American Society for Information Science, 30(4), 205-214.

Becker, J., Pfeiffer, D. & Räckers, M. (2007). Domain specific process modelling in public administrations: The PICTURE approach. In: Wimmer, M.A., Scholl, H.J. & Grönlund, Å. (Eds.), EGOV 2007 (pp. 68-79). Berlin: Springer.

Beghtol, C. (1986). Bibliographic classification theory and text linguistics: aboutness analysis, intertextuality and the cognitive act of classifying documents. Journal of Documentation, 42(2), 84-113.

Bekkers, V. & Homburg, V. (2007). The myths of e-government: Looking beyond the assumptions of a new and better government. Information Society, 23(5), 373-382.

Belkin, N.J. & Croft, W.B. (1992). Information filtering and information retrieval: Two sides of the same coin? Communications of the ACM, 35(12), 29-38.

Belkin, N.J., Oddy, R.N. & Brooks, H.M. (1982). ASK for information retrieval: Part 1. Background and theory. Journal of Documentation, 38(2), 61-71.

Bellamy, C. (2002). From automation to knowledge management: Modernizing British government with ICTs. International Review of Administrative Sciences, 68(2), 213-230.

Bellamy, C. & Taylor, J.A. (1998). Governing in the Information Age. Buckingham: Open University Press.

Berrios, D.C., Cucina, R.J. & Fagan, L.M. (2002). Methods for semi-automated indexing for high precision information retrieval. Journal of the American Medical Informatics Association, 9(6), 637-652.

Bertot, J.C., Jaeger, P.T. & Grimes, J.M. (2010). Using ICTs to create a culture of transparency: E-government and social media as openness and anti-corruption tools for societies. Government Information Quarterly, 27(3), 264-271.

Beynon-Davies, P. (2007). Models for e-government. Transforming Government: People, Process and Policy, 1(1), 7-28.

Bigdeli, Z. (2007). Iranian engineers' information needs and seeking habits: An agro-industry company experience. Information Research, 12(2).

Blair, D.C. (2002). The challenge of commercial document retrieval, Part I: Major issues, and a framework based on search exhaustivity, determinacy of representation and document collection size. Information Processing & Management, 38(2), 273-291.

Blair, D.C. & Maron, M.E. (1985). An evaluation of retrieval effectiveness for a full-text document-retrieval system. Communications of the ACM, 28(3), 289-299.

Blomgren, L., Vallo, H. & Byström, K. (2004). Evaluation of an information system in an information seeking process. In: Heery, R. & Lyon, L. (Eds.), ECDL 2004 (pp. 57-68). Berlin: Springer.

Bloomfield, M. (2002). Indexing: Neglected and poorly understood. Cataloging & Classification Quarterly, 33(1), 63-75.

Bloor, M., Frankland, J., Thomas, M. & Robson, K. (2001). Focus Groups in Social Research. London: Sage.

Borko, H. (1977). Toward a theory of indexing. Information Processing & Management, 13, 355-365.

Borlund, P. (2000). Experimental components for the evaluation of interactive information retrieval systems. Journal of Documentation, 56(1), 71-90.

Borlund, P. (2003a). The concept of relevance in IR. Journal of the American Society for Information Science and Technology, 54(10), 913-925.

Borlund, P. (2003b). The IIR evaluation model: A framework for evaluation of interactive information retrieval systems. Information Research, 8(3).

Borlund, P. & Ingwersen, P. (1997). The development of a method for the evaluation of interactive information retrieval systems. Journal of Documentation, 53(3), 225-250.

Borlund, P. & Schneider, J.W. (2010). Reconsideration of the simulated work task situation: A context instrument for evaluation of information retrieval interaction. In: Belkin, N.J. & Kelly, D. (Eds.), IIiX 2010. New Brunswick, New Jersey: ACM.

Bountouri, L., Papatheodorou, C., Soulikias, V. & Stratis, M. (2009). Metadata interoperability in public sector information. Journal of Information Science, 35(2), 204-231.

Box, R.C. (1999). Running government like a business: Implications for public administration theory and practice. The American Review of Public Administration, 29(1), 19-43.

Brown, D. (2005). Electronic government and public administration. International Review of Administrative Sciences, 71(2), 241-254.

Buckingham, A. & Saunders, P. (2004). The Survey Methods Workbook: From Design to Analysis. Cambridge: Polity Press.

Byström, K. (1997). Municipal administrators at work: Information needs and seeking (IN&S) in relation to task complexity: A case-study amongst municipal officials, Information Seeking in Context. Tampere, Finland: Taylor Graham.

Byström, K. (1999). Task Complexity, Information Types and Information Sources. Unpublished Doctoral dissertation, University of Tampere, Tampere.

Byström, K. (2002). Information and information sources in tasks of varying complexity. Journal of the American Society for Information Science and Technology, 53(7), 581-591.

Byström, K. & Hansen, P. (2005). Conceptual framework for tasks in information studies. Journal of the American Society for Information Science and Technology, 56(10), 1050-1061.

Byström, K. & Järvelin, K. (1995). Task complexity affects information seeking and use. Information Processing & Management, 31(2), 191-213.

Carey, M.A. & Smith, M.W. (1994). Capturing the group effect in focus groups: A special concern in analysis. Qualitative Health Research, 4(1), 123-127.

Carmines, E.G. & Woods, J.A. (2005). Reliability assessment. In: Encyclopedia of Social Measurement (Vol. 3, pp. 361-365).

Case, D.O. (2006). Information behavior. Annual Review of Information Science and Technology, 40, 293-327.

Case, D.O. (2007). Looking for Information: A Survey of Research on Information Seeking, Needs, and Behavior. Amsterdam: Elsevier.

Center for effektivisering og digitalisering (2002). Prospekt for FESD (Fællesoffentlig Elektronisk Sags- og Dokumenthåndtering). Retrieved 13-03, 2011, from http://modernisering.dk/fileadmin/user_upload/documents/Projekter/FESD/Baggrund/FESD-prospekt.pdf.

Chau, M., Fang, X. & Sheng, O.R.L. (2007). What are people searching on government web sites? A study of search activity on the Utah.gov web site. Communications of the ACM, 50(4), 87-92.

Chaudhry, A.S. (2010). Assessment of taxonomy building tools. The Electronic Library, 28(6), 769-788.

Chen, H. (1995). Machine learning for information retrieval: Neural networks, symbolic learning, and genetic algorithms. Journal of the American Society for Information Science, 46(3), 194-216.

Choi, Y. (2010a). Enhancing access to the Web: Vocabulary analysis on users' tags and professionals' index terms, iConference. University of Illinois at Urbana-Champaign, Illinois, U.S.A.

Choi, Y. (2010b). Traditional versus emerging knowledge organization systems: Consistency of subject indexing of the Web by indexers and taggers, ASIST 2010. Pittsburgh, PA, USA.

Choo, C.W. (2006). The Knowing Organization: How Organizations Use Information to Construct Meaning, Create Knowledge, and Make Decisions (2. ed.). New York: Oxford University Press.

Choo, C.W., Furness, C., Paquette, S., van den Berg, H., Detlor, B., Bergeron, P. & Heaton, L. (2006). Working with information: Information management and culture in a professional services organization. Journal of Information Science, 32(6), 491-510.

Chowdhury, G.G. (2003). Natural language processing. Annual Review of Information Science and Technology, 37, 51-89.

Chowdhury, G.G. (2004). Introduction to Modern Information Retrieval (2. ed.). London: Facet.

Christian, E. (1999). Experiences with information locator services. Journal of Government Information, 26(3), 271-285.

Christian, E. (2001). A metadata initiative for global information discovery. Government Information Quarterly, 18(3), 209-221.

Clark, H.H. & Schober, M.F. (1992). Asking questions and influencing answers. In: Tanur, J.M. (Ed.), Questions about Questions: Inquiries into the Cognitive Bases of Surveys (pp. 15-48). New York: Russell Sage Foundation.

Cleverdon, C. (1967). The Cranfield tests on index language devices. Aslib Proceedings, 19(6), 173-194.

Cleverdon, C. & Keen, M. (1966). Aslib Cranfield research project. Factors determining the performance of indexing systems. Volume 2: Test results. Cranfield: College of Aeronautics.

Cleverdon, C.W. (1960). ASLIB Cranfield Research Project: Report on the first stage of an investigation into the comparative efficiency of indexing systems. Cranfield: College of Aeronautics.

Codagnone, C. & Wimmer, M.A. (Eds.). (2007). Roadmapping eGovernment Research: Visions and Measures towards Innovative Governments in 2020. [Koblenz]: eGovRTD2020 Project Consortium.

Cole, C. & Leide, J. (2006). A cognitive framework for human information behavior: The place of metaphor in human information organizing behavior. In: Spink, A. & Cole, C. (Eds.), New Directions in Human Information Behavior (Vol. 8, pp. 171-202). Netherlands: Springer.

Cong, X. & Pandya, K.V. (2003). Issues of knowledge management in the public sector. Electronic Journal of Knowledge Management, 1(2), 25-33.

Connaway, L.S., Dickey, T.J. & Radford, M.L. (2011). "If it is too inconvenient I'm not going after it": Convenience as a critical factor in information-seeking behaviors. Library & Information Science Research, 33(3), 179-190.

Cook, C., Heath, F. & Thompson, R. (2000). A meta-analysis of response rates in Web- or internet-based surveys. Educational and Psychological Measurement, 60(6), 821-836.

Cooper, W.S. (1969). Is interindexer consistency a hobgoblin? American Documentation, 20(3), 268-279.

Courtright, C. (2007). Context in information behavior research. Annual Review of Information Science and Technology, 41, 273-306.

Cousins, S.A. (1992). Enhancing subject access to OPACs: Controlled vocabulary vs. natural language. Journal of Documentation, 48(3), 291-309.

Coyle, K. (2008). Machine indexing. The Journal of Academic Librarianship, 34(6), 530-531.

Crawford, J. & Irving, C. (2009). Information literacy in the workplace: A qualitative exploratory study. Journal of Librarianship and Information Science, 42(1), 29-38.

Croft, W.B., Turtle, H.R. & Lewis, D.D. (1991, October 13-16). The use of phrases and structured queries in information retrieval. In: Bookstein, A., Chiaramella, Y., Salton, G. & Raghavan, V.V. (Eds.), Proceedings of the 14th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 32-45). Chicago, Illinois, USA. New York: ACM.

Cuillier, D. & Piotrowski, S.J. (2009). Internet information-seeking and its relation to support for access to government records. Government Information Quarterly, 26(3), 441-449.

Cunningham, S.J., Littin, J. & Witten, I.H. (1997). Applications of Machine Learning in Information Retrieval (Working Paper 97/6). Hamilton, New Zealand: The University of Waikato, Department of Computer Science.

Davies, K. (2007). The information-seeking behaviour of doctors: A review of the evidence. Health Information and Libraries Journal, 24(2), 78-94.

Dawes, S.S. (2009). Governance in the digital age: A research and action framework for an uncertain future. Government Information Quarterly, 26(2), 257-264.

de Groot, D. (2003). Vigorous knowledge management in the Dutch public sector. In: Wimmer, M.A. (Ed.), 4th IFIP International Working Conference, KMGov 2003 (pp. 94-99). Springer.

de Jong, M. & Lentz, L. (2006). Municipalities on the Web: User-Friendl<strong>in</strong>ess of<br />

Government Information on the Internet. In: Wimmer, M., Scholl, H., Grönlund,<br />

Å. & Andersen, K. (Eds.), Electronic Government, 5th International<br />

Conference, EGOV 2006 (pp. 174-185). Berl<strong>in</strong>: Spr<strong>in</strong>ger.<br />

De Mey, M. (1977). The cognitive viewpo<strong>in</strong>t: Its development and its scope. In: De<br />

Mey, M., P<strong>in</strong>xten, R., Poriau, m. & Vandamme, F. (Eds.), International<br />

Workshop on the Cognitive Viewpo<strong>in</strong>t (pp. xvi-xxxii). Ghent, Belgium:<br />

University of Ghent.<br />

de Vaus, D. (2002b). Surveys <strong>in</strong> Social Research (5. ed.). London: Routledge.<br />

Del Fiol, G., Haug, P.J., Cim<strong>in</strong>o, J.J., Narus, S.P., Norl<strong>in</strong>, C. & Mitchell, J.A. (2008).<br />

Effectiveness of topic-specific <strong>in</strong>fobuttons: A randomized controlled trial.<br />

Journal of the American Medical Informatics Association, 15(6), 752-759.<br />

Dempsey, L. & Heery, R. (1998). Metadata: A current view of practice and issues.<br />

Journal of Documentation, 54(2), 145-172.<br />

Dias, C. (2001). Corporate portals: A literature review of a new concept <strong>in</strong> Information<br />

Management. International Journal of Information Management, 21(4), 269-<br />

287.<br />

Dietterich, T.G. (1997). Mach<strong>in</strong>e-learn<strong>in</strong>g research: Four current directions. AI<br />

Magaz<strong>in</strong>e, 18(4), 97-136.<br />

du Plessis, T. & du Toit, A.S.A. (2006). Knowledge management and legal practice.<br />

International Journal of Information Management, 26(5), 360-371.<br />

Dubois, C.P.R. (1987). Free text versus controlled vocabulary. Onl<strong>in</strong>e Review, 11(4),<br />

243-253.<br />

Dumais, S., Platt, J., Heckerman, D. & Sahami, M. (1998). Inductive learn<strong>in</strong>g<br />

algorithms and representations for text categorization. In: Makki, K. &<br />

Bouganim, L. (Eds.), CIKM '98 Proceed<strong>in</strong>gs of the seventh <strong>in</strong>ternational


conference on Information and knowledge management (pp. 148-155). New<br />

York: ACM.<br />

Dörfler, A. (2003). Bus<strong>in</strong>ess process modell<strong>in</strong>g and help systems as part of KM <strong>in</strong> e-<strong>government</strong>.<br />

In: Wimmer, M.A., (Ed.), KMGov, (pp. 297-303).<br />

Edmiston, K.D. (2003). State and local e-<strong>government</strong>: Prospects and challenges. The<br />

American Review of Public Adm<strong>in</strong>istration, 33(1), 20-45.<br />

Edmunds, A. & Morris, A. (2000). The problem of <strong>in</strong>formation overload <strong>in</strong> bus<strong>in</strong>ess<br />

organisations: a review of the literature. International Journal of Information<br />

Management, 20(1), 17-28.<br />

Efron, M., Elsas, J., Marchion<strong>in</strong>i, G. & Zhang, J. (2004). Mach<strong>in</strong>e learn<strong>in</strong>g for<br />

<strong>in</strong>formation architecture <strong>in</strong> a large <strong>government</strong>al website. In: Proceed<strong>in</strong>gs of the<br />

4th ACM/IEEE-CS jo<strong>in</strong>t conference on Digital libraries, (pp. 151-159). Tucson,<br />

AZ, USA.<br />

El-Sherb<strong>in</strong>i, M. & Klim, G. (2004). Metadata and catalog<strong>in</strong>g practices. The Electronic<br />

Library, 22(3), 238-248.<br />

Ellis, D. (1989). A behavioural approach to <strong>in</strong>formation retrieval system design. Journal<br />

of Documentation, 45(3), 171-212.<br />

Elwood, S. (2008). Grassroots groups as stakeholders <strong>in</strong> spatial data <strong>in</strong>frastructures:<br />

Challenges and opportunities for local data development and shar<strong>in</strong>g.<br />

International Journal of Geographical Information Science, 22(1), 71-90.<br />

Ely, M. (1991). Do<strong>in</strong>g Qualitative Research: Circles With<strong>in</strong> Circles. London:<br />

Routledge.<br />

Eppler, M.J. & Mengis, J. (2004). The concept of <strong>in</strong>formation overload: A review of<br />

literature from organization science, account<strong>in</strong>g, market<strong>in</strong>g, MIS, and related<br />

discipl<strong>in</strong>es. The Information Society, 20(5), 325-344.<br />

Evans, J.R. & Mathur, A. (2005). The value of onl<strong>in</strong>e surveys. Internet Research, 15(2),<br />

195-219.<br />

Fag<strong>in</strong>, R., Kumar, R., McCurley, K.S., Novak, J., Sivakumar, D., Toml<strong>in</strong>, J.A. &<br />

Williamson, D.P. (2003, May 20–24). Search<strong>in</strong>g the workplace web. In:<br />

WWW2003: Proceed<strong>in</strong>gs of the 12th <strong>in</strong>ternational conference on World Wide<br />

Web, (pp. 366-375). Budapest, Hungary.<br />

Fang, Z. (2002). E-<strong>government</strong> <strong>in</strong> digital era: Concept, practice, and development.<br />

International Journal of The Computer, The Internet and Management, 10(2), 1-<br />

22.<br />

Fangmeyer, H. (1974). Semi <strong>Automatic</strong> Index<strong>in</strong>g: State of the Art. Neuilly Sur Se<strong>in</strong>e,<br />

France: North Atlantic Treaty Organization.<br />

Feldman, S. & Sherman, C. (2001). The High Cost of Not F<strong>in</strong>d<strong>in</strong>g Information.<br />

Retrieved 21-03, 2010, from<br />

http://www.ejitime.com/materials/IDC%20on%20The%20High%20Cost%20Of<br />

%20Not%20F<strong>in</strong>d<strong>in</strong>g%20Information.pdf.<br />

Fidel, R. (1994). User-centred <strong><strong>in</strong>dex<strong>in</strong>g</strong>. Journal of the American Society for<br />

Information Science, 45(8), 572-576.<br />

Floropoulos, J., Spathis, C., Halvatzis, D. & Tsipouridou, M. (2010). Measur<strong>in</strong>g the<br />

success of the Greek Taxation Information System. International Journal of<br />

Information Management, 30(1), 47-56.<br />

Ford, F.N. (1985). Decision support systems and expert systems: A comparison.<br />

Information & Management, 8(1), 21-26.<br />

Foster, A. & Ford, N. (2003). Serendipity and <strong>in</strong>formation seek<strong>in</strong>g: An empirical study.<br />

Journal of Documentation, 59(3), 321-340.<br />

Fourie, I. (2009). Learn<strong>in</strong>g from research on the <strong>in</strong>formation behaviour of healthcare<br />

professionals: A review of the literature 2004–2008 with a focus on emotion.<br />

Health Information and Libraries Journal, 26(3), 171-186.<br />


Fox, C. (1989). A stop list for general text. Newsletter ACM SIGIR Forum, 24(1-2), 19-<br />

35.<br />

Fox, C. (1992). Lexical analysis and stoplists. In: Frakes, W.B. & Baeza-Yates, R.<br />

(Eds.), Information Retrieval: Data Structures & Algorithms (pp. 102-130).<br />

Englewood Cliffs, New Jersey: Prentice Hall.<br />

Frakes, W.B. (1992). Stemm<strong>in</strong>g algorithms. In: Frakes, W.B. & Baeza-Yates, R. (Eds.),<br />

Information Retrieval: Data Structures & Algorithms (pp. 131-160). Englewood<br />

Cliffs, New Jersey: Prentice Hall.<br />

Frankfort-Nachmias, C. & Nachmias, D. (1996). Research Methods <strong>in</strong> the Social<br />

Sciences (5. ed.). London: Arnold.<br />

Freund, L., Toms, E.G. & Waterhouse, J. (2005). Model<strong>in</strong>g the <strong>in</strong>formation behaviour<br />

of software eng<strong>in</strong>eers us<strong>in</strong>g a work-task framework. Proceed<strong>in</strong>gs of the<br />

American Society for Information Science and Technology, 42(1).<br />

Fu, J.-R., Farn, C.-K. & Chao, W.-P. (2006). Acceptance of electronic tax fil<strong>in</strong>g: A<br />

study of taxpayer <strong>in</strong>tentions. Information & Management, 43(1), 109-126.<br />

Fugmann, R. (1993). Subject Analysis and Index<strong>in</strong>g: Theoretical Foundation and<br />

Practical Advice. Frankfurt/Ma<strong>in</strong>: Indeks Verlag.<br />

Galvez, C., de Moya-Anegon, F. & Solana, V.H. (2005). Term conflation methods <strong>in</strong><br />

<strong>in</strong>formation retrieval: Non-l<strong>in</strong>guistic and l<strong>in</strong>guistic approaches. Journal of<br />

Documentation, 61(4), 520-547.<br />

Garcia, A.C., Dawes, M.E., Kohne, M.L., Miller, F.M. & Groschwitz, S.F. (2006).<br />

Workplace studies and technological change. Annual Review of Information<br />

Science and Technology, 40(1), 393-437.<br />

Gil-Garcia, J.R. & Mart<strong>in</strong>ez-Moyano, I.J. (2007). Understand<strong>in</strong>g the evolution of e-<strong>government</strong>:<br />

The <strong>in</strong>fluence of systems of rules on public sector dynamics.<br />

Government Information Quarterly, 24, 266-290.<br />

Gilchrist, A. (2001). Corporate taxonomies: Report on a survey of current practice.<br />

Onl<strong>in</strong>e Information Review, 25(2), 94-103.<br />

Gilchrist, A. (2003). Thesauri, taxonomies and ontologies: An etymological note. Journal<br />

of Documentation, 59(1), 7-18.<br />

Gilliland-Swetland, A. (2005). Electronic records management. Annual Review of<br />

Information Science and Technology, 39, 219-253.<br />

Gilliland, A.J. (2008). Sett<strong>in</strong>g the stage. In: Baca, M. (Ed.), Introduction to Metadata<br />

(Onl<strong>in</strong>e version, ver. 3.0 ed.).<br />

Glassey, O. (2002). A one-stop <strong>government</strong> prototype based on use cases and scenarios.<br />

In: Traunmüller, R. & Lenk, K. (Eds.), EGOV 2002 (pp. 116-123).<br />

Glassey, O. (2004). Develop<strong>in</strong>g a one-stop <strong>government</strong> data model. Government<br />

Information Quarterly, 21(2), 156-169.<br />

Glazer, R. (1993). Measur<strong>in</strong>g the value of <strong>in</strong>formation: The <strong>in</strong>formation-<strong>in</strong>tensive<br />

organization. IBM Systems Journal, 32(1), 99-110.<br />

Goh, D.H.-L., Chua, A.Y.-K., Luyt, B. & Lee, C.S. (2008). Knowledge access, creation<br />

and transfer <strong>in</strong> e-<strong>government</strong> portals. Onl<strong>in</strong>e Information Review, 32(3), 348-369.<br />

Golder, S.A. & Huberman, B.A. (2006). Usage patterns of collaborative tagg<strong>in</strong>g<br />

systems. Journal of Information Science, 32(2), 198-208.<br />

Golub, K. (2006). Automated subject classification of textual web documents. Journal<br />

of Documentation, 62(3), 350-371.<br />

Golub, K. (2007). Automated Subject Classification of Textual Documents <strong>in</strong> the<br />

Context of Web-Based Hierarchical Brows<strong>in</strong>g. Unpublished PhD thesis, Lund<br />

University, Lund.<br />

Gomez, L.M., Lochbaum, C.C. & Landauer, T.K. (1990). All the right words: F<strong>in</strong>d<strong>in</strong>g<br />

what you want as a function of richness of <strong><strong>in</strong>dex<strong>in</strong>g</strong> vocabulary. Journal of the<br />

American Society for Information Science, 41(8), 547-559.


Gouscos, D., Lambrou, M., Mentzas, G. & Georgiadis, P. (2003). A methodological<br />

approach for def<strong>in</strong><strong>in</strong>g one-stop e-<strong>government</strong> service offer<strong>in</strong>gs. In: Traunmüller,<br />

R. (Ed.), Electronic Government (pp. 173-176). Berl<strong>in</strong>: Spr<strong>in</strong>ger.<br />

Grant, G. & Chau, D. (2005). Develop<strong>in</strong>g a generic framework for e-<strong>government</strong>.<br />

Journal of Global Information Management, 13(1), 1-30.<br />

Greenbaum, T.L. (1993). The Handbook for Focus Group Research (Revised and<br />

expanded ed.). New York: Lex<strong>in</strong>gton.<br />

Gross, T. & Taylor, A.G. (2005). What have we got to lose? The effect of controlled<br />

vocabulary on keyword search<strong>in</strong>g results. College & Research Libraries, 66(3),<br />

212-230.<br />

Grundén, K. (2009). A social perspective on implementation of e-<strong>government</strong>: A<br />

longitud<strong>in</strong>al study at the County Adm<strong>in</strong>istration of Sweden. Electronic Journal<br />

of e-Government, 7(1), 65-76.<br />

Grönlund, Å. (2003). Emerg<strong>in</strong>g electronic <strong>in</strong>frastructures: Explor<strong>in</strong>g democratic<br />

components. Social Science Computer Review, 21(1), 55-72.<br />

Grönlund, Å. (2005). What's <strong>in</strong> a field: Explor<strong>in</strong>g the eGovernment doma<strong>in</strong>,<br />

Proceed<strong>in</strong>gs of the 38th Hawaii International Conference on System Sciences.<br />

Grönlund, Å. (2010). Ten years of e-<strong>government</strong>: The 'end of history' and new<br />

beg<strong>in</strong>n<strong>in</strong>g. In: Wimmer, M.A. et al. (Eds.), Electronic Government (pp. 13-24):<br />

Spr<strong>in</strong>ger.<br />

Grönlund, Å. & Horan, T.A. (2004). Introduc<strong>in</strong>g e-gov: history, def<strong>in</strong>ition, and issues.<br />

Communications of the Association for Information Systems, 15, 713-729.<br />

Gunnlaugsdottir, J. (2008). Register<strong>in</strong>g and search<strong>in</strong>g for records <strong>in</strong> electronic records<br />

management systems. International Journal of Information Management, 28(4),<br />

293-304.<br />

Ha, L. & Zenebe, A. (2008). Knowledge management <strong>in</strong> <strong>government</strong>, The 2nd<br />

International Conference <strong>in</strong> Knowledge Generation,<br />

Communication and Management. Orlando, Florida: International Institute of<br />

Informatics and Systemics.<br />

Halcomb, E.J. & Davidson, P.M. (2006). Is verbatim transcription of <strong>in</strong>terview data<br />

always necessary? Applied Nurs<strong>in</strong>g Research, 19(1), 38–42.<br />

Halkier, B. (2008). Fokusgrupper (2. ed.). Frederiksberg: Samfundslitteratur.<br />

Hammarström, H. (2006). A naive theory of affixation and an algorithm for extraction.<br />

In: Wicentowski, R. & Kondrak, G. (Eds.), SIGPHON '06: Proceed<strong>in</strong>gs of the<br />

Eighth Meet<strong>in</strong>g of the ACL Special Interest Group on Computational Phonology<br />

and Morphology (pp. 79-88). Stroudsburg: Association for Computational<br />

L<strong>in</strong>guistics.<br />

Harman, D.K. & Voorhees, E.M. (2006). TREC: An overview. Annual Review of<br />

Information Science and Technology, 40, 113-155.<br />

Hawk<strong>in</strong>g, D. (2004). Challenges <strong>in</strong> enterprise search. In: Proceed<strong>in</strong>gs of the 15th<br />

Australasian database conference, (pp. 15-24). Duned<strong>in</strong>, New Zealand.<br />

Hayes, P.J. & We<strong>in</strong>ste<strong>in</strong>, S.P. (1990). Construe-TIS: A system for content-based<br />

<strong><strong>in</strong>dex<strong>in</strong>g</strong> of a database of news stories. In: Rappaport, A. & Smith, R. (Eds.),<br />

The Second Conference on Innovative Applications of Artificial Intelligence<br />

(IAAI), Wash<strong>in</strong>gton, DC. Menlo Park, California: AAAI Press.<br />

Haynes, D. (2004). Metadata for Information Management and Retrieval. London:<br />

Facet.<br />

Hazlett, S.A., McAdam, R. & Beggs, V. (2008). An exploratory study of knowledge<br />

flows: A case study of Public Sector Procurement. Total Quality Management,<br />

19(1-2), 57-66.<br />

He, J., Shu, B., Li, X. & Yan, H. (2010). Effective Time Ratio: A Measure for Web<br />

Search Eng<strong>in</strong>es with Document Snippets.


Information Retrieval Technology. In: Cheng, P.-J., Kan, M.-Y., Lam, W. & Nakov, P.<br />

(Eds.), 6th Asia Information Retrieval Societies Conference, AIRS 2010, Taipei,<br />

Taiwan, December 1-3, 2010. Proceed<strong>in</strong>gs (Vol. 6458, pp. 73-84). Berl<strong>in</strong>:<br />

Spr<strong>in</strong>ger.<br />

Healey, J.F. (2007). The Essentials of Statistics: A Tool for Social Research. Belmont,<br />

CA: Thomson Higher Education.<br />

Hedlund, T. (2002). Compounds <strong>in</strong> dictionary-based cross-language <strong>in</strong>formation<br />

retrieval. Information Research, 7(2).<br />

Heeks, R. & Bailur, S. (2006). Analyz<strong>in</strong>g eGovernment Research: Perspectives,<br />

Philosophies, Theories, Methods and Practice (Vol. 16, iGovernment Work<strong>in</strong>g<br />

Paper Series). Manchester: University of Manchester, Institute for Development<br />

Policy and Management.<br />

Heeks, R. & Bailur, S. (2007). Analyz<strong>in</strong>g e-<strong>government</strong> research: Perspectives,<br />

philosophies, theories, methods, and practice. Government Information<br />

Quarterly, 24(2), 243-265.<br />

Helbig, N., Dawes, S.S., Mulki, F.H., Hrd<strong>in</strong>ova, J.L. & Cook, M.E. (2008).<br />

International Digital Government Research: A Reconnaissance Study: Center<br />

for Technology <strong>in</strong> Government, University at Albany, SUNY.<br />

Henriksen, H.Z. & Damsgaard, J. (2006). The rise and descent of visions for e-<strong>government</strong>.<br />

In: Donnellan, B., Larsen, T.J., Lev<strong>in</strong>e, L. & DeGross, J.I. (Eds.),<br />

The Transfer and Diffusion of Information Technology for Organizational<br />

Resilience: IFIP TC8 WG 8.6 International Work<strong>in</strong>g Conference, June 7-10,<br />

2006, Galway, Ireland (pp. 275-289). New York: Spr<strong>in</strong>ger.<br />

Hertzum, M., Andersen, H.H.K., Andersen, V. & Hansen, C.B. (2002). Trust <strong>in</strong><br />

<strong>in</strong>formation sources: Seek<strong>in</strong>g <strong>in</strong>formation from people, documents, and virtual<br />

agents. Interact<strong>in</strong>g with Computers, 14(5), 575-599.<br />

Hertzum, M. & Pejtersen, A.M. (2000). The <strong>in</strong>formation-seek<strong>in</strong>g practices of eng<strong>in</strong>eers:<br />

search<strong>in</strong>g for documents as well as for people. Information Process<strong>in</strong>g &<br />

Management, 36(5), 761-778.<br />

Hjørland, B. (2002). Doma<strong>in</strong> analysis <strong>in</strong> <strong>in</strong>formation science: Eleven approaches<br />

traditional as well as <strong>in</strong>novative. Journal of Documentation, 58(4), 422-462.<br />

Hjørland, B. & Albrechtsen, H. (1995). Toward a new horizon <strong>in</strong> <strong>in</strong>formation science:<br />

Doma<strong>in</strong> analysis. Journal of the American Society for Information Science,<br />

46(6), 400-425.<br />

Höchstötter, N. & Koch, M. (2009). Standard parameters for search<strong>in</strong>g behaviour <strong>in</strong><br />

search eng<strong>in</strong>es and their empirical evaluation. Journal of Information Science,<br />

35(1), 45-65.<br />

Hodge, G.M. (1994). Computer-assisted database <strong><strong>in</strong>dex<strong>in</strong>g</strong>: The state-of-the-art. The<br />

Indexer, 19(1), 23-27.<br />

Homburg, V. (2004). E-<strong>government</strong> and NPM: a perfect marriage? In: Janssen, M., Sol,<br />

H.G. & Wagenaar, R.W., (Eds.), ICEC '04 Proceed<strong>in</strong>gs of the 6th <strong>in</strong>ternational<br />

conference on Electronic commerce, (pp. 547-555). New York: ACM.<br />

Hovy, E. (2008a). An outl<strong>in</strong>e for the foundations of digital <strong>government</strong> research. In:<br />

Chen, H., Brandt, L., Gregg, V., Traunmüller, R., Dawes, S., Hovy, E.,<br />

Mac<strong>in</strong>tosh, A. & Larson, C.A. (Eds.), Digital Government: E-<strong>government</strong><br />

Research, Case studies, and Implementation (pp. 43-59). New York: Spr<strong>in</strong>ger.<br />

Hu, G., Pan, W. & Wang, J. (2010). The dist<strong>in</strong>ctive lexicon and consensual conception<br />

of e-Government: an exploratory perspective. International Review of<br />

Adm<strong>in</strong>istrative Sciences, 76(3), 577-597.<br />

Hu, P.J.-H., Brown, S.A., Thong, J.Y.L., Chan, F.K.Y. & Tam, K.Y. (2008).<br />

Determ<strong>in</strong>ants of service quality and cont<strong>in</strong>uance <strong>in</strong>tention of onl<strong>in</strong>e services: The<br />

case of eTax. Journal of the American Society for Information Science and<br />

Technology, 60(2), 292-306.


Hu, P.J.-H., Hsu, F.-M., Hu, H.-f. & Chen, H. (2010). Agency satisfaction with<br />

electronic record management systems: A large-scale survey. Journal of the<br />

American Society for Information Science and Technology, 61(12), 2559-2574.<br />

Huang, J. & Efthimiadis, E.N. (2009). Analyz<strong>in</strong>g and evaluat<strong>in</strong>g query reformulation<br />

strategies <strong>in</strong> web search logs, Proceed<strong>in</strong>gs of the 18th ACM conference on<br />

Information and knowledge management. Hong Kong, Ch<strong>in</strong>a: ACM.<br />

Humphrey, S.M. (1989). Research on <strong>in</strong>teractive knowledge-based <strong><strong>in</strong>dex<strong>in</strong>g</strong>: The<br />

MedIndEx prototype. Proceed<strong>in</strong>gs of the Annual Symposium on Computer<br />

Application <strong>in</strong> Medical Care, 527-533.<br />

Hunter, J. (2009). Collaborative semantic tagg<strong>in</strong>g and annotation systems. Annual<br />

Review of Information Science and Technology, 43, 187-239.<br />

Iivonen, M. (1995). Consistency <strong>in</strong> the selection of search concepts and search terms.<br />

Information Process<strong>in</strong>g & Management, 31(2), 173-190.<br />

Ingwersen, P. (1986a). Cognitive analysis and the role of the <strong>in</strong>termediary <strong>in</strong><br />

<strong>in</strong>formation retrieval. In: Davies, R. (Ed.), Intelligent Information Systems:<br />

Progress and Prospects (pp. 206-237). Chichester: Horwood.<br />

Ingwersen, P. (1992). Information Retrieval Interaction. London: Taylor Graham.<br />

Ingwersen, P. (1994). Systemudvikl<strong>in</strong>g i et <strong>in</strong>-house miljø: Folket<strong>in</strong>gets<br />

emneordssystem som case-studie. Biblioteksarbejde, 41, 5-23.<br />

Ingwersen, P. (1996). Cognitive perspectives of <strong>in</strong>formation retrieval <strong>in</strong>teraction:<br />

elements of a cognitive IR theory. Journal of Documentation, 52(1), 3-50.<br />

Ingwersen, P. (1999). Cognitive <strong>in</strong>formation retrieval. Annual Review of Information<br />

Science and Technology, 34, 3-52.<br />

Ingwersen, P. (2000). Users <strong>in</strong> context. In: Agosti, M., Crestani, F. & Pasi, G. (Eds.),<br />

Lectures on Information Retrieval (pp. 157-178): Spr<strong>in</strong>ger.<br />

Ingwersen, P. & Järvel<strong>in</strong>, K. (2005). The Turn: Integration of Information Seek<strong>in</strong>g and<br />

Retrieval <strong>in</strong> Context. Dordrecht: Spr<strong>in</strong>ger.<br />

Ingwersen, P. & Järvel<strong>in</strong>, K. (2007). On the holistic cognitive theory for <strong>in</strong>formation<br />

retrieval: Drift<strong>in</strong>g outside the cave of the laboratory framework. In: Dom<strong>in</strong>ich,<br />

S. & Kiss, F. (Eds.), International Conference on the Theory of Information<br />

Retrieval (pp. 135-147). Budapest, Hungary: Foundation for Information<br />

Society.<br />

Ingwersen, P. & Wormell, I. (1989). Modern <strong><strong>in</strong>dex<strong>in</strong>g</strong> and retrieval techniques<br />

match<strong>in</strong>g different types of <strong>in</strong>formation needs. In: Koskiala, S. & Launo, R.<br />

(Eds.), Proceed<strong>in</strong>gs of the forty-fourth FID Congress held <strong>in</strong> Hels<strong>in</strong>ki, F<strong>in</strong>land,<br />

28 August-1 September, 1988 (pp. 79-90). Amsterdam: Elsevier.<br />

ISO. (1985). Documentation: Methods for Exam<strong>in</strong><strong>in</strong>g Documents, Determ<strong>in</strong><strong>in</strong>g Their<br />

Subjects and Select<strong>in</strong>g Index<strong>in</strong>g Terms (ISO 5963-1985). Geneva: International<br />

Organization for Standardization.<br />

Israel, G.D. (1992). Determ<strong>in</strong><strong>in</strong>g Sample Size (Fact Sheet PEOD-6). Ga<strong>in</strong>esville, FL:<br />

University of Florida.<br />

Jaeger, P.T. (2003). The endless wire: E-<strong>government</strong> as global phenomenon.<br />

Government Information Quarterly, 20, 323-331.<br />

Jaeger, P.T. & Thompson, K.M. (2004). Social <strong>in</strong>formation behavior and the democratic<br />

process: Information poverty, normative behavior, and electronic <strong>government</strong> <strong>in</strong><br />

the United States. Library & Information Science Research, 26(1), 94-107.<br />

Ja<strong>in</strong>, A.K., Murty, M.N. & Flynn, P.J. (1999). Data cluster<strong>in</strong>g: A review. ACM<br />

Comput<strong>in</strong>g Surveys, 31(3), 264-323.<br />

Jansen, B.J. (2006). Search log analysis: What it is, what's been done, how to do it.<br />

Library & Information Science Research, 28(3), 407-432.<br />

Jansen, B.J. & Pooch, U. (2001). A review of web search<strong>in</strong>g studies and a framework<br />

for future research. Journal of the American Society for Information Science and<br />

Technology, 52(3), 235-246.<br />


Jansen, B.J., Sp<strong>in</strong>k, A. & Saracevic, T. (2000). Real life, real users, and real needs: A<br />

study and analysis of user queries on the web. Information Process<strong>in</strong>g &<br />

Management, 36(2), 207-227.<br />

Johansen, H.C. (2007). Dansk skattehistorie: Indkomstskatter og offentlig vækst 1903-<br />

2005 (Vol. 6): Told- og Skattehistorisk Selskab.<br />

Johnson, J.D., Donohue, W.A., Atk<strong>in</strong>, C.K. & Johnson, S. (1995). A comprehensive<br />

model of <strong>in</strong>formation seek<strong>in</strong>g: Tests focus<strong>in</strong>g on a technical organization.<br />

Science Communication, 16(3), 274-303.<br />

Johnston, J. (2004). Public adm<strong>in</strong>istration: Organizational aspects. In: International<br />

Encyclopedia of the Social & Behavioral Sciences (pp. 12507-12512).<br />

Johnston, J. & Callender, G. (1997). Vulnerable <strong>government</strong>s: Inadvertent de-skill<strong>in</strong>g <strong>in</strong><br />

the new global economic and managerialist paradigm? International Review of<br />

Adm<strong>in</strong>istrative Sciences, 63(1), 41-56.<br />

Jones, W.P. & Furnas, G.W. (1987). Pictures of relevance: A geometric analysis of<br />

similarity measures. Journal of the American Society for Information Science,<br />

38(6), 420-442.<br />

Järvel<strong>in</strong>, K. (2007). An analysis of two approaches <strong>in</strong> <strong>in</strong>formation retrieval: From<br />

frameworks to study designs. Journal of the American Society for Information<br />

Science and Technology, 58(7), 971-986.<br />

Järvel<strong>in</strong>, K. & Kekälä<strong>in</strong>en, J. (2002). Cumulated ga<strong>in</strong>-based evaluation of IR<br />

techniques. ACM Transactions on Information Systems, 20(4), 422-446.<br />

Kavadias, G. & Tambouris, E. (2003). GovML: A markup language for describ<strong>in</strong>g<br />

public services and life events. In: Wimmer, M.A. (Ed.), Knowledge<br />

Management <strong>in</strong> Electronic Government. Proceed<strong>in</strong>gs of the 4th IFIP<br />

International Work<strong>in</strong>g Conference, KMGov 2003, Rhodes, Greece, May 26–28,<br />

2003. Berl<strong>in</strong>: Spr<strong>in</strong>ger.<br />

Kelly, D. (2009). Methods for evaluat<strong>in</strong>g <strong>in</strong>teractive <strong>in</strong>formation retrieval systems with<br />

users. Foundations and Trends <strong>in</strong> Information Retrieval, 3(1-2), 1-224.<br />

Kent, A., Berry, M.M., Luehrs, F.U. & Perry, J.W. (1955). Mach<strong>in</strong>e literature search<strong>in</strong>g<br />

VIII. Operational criteria for design<strong>in</strong>g <strong>in</strong>formation retrieval systems. American<br />

Documentation, 6(2), 93-101.<br />

Kettunen, K. & Henttonen, P. (2010). Miss<strong>in</strong>g <strong>in</strong> action? Content of records<br />

management metadata <strong>in</strong> real life. Library & Information Science Research,<br />

32(1), 43-52.<br />

Kipp, M.E.I. (2005). Complementary or discrete contexts <strong>in</strong> onl<strong>in</strong>e <strong><strong>in</strong>dex<strong>in</strong>g</strong>: A<br />

comparison of user, creator, and <strong>in</strong>termediary keywords. Canadian Journal of<br />

Information and Library Science, 29(4), 419-436.<br />

Klischewski, R. (2006). Ontologies for e-document management <strong>in</strong> public<br />

adm<strong>in</strong>istration. Bus<strong>in</strong>ess Process Management Journal, 12(1), 34-47.<br />

Kopackova, H., Michalek, K. & Cejna, K. (2010). Accessibility and f<strong>in</strong>dability of local<br />

e-<strong>government</strong> websites <strong>in</strong> the Czech Republic. Universal Access In The<br />

Information Society, 9(1), 51-61.<br />

Korfhage, R.R. (1997). Information Storage and Retrieval. New York: Wiley.<br />

Koshman, S., Sp<strong>in</strong>k, A. & Jansen, B.J. (2006). Web search<strong>in</strong>g on the Vivisimo search<br />

eng<strong>in</strong>e. Journal of the American Society for Information Science and<br />

Technology, 57(14), 1875-1887.<br />

Kotsiantis, S.B., Zaharakis, I.D. & P<strong>in</strong>telas, P.E. (2006). Mach<strong>in</strong>e learn<strong>in</strong>g: A review of<br />

classification and comb<strong>in</strong><strong>in</strong>g techniques. Artificial Intelligence Review, 26(3),<br />

159-190.<br />

Kraemer, K.L. & Dedrick, J. (1997). Comput<strong>in</strong>g and Public Organizations. Journal of<br />

Public Adm<strong>in</strong>istration Research and Theory, 7(1), 89-112.<br />

Kraemer, K.L. & K<strong>in</strong>g, J.L. (1986). Comput<strong>in</strong>g and public organizations. Public<br />

Adm<strong>in</strong>istration Review, 46(6), 488-496.


Krippendorff, K. (2004). Content Analysis: An Introduction to Its Methodology (2. ed.).<br />

Thousand Oaks: Sage.<br />

Krueger, R.A. (1998). Develop<strong>in</strong>g Questions for Focus Groups (Vol. 3). Thousand<br />

Oaks: Sage.<br />

Kuhlthau, C.C. & Tama, S.L. (2001). Information search process of lawyers: A call for<br />

'just for me' <strong>in</strong>formation services. Journal of Documentation, 57(1), 25-43.<br />

Kules, B. & Shneiderman, B. (2004). Categorized graphical overviews for web search<br />

results: An exploratory study us<strong>in</strong>g U. S. <strong>government</strong> agencies as a mean<strong>in</strong>gful<br />

and stable structure, Proceed<strong>in</strong>gs of the Third Annual Workshop on HCI<br />

Research <strong>in</strong> MIS. Wash<strong>in</strong>gton, D.C.<br />

Kules, B. & Shneiderman, B. (2005). Us<strong>in</strong>g mean<strong>in</strong>gful and stable categories to support<br />

exploratory web search: Two formative studies (HCIL Technical Report 2005-<br />

31). Maryland: Human-Computer Interaction Laboratory, University of<br />

Maryland.<br />

Kvale, S. & Br<strong>in</strong>kmann, S. (2009). Interviews: Learn<strong>in</strong>g the Craft of Qualitative<br />

Research Interview<strong>in</strong>g (2. ed.). Los Angeles: Sage.<br />

Käki, M. (2005a). Enhanc<strong>in</strong>g Web Search Result Access with <strong>Automatic</strong> Categorization.<br />

Unpublished Doctoral Dissertation, Department of Computer Sciences,<br />

University of Tampere, Tampere, F<strong>in</strong>land, from http://acta.uta.fi/pdf/951-44-<br />

6490-7.pdf.<br />

Käki, M. (2005b). F<strong>in</strong>dex: Search result categories help users when document rank<strong>in</strong>g<br />

fails. In: Proceed<strong>in</strong>gs of the SIGCHI conference on Human factors <strong>in</strong><br />

comput<strong>in</strong>g systems, (pp. 131-140). Portland, Oregon: ACM.<br />

Käki, M. & Aula, A. (2005). F<strong>in</strong>dex: Improv<strong>in</strong>g search result use through automatic<br />

filter<strong>in</strong>g categories. Interact<strong>in</strong>g with Computers, 17(2), 187-206.<br />

Lancaster, F.W. (2003). Index<strong>in</strong>g and Abstract<strong>in</strong>g <strong>in</strong> Theory and Practice (3. ed.).<br />

London: Facet.<br />

Landsforen<strong>in</strong>gen af Kommunale Servicecentre, A.o.I. (2005). LKS: Projekt<br />

Borgerbetjen<strong>in</strong>g 2007: Rapport fra arbejdsgruppen om IT.<br />

Large, A., Tedd, L.A. & Hartley, R.J. (2001). Information Seek<strong>in</strong>g <strong>in</strong> the Onl<strong>in</strong>e Age:<br />

Pr<strong>in</strong>ciples and Practice. München: K. G. Saur.<br />

Lau, E.P. & Goh, D.H.-L. (2006). In search of query patterns: A case study of a<br />

university OPAC. Information Process<strong>in</strong>g & Management, 42, 1316-1329.<br />

Layne, K. & Lee, J. (2001). Develop<strong>in</strong>g fully functional E-<strong>government</strong>: A four stage<br />

model. Government Information Quarterly, 18, 122-136.<br />

Leckie, G.J., Pettigrew, K.E. & Sylva<strong>in</strong>, C. (1996). Model<strong>in</strong>g the <strong>in</strong>formation seek<strong>in</strong>g of<br />

professionals: A general model derived from research on eng<strong>in</strong>eers, health care<br />

professionals, and lawyers. Library Quarterly, 66(2), 161-193.<br />

Lev<strong>in</strong>e, M.M. (1974). Information Needs <strong>in</strong> Milwaukee: Agencies and Groups (Ed-089<br />

769). Milwaukee: Milwaukee Urban Observatory.<br />

Levy, P.S. & Lemeshow, S. (2008). Sampl<strong>in</strong>g of Populations: Methods and<br />

Applications (4. ed.). Hoboken, New Jersey: Wiley.<br />

Lips, M. (1998). Reorganiz<strong>in</strong>g public service delivery <strong>in</strong> an <strong>in</strong>formation age. In:<br />

Snellen, I.T.M. & van de Donk, W.B.H.J. (Eds.), Public Adm<strong>in</strong>istration <strong>in</strong> an<br />

Information Age (pp. 325-339). Amsterdam: IOS.<br />

Liu, Y., Zhu, L. & Gorton, I. (2007). Performance Assessment for e-Government<br />

Services: An Experience Report. In: Schmidt, H.W., Crnkovic, I., He<strong>in</strong>eman,<br />

G.T. & Stafford, J.A. (Eds.), Component-Based Software Eng<strong>in</strong>eer<strong>in</strong>g. 10th<br />

International Symposium, CBSE 2007, Medford, MA, USA, July 9-11, 2007 (pp.<br />

74-89). Berl<strong>in</strong>: Spr<strong>in</strong>ger.<br />

Lov<strong>in</strong>s, J.B. (1968). Development of a stemm<strong>in</strong>g algorithm. Mechanical Translation<br />

and Computational L<strong>in</strong>guistics, 11(1-2), 22-31.<br />

Lu, L. & Yuan, Y.C. (2011). Shall I google it or ask the competent villa<strong>in</strong> down the<br />

hall? The moderat<strong>in</strong>g role of <strong>in</strong>formation need <strong>in</strong> <strong>in</strong>formation source selection.<br />

Journal of the American Society for Information Science and Technology, 62(1),<br />

133-145.<br />

Luhn, H.P. (1957). A statistical approach to mechanized encod<strong>in</strong>g and search<strong>in</strong>g of<br />

literary <strong>in</strong>formation. IBM Journal of Research and Development, 1(4), 309-317.<br />

Luhn, H.P. (1958a). The automatic creation of literature abstracts. IBM Journal of<br />

Research and Development, 2(2), 159-165.<br />

Luhn, H.P. (1961). The automatic derivation of <strong>in</strong>formation retrieval encodements from<br />

mach<strong>in</strong>e-readable texts. In: Kent, A. (Ed.), Information Retrieval and Mach<strong>in</strong>e<br />

Translation (Vol. 3, pt. 2, pp. 1021-1028). New York: Interscience.<br />

Lykke, M., Price, S. & Delcambre, L. (2012). How doctors search: A study of query<br />

behaviour and the impact on search results. Information Process<strong>in</strong>g &<br />

Management(0).<br />

MacMull<strong>in</strong>, S.E. & Taylor, R.S. (1984). Problem dimensions and <strong>in</strong>formation traits. The<br />

Information Society, 3(1), 91-111.<br />

Mahler, J.G. & Regan, P.M. (2005). Agency <strong>in</strong>ternets and the chang<strong>in</strong>g dynamics of<br />

congressional oversight. In: Garson, G.D. (Ed.), Handbook of Public<br />

Information Systems (2. ed., pp. 559-568). Boca Raton: Taylor & Francis.<br />

Mai, J.-E. (2004b). The future of general classification. Catalog<strong>in</strong>g & Classification<br />

Quarterly, 37(1 & 2), 3-12.<br />

Mai, J.E. (2000). Deconstruct<strong>in</strong>g the <strong><strong>in</strong>dex<strong>in</strong>g</strong> process. Advances <strong>in</strong> Librarianship, 23,<br />

269-298.<br />

Mai, J.E. (2005). Analysis <strong>in</strong> <strong><strong>in</strong>dex<strong>in</strong>g</strong>: document and doma<strong>in</strong> centered approaches.<br />

Information Process<strong>in</strong>g & Management, 41, 599-611.<br />

Makri, S., Blandford, A. & Cox, A.L. (2008a). Investigat<strong>in</strong>g the <strong>in</strong>formation-seek<strong>in</strong>g<br />

behaviour of academic lawyers: From Ellis' model to design. Information<br />

Process<strong>in</strong>g & Management, 44(2), 613-634.<br />

Mandersloot, W.G.B., Douglas, E.M.B. & Spicer, N. (1970). Thesaurus control: The<br />

selection, group<strong>in</strong>g, and cross-referenc<strong>in</strong>g of terms for <strong>in</strong>clusion <strong>in</strong> a coord<strong>in</strong>ate<br />

<strong>in</strong>dex word list. Journal of the American Society for Information Science, 21(1),<br />

49-57.<br />

Marcella, R., Baxter, G., Davies, S. & Toornstra, D. (2007). The <strong>in</strong>formation needs and<br />

<strong>in</strong>formation-seek<strong>in</strong>g behaviour of the users of the European Parliamentary<br />

Documentation Centre: A customer knowledge study. Journal of<br />

Documentation, 63(6), 920-934.<br />

Marchion<strong>in</strong>i, G., Samet, H. & Brandt, L. (2003). Digital <strong>government</strong>. Communications<br />

of the ACM, 46(1), 25-27.<br />

Mar<strong>in</strong>i, F. (2000). Public adm<strong>in</strong>istration. In: Shafritz, J.M. (Ed.), Def<strong>in</strong><strong>in</strong>g Public<br />

Adm<strong>in</strong>istration: Selections from the International Encyclopedia of Public Policy<br />

and Adm<strong>in</strong>istration (pp. 3-16). Jaipur: Rawat.<br />

Markey, K. (2007a). Twenty-five years of end-user search<strong>in</strong>g, part 1: Research f<strong>in</strong>d<strong>in</strong>gs.<br />

Journal of the American Society for Information Science and Technology, 58(8),<br />

1071-1081.<br />

Mart<strong>in</strong>, B. (2008). Knowledge management. Annual Review of Information Science and<br />

Technology, 42, 371-424.<br />

Mart<strong>in</strong>ez, C., Lucey, J. & L<strong>in</strong>der, E. (1987). An expert system for mach<strong>in</strong>e-aided<br />

<strong><strong>in</strong>dex<strong>in</strong>g</strong>. Journal of Chemical Information and Computer Sciences, 27(4), 158-<br />

162.<br />

Meijer, A.J. & Homburg, V.M.F. (2008). Introduction: Zoom<strong>in</strong>g <strong>in</strong> and zoom<strong>in</strong>g out on<br />

electronic <strong>government</strong>. International Journal of Public Adm<strong>in</strong>istration, 31(7),<br />

707-710.


Miles, M.B. & Huberman, A.M. (1994). Qualitative Data Analysis: An Expanded<br />

Sourcebook (2. ed.). Thousand Oaks: Sage.<br />

Millard, J. (2003). ePublic services <strong>in</strong> Europe: Past, present and future. Research<br />

f<strong>in</strong>d<strong>in</strong>gs and new challenges. Aarhus: Danish Technological Institute.<br />

Milstead, J.L. (1992). Methodologies for subject analysis <strong>in</strong> bibliographic databases.<br />

Information Process<strong>in</strong>g & Management, 28(3), 407-431.<br />

Milstead, J.L. (1994). Needs for research <strong>in</strong> <strong><strong>in</strong>dex<strong>in</strong>g</strong>. Journal of the American Society<br />

for Information Science, 45(8), 577-582.<br />

M<strong>in</strong>istry of F<strong>in</strong>ance. (2001). IT, the Internet, and the Public Sector. Copenhagen:<br />

M<strong>in</strong>istry of F<strong>in</strong>ance.<br />

Moen, W.E. (2001). The metadata approach to access<strong>in</strong>g <strong>government</strong> <strong>in</strong>formation.<br />

Government Information Quarterly, 18(3), 155-165.<br />

Moen, W.E. & McClure, C.R. (1997). An Evaluation of the Federal Government's<br />

Implementation of the Government Information Locator Service (GILS): F<strong>in</strong>al<br />

Report. Wash<strong>in</strong>gton, DC: General Services Adm<strong>in</strong>istration Office of<br />

Information Technology Integration.<br />

Moens, M.-F. (2000). <strong>Automatic</strong> Index<strong>in</strong>g and Abstract<strong>in</strong>g of Document Texts. Boston:<br />

Kluwer.<br />

Morgan, D.L. (1996). Focus groups. Annual Review of Sociology, 22, 129-152.<br />

Mukherjee, R. & Mao, J. (2004). Enterprise search: Tough stuff. Queue, 2(2), 36-46.<br />

Nakash, R.A., Hutton, J.L., Lamb, S.E., Gates, S. & Fisher, J. (2008). Response and<br />

non-response to postal questionnaire follow-up <strong>in</strong> a cl<strong>in</strong>ical trial: A qualitative<br />

study of the patient's perspective. Journal of Evaluation <strong>in</strong> Cl<strong>in</strong>ical Practice, 14,<br />

226-235.<br />

National Archives of Australia (2010). Development history. Retrieved 01-04, 2011,<br />

from http://www.agls.gov.au/about/.<br />

National IT and Telecom Agency. (2009). Overordnede Pr<strong>in</strong>cipper og Best Practice:<br />

Version 1.0. Copenhagen: National IT and Telecom Agency.<br />

Neuendorf, K.A. (2002). The Content Analysis Guidebook. Thousand Oaks: Sage.<br />

Nicholas, D. & Colgrave, K. (1996). Councillors and <strong>in</strong>formation: A study of<br />

<strong>in</strong>formation needs and <strong>in</strong>formation provision. Aslib Proceed<strong>in</strong>gs, 48(2), 37-46.<br />

Nielsen, J.A., Kræmmergaard, P., Nielsen, P.A. & Bjørnholt, B. (2009). Det kommunale<br />

digitaliser<strong>in</strong>gslandskab 2009: Status og udfordr<strong>in</strong>ger. <strong>Aalborg</strong>: <strong>Aalborg</strong><br />

University.<br />

Nielsen, M.L. (2001). A framework for work task based thesaurus design. Journal of<br />

Documentation, 57(6), 774-797.<br />

Nielsen, M.L. (2004). Task-based evaluation of associative thesaurus <strong>in</strong> real-life<br />

environment. Proceed<strong>in</strong>gs of the 67th ASIS&T Annual Meet<strong>in</strong>g, 41, 437-447.<br />

Nikoi, S.K. (2008). Information needs of NGOs: A case study of NGO development<br />

workers <strong>in</strong> the northern region of Ghana. Information Development, 24(1), 44-<br />

52.<br />

NISO (2004). Understand<strong>in</strong>g Metadata. Retrieved 23-03, 2011, from<br />

http://www.niso.org/publications/press/UnderstandingMetadata.pdf.<br />

OECD. (2010). Denmark: Efficient E-<strong>government</strong> For Smarter Public Service Delivery:<br />

Prelim<strong>in</strong>ary Copy. Paris, France: OECD.<br />

Oh, C.H. (1996). Information search<strong>in</strong>g <strong>in</strong> <strong>government</strong>al bureaucracies: An <strong>in</strong>tegrated<br />

model. The American Review of Public Adm<strong>in</strong>istration, 26(1), 41-70.<br />

Olsen, H. (1997). Tal taler ikke uden ord. Politica, 29(3), 295-310.<br />

Orton, R., Marcella, R. & Baxter, G. (2000). An observational study of the <strong>in</strong>formation<br />

seek<strong>in</strong>g behaviour of Members of Parliament <strong>in</strong> the United K<strong>in</strong>gdom. Aslib<br />

Proceed<strong>in</strong>gs, 52(6), 207-217.<br />

Palkovits, S., Woitsch, R. & Karagiannis, D. (2003). Process-based knowledge<br />

management and modell<strong>in</strong>g <strong>in</strong> e-<strong>government</strong>: An <strong>in</strong>evitable comb<strong>in</strong>ation. In:<br />

Wimmer, M.A. (Ed.), KMGov 2003 (pp. 213-218): Spr<strong>in</strong>ger.<br />

Pedersen, B.S., Navarretta, C. & Hansen, D.H. (2005). Ontologibaseret teksthåndter<strong>in</strong>g:<br />

Med sprogteknologi (VID-rapport no. 6). Copenhagen: Center for<br />

Sprogteknologi.<br />

Pedersen, B.S., Navarretta, C. & Henriksen, L. (2004). Build<strong>in</strong>g bus<strong>in</strong>ess ontologies<br />

with language technology techniques: The VID project. In: OntoLex 2004<br />

Proceed<strong>in</strong>gs (pp. 30-35). Paris: European Language Resources Association.<br />

Peel, M. & Rowley, J. (2010). Information shar<strong>in</strong>g practice <strong>in</strong> multi-agency work<strong>in</strong>g.<br />

Aslib Proceed<strong>in</strong>gs, 62(1), 11-28.<br />

Peres, M., Guzmán, F. & Valbuena, T. (2009). Onl<strong>in</strong>e <strong>government</strong> strategy<br />

development model for <strong>in</strong>teractional and transactional phases <strong>in</strong> the territorial<br />

order, The 3rd International Conference on Theory and Practice of Electronic<br />

Governance. Bogotá, Colombia: ACM.<br />

Peristeras, V., Tatabanis, K. & Goudos, S.K. (2009). Model-driven eGovernment<br />

<strong>in</strong>teroperability: A review of the state of the art. Computer Standards &<br />

Interfaces, 31(4), 316-328.<br />

Personalestyrelsen (2010). Forhandl<strong>in</strong>gsdatabasen. Retrieved 26-01, 2010, from<br />

http://perst.dk/Arbejdspladsen/Ledelsesinformation%20og%20statistik/Ledelsesinformation%20og%20lonstyring/Forhandlingsdatabasen.aspx.<br />

Philipson, K.B. (2008). Indekser<strong>in</strong>gsprocessen: Konsistensmål til sammenlign<strong>in</strong>g af<br />

tilgange til emnebestemmelse og emnebeskrivelse. Dansk Biblioteksforskn<strong>in</strong>g,<br />

4(3), 57-71.<br />

Poland, B.D. (2003). Transcription quality. In: Holste<strong>in</strong>, J.A. & Gubrium, J.F. (Eds.),<br />

Inside Interview<strong>in</strong>g: New Lenses, New Concerns (pp. 267-287). Thousand Oaks:<br />

Sage.<br />

Porter, M.F. (1980). An algorithm for suffix stripp<strong>in</strong>g. Program: Electronic Library and<br />

Information Systems, 14(3), 130-137.<br />

Porter, M.F. (2001). Snowball: A language for stemm<strong>in</strong>g algorithms. Retrieved 19-08,<br />

2011, from http://snowball.tartarus.org/texts/introduction.html.<br />

Price, S.L., Nielsen, M.L., Delcambre, L.M.L. & Vedsted, P. (2007). Semantic<br />

components enhance retrieval of doma<strong>in</strong>-specific documents. In: CIKM '07:<br />

Proceed<strong>in</strong>gs of the sixteenth ACM conference on Conference on <strong>in</strong>formation and<br />

knowledge management (pp. 429-438). New York: ACM.<br />

Price, S.L., Nielsen, M.L., Delcambre, L.M.L., Vedsted, P. & Ste<strong>in</strong>hauer, J. (2009).<br />

Us<strong>in</strong>g semantic components to search for doma<strong>in</strong>-specific documents: An<br />

evaluation from the system perspective and the user perspective. Information<br />

Systems, 34, 724-752.<br />

Project Digital Government & The Digital Taskforce (2002). Towards E-Government:<br />

Vision and Strategy for the Public Sector <strong>in</strong> Denmark. Retrieved 13-07, 2010,<br />

from http://www.epractice.eu/files/media/media_362.pdf.<br />

Quam, E. (2001). Inform<strong>in</strong>g and evaluat<strong>in</strong>g a metadata <strong>in</strong>itiative: Usability and<br />

metadata studies <strong>in</strong> M<strong>in</strong>nesota's Foundations Project. Government Information<br />

Quarterly, 18(3), 181-194.<br />

Quirchmayr, G. & Traunmüller, R. (1991). Expert systems <strong>in</strong> law and public<br />

adm<strong>in</strong>istration: Recent developments and future prospects. In: Traunmüller, R.<br />

(Ed.), Governmental and Municipal Information Systems, II: Proceed<strong>in</strong>gs of the<br />

2nd IFIP TC8/WG8.5 Work<strong>in</strong>g Conference on Governmental and Municipal<br />

Information Systems, Balatonfüred, Hungary, 3-6 June (pp. 145-163).<br />

Amsterdam: Elsevier.<br />

Rafferty, P. & Hidderley, R. (2007). Flickr and Democratic Index<strong>in</strong>g: Dialogic<br />

approaches to <strong><strong>in</strong>dex<strong>in</strong>g</strong>. Aslib Proceed<strong>in</strong>gs, 59(4/5), 397-410.


Ra<strong>in</strong>s, S.A. (2008). Health at high speed: Broadband <strong>in</strong>ternet access, health<br />

communication, and the digital divide. Communication Research, 35(3), 283-<br />

297.<br />

Rasmussen, E. (1992). Cluster<strong>in</strong>g algorithms. In: Frakes, W.B. & Baeza-Yates, R.<br />

(Eds.), Information Retrieval: Data Structures & Algorithms (pp. 419-442).<br />

Englewood Cliffs, New Jersey: Prentice Hall.<br />

Rasmussen, E.M. (2003). Index<strong>in</strong>g and retrieval for the Web. Annual Review of<br />

Information Science and Technology, 37, 91-124.<br />

Reddick, C.G. (2005). Citizen <strong>in</strong>teraction with e-<strong>government</strong>: From the streets to<br />

servers? Government Information Quarterly, 22(1), 38-57.<br />

Ren, W.-H. (1999). Self-efficacy and the search for <strong>government</strong> <strong>in</strong>formation. Reference<br />

& User Services Quarterly, 38(3), 283-291.<br />

Robb<strong>in</strong>, A., Courtright, C. & Davis, L. (2004). ICTs and political life. Annual Review of<br />

Information Science and Technology, 38(1), 411-482.<br />

Robertson, S.E. & Hancock-Beaulieu, M.M. (1992). On the evaluation of IR systems.<br />

Information Process<strong>in</strong>g & Management, 28(4), 457-466.<br />

Roitblat, H.L., Kershaw, A. & Oot, P. (2010). Document categorization <strong>in</strong> legal<br />

electronic discovery: Computer classification vs. manual review. Journal of the<br />

American Society for Information Science and Technology, 61(1), 70-80.<br />

Roll<strong>in</strong>g, L. (1981). Index<strong>in</strong>g consistency, quality and efficiency. Information<br />

Process<strong>in</strong>g & Management, 17(2), 69-76.<br />

Rouse, W.B. & Rouse, S.H. (1984). Human <strong>in</strong>formation seek<strong>in</strong>g and design of<br />

<strong>in</strong>formation systems. Information Process<strong>in</strong>g & Management, 20(1-2), 129-138.<br />

Rowley, J. (1988). Abstract<strong>in</strong>g and Index<strong>in</strong>g (2. ed.). London: Clive B<strong>in</strong>gley.<br />

Rowley, J. (1994). The controlled versus natural <strong><strong>in</strong>dex<strong>in</strong>g</strong> language debate revisited: A<br />

perspective on <strong>in</strong>formation retrieval practice and research. Journal of<br />

Information Science, 20(2), 108-118.<br />

Rowley, J. (2011). e-Government stakeholders: Who are they and what do they want?<br />

International Journal of Information Management, 31(1), 53-62.<br />

Rowley, J. & Hartley, R. (2008). Organiz<strong>in</strong>g Knowledge: An Introduction to Manag<strong>in</strong>g<br />

Access to Information (4. ed.). Hampshire: Ashgate.<br />

Rub<strong>in</strong>, H.J. & Rub<strong>in</strong>, I.S. (2005). Qualitative Interview<strong>in</strong>g: The Art of Hear<strong>in</strong>g Data (2.<br />

ed.). Thousand Oaks: Sage.<br />

Sabucedo, L.Á. & Rifón, L.A. (2006). Semantic Service Oriented Architectures for<br />

eGovernment Platforms. Retrieved 08-01-2010.<br />

Salton, G. (1970). <strong>Automatic</strong> text analysis. Science, 168(3929), 335-343.<br />

Salton, G. (1986a). Another look at automatic text-retrieval systems. Communications<br />

of the ACM, 29(7), 648-656.<br />

Salton, G. (1988). <strong>Automatic</strong> <strong><strong>in</strong>dex<strong>in</strong>g</strong> and abstract<strong>in</strong>g. In: Willett, P. (Ed.), Document<br />

Retrieval Systems (pp. 42-80). London: Taylor Graham.<br />

Salton, G. (1989). <strong>Automatic</strong> Text Process<strong>in</strong>g: The Transformation, Analysis, and<br />

Retrieval of Information by Computer. Read<strong>in</strong>g, Massachusetts: Addison-<br />

Wesley.<br />

Salton, G. (1991). Developments <strong>in</strong> automatic text retrieval. Science, 253(5023), 974-<br />

980.<br />

Salton, G. & Buckley, C. (1988). Term-weight<strong>in</strong>g approaches <strong>in</strong> automatic text<br />

retrieval. Information Process<strong>in</strong>g & Management, 24(5), 513-523.<br />

Salton, G. & McGill, M.J. (1983). Introduction to Modern Information Retrieval. New<br />

York: McGraw-Hill.<br />

Salton, G., Wong, A. & Yang, C.S. (1975). A vector space model for automatic<br />

<strong><strong>in</strong>dex<strong>in</strong>g</strong>. Communications of the ACM, 18(11), 613-620.<br />

Salton, G., Yang, C.S. & Yu, C.T. (1975). A theory of term importance <strong>in</strong> automatic<br />

text analysis. Journal of the American Society for Information Science, 26(1),<br />

33-44.<br />

Saracevic, T. (1996). Relevance reconsidered '96. In: Ingwersen, P. & Pors, N.O. (Eds.),<br />

Colis 2 - Second International Conference On Conceptions Of Library And<br />

Information Science: Integration In Perspective, Proceed<strong>in</strong>gs (pp. 201-218).<br />

Copenhagen S: Royal School of Librarianship.<br />

Saracevic, T., Kantor, P., Chamis, A. & Trivison, D. (1987). Experiments on the<br />

Cognitive Aspects of Information Seek<strong>in</strong>g and Information Retriev<strong>in</strong>g. F<strong>in</strong>al<br />

Report and Appendices. Wash<strong>in</strong>gton, D.C.: National Science Foundation, Div.<br />

of Information Science and Technology.<br />

Savola<strong>in</strong>en, R. (1995). Everyday life <strong>in</strong>formation seek<strong>in</strong>g: Approach<strong>in</strong>g <strong>in</strong>formation<br />

seek<strong>in</strong>g <strong>in</strong> the context of "way of life". Library & Information Science<br />

Research, 17(3), 259-294.<br />

Savola<strong>in</strong>en, R. (2006). Time as a context of <strong>in</strong>formation seek<strong>in</strong>g. Library & Information<br />

Science Research, 28(1), 110-127.<br />

Savoy, J. (2005). Bibliographic database access us<strong>in</strong>g free-text and controlled<br />

vocabulary: An evaluation. Information Process<strong>in</strong>g & Management, 41(4), 873-<br />

890.<br />

Saxena, K.B.C. & Aly, A.M.M. (1995). Information technology support for<br />

reeng<strong>in</strong>eer<strong>in</strong>g public adm<strong>in</strong>istration: A conceptual framework. International<br />

Journal of Information Management, 15(4), 271-293.<br />

Schamber, L., Eisenberg, M.B. & Nilan, M.S. (1990). A re-exam<strong>in</strong>ation of relevance:<br />

Toward a dynamic, situational def<strong>in</strong>ition. Information Process<strong>in</strong>g &<br />

Management, 26(6), 755-776.<br />

Schellong, A. (2007). Cross<strong>in</strong>g the boundary: Why putt<strong>in</strong>g the e <strong>in</strong> <strong>government</strong> is the<br />

easy part. In: PNG Work<strong>in</strong>g Paper Series, PNG07-002. Retrieved 18-01, 2010,<br />

from<br />

http://www.hks.harvard.edu/netgov/files/png_workingpaper_series/PNG07-002_WorkingPaper_crossing_the_boundary_schellong.pdf.<br />

Schultz, C.K. (1970). Cost-effectiveness as a guide <strong>in</strong> develop<strong>in</strong>g <strong><strong>in</strong>dex<strong>in</strong>g</strong> rules.<br />

Information Storage and Retrieval, 6(4), 335-340.<br />

Schwartz, D.G., Divit<strong>in</strong>i, M. & Brasethvik, T. (2000). Internet-Based Organizational<br />

Memory and Knowledge Management. Hershey, USA: Idea Group.<br />

Sebastiani, F. (1999). A tutorial on automated text categorisation. In: Amandi, A. &<br />

Zun<strong>in</strong>o, A. (Eds.), Proceed<strong>in</strong>gs of the 1st Argent<strong>in</strong>ian Symposium on Artificial<br />

Intelligence (ASAI'99) (pp. 7-35). Buenos Aires, AR.<br />

Sebastiani, F. (2002). Mach<strong>in</strong>e learn<strong>in</strong>g <strong>in</strong> automated text categorization. ACM<br />

Comput<strong>in</strong>g Surveys, 34(1), 1-47.<br />

Serola, S. (2006). City planners' <strong>in</strong>formation seek<strong>in</strong>g behavior: Information channels<br />

used and <strong>in</strong>formation types needed <strong>in</strong> vary<strong>in</strong>g types of perceived work tasks. In:<br />

Ruthven, I. (Ed.), IIiX: Proceed<strong>in</strong>gs of the 1st International Conference on<br />

Information Interaction <strong>in</strong> Context (pp. 42-45). New York: ACM.<br />

Shropshire, K.O., Hawdon, J.E. & Witte, J.C. (2009). Web survey design: Balanc<strong>in</strong>g<br />

measurement, response, and topical <strong>in</strong>terest. Sociological Methods & Research,<br />

37(3), 344-370.<br />

Siegel, S. & Castellan, N.J. (1988). Nonparametric Statistics for the Behavioral<br />

Sciences (2. ed.). New York: McGraw Hill.<br />

Silvester, J.P., Genuardi, M.T. & Kl<strong>in</strong>gbiel, P.H. (1994). Mach<strong>in</strong>e-aided <strong><strong>in</strong>dex<strong>in</strong>g</strong> at<br />

NASA. Information Process<strong>in</strong>g & Management, 30, 631-645.<br />

Silvester, J.P. & Kl<strong>in</strong>gbiel, P.H. (1993). An operational system for subject switch<strong>in</strong>g<br />

between controlled vocabularies. Information Process<strong>in</strong>g & Management, 29(1),<br />

47-59.


S<strong>in</strong>ghal, A., Salton, G., Mitra, M. & Buckley, C. (1996). Document length<br />

normalization. Information Process<strong>in</strong>g & Management, 32(5), 619-633.<br />

SKAT (2009). Årsrapport 2008. Retrieved 04-02, 2010, from<br />

http://www.skat.dk/SKAT.aspx?oId=1809360&vId=0.<br />

SKAT (2010). About us. Retrieved 26-01, 2010, from<br />

http://www.skat.dk/SKAT.aspx?oId=1826783&vId=0.<br />

Skov, M. (2009). The Re<strong>in</strong>vented Museum: Explor<strong>in</strong>g Information Seek<strong>in</strong>g Behaviour <strong>in</strong><br />

a Digital Museum Context. Unpublished Doctoral dissertation, Research<br />

Programme Information Interaction and Information Architecture, Royal School<br />

of Library and Information Science, Copenhagen.<br />

Snellen, I.T.M. (2002). Electronic governance: Implications for citizens, politicians and<br />

public servants. International Review of Adm<strong>in</strong>istrative Sciences, 68(2), 183-<br />

198.<br />

Soergel, D. (1985). Organiz<strong>in</strong>g Information: Pr<strong>in</strong>ciples of Data Base and Retrieval<br />

Systems. San Diego, CA: Academic Press.<br />

Soergel, D. (1994). Index<strong>in</strong>g and retrieval performance: The logical evidence. Journal<br />

of the American Society for Information Science, 45(8), 589-599.<br />

Soergel, D. (1999). The rise of ontologies or the re<strong>in</strong>vention of classification. Journal of<br />

the American Society for Information Science, 50(12), 1119-1120.<br />

Solomon, P. (1997a). Discover<strong>in</strong>g <strong>in</strong>formation behavior <strong>in</strong> sense mak<strong>in</strong>g.1. Time and<br />

tim<strong>in</strong>g. Journal of the American Society for Information Science, 48(12), 1097-<br />

1108.<br />

Solomon, P. (1997b). Discover<strong>in</strong>g <strong>in</strong>formation behavior <strong>in</strong> sense mak<strong>in</strong>g.2. The social.<br />

Journal of the American Society for Information Science, 48(12), 1109-1126.<br />

Solomon, P. (1997c). Discover<strong>in</strong>g <strong>in</strong>formation behavior <strong>in</strong> sense mak<strong>in</strong>g.3. The person.<br />

Journal of the American Society for Information Science, 48(12), 1127-1138.<br />

Sormunen, E. (2002). Liberal relevance criteria of TREC - count<strong>in</strong>g on negligible<br />

documents? In: SIGIR '02: Proceed<strong>in</strong>gs of the 25th annual <strong>in</strong>ternational ACM<br />

SIGIR conference on Research and development <strong>in</strong> <strong>in</strong>formation retrieval, (pp.<br />

324-330). August 11-15, 2002, Tampere, F<strong>in</strong>land: ACM.<br />

Southon, F.C.G., Todd, R.J. & Seneque, M. (2002). Knowledge management <strong>in</strong> three<br />

organizations: An exploratory study. Journal of the American Society for<br />

Information Science and Technology, 53(12), 1047-1059.<br />

Sparck Jones, K. (1973). Index term weight<strong>in</strong>g. Information Storage and Retrieval,<br />

9(11), 619-633.<br />

Sparck Jones, K. (1981). The Cranfield tests. In: Jones, K.S. (Ed.), Information<br />

Retrieval Experiment (pp. 256-284). London: Butterworths.<br />

Sprehe, J.T., McClure, C.R. & Zellner, P. (2002). The role of situational factors <strong>in</strong><br />

manag<strong>in</strong>g U.S. federal recordkeep<strong>in</strong>g. Government Information Quarterly,<br />

19(3), 289-305.<br />

Ste<strong>in</strong>mark, C. (2005). EDM <strong>in</strong> the Danish public sector: The FESD project. Aslib<br />

Proceed<strong>in</strong>gs, 57(4), 369-377.<br />

Stenmark, D. (2005). How Intranets differ from the Web: Organisational culture's effect<br />

on technology. In: Bartmann, D., Rajola, F., Kall<strong>in</strong>ikos, J., Avison, D.E.,<br />

W<strong>in</strong>ter, R., E<strong>in</strong>-Dor, P., Becker, J., Bodendorf, F. & We<strong>in</strong>hardt, C. (Eds.),<br />

European Conference on Information Systems ECIS 05.<br />

Stewart, D.W., Shamdasani, P.N. & Rook, D.W. (2007). Focus Groups: Theory and<br />

Practice (2. ed.). Thousand Oaks: Sage.<br />

Strader, C.R. (2009). Author-assigned keywords versus Library of Congress Subject<br />

Head<strong>in</strong>gs implications for the catalog<strong>in</strong>g of electronic theses and dissertations.<br />

Library Resources & Technical Services, 53(4), 243-250.<br />

Strzalkowski, T., L<strong>in</strong>, F., Wang, J. & Perez-Carballo, J. (1999). Evaluat<strong>in</strong>g natural<br />

language process<strong>in</strong>g techniques <strong>in</strong> <strong>in</strong>formation retrieval. In: Strzalkowski, T.<br />

(Ed.), Natural Language Information Retrieval (pp. 113-145). Dordrecht:<br />

Kluwer.<br />

Suomela, S. & Kekälä<strong>in</strong>en, J. (2005). Ontology as a search-tool: A study of real users'<br />

query formulation with and without conceptual support. In: Losada, D.E. &<br />

Fernandez-Luna (Eds.), ECIR proceed<strong>in</strong>gs 2005 (pp. 315-329): Spr<strong>in</strong>ger.<br />

Suomela, S. & Kekälä<strong>in</strong>en, J. (2006). User evaluation of ontology as query construction<br />

tool. Information Retrieval, 9, 455-475.<br />

Svenonius, E. (1986). Unanswered questions <strong>in</strong> the design of controlled vocabularies.<br />

Journal of the American Society for Information Science, 37(5), 331-341.<br />

Talja, S., Tuom<strong>in</strong>en, K. & Savola<strong>in</strong>en, R. (2005). "Isms" <strong>in</strong> <strong>in</strong>formation science:<br />

Constructivism, collectivism and constructionism. Journal of Documentation,<br />

61(1), 79-101.<br />

Tambouris, E., Manouselis, N. & Costopoulou, C. (2007). Metadata for digital collections of e-government resources. The Electronic Library, 25(2), 176-192.

Taylor, R.S. (1968). Question-negotiation and information seeking in libraries. College & Research Libraries, 29(3), 178-194.

Taylor, R.S. (1991). Information use environments. In: Dervin, B. & Voigt, M.J. (Eds.), Progress in Communication Sciences (Vol. 10, pp. 217-255). Norwood, NJ: Ablex.

Tenopir, C. (1985). Full text database retrieval performance. Online Information Review, 9(2), 149-164.

The Danish Government, Local Government Denmark & Danish Regions. (2010). Mandate: New Common Public Strategy for Digitalization 2011-2015: Local Government Denmark.

The Danish Government, Local Government Denmark, Danish Regions, Copenhagen Municipality & Frederiksberg Municipality (2004). The Danish eGovernment Strategy 2004-2006: Realising the Potential. Retrieved 13-07, 2010, from http://www.epractice.eu/files/media/media_275.pdf.

The Danish Government, Local Government Denmark (LGDK) & Danish Regions (2007). The Danish E-Government Strategy 2007-2010: Towards Better Digital Service, Increased Efficiency and Stronger Collaboration. Retrieved from http://www.modernisering.dk/fileadmin/user_upload/documents/Projekter/digitaliseringsstrategi/Danish_E-government_strategy_2007-2010.pdf.

Thomas, M., Caudle, D.M. & Schmitz, C.M. (2009). To tag or not to tag? Library Hi Tech, 27(3), 411-434.

Trant, J. (2009). Studying social tagging and folksonomy: A review and framework. Journal of Digital Information, 10(1).

Turpin, A., Scholer, F., Järvelin, K., Wu, M. & Culpepper, J.S. (2009). Including summaries in system evaluation. In: Allan, J. (Ed.), Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval. New York: ACM.

United Nations. (2012). E-Government Survey 2012: E-Government for the People. New York: United Nations.

Vakkari, P. (1999). Task complexity, problem structure and information actions: Integrating studies on information seeking and retrieval. Information Processing & Management, 35, 819-837.

Vakkari, P. (2003). Task-based information searching. Annual Review of Information Science and Technology, 37, 413-464.

van de Donk, W.B.H.J. & Snellen, I.T.M. (1989). Knowledge-based systems in public administration: Evolving practices and norms. In: Snellen, I.T.M., van de Donk, W.B.H.J. & Baquiast, J.-P. (Eds.), Expert Systems in Public Administration: Evolving Practices and Norms (pp. 3-22). Amsterdam: Elsevier.



van Deursen, A. & van Dijk, J. (2010). Civil servants' internet skills: Are they ready for e-government? In: Wimmer, M.A., Chappelet, J.-L., Janssen, M. & Scholl, H.J. (Eds.), Electronic Government. 9th IFIP WG 8.5 International Conference, EGOV 2010, Lausanne, Switzerland, August 29 - September 2, 2010. Proceedings (pp. 132-143). Berlin: Springer.

Veal, D.C. (2001). Techniques of document management: A review of text retrieval and related technologies. Journal of Documentation, 57(2), 192-217.

Veenema, F. (1996). To index or not to index. Canadian Journal of Information and Library Science, 21(2), 1-22.

Vellucci, S.L. (1998). Metadata. Annual Review of Information Science and Technology, 33, 187-222.

Voorhees, E. & Pazienza, M. (1999). Natural language processing and information retrieval. In: Lecture Notes in Computer Science: Information Extraction (Vol. 1714, pp. 32-48). Berlin: Springer.

Voorhees, E.M. (2000). Variations in relevance judgments and the measurement of retrieval effectiveness. Information Processing & Management, 36, 697-716.

Wacholder, N., Kelly, D., Kantor, P., Rittman, R., Sun, Y., Bai, B., Small, S., Yamrom, B. & Strzalkowski, T. (2007). A model for quantitative evaluation of an end-to-end question-answering system. Journal of the American Society for Information Science and Technology, 58(8), 1082-1099.

Walden, G.R. (2006). Focus group interviewing in the library literature: A selective annotated bibliography 1996-2005. Reference Services Review, 34(2), 222-241.

Wang, P. (1999). Methodologies and methods for user behavioral research. Annual Review of Information Science and Technology, 34, 53-99.

Wang, Y.-S. & Shih, Y.-W. (2009). Why do people use information kiosks? A validation of the unified theory of acceptance and use of technology. Government Information Quarterly, 26(1), 158-165.

Weibel, S. (1997). The Dublin Core: A simple content description model for electronic resources. Bulletin of the American Society for Information Science, 24(1), 9-11.

White, M. (2005). The Content Management Handbook. London: Facet.

Wilbur, W.J. & Sirotkin, K. (1992). The automatic identification of stop words. Journal of Information Science, 18(1), 45-55.

Willett, P. (2006). The Porter stemming algorithm: Then and now. Program: Electronic Library and Information Systems, 40(3), 219-223.

Wilson, T.D. (1980). Information system design implications of research into the information behaviour of social workers and social administrators. In: Harbo, O. & Kajberg, L. (Eds.), Theory and application of information research: Proceedings of the Second International Research Forum on Information Science, 3-6 August 1977, Royal School of Librarianship, Copenhagen (pp. 198-213). London, UK: Mansell.

Wilson, T.D. (1981). On user studies and information needs. Journal of Documentation, 37(1), 3-15.

Wilson, T.D. (1999). Models in information behaviour research. Journal of Documentation, 55(3), 249-270.

Wilson, T.D. & Streatfield, D.R. (1977). Information needs in local authority social services departments: An interim report on project INISS. Journal of Documentation, 33(4), 277-293.

Wimmer, M.A. (2007). eGovernment as a multidisciplinary research field. In: Codagnone, C. & Wimmer, M.A. (Eds.), Roadmapping eGovernment Research: Visions and Measures towards Innovative Governments in 2020 (pp. 12-14). [Koblenz]: eGovRTD2020 Project Consortium.


Woudstra, L. & van den Hooff, B. (2008). Inside the source selection process: Selection criteria for human information sources. Information Processing & Management, 44(3), 1267-1278.

Xu, Y.C., Tan, C.Y.B. & Yang, L. (2006). Who will you ask? An empirical study of interpersonal task information seeking. Journal of the American Society for Information Science and Technology, 57(12), 1666-1677.

Yang, D., Tong, L., Ye, Y. & Wu, H. (2006). Supporting effective operation of e-governmental services through workflow and knowledge management. In: Aberer, K., Peng, Z., Rundensteiner, E.A., Zhang, Y. & Li, X. (Eds.), Web Information Systems: WISE 2006, 7th International Conference on Web Information Systems Engineering, Wuhan, China, October 23-26 (pp. 102-113). Berlin: Springer.

Yildiz, M. (2007). E-government research: Reviewing the literature, limitations, and ways forward. Government Information Quarterly, 24, 646-665.

Zamir, O. & Etzioni, O. (1999). Grouper: A dynamic clustering interface to Web search results. Computer Networks, 31(11-16), 1361-1374.

Zeng, M.L. (2008). Knowledge Organization Systems (KOS). Knowledge Organization, 35(2/3), 160-182.

Zikmund, W.G. (2000). Business Research Methods (6. ed.). Fort Worth: Harcourt.

Zipf, G.K. (1949). Human Behavior and the Principle of Least Effort. Cambridge: Addison-Wesley.

Zunde, P. & Dexter, M.E. (1969). Indexing consistency and quality. American Documentation, 20(3), 259-267.

Østergaard, M. & Olesen, J.D. (2004). Digital forkalkning: En debatbog om digital forvaltning i Danmark. Frederikshavn: Dafolo.

Åström, F. (2007). Changes in the LIS research front: Time-sliced cocitation analyses of LIS journal articles, 1990-2004. Journal of the American Society for Information Science and Technology, 58(7), 947-957.


List of abbreviations

ARIST: Annual Review of Information Science and Technology

ICT: Information and Communication Technology

IDF: Inverse document frequency

IIR: Interactive Information Retrieval

IR: Information Retrieval

KOS: Knowledge Organizing Systems

LCSH: Library of Congress Subject Headings

LIS: Library and Information Science

MAI: Machine aided/assisted indexing

RSLIS: Royal School of Library and Information Science

TF: Term frequency


Appendices

Appendix 1: Generic work tasks at SKAT

Appendix 2: Distribution of employees across main processes in the business model

Appendix 3: E-mail invitation to employees

Appendix 4: Questions contained in questionnaire

Appendix 5: Questionnaire pilot test data

Appendix 6: Link to questionnaire

Appendix 7: Dates for the conduct of focus group interviews

Appendix 8: Example of the slides guiding a focus group interview

Appendix 9: Focus group interview guide

Appendix 10: Transcription conventions

Appendix 11: Verbatim Danish versions of quotes used in the thesis

Appendix 12: E-mail invitation to participate in search test

Appendix 13: Questionnaire for recruiting test persons for the search test

Appendix 14: Simulated search tasks

Appendix 15: Test persons’ insight into simulated search tasks

Appendix 16: E-mail concerning naturalistic information needs

Appendix 17: Instructions for search test persons

Appendix 18: Rotation of search tasks

Appendix 19: Search test interview guide

Appendix 20: Judgement of the relevance of retrieved documents in search test

Appendix 21: Completeness degree of questionnaire responses

Appendix 22: Respondents’ experience with work tasks

Appendix 23: Age distribution of population, respondents and test persons

Appendix 24: Respondents’ length of service in the organization

Appendix 25: Focus group participants’ work tasks

Appendix 26: Additional sources mentioned by respondents

Appendix 27: Test persons’ background data

Appendix 28: Supplementary search test tables


Appendix 1: Generic work tasks at SKAT

This appendix summarizes and explains the content of the 19 generic work tasks constituting the version of the business model that formed the basis of the survey questionnaire.


Main process | Work task | Description

Instruction | Common | Answering requests, whether written, in person, or by phone. Marketing, guidance, and outgoing service.

Settlement | Common | Handling payments, settlements, certifications, access to records or registration, expenditures, reimbursements or dealing with complaints.

Settlement | Preliminary assessment of income/personal taxes | Preliminary income assessments, annual tax statements and returns of personal taxes, family allowance, gift taxes, taxation of estate of deceased persons, undivided estate, and settlements regarding personal taxes.

Settlement | Business relations | Handling and making decisions about businesses regarding VAT settlements, excise duties, retirement benefits, taxes on labor costs, A tax, and differences of income.

Settlement | Corporation taxes | Preliminary income assessments, annual tax statements, dealing with applications and making decisions, all regarding foundations, associations, and companies.

Settlement | Customs | Registration of imports and exports, custom procedures for private persons and companies, making decisions about areas of customs and dealing with applications for custom licenses and custom permissions.

Settlement | Vehicles | Expedition of vehicles and license plates, handling procedures concerning duty exemption, assessments, and monthly specifications.

Settlement | Estate | Assessments (depreciations) of estate, handling assessment communications, taxation on the basis of the law of assessed valuation, recalculation of taxes, and registration of property.

Inspection | Common | Handling criminal cases and cases of liability, including the right to operate, divided estates, and inspections.

Inspection | Customs | Inspection of customs (goods and means of transportation) towards citizens and businesses.

Collection | Common | Collection tasks, including enforced payments, administration of estates, and handling complaints about collection.

Processes of support | Legal support | Dissemination of rules, instructions, information, and interpretation of practice, rules, and laws.

Processes of support | Secretary service | Preparation of draft ministerial replies to the parliament and citizens, of memos and analyses, and submission of hearing statements for legislation and ministerial responses for the Fiscal Affairs Committee.

Processes of support | IT service and administration | Maintaining processes to ensure and document the best IT support of SKAT's processes through appointments and contact with the business. System ownership, platform ownership, and change management.

Processes of support | HR and education | Administration to ensure proper staff conditions and the treatment of employees according to current rules. Examples include recruitment, hiring, training and development, payroll, and absenteeism.

Processes of support | Internal activities | Procurement and administration of goods, services and buildings, accounting, communications, press, and secretarial service.

Management and development | Strategy development | Processes through which strategies for SKAT are planned by means of sight lines, overall objectives, development of strategic initiatives, and their prioritization.

Management and development | Business management | Administration of grants and contracts, production planning, management of contracts and vendors, and IT architecture management.

Management and development | Development | Maintaining tasks that support the legislative process or participation in development projects.



Appendix 2: Distribution of employees across main processes in the business model

The figures in the table below originate from an e-mail correspondence with SKAT. The figures reflect the distribution of full-time equivalents across the six main processes from the business model of SKAT.

Main process | Full-time equivalents | %

Instruction | 881 | 11.1%

Settlement | 2355 | 29.8%

Inspection | 2321 | 29.3%

Collection | 1020 | 12.9%

Processes of support | 819 | 10.4%

Management and development | 516 | 6.5%

Total | 7912 | 100%


Appendix 3: E-mail invitation to employees

Subject: Invitation to answer a questionnaire

Dear SKAT employee

I am a PhD student at the Royal School of Library and Information Science (Danmarks Biblioteksskole). As part of a larger research project carried out in cooperation with IT & Telestyrelsen, I am currently studying how employees at SKAT use different information sources in connection with their various work tasks. The purpose is to examine how employees' searching for information can be improved when they carry out different work tasks.

Among other things, I use a questionnaire to collect data. I am therefore writing to ask whether you would contribute to the study by answering the questionnaire. It takes about 10 minutes to answer the questionnaire, which is available on the internet.

Your answers will of course be treated confidentially. This means that the results will only be reported in such a way that the answers of individual persons cannot be identified in the results.

I hope you will help with the project by answering the questionnaire. The survey runs until 18/12-2008 at 6 p.m.

You get started by clicking the following link:

http://kalus3.kalus.dk/l?d=3LPNCV2EeE3E

You are of course welcome to contact me if you have comments, questions, or the like. Many thanks in advance for your time and help.

Kind regards

Tanja Svarre

Tanja Svarre, PhD student

Danmarks Biblioteksskole, Aalborg branch, Frederik Bajers Vej 7K, 9220 Aalborg Øst

Tel. 9815 7922, fax 9815 1042

E-mail: tas@db.dk




Appendix 4: Questions contained in questionnaire

Question number | Title of question | Page of reference in web questionnaire

1 | Age | 1a

2 | Gender | 1b

3 | Education | 2a

4 | Title of education | 2b

5 | Place of employment | 3a

6 | Further comments to place of employment | 3b

7 | Departmental affiliation | 4-10

8 | Length of service in organization | 11

9 | Work tasks within instruction | 12

10 | Work tasks within settlement | 16

11 | Work tasks within inspection | 38

12 | Work tasks within collection | 45

13 | Work tasks within processes of support | 49

14 | Work tasks within management and development | 65

15 | Frequency of work task | 13a, 17a, 20a, 23a, 26a, 29a, 32a, 35a, 39a, 42a, 46a, 50a, 53a, 56a, 59a, 62a, 66a, 69a, 72a

16 | Work task experience | 13b, 17b, 20b, 23b, 26b, 29b, 32b, 35b, 39b, 42b, 46b, 50b, 53b, 56b, 59b, 62b, 66b, 69b, 72b

17 | Need for information to solve work task | 14a, 18a, 21a, 24a, 27a, 30a, 33a, 36a, 40a, 43a, 47a, 51a, 54a, 57a, 60a, 63a, 67a, 70a, 73a

18 | Information sources | 14b, 18b, 21b, 24b, 27b, 30b, 33b, 36b, 40b, 43b, 47b, 51b, 54b, 57b, 60b, 63b, 67b, 70b, 73b

19 | Information needs | 15a, 19a, 22a, 25a, 28a, 31a, 34a, 37a, 41a, 44a, 48a, 52a, 55a, 58a, 61a, 64a, 68a, 71a, 74a

20 | Metadata | 15b, 19b, 22b, 25b, 28b, 31b, 34b, 37b, 41b, 44b, 48b, 52b, 55b, 58b, 61b, 64b, 68b, 71b, 74b

21 | Closure and further contact | 75


Appendix 5: Questionnaire pilot test data

Group | % | Number

Pilot recipients | 100% | 89

Logged into pilot questionnaire | 46% | 41

Finished pilot questionnaire | 29% | 26


Appendix 6: Link to questionnaire

The full version of the questionnaire can be found at the following link:

http://kalus3.kalus.dk/l?d=zTBK24SAF6ep




Appendix 7: Dates for the conduct of focus group interviews

Date | Work task

9/6-2009 | Settlement

11/6-2009 | Instruction

22/6-2009 | Processes of support

22/6-2009 | Inspection: Customs

23/6-2009 | Inspection: Common

29/6-2009 | Management and development

1/7-2009 | Collection




Appendix 8: Example of the slides guiding a focus group interview

The following slides guided the focus group for management and development. 15 slides were presented to the participants, introducing the purpose of the interview and presenting results from the questionnaire. The form of the slides in the present appendix was followed in the remainder of the focus group slides.




Appendix 9: Focus group interview guide

The focus group interview guide followed the structure of the questionnaire and the succession of the focus group slide shows. Below, the interview guide questions are listed according to the slides they supported (cf. the previous appendix).

Corresponding slides | Interview guide questions

Slide 5

The interview started out with a short presentation of each participant as to:

- Their concrete work task within the main process of the business model,
- Their experience with the work task,
- Their educational background,
- How often they carry out the work task, and
- Whether they carry out other work tasks than the one discussed today

Slides 6-7

- Does the frequency of information seeking depend on the concrete work tasks? How? Why?
- Is it possible that the answers from the survey express average frequencies? If so, what is the real frequency? What is the actual oscillation?
- How often do you seek information?
- Are you seeking information for certain work tasks?
- Is there a difference in the way you seek information depending on the work task in question?

Slides 8-9

- What sources are used when, and why?
- And for which work tasks?
- What is the frequency of use of concrete information sources?

Slides 10-11

- Do the results reflect your everyday information needs?
- How?
- Are there any differences?

Slides 12-14

- Can you try to explain when which metadata could be of use?
  o Is it in certain situations?
  o For certain work tasks?
  o For certain information needs?
  o For certain types of documents?
- How do you think information seeking on the intranet is working at present?


Appendix 10: Transcription conventions

Verbatim transcription took place in connection with focus group interviews in the domain study and individual interviews in the search test. To promote transcription consistency, a set of guidelines was developed ahead of the transcription process (cf. Poland, 2003). Some guidelines recurred in the transcriptions of both focus group and individual interviews. In both cases, topics that were irrelevant to the theme of the interview were omitted from the transcription along with laughter, interjections, and the like. Whenever passages were left out, this was marked with “...”.

Since the two forms of interview differ at some points with respect to transcription issues, some type-specific guidelines supplemented the common guidelines mentioned above. Verbatim transcription is a challenging task (Halcomb & Davidson, 2006), and transcription of focus group interviews is particularly challenging. Apart from identifying and typing single words and statements, the transcriber must identify who said what and when. In addition, the participants occasionally spoke all at once. All this considered, it was decided to transcribe the focus group interviews without outside assistance. To keep the focus on the content of the conversations, affirmative remarks from fellow participants were omitted from the transcriptions.

As regards the individual search test interviews, an external transcriber was hired. In addition to the common omissions, further elements were systematically left out during transcription. These elements comprise:

Introductions to the search test (see Appendix 17 for a description of the introduction delivered to all test persons ahead of the search test), whether at the beginning of the search test or in the middle, introducing the part concerning the categorization.

Conversations during the search test that were considered irrelevant to the content of the thesis. These pieces of conversation mostly took place when the system had a long response time to a request.

Clarifying comments, questions and related responses concerning the execution of the test.

All transcriptions were set up with line numbers in order to enable accurate referral.
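As an illustration only, the line-numbering convention can be sketched in a few lines of Python. The sketch is not part of the procedure used in the thesis: the function names `number_lines` and `resolve`, and the assumption that a reference contains the word "line" followed by a single number or a range, are hypothetical and chosen merely to mirror references of the form (TP1, line 2-6).

```python
import re


def number_lines(transcript: str) -> dict[int, str]:
    """Map 1-based line numbers to the lines of a transcript (hypothetical helper)."""
    return {i: line for i, line in enumerate(transcript.splitlines(), start=1)}


def resolve(reference: str, numbered: dict[int, str]) -> list[str]:
    """Return the transcript lines cited by a reference such as 'TP1, line 2-3'.

    Assumes the reference contains 'line N' or 'line N-M'; anything else is rejected.
    """
    match = re.search(r"line\s+(\d+)(?:\s*-\s*(\d+))?", reference, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"no line range in reference: {reference!r}")
    first = int(match.group(1))
    last = int(match.group(2) or first)
    # Lines outside the transcript are silently skipped.
    return [numbered[n] for n in range(first, last + 1) if n in numbered]


numbered = number_lines("first line\nsecond line\nthird line\nfourth line")
cited = resolve("TP1, line 2-3", numbered)
```

With line numbers fixed at transcription time, a reference stays unambiguous even when the same wording occurs more than once in an interview.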




Appendix 11: Verbatim Danish versions of quotes used in the thesis

This appendix reports the original Danish wording of the quotes used in the thesis. The quotes originate from the focus group and search test interviews. The full transcriptions have been enclosed for the assessment committee. Other interested readers are referred to the author for details on the interviews.

The identification of quotes follows the structure below:

Focus group transcriptions. Reference in text: (R1, p. 3). Explanation: R1 refers to the focus group participant, p. 3 to the pages of the transcription referred to.

Test person transcriptions. Reference in text: (TP1, line 2-6). Explanation: TP refers to the test person delivering the quote.



Quote (Id.), followed by the original Danish wording of questions and responses in questionnaire and focus group data

R16, p. 5
Jeg bruger elektroniske opslagsværker rigtig meget. Jeg synes jeg finder langt det meste på de elektroniske opslagsværker. Hvis jeg søger rigtigt, så får jeg det. Men det krydshenviser jo også til både alt det, der ligger på intranettet og det skulle jo også gerne fange det, der ligger på Internettet, nemlig via folketingets hjemmeside og den slags ting.

R23, p. 11
Men så længe, man har et trykt opslagsværk, så er det jo nemmere at slå op i. Hvis man lige ved, hvor man skal lede.

R11, p. 5
For mit vedkommende hvis jeg skal bruge nogle afgørelser, så bruger jeg Google, selvom jeg ved jeg kan gå ind i retsinformation og Thomson også. Men jeg søger på Google, for jeg synes de der elektroniske opslagsværker, som vi har, de er simpelthen for dårlige. Så finder jeg afgørelsen inde på Google og så kan det godt være jeg bliver henvist til en af de sider, som vi måske egentlig ret beset burde bruge, men jeg synes deres søgefunktioner er simpelthen for dårlige.

R33, p. 1
Så det informationsbehov, jeg har, det er jo måske mere målrettet på det, som ændrer sig i nye retsafgørelser, ny lovgivning, og det får vi jo normalt via intranettet og det vil jo sige, at så går jeg ind hver morgen og ser, er der kommet noget, der relaterer sig til inddrivelse, og det er den måde, jeg holder mig ajour på.

R7, p. 5
Så intranettet det er jo vores alle sammens opslagstavle og så er søgeresultatet jo altså også derefter. Du får jo bageopskrifterne også, hvis de er lagt ind.

XX, afregning, p. 3
Det er tit når vi sidder på agenttelefonerne, f. eks da e-indkomst var nyt, så kunne de spørge os ”hvordan laver man en efterangivelse” og vi var også i tvivl om mange af spørgsmålene, så kunne vi gå ind og søge på intranettet, men vi opgav. Vi blev nødt til at stille dem videre til nogle af dem, der sad med det, for det tog for lang tid og det var uoverskueligt at søge på intranettet. Vi kunne ikke finde de svar, vi havde brug for. Fordi du fik side op og side ned og alt der stod bare med den mindste om e-indkomst, det kommer jo med.

xx, guidance, p. 11
Jamen hvis det er opgaver indenfor specielle problemstillinger, hvor vi ved at vi har kolleger, der har nogle spidskompetencer der, så er det jo fristende at gå hen og spørge, fordi vedkommende mange gange også kender måske de sidste afgørelser, der ligger på det område. Frem for at begynde at… der er også en tidsfaktor i det. Man kan spare en del tid ved at…

R19 (0:33:27), p. 5
Jo, men der har jeg det også lidt sådan at, jeg kan egentlig lidt bedst lide at slå op i toldvejledningen i første omgang og så… hvis det ikke rigtigt jeg synes at jeg er sikker på om der nu er kommet noget nyt, så går jeg ind og roder lidt og ser den elektroniske og sådan noget. Og så går jeg altid over og spørger…

R1, p. 10
Det kommer jo helt an på hvor god man er til at beskrive det emne. Hvad er det for ord, man bruger? Hvem er det, der deler det op i de hovedemner, der kan søges? Det kommer helt an på kvaliteten af det, der ligger der. Og dem, der har lagt det ind.

R7, p. 8
Mange gange i forbindelse med sagsbehandling så går du jo også ind og leder efter jamen er der afgørelser, kendelser eller domme på tilsvarende område. Og så går du jo positivt ind og søger i domme og kendelser, så det er udelukkende dokumenttypen i første omgang, som at du ved at det er sådan en, du vil have fat i. men det er ikke fordi det er det vigtigste, men det er en del af det, vi bruger i lige præcis den salgsbehandling.

R7, p. 2-3
Hvis der kommer en kunde herude, og henvender sig ved skranken, så beder du om at få vedkommendes cpr-nummer og går ind på deres oplysninger. Det er den først information. Du kan ikke ekspedere en kunde andet end at du søger information mindst en gang. Og så er spørgsmålet at hvis folk har troet, at informationen var først på det tidspunkt, at der blev stillet et spørgsmål, at man så gik ind og brugte det. Men information er jo allerede, når vi henter data frem på skatteyderen. Når vi skifter billede, så henter vi en ny information.

R10, p. 4
Det er fordi vores… da vi var kommunalt ansatte, der gik vores opgave ud på at ligne så mange folk som muligt, altså gennemse deres selvangivelse og se, om de gjorde det rigtigt eller forkert. Det er så lavet om efter vi er kommet til staten, og det vil sige dengang der fik vi en erfaring hele tide og holdt ved lige med hvad sker der på det område og det område. Men efter vi er kommet til staten, der er det ikke første prioritet, det er tvært imod nok lavest prioriteret, nu der er det at vi skal sørge for at få folk til at bruge tast selv og lave fejllister, så derfor mister vi hele tiden noget af det, vi engang bare kunne på rygraden. Jeg kan i hvert fald mærke med mig selv, at mange af de spørgsmål, jeg førhen bare havde sådan der, det skal du altså ind og læse om nu her. For lige at ajourføre og se er der kommet noget nyt siden.

R14, p. 3
Min umiddelbare forklaring på ministerbetjening vil være, at jamen der er der så meget mere på spil, når man betjener ministeren, at man skal være så meget mere sikker i sin sag. Det er min umiddelbare vurdering af det, hvor imod jamen altså den paratviden vi har som juridiske eksperter på hvert vores område gør, at vi meget ofte kan klare et spørgsmål eller et problem med et skud fra hoften med den viden, vi har og så nogle gange, jamen så har man brug for lige at slå ting efter. Men altså med ministerbetjening, der skal man være 100 % sikker, det skal man selvfølgelig også i andre sager, men der er bare mere på spil med ministerbetjening.

R32, p. 4
Nu nede i vores gruppe, der er det mere erfaring. Der skal vi vide, hvad den afdeling laver næsten, og den afdeling laver og det prøver vi også. Nu skal vi have møde igen på fredag, hvor det skal gøres yderligere bemærket med, hvad de enkelte afdelinger laver. Men det er jo altså hvad man kan huske hele tiden. Det har den der noget med at gøre og det har den der noget med at gøre. Du kan næsten ikke slå det op nogen steder.

R23, p. 12
Jeg bruger det ikke bare til at søge ud i den blå luft. Det ville jeg gøre på google. Ikke ellers. Brugt eller set før. Det er ikke sikkert, man har brugt det, men man har i hvert fald set det.

R28, p. 10-11
TP1, line 243-244
TP6, line 49-52
TP3, line 73-76
FOC 3, Line 200-202
FOC 6, Line 145-146

Det er jo der også at styresignaler og ting og sager kommer. Det vi skal rette os efter indenfor forretningen. Og også... vejledningerne, de juridiske vejledninger, når de bliver opdateret, kommer det jo også ud der. Så egentlig er der jo rigtig meget, man følger med i. Man kan ikke undgå det. Det ville være uhyggeligt, hvis den ikke var på 100 %, vores intranet. På en eller anden måde er man ligesom derinde for at kunne passe sit arbejde.

Altså jeg har ikke fundet noget, hvor der står decideret for, hvordan man gør, men jeg har fundet noget, der måske indikerer, at der kan jeg finde reglerne.

IP: Men det er stadig en to’er for dig?
TP06: Ja, det synes jeg, det er, fordi man får alligevel lidt at vide om, hvordan beskatningsreglerne er… Men man skal selvfølgelig niveauet længere ned for at ramme en treer på det.

Jeg ville ikke give den en treer. Jeg ville nok faktisk give en etter til begge to, fordi jeg først kan vide, om det er det korrekte, når jeg kommer ind i og ser, om det egentlig er det, jeg har brug for. Men det er dem, jeg ville vælge - med mindre jeg kan se, at jeg kan gå videre.

...det er fuldstændigt ubrugeligt. Man kan ikke finde noget. Ja, det kan du godt, du kan finde 5.000 hits på et eller andet. Man kan ikke bruge det til noget. Det er også derfor jeg tror, der er mange, der gerne vil have bøger. Det er fordi de er rimeligt sikre på de der stikordsregistre...

Det er en høj, høj, høj frekvens af informationssøgning. Det er jo pibende nødvendigt og vigtigt, at alt det, vi sender ud herfra, det er bare rigtigt. Om så det er en sats eller en paragrafhenvisning eller hvad dælen det er, så skal det bare være i orden.

... man kan jo ikke huske alle reglerne udenad, så derfor går man ind og læser på dem.

FOC 7, Line 120-145
Hvis der kommer en kunde herude, og henvender sig ved skranken, så beder du om at få vedkommendes cpr-nummer og går ind på deres oplysninger. Det er den første information. Du kan ikke ekspedere en kunde andet end at du søger information mindst en gang.... Men er der en, der kommer og stille en et fagligt spørgsmål, så er behovet ikke nær så stort. Fordi så sidder der noget på rygraden, du svarer ud fra... De eneste henvendelser, der ikke kræver information er dem, der spørger om vej til motorkontoret, de får en vejledning udleveret. Alle andre er der opslag i forbindelse med.

FOC 1, Line 216-218
Vi kan jo ingenting uden at vi har edb mulighed for at gå ind og spørge på en virksomhed, krav, hvad skylder den her virksomhed, den her person, hvad skylder han eller hun. Vi skal ind over nettet hele tiden.

FOC 2, Line 285-288
Det er jo kun, synes jeg, hvis man skal ekspedere en helt ny sag. Så bliver jeg selvfølgelig nødt til at søge noget mere om den her virksomhed, og vis det er en virksomhed, jeg kender i forvejen, så går jeg måske bare lige ind og tjekker, hvad er der angivet og hvad er der betalt. Men uanset hvad, så går jeg jo altid ind og søger inden jeg skal snakke med en virksomhed.

FOC 6, Line 159-173
Min umiddelbare forklaring på ministerbetjening vil være, at jamen der er der så meget mere på spil, når man betjener ministeren, at man skal være så meget mere sikker i sin sag... med ministerbetjening, der skal man være 100 % sikker, det skal man selvfølgelig også i andre sager, men der er bare mere på spil med ministerbetjening.... Man skal være 100 % sikker på det man skriver og yder og bidrager med, det er rigtigt.

FOC 2, Line 273-277
Nu nede i vores gruppe, der er det mere erfaring. Der skal vi vide, hvad den afdeling laver næsten, og den afdeling laver og det prøver vi også... Men det er jo altså hvad man kan huske hele tiden. Det har den der noget med at gøre og det har den der noget med at gøre. Du kan næsten ikke slå det op nogen steder.

FOC 5, Line 281-294
Jo, det synes jeg, fordi jeg synes også man bruger det til at orientere sig om nogle ting, før man møder op eller inden man skriver mailen... det er jo også i forhold til hvordan man betragter en opgave, for jeg tænker lidt at intranet og søgning er jo hele tiden en del af mit arbejde, også bare det at holde mig orienteret om, jamen både for SKAT som forretning men også det fagområde, jeg sidder med. Så man på en eller anden måde enten søger information eller har tilmeldt sig en nyhedsmail... Og alle de informationer er jo med til, hvordan man kan løse en opgave på en eller anden facon.

FOC 4, Line 67-68
...de søger generelt ikke ret meget. De ringer eventuelt, hvis der er et eller andet.

FOC 3, Line 877-878
Jeg bruger det ikke bare til at søge ud i den blå luft. Det ville jeg gøre på google. Ikke ellers. Brugt eller set før. Det er ikke sikkert, man har brugt det, men man har i hvert fald set det.

FOC 4, Line 371-373
Jeg ved der ligger et eller andet dokument, det skal jeg lige bruge nu. Eller jeg ved at der findes denne her dom, den skal jeg lige finde frem. Eller et eller andet. Typisk nok noget, jeg har set før, men som jeg nu skal bruge igen.

FOC 7, Line 208-216
...da vi var kommunalt ansatte, der gik vores opgave ud på at ligne så mange folk som muligt, altså gennemse deres selvangivelse og se, om de gjorde det rigtigt eller forkert... det vil sige dengang der fik vi en erfaring hele tide og holdt ved lige med hvad sker der på det område og det område... nu der er det at vi skal sørge for at få folk til at bruge tast selv og lave fejllister, så derfor mister vi hele tiden noget af det, vi engang bare kunne på rygraden. Jeg kan i hvert fald mærke med mig selv, at mange af de spørgsmål, jeg førhen bare havde sådan der, det skal du altså ind og læse om nu her. For lige at ajourføre og se er der kommet noget nyt siden.

FOC 5, Line 289-294
...for jeg tænker lidt at intranet og søgning er jo hele tiden en del af mit arbejde, også bare det at holde mig orienteret om, jamen både for SKAT som forretning men også det fagområde, jeg sidder med. Så man på en eller anden måde enten søger information eller har tilmeldt sig en nyhedsmail, hvor man så får det ind på den måde. Og alle de informationer er jo med til, hvordan man kan løse en opgave på en eller anden facon.

FOC 5, Line 554-558
”Det er jo der også at styresignaler og ting og sager kommer. Det vi skal rette os efter indenfor forretningen. Og også... vejledningerne, de juridiske vejledninger, når de bliver opdateret, kommer det jo også ud der. Så egentlig er der jo rigtig meget, man følger med i. Man kan ikke undgå det. Det ville være uhyggeligt, hvis den ikke var på 100 %, vores intranet. På en eller anden måde er man ligesom derinde for at kunne passe sit arbejde.”

TP10, line 10-11
Men første gang, jeg søgte, der kom der en håndbog om e-handel. Den ville jeg hellere have valgt end at gå derned.

TP05, line 113-117
Det er ligeså ringe, for der står restance. Og arbejdsgivere, og det er ingen af delene. Så skal vi se med arbejdsgivere… fordi der står arbejdsgivere og A-skat. Og det er indeholdt af A-skat, ligesom vores arbejdsgiver indeholder vores skat. Det kan jeg simpelthen ikke finde. Jeg ved, den ligger derinde. Men ud fra det her kommer jeg aldrig derind. For når jeg ved, hvor det ligger, så ville jeg gå direkte efter den der i stedet for.

TP15, line 306
Ja, men omvendt kunne den jo også give… fritekst… så skulle de jo alle sammen komme.

TP32, line 295-301
Der står lige nøjagtig, at… Altså, omkostninger til EU's grænse skal medregnes i toldværdien. Den anden, der vedrører transporten, der kan jeg se, at den herinde forklarer det helt præcist her. Men der har jeg heller ikke været ind og søge på ”told” hernede. Den kom på bare på, at jeg søgte på ”fragt og toldværdi” og ”sider med alle ord”. Og så kom jeg ind på toldvejledning, som også er den, der henviser til toldkodeks, som behandler de der regler om, hvor meget fragt der skal lægges til. Så denne her er jo en treer. Men jeg kom ikke ind til den ved at søge på ”erhvervsmæssig import” eller ”forsendelse” eller ”eksport”.

TP21, line 257-260
TP21: Der hjalp den ikke så meget, for der var ikke så mange dokumenter alligevel. Der kunne du overskue de dokumenter, der var der, om du havde haft den eller ej. Der var kun 14 dokumenter, der kom frem. Dem ville du kunne overskue. Den vil nok hovedsageligt være en hjælp, når du kommer op på de store mængder, altså 1000 dokumenter og den slags.

TP02, line 625-633
Jeg sad her til sidste og kunne gå tænke mig at gå over. For uanset hvad jeg gjorde, kunne jeg ikke finde det. Og så må jeg have et andet søgested, hvor jeg kan have en mulighed for at se nogle andre underpunkter, så jeg måske ad den vej kan gå ind. Så i den sidste synes jeg, jeg manglede det.
IP: Sådan til at generere ideer til, hvad man kunne søge på, eller?
TP02: Ja, fordi jeg synes, at det, jeg satte ind… Det hedder måske noget andet i momsloven, end det jeg satte ind. Det skal jeg lige have fundet ud af. I forhold til det, der manglede jeg den her. Der irriterede det mig, at der var en seddel. For uanset hvad jeg gjorde, kunne jeg ikke få den op.

TP09, line 553-555
TP09: Der fungerede det jo godt, for der fandt jeg jo lige pludselig et overemne, som jeg så kunne klikke ind på. Og det gav mig… hov, ja, det har noget med selskabsbeskatning at gøre. Så det hjalp mig lidt på mig, også med at tænke, hvad det er for noget, det her.

TP06, line 392-395
TP06: Ja, det havde jeg. Jeg vidste, at hvis jeg skulle gå ind at kigge på noget med beskatningen, så vidste jeg også noget om selvstændig virksomhed. Og så kunne jeg hurtigere gå ind der… Så vidste jeg skulle gå ind under personlig indkomst og ikke kapitalindkomst. Jeg kender de skattemæssige regler. Så er det nemmere at gå ind i kategorierne, når man sådan set kender svaret på forhånd.


TP20, line 339-344
Men jeg ved ikke om jeg nogensinde ville begynde at løbe alt det igennem. For jeg synes det for mig tager længere tid, fordi jeg ikke har kendskab nok til, hvad der ligger bag. Hvis jeg nu var fagmenneske i Skat og vidste alt om virksomhedsskatteordninger e.l., så kan det godt være, at den var genial for mig. For jeg ville vide, at jeg lige præcis kan godt ind og så trykke på det der og så få dokumenterne frem. Men jeg ved ikke, om den måske ville glemme nogle dokumenter, som jeg har brug for. Om den begrænser for meget.

TP14, line 493-495
Når man får det første spadestik i, hvad det er for nogle kategorier, hvad de står for og dækker over, sådan at… Så fumler man, indtil man finder ud, hvad det er. Er der flere veje til Rom, eller hvordan er den hurtigste, eller… ja. Det er en tilvænning med nogle ting. Hvad er det smarteste at gøre…


Appendix 12: E-mail invitation to participate in search test

The test persons for the search test were contacted by e-mail. The e-mail informed the employees about the purpose and the course of the search test. It also informed potential test persons about privacy issues. The e-mail appears below (in Danish).



Subject: Vil du bidrage til at forbedre SKATs intranet?

Kære medarbejder hos SKAT

Som en del af et større forskningsprojekt vedr. søgemuligheder på SKATs intranet, foretager vi i den kommende tid en evaluering af SKATs intranetløsning. Formålet med evalueringen er at undersøge, hvordan man kan forbedre medarbejderes søgning efter information, når de løser forskellige arbejdsopgaver.

I den forbindelse har vi brug for din hjælp. Intranettet bliver testet af et udvalg af medarbejdere i SKAT. Testen udføres hhv. på ”location 1” og ”location 2”. Det tager ca. 1 1/2 time og består i, at du får nogle søgeopgaver udleveret, som er dit udgangspunkt for søgetesten. Der søges i både den nuværende og den nye intranetløsning. Søgetesten afsluttes med et kort interview. Din deltagelse vil naturligvis blive behandlet fortroligt og resultaterne formidlet på en måde, så du ikke vil kunne identificeres.

Hvis du vil være med, beder vi dig udfylde, hvornår du har mulighed for at deltage samt hvordan vi kan komme i kontakt med dig ved at trykke på dette link: [logindata]

Du vil desuden blive stillet nogle enkelte spørgsmål omkring din arbejdsfunktion og brug af intranettet. Besvarelsen tager omkring 3 minutter.

Vi vil meget gerne have din tilkendegivelse hurtigst muligt og senest torsdag den 20/5-2010 kl. 18.

Forskningsprojektet udføres som et samarbejde mellem Danmarks Biblioteksskole, IT & Telestyrelsen og SKAT. Hos SKAT er projektet forankret i Projektenheden (Ebbe Tor Andersen). Søgetesten er godkendt af viceskattedirektør Kaj Kirkegaard. Hvis du har kommentarer, spørgsmål eller lignende, er du velkommen til at kontakte Tanja Svarre (kontaktoplysninger nedenfor).

På forhånd mange tak for din tid og hjælp.

Med venlig hilsen



Ebbe Tor Andersen (Kommunikation, SKAT) og Tanja Svarre (Danmarks Biblioteksskole)

Tanja Svarre, ph.d.-studerende
Danmarks Biblioteksskole, Aalborg-afdelingen, Fredrik Bajers Vej 7K, 9220 Aalborg Øst
Tlf. 9815 7922, fax 9815 1042
E-mail: tas@db.dk

Ebbe Tor Andersen, specialkonsulent
Projektenheden, SKAT
Tlf. 7 17 02
E-mail: Ebbe.Tor@Skat.dk



Appendix 13: Questionnaire for recruiting test persons for the search test

The present appendix presents the questionnaire used for recruiting test persons for the search test. The questionnaire was prepared in Kalus and is available at: http://kalus3.kalus.dk/l?d=RmMBCvq24teH




Appendix 14: Simulated search tasks

The present appendix presents the three search tasks forming the basis for the controlled searches in the search test.



SIM 1: Salg af forældrekøbt lejlighed

Søgecase:
Kirsten har solgt en lejlighed købt som forældrekøb. Hun har haft tab på salget og i samme forbindelse haft udgifter til ejendomsmægler og renovering af lejligheden. Kan hun nu trække tab og udgifter fra i skat?

Søgeopgave:
Find dokumenter, der angiver de skattemæssige forhold omkring et forældrekøb.

SIM 2: Beskatning af e-handel

Søgecase:
Et personligt ejet enkeltmandsforlag ønsker at sælge egne, engelsksprogede bøger ved hjælp af e-handel på hjemmesider i USA og andre lande, eksempelvis Amazon.com og Smashwords.com. Der er fast driftssted i Danmark. Hvordan skal indehaveren forholde sig i forhold til beskatning af indtægten på salget?

Søgeopgave:
Find dokumenter, der angiver, hvordan man beskatter e-handel, der har fast driftssted i Danmark.

SIM 3: Freelancer

Søgecase:
Jens underviser freelance for en virksomhed, men er på vej til at udvide med flere kunder. Lykkes alle forhåndsaftaler, vil han komme til at tjene omkring 100.000 årligt. Nu er han blevet i tvivl om, om han i givet fald kan fortsætte som lønmodtager eller om han skal starte erhvervsmæssig virksomhed op og momsregistreres.

Søgeopgave:
Find dokumenter, der angiver reglerne for, hvornår man skal momsregistreres.

294


Appendix 15: Test persons’ insight into simulated search tasks<br />

Every time a search task had been completed, the test persons answered a short on-screen<br />
questionnaire capturing their insight into the subject of the task. The<br />
questionnaire was embedded in Morae. The questions contained in the questionnaire<br />
were:<br />

1. My insight into the subject of the work task:<br />
No insight 1 2 3 4 5 Great insight<br />

2. How difficult was the task?<br />
Very easy 1 2 3 4 5 Very difficult<br />

3. How much did the task resemble the work tasks you deal with on a daily basis?<br />
No resemblance 1 2 3 4 5 Great resemblance


Appendix 16: E-mail concerning naturalistic information needs<br />

A few days before the search test, the test persons received an e-mail asking them to<br />
bring a search task for the test session. We intentionally sent the e-mail shortly before<br />
the time of the test. In this way, we wanted to make sure that the test persons actually<br />
remembered to bring the task when showing up. An alternative would have been to<br />
mention it in the e-mail confirming the appointment. However, some test persons<br />
received the confirmation e-mail weeks before the appointment, which could have<br />
caused them to forget about the extra task. Another benefit was that the test<br />
persons were reminded of their upcoming appointment. The text of the e-mail appears<br />
below:<br />

From: Tanja Svarre Jonasen<br />
Sent: Wed 16-06-2010 10:17<br />
Subject: Regarding the search test<br />

Dear test person,<br />

When you turn up for the search test on one of the coming days, please bring a problem or<br />
a search task that you have recently solved by searching the current intranet. The task<br />
should preferably be typical of your use of the intranet.<br />

See you in room F-1-46 (the room next to the video conference room).<br />

Best regards,<br />
Tanja Svarre<br />
Tel. 9877 3025


Appendix 17: Instructions for search test persons<br />

This appendix presents the elements contained in the instruction given to the test<br />
persons in advance of the search test.


Instructions for test persons<br />

Test procedure:<br />
You will receive a set of search tasks. The search tasks are divided into two groups, each to be searched using its own search functionality.<br />
Please find the documents you need to solve the search task you are working on. Maybe you know the answer to the task in advance, but please search until you have the document or the documents that can answer the task.<br />
Documents are assessed on a 4-point relevance scale (the description is presented and handed out to the test person in print to enable brush-up during the test).<br />
When you make a search, please print out the search results. They will be used for relevance judgments afterwards.<br />
After completion of each of the tasks, you fill out a questionnaire on the screen concerning the task, and finally I have some general questions about the system you have been searching.<br />

Presentation of the prototype:<br />
System functionalities<br />
o Automatic truncation<br />
o Search type: Explanation of the different possibilities<br />
o Document types contained: The types are presented and a printed overview is handed out.<br />
o Categorization (is presented as the test person starts out using it, either initially or on the way)<br />
o Time of publishing<br />
The system is a prototype. This means that you need to be aware of the following:<br />
o Please disregard the numbers stated after the categories; they are not exact at the present time.<br />
o The database was generated in the fall of 2009, which means that the latest documents are not contained. In case you are looking for instructions or similar documents that have been updated recently, it will be sufficient for you to find the latest document in the collection that is able to answer your request.<br />
o At present the system does not offer correspondence between result lists and the full text of the documents. Therefore, please consider the relevance of your search results on the basis of the hit lists.<br />
o You need to use your mouse to click “search”. Using the “enter” button will direct you to simple search.


Appendix 18: Rotation of search tasks<br />

A set of principles was set up for the construction of the rotation. The first row of<br />
unique successions was generated by listing the work tasks by number in<br />
ascending and descending order respectively. Next, all work tasks were moved one<br />
position to the right. The following rotation moved the last task to position number two.<br />
Lastly, a rotation was generated by moving the task at position number three to position<br />
number one. All rotations were generated twice, starting with and without<br />
categorization respectively. By these means, 42 unique rotations were formed. Of these,<br />
32 were needed for the search test. These rotations are listed in the table on the next<br />
page.
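
The counterbalancing idea behind the rotation can be illustrated with a short Python sketch. It does not reproduce the construction rules described above. Instead, it simply enumerates every order of the four work tasks and combines each with the two possible system orders, on the assumption (read off the rotation table) that positions one and two are searched in one system and positions three and four in the other. The names `all_rotations` and `is_valid_rotation` are illustrative and not part of the study.

```python
from itertools import permutations

TASKS = (1, 2, 3, 4)          # work task numbers as used in the rotation table
SYSTEM_ORDERS = ("AB", "BA")  # system used for positions 1-2 and for positions 3-4


def all_rotations():
    """Enumerate every task order combined with every system order."""
    rotations = []
    for order in permutations(TASKS):
        for systems in SYSTEM_ORDERS:
            first, second = systems
            rotation = tuple(
                (task, first if position < 2 else second)
                for position, task in enumerate(order)
            )
            rotations.append(rotation)
    return rotations


def is_valid_rotation(rotation):
    """Check that each task occurs once and the system switches after position 2."""
    tasks = [task for task, _ in rotation]
    systems = [system for _, system in rotation]
    return (
        sorted(tasks) == list(TASKS)
        and systems[0] == systems[1]
        and systems[2] == systems[3]
        and systems[0] != systems[2]
    )


candidates = all_rotations()
print(len(candidates))  # number of candidate rotations under these assumptions
```

Under these assumptions the exhaustive enumeration gives 4! x 2 = 48 candidate sequences. A row of the rotation table, such as rotation 1, can be written as `((1, "B"), (2, "B"), (3, "A"), (4, "A"))` and checked with `is_valid_rotation`.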



1. position 2. position 3. position 4. position<br />

1 1 (Sys B) 2 (Sys B) 3 (Sys A) 4 (Sys A)<br />

2 1 (Sys A) 2 (Sys A) 3 (Sys B) 4 (Sys B)<br />

3 1 (Sys B) 4 (Sys B) 2 (Sys A) 3 (Sys A)<br />

4 1 (Sys A) 4 (Sys A) 2 (Sys B) 3 (Sys B)<br />

5 1 (Sys A) 3 (Sys A) 4 (Sys B) 2 (Sys B)<br />

6 1 (Sys A) 3 (Sys A) 2 (Sys B) 4 (Sys B)<br />

7 1 (Sys B) 3 (Sys B) 4 (Sys A) 2 (Sys A)<br />

8 1 (Sys B) 3 (Sys B) 2 (Sys A) 4 (Sys A)<br />

9 2 (Sys A) 4 (Sys A) 3 (Sys B) 1 (Sys B)<br />

10 2 (Sys A) 3 (Sys A) 4 (Sys B) 1 (Sys B)<br />

11 2 (Sys A) 1 (Sys A) 4 (Sys B) 3 (Sys B)<br />

12 2 (Sys A) 4 (Sys A) 1 (Sys B) 3 (Sys B)<br />

13 2 (Sys B) 4 (Sys B) 3 (Sys A) 1 (Sys A)<br />

14 2 (Sys B) 3 (Sys B) 4 (Sys A) 1 (Sys A)<br />

15 2 (Sys B) 1 (Sys B) 4 (Sys A) 3 (Sys A)<br />

16 2 (Sys B) 4 (Sys B) 1 (Sys A) 3 (Sys A)<br />

17 3 (Sys A) 1 (Sys A) 2 (Sys B) 4 (Sys B)<br />

18 3 (Sys A) 2 (Sys A) 4 (Sys B) 1 (Sys B)<br />

19 3 (Sys A) 2 (Sys A) 1 (Sys B) 4 (Sys B)<br />

20 3 (Sys A) 4 (Sys A) 2 (Sys B) 1 (Sys B)<br />

21 3 (Sys B) 1 (Sys B) 2 (Sys A) 4 (Sys A)<br />

22 3 (Sys B) 2 (Sys B) 4 (Sys A) 1 (Sys A)<br />

23 3 (Sys B) 2 (Sys B) 1 (Sys A) 4 (Sys A)<br />

24 3 (Sys B) 4 (Sys B) 2 (Sys A) 1 (Sys A)<br />

25 4 (Sys A) 2 (Sys A) 1 (Sys B) 3 (Sys B)<br />

26 4 (Sys A) 3 (Sys A) 2 (Sys B) 1 (Sys B)<br />

27 4 (Sys A) 1 (Sys A) 2 (Sys B) 3 (Sys B)<br />

28 4 (Sys A) 3 (Sys A) 1 (Sys B) 2 (Sys B)<br />

29 4 (Sys B) 2 (Sys B) 1 (Sys A) 3 (Sys A)<br />

30 4 (Sys B) 3 (Sys B) 2 (Sys A) 1 (Sys A)<br />

31 4 (Sys B) 1 (Sys B) 2 (Sys A) 3 (Sys A)<br />

32 4 (Sys B) 3 (Sys B) 1 (Sys A) 2 (Sys A)<br />

Legend: Sys A and Sys B refer to the designations of the two parts of the test system (see section<br />
6.4.1). The columns list the search tasks by their position in the order of succession.<br />


Appendix 19: Search test interview guide<br />

This appendix presents the interview guide used to conclude the test sessions. The questions<br />
are grouped under three main headings.<br />

Perception of the test system<br />
When was the categorization (not) helpful to you during searching?<br />
In which way?<br />
What was it about the categorization that made it (un)helpful to you?<br />
In which situations did you not need the categorization?<br />

Present use of the intranet<br />
What characterizes your use of the present intranet (situations where it is omitted,<br />
documents you look for in the intranet, and the like)?<br />

Categorization in your daily work<br />
I would like you to describe a typical situation from your daily work where you make<br />
use of the intranet.<br />
In that situation, how would categorization be useful to you?<br />
In that situation, how would categorization not be relevant to you?


Appendix 20: Judgement of the relevance of retrieved documents in search test<br />

The present appendix presents the four degrees of relevance that the test persons could use<br />
during the assessment of retrieval sets. The explanations of the individual degrees are<br />
based on Sormunen (2002). In the test situation, the content of the appendix was<br />
explained to the test persons. Further, a print of the explanations was placed next to the<br />
test machine to allow the test persons to consult it whenever needed.


Relevance of documents<br />

0: The document contains no information about the topic.<br />

1: The document points to the topic, but contains neither more nor other<br />
information than the topic description, typically one sentence or fact.<br />

2: The document contains more information than the topic description, but not in<br />
an exhaustive way. In the case of a multi-faceted topic, only<br />
some sub-themes or viewpoints are covered. Typically one text paragraph, a few<br />
sentences or facts.<br />

3: The document discusses the themes of the topic exhaustively. In the case of a<br />
multi-faceted topic, all or almost all facets or viewpoints are covered by the<br />
document. Typically several text paragraphs or a number of sentences or facts.<br />


Appendix 21: Completeness degree of questionnaire responses<br />

Degree of completion | # | % of 798<br />
Completes the questionnaire | 340 | 42.6%<br />
Answers beyond Inspection (page 45 in the questionnaire) but quits somewhere thereafter | 27 | 3.4%<br />
Stops when the questions regarding Inspection (page 44 in the questionnaire) have finished | 66 | 8.3%<br />
Stops when the work tasks start (before page 12 in the questionnaire) | 13 | 1.6%<br />
Starts the questionnaire but quits before answering place of employment (before page 3 in the questionnaire) | 14 | 1.8%<br />
Signs into the questionnaire but does not answer any questions | 36 | 4.5%<br />
Does not log in at all | 302 | 37.8%<br />
Total | 798* | 100%<br />

Legend: * The questionnaire was sent to 799 respondents. However, one could not be included due to<br />
errors and was deleted. Therefore, the number of respondents adds up to 798.
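
The percentages in the table can be re-derived from the counts. The following Python sketch is only an illustration of that arithmetic; the short category labels are shorthand introduced here, not the wording of the questionnaire analysis.

```python
# Counts per completion category, copied from the table in Appendix 21.
# The dictionary keys are abbreviated labels chosen for this sketch.
counts = {
    "completed": 340,
    "quit after Inspection": 27,
    "stopped at end of Inspection": 66,
    "stopped at work tasks": 13,
    "quit before place of employment": 14,
    "signed in, no answers": 36,
    "never logged in": 302,
}

total = sum(counts.values())


def share(count, base):
    """Return count as a percentage of base, rounded to one decimal."""
    return round(100 * count / base, 1)


percentages = {label: share(n, total) for label, n in counts.items()}

for label, pct in percentages.items():
    print(f"{label}: {pct}%")
```

The counts sum to the 798 respondents stated in the legend, and the computed shares agree with the percentages listed in the table.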




Appendix 22: Respondents’ experience with work tasks<br />

Work task / experience with work task | 0-6 months | 7-11 months | 1-2 years | 3-5 years | More than 5 years<br />
Instruction | 6 (3%) | 13 (7%) | 33 (18%) | 21 (12%) | 108 (60%)<br />
Settlement: common | - | 1 (5%) | 4 (20%) | 3 (15%) | 12 (60%)<br />
Settlement: preliminary assessment of income/personal taxes | 2 (4%) | 2 (4%) | 6 (11%) | 4 (7%) | 43 (75%)<br />
Settlement: business relations | - | 4 (7%) | 13 (23%) | 2 (4%) | 38 (67%)<br />
Settlement: corporation taxes | 1 (4%) | 3 (12%) | 7 (28%) | 2 (8%) | 12 (48%)<br />
Settlement: customs | - | 3 (25%) | 1 (8%) | 2 (17%) | 6 (50%)<br />
Settlement: vehicles | 1 (6%) | 6 (33%) | 8 (44%) | 1 (6%) | 2 (11%)<br />
Settlement: estate | 2 (14%) | - | 1 (7%) | 3 (21%) | 8 (57%)<br />
Inspection: common | - | 1 (2%) | 5 (8%) | 5 (8%) | 49 (80%)<br />
Inspection: customs | 1 (6%) | 2 (13%) | 1 (6%) | 1 (6%) | 11 (69%)<br />
Collection | 4 (10%) | - | 7 (18%) | 9 (23%) | 19 (49%)<br />
Processes of support: legal support | - | 4 (9%) | 7 (16%) | 10 (22%) | 24 (53%)<br />
Processes of support: minister service | 3 (30%) | - | - | 2 (20%) | 5 (50%)<br />
Processes of support: IT service and administration | - | 3 (21%) | 3 (21%) | 4 (29%) | 4 (29%)<br />
Processes of support: HR and education | - | 1 (7%) | 2 (14%) | 5 (36%) | 6 (43%)<br />
Processes of support: internal activities | 4 (27%) | - | 3 (20%) | 5 (33%) | 3 (20%)<br />
Management and development: strategy | 2 (13%) | 1 (6%) | 1 (6%) | 7 (44%) | 5 (31%)<br />
Management and development: business management | 1 (7%) | 3 (21%) | 1 (7%) | 5 (36%) | 4 (29%)<br />
Management and development: development | 3 (11%) | 3 (11%) | 7 (26%) | 6 (22%) | 8 (30%)


Appendix 23: Age distribution of population, respondents and test persons<br />

For comparison, the total staff figures of SKAT at the time were:<br />

Age group | 17-18 | 19-25 | 26-35 | 36-45 | 46-55 | 56-68<br />
Total numbers | 3 | 91 | 950 | 2180 | 2972 | 2473<br />
% | 0% | 1% | 11% | 25% | 34% | 28%


Questionnaire respondent distribution<br />
N Valid 340<br />
Missing 0<br />
Mean 47.29<br />
Median 47.00<br />
Mode 44<br />
Std. Deviation 9.537<br />
Skewness -.354<br />
Std. Error of Skewness .132<br />

Population distribution<br />
N Valid 8681<br />
Missing 0<br />
Mean 48.44<br />
Median 49.00<br />
Mode 58<br />
Std. Deviation 9.779<br />
Skewness -.420<br />
Std. Error of Skewness .026<br />

Test person distribution<br />
N Valid 31<br />
Missing 1<br />
Mean 46.45<br />
Median 48.00<br />
Mode 48<br />
Std. Deviation 8.532<br />
Skewness -.250<br />
Std. Error of Skewness .421
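
The standard error of skewness reported in the three tables depends only on the number of valid cases. As a minimal sketch, assuming the formula commonly used by statistical packages, sqrt(6n(n-1) / ((n-2)(n+1)(n+3))), the values can be recomputed as follows. The function name is illustrative.

```python
import math


def standard_error_of_skewness(n):
    """Standard error of sample skewness for a sample of size n (n > 2)."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


# Valid N for respondents, population and test persons (from the tables above).
samples = [("respondents", 340), ("population", 8681), ("test persons", 31)]

for label, n in samples:
    print(label, round(standard_error_of_skewness(n), 3))
```

With N = 340, 8681 and 31 this gives 0.132, 0.026 and 0.421, in line with the values in the tables. The much larger standard error for the test persons reflects the small group size.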


Appendix 24: Respondents’ length of service in the organization<br />


Appendix 25: Focus group participants’ work tasks<br />

The present appendix shows the distribution of participants across the 19 generic work<br />
tasks. In the introductory part of the focus group interviews, the participants were asked<br />
to present themselves. This personal introduction has functioned as the basis for<br />
the table below. We compared the participants’ descriptions of their work areas with the<br />
work task descriptions from the questionnaire. On this basis, we placed the respondents<br />
according to their primary work task. In some cases, in particular in the interview concerning<br />
Instruction, other more important areas of responsibility appeared during the interview.<br />
In those cases we assessed which work task was more important and let that<br />
assessment guide the placement of the specific participant.


Main process | Work task | Participants<br />
Instruction | Instruction | R7, R8, R9, R10, R11, R12<br />
Settlement | Common | <br />
Settlement | Preliminary assessment of income/personal taxes | R4<br />
Settlement | Business relations | R3, R6<br />
Settlement | Corporation taxes | R2<br />
Settlement | Customs | R18, R19, R20, R21, R22<br />
Settlement | Vehicles | <br />
Settlement | Estate | <br />
Inspection | Common | R23, R24, R25, R26, R27<br />
Inspection | Customs | <br />
Collection | Collection | R33, R34, R35<br />
Processes of support | Legal support | R1, R13, R14, R15, R16, R17<br />
Processes of support | Minister service | <br />
Processes of support | IT service and administration | R28<br />
Processes of support | HR and education | R1<br />
Processes of support | Internal activities | R32, R28<br />
Management and development | Strategy | R29, R30<br />
Management and development | Business management | <br />
Management and development | Development | R29, R31<br />

Legend: R20 refers to the specific focus group participant.<br />


Appendix 26: Additional sources mentioned by respondents<br />

This appendix reports the sources listed by the respondents to supplement the<br />
predefined list of sources used for information seeking in relation to certain work tasks.<br />
The sources are listed by the work task in connection with which they were mentioned.


Work task | Additions to predefined sources<br />
Instruction | The BIQ systems<br />
 | Executive order, including the registration order and the registration tax act<br />
 | SKAT's ordinary systems, e.g. DR, TStele, KMD, etc.<br />
 | Database concerning detained consignments<br />
 | DetailCOR (= income information for the current year); Google Maps for driving directions<br />
 | www.europa.eu<br />
 | www.skat.dk<br />
 | Peer learning (asking the colleague next to me) in cases of doubt<br />
 | kmd-skat<br />
 | Best practice guidelines<br />
 | Various internal search systems<br />
 | skat.dk<br />
 | The collection guide (Inddrivelsesvejledningen)<br />
 | skat ligning, ts tele, business register, remedy<br />
 | Google search<br />
 | The police's IT programs<br />
 | BIQ and SØS<br />
 | Colleagues<br />
 | In general I use all of them, but the intranet and Captia are the most used<br />
 | Internal systems in SKAT.<br />
 | Tele, KMD, etc.<br />
 | Especially websites with maps and aerial photos<br />
 | Remedy, KMD, Dipsy, DR-sys, TP-sys, among others<br />
 | Various course materials<br />
 | Printed law collections. TfS. Textbooks and the like<br />
 | Remedy<br />
 | CVR.dk<br />
 | SKAT's internal IT systems<br />
 | TP, TS-tele, Remedy, Sap, KMD, etc.<br />
 | SKAT's website - not the intranet, but the publicly accessible website www.skat.dk<br />
 | At times I use Captia; otherwise I use none apart from SKAT's systems.<br />
 | SKAT's own IT systems<br />
 | SKAT's general systems<br />
 | Programming specifications, programs, etc.<br />
 | I use SKAT's lawyers and otherwise read bills/acts/explanatory notes and the assessment guide (ligningsvejledning)<br />
 | Sharepoint<br />
 | The EU's databases, sites of other countries' tax authorities, websites of interest organizations and companies<br />
 | Depends on the specific task<br />
Settlement: common | The police's IT systems<br />
 | KMD's and the state's systems<br />


 | SKAT's DR system, SAP and other tax systems<br />
 | It depends on the situation. I mostly handle foreign income, and that often requires further investigation.<br />
Settlement: preliminary assessment of income/personal taxes | <br />
Settlement: business relations | <br />
Settlement: corporation taxes | <br />
 | KMD SkatLigning (case-handling system). Captia is rubbish to work with for retrieving documents (unless I have not yet figured out what is smart about Captia)<br />
 | skat.dk<br />
 | Google search<br />
 | Colleagues<br />
 | My own folders of information/guidelines that I have collected over time or that we have agreed on ourselves in the department.<br />
 | Memory and many years of experience with tax are the most important sources in everyday work.<br />
 | The state's systems and KMD Skatligning<br />
 | Remedy<br />
 | It depends very much on the type of income or situation I am handling. I mostly work with foreign income and relocation abroad.<br />
 | Colleagues<br />
 | Sparring with colleagues<br />
 | Skattenyt - Schultz<br />
 | Google<br />
 | SKAT's internal IT systems. Business Objects, KMD Skat Ligning, KMD Skat Forskud, the TP system, Remedy, the CPR system, Dipsy, Erhvervssystemet<br />
 | Newspapers<br />
 | I am head of department for approx. 20 employees - I spend virtually all of my working hours on personnel management.<br />
 | I am a head of department, so I do not do case handling directly<br />
 | VAT manual<br />
 | None of them. Once in a while I help key in VAT returns. On a daily basis I work with list returns (Listeangivelser) concerning EU sales<br />
 | SKAT's general systems<br />
 | BIQ and SØS<br />
 | Colleagues<br />
Settlement: customs | www.europa.eu<br />
 | EU regulations concerning consignments<br />
 | The EU's electronic reference works<br />
 | The customs system (Toldsystemet)<br />
 | EUR-lex<br />
Settlement: vehicles | Own notes<br />
Settlement: estate | BBR, Kort- og Matrikelstyrelsen, electronic notifications<br />
 | Especially websites with maps and aerial photos<br />
 | Danmarks Arealinfo, Danmarks Statistikbank, Kort & Matrikelstyrelsen, OIS, BBR, the municipalities' websites with local plans, GEO, Plansystem<br />
 | Planning systems, map look-ups, Realkreditrådet<br />

Inspection: common | BIQ<br />
 | Asking colleagues<br />
 | Peer learning (Nabolæring)<br />
 | SRF - course materials, etc.<br />
 | KMD-SkatLigning, Remedy, etc.<br />
 | Specific information from SKAT's own systems. TS-tele, KMD skat/ligning, Remedy. That is, system-generated, keyed-in, internal documents, working papers, etc. Control information on R-75 and more.<br />
 | Various other internal search systems<br />
 | Amadeus database, Kob, Biq<br />
 | Colleagues<br />
 | Newspapers<br />
 | I am not a case handler<br />
 | Remedy, KMD, Dipsy, DR-sys, TP-sys, among others<br />
Inspection: customs | FødevareErhverv's website.<br />
 | FødevareErhverv's website - EU-tidende<br />
 | Captia is only used for the department's case file<br />
 | The EU's electronic reference works<br />
 | The FEOGA handbook. Regulations from the EU<br />
 | The customs system (toldsystemet)<br />
Collection | VAT programs<br />
 | Teaching material from my studies as well as relevant books from my studies<br />
 | Domstole.dk<br />
 | Sparring with colleagues<br />
 | Own systems<br />
Processes of support: legal support | <br />
Processes of support: IT service and administration | <br />
Processes of support: internal activities | <br />
 | Skattemappen<br />
 | kmd-skat<br />
 | Skattemappen<br />
 | The EU's electronic reference works<br />
 | One word: google (by the way: I have no financial interest in promoting google over others.... only that google works, every time)<br />
 | None<br />
 | Own SharePoint solution (Sysmod phase 1), documents in a file structure<br />
 | Microsoft, also as printed media<br />
 | Programs and programming specifications<br />
 | Sharepoint as the most important tool<br />
 | Old e-mails, programming specifications, etc.<br />
 | KMD SKAT LIGNING<br />
 | The CVR register<br />


Management and development: strategy | google<br />
 | The department's shared drive<br />
Management and development: development | KMD-SKAT LIGNING<br />
 | Tools and guidelines located on the H drive<br />
 | Data warehouse, KMD Skat Ligning (case management), dipsy<br />
 | www.skat.dk<br />
 | Our SAP system, extracts of various reports<br />
 | The department's shared drive<br />


Appendix 27: Test persons’ background data<br />

The appendix contains tables displaying background data for the test persons of the<br />
search test as regards gender, age, length of service, and education. The tables were<br />
generated in SPSS and are listed in the order just mentioned. For all three tables in the<br />
appendix, one person did not respond to these particular questions. This explains the<br />
difference in N (in the search test N=32, in this appendix N=31).<br />

Gender distribution<br />

 | | Frequency | Percent | Valid Percent | Cumulative Percent<br />
Valid | Male | 10 | 31.3 | 32.3 | 32.3<br />
Valid | Female | 21 | 65.6 | 67.7 | 100.0<br />
Valid | Total | 31 | 96.9 | 100.0 | <br />
Missing | System | 1 | 3.1 | | <br />
Total | | 32 | 100.0 | | <br />

Legend: The table displays the distribution of men and women in the group of test persons. N=31.<br />
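
The difference between the Percent and Valid Percent columns can be shown with a small Python sketch: Percent uses all 32 test persons as the base, whereas Valid Percent uses only the 31 who answered. The sketch assumes half-up rounding to one decimal, which matches the printed 31.3 for 10 out of 32; the helper name `pct` is illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP


def pct(count, base):
    """Percentage of base with one decimal, rounding halves up."""
    value = Decimal(count) * 100 / Decimal(base)
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


frequencies = {"Male": 10, "Female": 21}  # from the gender table
missing = 1                               # one test person did not answer

valid_n = sum(frequencies.values())  # base for Valid Percent
total_n = valid_n + missing          # base for Percent

for label, count in frequencies.items():
    print(label, pct(count, total_n), pct(count, valid_n))
```

The same calculation applies to the education table later in this appendix, where the same person is missing.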

The test persons’ year of birth and length of service<br />

 | | Year of birth | Length of service<br />
N | Valid | 31 | 31<br />
N | Missing | 1 | 1<br />
Mean | | 1963.23 | 21.68<br />
Median | | 1962.00 | 24.00<br />
Minimum | | 1949 | 4<br />
Maximum | | 1980 | 43<br />

Legend: Calculations of the average, minimum and maximum age and length of service of the test<br />
persons. The length of service column denotes the number of years the test persons have been<br />
working in the organization. N=31.


Latest education of the test persons<br />

 | | Frequency | Percent | Valid Percent | Cumulative Percent<br />
Valid | Internal clerk programme | 6 | 18.8 | 19.4 | 19.4<br />
Valid | Administrative assistant | 4 | 12.5 | 12.9 | 32.3<br />
Valid | Other vocational education and training | 2 | 6.3 | 6.5 | 38.7<br />
Valid | Bachelor degree | 1 | 3.1 | 3.2 | 41.9<br />
Valid | Medium-cycle higher education | 1 | 3.1 | 3.2 | 45.2<br />
Valid | Long-cycle higher education | 9 | 28.1 | 29.0 | 74.2<br />
Valid | Master's programme | 8 | 25.0 | 25.8 | 100.0<br />
Valid | Total | 31 | 96.9 | 100.0 | <br />
Missing | System | 1 | 3.1 | | <br />
Total | | 32 | 100.0 | | 


Appendix 28: Supplementary search test tables

Table 1: Reformulations in sessions

                  Sim1    Sim2    Sim3    NWT     Total
System A   Min    0       0       0       0       0
           Max    5       11      27      9       27
           SD     1.5     2.8     6.6     2.7     4.0
System B   Min    0       2       0       0       0
           Max    18      10      15      10      18
           SD     5.2     2.9     3.6     3.4     3.9
Total      Min    0       0       0       0       0
           Max    18      11      27      10      27
           SD     3.9     3.3     5.3     3.1     4.0
           n      32      32      32      32      128

Table 2: Correlations of the number of search terms in queries and number of hits

Correlations

                                               No. of terms in query   No. of hits
No. of terms in query   Pearson Correlation         1                    .200**
                        Sig. (2-tailed)                                  .002
                        N                           229                  229
No. of hits             Pearson Correlation         .200**               1
                        Sig. (2-tailed)             .002
                        N                           229                  229

**. Correlation is significant at the 0.01 level (2-tailed).

Legend: The table displays the correlations in system A, as the number of hits in system B is much lower due to categorization. Therefore: N=229.

Table 3: Correlations between number of terms in queries and the succession of search tasks.

Correlations

                                                   No. of terms in query   Succession of search task
No. of terms in query       Pearson Correlation         1                      .037
                            Sig. (2-tailed)                                    .386
                            N                           564                    564
Succession of search task   Pearson Correlation         .037                   1
                            Sig. (2-tailed)             .386
                            N                           564                    564
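To show how the significance values in the correlation tables relate to the coefficient and N, the following sketch recomputes a two-tailed p-value from r and N. It rests on assumptions: the function name `pearson_significance` is hypothetical, the test statistic is t = r * sqrt((N - 2) / (1 - r^2)), and the p-value uses a large-sample normal approximation rather than the t-distribution that SPSS applies. Because the inputs are the rounded coefficients printed in Tables 2 and 3, the results may differ slightly from the reported values.

```python
import math
from statistics import NormalDist


def pearson_significance(r, n):
    """Return (t, p) for a Pearson correlation r based on n observations.

    p is a two-tailed value from the standard normal distribution, used
    here as a large-sample stand-in for the t-distribution with n - 2
    degrees of freedom.
    """
    t = r * math.sqrt((n - 2) / (1 - r * r))
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p


# Rounded coefficients as printed in Table 2 and Table 3.
t2, p2 = pearson_significance(0.200, 229)
t3, p3 = pearson_significance(0.037, 564)

print(f"Table 2: t={t2:.2f}, p={p2:.3f}")
print(f"Table 3: t={t3:.2f}, p={p3:.3f}")
```

Under these assumptions, the Table 2 correlation falls below the 0.01 level, while the Table 3 correlation is far from significant, in line with the reported values.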

Table 4: Number of documents retrieved in system A and system B using the possible search operators (averages)

                                           FT        AW        ES       OW      Total
Number of documents retrieved: System A    548.3     120.8     27.5     332.6   309.6
                                           (n=102)   (n=110)   (n=13)   (n=4)   (N=229)
Number of documents retrieved: System B    24.7      10.2      1.5      18      19.2
                                           (n=133)   (n=78)    (n=2)    (n=2)   (N=215)

Legend: FT=Free text, AW=Pages containing all words, ES=This exact sentence, OW=At least one of the words. For system B searches: N designates the number of system B searches actually carried out in system B (cf. section 8.2.3).
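The Total column of Table 4 appears to be consistent with an average of the per-operator averages weighted by the number of searches (n). This is a reading of the figures, not something the table states. The sketch below checks it in Python; the function name `weighted_mean` is hypothetical, and the value printed as "332,6" in the source is read as 332.6.

```python
def weighted_mean(means, sizes):
    """Average of group means, weighted by the number of searches per group."""
    total_n = sum(sizes)
    return sum(m * n for m, n in zip(means, sizes)) / total_n


# Per-operator averages and n values (FT, AW, ES, OW) from Table 4.
# 332.6 is an assumed reading of "332,6" in the source.
system_a = weighted_mean([548.3, 120.8, 27.5, 332.6], [102, 110, 13, 4])
system_b = weighted_mean([24.7, 10.2, 1.5, 18.0], [133, 78, 2, 2])

print(round(system_a, 1), round(system_b, 1))  # 309.6 19.2
```

Both results agree with the totals in the table (309.6 for system A, 19.2 for system B).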

Table 5: Combinations of category reformulations with other types of reformulations in system B queries (percentages)

Reformulations   Query terms   Document type   Search operator
Share of N=83    66 (79.5)     18 (21.7)       17 (20.5)

Legend: The queries included in the table are system B queries containing category type reformulations in combination with the remaining types of reformulations. N=83.
