
Automatic indexing in e-government: Improved access to administrative documents for professional users?

Tanja Svarre 

PhD thesis from Department of Communication 

Aalborg University, Denmark


CIP – Cataloguing in Publication 

Svarre, Tanja 

Automatic indexing in e-government: Improved access to administrative documents for professional users? / Tanja Svarre. – Aalborg: Aalborg University, 2012. xiv, 328 p.

© Copyright Tanja Svarre 2012 

All rights reserved


Acknowledgments 

Finishing this thesis has been possible thanks to the continuous support of colleagues, family, and friends. For this, I thank you all.

First and foremost, I want to thank my supervisor, Professor Marianne Lykke. You have readily shared valuable knowledge, insight, and experience, from which I have learned a lot. Your constructive feedback on my written and oral work has been invaluable and has always moved the project forward. You have been an enthusiastic, flexible, and helpful supervisor. For this I am very grateful.

Also, I am indebted to a number of present and former colleagues. First, I want to thank former heads of research programme Pia Borlund and Jette Hyldegaard, former head of department Jesper W. Schneider, head of department Jack Andersen (all from RSLIS), director of doctoral school Ann Bygholm, and head of department Christian Jantzen for supporting me professionally during the different phases of this project. My gratitude also goes to Professor Pia Borlund for planting the first seeds of my interest in research and for always being ready to discuss theoretical and empirical matters with great enthusiasm; to Associate Professor Jesper W. Schneider for being a persistent discussion partner and a source of inspiration on statistical matters and questionnaires, and for good companionship on the IC3; to Associate Professors Haakon Lund and Birger Larsen for your flexible and patient support in various technical matters; and to former PhD student Charles Seger. Despite the distance, you have been a valuable support in good times and bad, and a great travelling companion. To Assistant Professor Mette Skov, thank you for your encouragement, for proofreading chapters, and for being a good colleague. PhD Brian Kirkegaard Lunn, you took on extended responsibilities as my “buddy” and were an excellent teaching partner. Thank you for your always open door, at the office and at home. Lastly, I want to thank all my colleagues at Friis for your warm welcome. It has been a pleasure to join you.

Further, I am indebted to Associate Professors Katriina Byström and Tom Nyvang and to Professor Gunilla Widén for joining the assessment committee and for making time to read and comment on the thesis. I highly appreciate your effort.

I also want to thank people outside the research community of Aalborg University and the Royal School of Library and Information Science. I am grateful to Professor Susanne Bødker for making my research visit to the Department of Computer Science at Aarhus University possible, and to PhD Niels Mathiassen for good times during my stay.

I am also very thankful to the former National IT and Telecom Agency (IT & Telestyrelsen) for providing the topical frame for the project and for supporting it. Special thanks go to senior consultant Palle Aagaard, my contact person in the agency, for your interest in the project, for making room for practice-oriented perspectives on empirical matters during both planning and analysis, for providing competent input, and for always being available. I am very grateful to SKAT too for making the collaboration possible and for making employees, office space, and IT facilities available for my empirical somersaults. In particular, I want to express my gratitude to my contact person in SKAT, special consultant Ebbe Tor Andersen. You have been an enthusiastic source of inspiration, always seeing possibilities rather than limitations. I also want to thank all the participants in the project: the 340 questionnaire respondents, the 35 focus group participants, and the 42 people who took part in the search test, either as pilot testers or as test persons. Thank you for all your input, your time, and your goodwill. And to my transcriber, Timo Iwersen: I am grateful for your effort in transforming the interviews into text.

Last, but by no means least, I am grateful to my boyfriend Sune and to my dear family and friends for your continuous patience with me while I was writing and working on this project. Thank you for your persistence, your help and support, both mentally and in practical matters, for believing in me, and for still being there. Without you I could not have finished this thesis. Your dedication is highly treasured. To my girls, Annika and Maja: I am blessed to have you in my life. Your love carried me all the way through this project.


Abstract 

The overall purpose of the present thesis is to investigate whether automatic assigned indexing methods can improve professional users’ access to work-based documents in the domain of e-government. The problem is investigated by means of a case study at the Danish tax authority, SKAT. An experimental comparative test was designed on the basis of a preceding domain study that clarified information seeking behaviour in e-government.

The introduction of e-government arose from a desire for effectiveness, efficiency, and greater transparency in public administration. Today, public-sector employees commonly index government documents manually. In this thesis we investigate whether automatic indexing can replace, and perhaps even improve, the current manual procedures and thereby support efficiency and effectiveness.

An employee perspective guides the thesis. This means a user group with extensive knowledge of the subject they work with. In contrast to citizens and other e-government stakeholders, little is known about the seeking behaviour of employees in the domain. In addition, the introduction of e-government is expected to change employees’ work tasks, and with them their information needs. This calls for an investigation of the present information seeking behaviour of e-government employees. In the thesis this is done by means of a domain study based on a questionnaire distributed to employees in SKAT and subsequent focus group interviews.

The domain study shows that the employees frequently use a number of mainly online information sources to solve their work tasks. Their information needs are primarily verificative and conscious topical needs. Moreover, they are experienced information searchers who request more extensive metadata in the system that formed the basis of the search test: their intranet.

The knowledge gained from the domain study was incorporated into the design of the search test. The test was an experimental test comparing automatic extracted indexing (free text indexing) with automatic assigned indexing (categorization). In the assigned indexing, a domain-specific taxonomy formed the basis of the categories. The test system was a prototype of a future version of SKAT’s intranet. Thirty-two test persons carried out searches in two separate experimental systems, one for each indexing type. Three simulated search tasks and one genuine search task guided the searches. The simulated search tasks were designed in accordance with the domain study’s findings on the employees’ information needs. The test showed that each of the two types of automatic indexing is useful to the employees in its own way. At a general level, extracted indexing performed better in terms of the average number of terms and concepts in queries, the number of sessions with reformulations, and the number of reformulations per session. This showed that the system with categorization demanded more of the test persons than the system with free text indexing.

It turned out that the test persons had difficulty using the categorization in some respects. It was not relevant to them if a highly relevant document was already ranked near the top before they used the categorization. Nor did they find it relevant if the initial search retrieved very few results; in those cases it was easier for them to go through the results manually. In contrast, the categorization was helpful in identifying new facets of a search task and in suggesting new search terms for reformulations. For future e-government indexing guidelines, this led to the recommendation that both assigned and extracted indexing be available as search facilitators, as each supports different aspects of the information needs that arise for employees in e-government.

The thesis contributes by providing new insights into the information seeking behavior of employees in e-government and the way in which this behavior can be supported by automatic indexing.


Abstract in Danish 

The purpose of the present thesis is to determine whether automatic indexing can improve employees’ access to information in work-based information seeking in the domain of e-government. In the PhD project, the problem is investigated as a case study at SKAT. More precisely, a comparative search test is carried out. The test is designed on the basis of a preceding domain study that clarifies seeking behaviour in e-government.

The introduction of e-government arose from a desire for greater efficiency and openness in public administration. Today it is common for public-sector employees to index administrative documents manually. Since one of the reasons for digitizing public administration is precisely a desire for greater efficiency, this thesis investigates whether automatic indexing of documents can replace, and perhaps even improve, manual indexing in the domain.

In the thesis, the problem is viewed from an employee perspective. This means a user group with extensive knowledge of the subject they work with. Unlike for citizens, for example, little is known in the e-government literature about employees’ information seeking behaviour. Since the digitization of public administration is also expected to affect employees’ work tasks, and since work tasks are known to influence the information needs that information seekers develop, there is a need to determine what characterizes the seeking behaviour of employees in public administration today. In the thesis, this has been done by means of a domain study based on a questionnaire survey among employees in a public administration, followed by focus group interviews. The domain study showed that the employees use a range of different information systems in their work and that they do so frequently when solving their tasks. Their information needs are primarily verificative and conscious topical needs. Moreover, they are experienced searchers who request considerably more metadata in the system that forms the basis of the search test, especially content-related metadata.

The experience from the domain study was incorporated into the design of the search test. The test is a comparative test of automatic extracted indexing (free text indexing) and automatic assigned indexing (categorization) based on a domain-specific taxonomy. The test system is a prototype of the employees’ future intranet. Thirty-two test persons searched the two systems on the basis of three provided search tasks and one of their own search tasks. The constructed search tasks were designed in accordance with what the domain study had shown about the employees’ information needs. The test showed that each of the two forms of indexing is useful in its own way. Overall, the extracted indexing performed best as measured by the number of words and concepts used in queries, the number of sessions containing reformulations, and the number of reformulations in those sessions. This showed that the system with categorization demanded more of the users, both in the number of searches and in the number of terms and concepts entered.

It turned out that the test persons had problems using the categorization in some contexts. It was not relevant to them if a highly relevant document appeared among the first search results before they had used the categorization. Nor did they find it relevant if the search itself retrieved so few results that it was quicker to look through them manually. On the other hand, the categorization could help them identify new facets of search tasks and suggest new search terms when reformulating. For further work on indexing guidelines, this result led to the recommendation that both types be present in the indexing of e-government, since they cover different aspects of the information needs that arise among employees in e-government.

As a whole, the thesis contributes new knowledge about the information seeking behaviour of employees in e-government and the way in which the identified behaviour can be supported by automatic indexing.


Table of contents 

1 INTRODUCTION
1.1 RESEARCH OBJECTIVE
1.2 EMPIRICAL ASSUMPTIONS
1.3 MOTIVATIONS FOR THE THESIS
1.4 RESEARCH QUESTIONS
1.5 STRUCTURE OF THE THESIS
2 METHODOLOGICAL FRAMEWORK
2.1 A COGNITIVE FRAMEWORK FOR INFORMATION RESEARCH
2.1.1 Towards a holistic cognitive framework
2.1.2 The role of work tasks
2.2 THE COGNITIVE FRAMEWORK AND THE THESIS
2.3 OVERALL RESEARCH METHOD: CASE STUDY
2.4 THE CASE: SKAT
2.4.1 The intranet
2.4.2 The intranet taxonomy
2.5 SUMMARY
3 THE E-GOVERNMENT DOMAIN
3.1 DEFINITION AND PURPOSE
3.2 SUBJECT AREAS IN E-GOVERNMENT RESEARCH & DEVELOPMENT (R&D)
3.3 STAKEHOLDERS IN E-GOVERNMENT
3.4 LIS PERSPECTIVES ON E-GOVERNMENT
3.4.1 Information systems
3.4.2 Knowledge management
3.4.3 ICT tools: Metadata initiatives
3.5 SUMMARY
4 SEEKING BEHAVIOUR IN E-GOVERNMENT
4.1 INFORMATION SEEKING AND RELATED CONCEPTS
4.2 THE PURPOSE OF SEEKING STUDIES
4.3 ENTITIES OF E-GOVERNMENT: STUDIES OF SEEKING BEHAVIOR
4.4 E-GOVERNMENT EMPLOYEE INFORMATION SEEKING
4.4.1 Project INISS
4.4.2 System development in the Danish Parliament
4.4.3 Information behavior of employees in an engineering and technical service government office
4.4.4 Federal, state, and local policy makers’ selection of information sources
4.4.5 Finnish municipal employees
4.4.6 Users of the European Parliamentary Documentation Centre
4.4.7 Information literacy of Scottish government civil service staff
4.4.8 Civil servants’ internet skills
4.5 RELATED STUDIES OF INFORMATION SEEKING AND SEARCHING
4.5.1 Legal seeking behavior
4.5.2 Information behaviour of software engineers
4.5.3 Professional seeking behaviour
4.6 SUMMARY
5 INDEXING OF ELECTRONIC DOCUMENTS
5.1 THE PROCESS OF INDEXING
5.2 QUALITY OF INDEXING
5.2.1 Specificity
5.2.2 Exhaustivity
5.2.3 Consistency
5.2.4 Performance measures
5.3 APPROACHES TO INDEXING
5.3.1 Document, user, and domain oriented indexing
5.3.2 Controlled vs. uncontrolled indexing
5.3.3 Intellectual vs. automatic indexing
5.4 APPROACHES TO AUTOMATIC INDEXING
5.4.1 Automatic extracted indexing
5.4.1.1 Lexical analysis and stop word lists
5.4.1.2 Stemming
5.4.1.3 Weighting factors
5.4.1.4 Compound nouns as index terms
5.4.1.5 Extracted indexing
5.4.2 Automatic assigned indexing
5.5 HYBRID TYPES OF INTELLECTUAL AND AUTOMATIC INDEXING
5.6 SUMMARY
6 EMPIRICAL FRAMEWORK
6.1 DOMAIN STUDY
6.2 QUESTIONNAIRE DESIGN, COLLECTION, AND ANALYSIS
6.2.1 Technique and structure
6.2.2 Content
6.2.2.1 Background data
6.2.2.2 Work tasks
6.2.2.3 Elaboration of work tasks
6.2.3 Data collection
6.2.4 Pilot testing
6.2.5 Data analysis
6.2.6 Methodical reflections
6.3 FOCUS GROUP METHOD
6.3.1 Purpose and design
6.3.2 Data collection: Interview guide
6.3.3 Execution and documentation
6.3.4 Data analysis
6.3.5 Limitations
6.4 SEARCH TEST DESIGN
6.4.1 Test system
6.4.2 Test persons
6.4.3 Search tasks
6.4.4 Test procedure
6.4.5 Pilot test
6.4.6 Techniques for data collection and preparation
6.4.7 Data analysis
6.5 LIMITATIONS
6.6 RELATION BETWEEN RESEARCH METHOD AND RESEARCH QUESTIONS
7 DOMAIN STUDY RESULTS
7.1 QUESTIONNAIRE RESPONDENTS, THEIR BACKGROUND AND WORK TASKS
7.2 CHARACTERISTICS OF FOCUS GROUP PARTICIPANTS
7.3 RESULTS REGARDING PROFESSIONAL E-GOVERNMENT SEEKING BEHAVIOR
7.3.1 Use of information sources
7.3.1.1 Reference works
7.3.1.2 Web sites
7.3.1.3 Internal systems
7.3.2 Colleagues as sources of information
7.4 SEEKING RESULTS REGARDING DEMANDS FOR INDEXING IN E-GOVERNMENT
7.4.1 The frequency of information seeking
7.4.2 Types of information needs
7.4.3 Preferred metadata
7.5 SUMMARY AND IMPLICATIONS FOR INDEXING
8 SEARCH TEST RESULTS
8.1 THE TEST PERSONS
8.2 OVERALL SEARCHING BEHAVIOUR AND PERFORMANCE
8.2.1 The search situation
8.2.1.1 Sessions
8.2.1.2 Queries
8.2.1.3 Search operators
8.2.1.4 Filtering by metadata
8.2.2 Reformulations
8.2.3 Combined system B sessions and queries
8.3 SUMMARY AND PERFORMANCE IMPLICATIONS FOR FUTURE INDEXING IN E-GOVERNMENT
9 CONCLUSION AND RECOMMENDATIONS FOR FUTURE WORK
9.1 SUMMARY OF EMPIRICAL FINDINGS
9.2 CONTRIBUTIONS OF THE THESIS
9.3 RECOMMENDATIONS FOR FUTURE WORK
10 REFERENCES
List of abbreviations
Appendices
Appendix 1: Generic work tasks at SKAT
Appendix 2: Distribution of employees across main processes in the business model
Appendix 3: E-mail invitation to employees
Appendix 4: Questions contained in questionnaire
Appendix 5: Questionnaire pilot test data
Appendix 6: Link to questionnaire
Appendix 7: Dates for the conduct of focus group interviews
Appendix 8: Example of the slides guiding a focus group interview
Appendix 9: Focus group interview guide
Appendix 10: Transcription conventions
Appendix 11: Verbatim Danish versions of quotes used in the thesis
Appendix 12: E-mail invitation to participate in search test
Appendix 13: Questionnaire for recruiting test persons for the search test
Appendix 14: Simulated search tasks
Appendix 15: Test persons’ insight into simulated search tasks
Appendix 16: E-mail concerning naturalistic information needs
Appendix 17: Instructions for search test persons
Appendix 18: Rotation of search tasks
Appendix 19: Search test interview guide

Appendix 20: Judgement of the relevance of retrieved documents in search test ...................................................... 307 

Appendix 21: Completeness degree of questionnaire responses ................................................................................ 309 

Appendix 22: Respondents’ experience with work tasks ........................................................................................... 311 

Appendix 23: Age distribution of population, respondents and test persons .............................................................. 313 

Appendix 24: Respondents’ length of service in the organization ............................................................................. 315 

Appendix 25: Focus group participants work tasks .................................................................................................... 317 

Appendix 26: Additional sources mentioned by respondents .................................................................................... 319 

Appendix 27: Test persons’ background data ............................................................................................................ 325 

Appendix 28: Supplementary search test tables ......................................................................................................... 327 

IX

List of figures 

Figure 2.1 The participating actors in context. Model adapted from Ingwersen & Järvelin (2005, p. 261) with minor corrections.
Figure 2.2 Extension of the cognitive view, the interactive process of IR and affecting factors. Adapted from Ingwersen & Järvelin (2005, p. 274) with minor corrections.
Figure 2.3 Information behaviour and the influence from job- or non-job related tasks. Adapted from Ingwersen & Järvelin (2005, p. 198).
Figure 2.4 SKAT’s revised business model
Figure 2.5 Screen dump from the existing intranet interface
Figure 3.1 Disciplines integrated in the multidisciplinary research field of e-government. Adapted from Wimmer (2007, p. 14)
Figure 3.2 Basic elements and relations in governmental systems (Grönlund, 2003, p. 56)
Figure 3.3 E-government hype cycle (Schellong, 2007)
Figure 3.4 Dimensions and stages in e-government (from Layne & Lee, 2001, p. 124)
Figure 4.1 Nested model of information seeking and information searching (Wilson, 1999, p. 263)
Figure 4.2 Comprehensive model of information seeking. Adapted from Johnson et al. (1995).
Figure 4.3 Model of cognitive factors affecting information seeking in the domain of software engineering. Adapted from Freund, Toms & Waterhouse (2005).
Figure 4.4 The process of information seeking of professionals. Adapted from Leckie, Pettigrew & Sylvain (1996, p. 180)
Figure 5.1 Illustration of the subject indexing process (Mai, 2000, p. 279).
Figure 5.2 Document and domain oriented approaches to indexing. Adapted from Mai (2005, p. 607)
Figure 5.3 Types of vocabularies and their relationships. Adapted from Morville & Rosenfeld (2007, p. 195)
Figure 5.4 Generalized characteristics of intellectual indexing. Compiled on the basis of Rafferty & Hidderley (2007).
Figure 5.5 The resolving power of significant index terms. Adapted from Luhn (1958a, p. 161)
Figure 6.1 Screen dump from ATLAS.ti coding of focus group interviews
Figure 6.2 Screen dump of the test system: Search fields
Figure 6.3 Screen dump of the test system: Categorization
Figure 6.4 Relevance types in IR evaluation. Adapted from Borlund (2003a, p. 915).

List of tables 

Table 1.1 Timeline for data collection in the PhD project
Table 3.1 Stakeholders in e-government. Adapted from Rowley (2011, p. 56)
Table 3.2 Knowledge management processes and the potential role of IT. Adapted from Alavi & Leidner (2001, p. 125)
Table 4.1 Examples of studies that have examined information seeking and/or searching of various stakeholder roles
Table 5.1 Possible factors affecting consistency. From Lancaster (2003, p. 71).
Table 5.2 Summary of strengths and weaknesses of controlled vocabularies and free text. Adapted from Dubois (1987, p. 249).
Table 6.1 Indicators of information needs in questionnaire and corresponding theoretical descriptions
Table 6.2 List of respondents' preferred metadata listed in questionnaire
Table 6.3 Cross tabulations carried out on the basis of variables in questionnaire data
Table 6.4 Overview of participants in focus groups
Table 6.5 Examples of genuine search tasks
Table 6.6 Search test variables, their definition and measurement
Table 6.7 Simulated search task facets
Table 6.8 Outline of the relation between research questions and empirical data
Table 7.1 Distribution of respondents as to their education (percentages)
Table 7.2 Number of work tasks selected by respondents
Table 7.3 Ranked frequency of work tasks in questionnaire results
Table 7.4 Focus group participants' educational background
Table 7.5 Respondents' use of predefined information sources (percentages)
Table 7.6 Questionnaire results regarding the frequency of information seeking
Table 7.7 Distribution of indicators of information needs
Table 7.8 Average percentage distribution of verificative needs (VN), conscious topical needs (CTN), and muddled topical needs (MTN).
Table 7.9 Metadata preferences distributed across work tasks
Table 8.1 Frequency of test persons' intranet use
Table 8.2 Ranking of test persons' most important information sources
Table 8.3 General evaluation of simulated search tasks in system A, system B, and total (averages)
Table 8.4 Evaluation of simulated search tasks specified to single simulated search tasks (averages)
Table 8.5 General findings of variables in search test
Table 8.6 Session success (percentages)
Table 8.7 Query success (percentages)
Table 8.8 Number of queries in sessions at task level (averages)
Table 8.9 Number of queries in sessions as to success or failure (averages)
Table 8.10 Number of search terms in queries (averages)
Table 8.11 Number of search keys in queries (averages)
Table 8.12 Number of search terms in queries as to success or failure (averages)
Table 8.13 Number of search keys in queries as to success or failure (averages)
Table 8.14 Distribution of search operators in queries (percentages)
Table 8.15 Number of search terms used with search operators in queries (averages)
Table 8.16 Success of search operators (percentages)
Table 8.17 Document type filter used in queries (percentages)
Table 8.18 Search success for the document type filter in system A and system B queries (percentages)
Table 8.19 Number of sessions with query reformulations (percentages)
Table 8.20 Number of reformulations in sessions
Table 8.21 Types of reformulations for all queries (percentages)
Table 8.22 Query success on the basis of types of reformulations (percentages)
Table 8.23 Sessions carried out in system B, or in a combination of system B and system A: Frequency and success (percentages)
Table 8.24 System of successful queries in combined system B sessions
Table 8.25 System B queries: Frequency of category use and query success (percentages)

1 Introduction 


Chapter 1 

Indexing has been carried out for centuries, starting with manual indexing. In the middle of the last century, automatic methods were introduced as a counterpart. Although both manual and automatic indexing have been studied theoretically and empirically, researchers still identify gaps in our knowledge of indexing in terms of quality, cost-effectiveness, our understanding of the effect of indexers’ and information users’ cognitive processes, and the like (e.g., Milstead, 1994; Anderson & Perez-Carballo, 2001a, 2001b). The present PhD project explores indexing in the context of a specific domain: e-government. Specifically, we investigate the performance of two methods for subject indexing in the domain of e-government, with the purpose of working out a set of recommendations for indexing practice in e-government.

Information overload is a widely recognized problem today (e.g., Edmunds & Morris, 2000; Eppler & Mengis, 2004; Codagnone & Wimmer, 2007), in private and public organizations alike. At the same time, the importance of information in e-government can hardly be overstated. According to Klischewski, “[d]ocument processing is at the core of administrative performance in several respects: Documents are the basis for almost all of the administrative processes, they are the most valuable resources to exploit as they are the main carriers of information and represent a large portion of the overall administrative knowledge base” (2006, p. 34). Thus, in democracies, documentary support is a key issue for operations undertaken in public administrations (Kraemer & Dedrick, 1997; Klischewski, 2006; Sabucedo & Rifón, 2006). The consequences of not being able to find the documents needed for a given task have been considered previously. The calculations carried out by Feldman & Sherman suggest that supporting corporate users’ searching for information is one step towards efficiency and effectiveness (Glazer, 1993; Feldman & Sherman, 2001). In addition, public administrations are expected to offer security to the public, and not being able to find needed information can have severe costs (Kraemer & Dedrick, 1997). Studies have indicated that the facilities of e-government systems still leave room for improvement, for instance in terms of searching (e.g., Goh et al., 2008), navigation (e.g., de Jong & Lentz, 2006), and the extent of metadata adoption (e.g., Kopackova, Michalek & Cejna, 2010). In sum, supporting access to information should be given high priority if the aim is effective, efficient, and secure governments.

Edmunds & Morris (2000) mention different methods for reducing information overload in organizations, e.g., value-added information. Value-added information limits information overload while increasing users’ access to relevant information. A concrete way of adding value to information in e-government documents is the assignment of metadata. In this domain, metadata serve several purposes, namely allowing interoperability between systems and enabling users to retrieve better and more precise search results (Moen, 2001; Tambouris, Manouselis & Costopoulou, 2007). Further, metadata ease knowledge sharing between e-government employees, both internally within organizations and externally (Schwartz, Divitini & Brasethvik, 2000; Choo, 2006). The multiplicity of metadata standards developed specifically for e-government reflects that governments are well aware of the importance of metadata (cf., Tambouris, Manouselis & Costopoulou, 2007). Metadata can be assigned either manually by humans or automatically on the basis of a machine-generated analysis of the words constituting the documents. In e-government the predominant approach is manual assignment. With the present thesis we want to investigate whether automatic indexing can be a means to effective and efficient governments that at the same time supports the important process of information seeking in the domain.

The concept of e-government designates governments that utilize ICT in order to communicate with, and allow access to information for, external parties such as citizens, businesses and other governments (e.g., Fang, 2002; Jaeger, 2003; Grant & Chau, 2005). A variety of purposes for e-government can be identified in the literature. The most important ones are openness, improved and more flexible services for citizens and businesses, and increased coherence and efficiency of governmental processes (e.g., Grönlund & Horan, 2004). The concept of e-government has emerged worldwide during the last decade. The Scandinavian countries have been pioneers in the process of digitalizing government and, as a result, have ranked favourably in the various international e-government indexes (Andersen et al., 2005; Henriksen & Damsgaard, 2006). In Denmark, three successive strategies have formed the basis for the development of e-government within the framework of Project Digital Government. In 2002, “Towards e-government: vision and strategy for the public sector in Denmark” (Project Digital Government & The Digital Taskforce, 2002) was published. In 2004, “The Danish eGovernment Strategy 2004-2006: realising the potential” (The Danish Government et al., 2004) followed. The latest strategy, “The Danish E-Government Strategy 2007-2010: Towards Better Digital Service, Increased Efficiency, and Stronger Collaboration” (The Danish Government, Local Government Denmark (LGDK) & Danish Regions, 2007), appeared in 2007. The strategies have been carried out in cooperation between the most important actors in the Danish governmental system: the government, the regions, and the municipalities. Altogether, the strategies cover the period 2001-2010. Over the decade in which they have been in force, the strategies have become increasingly specific as knowledge of e-government has grown. In the two latest strategies, automation of employees’ working processes has been specifically addressed as a means of reducing the use of resources. The principle of effectiveness is carried forward in the recent mandate for a new strategy, which replaces the existing strategy in 2011 (The Danish Government, Local Government Denmark & Danish Regions, 2010). Automation of indexing procedures may thus support the e-government strategy by reducing the resources spent on indexing and on searching for information.

1.1 Research objective 

The PhD project has been financed by the National IT and Telecom Agency, the Royal School of Library and Information Science, and the Department of Communication, Aalborg University. The overall project idea originated from the National IT and Telecom Agency, which requested a set of guidelines for the application of automatic indexing methods that could be used in the Agency’s work with standardization and interoperability in the Danish public sector. We have met this assignment by focusing on two indexing methods: automatically extracted indexing (full text indexing) and automatically assigned indexing (automatic categorization). Thus, the objective is to evaluate whether automatic categorization, as an approach to automatic indexing, can improve retrieval performance in e-government in a professional context. We use the case study as our general methodological approach. The Danish tax authorities, SKAT, have willingly agreed to be our case. We investigate employees at SKAT, that is, professional users of information. Compared to e-government customers (e.g., citizens and businesses), our target group constitutes a homogeneous user group.
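The difference between the two indexing methods can be illustrated with a small Python sketch. It is a minimal illustration under stated assumptions, not the indexing software used in the search test: extracted indexing is approximated by taking the words of the full text as index terms, and assigned indexing by matching the document against a small, hypothetical set of category rules (`CATEGORY_RULES`) standing in for a trained categorizer. The category names and cue words are invented for illustration.

```python
import re

# Hypothetical controlled vocabulary: each category is triggered by a set of cue words.
# These categories and cue words are illustrative assumptions, not SKAT's taxonomy.
CATEGORY_RULES = {
    "Corporate tax": {"company", "corporate", "dividend"},
    "Personal income tax": {"salary", "deduction", "income"},
    "VAT": {"vat", "invoice"},
}


def extract_terms(text: str) -> set[str]:
    """Extracted indexing: index terms are taken directly from the document text."""
    return set(re.findall(r"[a-zæøå]+", text.lower()))


def assign_categories(text: str, rules: dict[str, set[str]] = CATEGORY_RULES) -> list[str]:
    """Assigned indexing: map the document onto predefined categories.

    Here a category is assigned if any of its cue words occurs in the text,
    a deliberately simple stand-in for an automatic categorizer.
    """
    terms = extract_terms(text)
    return sorted(cat for cat, cues in rules.items() if terms & cues)


doc = "Guidance on the deduction of travel expenses from salary income."
print(sorted(extract_terms(doc)))
print(assign_categories(doc))  # ['Personal income tax']
```

The sketch mirrors the distinction drawn above: extracted terms can only come from the document’s own vocabulary, whereas assigned categories come from a fixed scheme and can label a document even when the category name (here "Personal income tax") never appears in its text.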

Since e-government represents a specific domain, we carry out the empirical investigation of the overall research problem in two parts. First, we analyse the specific characteristics of the domain. For this purpose we use a questionnaire to gain an overview of the organization. Subsequently, focus group interviews are employed in order to explain and expand on the results of the questionnaire survey. The questionnaire is used to collect data on the employees’ frequency of information seeking, the types of information needs they develop, their use of information sources, and their metadata preferences in relation to specific work tasks in the organization. The assumption is that the importance of information may depend on the work task in question. We refer to this first part of the empirical foundation for the thesis as the domain study.

The second part of the data collection consists of a search test that specifically investigates the performance of the two indexing methods mentioned above. The design of the search test draws on knowledge gained from the domain study. The search test investigates the performance of two test systems, both of which employ automatic indexing: one extracted (free text indexing) and one assigned (automatic categorization). Three simulated search tasks and one real search task form the basis of the test persons’ evaluation of the performance of the test systems. The relevance of the search results is judged by the test persons, and each test session concludes with a short interview.

1.2 Empirical assumptions 

The empirical design of the PhD project has been guided by our methodological starting point: the cognitive view of information seeking and retrieval (cf., Ingwersen & Järvelin, 2005). Methodologically, the cognitive viewpoint is placed within the research tradition of cognitive constructivism (Talja, Tuominen & Savolainen, 2005). The cognitive viewpoint emerged as a reaction to the one-sided focus on users in the user-oriented research tradition and on systems in the system-oriented research tradition. It thus aims at a holistic view of the process of IR interaction in order to integrate the user-oriented and the system-driven research traditions (e.g., Ingwersen, 1992, 1996; Ingwersen & Järvelin, 2005). The cognitive view emphasizes the cognitive actors interacting in information seeking and retrieval. With this view of information seeking and retrieval, both the users and the information system must be taken into account when testing the performance of an information system. As a consequence, we test the performance of indexing methods by involving real, potential users in the search test. Further, we apply an established evaluation method for the search test, namely simulated search tasks as proposed by Borlund (Borlund & Ingwersen, 1997; Borlund, 2000, 2003b). The purpose of simulated search tasks is to make it possible to evaluate IR systems in a way that ensures both realism and experimental control.

In the latest presentations of the cognitive view, the importance of context in information seeking and retrieval has received greater emphasis (e.g., Ingwersen & Järvelin, 2005). The cognitive structures of the individual still constitute the core of the viewpoint, but the context is considered an influential component in information seeking and retrieval. According to Ingwersen & Järvelin (2005, p. 19): “...actors and other components function as context to one another in the interaction process. There are social, organizational, cultural as well as systemic contexts, which evolve over time.” The prominence of the concept of context in the literature underlines that context must be considered a factor in the interaction process. What constitutes context has been discussed and operationalized in relation to information behaviour (cf., Courtright, 2007). In the present work we are concerned with a work-based, organizational context. This calls for considering how that specific context influences the results of the search test, which is the main reason for carrying out the first part of the empirical data collection: the domain study. We are not guided by the theoretical foundation of domain analysis as formulated by Hjørland and Albrechtsen (1995), since it is primarily concerned with scientific domains. Rather, we are inspired by studies similar to the present domain study, for example Leckie, Pettigrew & Sylvain (1996), Nielsen (2001) and Freund, Toms & Waterhouse (2005).

1.3 Motivations for the thesis 

The present research is motivated by several considerations. We have already mentioned one of the basic premises of e-government, namely effectiveness and efficiency. With the present study we want to investigate whether automatic indexing in the form of automatic categorization can contribute to this premise. The motivation for the study relates to two different aspects: indexing and the target group in question.

The seeking behaviour of e-government employees is, to our knowledge, not well explored. By comparison, numerous studies have been made of the customers of e-government (e.g., citizens and businesses) in order to evaluate their use of e-government solutions. Reviews can be found in Robbin, Courtright & Davis (2004) and Case (2006). A basic premise for the thesis is that we need to know what characterizes e-government employees’ seeking behaviour and the role of information in the domain in order to tailor the indexing to the seeking behaviour and information needs actually experienced by the employees. The studies that do inform us about e-government users’ seeking behaviour are presented in chapter 3.

As for automatic indexing, there are different motivations for suggesting automatic categorization in the present context. Manual assignment of metadata is a costly and time-consuming process for administrative employees. If automatic categorization proves to support and improve the information seeking of the thesis target group, it would at the same time support the aim of increased effectiveness and efficiency in e-government. Also, the literature has demonstrated that ensuring quality and consistency in manually added metadata can be difficult (Anderson & Perez-Carballo, 2001a; Lancaster, 2003): manual indexing tends to vary with the indexer, both across indexers (inter-indexer consistency) and across time (intra-indexer consistency). Further, in the field of US federal records management, Sprehe, McClure & Zellner (2002) found that different situational factors affected the quality of federal employees’ record keeping, causing the quality of records management to diverge across governments. Factors such as the availability of resources and guidance, the motivation of the employees, and the efficiency of access to records appeared to affect the quality of records management in the study. In a recent study of metadata assignment in Finnish government, Kettunen & Henttonen (2010) found that employees prefer not to assign metadata when they have the option, and that they tend to accept default values whenever these are available. The results suggest that e-government indexing might benefit from automatic indexing in a number of ways. The literature has already demonstrated that the assignment of metadata is one of several prerequisites for retrieval and sharing of knowledge in organizations (e.g., Choo, 2006). If automatic indexing can improve subject metadata, there is reason to assume that the retrieval and sharing of knowledge in the domain will also be positively affected.
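Inter-indexer consistency is commonly quantified by comparing the sets of terms that two indexers assign to the same document. The sketch below uses Hooper’s measure, C = A / (A + M + N), where A is the number of terms assigned by both indexers, M the number assigned only by the first, and N the number assigned only by the second; the same calculation applied to one indexer at two points in time gives intra-indexer consistency. The choice of measure and the example term sets are our own illustrative assumptions, not data from this thesis.

```python
def hooper_consistency(terms_a: set[str], terms_b: set[str]) -> float:
    """Hooper's consistency: terms in common divided by all distinct terms used.

    Equivalent to A / (A + M + N), where A = terms assigned by both indexers,
    M = terms assigned only by the first, N = terms assigned only by the second.
    """
    union = terms_a | terms_b
    if not union:
        return 1.0  # assumed convention: two empty term sets count as fully consistent
    return len(terms_a & terms_b) / len(union)


# Invented example: two indexers (or one indexer at two points in time)
# index the same document.
indexer_1 = {"income tax", "deductions", "travel expenses"}
indexer_2 = {"income tax", "travel expenses", "commuting"}
print(f"{hooper_consistency(indexer_1, indexer_2):.2f}")  # 0.50
```

Here the indexers agree on two of four distinct terms, giving a consistency of 0.50 even though each of them shares two out of three terms with the other.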

To our knowledge, not much is known about how automatically extracted and automatically assigned indexing methods complement each other. The theory of polyrepresentation suggests that the more different types of representation are used, the more cognitive overlap there will be between them, and that documents located in this overlap are more likely to be relevant (Ingwersen, 1996; Ingwersen & Järvelin, 2005). Further, combining approaches makes it possible to take advantage of the strengths of each approach (Anderson & Perez-Carballo, 2001a).
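One way to operationalize the combination of extracted and assigned indexing in the spirit of polyrepresentation is to favour documents retrieved through more than one representation. The Python sketch below is a hypothetical illustration, not the procedure used in this thesis: it assumes that each indexing method returns a set of document IDs for a query, and it ranks documents by how many representations retrieved them. The method names and document IDs are invented.

```python
from collections import Counter


def overlap_rank(result_sets: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Rank documents by the number of representations that retrieved them.

    Documents in the overlap of several representations come first; ties are
    broken alphabetically by document ID so that the output is deterministic.
    """
    counts = Counter(doc for docs in result_sets.values() for doc in docs)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result sets for one query under two indexing methods.
results = {
    "full_text": {"doc1", "doc2", "doc4"},
    "categorization": {"doc2", "doc3", "doc4"},
}
print(overlap_rank(results))  # [('doc2', 2), ('doc4', 2), ('doc1', 1), ('doc3', 1)]
```

Counting overlap is the simplest possible fusion rule; it ignores each method’s internal ranking, which a fuller implementation would probably want to combine with the overlap count.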

Table 1.1 Timeline for data collection in the PhD project

Period of time      Data type
December 2008       Survey questionnaire
June-July 2009      Focus group interviews
May 2010            Recruitment questionnaire for search test
May-June 2010       Search test

One final motivation for the thesis concerns automatic categorization. Categorization offers users a structured, subject-based overview of search results. Categorization has been implemented in various prototypes during the 2000s, though rarely for intranets (Käki, 2005a). We therefore want to investigate whether the use of categorization in e-government is consistent with the findings of existing studies of categorization.

1.4 Research questions 

Our overall research question concerns the performance of indexing methods in the domain of e-government. The overall research methodology is a single case study. The specific research questions address the domain study (research question 1) and the search test (research question 2), respectively.

1. What characterizes e-government employees’ information seeking behaviour in relation to:
1.1. Their use of information sources?
1.2. Their frequency of information seeking?
1.3. Their information needs?
1.4. Their metadata preferences?
1.5. How does the seeking behaviour affect demands for indexing?

The first research question and its sub-questions are answered on the basis of the domain study, using the quantitative data collected from the questionnaire and the qualitative follow-up focus group interviews (see the timeline of the data collection in Table 1.1). Thus, the answers aim to be quantitative, but also seek to offer explanations for the patterns identified in the questionnaire data.

2. How do automatically extracted indexing and automatic categorization perform in relation to the identified domain characteristics with regard to:

2.1. Number of queries in sessions? 

2.2. Number of terms in queries? 

2.3. Number of concepts in queries? 

2.4. The type of search operator applied? 

2.5. The use of document type filters? 

2.6. Number of reformulations? 

2.7. Types of reformulations? 

2.8. Degree of search success in queries and sessions? 

2.9. Overall performance measured by performance measures? 

2.10. Which implications does the performance of different indexing methods have 

for future indexing and indexing guidelines in the domain of e-government? 

The empirical basis for the second research question and its sub-questions is the data collected in connection with the search test (see Table 1.1). The search test consists of an experimental comparison of two indexing methods. The test was carried out in a realistic setting, on a real-life governmental intranet. As the purpose of the test is to form a basis for ensuring and developing effective and efficient indexing, variables measuring search time and effort were important factors in the test design. Questions 2.1-2.7 are answered on the basis of the search log generated during the test. Questions 2.8-2.9 are based on the test persons’ assessments of the retrieved outcomes. Post-search interviews are included to understand and explain test person behaviour during the test. In question 2.10 we sum up the findings of the search test and discuss their implications for indexing guidelines in e-government.
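Several of the log-based measures (questions 2.1, 2.2, 2.6, and 2.7) reduce to simple counts over sessions and queries. The following is a minimal sketch of such counts. It assumes a hypothetical log format of (session ID, query string) pairs and whitespace tokenization. The reformulation categories form an illustrative term-set scheme, not necessarily the classification used in the thesis. Concepts per query (2.3) and search success (2.8) require human judgement and are not covered.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class LogEntry:
    # Hypothetical log record: one submitted query within a session
    session_id: str
    query: str


def group_by_session(log):
    """Collect queries per session, preserving the order in the log."""
    sessions = {}
    for entry in log:
        sessions.setdefault(entry.session_id, []).append(entry.query)
    return sessions


def reformulation_type(previous, current):
    """Classify a reformulation by comparing the term sets of two queries.

    Simplified, hypothetical scheme: only added terms -> specification,
    only removed terms -> generalization, both -> substitution.
    """
    prev_terms = set(previous.lower().split())
    curr_terms = set(current.lower().split())
    added, removed = curr_terms - prev_terms, prev_terms - curr_terms
    if added and not removed:
        return "specification"
    if removed and not added:
        return "generalization"
    if added and removed:
        return "substitution"
    return "repetition"  # same term set, e.g. reordered terms


def log_measures(log):
    sessions = group_by_session(log)
    queries = [q for qs in sessions.values() for q in qs]
    # Every query after the first in a session counts as a reformulation
    types = [reformulation_type(prev, curr)
             for qs in sessions.values()
             for prev, curr in zip(qs, qs[1:])]
    return {
        "queries_per_session": mean(len(qs) for qs in sessions.values()),
        "terms_per_query": mean(len(q.split()) for q in queries),
        "reformulations": len(types),
        "reformulation_types": types,
    }


# Hypothetical log with two sessions
log = [
    LogEntry("s1", "car tax"),
    LogEntry("s1", "car tax deadline"),
    LogEntry("s2", "vat rules"),
    LogEntry("s2", "vat"),
]
print(log_measures(log))
```

Treating every non-initial query as a reformulation is a design choice. A study may instead exclude repetitions or pagination events before counting.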

1.5 Structure of the thesis 

The thesis reports two interconnected empirical studies: first the domain study, then the search test. This order reflects that the domain study forms the basis for the search test. The thesis opens with a theoretical part, followed by the empirical part. The theoretical part consists of chapters 2, 3, 4, and 5. Chapter 2 presents the empirical assumptions introduced above in more detail. It outlines the methodological frame guiding both the theoretical parts and the data collection for the domain study and the search test. As the case study is part of the methodological frame, it is also presented here, along with a thorough introduction to the specific case: SKAT.

Chapters 3 and 4 constitute the theoretical basis for the domain study. Chapter 3 introduces the research area of e-government. The purpose of the chapter is to outline the domain in which the present thesis is situated. In chapter 4 the focus is narrowed down to analysing what is known about the seeking behaviour of professional

e-government users. The theoretical foundation for the search test is presented in 

chapter 5. The chapter contains a review of manual and automatic indexing. The first 

part introduces core concepts and understandings of indexing and categorization, and 

establishes the connection between the two concepts. The second part presents existing 

knowledge on the performance of indexing methods and categorization. 

The empirical part of the thesis comprises chapters 6, 7, and 8. Chapter 6 presents the applied methods and underlying considerations, first for the domain study and then for the search test. The chapter finishes by connecting the empirical elements to the research questions of the thesis. Chapter 7 presents the results of the domain study. First, a questionnaire was carried out. The questionnaire was followed up by seven focus group interviews, the purpose of which was to validate and elaborate on the questionnaire results. The results are reported where relevant to research question 1 and its sub-questions. Chapter 8 contains the results of the search test. The overall aim of chapter 8 is to answer the questions raised in research question 2. Chapter 9 summarizes and discusses the empirical results. The thesis ends with suggestions for further research.

2 Methodological framework 


Chapter 2 

Chapter 2 presents the methodological framework of the thesis. We begin here because the theory of scientific method guides the remainder of the thesis. The research literature suggests distinguishing methodology from methods. Methodology is an overarching concept that describes, explains, and justifies the methods used in empirical studies. Methodology may thus be considered a concept from the theory or philosophy of science, addressing epistemological concerns. Conversely, method is subordinate to methodology and designates the specific methods and techniques applied in empirical studies (Wang, 1999). We follow this division to structure the methodological parts of the thesis. In the present chapter, we therefore present the methodological issues that have guided the research design and the collection of data. In a later chapter (Chapter 6), we account for the specific methods applied to collect the data that constitute the empirical basis of the thesis.

2.1 A cognitive framework for information research 

As mentioned in the introduction, we work within the cognitive framework of information science. The cognitive view was first proposed in 1977 (De Mey, 1977; Ingwersen & Järvelin, 2005), as a reaction to the two predominant research traditions at the time: the system-driven and the user-oriented research traditions. Within the system-driven research tradition, significant results have been achieved regarding, for instance, best-match retrieval models, Boolean logic, question answering, and cross-language retrieval. The user-oriented research tradition, on the other hand, has obtained equally important results, though in relation to our understanding of end-user searching, domain-oriented information behaviour, and the like (Ingwersen, 1996; Ingwersen & Järvelin, 2005). Despite the importance of their respective findings, the two research traditions have been criticized for being one-sided in their methodological approaches. The system-driven tradition has followed the principle of test collections, which arose from the Cranfield model. The Cranfield model measured retrieval performance on the basis of a test collection, a set of queries, and a set of relevance assessments (Borlund, 2003b). This laboratory-like approach is the counterpoint to the user-oriented tradition. Like the system-driven tradition, the user-oriented tradition is to a large extent based on empirical investigations, but the perspective is operational. Furthermore, information users, and not IR systems, are the focus of attention (Ingwersen, 1996). This leads to a contrast between the traditions, which Robertson & Hancock-Beaulieu have summed up as “…on the one hand, control over experimental variables, observability, and repeatability, and on the other hand, realism.” (1992, p. 460).

Figure 2.1 The participating actors in context. Model adapted from Ingwersen & Järvelin (2005, p. 261) with minor corrections.

It was precisely as a reaction to this contrast that the cognitive view emerged. The pioneers of the cognitive viewpoint reacted against what they considered a one-sided focus on either IR systems or users. Instead, they suggested an alternative approach offering a holistic picture of the IR process. It was recognized that, in order to gain a comprehensive picture of the process of IR interaction, the cognitive structures of all actors involved in the interaction needed to be acknowledged and taken into consideration (cf. Figure 2.1). Five dimensions represent and summarize the cognitive view. They comprise:

1. “Information processing takes place in senders and recipients of messages; 

2. Processing takes place at different levels; 


3. During communication of information any actor is influenced by its past 

and present experiences (time) and its social, organizational and cultural 

environment; 

4. Individual actors influence the environment or domain; 

5. Information is situational and contextual.” (Ingwersen & Järvelin, 2005, p. 

25). 

Thus, in the cognitive view, senders and recipients of messages encompass not only information users, but any actor contributing to or participating in some aspect of the IR process (Ingwersen & Järvelin, 2005, p. 27). In this way, the framework supported the integration of IR techniques and IR systems, including their underlying cognitive structures, with human information users and their information behaviour. In sum, it was emphasized that the approach was not solely user oriented, but rather offered a framework for all human actors and their cognitive structures involved in IR interaction (Ingwersen & Järvelin, 2007, p. 141).

However, the attention to all cognitive actors did not reduce interest in the information user. The information need of the user functioned as the benchmark

for measurement of the success of IR systems. The understanding of users’ information 

needs and their formation has been captured by the ASK-hypothesis. The hypothesis 

stated that an information need arises from an anomaly in a user’s state of knowledge 

concerning a topic or situation. Thus, in preparation for IR, users should be asked to 

describe the anomaly rather than to state a request representing the information need to 

an IR system (Belkin, Oddy & Brooks, 1982, p. 62). To summarize, the cognitive view 

allowed for a more detailed representation of information users compared to what was 

previously known from the system driven and the user oriented research traditions. 

2.1.1 Towards a holistic cognitive framework 

Initially, researchers within the cognitive view were mainly concerned with individual variances in cognitive structures. However, in the early 1990s developments in surrounding research areas caused a corresponding shift within the cognitive framework towards increased attention to contextual matters. Ingwersen singles out two papers as landmarks of this change (Ingwersen, 1999, p. 11 ff.). One is Schamber, Eisenberg & Nilan’s (1990) paper on the concept of situational relevance. On the basis of a thorough review, the authors characterize situational relevance as a:


1. “[…] multidimensional cognitive concept whose meaning is largely dependent 

on users’ perceptions of information and their own information need 

situations[…] 

2. […] dynamic concept that depends on users’ judgments of the quality of the 

relationship between information and information need at a certain point in 

time[…] 

3. […] complex but systematic and measurable concept if approached 

conceptually and operationally from the user’s perspective.” (Schamber, Eisenberg & Nilan, 1990, p. 774).

With Schamber, Eisenberg & Nilan’s paper, the discussion of relevance was reopened. The other paper highlighted by Ingwersen is Robertson & Hancock-Beaulieu’s (1992) account of the relevance revolution, the cognitive revolution, and the interactive revolution. The relevance revolution addresses the shift towards seeing stated requests and information needs as two separate phenomena. The implication is that relevance should be assessed on the basis of the information need, and not the request. The cognitive revolution is closely connected to the relevance revolution and describes the growing tendency to include cognitive perspectives in the process of IR. Lastly, the interactive revolution articulates the increased interactivity of IR systems. This development necessitates a move away from the principle of evaluating IR systems in terms of “one request in, one set of results out”. Instead, time and situation need to be taken into consideration in order to do justice to the special characteristics of interactive IR (IIR) (Robertson & Hancock-Beaulieu, 1992, pp. 458-459). The three revolutions challenge the simplified conception of the IR process presented in the system-driven research tradition and point out that far more factors influence the process. The outcome of these developments was an increased focus on context and interaction in the process of IR (see Figure 2.2).

Figure 2.2 Extension of the cognitive view, the interactive process of IR and affecting factors. Adapted from Ingwersen & Järvelin (2005, p. 274) with minor corrections.

With this shift in focus, the range of potential research areas changed correspondingly. To illustrate, five categories of variables appear in Figure 2.2: 1) organizational task dimensions; 2) actor dimensions; 3) document dimensions; 4) algorithmic dimensions; and 5) access and interaction dimensions (Ingwersen & Järvelin, 2005, pp. 313-314). The model is intended to illustrate the influences and interactions taking place during IR interaction. Studies need not incorporate all elements in order to fall within the framework. Rather, the elements serve as possible explanations for patterns identified in empirical findings.

2.1.2 The role of work tasks 

Along with the increased inclusion of context and interaction in the cognitive 

framework, work tasks (or daily-life tasks) have become more central. The work task 

methodology was introduced to LIS in the early 1990s (Vakkari, 2003). The basic 

assumption of using tasks as the foundation of information seeking and retrieval studies 

is that an information intensive task involves information related actions. Thus, the task 

becomes a framework for analysis of IR systems (Byström & Hansen, 2005). The work 

task methodology has mainly been applied to professional work tasks. Recently, however, non-job-related tasks have also been investigated within the context of the task methodology (e.g., Savolainen, 1995; Skov, 2009).

Tasks are important to the cognitive view, because the task is considered “the central element of the context” (Ingwersen & Järvelin, 2005, p. 29). A work task arises from an incident outside the user and triggers an information need within the user, which in turn triggers seeking behaviour (see Figure 2.3). As a result, to understand seeking behaviour and IR interaction, we must understand the composition of tasks and their contextual origin. For evaluation purposes within a cognitive frame, building on genuine tasks may be challenging, as their extent and usefulness may vary considerably. As a consequence, comparison between results is impeded. Therefore, to ensure both experimental control and realism, simulated work tasks have been proposed as a methodological tool. Here, cover stories are handed out to information users to form the basis for information searching. On the basis of the story, information needs are formed within the users; these serve as a common point of departure for interaction with the IR system under evaluation (Borlund, 2000, 2003b). A consequence of using tasks as the baseline for evaluation is the application of situational relevance for measurement of performance (cf. Saracevic, 1996; Borlund, 2003a).

Figure 2.3 Information behaviour and the influence from job- or non-job related tasks. Adapted from Ingwersen & Järvelin (2005, p. 198).

2.2 The cognitive framework and the thesis 

The cognitive framework was chosen as the methodological frame of reference in the present work. How widely the framework is used may be debated: some argue that it has a wide extension (Cole & Leide, 2006, p. 175), others the opposite (e.g., Järvelin, 2007). Regardless of its prevalence, we have applied it to guide the empirical part of the project. The overall reason was the nature of the task set by the National IT and Telecom Agency: to produce a foundation for guidelines for automatic indexing within the particular domain of e-government. To be able to give such guidelines, we needed to discover the actual use of the IR technique among e-government employees, as they are the target user group of the project. That required a methodological framework allowing for a search test with a contextual perspective, and the cognitive framework was found suitable for this purpose. It enabled us to discover the domain-specific characteristics of indexing methods and add to the general and very extensive body of knowledge regarding the performance of indexing methods.

The methodology is mirrored throughout the research design of the thesis. The 

initial domain study serves the purpose of uncovering contextual characteristics of the 

domain in question, and of providing domain knowledge and insight. In Figure 2.2 this 

corresponds to the right-hand side of the model. Different methods have been combined for the domain study. Initially, existing studies on seeking behaviour within the domain and adjacent domains were reviewed. As the number of existing studies turned out to be fairly limited, the review is followed up by an empirical domain study consisting of a survey questionnaire and seven focus group interviews. In a similar manner, the search test also reflects the methodology. In Figure 2.2 the search test comprises the center and left-hand side components. Here employees are asked to evaluate a test system on the basis

of a number of simulated work tasks. As called for in the cognitive framework, 

situational relevance is applied for assessment of search results. 

2.3 Overall research method: Case study 

The method applied in the thesis is a single case study (Yin, 2003, pp. 39-40) of a large Danish governmental organization: SKAT. There are different motivations for doing case studies. The predominant rationale in the present research study is that the organization in question constitutes a unique case in Denmark due to its pioneering position within e-government (see e.g., Østergaard & Olesen, 2004). The strength of case studies is their ability to draw on multiple sources of data. Further, case studies cover contextual aspects of the case in question (Yin, 2003). The research design reported here consists of two main parts: a domain study and a search test. The domain study employs a survey questionnaire and focus group interviews as data sources. The search test aims at a controlled environment. As in the domain study, we document the search test with both quantitative and qualitative data.

2.4 The case: SKAT 

The principal task of SKAT is to collect the major part of taxes in Denmark. The organization handles all administration related to taxes, duties, customs, debt collection, tax assessment of real estate and cars, and gaming activities (SKAT, 2010). SKAT is among the largest administrations of the Danish state in terms of employees, when compared to similar administrations (Personalestyrelsen, 2010). The organization has approximately 8,500 employees located at different offices across Denmark. SKAT has grown over the years through several mergers of formerly independent, smaller organizations (e.g., Johansen, 2007). As a result, the organization handles highly diverse work tasks. Snellen (1989, cited in Lips, 1998, p. 326) has identified three levels in governments’ service environment: the macro level, the meso level, and the micro level. SKAT operates at all three levels, serving the parliament, businesses, and citizens. SKAT is organized by tasks rather than geography. In practice this means that specialized work tasks have been consolidated at certain geographic locations. The purpose of the sub-departments is to serve at national level (SKAT, 2010). This organizational structure allows for highly specialized knowledge among the employees.

Some years ago SKAT developed a business model for internal use. The purpose of the business model was to encompass all work tasks carried out by the organization. The work identified 19 condensed work tasks distributed across 6 main processes. The main processes are: Instruction, settlement, inspection, collection, processes of support, and management and development. The latter two main processes are internal processes or aimed at serving the parliamentary part of Denmark, while the first four have citizens and companies as their target group of service. Work tasks carried out across the organization had been described centrally, while department-specific work tasks were described by the responsible departments. A description of the main processes and condensed work tasks appears in Appendix 1. Between the data collection for the domain study and the search test, a slight correction of the business model was made. The six main processes remained intact, but the condensed work tasks were extended to apply across all main processes. The revised business model is depicted in Figure 2.4. The size and importance of the main processes are, at least quantitatively, mirrored by the distribution of employees. Thus, settlement and inspection are the largest of the main processes, covering approximately 60 percent of the entire workforce. The remaining 40 percent are divided between the other four main processes (see Appendix 2). Translated to the terminology of Byström & Hansen (2005), the condensed work tasks are at task description level. The main processes are more coarse-grained than the condensed work tasks (cf. Vakkari, 2003), but the condensed work tasks are also fairly coarse-grained. In the business model, the generic work tasks contain more specific sub-task descriptions. In Freund, Toms & Waterhouse’s (2005) terminology, this way of operationalizing work tasks is content-based. As a result, it is specifically directed towards tax employees in the case organization.

Figure 2.4 SKAT’s revised business model

2.4.1 The intranet 

The intranet of SKAT functions as the test system for the search test. The 

intranet is a CMS based solution accessible to all employees within the organization 

(White, 2005). The intranet mirrors the official web portal of SKAT, which is open to 

the public on the web (see http://www.skat.dk). The public portal communicates 

information directed towards citizens, businesses and legal advisors. Specifically, the 

portal contains legal directions, citizen and business directions and brochures, legal 

documents, forms, news, etc. Further, the portal contains a self-service section for both citizens and businesses. On the intranet, additional documents are available to the employees. Examples include minutes, job postings, reports from finished internal projects, HR information, and other internal information from the organization and departments to the remaining employees. The intranet contains documents from June 25, 1998 onwards. By June 2010 the database contained 681,640 documents. The intranet further facilitates personalization of the interface in order to optimize which information is offered to individual employees. In sum, we may characterize the intranet as a knowledge portal in the terms of Dias (2001): a corporate portal enabling decision support and collaborative processing. In addition, the “Find colleague” function (“Find kollega”) helps locate colleagues on the basis of organizational affiliation, physical location, or expertise, which corresponds to an integrated expertise portal.

Using the intranet for the search test has a number of empirical implications. With this choice of test system, the search test belongs to the research area of enterprise search. Enterprise search covers organizations with electronic text content and the search of an organization’s intranet, Internet site, or other digitized text (cf. Hawking, 2004). Furthermore, corporate intranets and the web share a number of characteristics. Both are based on web technology, both demonstrate great heterogeneity in their document collections and a dynamic nature, and both enable hyperlinking between documents (cf. Fagin et al., 2003; Rasmussen, 2003). However, the two system types also differ in several respects. Firstly, their premises differ. The web functions as a democratic instrument allowing everyone to express anything. Intranets, on the contrary, are organizational tools communicating information of relevance for maintaining enterprise work tasks (Fagin et al., 2003; Mukherjee & Mao, 2004; Stenmark, 2005).

Figure 2.5 Screen dump from existing intranet interface

Fagin et al. (2003) stated four axioms summarizing further differences between the web and intranets. In short, the axioms state that, in contrast to Internet documents, intranet documents are mainly created for the distribution of information, not for attracting the attention of potential users. In addition, a large proportion of intranet queries have a small set of correct answers, or even a single correct answer. Also, intranets are most likely spam-free due to restrictions on publishing access. Lastly, intranets are not expected to be search-engine friendly due to the lack of interlinking between documents. By denoting these characteristics as axioms, Fagin et al. indicate that there is no empirical evidence for the correctness of the differences. The lack of empirical confirmation may be explained by the difficulty of gaining systematic access to collect data on corporate intranets (cf. Stenmark, 2005). In the present investigation, we account for the specific characteristics of intranets whenever we have empirical evidence to support them.

2.4.2 The intranet taxonomy 

Indexing on the Internet is obviously far more extensive than on an intranet due to the disparity in numbers of documents. However, the need for organizing documents on corporate intranets also increases with the number of documents stored (cf. Gilchrist, 2001). This is mirrored by the differences between the former and the current taxonomy used on SKAT’s intranet. As mentioned above, a new and enlarged taxonomy was introduced on the intranet at the beginning of 2008. The main functions of a taxonomy are to eliminate ambiguity, control synonyms, and establish hierarchical relationships (Zeng, 2008). The preceding taxonomy fulfilled these functions except the last: it had a flat structure with a one-level hierarchy consisting of 25 subject terms. The succeeding taxonomy was expanded in different respects, resulting in a more detailed presentation of corporate, controlled terms. One change was the introduction of a second level in the hierarchy, which enabled more specific topic representations. The number of terms also increased: as of March 2010, the taxonomy incorporated 169 terms across both levels of the hierarchy. Lastly, the controlled terms of the taxonomy had been supplied with mouse-over texts, which basically took the form of scope notes as known from thesauri. These further reduce ambiguity and increase synonym control.
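The structure described above (a two-level hierarchy of controlled terms with scope notes and synonym control) can be sketched as a small data structure. The class names, the example terms, and the synonym mapping are hypothetical and not taken from SKAT’s taxonomy. The two-level limit mirrors the revised taxonomy.

```python
from dataclasses import dataclass
from typing import Optional

MAX_DEPTH = 2  # the revised taxonomy has two hierarchical levels


@dataclass
class Term:
    label: str
    scope_note: str = ""  # e.g. displayed as mouse-over text in an interface
    parent: Optional["Term"] = None

    @property
    def depth(self) -> int:
        return 1 if self.parent is None else self.parent.depth + 1


class Taxonomy:
    """Hypothetical controlled vocabulary with hierarchy and synonym control."""

    def __init__(self):
        self.terms = {}     # preferred label -> Term
        self.synonyms = {}  # lower-cased entry term -> preferred label

    def add_term(self, label, scope_note="", parent=None):
        parent_term = self.terms[parent] if parent else None
        term = Term(label, scope_note, parent_term)
        if term.depth > MAX_DEPTH:
            raise ValueError(f"{label!r} would exceed {MAX_DEPTH} levels")
        self.terms[label] = term
        self.synonyms[label.lower()] = label
        return term

    def add_synonym(self, synonym, preferred):
        if preferred not in self.terms:
            raise KeyError(preferred)
        self.synonyms[synonym.lower()] = preferred

    def resolve(self, word):
        """Map an entry term or synonym to its controlled (preferred) term."""
        label = self.synonyms.get(word.lower())
        return self.terms[label] if label else None

    def path(self, label):
        """Return the labels from the top-level term down to `label`."""
        term, labels = self.terms[label], []
        while term is not None:
            labels.append(term.label)
            term = term.parent
        return list(reversed(labels))


# Hypothetical example terms
tax = Taxonomy()
tax.add_term("Customs", "Duties on goods crossing borders")
tax.add_term("Import duties", "Duties levied on imported goods", parent="Customs")
tax.add_synonym("tariffs", "Import duties")
print(tax.resolve("Tariffs").label, tax.path("Import duties"))
```

Synonym lookup resolves an entry term to its preferred term, and the depth check rejects a third level. The flat predecessor taxonomy would correspond to `MAX_DEPTH = 1`.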


Hitherto, the indexing of intranet documents has been carried out manually by a large group of indexers distributed across the organization (between 1,000 and 1,500 indexers). A corporate taxonomy has formed the basis for the controlled indexing. It is common practice in e-government in general that employees attach subject terms to administrative documents. In section 5.3.3, three kinds of intellectual indexing are presented, namely expert-led, author-based, and user-based indexing. The manual assignment of subject terms carried out by the employees in the organization is not easily characterized as one or the other. It has expert-led features in that not all employees handle the assignment; rather, a group of employees carries out the task, though the group is quite large. On the one hand, when a group of employees has been appointed to the task, it is reasonable to expect that they have more detailed insight into the taxonomy than their non-indexing colleagues. On the other hand, the large number of indexers could mean that each indexer performs the task infrequently, which in turn results in limited insight into the taxonomy. One thing is certain about the group of indexers: the typical indexer is not a professional indexer in the sense of holding a LIS degree. The indexing at SKAT also contains elements of author-based indexing, in that the indexers will occasionally be the authors of the indexed documents. Lastly, the indexing may also be characterized as user-based, in that the indexers are also users of the system.

The document collection on the existing intranet can be divided into two groups: documents published up to December 31, 2007 and documents published from January 1, 2008 onwards. January 1, 2008 is the date on which a revised taxonomy was taken into use in the case organization. The implementation of the revised taxonomy had different implications. The manual assignment of index terms continued after this date, though following the structure of the revised taxonomy. At the same time, however, the index terms assigned to the former group of documents were deleted from the database. Therefore, documents published before January 1, 2008 could only be searched via free-text indexing. When our cooperation with SKAT started, the organization was already working on a new portal solution encompassing its Internet site and intranet. The new portal comprises various changes and improvements, including automatic categorization of search results, which is brought into focus in the present thesis.


2.5 Summary 


The present chapter has presented and argued for the overall research methodology applied in the PhD project. We have reviewed the cognitive framework and its development from an individualistic towards a contextual methodological

foundation. The choice of methodological standpoint enables the collection and 

analysis of data that supplements the existing general knowledge on the performance of 

automatic indexing methods. Within the cognitive framework the case study 

methodology has been applied as the overall frame for the specific collection and 

analysis of data reported later. Specifically, we carry out a case study of a Danish 

organization, a pioneer in terms of e-government: SKAT.

3 The e-government domain 


Chapter 3 

During the past century, governments all over the world have experienced a continuous increase in demands for effectiveness of procedures and work routines, along with expectations for accuracy and quality in public servants’ handling of work tasks (e.g., Homburg, 2004). Increased transparency of governments towards citizens has been another predominant demand on governments during the period (e.g., Bertot, Jaeger & Grimes, 2010). The demands for transparency have resulted in numerous technical solutions for citizen access, e.g., self-service, and subsequent user evaluations. However, the citizen perspective on e-government is not discussed in further detail here, as our focus is on employees.

The development of governments has taken place at local, national, and international levels. It is in this light that the concept of e-government has emerged: digitalization of governments has been an important step towards meeting the challenge of increasing the efficiency and quality of governmental processes. The examination of e-government as a research area started to grow in the late 1990s (e.g., Grönlund & Horan, 2004; Helbig et al., 2008). Since then, the increasing number of emerging journals and conferences has underlined the importance of the research field. However, e-government is a complex construct due to its roots in a number of related research fields. Public administration, management science, organization science, information technology, computer science, and library and information science are among the fields contributing to the development of e-government.

In the present chapter, we introduce the e-government domain. The purpose is to provide an overview and understanding of the domain framing the PhD project. Further, the presentation allows us to characterize the thesis and position it within the domain. The chapter forms the first of two parts of the domain study review. We begin the chapter by defining the concept of e-government and related concepts, along with the purpose of digitalizing governments. This is followed by an overview of the steps that have characterized, and still characterize, the development within the domain. Models are included here to present graphically different authors’ perception and interpretation of the development of the field. The chapter ends with a presentation and discussion of the research field of e-government. In this closing section, we focus on subject matters relevant to the PhD project, as a thorough review of the entire field of e-government is outside our scope. Specifically, we address information systems, knowledge management, and metadata initiatives.

3.1 Definition and purpose 

Numerous definitions of the concept of e-government exist. A fairly general definition is put forward by Gil-Garcia & Martinez-Moyano (2007, p. 266), who see e-government as:

“The use of information and communication technologies in government settings.”

More detailed definitions have also been formulated, e.g., by Fang (2002, pp. 3-4):

“- the ability to obtain government services through nontraditional electronic 

means, enabling access to government information and to completion of 

government transaction on an anywhere, any time basis and in conformance 

with equal access requirement. 

–offers potential to reshape the public sector and build relationships between 

citizens and the government.” 

Defining the concept of e-government is not a straightforward task. A number of researchers have collected and compared several definitions (e.g., Grönlund & Horan, 2004; Robbin, Courtright & Davis, 2004; Grant & Chau, 2005; Yildiz, 2007; Hu, Pan & Wang, 2010). These examples illustrate the lack of a common understanding of the concept. Today, after more than a decade of research, researchers still call for an unambiguous definition (e.g., Grönlund, 2010; Hu, Pan & Wang, 2010). Overall, the difficulties relate to the content and the designation of the concept. As regards the content, a number of factors complicate the task. One factor is the lack of agreement on the definition of central concepts (Robbin, Courtright & Davis, 2004). Thus, e-government is defined and referred to differently depending on the actual scope of research papers (Fang, 2002; Grönlund & Horan, 2004; Grant & Chau, 2005; Grönlund, 2005). In addition, the multidisciplinary nature of the research field increases the disagreements (Grönlund & Horan, 2004; Hovy, 2008a). The disciplines considered as contributing to the field also vary. Wimmer presents the most comprehensive set of contributing disciplines in her model (see Figure 3.1). An analysis of e-government researchers’ home departments supports Wimmer’s model (Heeks & Bailur, 2007). The continuing development of the concept is a third factor (cf. Jaeger, 2003).

Figure 3.1 Disciplines integrated in the multidisciplinary research field of e-government. Adapted from Wimmer (2007, p. 14)

Further, e-government takes place at two different levels: the micro level, which concerns the technological changes taking place within governments, including ICT, and the macro level, which refers to the institutional changes that are usually also part of e-government research. The two levels are often separated, which complicates the understanding of the concept (Meijer & Homburg, 2008). At the micro level, Grönlund (2003) distinguishes between two fields within e-government, one with an internal and one with an external organizational focus. Both fields imply changes in line with Meijer & Homburg’s (2008) two levels. The internal field concerns the changes within governments that follow from employing ICT for different professional operations. This field has been developing for several decades. The external field concerns the increasing availability of internet services aimed at external parties, e.g., citizens or enterprises (Grönlund, 2003). The ICT systems supporting the two fields are referred to as back office and front office systems, respectively (e.g., Meijer & Homburg, 2008). In this thesis we are concerned with the micro level, that is, internal governmental changes resulting from ICT, and more specifically with information seeking and retrieval in relation to employees’ fulfilment of work tasks.

Figure 3.2 Basic elements and relations in governmental systems (Grönlund, 2003, p. 56)

What can be inferred from the above is that the definition depends to some degree on the particular sources consulted. The number of related concepts does not ease the task of defining it. Consequently, we define below the concept and the related terms as they are used in the present work. We use Figure 3.2 to illustrate the concepts. The figure is a simplified model of the democratic system, which in practice is far more complex. It outlines three zones (civil society, formal politics, and administration) and their reciprocal interactions.

Government is considered the overarching notion for the concepts that follow. The concept of government “covers several aspects of managing a country, ranging from the very form of government to strategic management to daily operations” (Grönlund, 2003, p. 56). Others see government as more focused on the political aspect, yet without leaving out the administrative field. According to Beynon-Davies, government “connotes a political organization, which is comprised of the individuals and institutions that are authorised to formulate public policies and conduct affairs of state. Governments are normally tasked with establishing and regulating the interrelationships of individuals, groups and organisations within the boundaries of some territory” (2007, p. 11). In Figure 3.2, government covers the two areas of formal politics and administration. Public administration denotes the sector, enterprises, and activities necessary to serve a government (Marini, 2000; Johnston, 2004). Here, serving implies formulating, advising on, and implementing governmental policy, and managing resources. Thus, public administration deals with all aspects of government matters apart from the political, democratic issues. In Figure 3.2, public administration covers the area referred to as administration. The thesis thus belongs to the administration subfield, as we do not account for either formal politics or civil society.

As for the designation of the concept, the literature does not offer a single label for e-government. Examples of synonyms are digital government (e.g., Marchionini, Samet & Brandt, 2003), one-stop government (e.g., Glassey, 2002), eGovernment (e.g., Schellong, 2007), and online government (e.g., Peres, Guzmán & Valbuena, 2009). Digital government appears to be the predominant term in the United States, while electronic government is the preferred term elsewhere (Grönlund & Horan, 2004). Grönlund & Horan (2004) differentiate between e-government and e-governance. To illustrate the difference, they draw on Figure 3.2. In their definition, e-government covers administration and perhaps formal politics, while e-governance embraces all three spheres. Though e-governance thus appears to be a broader concept, e-government is the more dominant term in the research field. Due to the lack of agreement on terminology, we use the predominant European choice of term and refer to the concept as e-government throughout the thesis. This choice also suits our scope, since e-government as defined by Grönlund & Horan (2004) fits the present focus on administrative government employees. Our focus further means that the operationalization of the concept is placed solely in the administrative part of Figure 3.2. However, given the diversity of definitions demonstrated above, we also draw on literature working with other definitions, as long as it falls within the definition applied here.

3.2 Subject areas in e-government research & development (R&D) 

The use of information technology in governmental administrations is not a new phenomenon; it has been going on for decades (e.g., Kraemer & King, 1986; Andersen & Kraemer, 1994; Bellamy & Taylor, 1998). However, the term e-government was not introduced until the late 1990s. The two eras have long been considered separate. Whether they still are, or whether they are becoming more integrated, remains a matter of debate (Grönlund & Horan, 2004; Andersen et al., 2005).


E-government may be seen as a natural consequence of historical and technological developments. Historically, public administration has become increasingly market-oriented since the late 1970s (Johnston & Callender, 1997; Box, 1999; Johnston, 2004). In the wake of this change, the focus of public administration has been on “organizational efficiency, the creation of internal market-style competitive conditions and the more purposive application of private-sector business techniques” (Johnston, 2004, p. 12510). Concurrently, information technology has developed rapidly, providing possibilities for technological support of this change of focus in public administration.

Schellong (2007) presents a modified model of the e-government hype cycle (see Figure 3.3). The model presents 2002-2003 as the point in time when e-government peaked. The run-up to the peak lasted approximately seven years. Those years introduced, among other things, information sites, single-agency online services, and portals. In the period after the peak, a number of problematic issues had to be dealt with, for instance security and low citizen uptake. This does not mean, however, that e-government has come to a halt. Rather, the peak has been followed by a more stable plateau of productivity with more advanced and technically demanding solutions, for example interoperability, enterprise architecture, and integrated data management (Schellong, 2007). The optimism identified by Heeks & Bailur (2006) indicates a continued belief in the potential of e-government. Research into automatic indexing methods is potentially relevant to many of the subareas mentioned in the model.

Figure 3.3 E-government hype cycle (Schellong, 2007)

Though implementing e-government is in focus across the world and across types of governments, the degree of implementation varies. To identify the stage of development, Layne & Lee (2001) have developed a four-stage model (see Figure 3.4) with two dimensions: technological and organizational complexity (ranging from simple to complex) and integration (ranging from sparse to complete). The model suggests that the two variables are proportional: as technological and organizational complexity increases, so does the degree of integration. The model expresses the technological level of public administrations, which allows for different degrees of service to citizens. Layne & Lee take their point of departure in the first websites created by governments. Thus, the use of ICT before then is not reflected in the model.

The four steps in Layne & Lee’s model comprise 1) Catalogue; 2) Transaction; 3) Vertical integration; and 4) Horizontal integration. The step “Catalogue” refers to the introductory stage of e-government, where governments create websites with information about the government. At this step, citizens and other stakeholders are helped with fact finding. Governments have different motivations for going online at this stage. One reason is the possibility of providing external stakeholders with information that would otherwise have to be handled by front office employees. Another reason is external pressure and expectations that information about the government can be found on the internet. At the second step, “Transaction”, we see the beginning of online transactions for government stakeholders. Thus, it becomes possible, for instance, to report one’s taxes online. The step is characterized by automation and digitalization of existing processes. “Vertical integration” is defined by a redesign of existing processes and an increased degree of connection between government systems in order to enhance the services towards stakeholders. Vertical integration also allows for exchange of transaction data across systems. At the final step, “Horizontal integration”, further integration is developed. Aside from exchanging data between governments, horizontal integration offers integration across government functions, e.g., in the form of one-stop services that are able to meet the range of administrative service needs following from a life or business event (cf. Gouscos et al., 2003).

Figure 3.4 Dimensions and stages in e-government (from Layne & Lee, 2001, p. 124)

It should be noted that the position of governments in Layne & Lee’s model most likely differs considerably across countries. For instance, the latest report from the United Nations (2012) demonstrates geographical differences in e-government implementation levels. Also, Gil-Garcia & Martinez-Moyano (2007) hypothesize that the evolution of e-government depends on whether the context is the national, state, or local level, suggesting that e-government initiatives start at the national level and are subsequently followed up at the state and local levels of government. The geographical location and level of government will probably not significantly influence the order of the steps, but may result in different placements in the model. As one among a number of e-government forerunners, Denmark can be placed in the upper right corner of Layne & Lee’s model. To exemplify, a recent investigation among Danish municipal IT managers showed some prevalence of horizontal integration between information systems (Nielsen et al., 2009). Investigating automatic indexing is in principle relevant at all stages of the model.

The expected outcomes of digitalizing governments are considerable. Two main potentials recur: changes in the communication between government and civil society, and more efficient internal work processes in governments. Put more simply, the purpose of e-government is to deliver “government that works better and costs less” (Office of the Vice President, 1993, cited in Bellamy, 2002, p. 214). Thus, by introducing e-government, governments aim to offer improved access to their services, make the most of their resources or perhaps even reduce costs, and enhance democracy by improving access to government employees and offering e-democracy (Edmiston, 2003). With the introduction of e-government, a shift may be observed from government-centric services towards services centred on citizens (or other stakeholders). At the same time, the transparency of governmental work is intended to increase along with the level of service. For instance, giving citizens access to government day and night is considered one way of increasing the level of service towards citizens (Bellamy, 2002). Consequently, researchers have started to investigate, for instance, applications, changes in the administrations, and the interaction between government and civil society.

The development of government processes, organization, and technologies has been expected to change the work tasks of government employees. Before the dawn of e-government, concerns about information technology and the computerization of governments largely regarded employment (e.g., Kraemer & Dedrick, 1997). Changes are still expected as a consequence of digitalizing governments. Today, however, the use of information technology is rather expected to affect the composition of work tasks for government employees (Snellen, 2002; Dörfler, 2003; Marchionini, Samet & Brandt, 2003; Brown, 2005; Landsforeningen af Kommunale Servicecentre, 2005; Mahler & Regan, 2005). This is supported by research-based proposals for process models that can support the way governments handle work tasks (e.g., Palkovits, Woitsch & Karagiannis, 2003; Becker, Pfeiffer & Räckers, 2007).

In 2005, the Danish National Association of Municipal Service Centres predicted a change in the work tasks of municipal e-government employees. Owing to self-service solutions, the number of complex cases handled by employees was expected to increase, because citizens take care of simpler tasks themselves. Further, the share of tasks related to assisting citizens who are not able to use self-service was also expected to increase (Landsforeningen af Kommunale Servicecentre, 2005). This expectation is consistent with the results of Grundén (2009), who interviewed employees in a Swedish County Administration. Here, the need for assistance was explained by the digital divide among the government’s customers. Mahler & Regan (2005) likewise expect changes due to the digitalization of federal government agencies. Their conclusions are based on qualitative interviews with staff in agencies with either a strong or a weak internet presence. They find that the expected increase in complaints from citizens does not actually occur. Further, some, but not all, citizens are able to find the information they need on the agency websites and thereby spare the agency casework. For the present work, this means that we cannot assume government work tasks to have remained unchanged. A possible change of tasks may also influence the complexity of the information needs that arise (cf. Byström’s findings, see section 4.4.5). Consequently, we cannot simply design a search test based on older information seeking studies in the domain, at least as regards information needs; their continued relevance must first be validated.

3.3 Stakeholders in e-government 

The volume of e-government research is constantly increasing. Literature reviews have proposed different ways of categorizing this research in order to systematize it. A common way of characterizing e-government studies is to divide them according to the relation they address, typically a relation between one (or several) governments and a stakeholder. The literature has suggested a number of different relations (e.g., Fang, 2002; Beynon-Davies, 2007). The primary emphasis in the e-government literature has been on citizens, businesses, and governments. The relations present the government as the key communicator towards different recipient groups. This is stressed by the common way of denoting the relations as G2C (government-to-citizen), G2B (government-to-business), G2G (government-to-government), and so forth. This way of referring to the relations is inspired by the field of e-commerce, where B2B and B2C are common designations for business-to-business and business-to-consumer.


1 People as service users 

2 People as citizens 

3 Businesses 

4 Small-to-medium sized enterprises 

5 Public administrators (employees) 

6 Other government agencies 

7 Non-profit organizations 

8 Politicians 

9 E-Government project managers 

10 Design and IT developers 

11 Suppliers and partners 

12 Researchers and evaluators 

Table 3.1 Stakeholders in e-government. Adapted from Rowley (2011, p. 56) 


The underlying assumption about e-government stakeholders is that their relations to governments differ in character. Thus, governments cannot necessarily communicate in the same way with different stakeholders. In her literature review of relations, Rowley proposes a thorough typology of stakeholders (see Table 3.1). She stresses that stakeholders must be characterized by the roles they play rather than by the groups they form. Prioritizing roles over groups allows individuals and organizations to take different roles depending on the situation they engage in. The purpose of elaborating a typology of stakeholders is to be able to identify the characteristics of specific stakeholders and to allow for comparisons (Rowley, 2011). Further, once their characteristics have been described, the typology enables stakeholders to be addressed more specifically. In the present work we are concerned with one particular type of stakeholder, namely public administrators (government employees). Rowley’s division of stakeholders emphasizes that stakeholder groups differ. As a consequence, seeking behaviour identified in other stakeholder groups is not necessarily representative of the behaviour of employees.


3.4 LIS perspectives on e-government 

Above, we gave a general introduction to the concept of e-government. In addition, three core LIS areas within the government context form an important frame of reference for our further work: information systems, knowledge management, and metadata schemas and standards. Since our overall perspective is on employees, this perspective also guides the presentation of these LIS subject areas.

3.4.1 Information systems 

The number of information systems in e-government is impressive. Bekkers & Homburg (2007, p. 374) refer to the amount as “myriad registration functions”. The information systems are highly diverse in nature and content (cf. Veal, 2001; Liu, Zhu & Gorton, 2007). Content includes, for instance, statistical information, geographical information, legal materials, and information related to specific cases (e.g., Bountouri et al., 2009). In addition, information systems are often designed with a specific administration in mind, and perhaps even developed by the administration itself (Ministry of Finance, 2001). The designations used for types of information systems are also remarkably varied. A thorough presentation of all system types is a comprehensive task beyond the scope of the thesis. Instead, we present below different ways of classifying e-government information systems. The purpose is to identify existing types of systems and to provide a context for characterizing the system that is the subject of the search test in the empirical part of our work.

Systems may be divided according to different characteristics. One way of characterizing information systems is according to whether they are back office or front office systems. Front office systems are directed towards the customers of e-government: citizens, businesses, and organizations external to the government (Millard, 2003). Examples are citizen portals such as www.borger.dk or www.direct.gov.uk, or business portals like www.virk.dk. Front end services in the form of front end systems are a product of the introduction of e-government. Governments have of course always communicated with citizens and businesses, but prior to the introduction of portals and other front end systems, the communication took place in contact offices or through call centres (Codagnone & Wimmer, 2007). Back office processes are processes internal to the government in question. They comprise general management and accounting, but also the processing of customers’ applications (Codagnone & Wimmer, 2007). Back office systems, then, are systems that support internal processes of very diverse kinds. Further, these systems deliver the data communicated through front end systems. Back office systems themselves are commonly not visible to the government’s customers. Such systems have been applied in government administrations for decades. As the search test concerns a back office system, we focus on this type of system below.

Van de Donk & Snellen (1989) have presented a typology of knowledge-based systems that can be used to distinguish further between back office systems. The typology was developed within the domain of public administration. It is based on the elements that make up expertise, as opposed to lay knowledge:

“1. encyclopedic knowledge of facts and relationships concerning a certain 

field; 

2. proficient reasoning as the basis of a diagnosis; 

3. practical short-circuit reasoning to arrive at a diagnosis; 

4. proficient reasoning as the basis for a solution; 

5. practical short-circuit reasoning to arrive at a solution” (van de Donk & 

Snellen, 1989, p. 4). 

Items 3 and 5 in particular differentiate the expert from the layman. On the basis of these characteristics, three types of knowledge systems are suggested: handling systems, advisory systems, and expert systems. Handling systems embrace items 1, 2, and 4 above. They contain facts related to specific cases. Cases are handled by being placed in a category for which solutions are known or diagnoses can be made (van de Donk & Snellen, 1989). A core example of handling systems is electronic records management systems (ERMS), also known as electronic document management systems (EDMS), which support the creation, capture, processing, sharing, and management of organizations’ records or documents (Gunnlaugsdottir, 2008; Hu et al., 2010). Advisory systems embrace items 1, 2, 3, and 4 above. Thus, what distinguishes advisory systems from handling systems is the possibility of arriving at a diagnosis through practical short-circuit reasoning (item 3). Advisory systems are useful, e.g., when there is uncertainty about the facts of a case or when the qualifications needed for reaching a decision are vague (van de Donk & Snellen, 1989). Expert systems in principle contain all five elements mentioned above. Thus, they are also able to help users arrive at solutions to problems. Several characteristics distinguish advisory systems from expert systems. One main difference between the two system types is that advisory systems support users’ own decisions by providing access to data and models, while expert systems offer decisions and conclusions. This is what leads Ford (1985, p. 26) to characterize advisory systems as more flexible than expert systems. In van de Donk & Snellen’s terms, the system under investigation may be characterized as an expert system, as it contains documents supporting the professional and legal basis of the employees’ work. It also contains organizational information, while information related to specific cases is stored in other systems.

Saxena & Aly (1995) embrace a wider variety of systems in their typology. The context of their work is public administration, including policy planning, policy implementation, and policy administration. The typology comprises:

1. Administrative processing systems (APS): process large amounts of data in order to support administrative routines, typically in the form of statistical compilation systems or transaction processing systems.

2. Management reporting systems (MRS): offer management information for routine, structured, and expected decisions. Compared to APS, which are oriented more towards data and efficiency, MRS are characterized rather by information and effectiveness.

3. Decision support systems (DSS): assist users in decision making by offering technological support that enables users to develop individual decision models, databases, and report formats.

4. Group decision support systems (GDSS): are the group equivalent of DSS. The term GDSS commonly refers to systems that support group work such as communication, information sharing, generation of ideas, and so forth.

5. Executive support systems (ESS): as the name indicates, these systems offer top executives direct access to management reports, information, and mail services without an intermediary of any sort.

6. Expert systems (ES): imitate human reasoning processes in a form that could also be handled by human experts. In other words, ES are able to supplement or even replace human experts. Expert systems take the form of either handling systems or advisory systems (cf. van de Donk & Snellen, 1989) (Saxena & Aly, 1995, pp. 280-281).


One may question Saxena & Aly’s interpretation of handling systems and advisory systems as examples of ES. Based on the presentation by van de Donk & Snellen (1989), we would rather see handling systems as an example of APS, and advisory systems as equivalent to either DSS or GDSS. This is why we also classify the test system as an ES in terms of Saxena & Aly, though on the basis of their description of the system type.

The application of systems depends on whether the context of use is policy planning, implementation, or administration. Here we are concerned with public administration in the form of policy administration. According to Saxena & Aly, the relevant systems for this subarea are APS (transaction processing systems (TPS), transaction summary information (TPS-TSI), and detailed transaction lists (TPS-DTI)), DSS, and ES. However, one must keep in mind that putting forward an unequivocal typology is a complex task, given the great variety of tasks carried out by public administrations, even within policy administration. The actual use of systems in a real-life administration may thus deviate from the typology. To draw a parallel to our characterization of the test system, that system also contains information that is not necessarily ES-oriented, such as organizational information.

Given the focus on efficiency and effectiveness in e-government, initiatives and systems obviously need to be evaluated in order to justify them. Information systems need to function as intended in order to support efficiency and effectiveness. Evaluation may help uncover shortcomings in the system, but it may also inform developers about the strengths and weaknesses of the system as regards users’ use of it. Evaluation consequently constitutes an almost inevitable direction in the e-government literature on information systems. The literature on evaluation takes two forms. One is concerned with the evaluation of specific systems. The other represents a methodological perspective, supporting researchers with tools for evaluating either prototypes or systems already in use. Specific systems are evaluated either when a new system is proposed or when the system has been in operation for some time. Examples are Floropoulos et al.’s (2010) evaluation of the Greek Tax Information System (TAXIS) from an employee perspective, Hu et al.’s (2010) evaluation of agency satisfaction with an ERMS, and Quam’s (2001) examination of citizens’ use of Bridges¹. The LIS literature has outlined directions for and analyses of system evaluation (e.g., Robertson & Hancock-Beaulieu, 1992; Kelly, 2009). However, core e-government research also offers methods for evaluation. For instance, Goh et al. (2008) have developed a checklist that can be used to evaluate the degree of knowledge management in e-government portals. The evaluation carried out in the search test that follows has been designed on the foundation of established LIS evaluation methods. A prototype is tested; that is, the system had not been in use among the employees at SKAT at the time of testing.

3.4.2 Knowledge management 

Knowledge management designates the process of identifying and controlling organizational knowledge in order to support the competitiveness of businesses (de Groot, 2003). The attempts to manage knowledge in organizations have arisen from problems with maintaining, locating, and applying knowledge in a systematic manner (Alavi & Leidner, 2001). Competitiveness may not be a core issue in public organizations as such. However, a clear parallel exists between measuring private sector outcomes in terms of competitiveness and measuring public sector effectiveness. This is reflected in the literature on knowledge management in e-government. Thus, though it originated in private sector businesses, knowledge management has been widely adopted in the public sector. In spite of fundamental differences between the goals of private and public organizations, knowledge management also has the potential to improve effectiveness, efficiency, and consumer satisfaction in a government context (Ha & Zenebe, 2008). These benefits are very much in line with the desired outcomes of e-government (cf. Goh et al., 2008; Ha & Zenebe, 2008). The inclusion of knowledge management as one of the future-oriented themes identified by eGovRTD2020² reflects the relevance of the concept to e-government (cf. Dawes, 2009).

However, governments are usually organized in a more complex manner than businesses. This greater complexity may affect the realization of knowledge management (Ha & Zenebe, 2008). Conversely, the complexity may underpin the need for a systematic way of handling organizational knowledge, one that makes visible knowledge that would otherwise remain hidden. In this respect, work tasks that cross government boundaries pose a particular challenge (cf. Peel & Rowley, 2010). De Groot (2003, p. 95) summarizes the consequences of not being able to access employees’ knowledge as follows: “...knowledge is available only to small group of people, [k]nowledge is often not available to the people who need certain knowledge, [and] [e]mployees are overloaded with irrelevant information”. More tangible factors such as financial and time constraints may also hinder the realization of knowledge management in governments (Hazlett, McAdam & Beggs, 2008). However, as Southon, Todd & Seneque’s (2002) study of two private organizations and one public organization shows, the management of knowledge can be challenging in private organizations as well.

¹ Minnesota’s Gateway to Environmental Information (http://www.bridges.state.mn.us/, accessed on 19-06-2012).

² eGovRTD2020 is a research project funded by the EU with the purpose of

Knowledge management is fundamentally a construct of organization theory. Knowledge management concerns both tacit knowledge and explicit knowledge manifested in some kind of physical form, usually as documents (Cong & Pandya, 2003). As regards the latter type, information systems in the form of knowledge management systems are commonly applied to support the process (Abecker et al., 1998; Alavi & Leidner, 2001; Martin, 2008). Employees in governments as well as government “customers”, that is, citizens (cf. Yang et al., 2006) but also businesses, interest groups, and the like, may reap the benefits of government knowledge management.

Table 3.2 Knowledge management processes and the potential role of IT. Adapted from Alavi & Leidner (2001, p. 125)

KM processes | Supporting information technologies | IT enables
Knowledge creation | Data mining; learning tools | Combining new sources of knowledge; just-in-time learning
Knowledge storage and retrieval | Electronic bulletin boards; knowledge repositories; databases | Support of individual and organizational memory; inter-group knowledge access
Knowledge transfer | Electronic bulletin boards; discussion forums; knowledge directories | More extensive internal network; more communication channels available; faster access to knowledge sources
Knowledge application | Expert systems; workflow systems | Knowledge can be applied in many locations; more rapid application of new knowledge through workflow automation

Groupware, communication technologies, and specifically intranets are important ICT-based tools for mediating knowledge management (Alavi & Leidner, 2001, p. 125). Four processes constitute knowledge management: knowledge creation, knowledge storage and retrieval, knowledge transfer, and knowledge application (see Table 3.2). The search test system takes the form of an intranet with different functions. First of all, in Alavi & Leidner’s terms, the system is a repository of knowledge containing both organizational and specialist knowledge. However, it also contains sub-portals on topics of relevance to the employees. Therefore, the storage and retrieval, transfer, and application of knowledge are all supported by the system. In the search test, however, we are primarily concerned with the support of knowledge retrieval, including how both individual and organizational memory are supported. Findings may emerge regarding knowledge transfer and knowledge application, but these are not the object of our investigations.

3.4.3 ICT tools: Metadata initiatives 

One way of supporting information retrieval is to mark up pieces of information by means of metadata. The principle of describing and representing information units in order to be able to retrieve known items, explore new ones, and establish relations between items reaches far back in LIS (cf. Haynes, 2004). Referring to the assigned data as metadata came into use with the introduction of electronic resources (El-Sherbini & Klim, 2004). Thus, one of the first occurrences of the term metadata dates from the beginning of the 1990s (Gilliland-Swetland, 2005).

Information units can be characterized as to their content, context, and structure. The content expresses what the information is about and is considered intrinsic to the information unit. The context, on the other hand, is considered extrinsic to the information and is associated with its creation. Wh-questions (who, what, when, where, why, how) may help map the contextual aspects of the information. The structure of the information may be intrinsic, extrinsic, or both, and expresses formal associations within one information unit or across several units (Gilliland, 2008). The purpose of adding metadata is to be able to “arrange, describe, track and otherwise enhance access to information objects” (Gilliland, 2008, p. 2). NISO (2004) applies a slightly different tripartition and divides metadata into descriptive, structural, and administrative metadata. Here, descriptive metadata describes the information unit in order to support discovery and identification. Structural metadata indicates the relations between the parts of compound objects, such as the ordering of pages to form chapters. Administrative metadata supports the management of resources by providing information about creation, file type, technical details, and access. The difference in perspective between Gilliland and NISO stems from their different areas of application: Gilliland’s tripartition is aimed at the LIS sector, while NISO’s is applied rather to interoperability and other technically oriented contexts. Haynes (2004) elaborates further on the purpose of metadata and identifies five core areas of application: 1) resource description, 2) information retrieval, 3) management of information, 4) rights management, ownership and authenticity, and 5) interoperability and e-commerce. Haynes’ identification thus appears more thorough, in that it comprises the perspectives of both Gilliland and NISO.

Metadata formats differ in their level of complexity. At the lowest level of complexity we find full-text indexes based on the documents contained in the indexed information system. Full-text indexes are placed at the lowest level because the metadata lack structure. The next level of complexity contains simple, structured formats. Metadata assignment at this medium level does not necessarily require professionals. An example is the Dublin Core metadata standard designed for mark-up of internet resources. The highest level of complexity contains more detailed and structured standards. Examples are domain-specific standards that aim at characterizing information units in greater detail, such as the MARC format (Dempsey & Heery, 1998). For the most complex group of standards, the assignment of metadata requires thorough knowledge of the format and therefore cannot be carried out by novices.
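To make the lowest level of complexity concrete, the following minimal Python sketch builds a full-text inverted index: every word in a document becomes an unstructured index term pointing back to the documents that contain it, with no fields distinguishing, for example, a title word from a subject word. This is a generic illustration under our own assumptions, not the indexing used in the test system; the tokenizer, the function names `build_full_text_index` and `search`, and the sample documents are all hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    # Naive tokenizer (an assumption): lowercase and split on non-word characters.
    return [t for t in re.split(r"\W+", text.lower()) if t]

def build_full_text_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map every word to the set of document ids in which it occurs.

    The index carries no structure: a word from a title and a word from the
    body are indexed identically, which is what places full-text indexes at
    the lowest level of metadata complexity.
    """
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)

def search(index: dict[str, set[str]], query: str) -> set[str]:
    # Boolean AND over all query terms; an empty query matches nothing.
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical administrative documents.
docs = {
    "d1": "Guidelines for VAT registration of new businesses",
    "d2": "Internal memo on VAT refunds",
    "d3": "Registration of property tax assessments",
}
idx = build_full_text_index(docs)
print(sorted(search(idx, "VAT registration")))  # ['d1']
```

Because nothing in the index records where a word came from, a query cannot ask for, say, documents whose subject (rather than body) mentions VAT; that is precisely the capability the structured formats at the higher levels add.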

Metadata developed for specific domains are referred to as domain-specific metadata. Domain-specific metadata have been developed for various fields such as museums, archives, and moving pictures (e.g., Vellucci, 1998; Haynes, 2004). Within e-government, too, metadata has received considerable attention as a means of improving access to governmental information. Metadata is considered particularly challenging in e-government due to the heterogeneity of the user group (Alasem, 2009). A number of influential nations have developed metadata standards with the aim of supporting e-government.

The forerunner of government metadata initiatives was the Global Information Locator Service (GILS) initiative presented in the early 1990s. The intention behind GILS was to outline a standard for locating information that was applicable to different domains, including governments. GILS is based on a set of metadata in order to support semantic mapping, locator records, and interoperability. GILS is to a large extent inspired by the principles of bibliographic cataloguing (Christian, 1999, 2001). In the United States, a large project, the Government Information Locator Service (also referred to as GILS³), was initiated in the mid-1990s (Andrews & Duhon, 1997; Moen, 2001). The stepping stone for the project was a political decision about paper reduction in the United States. The Government Information Locator Service was heavily based on GILS. The service was thoroughly evaluated in 1996-1997 for a number of purposes, among other things to understand how GILS worked as a tool for information resources management and how GILS served different user groups (Moen & McClure, 1997). The evaluation indicated that the level of implementation at the time still left room for improvement. In particular, implementation was uneven and diverse in the administrations selected as evaluation units, which is hardly surprising considering the short time span involved. Thus, the problems were not necessarily caused by shortcomings in the service itself but rather by local characteristics of the administrations.

The development of specific e-government metadata has continued across the world throughout the 2000s (Tambouris, Manouselis & Costopoulou, 2007; Alasem, 2009). In many cases the Dublin Core metadata standard (Weibel, 1997) has constituted a central element (Alasem, 2009). Dublin Core contains 15 data elements and may thus be considered a simple metadata format compared to, for instance, the highly detailed MARC format. In Australia, the standard for government metadata, the Australian Government Locator Service (AGLS), was initiated in 1997. Instead of following GILS, Australia developed a standard based on Dublin Core (Haynes, 2004; National Archives of Australia, 2010). The European Union has also developed a mark-up language (GovML) in a two-year project funded by the European Commission. GovML is based on an open XML document structure (Kavadias & Tambouris, 2003; Glassey, 2004).

³ To avoid confusion, we use the acronym GILS to designate the Global Information Locator Service. The Government Information Locator Service is designated by its full name.
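How simple a Dublin Core-based format is can be illustrated with a short Python sketch that represents a record as a mapping restricted to the 15 elements of the Dublin Core Metadata Element Set (version 1.1). The sample record and the helper `validate_dc_record` are hypothetical illustrations; the sketch only checks element names and ignores the controlled vocabularies, encoding schemes, and extra elements that a government profile such as AGLS may add.

```python
# The 15 elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})

def validate_dc_record(record: dict[str, list[str]]) -> list[str]:
    """Return, sorted, the names of any fields that are not Dublin Core elements.

    In simple Dublin Core every element is optional and repeatable, so values
    are stored as lists and no element is required.
    """
    return sorted(name for name in record if name not in DC_ELEMENTS)

# A hypothetical record describing an internal guideline document.
record = {
    "title": ["Guidelines on VAT registration"],
    "creator": ["Hypothetical tax administration"],
    "subject": ["VAT", "registration"],
    "date": ["2010-05-01"],
    "language": ["da"],
    "audience": ["caseworkers"],  # not among the 15 DCMES 1.1 elements
}

assert len(DC_ELEMENTS) == 15
print(validate_dc_record(record))  # ['audience']
```

A caseworker could fill in such a record without specialist training, which is the sense in which Dublin Core sits at the medium level of complexity; a MARC record, by contrast, involves hundreds of numbered fields and subfields.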

In Denmark, the National IT and Telecom Agency has acted as an advisor to government offices within the framework of the FESD project. The purpose was to give directions and recommendations for digitalizing governments, with specific focus on implementing electronic document management systems (EDMS) (Center for effektivisering og digitalisering, 2002; Steinmark, 2005). However, applying the guidelines was not mandatory, as the choice of terminology also indicates. Likewise, applying the Government Information Locator Service profile in the United States was voluntary. Some American states adopted it while others applied alternative solutions (Moen, 2001). Recently, Danish initiatives have concerned the standardization of data by means of OIOXML, an XML standard developed specifically for exchanging and reusing data across administrations (National IT and Telecom Agency, 2009). Obviously, an important prerequisite for enabling the exchange of data between administrations is interoperability.

Fewer initiatives have been taken to standardize descriptive metadata in e-government. Such initiatives are commonly proposed as a component of enterprise architecture and take the form of ontologies (Peristeras, Tatabanis & Goudos, 2009). Ontologies are considered a type of KOS (see section 5.3.2), though with characteristics different from those of, e.g., taxonomies and thesauri (Soergel, 1999; Gilchrist, 2003; Haynes, 2004; Zeng, 2008). In Denmark, FORM has been developed to provide a common language for exchange between Danish government bodies. FORM is the Danish acronym for the Joint Cross Governmental Business Reference Model (cf. OECD, 2010). In their paper, Abecker et al. (1998) outline three levels for characterizing ontologies: information, domain, and enterprise. FORM is characterized as an enterprise ontology by virtue of its identification of work tasks carried out across the entire Danish public sector (cf. Gilchrist, 2003; OECD, 2010). At present, FORM is applied in the national public-sector portal borger.dk. A number of similar initiatives and tools have been developed in other countries (cf. Peristeras, Tatabanis & Goudos, 2009). As appears from the above, undertakings regarding metadata have to a large extent been concerned with the development of standards, whereas the evaluation of initiatives has received less attention in the research literature. In this sense, the present project can increase our understanding of the role of metadata when professional users seek information.


3.5 Summary 

E-government is a fairly new interdisciplinary research field comprising research fields such as the social sciences, computer science, public administration, and organization studies. The field started out some 20 years ago, and in the intervening time it has been consolidated with academic journals and conferences. The point of departure of the research field was an increased focus on the effectiveness and efficiency of governments along with demands for transparency in public administrations. To some extent the development of e-government has been inspired by e-commerce, that is, the business world. However, the two worlds differ in a number of characteristics, for example the number and types of stakeholders and the complexity of organizations. Thus, experiences cannot be directly transferred between the two areas.

The present PhD project is placed within information science, which also shapes the approach to e-government. A variety of system types support e-government. The system that hosts the comparative test of categorization in the PhD project is an intranet. Overall, we characterize it as an expert system on the basis of the definitions above, although, being an intranet, its database contains other objects as well. The character of the system also places it as a tool for knowledge management. Here, we are mainly concerned with the retrieval of knowledge. We test the system with professional employees. From Rowley we have learned that many stakeholders exist within e-government and that they do not necessarily behave the same way in terms of information seeking. Further, we have seen that the introduction of e-government has most likely meant a change of work tasks for employees. Together, this places demands on the design of the search test. Lastly, investigations of metadata in e-government have to a large extent been concerned with metadata formats and to a lesser degree with descriptive metadata. Further, existing knowledge of the significance of metadata in e-government information seeking is limited. It is therefore our aim to add to this knowledge by means of the project.

4 Seeking behaviour in e-government

Information seeking constitutes a core research field in the user-oriented research tradition (e.g., Ingwersen, 1996; Åström, 2007). Further, information seeking has been studied in LIS for decades (Ingwersen & Järvelin, 2005). For instance, ARIST started out with annual reviews on information needs and uses in 1966. Although reviews on the subject appeared annually only until 1972, the ever-increasing number of research articles and reviews on the subject attests to the importance of the research field.

Studies of information seeking in the context of e-government serve different purposes. One purpose is the evaluation of (digital) information services: Are they being used, and how? Does the use reflect the intentions behind the service? Another purpose is to characterize the use of information and information services so that new initiatives can be designed to meet this use (e.g., Wilson, 1999).

The purpose of the present chapter is to supplement the prior theoretical 

chapter with a review of information seeking studies within e-government. With the 

present chapter we want to provide an overview of the current state of knowledge 

concerning professional users of information in the context of e-government. We 

introduce the chapter with a definition of the concept of information seeking as it serves 

as the frame of reference for reviewing studies of information seeking. Next follows a 

presentation of the coverage of different e-government stakeholders’ seeking behaviour. 

The purpose of this subsection is to compare the amount of knowledge about other stakeholders with that about the particular group in question here: employees. The brief comparison is followed by a review of the current state of knowledge about the information seeking of e-government employees. We conclude the chapter with a summary.

4.1 Information seeking and related concepts 

Information seeking designates “the conscious effort to acquire information in response to a need or gap in your knowledge” (Case, 2007, p. 5). Further, information seeking describes “the variety of methods people employ to discover, and gain access to information resources…” (Wilson, 1999, p. 263). Two concepts are closely related to information seeking: information behavior and information searching. Information behavior is superordinate to information seeking. The concept is considered a part of general human communication behavior and may be defined as “the more general field of investigation…” (Wilson, 1999, p. 263). Information searching, on the other hand, is subordinate to information seeking and represents the situation in which a user interacts with an information system in order to satisfy a need for information. Since consulting an IR system is only one of several possible ways to satisfy an information need, information searching must be characterized as potentially contained in information seeking. The relation between the three concepts is illustrated in Figure 4.1. The figure is based on analyses of a number of existing models and therefore serves as a metamodel.

Figure 4.1 Nested model of information seeking and information searching (Wilson, 1999, p. 263)

It is implicit in the concept of information seeking that it occurs when a subject has experienced some sort of gap in their knowledge and a need for information has arisen. Information needs as the triggering element have been a common point of departure for studies of information seeking and searching. An information need stems from a problematic situation which, unless the problem is very simple or routine, causes a need for information (MacMullin & Taylor, 1984, p. 93). Different theories of the nature of the information need have been presented. Taylor (1968) has outlined four stages to characterize an information need (Q1-Q4). Libraries can apply the stages in order to help users at whichever stage their information need is. Belkin, Oddy & Brooks’ (1982) contribution is concerned with the background of the information need. They have put forward the ASK hypothesis, which states that an information need arises from an anomaly in the user’s state of knowledge. The idea behind the hypothesis is that it will be easier for the user to describe the anomaly than to describe the information need in the language of the information system (Belkin, Oddy & Brooks, 1982, p. 62). Ingwersen (1986a) has also offered an empirically based typology of users’ information needs. The typology comprises three types: verificative information needs, conscious topical information needs, and muddled topical information needs. Originally, the identification of the three types was based on empirical results from library users. When having a verificative information need, the user wants to locate or verify an item and possesses characteristic bibliographic data on the item wanted. The conscious topical information need refers to a situation where “the user wants to clarify, review or pursue aspects of known subject matter”. Finally, the muddled topical information need describes a user wanting to explore new concepts outside the subject matter known to the user before the information need arose. More recently, Ingwersen & Järvelin (2005, pp. 289-293) have further specified the theories of the information need. They identify three dimensions characterizing the information need, namely the user’s intentionality behind the search task (whether searching for source contents or data entities), the type of knowledge known by the user (declarative and/or procedural domain knowledge), and the quality of the user’s current knowledge (well- or ill-defined). Combining the three dimensions leads the authors to specify eight different information need types, ranging from different known-item searches to muddled types.
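The arithmetic behind the eight types can be made explicit. Assuming, for illustration only, that each of the three dimensions is reduced to a binary choice, their cross-product yields 2 × 2 × 2 = 8 combinations. The short Python sketch below enumerates them; the value labels are our shorthand taken from the dimensions named above, and the sketch does not reproduce Ingwersen & Järvelin’s own names for the eight types.

```python
from itertools import product

# Three dimensions of the information need, each simplified to two values
# (an illustrative assumption, not the authors' own labels).
DIMENSIONS = {
    "intentionality": ("source contents", "data entities"),
    "knowledge type": ("declarative", "procedural"),
    "knowledge quality": ("well-defined", "ill-defined"),
}

def enumerate_need_types(dimensions: dict[str, tuple[str, ...]]) -> list[dict[str, str]]:
    """Return every combination of one value per dimension."""
    names = list(dimensions)
    return [dict(zip(names, values)) for values in product(*dimensions.values())]

need_types = enumerate_need_types(DIMENSIONS)
for i, need in enumerate(need_types, start=1):
    print(i, "; ".join(need.values()))
print(len(need_types))  # 8
```

The design choice here is simply to treat the typology as a full factorial combination; if a dimension were given a third value (for instance, declarative and procedural knowledge together), the same code would yield 12 rather than 8 combinations.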

Since the 1990s, the concept of task has gained attention as a means of explaining information seeking and information searching (Vakkari, 2003). Information needs and seeking strongly depend on the underlying task, which explains the relevance of the concept in seeking studies. Tasks may have been implicit in earlier theories of information need formation (cf. Byström & Järvelin, 1995). However, only with the empirically based identification of types of tasks and their consequences for information seeking actions has the value of tasks been proven as a qualified methodological alternative to information needs as the point of departure for studies of information seeking (see e.g., Byström & Järvelin, 1995; Byström, 1997, 2002).


4.2 The purpose of seeking studies 

The study of information seeking is important because it provides essential knowledge about users of information. This knowledge is needed when developing information services, regardless of the choice of channel. Studies of information seeking may thus inform the design process, since they can specify the navigational structure and data that a particular user group needs in order to locate specific information (cf. Rouse & Rouse, 1984; Wilson, 1999). But, as pointed out by Wilson (1981, p. 7), studies of information seeking can also stand alone as basic research, not necessarily with any practical applications or implications but rather increasing our knowledge of why users act the way they do. This second type of study in particular expresses the change of approach in information seeking studies: their focus has moved away from examining the artifacts of information seeking, in what is referred to as system-centered research. Recent studies rather emphasize the information user, in the user-centered research tradition of information seeking studies (e.g., Case, 2007; Courtright, 2007; Vakkari, 1999). Along with this change of emphasis towards the users of information, the context of information seeking has received more attention. Taylor’s (1991) paper on information use environments points out the differences in the use and perception of information across groups of users, suggesting that information seeking must be studied with specific user groups as the point of departure.

4.3 Entities of e-government: studies of seeking behavior 

Information seeking behaviour in general has been thoroughly investigated within library and information science. One area of seeking studies has focused on seeking behaviour in work contexts, e.g., among engineers and lawyers (e.g., Case, 2007). But the area of e-government has also been the subject of investigation. We have previously outlined the stakeholders of e-government (see section 3.3). Rowley’s (2011) typology was presented in order, among other things, to identify the differing needs or demands that characterize different stakeholders. In this section we apply the typology as a tool for categorizing studies of seeking behavior within e-government. The purpose of introducing the typology in the present chapter is to outline the research coverage of the different stakeholders as regards their patterns of information seeking. Obviously, since the typology has not been developed with this particular purpose in mind, not all roles are necessarily relevant as objects of investigation in the present framework. For instance, we do not expect to find seeking studies investigating roles that are not subject to government services. By this we particularly mean the meta actors comprising the last four roles in Rowley’s typology, namely project managers, design and IT developers, suppliers and partners, and researchers and evaluators. Further, since this chapter sets the stage for our empirical domain study, we limit the following review to geographic locations that share Denmark’s level of development, Denmark being the location of our case organization.

We have already mentioned (section 3.3) that citizens, businesses, and governments have received much attention in the e-government research literature. Among other things, this is reflected in seeking studies of e-government stakeholders; in particular, the seeking and searching behavior of citizens has been well investigated. Elected politicians have also been studied rather extensively. Table 4.1 presents examples of seeking studies of different stakeholders. Employees have been left out of the table since we go into more detail with this particular stakeholder role from section 4.4 onwards. The division of the typology does have some influence on how seeking studies can be placed. For instance, citizens are divided according to whether they are general citizens or users of a particular service. This means that the studies placed in the latter group are mainly searching studies reflecting the use, and often also the evaluation, of a particular service. The evaluative character of these studies also means that they do not necessarily include searching behavior per se, such as the selection of search terms or the modification of queries.

4.4 E-government employee information seeking 

A number of selection criteria have guided the inclusion and exclusion of studies in this review. We have previously mentioned that e-government maturity levels diverge from country to country. In our review we focus on countries with a maturity level similar to that of Denmark. One could reasonably argue that an even narrower geographical delimitation is required due to the specific characteristics of the Scandinavian administrative tradition (cf. Arellano-Gault & del Castillo-Vega, 2004). However, since we are concerned with seeking behaviour in relation to carrying out administrative work tasks, and not with the administrative tradition per se, we find it reasonable to include studies from other administrative traditions as well. Finally, we have not restricted the included studies by time of publication. The rationale behind this decision is that ICT has been present in government for a long time; employees have been using ICT as part of their work for decades. We therefore assume that studies carried out before the concept of e-government was introduced may offer valuable insights into the seeking behaviour of our target group.

Table 4.1 Examples of studies that have examined information seeking and/or searching of various stakeholder roles.

Stakeholder | Author(s) | Object of study | Methods applied
People: Service users | Fu, Farn & Chao (2006) | Citizens’ acceptance of e-tax filing | Survey questionnaire
People: Service users | Hu et al. (2008) | Determinants of service quality in e-tax filing | Two-stage online survey of citizens
People: Service users | Wang & Shih (2009) | Factors influencing citizens’ use of government information kiosks |
People: Citizens | Jaeger & Thompson (2004) | E-government non-users among citizens | Literary study
People: Citizens | Reddick (2005) | Citizens’ interaction with e-government | Citizen telephone surveys
People: Citizens | Chau, Fang & Sheng (2007) | Citizens’ use of a particular website: Utah.gov | Log analysis
People: Citizens | Rains (2008) | Citizens’ information behavior when looking for health-related information |
People: Citizens | Cuillier & Piotrowski (2009) | College students’, internet volunteers’ and citizens’ general seeking behavior and perceptions of access to government information | In-class paper questionnaires, online surveys, and phone surveys
Businesses and Small and Medium sized Enterprises | Ren (1999) | SME executives’ use of government information sources | Survey questionnaire
Non-profit organizations | Elwood (2008) | The particular challenges connected to meeting the data needs of grass-root organizations | Observation of 2 organizations and semi-structured interviews of respondents surrounding the organizations
Non-profit organizations | Nikoi (2008) | NGO workers’ information needs | Interviews, observation, and analyses of the content of information already gathered by the respondents
Politicians | Nicholas & Colgrave (1996); Orton, Marcella & Baxter (2000); Askim (2007; 2009) | Information needs of British local councilors; Parliamentary members in the United Kingdom | Interviews and subsequent survey questionnaire; Case study of two parliamentary members including observation and log analysis

As the following sections show, research on the information seeking of e-government employees is limited. We will therefore supplement the review with studies of related and relevant user groups that can enrich our understanding of e-government employees. One area we draw on is seeking studies of professions with characteristics similar to the context in question here. We will also consult studies of e-government employees that are not core seeking or searching studies but still contribute to our knowledge of the target group. Not all studies investigate employees exclusively; some inform us about employees while also including other stakeholders such as politicians or citizens (e.g., Marcella et al., 2007). These studies will be included in the review provided that they add to the knowledge about employees. Numerous studies exist on the information needs and seeking behaviour of medical health professionals (a review of recent studies can be found in Fourie, 2009). Though medical health may be considered a subcategory of e-government, these studies will not be considered in the review, since the nature of the information applied diverges considerably from the information employed by the user group in focus here.

4.4.1 Project INISS 

In the late seventies, Wilson et al. (e.g., Wilson & Streatfield, 1977; Wilson, 1980) performed a large observational study of social workers and social administrators: project INISS (information needs and information services in local authority social services departments). The purpose of the project, in which Wilson participated, was to examine information needs and information behaviour among social workers and social administrators (Wilson, 1980, p. 199). The results were to be used for improving and developing information system organization and information service delivery (Wilson, 1980, p. 199). The project was carried out in a selected set of British local authority departments representing both urban and rural departments. Furthermore, the test persons represented different categories of employees (Wilson, 1980, p. 203). Twenty-two subjects were observed using structured observation, providing 6,000 records of communication events (Wilson & Streatfield, 1977, p. 282). The study is primarily a study of information behaviour and is therefore concerned with multiple aspects of the subjects’ work situation. Still, the results contain elements of information seeking, e.g., regarding the role of current awareness bulletins (Wilson & Streatfield, 1977, p. 285).

The study shows that 74% of all sessions lasted 5 minutes or less (Wilson & Streatfield, 1977, p. 284). The issue of limited time is still noted years after Wilson & Streatfield’s study (see e.g., Quirchmayr & Traunmüller, 1991). In addition, the social service staff members stress the importance of clearly and succinctly presented texts, preferably in a format that makes key elements easy to identify (Wilson, 1980, p. 211). As regards information needs, the study indicates that the information needs of the participants are more complex than merely verificative needs. The study finds that topical needs, whether conscious or muddled, are present among the participants. This does not, however, exclude the presence of verificative information needs in the social services departments. Put another way, the work tasks of the employees generate both verificative and topical information needs. The personal files observed in the study contain a number of different information types, e.g., committee papers, pamphlets, reports, and statistics (Wilson, 1980, p. 211). This means that the individual units of information vary in length.

4.4.2 System development in the Danish Parliament 

From 1989 onwards, Ingwersen (1994) worked as a consultant on a project regarding the introduction of a new information system in the Danish Parliament. The project included the design and development of a thesaurus. As an introductory part of the project, the information structure and working processes of people employed in the Parliament were analysed. For this particular purpose a user and domain analysis was carried out. The empirical basis for the analysis comprised interviews with 32 respondents on the basis of a structured questionnaire. Since some of the respondents were groups rather than individuals, the actual number of participants exceeded 32. The respondents comprised members of parliament (MPs), assistants, and secretariat employees. The questionnaire consisted of four parts: 1) characteristics of respondents, 2) quality of information (critical incident), 3) quality of information (in general), and 4) types of subject terms and subject levels.

The results of the user and domain analysis primarily inform us about the searching behaviour of the participants. In this way, the study gives directions for the functionalities required of the future information system. The documents of the organization are characterized by being interconnected in a highly complex manner that reflects the different stages of the law-making process. An important requirement for the system is that it allows for high-precision searches: the respondents consider it an important search facility to be able to identify a specific document when descriptive data or subject data are known in advance. This demand is connected to the participants’ need to limit search results as much as possible. The paper outlines different ways of reaching this goal. One way is to assign several controlled index terms to each document, which allows the system to discriminate between documents. A second parameter is the short window of currency of the documents expressed by the respondents: the participants consider the documents of the information system outdated after two years or even less. Thirdly, exhaustive descriptive metadata need to be available in order to support searches when one or several descriptive characteristics of the document are known.

Another result of the study concerns potential terms for the future thesaurus. Here, differences of opinion were expressed as to what constitutes a synonym. In the fourth part of the interview, the respondents were presented with candidate synonyms, which they marked as either related or not related terms. It appears that the different employee groups did not agree on the relations of specific terms. In several cases, MPs seemed more certain than the other two groups as to when synonyms were related and when they were not. Thus, when applying thesauri or other controlled indexing languages, one must take into account that subgroups of an organization may differ in their perception of relations between concepts, even though they work with the same subject areas. On the other hand, all three groups represented in the study share the opinion that popular expressions for laws and political concepts must be present in the thesaurus.

Figure 4.2 Comprehensive model of information seeking. Adapted from Johnson et al. (1995). 


4.4.3 Information behavior of employees in an engineering and technical service government office

Johnson et al.’s (1995) study takes as its point of departure the employees of a governmental agency concerned with engineering and technical services. The intention behind the study is to investigate the background factors that affect information seeking actions. The dependent variable of the study, information seeking actions, involves two aspects: scope, that is, the range of people consulted in order to access information, and depth, that is, the amount of information sought. In this sense the study is not a seeking study per se; rather, it informs us about the factors that affect seeking behavior. The model tested in the study is shown in Figure 4.2.

The empirical basis of the study comprises 380 responses to a survey questionnaire; 26 percent did not respond to the questionnaire. The respondents were characterized by fairly long seniority in the organization. Communication in the organization is also extensive, along with interpersonal and group interdependence. The tests of the model show that the strongest paths exist between characteristics and action, and between cultural beliefs and characteristics. The former path means that, among the tested independent variables, the respondents’ assessment of the quality of communication channels guides the amount of information and the number of people approached in order to resolve an information need. Though this is not expressed explicitly in the paper, we assume that the relation between the two is inverse, meaning that when the quality of the channels is high, less information needs to be gathered and fewer people need to be approached. The latter path moves a step backwards in the model and expresses a strong relation between cultural beliefs and characteristics. For the respondent group, the cultural conception of a channel thus determines the subsequent assessment of the channel. The study also finds that some modifications need to be made to the proposed model. Some of the variables in the left column of Figure 4.2 do not take the path through utility. Instead, they have a direct effect on both characteristics (middle column) and actions (right column). This finding suggests that the variables in the left column are important to several stages of the process outlined in the model. In other words, demographics, direct experience, beliefs, and salience can be seen as direct indicators of the channels consulted in information seeking.


4.4.4 Federal, state, and local policy makers’ selection of information sources 

The purpose of Oh’s (1996) study is to investigate which factors influence the selection of information in bureaucracies. The relations between the identified influencing factors are also under study. In this respect, the study is similar to the study by Johnson et al. (1995). Oh’s study informs us about the seeking behaviour of government employees. On the basis of existing theory and results, a theoretical path model is constructed to explain information selection. One assumption behind the study is that the policy area affects the selection of information. For this reason, the path model is tested in two different policy areas within mental health, namely the delivery and the financing of mental health services. The two policy areas were selected on the assumption that the former primarily comprises generalists while the latter mainly includes specialists. The study applies a two-step method. First, a series of open-ended interviews was carried out and subsequently coded, with the purpose of gathering basic information about the policy making process in the two policy areas. Second, a questionnaire survey was carried out in order to test the path model.

The results demonstrate that the generalist and the specialist groups have some features in common while they differ in others. Common to both generalists and specialists is that internal sources are preferred regardless of prior knowledge of the problem at hand. It also appears that the employees’ education is more likely than their age to affect the selection of sources. Furthermore, the type of information sought influences the selection process of the respondents, not only with regard to which sources are selected but also to the number of sources selected: some information types require searching more sources than others. As for the differences, the specialists are more likely than the generalists to compare different sources when searching for information. One reason for this is the specialists’ need to ensure the reliability and validity of the collected information.

It is the differences between specialists and generalists that lead Oh to conclude that “the factors influencing selection of information sources strongly differ between the two policy areas”, suggesting that future studies must take this difference into account. However, since this finding has only been verified for employees’ selection of sources, we do not attempt to make the same distinction in our domain study and search test. The main reason for this is that our field of study comprises more general seeking and searching behaviour, which results in a slightly different focus compared to Oh’s study.

4.4.5 Finnish municipal employees 

As part of her dissertation work, Byström (1999) conducted a study of two Finnish local (municipal) governments. The study has been presented with different foci across a number of works (Byström, 1997, 1999, 2002), and the present section is therefore based on a combination of these three publications. Using diaries, interviews, organizational document review, and observation (Byström, 1997, p. 132), 54 cases handled by 19 officials are analyzed (80 of the cases from the pilot are included). Data on the cases were collected from the moment they arrived at the registrar’s office (Byström, 1999, pp. 67-68). Byström focuses on information seeking and is not specifically concerned with the actual information searching. Among other things, Byström analyzes information needs and the relation between the complexity of work tasks and the subject expertise of the participants (1999, p. 85). As expected with a group of fairly experienced participants, subject expertise is in many cases considerable.

The results of the study are interpreted within a theoretical framework concerning types of work tasks and types of information needed. Five types of work tasks of increasing complexity were defined for the study, namely automatic information processing tasks, normal information processing tasks, normal decision tasks, known-genuine decision tasks, and unknown-genuine decision tasks. The increasing level of complexity is expressed in terms of the a priori determinability, from the subject’s point of view, of the information needed, the information seeking process, and the expected outcome of the seeking process. Three types of needed information are also specified: task information (or single task related), domain information (or multi-task related), and task-solving information (or instructional) (Byström, 1997, 2002).

The analysis of the collected data shows that the tasks with the highest degree of a priori determinability (automatic and normal information processing tasks) are by far the most frequent tasks among the participants. These are followed, in decreasing frequency, by normal decision tasks and known-genuine decision tasks. Unknown-genuine decision tasks are not present in the data material and can thus be expected to be the rarest task type for the participants. In other words, the participants most often handle tasks with a low degree of uncertainty as to what information is needed and what constitutes the process of getting hold of it (Byström, 1997). Further, it seems that different types of information are needed for the particular work tasks: the results indicate that the number of information types increases with the level of uncertainty about the task in question. Automatic information processing tasks have the largest share of cases in which no information is acquired, and this share decreases as task complexity increases. Task information is most common in normal information processing tasks, while combinations of task and domain information, or even of task, domain, and task-solving information, are most common in decision tasks. Taking the frequency of the work tasks into account, task information becomes the most frequent information type, while domain information has a medium frequency and task-solving information is the least frequent. A similar distribution is indicated in Serola (2006). Likewise, the type of source applied changes as the uncertainty of the work task at hand increases: the share of documentary sources decreases with increasing uncertainty, while the share of people as information sources increases (Byström, 2002). In sum, Byström’s results indicate that the complexity of work tasks is to some extent connected to the type and amount of information acquired. Also, the use of documentary information is more widespread in tasks of low uncertainty and complexity, which suggests that it is these types of tasks in particular that should be supported by information systems.

4.4.6 Users of the European Parliamentary Documentation Centre 

In 2004, Marcella et al. (2007) examined the information needs and information seeking behaviour of users of the European Parliamentary Documentation Centre (PDC). The main purpose of the study is to make recommendations for service development in the PDC on the basis of the study of its users. Semi-structured interviews were conducted with 72 persons representing different types of staff: administrative staff, MEP assistants, legal service administrators, and MEPs. Since only 5 of the 72 persons are MEPs, we have decided to include the study in the present review, as it reflects seeking behaviour from the employees’ point of view. In addition, 11 PDC staff members were interviewed. In order to ensure experienced test persons, the data were collected prior to the 2004 election for the European Parliament.

The study explores elements of information behaviour, information seeking behaviour, and information searching. Ninety percent of the interviewees use information at least daily. The information originates from both internal and external sources. It is applied for a wide range of activities and takes the form of both raw and analysed data. The interviewees express difficulties in locating relevant information. The reasons for these difficulties concern transparency, lack of digitization (of older materials), representation of different opinions, and objectivity of data. In accordance with prior studies, limited time is an essential factor, and the time pressure intensifies the difficulties of locating information. This is indicated by the fact that the participants use other people to perform their searches and that the size of the information is an important relevance criterion.

In line with Byström’s (1997) results presented in the previous section, the participants have information needs of varying complexity. Marcella et al. do not estimate the relative extent of different types of information needs, but the paper indicates varying complexity in several places. Searching by entering complete citations points to information needs of low complexity. On the other hand, the information seeking connected to the legislative process, where the information need starts out wide-ranging and later becomes more focused, points to more complex information needs.

4.4.7 Information literacy of Scottish government civil service staff 

The overall purpose of Crawford & Irving’s (2009) study is to investigate the nature of civil service employees’ information literacy in order to direct improvement initiatives more specifically towards actual practice. The research method applied in the study is structured interviews, allowed to vary slightly depending on the type of staff being interviewed. The 20 interviews thus covered different types of government employees: care home staff, civil service staff, and social work staff. The paper reports neither the wording of the interview questions nor how the respondents are distributed across the different employee types.

The most recurrent finding of the study is the importance of people as sources of information. People are used in information seeking at different levels: other people are consulted as sources of information, but also to support the selection of websites for information searching. The employees evaluate the sources employed for information seeking, whether they are human or ICT-based sources. In this sense they appear highly information literate. At the same time, the information environment seems introverted. The authors do not explain what this introversion comprises. However, the scope of the paper could suggest that it covers a lack of openness towards changing the information practice in line with information literacy courses. The paper also investigates aspects of searching behaviour. In connection with the electronic resource data management system of the administration, it is mentioned that the subject terms assigned are not sufficiently specific. This finding suggests that the employees require high precision when searching for information. The employees also demonstrate a good understanding of the value of the information sought and applied. On the other hand, the authors point out that the quality of internet searches varies across the employees, leaving room for improvement, e.g., through information literacy courses.

4.4.8 Civil servants’ internet skills 

In a recent study, van Deursen & van Dijk (2010) investigated the internet skills of civil servants. The purpose of the study was to determine the strength of the participants’ skills at four levels, namely the operational, formal, information, and strategic levels. The levels are specified as follows:

1. Operational: operating browsers and online search engines, and completing online forms,

2. Formal: navigating the internet and maintaining a sense of location while doing so,

3. Information: locating required information, and

4. Strategic: taking advantage of the internet for specific goals.

Ninety-eight civil servants from different Dutch executive policy agencies and municipalities served as the empirical basis of the study. Each of the four levels was operationalized into two search assignments, giving a total of eight assignments. For every assignment, a maximum time allowed for solving the task was specified. The paper does not explain the motivation for the allowed search times. The assignments were used to test the participants’ ability to complete each assignment within the time limit, and the degree of completion was taken to express the skills of the participants. This measure was subsequently analysed in relation to different background variables.

The general findings of the study are that the participants’ operational and formal skills are stronger than their information and strategic skills. Age also seems to affect skills, in the sense that younger participants perform better in solving the assignments than older participants do. Another difference in performance was identified with regard to the type of employment: executive employees performed worse than policy advisors and administrators. Unfortunately, the paper does not report in a more qualitative manner how the participants solved the different assignments. However, the results of the study make clear that different characteristics of the respondents may affect their skills and that the skills to some degree depend on the type of employment.

4.5 Related studies of information seeking and searching 

As appears from the sections above, there are few clear-cut studies of the seeking behaviour of government employees. For this reason, we present below some studies of professional employees who share common features with the domain in question. The studies present the seeking behaviour of professional information users for whom the use of information is a core activity in solving their daily work tasks. Further, we have included professional legal seeking behaviour in the review, since legal sources are expected to play an important role for government employees, whose job is to administer the law.

4.5.1 Legal seeking behavior 

Different authors have investigated the seeking behavior of both academic lawyers and attorneys. Parts of the findings are of interest to this review because both the legal profession and e-government employees take their point of departure in the legal framework constituted by the law. As a consequence, they can be expected to share information sources and seeking behavior to a certain degree.

Kuhlthau & Tama (2001) have conducted a study that investigates the seeking behavior of practicing lawyers. Eight lawyers from different small and medium sized enterprises were interviewed following a semi-structured interview guide. The study comprises both routine and complex tasks, though most attention is paid to the complex tasks in the analysis. The interviewees prefer printed over electronic sources. They express that the searching possibilities in electronic sources do not support serendipity (cf. Foster & Ford, 2003), which is often needed in the lawyers’ work with complex cases. When carrying out routine tasks, the interviewees are more willing to apply electronic sources. A number of electronic sources, e.g., e-mail and listservs, are applied for staying up to date. The interviewees stress the need to be able to filter the information in order to avoid information overload. Here, time pressure makes the difference: the interviewees do not have time to go through all the information and are concerned about missing important information. In addition to printed and electronic sources, the lawyers use people as information sources, in accordance with the other user groups presented above. A similar finding was made by Choo et al. (2006), who reported that the employees of a Canadian law firm regularly exchanged information with the people they worked with. A final result of interest to the present work is the lawyers’ expressed need for uniform and well-organized access to their documents.

In a later, related study, Makri, Blandford & Cox (2008a) investigated the legal information seeking of academic lawyers. The purpose of the study is to make recommendations for the development of established law databases based on the users’ seeking and searching behavior. Twenty-seven participants, ranging from first-year undergraduates to professors, performed searches to find information for their work while thinking aloud. The analysis is framed by Ellis’ (1989) model of information seeking, and during the analysis different sub-processes of Ellis’ model are identified. Because of the academic background of the participants, the paper focuses on searching for scientific articles at the expense of legal sources. As regards e-government employees, we expect the use of scientific articles to be close to nothing. Still, some of the results should be emphasized. The authors find that staying updated is particularly important in legal matters in order to avoid basing one’s work on materials that have been overruled or superseded by changes in the law, or that are otherwise no longer valid. Updating behavior takes place in connection with Ellis’ level “monitoring” and at a new level identified by the authors, namely “accessing”. Monitoring is defined as “maintaining awareness of developments of an area through regularly following particular sources” (Ellis, 1989, p. 177). In the study of Makri, Blandford & Cox, monitoring takes place at source, document, and content level. Active monitoring is carried out by the participants by conducting searches in different law databases, by browsing particular sources, and by following previously bookmarked web pages. Passive monitoring takes the form of subscribing to e-mail alert lists. The “updating” behavior identified by the authors is subordinate to “accessing”, defined as “gaining access to resources, sources or documents/content” (Makri, Blandford & Cox, 2008a, p. 625). Updating differs from monitoring in that it designates the behavior of investigating the current understanding of a document or of its content. Updating primarily takes direct forms, such as searches in a law database or checking footnotes to legal texts. The importance of staying updated is also confirmed in other studies of legal information behavior (e.g., du Plessis & du Toit, 2006).

4.5.2 Information behaviour of software engineers

The overall purpose of Freund, Toms & Waterhouse’s (2005) study of software engineers was to identify which contextual factors affect information seeking in a work context. The first study (phase I) was based on a combination of four methods: focus group, semi-structured interviews, observation, and finally analysis of documents and digital information. In the second study (phase II) reported in the paper, 14 software services consultants were interviewed using a semi-structured interview. The purpose of this study was to investigate information behavior on the basis of a work task framework.

An essential result of phase I is that information needs depend on the type of work task at hand. Work tasks range from short-term to long-term commitments, and the development of information needs is highly dependent on this work task context. Phase II of the study reveals that information is extremely important to the interviewees’ work: on average, they spend approximately 20-30% of their working hours searching for and consulting information sources. The results of the phase II study are summed up in Figure 4.3. The figure illustrates the influence of work content on access constraints and information characteristics, which in turn affect the strategies applied for searching and selecting information. Specifically, the figure shows that different characteristics of the work context affect the seeking process to a large extent. These elements of the work context include the employee’s characteristics, such as her existing knowledge about the task at hand. The type of task (consulting or engineering) and the specific problem at hand (e.g., learning, collecting advice, or finding facts) also seem to affect the interviewees’ information seeking. The work contextual factors affect the selection of sources in terms of the time available and the availability of sources, but also in terms of the characteristics of information and knowledge of the subject. This in turn affects the type of channel, source, and genre selected. The seeking process depicted in the model reflects a linear conception of information seeking; the strength of the model lies rather in its enumeration of the factors that influence information seeking.

Figure 4.3 Model of the factors affecting information seeking in the domain of software engineering. Adapted from Freund, Toms & Waterhouse (2005).

4.5.3 Professional seeking behaviour

The purpose of Leckie, Pettigrew & Sylvain’s (1996) paper is to model information seeking not just for one specific profession but across professions, identifying what characterizes the information seeking of professionals in general. A review of existing studies of engineers’, lawyers’, and health care professionals’ seeking behaviour, and of seeking models, forms the basis of the general model developed by the authors. The model is depicted in Figure 4.4. The model highlights the major role that work roles play in the subsequent information seeking of professionals. It follows that work roles should be taken into account when investigating the seeking behaviour of professionals. Compared to the model of Freund, Toms & Waterhouse (2005), the present model to a greater extent reflects the interactivity of information seeking in that it includes feedback loops. On the other hand, the model itself is not very detailed as to the specific steps of information seeking, which are not represented as distinct stages. However, the authors compensate for this in their presentation of the model.

4.6 Summary 

The present review has clarified different perspectives on the seeking behaviour of government employees. Their information needs are diverse, ranging from simple to far more complex needs, although simple information needs appear to be the most common in the domain. The diversity of information needs in the domain requires different types of indexing. The information contained in information systems needs to be represented by sufficient descriptive metadata in order to support verificative searches (Ingwersen & Wormell, 1989). In addition, in order to meet more complex information needs, an adequate amount of subject metadata needs to be present in e-government information systems. Further, both descriptive and topic metadata must be assigned for the employees to be able to discriminate between large sets of documents contained in information systems. The diversity of work tasks and information needs also affects the amount and types of information used by the employees. The review has shown that the amount of information and the information types applied depend on the work task at hand. For simple work tasks, task information is the dominant type of information applied. As the complexity of tasks increases, so do the amount of information collected and the range of information types applied. Domain information and task-solving information are thus primarily used for solving complex work tasks.


Figure 4.4 The process of information seeking of professionals. Adapted from Leckie, Pettigrew & Sylvain (1996, p. 180).

Time is an issue to the employees at different levels. In general, the time available for handling tasks is limited. The same finding has been made for other user groups (cf. Savolainen, 2006). The employees’ time pressure calls for the possibility of carrying out effective, high-precision searches in information systems. One way of meeting this requirement is, again, to assign topic metadata at a sufficient level of specificity. Time also matters to the respondents with regard to staying updated on the subject of their work. The topics of e-government are dynamic, and the employees need to keep up with the latest developments. Updating is carried out through active information seeking and, more passively, by following newsletters and other forms of updating. Finally, the development of subjects means that documents become obsolete. Being able to sort documents by currency and to indicate their news value is thus important in information systems.

Employees apply a wide range of information sources to meet their information needs. Information is collected from printed, digital, and human information sources. The employees’ assessment of sources with respect to the information need in question is highly qualified. Preferences for information sources depend on a number of characteristics of the employees; among other things, the policy area and the type of employment influence the selection of sources. It also seems that the number and types of sources increase along with the complexity of work tasks and information needs. In general, people are very frequently used as sources, and the importance of this particular source also increases with the complexity of the work task at hand.

In sum, the presented studies do give some insight into the seeking behaviour of e-government employees. However, the review has also made clear that the body of knowledge on the seeking behaviour of government employees is limited. Firstly, only a small number of studies specifically investigate the seeking behaviour of government employees. Secondly, some of the studies mentioned above are fairly old. This is problematic since we have previously stated that the work tasks of employees are expected to change with the digitalization of governments. A change in work tasks may also bring a change in the character of information needs and, as a consequence, in seeking behaviour. The behaviour reflected in the older studies may therefore not represent the current situation for government employees. Thirdly, several of the studies above do not provide direct insight into the seeking behaviour of the user group in question. This does not reflect on the quality of the studies; rather, it reflects the fact that they were carried out for purposes other than investigating specific seeking behaviour. A core assumption of the empirical foundation of the present work is that the evaluation of information systems needs to take its point of departure in potential users. On this basis, we consider that there is a need for a more thorough investigation of the current state of civil servants’ seeking behaviour, with particular emphasis on tax authorities. This investigation serves to inform the design of the search test, and it is the primary reason for carrying out the empirical domain study of the thesis.

5 Indexing of electronic documents 


Chapter 5 

The concept of indexing has different meanings. In LIS, the widest sense of the concept designates index terms as a set of labels that searchers can apply in information searching to denote authors, subjects, journal names, etc. (cf. Rowley, 1994). Here, we are investigating the subject of documents, and hence we employ a narrower definition of the term. The understanding of indexing that guides the present work is that it designates the act of representing the subject of information in order to enable inclusion and retrieval of documents in a database (Lancaster, 2003; Rowley & Hartley, 2008). In this sense, indexing supports the purpose of subject retrieval systems, namely “...to retrieve documents, whose aboutness suggest that a user may find in them meaning(s) expedient to a certain need of the moment” (Beghtol, 1986, p. 85). The subject representation can take the form of, for instance, descriptors, subject headings, or classification codes (Mai, 2005).

Indexing has three main purposes: to facilitate easy location of documents by topic, to enable the identification of relations between documents, and to predict the relevance of a document to information needs (Korfhage, 1997). In other words, indexing is a highly important factor in the process of information retrieval. Or, as Soergel puts it, indexing “...sets an upper limit for retrieval performance...” (1985, p. 327). Seen in relation to the IR process, indexing represents the input to a system and the retrieval of documents the output (Milstead, 1992, p. 408). This distinction between input and output stresses the close relation between indexing and retrieval as parts of the IR process: the applied indexing practice influences the results of information retrieval, and, conversely, retrieval should affect how indexing is carried out.

The relation between subject indexing, subject cataloguing, and classification 

is close. All three concepts are used to designate aspects of labelling and describing 

documents according to their content, whether it is in the form of classification codes, 

subject terms or other indicators (Anderson & Pérez-Carballo, 2005; Lancaster, 2003). 

This is reflected in the literature, where the concepts are used ambiguously. The situation is the same for automated indexing and classification: here the process of determining the content of documents and grouping them accordingly may be referred to as automatic indexing (Lancaster, 2003; Moens, 2000; Salton & McGill, 1983) or automatic classification (e.g., Golub, 2007). In the present work, we define indexing in a broad sense, taking the similarities of subject indexing and classification into account. This definition also allows for a broad view of the literature on automated approaches to indexing and classification. We apply the term automatic indexing, since our focus is on the labelling and grouping of documents.

To establish the context of indexing, we first introduce the purpose of indexing. Second, the indexing process is presented, followed by approaches to and core concepts in indexing. Afterwards, approaches to automatic indexing are discussed. We finish the chapter by looking at hybrid types of indexing that combine elements of human and automatic indexing approaches.

5.1 The process of indexing 

The process of indexing refers to the act of assigning subject terms to documents or other types of information in order to enable retrieval. According to Philipson (2008), the process of indexing starts when the indexer begins to become familiar with a document and ends when the subject description has been completed. The literature presents the indexing process as consisting of varying numbers of steps. In its simplest form, the indexing process consists of two steps. In the first step, the document is analyzed in order to determine its subject; that is, the aboutness of the document is identified. The first step may be referred to as the conceptual analysis (Lancaster, 2003), but the literature shows alternative designations as well (Mai, 2000, p. 281). In the second step of the indexing process, the document subject is translated into a set of index terms (Mai, 2005). This part of the indexing process is denoted translation (Lancaster, 2003).
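The two steps can be caricatured in a few lines of code. The sketch below is a deliberately naive illustration, not a model of how human indexers actually determine aboutness: conceptual analysis is approximated by word frequency, and translation by a lookup in a small controlled vocabulary. The stopword list, the vocabulary, and the function names `conceptual_analysis` and `translation` are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list and controlled vocabulary (entry word -> descriptor).
# Neither is taken from a real thesaurus; they exist only for this illustration.
STOPWORDS = {"the", "a", "of", "for", "and", "on", "to", "in", "is", "are"}
VOCABULARY = {
    "travel": "Travel expenses",
    "allowance": "Allowances",
    "deductions": "Deductions",
    "tax": "Taxation",
    "taxes": "Taxation",
}


def conceptual_analysis(text: str, top_n: int = 3) -> list[str]:
    """Step 1 (toy version): approximate aboutness by the most frequent content words."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]


def translation(concepts: list[str]) -> list[str]:
    """Step 2: map each concept to a descriptor; concepts without an entry are dropped."""
    descriptors: list[str] = []
    for concept in concepts:
        descriptor = VOCABULARY.get(concept)
        if descriptor is not None and descriptor not in descriptors:
            descriptors.append(descriptor)
    return descriptors


doc = "Allowance for travel: travel costs and the travel allowance are deductions for tax."
concepts = conceptual_analysis(doc)
print(concepts)               # ['travel', 'allowance', 'costs']
print(translation(concepts))  # ['Travel expenses', 'Allowances']
```

In this toy run, "costs" is identified in the analysis step but lost in translation because the vocabulary has no entry for it, a small instance of the reduction of information that indexing entails.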

The two-step conception of the indexing process has been challenged by other scholars (Mai, 2000), and the indexing process has been presented with up to five steps. The increased number of steps allows for a more differentiated presentation of the process. However, the two steps of the simplified model can always be identified as underlying the more detailed presentations. Rowley (1988, p. 50) presents the indexing process as consisting of three steps: 1) familiarization, 2) analysis, and 3) conversion of concepts into index terms. In the first step, the indexer becomes acquainted with the content of the document; among other things, the indexer should be aware of the structure of the subject. Familiarization forms the basis of the second phase: the analysis of the document. The second phase can to some degree be guided by instructions, but experience and intuition are also important here. In the third phase, concepts from the document are matched with index terms from an index vocabulary. Compared to Lancaster’s (2003) two-step model, Rowley expands the first step into her first two phases, while Rowley’s third phase corresponds to Lancaster’s second step. Chowdhury (2004, p. 74) operates with a five-step model of subject indexing containing the following steps: 1) analysis of subject; 2) identification of keywords; 3) standardization of keywords; 4) choice of indexing system, whether pre- or post-coordinate, and preparation of entries; and 5) filing of entries. Again, Lancaster’s two steps underlie Chowdhury’s five: Chowdhury’s steps 1-3 correspond to Lancaster’s first step. In Chowdhury’s first step, the indexer analyses the subject of the document, while the second step involves deciding which of perhaps several subjects should be represented in the indexing. Whether the third step is mainly oriented towards conceptual analysis or translation is difficult to state due to Chowdhury’s limited description. However, we consider it part of the conceptual analysis, since it precedes the introduction of the controlled vocabulary and contains a standardization of the keywords selected on the basis of the conceptual analysis. Chowdhury’s fourth and fifth steps match Lancaster’s translation step: here the entries in the controlled vocabulary are generated and filed into the system. In sum, varying levels of detail may be identified in presentations of the indexing process. A more detailed presentation of the indexing process has several advantages. Mai (2000) mentions its usefulness for analysing the process. An additional advantage is that it allows for more specificity when indexing guidelines are developed.

Figure 5.1: Illustration of the subject indexing process (Mai, 2000, p. 279).


Though a number of presentations of the indexing process exist, scholars within the field of indexing agree that not much is known about the subject indexing process. In particular, the indexer’s determination of the subject of a document is not well explored in the literature. Despite available indexing policies, standards, or guidelines, it is difficult to determine what takes place in the initial step: identifying the subject of a document (Mai, 2000; 2005). This is a problem, since the initial step of the indexing process may be considered the most important one, as it forms the basis for the steps to follow. Moreover, the entire process of indexing is associated with a reduction, or perhaps even a loss, of information compared to the full text of the document, as Figure 5.1 illustrates. Some reduction of information is needed, because it reduces the amount of information end users have to keep track of. On the other hand, if documents are represented by wrong or misleading index terms, severe retrieval problems could result. Therefore, ensuring the quality of indexing is essential to successful retrieval.

5.2 Quality of indexing 

Indexing quality is closely connected to the retrieval of documents: if the quality of the indexing is low, this will be reflected in the quality of search results (Mai, 2000). Indexing quality may be expressed in different terms, and two overall perspectives exist on how to measure it. One perspective considers quality in terms of retrieval effectiveness, that is, the ability of the indexing to discriminate relevant from irrelevant documents with respect to search requests (e.g., Schultz, 1970; Borko, 1977; Lancaster, 2003). The other perspective considers quality in terms of the degree of consistency of the indexing, that is, the accuracy in the indexing of a document (e.g., Rolling, 1981). Two further concepts also contribute to the identification of indexing quality, namely specificity and exhaustivity. These concepts are important because they help characterize the indexing and are known to affect indexing quality. They are introduced below.

5.2.1 Specificity 

Specificity expresses the generic level of assigned index terms (Soergel, 1994). The concept of specificity is inherently connected to the vocabulary applied for indexing, in the sense that the specificity of the applied vocabulary determines the possible level of specificity in indexing. The generic levels of vocabularies will thus differ according to the scope of the vocabulary. For instance, the same document content will most likely be indexed at a different depth with a general vocabulary than with a specialized vocabulary (cf. Mai, 2004b).

It is common practice to choose index terms at the most specific level possible within the frame of the indexing language (e.g., Bates, 1979; Lancaster, 2003). This supports the “force of discrimination” (Blair, 2002, p. 280): by assigning the most specific index terms possible, the database allows for discrimination between documents, in particular between general and specific documents. Placing documents at the most specific level in the indexing language ensures that documents at the same level of description will be retrieved in the same search session. For instance, documents dealing with income taxes in general are at a higher generic level than documents dealing with allowance for travel expenses. The principle of assigning the most specific index term possible is beneficial when carrying out specific searches. However, if the search is broader, the system needs to allow for the inclusion of narrower descriptors in order not to miss possibly relevant documents of greater specificity (Soergel, 1994).
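The difference between a specific search and a broader search that also includes narrower descriptors can be sketched with a toy hierarchy built on the example of income taxes and allowance for travel expenses. The hierarchy, the intermediate descriptors `Deductions` and `Tax rates`, the document identifiers, and the function names are hypothetical; the recursion assumes the hierarchy contains no cycles.

```python
# Hypothetical hierarchy: broader descriptor -> its narrower descriptors.
# Assumed to be acyclic, as in a well-formed thesaurus hierarchy.
NARROWER = {
    "Income taxes": ["Deductions", "Tax rates"],
    "Deductions": ["Allowance for travel expenses"],
}

# Hypothetical index: each document carries its most specific applicable descriptor.
INDEX = {
    "doc1": {"Income taxes"},
    "doc2": {"Allowance for travel expenses"},
    "doc3": {"Tax rates"},
}


def expand(descriptor: str) -> set[str]:
    """Return the descriptor plus all descriptors below it in the hierarchy."""
    result = {descriptor}
    for narrower in NARROWER.get(descriptor, []):
        result |= expand(narrower)
    return result


def search(descriptor: str, include_narrower: bool = False) -> set[str]:
    """Return documents indexed with the descriptor (and, optionally, any narrower one)."""
    wanted = expand(descriptor) if include_narrower else {descriptor}
    return {doc for doc, assigned in INDEX.items() if assigned & wanted}


print(sorted(search("Income taxes")))                         # ['doc1']
print(sorted(search("Income taxes", include_narrower=True)))  # ['doc1', 'doc2', 'doc3']
```

Because every document is indexed at its most specific level, the plain search for the broad descriptor finds only the general document; expanding to narrower descriptors recovers the specific ones.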

5.2.2 Exhaustivity 

Exhaustivity concerns the extent to which index terms cover the content of a document (Salton, 1986; Soergel, 1994; Lancaster, 2003; Anderson & Pérez-Carballo, 2005). Are only the core aspects of the document covered by index terms, or are sub-aspects represented as well? Generally, the larger the number of terms assigned, the greater the exhaustivity of the indexing. The counterpoint to exhaustivity is selective indexing, where only the central subjects of a document are covered by the indexing (Lancaster, 2003).

Soergel (1994) distinguishes between viewpoint exhaustivity and importance exhaustivity. Importance exhaustivity addresses thresholds for when an aspect of a document is important enough to be represented in the indexing. That is, how important must an element of a document be in order to be included in the description of the document? Viewpoint exhaustivity, on the other hand, points to the depth or range of the applied indexing language. Thus, viewpoint exhaustivity designates the degree to which facets and viewpoints expressed in a document are represented in the indexing language. One could say that the level of viewpoint exhaustivity is defined by the limits of the indexing language. In this way, the two types of exhaustivity complement each other. In the first case, the level of exhaustivity is set by the indexer or the indexing rules; in the second case, the indexing language sets the upper limit for exhaustivity. In practice, the two types of exhaustivity interact. Importance exhaustivity will be restricted by the nature of the indexing language. At the same time, there is no need for a highly exhaustive indexing language if the defined indexing policy prescribes a low level of importance exhaustivity. Nevertheless, distinguishing between the two types of exhaustivity allows for the identification of the factors affecting indexing exhaustivity.

The level of exhaustivity has economic implications (Lancaster, 2003): a high level of exhaustivity requires more effort from indexers than a low level. It is not necessarily useful to estimate exhaustivity quantitatively in terms of the number of assigned terms, since other factors have an impact on exhaustivity, such as the size of the documents. A few index terms assigned to a short document may be just as exhaustive as more index terms assigned to a longer document (Anderson & Pérez-Carballo, 2005). The indexing approach is another factor: a single controlled term may represent the content of a document more exhaustively than a number of uncontrolled terms added by an indexer (Fugmann, 1993). In terms of recall and precision (see section 5.2.4), high exhaustivity of indexing tends to lower the precision of search results, since documents dealing only partially with the searched subject will be retrieved along with documents whose main focus is on that subject. At the same time, recall is improved by high exhaustivity, since documents with a more peripheral mention of the searched subject can also be found (Rowley, 1988). The ability to discriminate between documents must also be considered in relation to exhaustivity: if the same terms are assigned to many documents, the discrimination value of those terms decreases (Lancaster, 2003).
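The loss of discrimination value can be made concrete with inverse document frequency (idf), a standard IR weighting used here purely as a stand-in; it is not the discrimination measure of any of the cited authors. A term assigned to every document in a collection gets an idf of zero and so cannot separate one document from another. The collection below is hypothetical.

```python
import math

# Hypothetical collection: document id -> set of assigned index terms.
collection = {
    "d1": {"Taxation", "Deductions"},
    "d2": {"Taxation", "VAT"},
    "d3": {"Taxation", "Travel expenses"},
    "d4": {"Taxation", "Deductions"},
}


def idf(term: str, docs: dict[str, set[str]]) -> float:
    """Inverse document frequency log(N / n_t), n_t = number of documents indexed with the term."""
    n_t = sum(1 for terms in docs.values() if term in terms)
    if n_t == 0:
        raise ValueError(f"{term!r} is not assigned to any document")
    return math.log(len(docs) / n_t)


for term in ("Taxation", "Deductions", "VAT"):
    print(term, round(idf(term, collection), 3))
# Taxation 0.0
# Deductions 0.693
# VAT 1.386
```

The more widely a term is assigned, the lower its weight: "Taxation", present everywhere, separates nothing, while the rarely assigned "VAT" singles out one document.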

5.2.3 Consistency 

Consistency becomes an issue when dealing with human indexing. The consistency problem arises from the subjective process taking place when indexers decide on the aboutness of a document. Consistency refers to the level of agreement between two or more indexers on which index terms to use for the representation of a document (Zunde & Dexter, 1969). This type of consistency is also known as inter-indexer consistency (Lancaster, 2003). In other words, do two or more indexers agree on what the subject of a document is? And do they select the same index terms to represent the subject? The deviation between indexers may occur at different levels. Lancaster (2003) lists seven factors that may influence the degree of consistency between indexers; the factors appear in Table 5.1. A related concept, intra-indexer consistency, refers to an indexer’s level of agreement with himself or herself (Lancaster, 2003). Here the question would be: does the same indexer interpret the subject of a document in the same way at different times? In this sense, the concept of consistency takes the subjective nature of human indexers into account and acknowledges that indexing is a highly subjective process when performed by human beings.

1. Number of terms assigned 

2. Controlled vocabulary versus free text indexing 

3. Size and specificity of vocabulary 

4. Characteristics of subject matter and its terminology 

5. Indexer factors 

6. Tools available to indexer 

7. Length of item to be indexed 

Table 5.1 Possible factors affecting consistency. From Lancaster (2003, p. 71). 
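Inter-indexer consistency is usually quantified by comparing the term sets two indexers assign to the same document. Two formulations often reported in the literature, attributed to Hooper and to Rolling, are sketched below, with A and B the numbers of terms assigned by each indexer and C the number of terms they share. The term sets are invented, and returning 1.0 when both sets are empty is our own convention. The same functions could equally compare one indexer at two points in time (intra-indexer consistency).

```python
def hooper(a: set[str], b: set[str]) -> float:
    """C / (A + B - C): shared terms relative to all distinct terms used by the two indexers."""
    common = len(a & b)
    distinct = len(a) + len(b) - common
    return common / distinct if distinct else 1.0  # both empty: treated as full agreement (our convention)


def rolling(a: set[str], b: set[str]) -> float:
    """2C / (A + B): shared terms relative to the average number of terms assigned."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0  # same convention for empty sets


# Hypothetical term sets assigned by two indexers to the same document.
indexer_1 = {"Taxation", "Deductions", "Travel expenses"}
indexer_2 = {"Taxation", "Travel expenses", "Commuting"}

print(hooper(indexer_1, indexer_2))             # 0.5
print(round(rolling(indexer_1, indexer_2), 3))  # 0.667
```

With two shared terms out of three assigned by each indexer, the stricter Hooper-style ratio gives 0.5 and the Rolling-style ratio about 0.667, so the choice of formula matters when consistency figures are compared across studies.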

We have briefly mentioned that consistency could be one way to express the quality of indexing. Rolling (1981, p. 71) even defines indexing quality in terms of consistency. The assumption is that the similarity of documents in an IR system cannot be properly expressed if the indexers do not demonstrate a sufficient level of consistency when assigning index terms. However, expressing indexing quality in terms of consistency has been disputed by other scholars. Cooper (1969) challenges consistency as a measure of quality, because consistency does not necessarily imply good indexing. Instead, he emphasizes the need to carry out indexing in accordance with the requests users make to an IR system in order to ensure successful retrieval. As a consequence, Cooper suggests indexer-requester consistency as highly relevant to indexing quality. Indexer-requester consistency is implicitly relevant when indexing quality is expressed in terms of retrieval effectiveness. The assumption is that consistency might very well be high between indexers, but if users apply other search terms than the ones consistently assigned by indexers, searches will not perform well. Achieving a high degree of indexer-requester consistency is made difficult by the different conditions characterizing the indexer and the requester, respectively.


However, the findings by Gomez, Lochbaum & Landauer (1990) suggest that the richer the applied vocabulary, the more likely it is to see correspondence between indexers’ index terms and searchers’ search terms. A related finding of the study is that the more names an information object is allowed to have in an information system, the more likely it is to be retrieved by searchers.
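The effect reported by Gomez, Lochbaum & Landauer can be illustrated with a toy calculation: when different searchers use different words for the same object, each additional name assigned to the object can only increase the share of searchers whose term matches. The search terms, the names, and the function `hit_rate` below are invented for illustration and do not reproduce the study’s data or method.

```python
# Hypothetical search terms used by five searchers looking for the same document.
searcher_terms = ["car allowance", "mileage", "travel deduction", "mileage", "commuting"]


def hit_rate(names: set[str], queries: list[str]) -> float:
    """Share of searchers whose search term exactly matches one of the object's names."""
    return sum(1 for q in queries if q in names) / len(queries)


one_name = {"mileage"}
several_names = {"mileage", "travel deduction", "commuting"}

print(hit_rate(one_name, searcher_terms))       # 0.4
print(hit_rate(several_names, searcher_terms))  # 0.8
```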

5.2.4 Performance measures 

An alternative way of measuring indexing quality is to investigate retrieval effectiveness. An important instrument here is the application of performance measures, which give an indication of indexer-requester consistency as suggested by Cooper (1969). Performance measures provide a macro-analysis of system performance and should preferably be supplemented by micro-analysis, that is, specific investigations of retrieval success and failure (Soergel, 1985). Using performance measures for IR evaluation has been common practice since the 1950s. Kent et al. (1955) are among the first to propose different measures of performance in the shape of a number of factors expressing system performance. Two performance measures, recall and precision, have traditionally been employed in order to measure the quality of indexing. They are quantitative measures expressing, respectively:

$$\text{Recall} = \frac{\text{Number of relevant documents retrieved}}{\text{Total number of relevant documents in the collection}}$$

$$\text{Precision} = \frac{\text{Number of relevant documents retrieved}}{\text{Total number of documents retrieved from the collection}}$$
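The two ratios can be computed directly from sets of document identifiers. The minimal sketch below uses hypothetical identifiers and relevance judgments; it also makes the asymmetry between the measures visible, since precision only needs judgments for the retrieved set, whereas recall needs the complete set of relevant documents in the collection.

```python
def recall(retrieved: set[str], relevant: set[str]) -> float:
    """Relevant documents retrieved / all relevant documents in the collection."""
    if not relevant:
        raise ValueError("recall is undefined when no document is relevant")
    return len(retrieved & relevant) / len(relevant)


def precision(retrieved: set[str], relevant: set[str]) -> float:
    """Relevant documents retrieved / all documents retrieved."""
    if not retrieved:
        raise ValueError("precision is undefined for an empty result set")
    return len(retrieved & relevant) / len(retrieved)


retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d4", "d7", "d9", "d11"}  # requires judging the whole collection

print(precision(retrieved, relevant))  # 0.5  (2 of 4 retrieved are relevant)
print(recall(retrieved, relevant))     # 0.4  (2 of 5 relevant are retrieved)
```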

Technically speaking, precision is easier to measure, since the evaluator only needs to know which of the retrieved documents are actually relevant. For recall, one needs to know the relevance of all documents in the collection. In other words, recall poses a challenge to the setup of IR evaluation. Further, it becomes clear that the concept of relevance is highly important for the outcome of IR evaluation due to its core position in the equations above. The concept of relevance represents a large and independent research area. Since the concept as such is beyond the scope of the present work, we will not explore it further here. However, a thorough review of the concept can be found in Borlund (2003a). Since recall, precision, and related performance measures were first introduced, additional measures have been proposed that take into account the characteristics of large-scale IR systems. Examples are mean average precision, interactive recall, and relative relevance (Kelly, 2009). We will not discuss these measures in further detail here.

Different elements in indexing languages can help increase recall, precision, or both. According to Lancaster (2003), exhaustive indexing will increase recall and lower precision, since exhaustivity increases the number of items retrieved in searching. Further, vocabulary control and the presence of different relationships in the vocabulary will increase recall. Conversely, specificity of indexing, scope notes, and relationships in the indexing language are examples of precision devices (Aitchison, 1992). In sum, it is possible to adjust the indexing quality according to the expected use of the indexing.

5.3 Approaches to indexing 

Indexing can be divided and characterized in a number of different ways, depending on the scope. In the sections to follow, we present the perspectives needed in order to introduce the Ph.D. project. The approaches presented below have been empirically tested in a variety of ways. Since some of the approaches are usually closely related (e.g., intellectual and controlled indexing), empirical comparisons of the approaches may be relevant to several of the sections below. We therefore present each empirical study where we consider it most relevant.

5.3.1 Document, user, and domain oriented indexing 

The approach to indexing may be defined by its point of departure: whether it is document, user, or domain oriented. The orientation of the indexing captures the focus of the subject analysis that takes place prior to the assignment of index terms, that is, the initial step of the indexing process.

Document oriented indexing (or entity oriented indexing) seeks to represent the content of documents (Soergel, 1985; Fidel, 1994; Mai, 2005). Thus, the analysis of the document carried out in the first step of the indexing process is based solely on the content of the document and does not take into account its potential use. The purpose of document oriented indexing is to produce a description that is faithful to the content of the document. Document oriented indexing may in principle be carried out without any prior knowledge of the users expected to benefit from the indexing. The strength of the document centred approach is that indexing is kept stable due to the static nature of the document; the indexers do not need to consider potential future use of the document (Mai, 2005). Further, indexers do not need extensive knowledge about the context of the document, whether this context concerns users or the domain in question (Fidel, 1994). As pointed out by Mai (2005), the document oriented approach is supported by the international standard for indexing (ISO, 1985).

User (or request) oriented indexing designates indexing aimed at meeting the 

requests expected from a particular audience (Fidel, 1994; Soergel, 1994; Lancaster, 

2003). Here, users’ anticipated requests form the basis of the index terms assigned to a document. Thus, the indexer considers whether a document should be

retrieved for a certain request or not. Soergel (1985, p. 233) equates descriptors with 

queries. By working through (parts of) an indexing language the indexer checks 

whether a descriptor is relevant to the document in question. This sort of indexing is 

also referred to as checklist indexing (Soergel, 1985; Fidel, 1994). By reflecting 

anticipated requests, user oriented indexing seeks to increase indexer-requester 

consistency (cf. Cooper, 1969). 

Domain oriented indexing may be considered an extension of user oriented 

indexing. The conception of domain is commonly associated with Hjørland’s (2002; 

Hjørland & Albrechtsen, 1995) concept of domain analysis, which is primarily 

concerned with scientific disciplines. However, Mai (2005, p. 605) considers the term 

domain in a broader sense and defines it as “a group of people who share common 

goals.” In this way, professional groupings and interest communities, for example, are also

potential recipients of the indexing. The assumption behind domain oriented indexing 

is that the subject of a document is to a large extent determined by the contextual use of 

the document. Domain oriented indexing extends user oriented indexing in the sense that it also takes the context of users into account. Thus, it is assumed that the domain of which users are members has a significant influence on the role of the document within that particular domain (Mai, 2005). It is debatable whether users or domains change more over time and, as a consequence, which of the two approaches is more durable. However, both approaches need regular updates in order to remain current for their users (cf. Lancaster, 2003; Mai, 2005).


Chapter 5 

Figure 5.2 Document and domain oriented approaches to indexing. Adapted from Mai (2005, p. 

607) 

The approaches mentioned above may be summed up by Mai’s illustration (see 

Figure 5.2). It is important to note that the three approaches are, to a certain degree, condensed constructions that serve the purpose of identifying tendencies in subject identification. In practice, the approaches will in some cases be difficult to apply in a clear-cut manner. As an example, Mai (2005) mentions the difficulty indexers may have in refraining from using, for instance, contextual knowledge when they are to interpret the subject of a document solely on the basis of the document itself.

5.3.2 Controlled vs. uncontrolled indexing 

Controlled and uncontrolled indexing refer to the indexing languages used to perform indexing. In its basic form, a controlled vocabulary is an authority list specifying the index terms that indexers can assign when performing indexing. In addition, however, controlled indexing languages commonly express some sort of semantic structure in order to, for instance, control synonyms, differentiate between homographs, and link related terms (Lancaster, 2003). Controlled indexing languages belong to the type of systems referred to as knowledge organization systems, or KOS. KOS may be characterized by their structure (or the relationships expressed) and their function (cf. Zeng, 2008). As a consequence, a number of different KOS exist.
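To make vocabulary control more concrete, the following Python sketch models a tiny, entirely hypothetical controlled vocabulary as an authority list of preferred terms plus "use" references from non-preferred entry terms. It is an illustration under simplifying assumptions, not the structure of any particular KOS: candidate terms are normalized to their preferred form (synonym control), an unqualified homograph such as "Mercury" is rejected so that the indexer must choose a qualified heading, and related terms are stored as simple links.

```python
# Hypothetical miniature controlled vocabulary (not taken from any real KOS).
PREFERRED = {"Cars", "Mercury (Planet)", "Mercury (Metal)", "Air pollution"}

# Equivalence relationships: non-preferred entry term -> preferred term ("USE").
USE = {
    "automobiles": "Cars",
    "motor cars": "Cars",
    "atmospheric pollution": "Air pollution",
}

# Associative relationships ("RT"), stored in both directions.
RELATED = {"Cars": {"Air pollution"}, "Air pollution": {"Cars"}}

# Case-insensitive lookup table covering preferred terms and entry terms.
_LOOKUP = {term.lower(): term for term in PREFERRED}
_LOOKUP.update(USE)


def control(candidate):
    """Return the preferred form of a candidate index term, or None if the
    term is not admissible (unknown, or an unqualified homograph)."""
    return _LOOKUP.get(candidate.strip().lower())


def related_terms(preferred):
    """Return the related terms recorded for a preferred term."""
    return sorted(RELATED.get(preferred, set()))


if __name__ == "__main__":
    for term in ["Automobiles", "atmospheric pollution", "Mercury", "Mercury (planet)"]:
        print(term, "->", control(term))
    print(related_terms("Cars"))
```

Free text indexing has no such layer: "automobiles" and "cars" remain separate search keys, which is one of the semantic problems that controlled vocabularies are credited with solving in Table 5.2.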


Figure 5.3 Types of vocabularies and their relationships. Adapted from Morville & Rosenfeld 

(2007, p. 195) 

Subject heading lists are alphabetically ordered lists of controlled terms and related 

subheadings. Thesauri, on the other hand, differ by having fully organized terms that elaborate relations between concepts (Aitchison, 1992). Thesauri and subject heading lists have two features in common: when in use, they control the use and form of indexing terms, and they enable relations between terms in the indexing language (Rowley, 1988, p. 68). Additionally, taxonomies, ontologies, and classification schemes are variants of controlled indexing languages (Aitchison, 1992; Gilchrist, 2003). Controlled indexing languages can be characterized by their degree of complexity. Morville &

Rosenfeld have illustrated this graphically (see Figure 5.3). According to the model, the 

lowest level of complexity represents equivalence relationships while the highest level 

represents associative relationships as for instance expressed in thesauri. 

In controlled indexing languages “both the terms used to represent subjects, 

and the process whereby terms are assigned to particular documents, are controlled or 

executed by a person” (Rowley, 1994, p. 109). This is the main reason for the close 

relation between controlled and manual indexing mentioned in section 5.3.⁴ However, as will be seen later in the present chapter, automatic methods for controlled indexing have been developed. In other words, the relation between controlled and manual indexing is not clear-cut.

4 We elaborate further on manual indexing in the section to follow (section 5.3.3). 


Uncontrolled indexing draws index terms from the document itself or from another source outside a controlled indexing language. Two generic types of uncontrolled indexing exist. Free indexing languages assign terms to documents that do not necessarily originate from the document itself. Natural language indexing, on the other hand, applies terms from the document for representation and is usually employed when performing automatic indexing (Rowley, 1988). Indexing by natural language forms a subfield of the general field of natural language processing (NLP) (Chowdhury, 2003). Uncontrolled indexing may be carried

out by humans or machines. However, natural language indexing is commonly 

associated with automatic indexing (see section 5.4). 

Dubois (1987, p. 249) has summarized the strengths and weaknesses of controlled vocabularies and free text indexing. The key points are shown in Table 5.2.

To a large extent, Blair & Maron’s (1985) study empirically supports Dubois’ summary concerning free text indexing. Blair & Maron tested the retrieval effectiveness of a full text retrieval system, that is, of indexing by natural language. In a study involving two test persons within the legal domain, Blair & Maron found that the level of recall in the searches carried out was surprisingly low. A number of different reasons explained the recall results. For one thing, the test persons had difficulty predicting the exact wording used in the documents searched for. It turned out that the test persons’ selection of words was determined by their point of view on the problem in question. Also, misspellings in the documents contained in the retrieval system prevented documents from being retrieved. Both findings illustrate the burden placed on searchers, as pointed out by Dubois (1987). Further, it was found that search terms rated important by the searchers did not occur in documents relevant to the given requests. In some cases the terms were simply not included in the documents. In other cases the concepts occurred, but were expressed by narrower or broader terms. This problem is also addressed by Dubois. On the other hand, the variety of expressions in natural language can also be considered a strength of natural language indexing. For instance, Tenopir (1985) found that the use of synonyms in natural language indexing was able to compensate for users’ incomplete queries.

Controlled vocabularies
  Advantages: Solves many semantic problems; Permits generic relationships to be identified; Maps areas of knowledge
  Disadvantages: High cost; Possible inadequacies of coverage; Human error; Possible out of date vocabulary; Difficulty of systematically incorporating all relevant relationships between terms
Free text
  Advantages: Low cost; Simplified searching; Full information content searchable; Every word has equal retrieval value; No human indexing errors; No delay in incorporating new terms
  Disadvantages: Greater burden on searcher; Information implicitly but not overtly included in text may be missed; Absence of specific to generic linkage; Vocabulary of discipline must be known

Table 5.2 Summary of strengths and weaknesses of controlled vocabularies and free text. Adapted from Dubois (1987, p. 249).

The performance of controlled versus uncontrolled indexing languages has been a core subject of investigation in the LIS research literature. Among the first investigations comparing the retrieval effectiveness of different indexing languages were the Cranfield tests. The tests took place over approximately a decade beginning in the mid-1950s. The overall purpose of the Cranfield tests was to carry out comparative evaluations of a number of different controlled and uncontrolled indexing languages.

However, the tests have become at least equally known for their pioneering contribution to the methodological body of knowledge on the evaluation of IR systems (cf. Sparck Jones, 1981). The Cranfield tests comprised two tests: Cranfield I and Cranfield II. Cranfield I revealed the complexity of isolating a single indexing language in a test situation, since the tested indexing languages were found to interact in their functions as precision and recall devices (Cleverdon, 1967). Criticism was put forward by different authors, mainly concerning methodological issues (Sparck Jones, 1981). Next followed Cranfield II, with a slightly larger test collection than Cranfield I. Cranfield II built upon Cranfield I and served the purpose of investigating more closely the effect that single indexing languages had on performance. As in Cranfield I, a number of different indexing languages were tested against each other. The languages were distributed across three main types: 1) Single term indexing


languages, 2) Simple concept indexing languages, and 3) Controlled term indexing 

languages. Furthermore, indexing languages representing keywords in titles and 

abstracts were included in the test. Among the results of the test was the inverse relation between recall and precision: when recall is high, precision tends to be low, and vice versa (Cleverdon, 1967). A ranking of the tested indexing languages by normalized recall further showed that single terms (the first group of indexing languages mentioned above) were superior to the remaining two groups. Single concepts (the second group of indexing languages tested) had the lowest performance, while controlled terms and keywords from titles and abstracts had a medium score (Cleverdon & Keen, 1966, p. 253; Cleverdon, 1967, p. 189).⁵ Thus, the results suggest that uncontrolled indexing is certainly a valuable tool for retrieval purposes, but that it should preferably take the form of single term languages rather than simple concept languages.
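The two measures behind the inverse relation can be illustrated with a small Python sketch. The ranking and the relevance judgements are invented for illustration, and the sketch does not reproduce the Cranfield measurements (which also relied on normalized recall). It only shows how precision and recall are computed at successive cut-offs of a ranked list, and how searching deeper, loosely analogous to applying a recall device, tends to raise recall at the expense of precision.

```python
def precision_recall(retrieved, relevant):
    """Precision = relevant retrieved / retrieved; recall = relevant retrieved / relevant."""
    hits = len(set(retrieved) & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ranked output for one query and hypothetical relevance judgements.
ranking = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"]
relevant = {"d1", "d2", "d5"}

if __name__ == "__main__":
    for cutoff in (2, 4, 8):
        p, r = precision_recall(ranking[:cutoff], relevant)
        print(f"top {cutoff}: precision={p:.2f}, recall={r:.2f}")
```

With these invented data, precision falls from 1.00 to 0.50 and then 0.38 as the cut-off grows, while recall rises from 0.67 to 1.00 at the deepest cut-off.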

In another study, Cousins (1992) compared the performance of basic MARC records and records enriched with either natural language index terms or controlled index terms. Performance was measured in terms of recall. The natural language terms of the study originated from the tables of contents and back-of-the-book indexes of the indexed units. PRECIS represented the controlled vocabulary in the test. The choice of PRECIS was based on a preceding investigation, which found that, out of three indexing languages, PRECIS was the most suitable for the queries guiding the test. Eleven queries on varying themes were used in the test. Cousins found that the retrieval performance of the enriched records exceeded that of the basic records. However, she also found that the relative retrieval performance of the enriched records depended on whether the test queries were truncated or not. Thus, PRECIS performed better when queries were not truncated. Conversely, when test queries were truncated, the retrieval performance of the natural language indexing was superior. Overall, truncated queries applied to natural language indexing had the best retrieval performance in the test. In her discussion, Cousins mentions the influence of the test queries on the test results: the formulation of some of the queries turned out to have quite an effect on the test result due to their choice of terms and the subsequent potential for truncation. Apart from the search results, Cousins’ study emphasizes the importance of the number and nature of the queries applied in retrieval tests. This is particularly the case when the test setup does not include real users, but is carried out in an experimental setting like Cousins’.

5 A thorough presentation of the results of the Cranfield tests is given in Cleverdon (1960) (Cranfield I) and Cleverdon & Keen (1966) (Cranfield II).
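Truncation matters because natural language terms appear in many word forms, whereas a controlled term has one prescribed form. The following minimal Python sketch of right truncation uses hypothetical terms; it illustrates one plausible reason why truncation favoured natural language indexing in Cousins’ test, and it is not a model of her actual search system.

```python
def matches(query_term, index_term):
    """Exact match, or right truncation when the query term ends with '*'."""
    query_term, index_term = query_term.lower(), index_term.lower()
    if query_term.endswith("*"):
        return index_term.startswith(query_term[:-1])
    return index_term == query_term


def search(query_term, index_terms):
    """Return the index terms matched by a single query term."""
    return [t for t in index_terms if matches(query_term, t)]


# Hypothetical natural language terms taken from tables of contents and indexes.
natural_language_terms = ["planning", "planned", "planners", "plans", "planets", "transport"]

if __name__ == "__main__":
    print(search("planning", natural_language_terms))
    print(search("plan*", natural_language_terms))
```

The match on "planets" shows the other side of truncation: a short stem also retrieves unrelated words, which is one way in which the formulation of a query can affect test results.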

In a study with a slightly different focus, Gross & Taylor (2005) investigated the number of relevant records that would be missed if controlled index terms were removed from records in a library catalogue. Thus, although the authors do not state it explicitly, recall was used to compare the performance of records including and records excluding controlled subject data. A sample of 227 queries drawn from a log of the library catalogue served as the information needs of the study. The study found that approximately one third of the records would not have been retrieved without the assignment of controlled subject data. The study supports the general perception that controlled subject data support recall. It also indicates that controlled subject data should supplement the natural language appearing in records. In a similar study, Veenema (1996) evaluated the performance of controlled index terms and natural language in a small test collection (553 documents of highly varying content and form) compiled from a Canadian embassy. The indexing policy guiding the manual indexing was far from aiming at exhaustivity, which resulted in an average of two assigned terms per document in the test collection. The comparison of the two indexing languages shows that the nature of the information need affects the performance of the respective languages. Thus, due to the highly restrictive indexing policy for the controlled indexing language, natural language performed better for information needs concerning locations, while the controlled index terms performed better for information needs regarding a certain sector. Although the empirical basis of the study is rather limited, the study illustrates the implications of indexing policies for test results, and also how specific characteristics of information requests may affect the outcomes of comparisons of indexing languages.

Savoy (2005) has compared manual, assigned indexing and automatic, extracted indexing in a study of a French database named Amaryllis. Here, we will present the results relevant to the performance of controlled versus uncontrolled indexing. Implicitly, however, the study also illustrates the differences between manual and automatic indexing. In the study, manual indexing was mainly carried out using a controlled vocabulary. The indexers were allowed to supplement it with uncontrolled index terms. In practice, uncontrolled terms occurred rarely, though their share was not specified in the paper. Automatic indexing was represented in the study by ten different indexing models, such as the Okapi probabilistic model, a binary model


where a term either occurs or does not occur, and a number of weighted approaches. The test collection contained approximately 145,000 documents. Thus, the results of the study cannot necessarily be transferred to real-life databases, which commonly contain millions of documents. Twenty-five queries represented the information needs of the study. Concerning controlled versus uncontrolled indexing, the study found the best performance to be achieved by using a combination of the two general types of indexing language. Tenopir (1985) made similar findings regarding controlled indexing and natural language indexing. A comparison between controlled and uncontrolled indexing slightly favored controlled indexing in terms of mean average precision. However, the difference was not statistically significant. Going through the results manually revealed considerable variation at query level. This result emphasizes the influence of test queries on test results and the importance of validation.
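Since the comparison in Savoy’s study is expressed in mean average precision, a short Python sketch of the measure may help. It assumes the common non-interpolated definition (the exact evaluation procedure of the study is not described in this section) and uses two hypothetical queries. The single mean hides the large difference between the two queries, which is why query-level inspection of the kind reported above is worthwhile.

```python
def average_precision(ranking, relevant):
    """Mean of the precision values at the ranks where relevant documents
    occur; relevant documents that are never retrieved contribute zero."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs: one (ranking, relevant_set) pair per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Two hypothetical queries with very different outcomes.
runs = [
    (["a", "b", "c"], {"a", "c"}),  # AP = (1/1 + 2/3) / 2
    (["x", "y", "z"], {"z"}),       # AP = (1/3) / 1
]

if __name__ == "__main__":
    for i, (ranking, relevant) in enumerate(runs, start=1):
        print(f"query {i}: AP={average_precision(ranking, relevant):.3f}")
    print(f"MAP={mean_average_precision(runs):.3f}")
```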

The studies above have compared controlled and uncontrolled indexing 

languages. Recently, Price et al. (2007; 2009) have introduced the notion of semantic 

components that allow for a simultaneous combination of controlled and free text 

indexing. Semantic component indexing provides a supplementary, enriched 

description of document contents by manually marking up segments of text in a 

document (i.e., semantic component instances) with labels (semantic component 

names). Domain-specific documents tend to contain characteristic types of information 

(semantic components). With semantic components a searcher can search for query 

terms within specific semantic components, or specify a preference for documents 

containing particular semantic components. In this way, the searcher can combine the advantages of uncontrolled full text search and domain-oriented controlled indexing that

emphasizes topics or components of the documents. Semantic components have been 

empirically evaluated (e.g., Price et al., 2007; Price et al., 2009). The results suggest 

that this particular type of indexing can be a valuable improvement of full text indexing. 
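The search options described above can be sketched in a few lines of Python. The documents, the component names ("Background", "Decision", "Legal basis"), and the whole-word matching are hypothetical; Price et al.’s systems are not reproduced here. The sketch shows a query term restricted to one semantic component, and a preference ordering that places documents containing a given component first.

```python
# Hypothetical documents whose text segments have been manually labelled
# with semantic component names.
DOCS = {
    "doc1": {"Background": "request from a citizen about parking",
             "Decision": "parking permit granted"},
    "doc2": {"Background": "complaint about parking permit fees",
             "Legal basis": "municipal traffic regulation"},
}


def search(term, component=None):
    """Full text search, optionally restricted to one semantic component."""
    hits = []
    for doc_id, segments in DOCS.items():
        if component is None:
            texts = list(segments.values())
        else:
            texts = [segments.get(component, "")]
        if any(term.lower() in text.lower().split() for text in texts):
            hits.append(doc_id)
    return hits


def prefer(doc_ids, component):
    """Put documents containing the component first; sorted() is stable,
    so the original order is kept within each group."""
    return sorted(doc_ids, key=lambda d: component not in DOCS[d])


if __name__ == "__main__":
    print(search("permit"))
    print(search("permit", component="Decision"))
    print(prefer(search("parking"), "Legal basis"))
```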

As the studies presented above show, precision and in particular recall have been applied as performance measures several times. However, the results do not point to an unambiguous relation between types of indexing languages and these performance measures. Rather, they seem to confirm Svenonius’ (1986, p. 335) view that both free text and controlled vocabularies contribute to recall and precision, but in different ways. Apparently, a combination of controlled and uncontrolled indexing may be advisable, taking into account the strengths and weaknesses of each type of indexing language. Rowley (1994) concludes her paper by outlining a number of factors that may help decide on the optimal combination of indexing languages: 1) the searching environment, 2) the searchers, 3) available retrieval facilities and strategies, and 4) the nature of the search. On the basis of these factors, and of this section in general, we can conclude that the selection of indexing languages should reflect the actual context of use.

5.3.3 Intellectual vs. automatic indexing 

Overall, indexing may be carried out by humans (intellectual indexing), by 

machines (automatic indexing), or by a combination of the two (semi-automatic 

indexing). As indicated by the name, intellectual indexing is the indexing carried out by 

humans, that is, indexers assign index terms to documents, usually on the basis of a controlled vocabulary. The literature also uses the terms human or manual indexing to designate intellectual indexing.

Rafferty & Hidderley (2007) identify three approaches to intellectual indexing: 

Expert-led indexing, author-based indexing, and user-based indexing. Traditionally, intellectual indexing has been carried out by professional indexers; this is expert-led indexing. The purpose is to establish a connection between user and document on the basis of a controlled vocabulary, by using free text identifiers, or by a combination of the two. In scientific databases it is also common that authors attach keywords to their contributions. This is referred to as author-based indexing. These keywords are not selected from a controlled vocabulary. Rather, they represent the authors’ perception of the content of their document in the form of uncontrolled index terms. Given the amount of information produced today, e.g., on the Internet, supplementing or perhaps even replacing professional indexers with other indexers can be a means to ensure the subject representation of information objects. Thus, in the last decade, we have seen the emergence of online sources that allow users to assign tags to information sources (e.g., Hunter, 2009; Trant, 2009). User tags broaden the conception of indexing due to the supplementary functions that tags also serve (Golder & Huberman, 2006). For instance, tags allow for more fine-grained access to information sources than is usually possible through professional indexing using a controlled vocabulary (Kipp, 2005). This type of indexing is known as user-based indexing.

                 Controlled indexing       Uncontrolled indexing
Monologic        Professional indexers     Author-based indexing
Dialogic         (empty)                   User-based indexing

Figure 5.4 Generalized characteristics of intellectual indexing. Compiled on the basis of Rafferty & Hidderley (2007).

Rafferty & Hidderley (2007) characterize the three types of intellectual indexing by their communicative potential. Thus, both expert-led and author-based indexing are characterized as monologic, because they express a kind of indexing that does not communicate with the potential users of the indexing. User-based indexing on the

other hand represents a dialogic type of indexing, because it allows for the users of 

documents to express their individual interpretation of an information unit. This is 

graphically illustrated in Figure 5.4. It must be kept in mind that the figure presents a generalized view of the intellectual indexing types. In other words, exceptions do exist. For instance, in some cases professional indexers also carry out uncontrolled indexing. The reason for nevertheless associating professional indexers with controlled indexing is that this is the type of indexing most frequently performed by this particular group of indexers. As the figure shows, the box representing dialogic, controlled indexing is empty. The reason is that this type of indexing has not yet been fully developed. Different authors have addressed the problem, and different solutions to the lack of control in folksonomies have been proposed (cf. Trant, 2009).

Different studies have investigated the characteristics of the different types of 

indexers mentioned above. Kipp (2005) has compared users’, authors’ and professional 

indexers’ assignment of index terms and tags to 165 scientific papers from core LIS 

journals. The analysis presented mostly investigated the terms assigned by professional 

indexers and users. Kipp found that there was some overlap in the terms assigned, but that the overlap often took the form of narrower terms, broader terms, related terms, and synonyms. However, quite a number of the terms assigned by the two indexer groups were not related to each other. Kipp suggested that one explanation could be that users could apply one specific term to address a new concept, whereas indexers needed to express new concepts in a controlled vocabulary by combining terms already existing in the vocabulary.

In a later study, Strader (2009) investigated the degree of overlap between author-assigned keywords and Library of Congress Subject Headings (LCSH), that is, controlled index terms assigned by professional indexers. The objects of investigation were bibliographic records representing doctoral students’ publications in an online catalogue. A total of 285 theses and dissertations, containing 1,681 author keywords and 1,181 LCSH terms, were analyzed. The study showed that there was a certain overlap between author-assigned keywords and LCSH. However, approximately half of the author-assigned keywords did not match LCSH. A number of reasons can explain the lack of overlap between subject terms. One reason may be that LCSH are not updated frequently enough to reflect current research. Another reason is that the authors use a different terminology to represent similar concepts. Strader also found that about one-tenth of the author-assigned subject terms and one-third of the LCSH terms supplemented data that could be found elsewhere in the bibliographic record. In other words, LCSH to a larger degree supply users with unique access points to the investigated records. However, it was concluded that both types of indexing enrich the retrieval environment for users.
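Overlap studies of this kind rest on comparing two sets of subject terms per record. The following minimal Python sketch assumes exact matching after case normalization and uses invented terms; it does not reproduce Strader’s data or procedure.

```python
def overlap_report(author_keywords, lcsh_terms):
    """Compare two sets of subject terms for one record by exact match
    after lower-casing and trimming whitespace."""
    author = {t.strip().lower() for t in author_keywords}
    lcsh = {t.strip().lower() for t in lcsh_terms}
    shared = author & lcsh
    return {
        "shared": sorted(shared),
        "author_unmatched_share": len(author - lcsh) / len(author) if author else 0.0,
        "lcsh_unmatched_share": len(lcsh - author) / len(lcsh) if lcsh else 0.0,
    }


# Hypothetical record.
author_keywords = ["Information retrieval", "Folksonomies", "Social tagging"]
lcsh_terms = ["Information retrieval", "Subject headings"]

if __name__ == "__main__":
    print(overlap_report(author_keywords, lcsh_terms))
```

Exact matching is a deliberately strict assumption. As Kipp’s findings above suggest, much apparent non-overlap may consist of broader, narrower, or related terms and synonyms, which a plain string comparison cannot detect.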

Thomas, Caudle & Schmitz (2009) also examined LCSH, but compared it to user tags in LibraryThing. Ten books were selected to form the basis of the investigation. The criteria for selection were that the books were popular and that they represented weak LCSH areas. Both criteria must be taken into account when applying the results of the investigation, since they favor user tags, which potentially limits the generalizability of the study. On the basis of the investigation, the authors found that users tag for their own purposes. Also, there was a certain overlap between LCSH subject terms and user tags, but user tags were stronger than LCSH terms with respect to task organization. Users of information systems will get the richest indexing when the system applies a combination of user tags and LCSH terms, but the benefits are greater when the number of tags is large.

As a part of her Ph.D. work, Choi (2010a; 2010b) carried out a study 

investigating user-based indexing and expert-led indexing. Though preliminary in 

nature, the study compared index terms assigned to web documents at the web sites 

Intute, BUBL and Delicious. The first study took into account both controlled and 

uncontrolled keywords from Intute. The study showed that the subject perspectives expressed at the three examined websites differed, even between the two sites representing professional indexers (Choi, 2010a). The second study left out subjective and personal tags from Delicious. It found that the level of similarity between indexers and users differed according to the subject of the indexed websites. Thus, subjects with a larger intake of new words (e.g., technology) tended to generate less consistency between indexers (Choi, 2010b).


Attar (2006) carried out a study evaluating the indexing performance of student 

indexers. Unlike the studies just mentioned, Attar’s study is not comparative in nature. 

The study investigated subject indexing and the formal description of information units 

in a library catalogue. Thirty-seven undergraduate and graduate students catalogued and indexed a full library collection with very diverse document types after having received two days of detailed training. The students came from different fields of study, but none were LIS students. When possible, the students indexed information within the subject area they were familiar with from their studies. When subsequently evaluating the indexing, Attar found that the problems particularly related to inconsistent and incorrect use of subject headings. For literary works, the treatment of genre in particular caused trouble. The problems were caused by insufficient training and lack of familiarity with LCSH. In this manner, the study stresses the importance of proper training when carrying out indexing at a professional level.

The empirical studies presented above to a large extent inform us about the 

characteristics of different types of intellectual indexing. However, as reflected in 

Figure 5.4, the results of the comparisons also elucidate the pros and cons of controlled 

vs. uncontrolled indexing languages presented in section 5.3.2. Taking into account that user-based indexing appears to follow a power law distribution, in which a few information units receive most of the assigned tags while most units receive few (cf. Thomas, Caudle & Schmitz, 2009), user-based indexing should not be the only type of indexing, at least in systems that are also used for high precision searches. Further, the studies presented above have one thing in common: the method applied is to analyze the product of the indexing, namely the assigned tags or index terms. Several authors draw conclusions about the intentions of the indexers on the basis of the indexing product. A study investigating indexer intentions in a more qualitative manner would be an interesting supplement to the existing and highly enlightening studies.

Automatic indexing constitutes a contrast to intellectual indexing, since it designates indexing carried out solely on the basis of a mechanical identification of index terms from word occurrences in documents. We will explain the concept more thoroughly in Section 5.4 and onwards. According to Albrechtsen (1993), automatic indexing represents the most simplistic conception of subject analysis, since the subject of the document is determined solely by the frequency of terms. However, automatic indexing may take the form of either extracted or assigned indexing (see Sections 5.4.1 and 5.4.2 below). As far as automatically assigned indexing is concerned, Albrechtsen’s statement is debatable. Here the assignment of index terms is carried out on the basis of a set of rules that map the occurrence of certain words to specific points in a controlled vocabulary. Since these rules have been formulated by humans, some sort of intellectual interpretation of the subject relation between the document and the controlled vocabulary has been established.
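The distinction between automatically extracted and automatically assigned indexing can be sketched in Python. Both functions are simplified illustrations under stated assumptions, not the methods presented in Sections 5.4.1 and 5.4.2. Extraction ranks a document’s own words by raw frequency after removing a small, hypothetical stop list. Assignment applies hypothetical, human-written rules that map word occurrences to descriptors in a controlled vocabulary, and these rules are where the intellectual interpretation discussed above enters.

```python
import re
from collections import Counter

STOPWORDS = {"a", "and", "for", "in", "is", "of", "on", "the", "to"}  # hypothetical stop list

# Hypothetical human-formulated rules: word occurrence -> controlled descriptor.
RULES = {
    "permit": "Building permits",
    "licence": "Building permits",
    "recycling": "Waste management",
}


def tokenize(text):
    """Lower-case the text and split it into alphabetic words (incl. æ, ø, å)."""
    return re.findall(r"[a-zæøå]+", text.lower())


def extract_terms(text, k=3):
    """Extracted indexing: the k most frequent non-stop words of the document.
    Counter.most_common keeps first-occurrence order among equal counts."""
    counts = Counter(w for w in tokenize(text) if w not in STOPWORDS)
    return [word for word, _ in counts.most_common(k)]


def assign_terms(text):
    """Assigned indexing: controlled descriptors triggered by rule words."""
    return sorted({RULES[w] for w in set(tokenize(text)) if w in RULES})


if __name__ == "__main__":
    doc = ("Permit for a recycling station. The permit covers recycling "
           "of glass and the licence fee.")
    print(extract_terms(doc))  # words taken from the document itself
    print(assign_terms(doc))   # terms taken from the controlled vocabulary
```

Note that "licence" never appears among the extracted terms of this example, yet it still contributes to the assigned descriptor "Building permits", because the rule author decided that the word signals that subject.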

If manual indexing is taken to represent controlled indexing and automatic 

indexing to represent uncontrolled indexing, the differences between the two 

approaches will to a large extent be reflected in Table 5.2. However, additional 

differences exist between manual and automatic methods. One obvious difference between human and automatic indexing relates to economy: manual indexing is costly in terms of both money and time, at least as far as expert-led indexing is concerned. This can explain some of the efforts put into

developing automatic methods. Accordingly, the low costs connected with automatic 

indexing are implicitly considered a strength. Here it is important to keep in mind the 

costs related to not being able to find information (cf. Feldman & Sherman, 2001). 

Ineffective retrieval may be caused by both manual and automatic methods. Thus for 

both indexing methods Feldman & Sherman’s calculations emphasize the need to carry 

out evaluations in order to ensure the functionality and quality of indexing. 

However, the two approaches also differ with regard to more qualitative aspects. We have previously mentioned consistency, which is a highly relevant concept here. As the papers on manual indexing reviewed above show, human indexers interpret the content of a piece of information before assigning index terms, whether these are controlled or uncontrolled. This interpretation most likely leads to inconsistencies due to differences in the indexers’ conceptions of what the document is about (Anderson & Perez-Carballo, 2001a). At the same time, human interpretation allows documents to be represented by terms not present in the document, which potentially enriches the indexing. In automatic indexing, the interpretation is based on statistical calculations of term occurrence, which increases consistency considerably. But, as stated by Bloomfield (2002), indexing may be consistently good or consistently bad. As a consequence, it could be added that indexing can be inconsistently good. By this is meant that consistently bad indexing is not necessarily preferable to inconsistent indexing that contains very good elements along with very bad ones. According to Mandersloot, Douglas & Spicer (1970, p. 50), “[human...] indexing may have inconsistencies, but it is flexible. Machine indexing may be consistent, but it is rigid.” Whether this opinion is also reflected in empirical comparisons of the two approaches will be investigated below.


Chapter 5 

Different authors have referred to the difficulty of isolating indexing as a variable when measuring the performance of an IR system (e.g., Anderson & Perez-Carballo, 2001a). Nevertheless, numerous studies have compared the performance of the two approaches. Salton’s (1986a) review of early studies argues for the potential of automatic indexing in various evaluative settings. As mentioned previously, several of the studies reviewed in Section 5.3.2 are also relevant here, because they compare manual controlled indexing with automatic uncontrolled indexing. Examples are the Cranfield experiments (e.g., Cleverdon, 1967), which showed promising results for automatic indexing based on single terms compared with manually assigned controlled index terms, and Savoy’s (2005) study, which found that the best performance was achieved by combining manual and automatic methods. TREC⁶ has also carried out experiments on automatic indexing and retrieval. These experiments did not compare automatic with human indexing but tested automatic methods in isolation; the tracks testing term weighting are particularly relevant to automatic indexing (Harman & Voorhees, 2006). In sum, apart from the expected lower cost, an increased level of consistency can be expected from automating indexing procedures. The characteristics of automatic indexing are presented in more detail in the sections below.

5.4 Approaches to automatic indexing 

Automatic indexing designates the situation in which machines replace human indexers and carry out the indexing of documents (Lancaster, 2003, p. 283). Taking as our point of departure the broad definition of automatic indexing outlined in the introduction to the present chapter, automatic indexing is covered by a number of diverse research communities. Golub (2006) differentiates between four approaches (text categorization, document clustering, document classification, and mixed approaches) originating from different research communities such as machine learning, information retrieval, and library science. Many of the automatic approaches to indexing are based on techniques and principles that are far from new. The most significant difference between the time these techniques were developed and the present is that the power and capacity of hardware have increased, as has the number of digitized documents. As a consequence, automatic indexing performs better today, though there is still room for improvement (Lancaster, 2003, pp. 330-331).

⁶ Short for Text REtrieval Conference. TREC first started out in 1992 (Harman & Voorhees, 2006).

We divide the automatic approaches according to whether they represent extracted or assigned methods. This division makes sense in relation to the present work, because it mirrors the manual approaches that the automatic methods emulate. Moreover, it is the categorization employed by Lancaster (2003) and Moens (2000), and we use it in the sections to follow. In practice, algorithms for automatic indexing usually combine more than one method (Coyle, 2008). In our review, however, we present and discuss the methods in their pure form.

5.4.1 Automatic extracted indexing 

In automatic extracted indexing, terms are drawn from the document itself to represent its content, in line with the natural language indexing mentioned previously. The most basic kind of automatic extracted indexing is to index all words occurring in a collection of documents (Anderson & Perez-Carballo, 2001b, p. 258). However, not all natural language terms appearing in documents make good descriptors. A number of techniques are therefore needed to increase the quality of descriptors when extracting index terms from documents. The basic procedure consists of five steps, some or all of which may be included to improve the automatic indexing: 1) identification of the words appearing in the document collection (lexical analysis); 2) removal of function words using a stop word list; 3) stemming, which reduces words to their basic form; 4) computation of a weighting factor for the remaining words, taking into account term frequency and inverse document frequency; and 5) representation of documents by the values calculated in the previous steps (cf. Salton, 1989, p. 304; Salton & McGill, 1983). Others supplement the five-step procedure with additional steps such as the formation of phrases, which in some cases changes the order of the steps (Moens, 2000, p. 78). In the sections to follow, we elaborate on the individual steps.

5.4.1.1 Lexical analysis and stop word lists 

Lexical analysis converts a stream of characters into a stream of words. Words are identified as character sequences separated by spaces or punctuation (Moens, 2000). Challenges that may arise in the process include abbreviations, hyphenated terms, punctuation in general, and digits. A machine-readable dictionary may help solve the problems posed by abbreviations and, in some cases, hyphenated terms. These phenomena do not necessarily cause problems, but they need to be considered when performing lexical analysis (Fox, 1992; Moens, 2000).
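The lexical analysis step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a standard tokenizer: text is split on whitespace, lower-cased, and stripped of leading and trailing punctuation, while the hypothetical `ABBREVIATIONS` set stands in for the machine-readable dictionary mentioned above. Hyphenated terms and digits are kept as single tokens, which is one possible policy among several.

```python
import string

# Hypothetical stand-in for a machine-readable abbreviation dictionary.
ABBREVIATIONS = {"e.g.", "i.e.", "etc."}


def tokenize(text, abbreviations=ABBREVIATIONS):
    """Turn a stream of characters into a list of lower-cased word tokens."""
    tokens = []
    for chunk in text.split():  # whitespace delimits candidate words
        word = chunk.lower()    # case folding: a design choice, not a requirement
        if word in abbreviations:
            tokens.append(word)  # keep known abbreviations with their periods
            continue
        word = word.strip(string.punctuation)  # drop leading/trailing punctuation only
        if word:  # internal hyphens and digits survive, e.g. "self-service", "2012"
            tokens.append(word)
    return tokens


print(tokenize("Apply e.g. by 1 May 2012, using the self-service form."))
# → ['apply', 'e.g.', 'by', '1', 'may', '2012', 'using', 'the', 'self-service', 'form']
```

Note that case folding merges “May” (the month) with “may” (the verb), a small example of why such choices need to be considered for the domain at hand.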

Stop word lists are lists of the most common words, which are removed from full-text documents in order to reduce the number of possible index terms. Alternative designations include stop lists and negative vocabularies (Fox, 1989). The assumption is that stop words are not good candidates for index terms; in particular, it is desirable to eliminate function words from the list of potential index terms (e.g., Luhn, 1957; Salton, 1989). Further, stop word lists limit the space needed for indices (Wilbur & Sirotkin, 1992). Stop word lists for English text commonly contain between 50 and 400 words (Moens, 2000). For both lexical analysis and stop word lists, the domain in question should be taken into consideration, since the usefulness of a given term as an index term may differ between application areas (Fox, 1992). Several choices must be made when preparing a stop word list. Initially, the size of the list, whether large or small, is determined by a cut-off level based on term frequency. For instance, Fox (1989) set the cut-off at more than 300 occurrences for a large stop word list. In addition, various qualitative measures can be taken to refine the stop word list further, for example accounting for alternative spellings and for variants of stop words with different prefixes and suffixes. Examining potentially relevant and irrelevant words whose frequency lies close to the cut-off is also likely to improve the list (cf. Fox, 1989).
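A frequency-based stop list with a manual review step can be sketched as follows. The function names are hypothetical, and the cut-off of 2 suits only the toy collection; Fox’s (1989) cut-off of more than 300 occurrences refers to a large corpus. The `keep` parameter models the qualitative review of words close to the cut-off.

```python
from collections import Counter


def build_stop_list(collection_tokens, cutoff, keep=frozenset()):
    """Words occurring more than `cutoff` times become stop words,
    except those a human reviewer has decided to keep."""
    counts = Counter(collection_tokens)
    return {term for term, freq in counts.items() if freq > cutoff and term not in keep}


def remove_stop_words(tokens, stop_list):
    """Drop every token that appears on the stop list."""
    return [t for t in tokens if t not in stop_list]


collection = ["the"] * 5 + ["of"] * 3 + ["housing"] * 3 + ["benefit"]
stop_list = build_stop_list(collection, cutoff=2, keep={"housing"})
print(sorted(stop_list))                                                  # ['of', 'the']
print(remove_stop_words(["the", "housing", "of", "benefit"], stop_list))  # ['housing', 'benefit']
```

Without the `keep` review, “housing” would have been discarded along with the function words, since it occurs just as often as “of” in this collection.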

5.4.1.2 Stemming 

Stemming identifies morphologically related terms by reducing variants of a word to its stem or root (Moens, 2000; Salton & McGill, 1983; Anderson & Perez-Carballo, 2001b). Specifically, affixes, that is, prefixes and suffixes, are removed from words in order to identify stems (cf. Hammarström, 2006). The assumption is that “when stems are used as index terms, a greater number of potentially relevant items can be identified than when one of the original full text words is in use” (Salton & McGill, 1983, p. 72). Using a stemmer is likely to increase recall, because morphological variants of the same stem are conflated and represented by the same index term. Further, like stop word lists, stemming reduces the space needed for the index, since the number of potential index terms is reduced in the process (Salton & McGill, 1983; Moens, 2000; Willett, 2006).

Stemming methods can be divided into manual and automatic ones. Manual methods employ some type of regular expressions. Automatic stemming, on the other hand, uses techniques such as affix removal, successor variety, table lookup, or n-grams (Frakes, 1992). Automatic stemming is carried out by a stemming algorithm that removes prefixes, suffixes, or both. Two potential problems challenge the performance of stemming: under-stemming and over-stemming. Under-stemming removes too little of a term, so that related words are not conflated, while over-stemming removes too much, so that unrelated words are conflated. They thus correspond to what is known in retrieval as under-truncation and over-truncation, respectively (cf. Chowdhury, 2004). However, what causes over-stemming and under-stemming differs between languages because of differences in morphological structure (Moens, 2000; Willett, 2006).
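The two failure modes can be illustrated with a deliberately naive suffix stripper. The suffix list and the minimum stem length are hypothetical choices made for the example, not part of any published stemmer.

```python
# Hypothetical suffix list, chosen only to illustrate the two failure modes.
SUFFIXES = ("ity", "ion", "e")
MIN_STEM = 3  # never cut a word down to fewer than three characters


def naive_stem(word):
    """Remove the longest matching suffix (a toy affix-removal stemmer)."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word


# Over-stemming: unrelated words are conflated.
print(naive_stem("university"), naive_stem("universe"))  # univers univers
# Under-stemming: related words are kept apart.
print(naive_stem("absorption"), naive_stem("absorb"))    # absorpt absorb
```

Over-stemming here tends to lower precision (texts on universities would match queries on the universe), while under-stemming tends to lower recall (absorb does not match absorption).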

A number of stemmers have been proposed, but two algorithms in particular stand out, namely Lovins’ stemmer (1968) and Porter’s stemmer (1980). Both remove suffixes, which is the most common type of stemming, and both are aimed at single-word terms (Galvez, de Moya-Anegon & Solana, 2005). Lovins’ stemmer involves two steps. In the first step, the stemming is carried out. In the second step, spelling exceptions are handled by a set of recoding rules, so that stems with slightly different spellings, such as those of collide and collision, are nevertheless merged (Lovins, 1968). Like Lovins, Porter specifies a set of suffixes to be removed from words. However, spelling exceptions are not handled in the Porter stemmer (Porter, 1980). Later, Porter developed Snowball, a framework for writing stemmers, which provides stemmers for a number of European languages including Danish (Porter, 2001).
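Lovins’ two-step design can be sketched as longest-match ending removal followed by recoding of the resulting stem. Both tables below are tiny illustrative subsets chosen for the example; they are not Lovins’ actual lists of endings and recoding rules, and the minimum stem length of 2 is likewise an assumption.

```python
# Step 1 table: a tiny illustrative set of endings (not Lovins' actual list).
ENDINGS = ("ion", "ing", "e", "s")
# Step 2 table: recoding rules of the kind Lovins uses, applied to stem endings.
RECODING = (("rpt", "rb"), ("lid", "lis"))
MIN_STEM = 2  # assumed minimum stem length


def remove_ending(word):
    """Step 1: remove the longest matching ending, keeping a minimum stem."""
    for ending in sorted(ENDINGS, key=len, reverse=True):
        if word.endswith(ending) and len(word) - len(ending) >= MIN_STEM:
            return word[: -len(ending)]
    return word


def recode(stem):
    """Step 2: rewrite the end of the stem so that spelling variants match."""
    for old, new in RECODING:
        if stem.endswith(old):
            return stem[: -len(old)] + new
    return stem


def two_step_stem(word):
    return recode(remove_ending(word))


print(remove_ending("collide"), remove_ending("collision"))  # collid collis
print(two_step_stem("collide"), two_step_stem("collision"))  # collis collis
print(two_step_stem("absorb"), two_step_stem("absorption"))  # absorb absorb
```

Without the recoding step, collide and collision end up as the different stems collid and collis; the second step is what merges them. For production use, ready-made Snowball stemmers, including a Danish one, are available rather than hand-written tables like these.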

5.4.1.3 Weighting factors 

Weighting terms implies that, even after lexical analysis, stop word removal, and stemming have been applied, some terms are more important than others. In other words, differentiating the weights of the remaining terms implies that the first three steps are not sufficient for identifying good index terms. Luhn (e.g., 1961) was a pioneer in suggesting that terms occurring in documents could substitute for controlled vocabularies in indexing. The assumption was that the subject of a document is reflected by the occurrence of terms designating that subject (Moens, 2000; Salton, 1989; Salton & McGill, 1983): the higher the frequency of a term (TF), the higher the probability that the document is concerned with the subject referred to by the term. Obviously, this is only true up to a certain point. Stop words and other high-frequency words do not make good index terms. According to Luhn (1958a), good index terms are found among terms with a medium frequency in the document. Luhn’s ideas are a continuation of Zipf’s findings. Approximately a decade earlier, Zipf (1949) had found that the frequency of a term in a text is approximately inversely proportional to its rank in a list of terms ordered by frequency. The principles are illustrated in Figure 5.5.

Figure 5.5 The resolving power of significant index terms. Adapted from Luhn (1958a, p. 161)
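Zipf’s rank-frequency relation and Luhn’s medium-frequency band can be sketched as below. The toy token list and the two cut-offs are hypothetical; in practice the cut-offs are set empirically, and a list this short will not show Zipf’s regularity, so the printed rank × frequency products only show how the relation is computed.

```python
from collections import Counter


def rank_frequency(tokens):
    """(rank, term, frequency) triples, most frequent term first.
    Zipf's observation is that rank * frequency stays roughly constant."""
    counts = Counter(tokens)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, term, freq) for rank, (term, freq) in enumerate(ranked, start=1)]


def luhn_band(tokens, upper, lower):
    """Keep terms whose frequency lies between the lower and upper cut-offs,
    i.e. the medium-frequency band Luhn regarded as most significant."""
    return [term for _, term, freq in rank_frequency(tokens) if lower <= freq <= upper]


tokens = (["the"] * 6 + ["of"] * 4 + ["housing"] * 3
          + ["benefit"] * 2 + ["tenant"] * 2 + ["appeal"])
for rank, term, freq in rank_frequency(tokens):
    print(rank, term, freq, rank * freq)
print(luhn_band(tokens, upper=3, lower=2))  # ['housing', 'benefit', 'tenant']
```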

Luhn’s early ideas have been refined and expanded in the following years for several reasons, among them a lack of uniform application and of empirical support (Salton, 1970, 1988). It has turned out that fundamental problems arise if TF is used as the sole basis for extracted indexing, because TF alone does not take into account how terms are distributed across documents. The result is low-precision indexing: a term that occurs frequently throughout a large collection cannot distinguish the documents from one another, which is what precision implicitly requires (Salton & Buckley, 1988; Salton, 1989). One way of correcting for the limitations of TF is to include the inverse document frequency (IDF) in the calculation of term weights. IDF expresses how widely a term occurs across a collection of documents. The assumption is that terms occurring in many documents of a collection are less able to discriminate between documents than terms that occur frequently in just a few documents (Salton, 1989). The IDF formula reflects exactly this. It is (Moens, 2000; Salton & Buckley, 1988):

 

$$\mathrm{IDF}_t = \log \frac{N}{n_t}$$

where

log = common logarithm (base 10)

$N$ = number of documents in the collection

$n_t$ = number of documents in the collection containing the term $t$

By combining TF and IDF, high weights are allocated to terms that simultaneously have a high frequency in a document and a low frequency in the document collection. The TF*IDF product is also one way of measuring term discrimination value, which is thus comparable with IDF (Salton, Yang & Yu, 1975). The term discrimination value expresses a term’s ability to distinguish the documents of a collection from each other. A core concept in relation to term discrimination value is connectivity. High connectivity is characteristic of bad index terms, because they cannot distinguish between the documents in a collection. Conversely, good index terms have low connectivity between documents (Jones & Furnas, 1987; Moens, 2000). Opinions on the applicability of the TF*IDF factor have differed over the years. In a presentation of early empirical retrieval tests, Sparck Jones (1973) concludes that combining the two frequencies improves retrieval considerably compared with weighting based on TF alone. However, as pointed out by Salton & Buckley (1988), a major weakness of the TF*IDF product is the need for continuous updates of the collection frequency factor. This is particularly necessary in dynamic document collections; TF*IDF is thus more suitable for static collections.
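The TF*IDF weighting can be sketched in Python, using the common logarithm as in the IDF formula above. The toy documents are hypothetical token lists that are assumed to have already passed through lexical analysis, stop word removal, and stemming.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Weight each term in each document by TF * log10(N / n_t)."""
    n_docs = len(docs)                                              # N
    doc_freq = Counter(term for doc in docs for term in set(doc))  # n_t per term
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({term: freq * math.log10(n_docs / doc_freq[term])
                        for term, freq in tf.items()})
    return weights


docs = [
    ["application", "housing", "benefit", "housing"],
    ["application", "housing", "tax"],
    ["application", "tax", "appeal"],
    ["application", "benefit", "appeal"],
]
w = tf_idf(docs)
print(w[0])  # application: 0.0, housing: about 0.60, benefit: about 0.30
```

A term such as “application” that occurs in every document receives the weight 0, since log(N/N) = 0. Adding a document changes both N and n_t, so all weights have to be recomputed, which is the weakness in dynamic collections pointed out by Salton & Buckley (1988).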

In addition to IDF, the length of documents (or vectors, cf. Salton & Buckley, 1988) could be taken into consideration when determining term weights. Long documents contain more terms than short ones, and the resulting higher term frequencies make a long document more retrievable than a short one. In retrieval, this problem may be reduced by normalizing term frequency with respect to document length (Singhal et al., 1996). Normalization is particularly necessary in document collections containing heterogeneous documents. Further, a long document usually contains more synonyms for the same concept, which also increases its chances of being retrieved. In this case, length normalization obviously cannot correct the bias, and more qualitative tools must be considered instead. The potentially higher degree of semantic variability in longer documents could, at least partly, explain the tendencies observed by Singhal et al. (1996), who find that, in spite of normalization, longer documents still tend to be retrieved better than shorter documents. Similar observations were made earlier by Sparck Jones (1973), who concluded that document length normalization has little, if any, effect on retrieval.
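Length normalization can be illustrated with cosine normalization, which divides each term frequency by the Euclidean length of the document’s TF vector. This is one common scheme chosen for the example; the pivoted normalization studied by Singhal et al. (1996) adjusts it further and is not shown. The documents are hypothetical.

```python
import math
from collections import Counter


def cosine_normalized_tf(tokens):
    """Term frequencies divided by the Euclidean length of the TF vector,
    so that every document vector has length 1 regardless of document length."""
    tf = Counter(tokens)
    if not tf:
        return {}
    length = math.sqrt(sum(f * f for f in tf.values()))
    return {term: f / length for term, f in tf.items()}


short_doc = ["rent", "rent", "appeal"]
long_doc = short_doc * 3  # same content, three times as long

print(Counter(long_doc)["rent"], Counter(short_doc)["rent"])  # 6 2
print(cosine_normalized_tf(long_doc))
print(cosine_normalized_tf(short_doc))
```

The long document has three times the raw frequency of “rent”, but after normalization both documents receive the same weights. A long document that instead uses several synonyms for a concept is not corrected in this way.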

5.4.1.4 Compound nouns as index terms 

The procedure outlined above concerns the extraction of single-word index terms. However, phrases may also be taken into account when considering weighting factors. Phrases constitute a particular challenge, since two or more words occurring together in a phrase frequently have a meaning quite different from that of the single terms included in the phrase. This is the case for noun phrases and proper names in particular. As index terms, phrases usually carry more meaning and specificity than the single terms constituting them (Croft, Turtle & Lewis, 1991; Lancaster, 2003). A classic example is the phrase “venetian blinds”. As a phrase, it refers to a particular kind of blinds. Divided into single terms, it refers to people from a specific city and to something used to cover windows, respectively, that is, a completely different meaning. Combined, but not as a phrase, the words may refer to Venetians who cannot see. On the other hand, the probability that the two words occur in the same sentence without appearing as a phrase is rather low (Salton & McGill, 1983, p. 86). A number of techniques, whether simple (e.g., simple collocations, statistically validated n-grams, syntactic structures) or advanced (e.g., extended n-grams or syntactic parsing), may be used to identify phrases in documents (Strzalkowski et al., 1999, p. 117). At present, these methods are expensive and time-consuming, and many questions remain unanswered, such as whether the investment pays off in different contexts (Voorhees & Pazienza, 1999; Anderson & Perez-Carballo, 2001b). This may also explain why weighting functions designed for single terms are currently, to some extent, employed for weighting phrases as well (Moens, 2000). When weighted, phrases may be considered either as a set of words or as separate concepts (Croft, Turtle & Lewis, 1991). Three scenarios may be distinguished for calculating phrase weights: 1) the phrase is considered as one unit, and the assigned weight is independent of the components constituting the phrase; 2) the phrase weight is calculated on the basis of the single terms comprising the phrase; or 3) a combination of 1) and 2), where the results of both approaches are considered when the weight is computed (Moens, 2000).
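The three scenarios for phrase weighting can be sketched as follows. Reusing the TF*IDF weight follows the remark that single-term weighting functions are also employed for phrases. Averaging the component weights in scenario 2 and mixing the two linearly with a weight `alpha` in scenario 3 are assumptions made for the example, not functions prescribed by Moens (2000). All counts are hypothetical.

```python
import math


def tf_idf_weight(tf, df, n_docs):
    """TF * log10(N / n_t), the single-term weight discussed above."""
    return tf * math.log10(n_docs / df)


def phrase_weights(phrase, components, n_docs, alpha=0.5):
    """`phrase` and each item of `components` are (tf, df) pairs.

    Returns the phrase weight under the three scenarios:
    1) the phrase as one unit, 2) derived from its component terms
    (here: their mean weight), 3) a mix of 1) and 2) (here: linear, weight alpha).
    """
    unit = tf_idf_weight(*phrase, n_docs)
    derived = sum(tf_idf_weight(tf, df, n_docs) for tf, df in components) / len(components)
    combined = alpha * unit + (1 - alpha) * derived
    return unit, derived, combined


# Hypothetical counts for "venetian blinds" in a 100-document collection:
# the phrase occurs twice in one document; its component words are more widespread.
print(phrase_weights(phrase=(2, 1), components=[(2, 10), (3, 20)], n_docs=100))
```

Because the phrase is rarer than either of its words, treating it as one unit gives it the highest weight, in line with the greater specificity of phrases noted above.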

The challenges of phrases mentioned here particularly concern the English language. In the present empirical work, the test collection consists primarily of Danish texts. Danish belongs to the Germanic language family, along with German, Swedish, and, indeed, English. Danish nevertheless differs from English in a number of ways, and the way compound nouns are created is particularly pertinent to automatic indexing. Whereas English compound nouns are typically created as phrases, Danish, like Swedish and German, creates compounds by joining the components together in one word (Hedlund, 2002). This means that the results of IR and indexing studies of English cannot be transferred to these languages as a matter of course (Ahlgren & Kekäläinen, 2007). For English, the challenge basically consists of identifying when two or more single terms in fact form a noun phrase, with the purpose of increasing precision. In Danish and similar languages, the challenge is, if anything, the opposite: techniques need to be able to identify the components of compound nouns in order to increase recall (cf. Pedersen, Navarretta & Hansen, 2005). Techniques for identifying the components of compound nouns have been developed and tested for a number of languages other than English. However, Danish is not among the most thoroughly explored languages in this respect. A two-year research project, the VID project, was carried out in the mid-2000s by the Centre for Language Technology, University of Copenhagen. The overall purpose of the project was to investigate the potential of human language technologies for the acquisition and representation of information (Pedersen, Navarretta & Henriksen, 2004). Among other things, the project contributed knowledge about how marking up texts with word classes would affect recall and precision in IR. The study found that precision was very satisfactory (0.9), whereas recall was, surprisingly, lower (0.6). The lower recall was attributed to errors in the recognition of terms and to a general lack of complexity in the recognition (Pedersen, Navarretta & Hansen, 2005, p. 28). The results of the study emphasize the need for language tools aimed at Danish and similar languages to allow for a high degree of complexity.
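Compound splitting for Danish can be sketched as a recursive dictionary lookup that also allows a linking element between components, as in ansøgning-s-skema (application form). The mini-lexicon, the set of linking elements, and the minimum component length are hypothetical; a real system would need a full dictionary and would have to handle ambiguous splits, which this sketch resolves simply by preferring the longest first component.

```python
# Hypothetical mini-lexicon; a real system would use a full Danish dictionary.
LEXICON = {"ansøgning", "skema", "bolig", "støtte"}
# Linking elements tried between components: none, -s- or -e- (an assumed set).
LINKERS = ("", "s", "e")
MIN_PART = 3  # assumed minimum length of a component


def split_compound(word, lexicon=LEXICON):
    """Return the list of components of `word`, or None if no split is found."""
    if word in lexicon:
        return [word]
    # Try the longest possible first component first.
    for cut in range(len(word) - MIN_PART, MIN_PART - 1, -1):
        head, rest = word[:cut], word[cut:]
        if head not in lexicon:
            continue
        for linker in LINKERS:
            if rest.startswith(linker):
                tail = split_compound(rest[len(linker):], lexicon)
                if tail:
                    return [head] + tail
    return None


print(split_compound("ansøgningsskema"))  # ['ansøgning', 'skema']
print(split_compound("boligstøtte"))      # ['bolig', 'støtte']
```

Indexing both the compound and its components (e.g., boligstøtte as well as bolig and støtte) would let a query on støtte retrieve the document, which is the recall gain referred to above.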

To our knowledge, no other research supplements the VID project as regards Danish in this respect. Swedish, on the other hand, has been investigated in different studies. Given the large share of similarities between the two languages, Swedish results may reasonably be transferred to Danish. Ahlgren & Kekäläinen (2007) tested a number of different techniques in a comparative study of Swedish text. Four indexing methods, ranging from raw text through inflection⁷ to two variants of compound splitting, were compared in the study. The indexing methods were evaluated on a collection of Swedish newspaper articles using topics from CLEF. Six user profiles were set up for the evaluations, varying in their degree of patience and in their perception of what makes a relevant document. Normalized discounted cumulated gain (nDCG) (Järvelin & Kekäläinen, 2002) was used to measure the performance of the indexing methods. The study found that, when the original words from the topics were used as query terms, the simplest indexing method generally had the lowest performance of the tested methods. Conversely, when the topic terms were truncated for queries, the same method performed best. In sum, inflection and compound splitting did not improve retrieval compared with simple truncation. It appears that the lessons learned from phrase indexing, namely that it is time-consuming and that the payoff is questionable, also apply to methods for languages that are morphologically more complex than English.
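Normalized discounted cumulated gain can be sketched following the formulation of Järvelin & Kekäläinen (2002): gains are summed down the ranked list, and from rank b onwards each gain is divided by log_b of its rank. Two simplifications are assumptions of this sketch: graded gains from 0 to 3 are given directly, and the ideal vector is built from the retrieved gains only, whereas a real evaluation builds it from all relevant documents. The log base b can be read as a model of user patience (a larger b means a gentler discount); how the six user profiles in Ahlgren & Kekäläinen’s study were operationalized is not described here.

```python
import math


def dcg(gains, b=2):
    """Discounted cumulated gain vector after Järvelin & Kekäläinen (2002):
    ranks below b are not discounted; from rank b on, gain / log_b(rank).
    Requires b > 1."""
    total, vector = 0.0, []
    for rank, gain in enumerate(gains, start=1):
        total += gain if rank < b else gain / math.log(rank, b)
        vector.append(total)
    return vector


def ndcg(gains, b=2):
    """DCG at each rank divided by the DCG of the ideal (descending) ordering.
    Simplification: the ideal ordering uses the retrieved gains only."""
    ideal = dcg(sorted(gains, reverse=True), b)
    return [d / i if i > 0 else 0.0 for d, i in zip(dcg(gains, b), ideal)]


# Hypothetical graded gains (0 = not relevant ... 3 = highly relevant) for a top-5 list.
print(ndcg([3, 0, 2, 1, 0]))
```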

5.4.1.5 Extracted indexing 

Extracted indexing refers to the automatic extraction of index terms based on various techniques. The techniques included here correspond to what Golub (2006) designates as document clustering. However, in order to separate the overall concept from the specific technique of clustering, we use the term extracted indexing below. As noted by Golub (2006), this approach lies within the IR tradition. The close relation between extracted indexing and information retrieval lies in their common use of advanced IR techniques for marking up documents and matching queries with documents.

Extracted indexing in its simplest form is based on the steps described in the preceding sections (lexical analysis, removal of stop words, stemming, and term weighting) (Salton, 1991). However, more advanced techniques also exist, for example the vector space model. The vector space model may be applied for comparison between documents (indexing) or between documents and queries (IR). For indexing purposes, documents are represented by vectors on the basis of the terms occurring in the documents of a collection. The steps of simple extracted indexing mentioned above are followed in order to generate a vector representing each document. Subsequently, the vectors are processed using cluster analysis (Salton, Wong & Yang, 1975).

⁷ Inflection designates the different forms a word can take, whether due to mutation in plural forms, strong verbs, the use of linking (“glue”) morphemes in compounding, or other factors (cf. Ahlgren & Kekäläinen, 2007).
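The vector representation and a similarity measure can be sketched with sparse term-to-weight dictionaries and the cosine measure, a common choice in the vector space model. Raw term frequencies are used for brevity; in practice the weights would typically be TF*IDF values. The example documents are hypothetical.

```python
import math
from collections import Counter


def to_vector(tokens):
    """Represent a document as a sparse term -> weight vector (raw TF here)."""
    return Counter(tokens)


def cosine(u, v):
    """Cosine of the angle between two sparse vectors (0 = no shared terms)."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


d1 = to_vector(["housing", "benefit", "rent"])
d2 = to_vector(["housing", "benefit", "tenant"])
d3 = to_vector(["tax", "return", "deadline"])
print(cosine(d1, d2), cosine(d1, d3))  # about 0.67 (two shared terms), 0.0
```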

Two main steps constitute the process of advanced extracted indexing: 1) documents are represented as vectors, which are then compared using a similarity measure; and 2) clusters are formed by means of clustering algorithms (cf. Golub, 2006, p. 356). Cluster analysis is a method for data analysis with numerous areas of application. By means of cluster analysis, unlabelled patterns within a set of items can be grouped into meaningful clusters (Jain, Murty & Flynn, 1999). Applied to documents, cluster analysis groups the documents in a collection according to the features they have in common.
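The two steps can be combined in a minimal clustering sketch: documents become vectors, pairwise cosine similarities are computed, and documents whose similarity reaches a threshold are chained into the same cluster (single-link clustering at a fixed level). Single-link is only one of many clustering algorithms, and the threshold of 0.5 and the example documents are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two sparse term -> weight vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def single_link_clusters(vectors, threshold):
    """Group documents: any pair with similarity >= threshold ends up in the
    same cluster, and links are followed transitively."""
    parent = list(range(len(vectors)))  # union-find forest over document ids

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)  # link the two clusters

    clusters = {}
    for i in range(len(vectors)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


docs = [["housing", "benefit", "rent"],
        ["housing", "benefit", "tenant"],
        ["tax", "return", "deadline"]]
vectors = [Counter(d) for d in docs]
print(single_link_clusters(vectors, threshold=0.5))  # [[0, 1], [2]]
```

No category labels or training documents are involved, which is what makes this kind of organization unsupervised.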

Extracted indexing is an unsupervised way of organizing documents, because no pre-categorized documents are used for training. Document clustering may be based on the terms occurring in the documents or on co-occurring citations. Terms themselves can also be clustered, in which case their co-occurrence in the document collection constitutes the unit of analysis (Rasmussen, 1992). Document clustering is characterized as an extracted kind of indexing, since the clusters are not matched against a controlled vocabulary.

The performance of extracted indexing based on various clustering techniques and/or on simple extracted indexing has been examined in different comparative studies. We have already mentioned the Cranfield tests, one of the first attempts to evaluate automatic extracted indexing (see Section 5.3.2).

An early clustering interface for post-retrieval use, Grouper, was presented by Zamir & Etzioni (1999) in the late 1990s. It was built for the meta search engine HuskySearch. The technology behind Grouper consisted of three steps: 1) stemming, 2) identification of base clusters, and 3) merging of clusters with a high degree of overlap between the documents they contained. Further, Grouper had built-in technology for correcting redundant cluster titles, along with technology allowing fast processing of search results. The interface was evaluated using search logs. 3,183 queries were logged at the Grouper interface within two months, while 19,330 queries were logged at the HuskySearch interface (which presented ranked search results without clustering). Due to the lack of qualitative data, the data material does not allow for an explanation of the patterns identified in the search logs. Another limitation of the study, pointed out by the authors, was the lack of control over who used the test system (Grouper) and the baseline system (HuskySearch). The data showed that users explored several clusters in order to locate relevant documents in the Grouper interface. The authors explained this undesired situation either by user behavior that seeks different perspectives on the information need or simply by the generated clusters not matching user needs sufficiently. Compared with the baseline system, Grouper users found more documents, perhaps suggesting that the clustering made it easier to locate relevant documents. A qualitative follow-up study would provide a more thorough understanding of the hypotheses the authors put forward in the light of their findings.
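The merging step can be sketched as follows, assuming each base cluster is given as a non-empty set of document ids. Grouper is built on suffix tree clustering, in which base clusters are groups of documents sharing a phrase; the rule used here, merging two base clusters when their overlap exceeds half of each cluster, follows the criterion usually reported for that algorithm, but it should be read as an assumption rather than Grouper’s exact setting. Identifying the base clusters themselves is not shown.

```python
def merge_base_clusters(base_clusters, threshold=0.5):
    """Merge base clusters (non-empty sets of document ids) whose overlap
    exceeds `threshold` of BOTH clusters' sizes; merging is transitive."""
    n = len(base_clusters)
    parent = list(range(n))  # union-find forest over base-cluster indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            shared = len(base_clusters[i] & base_clusters[j])
            if (shared / len(base_clusters[i]) > threshold
                    and shared / len(base_clusters[j]) > threshold):
                parent[find(i)] = find(j)

    merged = {}
    for i in range(n):
        merged.setdefault(find(i), set()).update(base_clusters[i])
    return sorted(sorted(c) for c in merged.values())


print(merge_base_clusters([{1, 2, 3}, {2, 3, 4}, {7, 8}]))  # [[1, 2, 3, 4], [7, 8]]
```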

A later study based on extracted indexing is Käki’s (2005a; 2005b; Käki & Aula, 2005) investigation of the categorization of web documents for his dissertation work. Here, two algorithms for extracting category candidates were applied. One allowed single terms as well as phrases, while the other required phrases of at least two terms. The algorithms were used to build a list of categories for organizing web results. A result was added to a category if its summary text contained the name of the category (Käki, 2005a). It is the extraction of candidate terms from the documents themselves and the lack of supervision that lead us to classify Käki’s work as extracted indexing.
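The category assignment rule described for Käki’s system can be sketched as a simple containment test. The extraction of candidate category names is not shown, the results and category names below are hypothetical, and case-insensitive substring matching is an assumption about how containment was tested.

```python
def assign_to_categories(results, categories):
    """results: (result_id, summary_text) pairs; categories: candidate names.
    A result joins a category if the category name occurs in its summary."""
    assignment = {}
    for name in categories:
        needle = name.lower()
        assignment[name] = [rid for rid, summary in results if needle in summary.lower()]
    return assignment


results = [
    (1, "Housing benefit for students: how to apply"),
    (2, "Apply for a student grant"),
    (3, "Rules on housing benefit and rent"),
]
print(assign_to_categories(results, ["housing benefit", "student"]))
# {'housing benefit': [1, 3], 'student': [1, 2]}
```

Substring matching behaves somewhat like truncation: student also matches students, but a short category name such as rent would also match parent.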

On the basis of the two extraction algorithms, two interfaces were set up for testing, and several evaluations from the study have been reported. Käki & Aula (2005) compared a categorized search interface based on the algorithm with a baseline, using the World Wide Web (hereafter: the web) as the test base. The baseline was a Google web page displaying the results as a ranked list. Twenty test persons participated in the test, and nine predefined queries in general topic areas formed the starting point of the searches. The test persons were allowed one minute for each task in order to reflect faster behavior, which was assumed to be more realistic. The performance of the experimental and baseline systems was measured in terms of 1) time to accomplish a task, 2) number of results selected for a task, 3) relevance of selected results, measured on a 3-point scale (relevant, related, and not relevant), and 4) subjective attitude towards both systems (Käki & Aula, 2005, p. 199). In addition, recall and precision were measured on the basis of the relevance judgments made by the test persons. The study found that the categorized interface had a better average performance in precision (62%, sd=13, against 49%, sd=15) and recall (33%, sd=4, against 19%, sd=7). The test persons’ attitudes were somewhat more positive towards the test system than towards the baseline system. The test did not find substantial differences in the time spent, most likely because of the very short time window applied in the test.
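Precision and recall for a single task can be computed from a set of selected results and a set of results judged relevant, as sketched below. The document ids are hypothetical, relevance is treated as binary, and how Käki & Aula (2005) handled the “related” grade and built the set of relevant results is not described here.

```python
def precision_recall(selected, relevant):
    """Precision = relevant selected / selected; recall = relevant selected / relevant."""
    selected, relevant = set(selected), set(relevant)
    hits = len(selected & relevant)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


print(precision_recall(selected=[1, 2, 3, 4], relevant=[2, 4, 5, 6, 7, 8]))
# (0.5, 0.3333333333333333)
```

Averaging such per-task values over tasks and test persons gives mean figures of the kind reported above, together with their standard deviations.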

The highly controlled test just described was followed up by a considerably less experimental longitudinal study (Käki, 2005b). Sixteen people participated. The participants received no instructions for the test system other than to use it any way they liked, which reflected the purpose of the study, namely to capture the participants’ real behavior. This time, no comparison was made with a baseline system. As in the previous study, the test system was applied to the web. The data collection lasted three months, including one month added to compensate for a holiday period. Two types of data were collected: search logs and questionnaires. One questionnaire was distributed a week or two after the start of the data collection, the other at the end of the study. In total, 3,099 queries were logged, 3,232 result pages were accessed, and 1,915 categories were selected. The relevance of retrieved documents was not registered. The study found that categories were used to select 26% of the accessed result pages. In the qualitative part of the first questionnaire, participants indicated that categories were useful when “…the original query was vague, broad, general, or contained words that have multiple meanings...” (Käki, 2005b, p. 138). The ability of the categories to help sharpen the focus of a less precise query was also expressed in the second questionnaire. Further, categories were found useful when result rankings were deficient. The results of the study are interesting, because they demonstrate that categorizing results is not necessarily useful in all information searching situations. The analysis gives some indication of when categories may be useful; however, a more systematic investigation would clearly be relevant.

5.4.2 Automatic assigned indexing 

Automatic assigned indexing is the automatic equivalent of human controlled indexing. The major difference between automatic extracted and automatic assigned indexing is that assigned indexing establishes a coupling between the terms occurring in a collection of documents and a controlled vocabulary. The apparent advantage of coupling natural language index terms with a controlled vocabulary is that it allows relations to be established between documents that share one or more controlled index terms.
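To illustrate the last point, the following minimal Python sketch assumes that an assigned indexing step has already attached controlled terms to each document; the document identifiers and terms are hypothetical. Documents become related simply by sharing at least one controlled term, regardless of the wording used in their full text.

```python
from itertools import combinations

# Hypothetical output of an assigned indexing step: every document is
# described by terms taken from a controlled vocabulary.
assigned_terms = {
    "doc1": {"income tax", "deductions"},
    "doc2": {"income tax", "property valuation"},
    "doc3": {"customs", "import duties"},
    "doc4": {"deductions", "pensions"},
}

def related_documents(index):
    """Return every pair of documents that shares at least one controlled
    index term, mapped to the set of shared terms."""
    relations = {}
    for a, b in combinations(sorted(index), 2):
        shared = index[a] & index[b]
        if shared:
            relations[(a, b)] = shared
    return relations

for pair, shared in related_documents(assigned_terms).items():
    print(pair, sorted(shared))
```

Here doc1 is related to doc2 through "income tax" and to doc4 through "deductions", while doc3 stays unrelated to the others.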


Different approaches exist for performing automatic assigned indexing. Two methods can be identified within the text categorization literature. 8 One is based on knowledge engineering, the other on machine learning (Sebastiani, 2002). Text categorization is considered an assignment type of automatic indexing because it sorts documents into a predefined set of categories. By comparison, information filtering represents another means of categorizing documents, though with dynamic categories (Belkin & Croft, 1992).

Initially, text categorization was carried out using a rule-based approach (also called the knowledge engineering approach, cf. Sebastiani, 2002). Typically, a set of rules was built on the if-then principle, meaning that if a document met certain criteria it would be categorized in a specific category (Sebastiani, 1999). In practice, a profile was created for each term to be assigned, containing words and phrases occurring with a high frequency in the documents that would usually be assigned the controlled index term (cf. Lancaster, 2003, p. 287-288). The sum of rules is referred to as a classifier.

Hayes & Weinstein (1990) are frequently mentioned in the literature as an example of this approach. They reported a categorization system developed for news stories at Reuters. The categorization of documents is based on two steps: 1) concept recognition, and 2) categorization rules. In the first phase, both single terms and phrases are included in the recognition. Further, the system relies on a certain degree of human intervention to either limit or extend the context of terms if necessary. Thus, the system may basically be considered a hybrid (cf. section 5.5). The rules have also been extended beyond plain if-then rules: the context of a term is taken into account when deciding on the strength of the term as evidence for a specific category. Further, when generating the rules, the developers may take into account a term's position in a news story as well as the length of the document. In total, 674 rules were created to meet the needs of the document collections at Reuters. Hayes & Weinstein (1990) report an evaluation in their presentation of the system. However, due to the very concise presentation, we will not go further into the results here. Further, we have not

8 Here we see an example of inconsistent terminology. In section 5.4.1.5 we referred to Käki (e.g., Käki, 2005a), who applies the term categorization to denote an extracted type of automatic indexing. In the present section, text categorization denotes an assigned type of automatic indexing. To avoid confusion, we will distinguish between the two by referring to the former as categorization or extracted categorization and to the latter as text categorization or assigned categorization.


been able to locate supplementary studies performing more systematic evaluations of the system. However, the authors estimate the savings from introducing the system at approximately $752,000 in the first full year of deployment, despite the expense of 6.5 person-years spent developing the system.
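The if-then principle and the term profiles described above can be sketched in a few lines of Python. The sketch is a deliberately simplified illustration, not a reconstruction of the Reuters system: the profiles, the double weight for headline matches (an assumed stand-in for rules that consider a term's position in a news story), and the threshold are all hypothetical.

```python
import re

# Hypothetical term profiles: for each controlled index term (category),
# words and phrases that occur frequently in documents indexed with it.
PROFILES = {
    "interest rates": ["interest rate", "central bank", "basis points"],
    "mergers": ["acquire", "takeover", "merger"],
}

def normalize(text):
    """Lower-case the text and keep only alphabetic words."""
    return " ".join(re.findall(r"[a-z]+", text.lower()))

def occurrences(phrase, text):
    """Count whole-word occurrences of a profile phrase in normalized text."""
    return len(re.findall(r"\b" + re.escape(phrase) + r"\b", text))

def categorize(headline, body, threshold=2.0):
    """IF the weighted profile evidence for a category reaches the
    threshold, THEN assign that category. Headline matches count double."""
    head, text = normalize(headline), normalize(body)
    assigned = []
    for category, phrases in PROFILES.items():
        score = sum(2.0 * occurrences(p, head) + occurrences(p, text)
                    for p in phrases)
        if score >= threshold:
            assigned.append(category)
    return assigned

print(categorize("Central bank raises interest rate",
                 "The central bank raised the interest rate by 25 basis points."))
print(categorize("Company news", "Firm A plans to acquire Firm B."))
```

The second call illustrates the rigidity discussed below: a single occurrence of "acquire" does not reach the threshold, so the story is left uncategorized unless someone adjusts the rules by hand.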

The American Petroleum Institute also exemplifies the knowledge engineering approach. Here, the document collection consists of abstracts containing detailed technical information. The units of analysis were abstracts, which were subjected to stemming and to analysis at phrase level because of the large proportion of phrases within the chemical domain. The set of rules was to a large extent built around the thesaurus applied to the collection. For instance, cross references in the thesaurus functioned as rules pointing natural language terms to the preferred terms in the thesaurus (Martinez, Lucey & Linder, 1987). As with Hayes & Weinstein (1990), the evaluation reported for this system is limited. In the paper by Martinez et al., the performance of the knowledge engineering approach is evaluated in terms of percentages of hits and noise. The evaluation reports a hit rate of approximately 50%, which cannot be considered impressive. In addition, noise is reported to comprise about 15% of the retrieved documents.
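The idea of thesaurus cross references functioning as rules can be sketched as follows. The thesaurus fragment is hypothetical, and stemming is left out for brevity; the sketch only shows how phrase-level matching (longest phrase first) combined with USE references translates natural language terms into preferred terms.

```python
import re

# Hypothetical thesaurus fragment. USE references point non-preferred,
# natural language terms to the preferred term of the thesaurus.
PREFERRED = {"petroleum", "drilling platforms", "catalysts"}
USE = {
    "crude oil": "petroleum",
    "offshore rigs": "drilling platforms",
    "zeolite": "catalysts",
}
MAX_PHRASE = 3  # longest phrase (in words) the rules look for

def assign_preferred_terms(abstract):
    """Scan the abstract left to right, preferring the longest matching
    phrase, and translate every match into a preferred thesaurus term."""
    words = re.findall(r"[a-z]+", abstract.lower())
    assigned = set()
    i = 0
    while i < len(words):
        step = 1  # advance one word if nothing matches here
        for n in range(min(MAX_PHRASE, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in PREFERRED or phrase in USE:
                assigned.add(USE.get(phrase, phrase))
                step = n
                break
        i += step
    return assigned

print(sorted(assign_preferred_terms(
    "Zeolite catalysts used to crack crude oil on offshore rigs")))
```

Every variant a rule should recognize must be listed by hand, which hints at why the rule base of the real system grew so large.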

The manual and at times rigid elaboration of rules in the knowledge engineering approach has turned out to be expensive and time-consuming. To illustrate, the solution of the American Petroleum Institute contained about 14,000 rules (Martinez, Lucey & Linder, 1987, p. 162). In addition, the preparation of term profiles has been somewhat challenging. Further, attempts to establish the relation between document terms and controlled terms yielded weak results in early studies (Apté, Damerau & Weiss, 1994; Lancaster, 2003, p. 288). As a consequence of these challenges, alternative solutions were sought, and the machine learning approach emerged during the 1990s (Sebastiani, 1999; 2002). In the machine learning approach, a classifier is also built for each category. Essentially, the process consists of three stages. First, a number of documents are categorized manually into a set of predefined categories. The selected documents serve as training documents. Preferably, the training documents already exist in the collection to be classified; alternatively, artificial documents may be constructed. Next, a classifier is constructed on the basis of characteristics of the training documents. A learner forms the basis for building the classifier. The learner will usually be available in advance. If a learner does not exist, some effort must be put into constructing one, since the learner largely determines the ultimate effectiveness. It is in this second step of the categorization process that the manual production of rules in the knowledge engineering approach is replaced by


machine learning. A number of techniques exist for building the classifier. Examples include multivariate regression models, nearest neighbour classifiers, probabilistic Bayesian models, neural networks, symbolic rule learning, and Support Vector Machines (Dumais et al., 1998). Explaining each technique in detail is beyond the scope of the present work, but thorough reviews can be found in Dietterich (1997) and Kotsiantis, Zaharakis & Pintelas (2006). The third and final step of the machine learning approach to categorization consists of applying the classifier to the full collection of documents (cf. Sebastiani, 2002; Golub, 2006, p. 352-353).
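The three stages can be made concrete with a small Python sketch. Of the techniques listed above, it uses a multinomial naive Bayes learner with add-one smoothing, chosen only because it is compact; the training documents and categories are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

# Stage 1: training documents categorized manually (hypothetical examples).
TRAINING = [
    ("Deadline for filing the annual income tax return", "income tax"),
    ("Deductions reduce taxable income for employees", "income tax"),
    ("Import duties apply to goods from outside the EU", "customs"),
    ("Goods must be declared to customs at the border", "customs"),
]

def train(examples):
    """Stage 2: the learner builds the classifier from characteristics of
    the training documents, here word counts per category."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    for text, category in examples:
        doc_counts[category] += 1
        word_counts[category].update(tokenize(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return doc_counts, word_counts, vocabulary

def classify(model, text):
    """Stage 3: apply the classifier to an unseen document and return the
    category with the highest log-probability."""
    doc_counts, word_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best, best_score = None, -math.inf
    for category in doc_counts:
        total_words = sum(word_counts[category].values())
        score = math.log(doc_counts[category] / total_docs)
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothing avoids zero probabilities
            count = word_counts[category][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best, best_score = category, score
    return best

model = train(TRAINING)
print(classify(model, "Which deductions apply to my income?"))
```

No rules are written by hand: replacing the training examples is enough to obtain a classifier for a different set of categories.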

Text categorization is either hard or ranked. Hard text categorization basically denotes a fully automated procedure, while ranked text categorization involves approval by a human indexer (Sebastiani, 2002). Thus, ranked text categorization is basically a semiautomatic approach, which will be explored further in section 5.5.
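The difference between the two modes can be sketched as two ways of using the same classifier scores. The scores, threshold, cut-off and the `approve` callback standing in for the human indexer are all hypothetical.

```python
def hard_categorize(scores, threshold=0.5):
    """Hard categorization: every category whose score reaches the
    threshold is assigned automatically, without human review."""
    return sorted(c for c, s in scores.items() if s >= threshold)

def ranked_categorize(scores, top_k=3):
    """Ranked categorization: categories ordered by decreasing score; the
    top candidates are handed to a human indexer."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [c for c, _ in ranked[:top_k]]

def semiautomatic_index(scores, approve, top_k=3):
    """Keep only the ranked candidates that the indexer approves."""
    return [c for c in ranked_categorize(scores, top_k) if approve(c)]

# Hypothetical classifier scores for a single document
scores = {"income tax": 0.81, "pensions": 0.55, "customs": 0.12, "vat": 0.40}

print(hard_categorize(scores))
print(ranked_categorize(scores))
print(semiautomatic_index(scores, approve=lambda c: c != "pensions"))
```

In the hard mode the system alone decides that "pensions" applies; in the ranked mode the indexer rejects it and instead accepts "vat", which fell below the automatic threshold.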

Machine learning as an approach to categorization has been thoroughly tested in different studies (see e.g., Cunningham, Littin & Witten, 1997). The tests have investigated one or several of the techniques mentioned above from a system-oriented perspective. Core examples include Apté, Damerau & Weiss (1994), Chen (1995), and Dumais et al. (1998). However, in the present work we are concerned with the usefulness of automatic indexing from a user perspective. Therefore, the review below focuses on studies that have incorporated users in their evaluation. The aim is to establish a frame of reference for the results found in the search test.

Some authors have investigated particular government Web pages. The GovStat Project (http://www.ils.unc.edu/govstat) has given rise to a number of studies relevant to the topic of the present section. The project is concerned with a specific kind of information within the public domain: US governmental statistical information. However, the main focus of the project is on users' access to and use of governmental statistical information, which is why some of the studies provide valuable insight into the domain of e-government. Below we present three studies from the GovStat project, namely the studies by Efron et al. (2004) and Kules & Shneiderman (2004; 2005). We finish this section with a review of Roitblat, Kershaw & Oot (2010).

Efron et al. (2004) carried out a study of machine learning within the context of the GovStat project. The purpose of the study was to compare three representations of documents: keywords, titles, and the full text of the documents. The study consisted of two phases. The first phase clustered 1,279 content-rich documents using k-means. The clusters were generated on the basis of either the full text of the documents, the documents' titles, or human-generated keyword metadata. Each document could appear in only one cluster. The purpose was to identify the topics of the documents. The quality of the three approaches was evaluated as part of the first phase. Ten categories were generated on the basis of phase 1.
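The clustering step of phase 1 can be illustrated with a minimal k-means sketch in Python. It is not the authors' implementation: the texts are invented, documents are represented as length-normalized term-frequency vectors, and the first k documents serve as initial centroids, a simplification made for brevity. As in the study, each document ends up in exactly one cluster; the same function could be fed titles or keyword metadata instead of full text to compare representations.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Length-normalized term-frequency vector stored as {term: weight}."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {t: v / norm for t, v in counts.items()}

def distance(a, b):
    """Euclidean distance between two sparse vectors."""
    terms = set(a) | set(b)
    return math.sqrt(sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in terms))

def centroid(vectors):
    """Mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: w / len(vectors) for t, w in total.items()}

def kmeans(texts, k, iterations=10):
    """Hard k-means: every document ends up in exactly one cluster."""
    vectors = [vectorize(t) for t in texts]
    centroids = vectors[:k]  # simplistic seeding with the first k documents
    assignment = []
    for _ in range(iterations):
        assignment = [min(range(k), key=lambda c: distance(v, centroids[c]))
                      for v in vectors]
        new_centroids = []
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            new_centroids.append(centroid(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return assignment

texts = [
    "income tax return deadline",
    "customs duties on imported goods",
    "income tax deductions",
    "customs declaration of goods",
]
print(kmeans(texts, k=2))
```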

In the second phase, the remaining 14,000 documents of the collection were labelled using automatic classification. Before the documents were classified, a classifier had been trained on the basis of the topics identified in phase 1. Four models were considered for the classifier: probabilistic Rocchio, naive Bayes (below: limited model), support vector machines, and an augmented model that applied naive Bayes to a training set extended with supplementary documents from the www domain in question (below: extended model). Based on an analysis of the accuracy of the four models' classifications, the second phase compared the two versions of the naive Bayes classifier. Eleven human judges working on the GovStat project tested the generality of the two remaining classifiers.

The analysis demonstrated that if the success of the classifiers was measured by their ability to classify documents correctly in either first or second place, the extended model performed better than the limited model. However, if success was measured by the two models' ability to classify documents correctly in first place, the limited model performed better. Further, when compared to human assignments to the classes, naive Bayes classification tends to distribute documents more evenly between the classes.
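The two success criteria correspond to accuracy at rank 1 and accuracy at rank 2. The Python sketch below computes both for two hypothetical classifiers; the data are invented and only chosen so that the pattern resembles the one reported, they are not the study's figures.

```python
def accuracy_at_k(rankings, gold_labels, k):
    """Share of documents whose correct class is among the first k classes
    proposed by the classifier."""
    hits = sum(1 for ranking, gold in zip(rankings, gold_labels)
               if gold in ranking[:k])
    return hits / len(gold_labels)

# Invented toy data: four documents with their correct classes and the
# ranked class proposals of two hypothetical classifiers.
gold = ["a", "b", "c", "a"]
limited = [["a", "b"], ["b", "c"], ["a", "c"], ["b", "c"]]
extended = [["b", "a"], ["c", "b"], ["c", "a"], ["c", "a"]]

for name, rankings in [("limited", limited), ("extended", extended)]:
    print(name, accuracy_at_k(rankings, gold, 1), accuracy_at_k(rankings, gold, 2))
```

With these numbers the limited classifier wins at rank 1 (0.5 against 0.25), while the extended one wins when the second place also counts (1.0 against 0.75), showing how the choice of criterion can reverse a comparison.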

5.5 Hybrid types of intellectual and automatic indexing 

So far we have presented what are considered prototypical approaches to automatic indexing. In reality, however, hybrid forms of indexing also appear. Computer assisted manual indexing refers to the process where elements of manual and automatic methods are combined in order to handle indexing. In the literature, computer assisted manual indexing may also be referred to as computer aided indexing (e.g., Lancaster, 2003), semiautomatic indexing (e.g., Fangmeyer, 1974), machine aided indexing (e.g., Milstead, 1992), or simply MAI.

Two basic approaches to computer assisted manual indexing exist. One 

approach is labelled candidate term systems. In essence, candidate term systems 

suggest terms for assignment that are subsequently approved by human indexers 

(Milstead, 1994, p. 579; Lancaster, 2003, p. 292). This kind of MAI is represented in a

system for indexing at NASA (the NASA Lexical Dictionary (NLD)) (Silvester, 


Genuardi & Klingbiel, 1994). In NLD, an automatic indexing procedure is carried out, after which controlled index terms are presented to indexers for manual approval. The system was tested for its effect on the indexing process of human indexers. One result of the implementation of MAI was that the average number of index terms assigned to documents had been reduced, resulting in increased uniformity in the indexing. Also, most of the indexers were able to save time thanks to the index terms suggested by the system. This corresponds to the results of a later study by Berrios, Cucina & Fagan (2002), who found that the number of indexed documents increased along with the degree of automation in the test system. Lastly, Silvester & Klingbiel's (1993) work indicated that the selection of index terms had improved in quality, since the indexers did not need to spend their time looking up terms in the controlled indexing language.

The second approach supplements human indexing by means of some sort of automatic procedure. Here, index terms assigned by humans (or similar human input) are taken as the point of departure for subsequently adding index terms (Milstead, 1994, p. 579-80; Lancaster, 2003, p. 291). In this sense, the approach corresponds to the text categorization mentioned above. When text categorization takes the form of semiautomatic indexing, a ranked list of potentially relevant categories is presented to the indexer for approval (Sebastiani, 2002). One example of this approach is the MedIndEx project presented by Humphrey (1989). The project was carried out within the National Library of Medicine and is based on Medical Subject Headings (MeSH) and the literature found in Medline. In MedIndEx, a detailed system of predefined frames, facts, and rules guides the automatic analyses of documents in the system. These tools form the human input to the automatic part of the indexing procedure. However, though Milstead (1994, p. 580) mentions it as an example of supplementing human indexing, MedIndEx also shares some characteristics with candidate term systems, since it involves indexers in approving or rejecting the suggestions provided after the automatic procedures have been carried out.

Hodge (1994) characterizes MAI along a continuum according to the degree of support provided by an indexing aid, ranging from no computer support (basically manual indexing) to fully automatic indexing. At the lowest level of machine support, we find support of clerical activities. Examples are locating index terms and entering terms in machine-readable form. Tools for this type of support comprise thesauri and other kinds of controlled indexing languages. Next follows support for quality control, which may take different forms. In general, this type of support checks the manual input of indexers, ranging from spelling corrections to suggestions of candidate preferred terms when invalid terms are entered. The last step of the continuum also supports the intellectual activity of selecting terms. One way would be to prompt the indexer for index terms, e.g., in relation to other terms entered by the indexer. Another would be to remind the indexer of required elements in the indexing process. In general, semiautomatic indexing can be useful in the introductory stages of fully automatic indexing, allowing a manual review of suggested index terms. The hybrid between manual and automatic indexing can also be applied with the purpose of enhancing manual indexing.
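The quality control level of Hodge's continuum can be sketched with Python's standard `difflib` module. The controlled vocabulary, the USE references and the similarity cut-off of 0.8 are hypothetical; the sketch only shows how a manually entered term might be accepted, redirected to its preferred term, or answered with spelling suggestions.

```python
import difflib

# Hypothetical controlled vocabulary with USE references
PREFERRED = ["income tax", "property tax", "value added tax", "customs duties"]
USE = {"vat": "value added tax", "tariffs": "customs duties"}

def check_term(term):
    """Quality control of a manually entered index term: accept valid
    terms, redirect non-preferred terms, and suggest close matches for
    invalid (e.g. misspelled) terms."""
    term = term.strip().lower()
    if term in PREFERRED:
        return ("valid", [term])
    if term in USE:
        return ("non-preferred", [USE[term]])
    suggestions = difflib.get_close_matches(term, PREFERRED, n=3, cutoff=0.8)
    return ("invalid", suggestions)

for entered in ["income tax", "VAT", "incme tax"]:
    print(entered, check_term(entered))
```

The final decision stays with the indexer; the aid only flags the input and proposes candidates.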

5.6 Summary 

The present chapter has presented the concept and process of indexing. We have seen that indexing can be characterized in a number of ways. The number of variables outlined throughout the chapter highlights a basic premise of empirical evaluations of indexing, namely the challenge of controlling variables.

We have outlined the characteristics of manual and automatic indexing, as well as hybrid types of indexing. We have seen that, irrespective of the type of indexing carried out, pros and cons can be identified. Automatic and semiautomatic methods for indexing have been tested in a variety of settings. The introduction of automatic, assigned methods has allowed for automatic production of controlled indexing, which is highly desirable given the amounts of documents produced today. Automatic, assigned indexing allows for controlled indexing free of the consistency problems of manual indexing.

As appears from the reviews in the automatic part of the chapter, the web or parts of it have been the subject of investigation in many studies. In the search test of the present Ph.D. study, we investigate the applicability of automatic, assigned indexing in a particular test setting, namely an intranet. This involves a considerably smaller number of documents than the Web. Further, automated approaches have been tested on e-government subfields with promising results. However, we are not aware of studies testing the methods on a collection of documents embracing the entire range of document types applied in e-government. This will be the aim of the search test that follows.


6 Empirical framework 


We have previously presented the methodological standpoint of the thesis. In the present chapter, we move on to report on the empirical methods applied in the two studies constituting the research project: the domain study and the search test. The presentation follows the sequence in which the individual studies were actually carried out. This means that we begin with the domain study, including the questionnaire and focus group interview designs. Next follows the design of the search test. The chapter closes with a section explaining the relation between the collected empirical data and the research questions put forward in the introductory chapter.

6.1 Domain study 

The case study begins with a domain study. As explained, the purpose of the domain study is to identify and account for the contextual framework of the search test as regards the e-government domain. The domain study consists of two separate parts: an analytical and an empirical part. The analytical part has been reported in the information seeking review (Chapter 4). The empirical part comprises two elements: a survey questionnaire followed by focus group interviews. To distinguish between the two, we will use the term respondent to denote a questionnaire participant and the term participant to denote a focus group participant in the remainder of the thesis.

The aim of combining different types of data collection methods is to compensate for the inherent limitations of the individual methods, since the strengths and weaknesses of each method affect the outcome of the data collection. The combination of different research methods in order to explore a research problem is commonly known as method triangulation. The order and types of methods applied for triangulation may vary. Miles & Huberman have identified four overall successions of research methods. The succession may, for instance, start out with quantitative methods followed by qualitative methods, start with qualitative methods followed by quantitative methods, or employ both types of methods in parallel. The choice of succession depends on the purpose of the study carried out (Miles & Huberman, 1994, p. 41). In the domain study, we followed the first of these successions: quantitative data followed by qualitative data. This succession helps the researcher gain an overview of important phenomena in the first part of the data collection (in the present work through a questionnaire). The quantitative data collection is subsequently followed up by qualitative data collection (in the present work through focus group interviews) to provide insight into and understanding of the patterns identified in the quantitative data.

The survey questionnaire was distributed to a sample of the employees in the case organization. The purpose of the questionnaire was to gain insight into the distribution of work tasks, information needs, and metadata preferences across the case organization. Further, we wanted to investigate whether there was a dependency between work tasks and seeking behaviour, as this could influence the choice of test persons for the search test. Questionnaires have both strengths and weaknesses. Among the strengths, the costs are lower than for other methods, as the researcher is not present during data collection. The analysis of data is also less time-consuming (Frankfort-Nachmias & Nachmias, 1996). This is particularly the case when using Kalus (see section 6.2.1) for data collection, as the system allows the researcher to export the results of the survey directly into an Excel spreadsheet, which can subsequently be imported into analysis software, e.g., SPSS. Further, bias may be reduced due to the lack of interaction between interviewer and interviewee, and due to the high degree of anonymity (Frankfort-Nachmias & Nachmias, 1996). In both cases, the reduction of bias is ascribed to the absence of an interviewer. The presence of an interviewer may result in poor communication between interviewer and interviewee. The skills of the interviewer may also influence the results when conducting interviews. Furthermore, the presence of an interviewer can make the respondent's answers less honest, because the respondent feels less anonymous. However, questionnaires also have weaknesses. In questionnaires, it is highly important that questions are understandable to the respondents, since the researcher is not present to explain their meaning. In this manner, the researcher's absence is at the same time a strength and a weakness of questionnaires. Hence, the importance of understandable and unambiguous questions in questionnaires cannot be emphasized enough. Further, a common problem in questionnaires is low response rates (Frankfort-Nachmias & Nachmias, 1996).

The second, qualitative part of the domain study consisted of seven focus group interviews. Focus groups have a number of characteristics that imply strengths or weaknesses in their function as a tool for data collection. A distinctive feature of focus groups is the synergy arising from the interaction between


the participants. This is a strength when it results in a more thorough discussion than can be achieved in an individual interview. On the other hand, the presence of other participants may cause censoring and conformity among the participants, which is not desirable (Carey & Smith, 1994, p. 124). Having several interviewees present at the same time further enables the interviewer to ask participants to compare experiences and understandings, which again enriches the understanding of the individual in the group (Morgan, 1996, p. 139). In terms of resources, the method provides data quickly and at lower cost compared to individual interviews (Walden, 2006, p. 224). The combination of these features led us to choose focus group interviews over individual interviews for the qualitative exploration of the survey results.

The focal point of the focus group interviews was the results of the questionnaire. Thus, we wanted to present the participants with the patterns identified in the questionnaire results in order to encourage elaboration and discussion in the group. In the following sections, we elaborate on the methods applied in the domain study.

6.2 Questionnaire design, collection, and analysis 

A questionnaire was used as the quantitative part of the domain study to get an overall view of the distribution of employees, work tasks, and seeking behaviour across the case organization. Data collection for the survey lasted one week and took place between 11th and 18th December 2008. We kept a rather short time window for the investigation because we hypothesized that most people who respond to a questionnaire do so rather quickly, while they still remember having received an invitation to participate. As expected, the majority of responses were received within the first two days after the launch of the survey. An invitation to participate was distributed by e-mail (see Appendix 3) to a stratified sample of the employees. The e-mail explained the background of the investigation. Following a link in the e-mail, the respondents were taken to the online survey. After 5 days, an e-mail was sent to remind the potential respondents of the survey. We settled on one reminder in order to avoid annoying the respondents (cf. Cook, Heath & Thompson, 2000, p. 831).

In Chapter 2 we introduced SKAT's business model, which comprises and describes all work tasks handled by the employees, external as well as internal (see Appendix 1). The condensed work tasks and the main processes of the business model formed the basis for the questionnaire and the recruitment for the focus groups, respectively. As emphasized earlier, the tasks handled by SKAT are highly diverse as regards both their topics and their form. In response to this, we hypothesized that different work tasks might generate differences in seeking behaviour. To test this hypothesis, we used the work tasks from the business model as the focal point of the questionnaire. Thus, each respondent answered clarifying questions about the work tasks relevant to them. The questionnaire consists of two overall parts: one, common to all respondents, identifying background data, and a second part identifying the work tasks handled by the respondents. In the second part of the questionnaire, the respondents answered a number of questions exploring the seeking behaviour triggered by their work tasks. We will elaborate further on this below.

6.2.1 Technique and structure 

The questionnaire was developed using the software Kalus (http://www.kalus.dk). The questionnaire mainly consisted of pre-coded (or closed-choice) questions, which have a finite range of answers for the respondent to choose from when responding to a question. The questionnaire also contains a few open-ended questions. Open-ended questions were included in order to allow respondents to clarify or supplement the answer to a preceding pre-coded question. The strength of using pre-coded questions is that responses are not subject to potential misinterpretation before they can be compared and counted. This is the main reason for the predominance of these questions in the questionnaire. However, it should be kept in mind that a common problem with this type of question is that respondents have no opportunity to elaborate on their answers (Buckingham & Saunders, 2004; de Vaus, 2002b). The choice of primarily pre-coded questions and mandatory answers was closely tied to the purpose of the questionnaire and its relation to the overall research questions. Thus, the overall purpose of the questionnaire was to provide an overview of the distribution of work tasks and information seeking across the organization. More detailed elaboration and explanation of the questionnaire results were to be pursued in the focus groups. With this in mind, it was reasonable to support the overview function of the questionnaire by mainly pre-coded questions and prompted answers. When applying this approach to a questionnaire, pilot testing becomes even more important (de Vaus, 2002b). Further, research shows that the wording of questions may have an impact on the outcome of surveys (e.g., Olsen, 1997). The introduction to a question also affects how the question is answered (Clark & Schober, 1992).


When designing the questionnaire, we wanted to take into account the considerable sensitivity of respondents to the use of language. One way of doing this is to aim at making the questions as precise as possible, for instance by using probes or incorporating cognitive reliefs into the questions (Olsen, 1997, p. 300). Our aim was to reduce the degree of uncertainty by elaborating on the questions and on the response options available to the respondents. The wording of the questionnaire was tested ahead of the data collection. We elaborate on the pilot test and pretesting in section 6.2.4.

Contingency questions were used to direct respondents to the questions relevant to them (de Vaus, 2002b). In the questionnaire, all questions about work tasks functioned as contingency questions, guiding the respondents to the questions relevant to the work task in question. The purpose was to make sure respondents only reported seeking behaviour regarding their actual work tasks. Further, contingency questions increase the likelihood that respondents finish the survey, as the cognitive complexity is reduced (Shropshire, Hawdon & Witte, 2009, p. 356). In most questions we prompted for answers, i.e., answering was mandatory. This choice is debatable. Optional answers have the advantage of not forcing the respondent to respond to a question. At the same time, optional questions tend to be skipped when respondents work through the questionnaire (cf. Evans & Mathur, 2005, p. 200). Prompted answers, on the other hand, may cause respondents to give up and not finish the questionnaire. After careful consideration, we chose to prompt for answers in order to make sure that the important questions were answered and to avoid having to leave out too many responses due to incompleteness.
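The routing logic of contingency questions and prompted answers can be sketched in Python. This is not how Kalus implements it; the work task names and question wordings are invented, and the structure is heavily reduced compared to the real questionnaire. The selection of work tasks acts as a filter, so only the question pages for the respondent's own tasks are placed on the route, and a check lists the mandatory questions still unanswered.

```python
# Hypothetical, heavily reduced questionnaire: the real questionnaire had
# 19 work tasks with six questions each; names here are invented.
WORK_TASKS = ["tax assessment", "debt collection", "customs control"]
TASK_QUESTIONS = ["How often do you search for information?",
                  "Which sources do you use?"]

def build_route(selected_tasks):
    """Contingency logic: the work task selection acts as a filter question,
    so only the question pages for the respondent's own tasks are shown."""
    route = ["background data"]
    for task in WORK_TASKS:
        if task in selected_tasks:
            route.extend(f"{task}: {q}" for q in TASK_QUESTIONS)
    return route

def missing_answers(route, answers):
    """Prompted (mandatory) answers: return the questions on the route that
    are still unanswered, so the respondent cannot move on without them."""
    return [q for q in route if not answers.get(q)]

route = build_route({"debt collection"})
print(route)
print(missing_answers(route, {"background data": "filled in"}))
```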

6.2.2 Content 

The questionnaire contains six questions for each work task. The six questions are replicated for all of the nineteen condensed work tasks included in the questionnaire, resulting in a 75-page questionnaire. Due to the contingency character of the questions regarding work tasks, not all pages were presented to every respondent. Before getting to the work tasks, the respondents were asked to provide background data. The questionnaire ended by thanking the respondents for their participation, time, and contribution.


6.2.2.1 Background data 

The questionnaire opened with a number of demographic questions. We refer to these as background data. The questions cover the respondents':

year of birth (1a), 

gender (1b), 

most recent completed education (2a),

title of education (2b), 

place of employment (3a), 

departmental affiliation (4-10), and 

length of service in the organization (11) 9 

The purpose of the questions is to enable testing for the impact of demographic characteristics on seeking behaviour. Further, the background data is needed in order to check the degree to which the sample reflects the population from which it was drawn.

6.2.2.2 Work tasks 

Ahead of the design of the questionnaire, we hypothesized that seeking behaviour could differ according to the work task in question, with regard to both the subject and, in particular, the complexity of the work task. The assumptions were based on Byström's findings of the correspondence between work task complexity and seeking behaviour (e.g., reported in Byström & Järvelin, 1995; Byström, 1997) (see section 4.4.5). The generic work tasks described by SKAT do not address complexity in Byström's terms as such. Rather, they are topical descriptions of the areas of responsibility of the organization. Despite the difference in definitions, SKAT's descriptions of work tasks were used as the basic foundation of the questionnaire. Further, in combination with the information need types (see section 6.2.2.3), they do give us an impression of the complexity of the work task. There were several reasons for this decision. The large diversity of tasks, mentioned previously, is a core characteristic of the organization. Identifying information seeking characteristics on the basis of work tasks provided data that could inform us about potential (and expected) differences in seeking behaviour among the work tasks. We needed this knowledge for two purposes. The main purpose was to be able to answer the research questions

9 The parentheses refers to the question numbers (see Appendix 4). 

116

117 

Chapter 6 

concerned with seeking behaviour. Secondly, we wanted to use the data for the 

selection of test persons for the search test. For this secondary purpose, we wanted to explore the use of the intranet for different work purposes. Specifically, we wanted to identify potential variations in how intensively the intranet is used. Lastly, the work task descriptions provided a standardized framework of the work areas covered by SKAT, phrased in a language familiar to the respondents. The work tasks are represented on pages 12, 16, 38, 45, 49, and 65 of the questionnaire (see Appendix 4). In sum, 19 work tasks are included, distributed among six main processes. Probes were considered particularly important in the questions regarding work tasks. Because the selection or deselection of work tasks strongly influences the results, it was important that the respondents were able to recognize their own work tasks in the generic descriptions. For that reason, we used the probes to give examples of the subtasks contained in each overall work task.

6.2.2.3 Elaboration of work tasks 

The work tasks were elaborated on through six questions. The first, frequency of work tasks, was considered relevant for exploring the relation between work tasks and information seeking (see question 15, Appendix 4). For the case organization, this question was particularly relevant because a share of the work tasks are highly seasonal. The frequency also gave insight into the relative importance of the work task in question, and it thereby made it possible to examine whether the frequency affects other aspects of seeking behaviour. Next came the respondents' experience with the work task (see question 16, Appendix 4). This question was included because experience was expected to influence the respondents' seeking behaviour, e.g., their selection of sources and their frequency of information seeking.

The third question regarded the frequency of information seeking (see question 17, Appendix 4). The rationale for asking this question was that some of the work tasks might tend to generate information seeking more often than others. With this question, we wanted to explore whether the outlined work tasks differed in how information-demanding they were. We defined the answer categories in terms of frequency (every time, every second time, every 3rd or 4th time, or practically never) instead of using less exact alternatives such as almost always, sometimes, once in a while, and the like. We are aware that, in practice, information seeking does not occur at the fixed intervals suggested by the listed answer categories, and this may have confused the respondents. Further, the responses were expected to express average frequencies. On the other hand, we considered the alternative (e.g., often, rarely, etc.) too semantically open. The challenge of semantically open categories lies in the interpretation of results, which becomes less exact.
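One advantage of fixed-interval categories is that they can be read as approximate proportions, which is not possible with "often" or "rarely". The following is a hypothetical sketch of such a reading and is not part of the analysis reported in the thesis. It assumes that "every 3rd or 4th time" corresponds to the midpoint of 1/3 and 1/4, and that "practically never" corresponds to zero. The names `SEEKING_SHARE` and `mean_seeking_share` are illustrative.

```python
# Hypothetical numeric reading of the answer categories of question 17.
# Assumption: "every 3rd or 4th time" is read as the midpoint of 1/3 and 1/4,
# and "practically never" is treated as zero.
SEEKING_SHARE = {
    "every time": 1.0,
    "every second time": 0.5,
    "every 3rd or 4th time": (1 / 3 + 1 / 4) / 2,  # 7/24, roughly 0.29
    "practically never": 0.0,
}


def mean_seeking_share(answer_counts):
    """Approximate share of task instances that trigger information seeking.

    answer_counts maps each answer category to the number of respondents
    who chose it for one work task.
    """
    total = sum(answer_counts.values())
    if total == 0:
        raise ValueError("no answers for this work task")
    weighted = sum(SEEKING_SHARE[cat] * n for cat, n in answer_counts.items())
    return weighted / total


# Hypothetical answers for a single work task.
print(mean_seeking_share({"every time": 4, "every second time": 2, "practically never": 4}))  # 0.5
```

Because "every 3rd or 4th time" covers a range, any single number assigned to it is a simplification. This reflects the caveat above that real information seeking does not follow fixed intervals.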

Fourth came information sources (see question 18, Appendix 4). The selection of information sources reflects aspects of how a work task is dealt with, and this was the main reason for including the question in the questionnaire. Further, the question about information sources was intended to identify the relative importance of the intranet compared to other information sources. Finally, with this question we wanted to find out whether the importance of the intranet differed depending on the work task in question. This could point to particular work tasks of relevance when identifying test persons for the search test. Because the work tasks handled in the case organization are quite diverse, we listed some information sources but also allowed the respondents to add missing sources in an open field. In this way, we were able to get a comprehensive picture of the use of information sources without having to list many sources that might not be relevant to the majority of respondents. The question was constructed so that the respondents could choose the sources relevant to them, whether one or more. The question was measured in terms of dichotomous variables, since this enables us to compare the results with prior results. In the light of the changes of direction in information seeking studies mentioned in section 4.2, one could dispute the relevance of the questions on information sources in the questionnaire. On the other hand, seeking studies that include sources of information continue to find their legitimacy (e.g., Davies, 2007; Makri, Blandford & Cox, 2008a; Connaway, Dickey & Radford, 2011; Lu & Yuan, 2011). In addition, more recent studies and models of information seeking also involve the aspect of information sources, although with another focal point (e.g., Byström & Järvelin, 1995; Byström, 1999). In the present study, there was one significant reason for investigating the use of sources. We wanted to map the intranet of the case organization, since it is the object of the evaluation of indexing methods later in the thesis. By mapping the intranet along with other information sources, we wanted to show the relative importance of the intranet with respect to its function, strengths, and weaknesses as experienced by the organization's employees. A side effect of the question was the possibility of mirroring the landscape of information seeking in the organization.

The fifth question measured the information needs that emerge when dealing with a work task (see question 19, Appendix 4). With the variable 'information need', we wanted to discover whether the identified work tasks of the organisation tend to generate certain types of information needs. However, it may be complicated to represent the variable by the theoretical concepts themselves. This is particularly difficult in view of the problem of respondents' sensitivity to the formulation of questions, which was discussed above. We therefore represented the information needs with eight indicators of different information needs. The rationale is that respondents find it easier to relate to an indicator than to a theoretical concept.

The decision about which theoretical basis to use for operationalizing the information needs was strongly influenced by the method selected for data collection. An obvious choice would have been the recent typology of information needs proposed by Ingwersen & Järvelin (2005). However, that proposal contains eight different types of information needs, which would be difficult to operationalize in a form understandable to the respondents. Instead, we used the trichotomy suggested by Ingwersen (1992, pp. 116-117), which distinguishes: 1) verificative needs (VN), 2) conscious topical needs (CTN), and 3) muddled topical needs (MTN).

Indicator  Description  Corresponding information need
1  I know exactly which documents I need in order to solve the work task  VN
2  I need to find a document I have used before  VN
3  I pretty much know which documents exist on the subject  CTN
4  I am working with a new project within a subject area well known to me. I would like to acquaint myself with the part that is new to me  MTN
5  I am looking for documents for a new work task within a subject area that is familiar to me  CTN
6  I am working with a subject area that I have not been working with before  MTN
7  I know the subject well but need a specific piece of [...]  CTN

Table 6.1 Indicators of information needs in questionnaire and corresponding theoretical descriptions


When a user has a verificative information need, he wants to locate an item or piece of information for which some kind of bibliographic information is known. Conscious topical needs cover information needs in which the user wants to discover aspects of a subject matter known to her. Both verificative and conscious topical needs are associated with strong cognitive structures; that is, the uncertainty of the information user is low. Muddled topical needs cover a situation in which a user wants to discover concepts and relations within a subject area not well known to him. In this latter type of information need, the cognitive structures regarding the topic in question are weaker. Ingwersen's (1992) trichotomy of information needs allowed each of the distinct information needs to be represented by more than one indicator without overloading the respondents with statements to relate to. Between two and three indicators were developed to represent each information need. In line with de Vaus' (2002b) directions, we restricted the number of indicators because of the potential length of the questionnaire. The indicators used in the questionnaire are shown in Table 6.1.
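The operationalization works in two stages. Respondents select indicators, and these indicators are then aggregated into Ingwersen's three need types. The minimal sketch below follows the mapping in Table 6.1 and includes only the seven indicator rows that are recoverable there. The function name `need_profile` and the example selections are hypothetical.

```python
from collections import Counter

# Indicator-to-need mapping following Table 6.1 (only the seven recoverable rows).
INDICATOR_TO_NEED = {
    1: "VN", 2: "VN",
    3: "CTN", 5: "CTN", 7: "CTN",
    4: "MTN", 6: "MTN",
}


def need_profile(selected_indicators):
    """Count how often each need type (VN, CTN, MTN) is indicated.

    selected_indicators is a list of indicator numbers, e.g. pooled over all
    respondents who answered question 19 for one work task.
    """
    profile = Counter()
    for indicator in selected_indicators:
        need = INDICATOR_TO_NEED.get(indicator)
        if need is None:
            raise KeyError(f"indicator {indicator} has no mapping")
        profile[need] += 1
    return profile


# Hypothetical selections for one work task.
print(need_profile([1, 3, 5, 6, 2]))
```

Keeping the mapping in one table, separate from the counting, mirrors the design choice described above. Respondents only see the everyday statements, and the theoretical labels are applied afterwards.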

The sixth and last question concerned preferred metadata (see question 20, Appendix 4). The overall rationale for asking the respondents about preferred metadata was to investigate the elements of an ideal search situation. Further, we wanted to use the results of this question to encourage the participants to explain their present searching behaviour when the results were presented to the focus groups. The question was designed as a list of 13 different metadata elements from which the respondents could choose. The list covered both intrinsic and extrinsic metadata. Intrinsic metadata can be found directly or indirectly in the document, whereas extrinsic metadata designate something external to the document that is still relevant to understanding it. Metadata are usually divided into three types, depending on whether they refer to the content, the context, or the structure of the document (Gilliland, 2008). The thirteen metadata elements included were intended to represent all three types (see Table 6.2). The question finished by giving respondents the possibility of suggesting metadata missing from the list.

Table 6.2 List of respondents' preferred metadata listed in questionnaire

Metadata
1 Target audience (e.g. accountants, employers, divorced, exporters)
2 Superior subjects (from the taxonomy)
3 Subject (description of the specific topic of the document)
4 Name/title of legal text/ruling (e.g. LBK no. 931 as of 18/09/2008)
5 Object of the document (e.g. car, property, stays abroad)
6 Activity (e.g. deposits, assessments, billing, imports)
7 Geographical data (e.g. name of city, country, region)
8 Responsible institution or department (who published the document?)
9 Project (is the document connected to a specific project?)
10 Document type (e.g. ruling, form, guidance)
11 Document number (e.g., journal number, number of rulings, ISBN)
12 Document ID (continuous number attached to documents on the intranet)
13 Work task (searching for colleagues engaged in a particular service or task, regardless of location)

6.2.3 Data collection 

At the time the questionnaire was launched, SKAT had 8679 employees, who made up the population of the survey. We distributed the questionnaire to a sample of this population. A number of advantages are associated with sampling. One is that it reduces time and costs when collecting and analysing results. In addition, samples for the most part reflect the population sufficiently well (Zikmund, 2000). In the present investigation, a sample was preferred over the whole population in order to reduce the amount of time employees spent responding. The questionnaire was distributed to a stratified random sample of the employees within the organization (Levy & Lemeshow, 2008). The strata were constructed on the basis of the departmental affiliation of the employees. Within each stratum, a random sample was drawn in proportion to the relative size of the department. The sample size was set to 799, which is well above the size required for a precision level of ±5% (cf. Israel, 1992).
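The sampling procedure can be illustrated with the following sketch. It is based on two assumptions. First, it uses Yamane's simplified formula, n = N / (1 + N·e²), which Israel (1992) summarizes for a 95% confidence level and P = 0.5. The thesis does not state which formula was applied. Second, it assumes proportional allocation with largest-remainder rounding across strata. The department names and sizes are hypothetical and were chosen only so that they sum to 8679.

```python
import math
import random


def yamane_sample_size(population, precision=0.05):
    """Yamane's simplified formula n = N / (1 + N * e^2), rounded up.

    Assumes a 95% confidence level and maximum variability (P = 0.5).
    """
    return math.ceil(population / (1 + population * precision ** 2))


def proportional_allocation(stratum_sizes, n):
    """Split n across strata in proportion to their sizes (largest-remainder rounding)."""
    total = sum(stratum_sizes.values())
    quotas = {s: n * size / total for s, size in stratum_sizes.items()}
    allocation = {s: math.floor(q) for s, q in quotas.items()}
    shortfall = n - sum(allocation.values())
    # Give the units lost to rounding down to the strata with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - allocation[s], reverse=True)
    for s in by_remainder[:shortfall]:
        allocation[s] += 1
    return allocation


def stratified_sample(frame, n, seed=None):
    """Draw a simple random sample within each stratum of frame (stratum -> list of ids)."""
    rng = random.Random(seed)
    sizes = {s: len(members) for s, members in frame.items()}
    allocation = proportional_allocation(sizes, n)
    return {s: rng.sample(frame[s], allocation[s]) for s in frame}


print(yamane_sample_size(8679))  # 383
print(proportional_allocation({"Dept A": 5000, "Dept B": 3000, "Dept C": 679}, 799))
# {'Dept A': 460, 'Dept B': 276, 'Dept C': 63}
```

Under these assumptions, about 383 respondents would suffice for ±5% precision. A sample of 799 therefore leaves room for non-response. Other formulas in Israel (1992) give somewhat different minimum sizes.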

6.2.4 Pilot testing 

In order to reduce the risk of errors (e.g., Buckingham & Saunders, 2004, p. 84), the questionnaire was pilot tested and pretested before the survey was launched. The questionnaire was discussed with our contact person at SKAT and presented at a research meeting with colleagues at RSLIS. Next, a pilot was carried out among a number of SKAT employees. These were selected according to the same criteria as the stratified sample, but with fewer participants. The purpose of the pilot test was twofold. Firstly, we wanted an impression of how well the recipients felt they understood the questionnaire, and we wanted to invite feedback on potential ambiguities. Secondly, we wanted an indication of the response rate to be expected in the actual survey, which was needed in order to calculate the size of the sample. Further, we wanted to test the questionnaire on a group of people resembling those who would answer the final version of the questionnaire, as recommended by de Vaus (2002b). The pilot questionnaire corresponded to the final questionnaire, except that text boxes had been inserted to invite the pilot respondents' comments on the questions. The pilot was distributed to 89 respondents, of whom 29% finished the questionnaire (see Appendix 5 for further details on the pilot). The feedback from the pilot and pretests was incorporated into the questionnaire before it was distributed to the respondents. Corrections comprised adding options, rewording probes, simplifying the layout, and the like.

6.2.5 Data analysis 

The questionnaire data consisted of scales ranging from categorical to interval level. Categorical data obviously appear when asking about the gender of the respondent. However, the questionnaire also contains a number of questions that allow the respondents to select one or more predefined answers, e.g., regarding information sources (question 18, see Appendix 4). In this case, every option also constitutes a categorical variable, implying that the option has either been selected (=1) or not selected (=0) by the respondent. The large proportion of categorical data in the data set determines how the questionnaire data are analysed. We therefore used nonparametric statistics to analyse the data, since parametric tests require data at interval level (Siegel & Castellan, 1988, p. 33). The questionnaire data were analysed using descriptive statistics. Inferential statistics were carried out as well, but they did not produce results of adequate significance to be reported here.
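The 0/1 coding of multiple-response questions can be sketched as follows. Each listed option becomes its own dichotomous variable. Apart from "intranet", the option labels are hypothetical placeholders for the sources listed in question 18, and the function names are illustrative.

```python
# Hypothetical answer options for a multiple-response question such as question 18.
SOURCE_OPTIONS = ["intranet", "colleagues", "legal database", "other"]


def dummy_code(selected, options=SOURCE_OPTIONS):
    """Turn one respondent's multiple-response answer into 0/1 variables, one per option."""
    chosen = set(selected)
    unknown = chosen - set(options)
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    return {option: int(option in chosen) for option in options}


def selection_counts(respondents, options=SOURCE_OPTIONS):
    """Sum the 0/1 variables over respondents: how many selected each option."""
    rows = [dummy_code(r, options) for r in respondents]
    return {option: sum(row[option] for row in rows) for option in options}


print(dummy_code(["intranet", "colleagues"]))
# {'intranet': 1, 'colleagues': 1, 'legal database': 0, 'other': 0}
```

Each resulting variable is nominal, with only the values selected and not selected. This is why such questions call for counts and nonparametric procedures rather than means and parametric tests.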

The descriptive, univariate analysis of the questionnaire data consists of frequency distributions concerning the respondents and their seeking behaviour. Frequencies are usually reported as percentages, because percentages are easier to read than raw frequencies. Further, percentages are easier to compare than raw frequencies, because the figures have been normalized. However, this normalization expresses each frequency relative to a base of 100, i.e., the frequency is divided by the total number of cases and multiplied by 100. The smaller the sample, the more impact a single unit has when results are reported as percentages (Healey, 2007). Most of the work tasks reported in the questionnaire part of the domain study have fewer than 100 answers. To avoid comparing figures in the univariate part of the analysis that do not reflect the actual responses within a single work task, we report the raw frequencies for all responses. Yet raw frequencies are difficult to compare across two or more groups that do not have the same number of responses. With comparisons of frequencies across work tasks in mind, we also report the percentages in the relevant tables.
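The effect of group size on percentages can be made concrete with a small sketch. In a group of n answers, a single respondent accounts for 100/n percentage points. The function names and example counts below are hypothetical.

```python
def frequency_table(counts):
    """Return raw frequency and percentage (f / N * 100, one decimal) per category."""
    n = sum(counts.values())
    return {cat: (f, round(100 * f / n, 1)) for cat, f in counts.items()}


def points_per_respondent(n):
    """Percentage points that a single respondent accounts for in a group of n."""
    return 100 / n


# Hypothetical: a work task with 12 answers versus all 340 completed questionnaires.
print(frequency_table({"yes": 9, "no": 3}))  # {'yes': (9, 75.0), 'no': (3, 25.0)}
print(round(points_per_respondent(12), 1))  # 8.3
print(round(points_per_respondent(340), 2))  # 0.29
```

In a work task with 12 answers, one respondent more or less shifts a percentage by more than 8 points. This is the reason for reporting raw frequencies alongside the percentages.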

Table 6.3 Cross tabulations carried out on the basis of variables in questionnaire data

Independent variables (rows): Education; Department; Length of service; Periodicity of occurrence of work task; Experience with work task; Frequency of information seeking; Use of information sources

Dependent variables (columns): Frequency of information seeking; Use of information sources; Indicators of information needs; Preferred metadata

Further, whenever relevant, we provide the average percentages in the univariate statistics tables. There are two reasons for this decision. First, some tables are rather comprehensive because of the number of reported values, and in these cases the average percentages can help give an overview of the table's content. Second, average percentages can help clarify whether a certain work task lies above or below the average distribution. Whenever they are reported, the average percentages appear at the bottom of the tables. For the descriptive, bivariate analyses, we carried out cross tabulations of central variables. The exact cross tabulations are listed in Table 6.3. In the table, the columns represent dependent variables, while the rows represent independent variables. That is, we examined the degree to which the independent variables listed in the rows influence the four dependent variables. The results are reported in Chapter 7.
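A cross tabulation with row percentages and an average row at the bottom can be sketched with the standard library. The sketch assumes that the "average percentage" is the unweighted mean of the row percentages. The thesis does not specify this, and a percentage pooled over all respondents would be an alternative. The variable values in the example are hypothetical.

```python
from collections import Counter, defaultdict


def crosstab(pairs):
    """Count respondents per (independent value, dependent value) combination."""
    table = defaultdict(Counter)
    for independent, dependent in pairs:
        table[independent][dependent] += 1
    return table


def row_percentages(table, categories):
    """Express each row of the cross tabulation as percentages of that row's total."""
    result = {}
    for row, counts in table.items():
        n = sum(counts.values())
        result[row] = {c: 100 * counts[c] / n for c in categories}
    return result


def average_row(percentages, categories):
    """Unweighted mean of the row percentages, for a summary line at the bottom of a table."""
    rows = list(percentages.values())
    return {c: sum(r[c] for r in rows) / len(rows) for c in categories}


# Hypothetical responses: (length of service, whether the intranet was selected as a source).
data = [("0-5 years", "intranet"), ("0-5 years", "no intranet"),
        ("6+ years", "intranet"), ("6+ years", "intranet"),
        ("6+ years", "intranet"), ("6+ years", "no intranet")]
cats = ["intranet", "no intranet"]
pct = row_percentages(crosstab(data), cats)
print(pct)
# {'0-5 years': {'intranet': 50.0, 'no intranet': 50.0}, '6+ years': {'intranet': 75.0, 'no intranet': 25.0}}
print(average_row(pct, cats))  # {'intranet': 62.5, 'no intranet': 37.5}
```

In this example, the unweighted average for "intranet" (62.5%) differs from the pooled share (4 of 6, about 66.7%), because each row counts equally regardless of its size. Which of the two definitions is used matters when groups are small.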

6.2.6 Methodical reflections 

In total, 340 respondents completed the questionnaire, a response rate of 42.6%. A further 302 respondents (37.8%) did not log into the questionnaire at all, and 156 (19.5%) of the 799 employees in the sample began the questionnaire but did not finish it (see Appendix 5). One part of the latter group is of particular interest: 66 respondents stopped responding after finishing the questions concerning the work task Inspection. Further, 10.9% of the respondents who completed the questionnaire did not choose any work tasks (see Table 7.2). Different motives for non-response in surveys have been identified (see e.g., Nakash et al., 2008). We do not know the exact reason for non-response in the present survey, but both internal and external motives are conceivable. One internal cause could be that the respondents got bored with the questionnaire, because the same questions kept being repeated when more than one work task was chosen. Some respondents indicated this. Another motive could be that the respondents could not relate to the descriptions of work tasks and therefore ended up choosing none. An external reason could be that the employees in the organization receive questionnaires from time to time. The invited employees might have ignored this particular questionnaire because they felt overloaded with questionnaire surveys. Whether the reasons are internal or external, the number of respondents who quit the questionnaire before finishing it and the number who did not select any work tasks stress the importance of questionnaire design.
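The reported percentages are shares of the 799 employees in the sample. The short sketch below reproduces them from the counts given above. The names `SAMPLE_SIZE`, `outcomes`, and `rate` are illustrative.

```python
SAMPLE_SIZE = 799  # employees in the stratified sample

outcomes = {
    "completed": 340,
    "never logged in": 302,
    "started but did not finish": 156,
}


def rate(count, base=SAMPLE_SIZE):
    """Share of the sample in percent, rounded to one decimal as in the text."""
    return round(100 * count / base, 1)


for label, count in outcomes.items():
    print(f"{label}: {count} ({rate(count)}%)")
# completed: 340 (42.6%)
# never logged in: 302 (37.8%)
# started but did not finish: 156 (19.5%)
```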

Another methodological challenge to the questionnaire data is caused by the design of the questionnaire itself. A central feature of the questionnaire guides the respondents to the specific work tasks they carry out. This allows insight into the characteristics of specific work tasks in the organization. At the same time, however, this feature has had the effect that some work tasks received very few answers (see Table 7.3). Consequently, the reliability of the results for work tasks with few respondents must be considered. We report frequencies and percentages for the univariate statistics, but we treat the results for work tasks with few respondents with caution.

Despite the methodological challenges, the data contain answers from 340 people regarding their seeking behaviour. The respondents represent a stratified sample of a population of 8679 employees, which ensures that many types of employees are represented. The purpose of the questionnaire data was to gain an overview of seeking behaviour across work tasks, and the questionnaire has met this purpose. The subsequent focus groups counterbalance the limitations of the questionnaire.

6.3 Focus group method 

Focus group interviews were included as the qualitative counterpart in the domain study. We refer to the group interviews as focus group interviews in order to mirror Morgan's (1996) definition. He defines focus groups as interviews in which a composite group of people, guided by a moderator, discusses a topic defined by the moderator or researcher. The overall purpose of the focus groups was to validate and elaborate on the survey results. In the following sections, we account for the research method applied in this part of the data collection. We begin by presenting the purpose and design of the focus groups (Section 6.3.1), the questions guiding the focus groups (Section 6.3.2), and their execution and documentation (Section 6.3.3). We finish by explaining the methods used for data analysis (Section 6.3.4).

6.3.1 Purpose and design 

The general intention behind the focus group interviews was to keep restrictions to a minimum and to allow the participants to elaborate, since such elaborations were precisely the purpose of the interviews. On the other hand, we aimed for a fairly tight format for the focus groups to make sure that all subareas were covered by the discussions (cf. Halkier, 2008, pp. 38-41). A slide show and an interview guide were used to retain structure. The slide show was presented to the participants during the interview sessions. Its purpose was to prompt discussion of the questionnaire results among the participants and to encourage them to explain and clarify the underlying information behaviour and the meaning of information in their daily work. An example of the focus group slide shows appears in Appendix 8. In addition, a semi-structured interview guide was used to support and guide the group discussions. The interview guide was not intended to force the questions on the participants. If the participants had other relevant issues they wanted to discuss in relation to the presented results, they were allowed to do so. Rather, the interview guide supported the focus group moderator in case discussions moved too far from the subject in question or the conversation stalled. In this sense, the interview guide served as a supportive tool to ensure that discussions would develop. This also meant that the participants did not necessarily need to answer all the questions.

Main process  Participants
Settlement  Participant 1-6 (6 persons)
Instruction  Participant 7-12 (6 persons)
Processes of support  Participant 13-17 (5 persons)
Customs inspection  Participant 18-22 (5 persons)
Common inspection  Participant 23-27 (5 persons)
Management and development  Participant 28-31 (4 persons)
Collection  Participant 32-35 (4 persons)

Table 6.4 Overview of participants in focus groups

Seven workshops were conducted, each representing one of the main processes of the case organization's business model. One process, Inspection, was represented by two workshops. The two work tasks contained in this main process were considered so diverse that merging them into one workshop might have affected the outcome. The workshops took place in June 2009 in four different locations across Denmark (see specifications in Appendix 7). Each workshop lasted approximately 2 hours and had between 4 and 6 participants. The workshops may therefore be characterized as mini groups, as opposed to full groups, which usually have between 8 and 10 participants (Greenbaum, 1993). In total, 35 persons were interviewed. The distribution across the main processes appears in Table 6.4.

The recruitment of participants consisted of two steps. First, a number of managers were asked by e-mail to identify approximately five participants in their department. The managers sent back a list of names, and these employees were then contacted directly by e-mail. Different locations were used so that all six main processes of the business model could be represented in the workshops, and the workshops took place in four different physical locations. Collecting the data in different locations had the benefit of representing different types of offices. Different types of employees were also represented in the focus groups: employees with an academic background, employees educated within the case organization, and employees with a clerical background.

6.3.2 Data collection: Interview guide 

The interview guide appears in Appendix 9. Its function was to provide a set of questions to bring into play in case the participants had trouble discussing the presented slides without prompting. The literature suggests that interviews start with general questions and then move on to more specific questions (e.g., Stewart, Shamdasani & Rook, 2007, p. 61). We decided to apply this order throughout the interview, starting with an introduction to the participants' background. Bloor et al. (2001) recommend collecting demographic data ahead of the focus group, e.g., by using a short questionnaire. However, we found that letting the participants begin by introducing themselves worked well as a way of involving everyone from the beginning of the interview on a topic comfortable to them. This is precisely the function of opening questions (Krueger, 1998, p. 23). The second part of the interview concerned the findings of the survey. Here, the questionnaire results relevant to the current focus group were introduced in the slide show. The questions followed the four themes of the questionnaire. The first three were the frequency of information seeking, the use of information sources, and the information needs that arise. The focus groups finished with the fourth theme, a discussion of preferred metadata when seeking information on the intranet.

6.3.3 Execution and documentation 

The interview guide was accompanied by the slide show, which served as an object for discussion and explanation. The slide show thus came to serve as a probe for the interviewer's questions and helped to keep the interview on topic (e.g., Rubin & Rubin, 2005, p. 164). The workshops began with an introduction of the interviewer, the purpose of the workshop, and the agenda. Hand-outs of the slide show were distributed to the participants so that they could go back to earlier slides if they had additional comments later in the interview. Some treats were offered to the participants to show our appreciation of their efforts. A Dictaphone recorded the interviews for documentation purposes. The Dictaphone was started when the participants began introducing themselves. The group interviews ended once the participants had discussed all the slides in the slide show. We finished each session by thanking the participants for their time, input, and contributions, and we invited them to contact us if they recalled relevant topics after the interview had ended.

The interviews were subsequently transferred to the transcription software Express Scribe and transcribed. For the transcription, we developed a list of criteria for what to include and what to exclude (see Appendix 10). Bloor et al. (2001) suggest that all speech be transcribed, including passages where other participants agree with a single person's statements. We do not perform content analysis of the focus groups on any passages other than those concerning the participants' background, so we do not calculate the degree of agreement. This is the main reason why we have not transcribed these supporting "mm"s and "yeah"s. Finally, we anonymized the participants' names before converting the interviews into the RTF format required for importing files into atlas.ti.
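The exclusion rule for supportive utterances and the anonymization step can be expressed as a filter over speaker turns. This is a hypothetical sketch and not the procedure actually used, since transcription was done manually in Express Scribe and the full criteria are given in Appendix 10. The token list, the speaker names, and the function names are illustrative. The sketch only replaces speaker labels, so names mentioned inside an utterance would still need manual handling.

```python
import re

# Illustrative backchannel tokens; the actual criteria are listed in Appendix 10.
BACKCHANNELS = {"mm", "yeah"}


def is_backchannel(utterance):
    """True if an utterance consists only of supportive tokens such as "mm" or "yeah"."""
    words = re.findall(r"\w+", utterance.lower())
    return bool(words) and all(w in BACKCHANNELS for w in words)


def clean_transcript(turns, pseudonyms):
    """Drop backchannel-only turns and replace speaker names with participant labels.

    turns: list of (speaker name, utterance); pseudonyms: speaker name -> label.
    Names mentioned inside utterances are not replaced by this function.
    """
    return [
        (pseudonyms[speaker], text)
        for speaker, text in turns
        if not is_backchannel(text)
    ]


# Hypothetical excerpt with invented speaker names.
turns = [("Anne", "I usually start on the intranet."), ("Bo", "Mm."), ("Bo", "Yeah, yeah.")]
print(clean_transcript(turns, {"Anne": "Participant 1", "Bo": "Participant 2"}))
# [('Participant 1', 'I usually start on the intranet.')]
```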


6.3.4 Data analysis

The focus group transcriptions were analysed in two parts, supported by the analysis software atlas.ti (version 5.6.2) (see Figure 6.1). The first analysis concerns the introductory part of the interviews, in which the participants presented themselves. Its purpose was to determine how the participants were distributed with respect to work tasks, education, and length of service. In this introductory analysis, we were inspired by the principles of content analysis, a quantitatively oriented type of analysis whose purpose is to summarize a complete set of data or parts of it (Neuendorf, 2002; Krippendorff, 2004). In the present analysis, we used these principles to get a quantitative overview of the distribution of the participants.

The second part of the analysis concerns the elaboration and validation of the survey results in preparation for answering the research questions. This second part was guided by Halkier's (2008) three steps in focus group analysis: 1) coding, 2) categorization, and 3) conceptualization. In the coding process, passages of text are marked up with preliminary labels. Categorization is the process of relating the initial codes to each other, identifying subordinate, superordinate, and coordinate codes among them. Categorization can reduce the data when codes are combined into superordinate categories. It can also make the data more complex when codes are expanded and supplemented with more detailed subcodes. Identifying relations and contradictions between codes is inherent in the process of categorization. Finally, conceptualization is the part of the analysis in which the categories and codes are related to the data and to the theoretical concepts underlying the data, whether in similar studies, theoretical frameworks, or other empirical parts of the research project.

Figure 6.1 Screen dump from atlas.ti coding of focus group interviews

We started out by coding the interviews with free codes, corresponding to Halkier’s first step, coding. Next, we used the atlas.ti function for grouping the initial codes into coding families, which allowed us to categorize the codes in line with Halkier’s second step, categorization. The third step, conceptualization, was represented by the analysis of the codes and coding families and by relating these to other studies, to the questionnaire data, and to theoretical concepts. Quotes from the questionnaire, the focus group interviews, and the search test are presented throughout the thesis. The quotes have been translated into English; their original Danish wording appears in Appendix 11. The results of the seven focus group interviews are reported in Chapter 7 along with the questionnaire results.
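The grouping of free codes into coding families can be pictured as a mapping from codes to superordinate families. The Python sketch below is a hypothetical illustration of that reduction step, not an atlas.ti export format; the quote identifiers, codes, and family names are invented for the example.

```python
from collections import defaultdict

# Step 1 (coding): hypothetical quotes labelled with free codes.
coded_quotes = [
    ("Q1", "search box"),
    ("Q2", "colleagues as source"),
    ("Q3", "search box"),
    ("Q4", "phone the helpdesk"),
]

# Step 2 (categorization): free codes grouped into superordinate families.
# Family and code names are illustrative only.
code_families = {
    "Intranet searching": {"search box"},
    "Human sources": {"colleagues as source", "phone the helpdesk"},
}

def quotes_per_family(quotes, families):
    """Collect the quote identifiers whose code belongs to each family."""
    result = defaultdict(list)
    for quote_id, code in quotes:
        for family, codes in families.items():
            if code in codes:
                result[family].append(quote_id)
    return dict(result)

print(quotes_per_family(coded_quotes, code_families))
```

Four coded quotes collapse into two families, which is the kind of data reduction Halkier describes; adding subcodes to a family would instead make the structure more detailed.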


6.3.5 Limitations 

The focus group interviews were based on a convenience sample of employees. We acknowledge that a random sample might have been more representative of SKAT as a whole. However, the participants reflected the educational levels of the organization as well as the majority of its work tasks. Further, the focus groups were carried out at four different locations across Denmark in order to reflect the geographical distribution of the organization. In total, 35 people participated in seven focus groups, providing valuable insight into their daily information seeking patterns.

6.4 Search test design 

The search test compares full text indexing, a form of automatic extraction indexing, and automatic assignment indexing in the form of text categorization. The search test was set up as an experimental test. It took place in June 2010 at two different office locations of SKAT, hereafter referred to as location 1 and location 2. In accordance with our methodological standpoint, we asked employees at SKAT to participate in the search test. In the remainder of the thesis we use the term test person to denote a search test participant.

6.4.1 Test system 

The original plan was to carry out the search test after the revised intranet had been implemented and the employees had had some time to adjust to the system. However, the implementation of the new portal on SKAT’s pages was delayed, so it was not possible to execute the test in the operational portal environment. Instead, we used a prototype of the future intranet as our test base. At the time of the search test the categorization was still being trained. In addition, the prototype had some functional shortcomings. We explained these to the test persons as part of the introduction to the test system, but they nevertheless disrupted the course of some test sessions. To keep the system constant across the individual test sessions, the test system was not updated during the search test period.

From a technical perspective, the test system was embedded in a separate test environment. The test database was generated in August 2009 and was not updated in the period up to the search test in June 2010. Thus, the newest documents in the test base at the time of the search test were from August 2009. The test base consisted of 188,600 documents randomly drawn from the intranet. By comparison, at the time of the search test the operational intranet contained 681,640 documents. That is, the test base contained approximately 28% of the full version of the intranet.
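The stated proportion can be checked directly. The short Python computation below uses only the two counts given above and shows that the share is about 27.7%, i.e. roughly 28%. Since the two counts refer to different points in time (August 2009 and June 2010), the share is an approximation.

```python
# Counts reported for the test base (generated in August 2009) and for the
# operational intranet at the time of the search test (June 2010).
test_base_docs = 188_600
intranet_docs = 681_640

share = test_base_docs / intranet_docs
print(f"Test base share of the intranet: {share:.1%}")  # -> Test base share of the intranet: 27.7%
```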

As with the operational intranet, the prototype was based on CMS technology. Autonomy’s (www.autonomy.com) search software IDOL provided the search functionality of the interface, which is depicted in Figure 6.2. Although more fields were available, the test persons only used the fields “Søgetekst” (Query box), “Søgetype” (Search operator), and “Dokumenttype” (Document type) during testing. The option to restrict searches to forms (“Blanket”), information, or self-service (“Selvbetjening”), located just below the grey bar in the middle of the interface (see Figure 6.2), was set to “Information” by default and was not changed during the test. Nor was the default ranking of search results by relevance.

Figure 6.2 Screen dump of the test system: Search fields

The query box was used for entering query terms. It supported the use of quotation marks for phrase searches, and entered search terms were automatically truncated. The search operator field specified how search terms were combined; one of four options could be chosen. “Free text” (FT) retrieved documents containing most, but not necessarily all, of the entered search terms. “Pages containing all words” (AW) retrieved documents containing all search terms in their exact or truncated form; the operator thus corresponds to Boolean “AND” (Large, Tedd & Hartley, 2001, p. 148 ff.). “This exact sentence” (ES) retrieved documents containing the search terms in the exact form and order entered into the query box. The operator corresponds to entering the search terms in quotation marks, whereby the system treats the search terms as a single term (Large, Tedd & Hartley, 2001, p. 167 ff.). Lastly, the “At least one of the words” (OW) operator retrieved documents containing at least one of the entered search terms in truncated form, corresponding to Boolean “OR” (Large, Tedd & Hartley, 2001, p. 148 ff.). Of the four, the ES operator is the most restrictive, followed by the AW operator; in comparison, the FT and OW operators retrieve larger sets of documents. The last field available to the test persons was the metadata field “Document type”, which made it possible to limit search results to specific document types. The test persons could choose between 12 document types in a drop-down menu. The default setting was an empty field at the top of the menu, which enabled a search with no limitation on document type. Search results were delivered as a list ranked by the relevance of the documents to the search terms entered. For each hit, several pieces of information were provided: a document title, a snippet highlighting the search terms and the surrounding terms, the document type (cf. the document type field mentioned above), and the date of publication. An example of a result list appears in Figure 6.3.
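To make the differences between the four operators concrete, the following Python sketch implements them over a toy collection. It is an assumption-laden illustration, not IDOL’s matching logic: truncation is modelled as prefix matching, the FT operator’s “most terms” is assumed to mean at least half of the terms, and ranking by the number of matched terms is a crude stand-in for IDOL’s relevance ranking. The document titles, texts, and type labels are invented.

```python
import re
from dataclasses import dataclass

@dataclass
class Doc:
    title: str
    text: str
    doc_type: str

def words(text):
    """Split text into lower-cased word tokens."""
    return re.findall(r"\w+", text.lower())

def matches(term, tokens):
    """Automatic right truncation: a term matches any token it is a prefix of."""
    return any(tok.startswith(term) for tok in tokens)

def search(docs, query, operator, doc_type=None):
    """Return titles of matching documents, best matches first."""
    terms = words(query)
    hits = []
    for doc in docs:
        if doc_type is not None and doc.doc_type != doc_type:
            continue  # "Document type" filter; None mimics the empty default
        tokens = words(doc.title + " " + doc.text)
        matched = sum(matches(t, tokens) for t in terms)
        if operator == "ES":
            # Exact sentence: exact word forms in the entered order, no truncation.
            n = len(terms)
            ok = any(tokens[i:i + n] == terms for i in range(len(tokens) - n + 1))
        elif operator == "AW":
            ok = matched == len(terms)  # Boolean AND over truncated terms
        elif operator == "OW":
            ok = matched >= 1  # Boolean OR over truncated terms
        elif operator == "FT":
            ok = matched >= max(1, (len(terms) + 1) // 2)  # assumed "most" rule
        else:
            raise ValueError(f"unknown operator: {operator}")
        if ok:
            hits.append((matched, doc.title))
    hits.sort(key=lambda hit: hit[0], reverse=True)  # stable: ties keep input order
    return [title for _, title in hits]

docs = [
    Doc("Selling an apartment", "Tax on the sale of an owner-occupied apartment", "Information"),
    Doc("Freelancers", "VAT rules for freelance work and online shops", "Information"),
    Doc("Share sales", "Tax on the sale of shares", "Form"),
]

print(search(docs, "sale apartment", "ES"))  # []
print(search(docs, "sale apartment", "AW"))  # ['Selling an apartment']
print(search(docs, "sale apartment", "OW"))  # ['Selling an apartment', 'Share sales']
```

For the same query the result sets grow from ES to AW to OW, mirroring the ordering of restrictiveness described above.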

Figure 6.3 Screen dump of the test system: Categorization

A central feature of IDOL is its ability to categorize documents automatically on the basis of machine learning, as described in section 5.4.2.[10] The IDOL categorization facilities were applied to categorize the search results of the test system. The taxonomy taken into use on January 1, 2008 (see section 2.4.2) formed the basis of the categories into which search results are automatically placed when presented to the end users. The categorization training started in November 2008. The first step of the training consisted of giving IDOL a rough understanding of the content of each subject in the taxonomy. This was done by selecting five terms representative of the subject and using them to search the test base. The search results were then screened in order to identify candidate documents to represent each category. The minimum number of candidate documents per category was set to 20, whereas IDOL’s manual recommends 40 to 50 candidate documents. At the time of the search test, the status of the categorization was as follows: if a document had been manually indexed at the time of import to the test database, its manual mark-up decided where it was placed in the portlet. This is the case for documents published after January 1, 2008. Older documents, however, did not have any subject terms attached, and for these documents the placement in the portlet was based on the training that IDOL had achieved at the time of the search test.

[10] For further details on IDOL, white papers on the system can be found at www.autonomy.com. In addition, Chaudhry (2010) compares IDOL with 11 other similar systems.
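The placement rule described above amounts to a two-branch decision. The Python sketch below assumes that manual subject terms are available as a list per document; `toy_classify` is a hypothetical keyword stand-in for IDOL’s trained categorization, which in reality is learned from candidate documents, and the category names are illustrative.

```python
from typing import Callable, List, Optional, Tuple

def place_document(manual_subjects: Optional[List[str]],
                   text: str,
                   classify: Callable[[str], List[str]]) -> Tuple[str, List[str]]:
    """Decide which taxonomy categories a document is shown under.

    Documents indexed manually at import (in practice those published after
    January 1, 2008) keep their manual subjects; all other documents are
    placed by the trained classifier passed in as `classify`.
    """
    if manual_subjects:
        return "manual", list(manual_subjects)
    return "classifier", classify(text)

# Hypothetical stand-in for IDOL's trained categories: a keyword lookup.
TOY_CATEGORY_KEYWORDS = {"Penalty": ["fine", "penalty"], "VAT": ["vat"]}

def toy_classify(text: str) -> List[str]:
    """Return every toy category whose keywords occur in the text."""
    lowered = text.lower()
    return [category for category, keys in TOY_CATEGORY_KEYWORDS.items()
            if any(key in lowered for key in keys)]

print(place_document(["VAT"], "Guidance on VAT registration", toy_classify))
print(place_document(None, "Fine rates for late filing", toy_classify))
```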

The categorization appears in Figure 6.3 (the box on the right-hand side of the result list). Categories could only be selected after a search had been carried out and had produced a result. On the basis of the retrieved documents, the search result could then be limited to subjects present in the result. The categorization window only showed the taxonomy terms that actually contained documents in the current result set. If several categories were selected on the basis of the same query, the first category was not included in the subsequent category choices.
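The behaviour of the categorization box can be sketched as facet counting over the current result set. In the hypothetical Python example below, each result carries a set of category labels; only categories present in the results are offered, and a category already chosen is dropped from later choices. Treating successive selections as a narrowing of the result set is an assumption about the prototype, and the category names are illustrative.

```python
from collections import Counter

def facet_options(results, selected):
    """Categories offered in the categorization box for the current results.

    Only categories occurring in the current results are offered, and
    categories that have already been selected are left out.
    """
    counts = Counter(cat for _, cats in results for cat in cats)
    return {cat: n for cat, n in counts.items() if cat not in selected}

def narrow(results, category):
    """Keep only the results placed in the chosen category."""
    return [(title, cats) for title, cats in results if category in cats]

results = [
    ("Doc 1", {"Penalty", "Income tax"}),
    ("Doc 2", {"Income tax"}),
    ("Doc 3", {"VAT"}),
]
selected = []
print(facet_options(results, selected))

results = narrow(results, "Income tax")
selected.append("Income tax")
print(facet_options(results, selected))  # {'Penalty': 1}
```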

In the test situation, when the test persons used the test system without categorization, the right-hand side of the screen was covered to prevent the test persons from being influenced by the controlled terms from the taxonomy when composing queries. In addition, the test persons were not tempted to use the categorization when it was not visible to them. Covering the categorization window means that, in a methodological sense, two test systems are produced: one based on free text indexing and one based on categorization of search results. Since the free text indexing system functions as the baseline for measuring the effect of categorization, we refer to it as System A. Accordingly, the system employing categorization is denoted System B.

6.4.2 Test persons 

Thirty-two test persons participated in the search test. They were recruited at location 1 and location 2. Ingwersen (2000) recommends 40-50 test persons for purely quantitative studies and fewer for qualitative studies. Since we were carrying out a qualitative study, we considered 32 test persons sufficient. The domain study had shown that the frequency of intranet use was high in most parts of the organization. Therefore, we did not find it necessary to exclude certain work tasks from the search test. The two offices were chosen because together they represent the different educational groups of employees identified in the domain study questionnaire.

To locate relevant test persons, all employees in the specified offices received a web questionnaire; in total, it was sent to 459 employees. In the questionnaire the employees answered questions about their background, work tasks, frequency of intranet use, and frequency of information seeking (Appendix 4). We refer to this questionnaire as the recruitment questionnaire. The reliability of research designs is affected by, among other things, the consistency of measures (Carmines & Woods, 2005). Keeping the work task categories consistent between the domain study questionnaire and the recruitment questionnaire posed a special challenge, since the business model had changed in the intervening time and another merger had taken place in the organization. In order to capture the modified business model while still mirroring the previous one, we expanded the reporting of current work tasks. However, the expansion was carried out in a way that allowed the current work tasks to be mapped onto the work tasks of the previous business model. As with the domain study questionnaire, we aimed to reduce the semantic openness of the questionnaire by the use of probes (see Section 6.2.1). For the probes regarding work tasks, we used the latest annual report of SKAT as inspiration (SKAT, 2009).

In our selection of test persons, we emphasized their frequency of intranet use and their general frequency of information seeking. As for frequency, the most important criterion was that information needs, and the information seeking derived from them, occurred more often than “practically never”. Forty-two people met these requirements. Of these, 10 served as pilot testers, and the remaining 32 carried out the actual search test.
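The selection criterion and the split into pilot testers and test persons can be sketched as a simple filter over ordinal answers. In the Python example below, only the answer “practically never” is taken from the text; the other scale labels, the respondent identifiers, and the test person codes are hypothetical. Using the first recruits as pilot testers follows the procedure described in section 6.4.5.

```python
# Ordinal frequency scale. Only "practically never" is taken from the text;
# the remaining labels are hypothetical placeholders.
FREQUENCY_SCALE = [
    "practically never",
    "a few times a year",
    "monthly",
    "weekly",
    "daily",
]

def eligible(answer: str) -> bool:
    """Eligible if information seeking occurs more often than 'practically never'."""
    return FREQUENCY_SCALE.index(answer) > FREQUENCY_SCALE.index("practically never")

def split_pilot(candidates, n_pilot):
    """The first recruits act as pilot testers; the rest take the actual test."""
    return candidates[:n_pilot], candidates[n_pilot:]

responses = {"R1": "weekly", "R2": "practically never", "R3": "daily"}
print([r for r, answer in responses.items() if eligible(answer)])  # ['R1', 'R3']

pilot, main = split_pilot([f"TP{i:02d}" for i in range(1, 43)], n_pilot=10)
print(len(pilot), len(main))  # 10 32
```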

6.4.3 Search tasks 

The literature suggests three generic types of search tasks for IR evaluation, namely natural, simulated, and assigned search tasks (cf. Vakkari, 2003). We used simulated and genuine work tasks in the test. Based on the recommendations put forward by Ingwersen (2000, p. 173), three simulated work tasks were used. The purpose of employing simulated work tasks was to increase the degree of experimental control in the operational evaluation (cf. Borlund, 2000, p. 72, 2003b). Several recommendations have been given for the development and use of simulated work tasks. Among other things, they state that simulated work tasks and genuine information needs should be employed in the same test, that the work tasks should be tailored to the information environment and the test persons, and that search jobs should be permuted (Borlund, 2003b). As described below, these recommendations were incorporated into the present test design.

IR evaluations are frequently carried out with graduate or undergraduate students. This is also the case for the empirical use of simulated work tasks (Borlund & Schneider, 2010). However, a few studies have applied different versions of simulated work tasks with professional users (e.g., Nielsen, 2004; Suomela & Kekäläinen, 2005, 2006; Wacholder et al., 2007; Blomgren, Vallo & Byström, 2004). Blomgren, Vallo & Byström (2004) employed simulated work tasks in a study of professional users and conclude that “…composing a simulated work task situation that offers a sufficient level of reality for all participants, must be done with great care” (Blomgren, Vallo & Byström, 2004, p. 66). Triggering real information needs in a simulated professional context is clearly challenging, not least when participants have different work tasks and backgrounds within the professional context, as is the case both in the present study and in the study by Blomgren, Vallo & Byström. In the study by Price et al. (2009), a subject expert participated in the development of simulated work tasks in order to ensure well-functioning tasks. The importance of realistic simulated work tasks is emphasized by several authors (e.g., Blomgren, Vallo & Byström, 2004; Borlund, 2000). Several aspects can contribute to realism. Here we operationalize realism as a relevant subject combined with a level of complexity corresponding to the test persons’ genuine information needs.

As regards the subject content of the simulated work tasks, we used several sources of inspiration. We went through the fields in the domain study questionnaire that allowed for open responses. The focus group interviews were also scanned for ideas for search tasks. Lastly, we consulted web pages presenting citizens’ and small businesses’ questions about taxes. To decide on the level of complexity of the simulated work tasks, we consulted the results of the domain study, which revealed that information needs of low complexity were far more common than more complex types. In the questionnaire, the most frequent indicators were “I need to find a document I have used before” and “I know the subject well but need to find a specific piece of information”. Saracevic et al. (1987, p. 35) define the complexity of search tasks by the number of concepts they contain. Iivonen (1995) operationalizes complexity further: simple search tasks consist of up to three concepts, whereas complex search tasks contain more than three. Since the employees reported simple information needs as their predominant type, we developed simulated work tasks containing three concepts (or search keys) or fewer. About ten simulated work tasks were developed. We subsequently carried out a pilot test to find out how the work tasks functioned in the test situation. We wanted information about how understandable the work tasks were to the test persons, and specifically, whether the test persons were able to solve the work tasks within a reasonable amount of time. We also wanted to reduce the number of work tasks. The work resulted in three simulated work tasks concerning the sale of an apartment (SIM 1), taxation of e-businesses (SIM 2), and tax issues related to working as a freelancer (SIM 3). The last of the three search tasks contained four search keys (see Table 6.7). However, one of these is a non-topical facet, which is why we still consider it a simple task in Iivonen’s terms. The final simulated work tasks appear in Appendix 14.
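Iivonen’s threshold, together with the exception made for SIM 3, can be stated as a small rule. The Python sketch below is illustrative: a task is represented as a list of (search key, is_topical) pairs, and the placeholder keys are hypothetical, since the actual keys are given in Table 6.7.

```python
from typing import List, Tuple

def task_complexity(facets: List[Tuple[str, bool]], max_simple: int = 3) -> str:
    """Classify a search task as 'simple' or 'complex' after Iivonen (1995).

    facets: (search key, is_topical) pairs. As argued for SIM 3, non-topical
    facets are not counted towards the number of concepts.
    """
    concepts = [key for key, is_topical in facets if is_topical]
    return "simple" if len(concepts) <= max_simple else "complex"

# Placeholder keys; the real keys of SIM 3 are listed in Table 6.7.
sim3_like = [("key 1", True), ("key 2", True), ("key 3", True), ("key 4", False)]
print(task_complexity(sim3_like))  # simple
```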


To control for the test persons’ insight into the simulated search tasks, an on-screen questionnaire was filled out each time a task had been completed (see Appendix 15). We asked the test persons about their knowledge of the subject of the work task, their view of its difficulty, and its resemblance to their usual work tasks. All questions were rated on a 5-point Likert scale.

In addition to the simulated work task situations, the test persons were asked to bring a genuine information need to the test session. A genuine information need serves several purposes (Borlund & Schneider, 2010). We consider the most important ones to be its function as a baseline for the simulated needs and the insight it offers into the system’s effect on real information needs. Also, genuine information needs may obtain better scores on different performance measures (cf. Blomgren, Vallo & Byström, 2004). Specifically, we e-mailed the test persons shortly before their test session to ask them to bring a genuine information need. The e-mail thereby served a second function, namely as a reminder for the test persons to show up. The exact wording of the e-mail appears in Appendix 16.

The genuine tasks brought by the test persons confirmed how difficult genuine tasks are to control in controlled test settings. The tasks varied greatly in content and mainly concerned specialist matters, although organisational matters such as the annual summer party were also represented. The tasks also included examples that could not be solved using the prototype, as the information sought was not included in the database. In those cases the test persons made up a new task for themselves. In terms of the number of facets included, the genuine tasks corresponded to the simulated search tasks, containing between one and three facets. Three examples are listed in Table 6.5.

Table 6.5 Examples of genuine search tasks

Search terms                                                | Document type        | Category | Search operator | Facets included
Ordrenumre store selskaber (order numbers, large companies) | Internal information | -        | Free text       | 3
Bødetakster (fine rates)                                    | -                    | Penalty  | Free text       | 2
Skattekvittance (tax receipt)                               | -                    | -        | Free text       | 1


6.4.4 Test procedure 

The test procedure consisted of three parts: 1) an introduction to the session, 2) the search part, in which the test persons searched the two systems using the search tasks and evaluated retrieved documents, and 3) a post-search interview.

The introduction to the test session consisted of several elements. First, the guidelines for performing the search tasks were explained. Next, the test system was introduced to the test persons. Due to time constraints the introduction did not include time for the test persons to try out the system. The presentation covered the search possibilities of the system and the shortcomings of the prototype. The elements of the introduction are listed in Appendix 17. The introduction closed by assuring the test persons of their anonymity in the test (Kvale & Brinkmann, 2009, p. 63 ff.).

The test persons searched using four search tasks: three simulated and one genuine. The order of the tasks and the order of the test systems (System A and B) were rotated in order to control for order effects on the test results (cf. Kelly, 2009) and to meet Borlund’s (2003b) recommendations for the use of simulated search tasks. The lack of a try-out of the system made the rotation of work tasks even more necessary. The rotations applied, covering both tasks and test systems, are listed in Appendix 18. When searching in System B, the test persons were required to use the categorization menu on the right-hand side of the screen. This was necessary because the categorization is only visible there; the search results themselves are not presented according to the underlying categorization. This decision also means that searches in System B that omitted the categorization were removed from the results. Whenever a task was completed (or abandoned), a short questionnaire was completed on screen.
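Rotation of this kind is commonly organized as a Latin square. The Python sketch below is a hypothetical scheme, not the one in Appendix 18: it assumes that each test person carries out the first two tasks in one system and the last two in the other, with the starting system alternating between blocks of four test persons. Over 32 test persons, every task then occurs equally often in every position and in each system.

```python
TASKS = ["SIM 1", "SIM 2", "SIM 3", "Genuine"]
SYSTEMS = ("A", "B")

def latin_square(items):
    """Cyclic Latin square: every item appears once in each row and column."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]

def rotation_plan(n_persons):
    """Hypothetical plan of (task, system) pairs for each test person.

    Task order cycles through the rows of the Latin square; the first half
    of the tasks is done in one system and the second half in the other,
    and the starting system alternates between blocks of len(TASKS) persons.
    """
    orders = latin_square(TASKS)
    plan = []
    for person in range(n_persons):
        order = orders[person % len(orders)]
        block = person // len(orders)
        first, second = SYSTEMS if block % 2 == 0 else SYSTEMS[::-1]
        half = len(order) // 2
        systems = [first] * half + [second] * (len(order) - half)
        plan.append(list(zip(order, systems)))
    return plan

for person, sessions in enumerate(rotation_plan(32)[:5], start=1):
    print(f"Test person {person}:", sessions)
```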

The documents were evaluated on the basis of the titles and snippets shown in the result lists. The main reason for this was that it removes the snippet-document relationship as a variable in the results (cf. Turpin et al., 2009; He et al., 2010) and allows for comparison with corresponding studies (e.g., Käki & Aula, 2005). Further, for certain document types the prototype had trouble opening documents from the links in the result lists. The test persons were asked to assess the relevance of the documents with respect to the work task in question, that is, situational relevance (cf. Figure 6.4).

[Figure 6.4: diagram not reproduced. Labels: Real world; Collection of objects; W: work task situation; CW: cognitive perception of W; N: information need; r/q: request/query version; O: retrieved information object(s); SR: situational relevance; P: pertinence relevance; IT: intellectual topicality; A: algorithmic relevance. Further symbols marked the assessor’s/user’s cognitive space, relevance assessment(s) or interpretation(s), transformations, and the IR system.]

Figure 6.4 Relevance types in IR evaluation adapted from Borlund (2003a, p. 915).

The relevance of search results was noted when the result lists were shown to the test persons. This way we obtained an immediate evaluation of each document while the test person still remembered it. After the search part of the test, a short post-search interview was conducted. The purpose of the interview was to have the test persons sum up and reflect on their overall impressions of the test system, their present use of the intranet, and how categorization could be useful in their daily work. Due to time constraints the interview guide was kept rather short. The interview guide appears in Appendix 19.

During the test the test manager was present in the room. This served several purposes. First, the test persons could be observed during their searches, which made it possible to ask them to elaborate on specific moves in the subsequent interview. Further, the schedule did not leave time for the test persons to get acquainted with the test system before the test started. With the test manager present, the test persons could ask clarifying questions during the session. At the close of the session, the test persons received a small token of appreciation for their involvement.

The tests took place at locations 1 and 2. At location 1 a test room was available. It had a stationary computer for the test persons and a laptop for the test manager, both with Morae installed. The Morae Observer module displayed the test persons’ actions on the laptop screen for the test manager to follow. The monitoring was not kept secret from the test persons, since its purpose was to avoid the test manager having to look over their shoulders. At location 2 no test room was available, so we brought a laptop with Morae installed to enable logging. During the tests at location 2 the test manager had to follow the test persons’ moves directly on the test machine. Most of the tests were carried out at location 1.

6.4.5 Pilot test 

Pilot tests were carried out at several stages of the process ahead of the launch of the 

search test. Specifically, the recruitment questionnaire, the simulated work tasks, and 

the test procedure were tested ahead of the actual collection of data. 

The recruitment questionnaire was pretested by a number of colleagues at RSLIS. Further, a number of employees at SKAT pilot tested the questionnaire. Since the recruitment questionnaire closely resembled the domain study questionnaire, we could to some extent rely on the methodological experience gained there. However, the changes in the business model necessitated a pilot to ensure that the modified work tasks were understandable to the recruitment respondents. The questionnaire was adjusted according to the feedback from both RSLIS colleagues and SKAT employees.

The search task pretest also comprised several elements. We have already mentioned the pretest aimed at identifying the most relevant work tasks and reducing their total number. In addition, we tested the work tasks in the test system. That is, before the search tasks were piloted among employees at SKAT, they were tested for their fit with the test system: we tested whether the output of each search task suited its purpose, and whether the number of documents matching the requests was sufficient. In their principles for search result visualization, Kules & Shneiderman (2004, p. 2) suggest that 100-1000 results are needed as a minimum for an adequate basis for a categorized overview. However, their principles are based on the web, where the number of documents far exceeds the size of our test collection. Kules & Shneiderman themselves qualify the principle with the reservation that the optimal number of results depends on many factors, such as task domain and document quality. Due to the size of the test collection we had to aim at a lower number of results. Instead, in our final choice of tasks we emphasized the availability of highly relevant documents matching the simulated work tasks. Eleven work tasks were tested, and three of these were selected for the search test.

The test situation itself was also pilot tested. We needed information about how to handle practical matters such as how to document the searches, which order of test elements to follow, and how to carry out the evaluations of work tasks and search results. We also wanted an approximate estimate of the duration of a test session. The pilot tests provided very useful insight into these matters, and the test design was adjusted according to the experience gained. In practice, the simulated search tasks and the test procedure were pilot tested simultaneously. We let the first test persons recruited through the recruitment questionnaire function as pilot testers and continued piloting until the test design was suitable for data collection. In total, 10 pilot testers participated.

6.4.6 Techniques for data collection and preparation 

During the search test, different methods of data collection were used in order to allow for elaboration of the search process. The test persons’ interaction with the test system was logged using the software Morae (see http://www.techsmith.com/morae.asp). Morae facilitates logging of key and screen activity. Both options were applied to document the test, though we primarily used the key log for analysis. Search (or transaction) logs have been widely used to document and analyze interactions with retrieval systems and searching behavior. For the present test setup, the most significant strength of search logs is the unobtrusiveness of the method (Jansen, 2006, p. 424-425). However, the data they deliver are descriptive (Jansen & Pooch, 2001, p. 242). This implies that search log data should not stand alone if we want to explain and understand the interaction between system and user. For that reason, the log data were supplemented with qualitative data to compensate for the limitations of the search log as a research tool.

Participant observation also took place during the test procedure (cf. Ely, 1991, p. 41 ff.). As the test manager was present during the test, observations were made in order to capture moves, comments, modifications, and other acts of relevance to the search test. The observations are not reported independently here. Rather, the purpose of the observation was to inform the post-search interview and enable the test manager to ask the test persons specifically about their interaction with the system.

Interviews, both oral and in questionnaire form, were carried out throughout the search test. The recruitment questionnaire provided background data on the test persons’ demographics, seeking behavior, and the like. During the search test, the test persons assessed the simulated work tasks in terms of their knowledge of the subject, their perception of the degree of difficulty, and the extent of similarity with their genuine work tasks. Lastly, after the test persons had carried out the search tasks, a post-search interview was conducted. Its purpose was to ask follow-up questions in order to get a more comprehensive picture of the search situation. For documentation purposes, a Dictaphone recorded the search test and the post-search interview. We decided to record the full event in case the test persons made comments during searching that would be of value to our understanding of their interaction with the test system. Further, using the Dictaphone reduced the need for note taking during searching and allowed the test manager to focus on the test persons and their actions. The recordings were subsequently transcribed.

The last type of documentation comprises the relevance assessments made during searching. Relevance was captured along two dimensions: the degree of relevance and the criteria applied for the assessment (Borlund, 2003a). The more systematic of the two was the measurement of the degree of relevance. As already mentioned, the relevance assessments took situational relevance as their point of departure. The degree of situational relevance of the retrieved documents was measured on a 4-point scale. We followed Sormunen’s (2002) four-point scale, since it distinguishes between two categories of partial relevance: relevant and useful, and relevant but potentially useless (Sormunen, 2002, p. 329). To reflect this distinction, we followed Sormunen’s description of the respective degrees in our explanation to the test persons. In addition, we asked the test persons about the motivations for their assessments, i.e. the relevance criteria applied. The purpose of including relevance criteria was not to make a systematic investigation of relevance criteria. Rather, the criteria were included as a tool to encourage the test persons to explain the assessments given. The questions appear in the post-search interview guide, although they were asked in connection with the relevance assessments.


The data collected consisted of 1) background data (from the recruitment questionnaire), 2) interview transcriptions (from the search sessions and the post-search interview), 3) search logs, 4) relevance assessments, and 5) assessments of the simulated search tasks. Background data and assessments of tasks were analysed using descriptive statistics in SPSS. As the data were used to gain insight into the


Chapter 6 

characteristics of the test persons and the appropriateness of the search tasks, we did not find reason to expand this part of the analysis further. Again, the recordings from the search test were transcribed to facilitate structured analysis. In this case, the transcription was carried out by an external transcriber; the procedure is described in Appendix 10. As with the focus group interviews, we used atlas.ti for the analysis and followed Halkier’s (2008) three steps for analysing qualitative data.

The search log registered the search time and the search keys applied. From the screen videos recorded during the searches, we manually extracted the number of hits retrieved, the selection of subject categories, the use of information filters, and the search types. All were registered in SPSS for analysis. Lastly, the relevance assessments of documents were entered into SPSS. This work resulted in the variables listed in Table 6.6.

At query level, we measured the number of terms applied, the search keys applied, the search operators and document type specifications used, the number of hits, the success of queries, and the type of reformulations undertaken. The number of search terms is included to show how many terms are needed to achieve a satisfactory number of results. Search keys show how many search task facets are covered in queries. The facets identified in Table 6.7 (the rightmost column) form the basis for interpreting the queries. All elements of a query could contribute to the facets: query terms, document types, and categories (the latter only in system B queries). When a category was included in a query, it was counted as one concept regardless of the number of terms describing the category. The variable is included for two reasons. The first is to identify the average number of terms used to represent search keys. This indicates which search keys the test persons consider more important, as well as the level of detail in the representation of search keys. The second reason is the possibility of identifying the optimal number of search keys for obtaining a useful search result.
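The counting rule for terms per query (Table 6.6) can be illustrated with a minimal Python sketch. It is not the procedure used in the study, where the counts were registered in SPSS; it assumes that each logged query is available as the raw string typed into the search field, and the function names and example queries are hypothetical. Note that `str.split()` without arguments also treats several consecutive spaces as a single separator.

```python
def count_terms(query: str) -> int:
    """Count terms following the rule in Table 6.6: tokens are separated by
    whitespace, a word joined by a dash (e.g. "e-handel") is one term, and a
    free-standing dash is not counted as a term."""
    return sum(1 for token in query.split() if token.strip("-"))


def average_terms_per_query(queries: list[str]) -> float:
    """Average number of terms per query over a list of logged queries."""
    if not queries:
        raise ValueError("no queries to average")
    return sum(count_terms(q) for q in queries) / len(queries)


# Hypothetical example queries (not taken from the search logs)
print(count_terms("moms e-handel - udland"))  # 3
print(average_terms_per_query(["moms e-handel", "forældrekøb lejlighed skat"]))  # 2.5
```

Keeping dashed compounds as one token mirrors the decision in Table 6.6 to treat "e-handel" as a single term.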

Search operators and the use of the document type filter express how searchers combine their search terms, and whether they aim for a narrow or a broad search result. The number of hits is another indicator of the success of a search: a result set can be very small (e.g. 0 hits) or very large (e.g. 50,000 hits). Including the number of hits further enables comparison of the quantitative output of different queries. The test system provided an approximate count of the number of results. In very small sets it could be verified that the count was approximate, as the actual number of results sometimes differed slightly from the reported count. To give small and large retrieval sets equal conditions, the number of results reported by the system was registered for all searches. Lastly, the type of reformulations was included. Query reformulations (or modifications) designate the actions taken by searchers to adjust an inadequate search result. For that reason, reformulations are highly informative about users’ interaction with an IR system. Huang & Efthimiadis (2009, p. 79) have suggested a taxonomy of reformulations that reflects modifications of search terms alone. With the present identification of reformulations, we wanted to reflect the changes made in all fields of the search interface, including the categorization window.

Table 6.6 Search test variables, their definition and measurement

Variable | Definition | Measurement
Query level
Terms per query | Number of words separated by a single space. Dashes were not counted as single terms; terms connected with a dash (e.g. “e-handel”) were counted as one term. | Average number of terms per query
Search keys per query | Number of search keys applied in queries | Average number of search keys per query
Use of search operators in queries | The search operator chosen for a specific query | Distribution of queries using each of the four search types, in percentages
Use of the filter “Document type” (DT) in queries | The DT filter chosen (if any) for a specific query | Percentage of queries using the DT filter
Number of hits in queries | The number of hits retrieved in queries | Average number of hits retrieved
Query success | Queries retrieving at least one document with a relevance score of 2 or 3 are considered successful | Percentage of successful queries
Type of reformulations | Reformulations in queries, registered as the change from the previous to the present query. Registered types: category, query terms, document type, search operator, and a combination of the above. | Percentage of reformulations in queries
Session level
Number of sessions with reformulations | Number of sessions containing more than one query. Reformulations comprise changes of queries, search type (or categories in system B), or document type. | Percentage of sessions with reformulations
Number of reformulations per session | Number of times a query has been reformulated in a session | Average number of reformulations per session
Session success | Sessions containing at least one successful query are considered successful | Average number of sessions solved
Test persons’ assessment of their insight into the simulated search tasks | Measured on a scale from 1 to 5, where 1 = No insight and 5 = Great insight | Average score on insight
Test persons’ assessment of the simulated search tasks’ level of difficulty | Measured on a scale from 1 to 5, where 1 = Very easy and 5 = Very difficult | Average score on the level of difficulty
Test persons’ assessment of the resemblance between the simulated search task and their daily work tasks | Measured on a scale from 1 to 5, where 1 = No resemblance and 5 = Great resemblance | Average score on resemblance between search task and daily work tasks
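The registration of reformulation types, i.e. comparing each query with the one before it across all fields of the search interface, can be sketched in Python as follows. This is a hypothetical illustration rather than the coding procedure actually used: the `Query` fields, the search-type labels such as "all words" and "exact phrase", and the example session are assumptions, and an identical resubmission is assumed not to count as a reformulation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Query:
    """One submitted query; field values are hypothetical labels."""
    terms: str
    search_type: str                  # search operator / search type chosen
    doc_type: Optional[str] = None    # "Document type" filter, if any
    category: Optional[str] = None    # category selection (system B only)


# Interface field -> reformulation type as named in Table 6.6
FIELD_LABELS = {
    "category": "category",
    "terms": "query terms",
    "doc_type": "document type",
    "search_type": "search operator",
}


def reformulation_type(previous: Query, current: Query) -> Optional[str]:
    """Classify the change from the previous to the present query."""
    changed = [label for field, label in FIELD_LABELS.items()
               if getattr(previous, field) != getattr(current, field)]
    if not changed:
        return None  # identical resubmission: not counted as a reformulation
    return changed[0] if len(changed) == 1 else "combination"


def session_reformulations(queries: list[Query]) -> list[Optional[str]]:
    """Reformulation type for each consecutive pair of queries in a session."""
    return [reformulation_type(p, c) for p, c in zip(queries, queries[1:])]


# Hypothetical session with three queries
session = [
    Query("moms freelance", "all words"),
    Query("moms registrering freelance", "all words"),
    Query("momsregistrering", "exact phrase", doc_type="legal guidance"),
]
print(session_reformulations(session))  # ['query terms', 'combination']
```

Changes to more than one field are collapsed into "combination", corresponding to the last registered type in Table 6.6.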


Overall, these variables may be termed interaction variables (cf. Kelly, 2009, p. 105 ff.). A related and very common variable in this type of study is the search time spent. Search time was excluded from the present data, as the test persons interacted with the observer in many search sessions, which affected the time spent. As a result, search time would not have been a valid variable in the present data set.

Table 6.7 Simulated search task facets

Search task | Description | Facets
Sim1 | Selling an apartment purchased by parents for their children. Can the parents get tax relief for expenses concerning the estate agent, repairs, and the loss incurred when the apartment was sold? Find documents outlining the fiscal conditions concerning apartments purchased by parents for their children. | Topical facets: Business activity: Parents’ purchase; Taxation: Tax relief. Non-topical facets: Information type: Legal guidances, citizen booklets, legislation
Sim2 | Taxation of e-commerce: An owner-managed one-man publishing house wants to sell books online in the United States and other countries. The permanent establishment is in Denmark. How is the owner taxed on his earnings? Find documents outlining how e-commerce with a permanent establishment in Denmark is taxed. | Topical facets: Business activity: E-commerce; Taxation: Permanent establishment DK, foreign income. Non-topical facets: Information type: Legal guidances, business guidances, legislation
Sim3 | Freelance work: A freelance teacher is about to expand his activities, which will make him earn about DKK 100,000 per year. He is now unsure whether he can continue as a salaried worker or must start his own business and register for VAT. Find documents outlining the rules for when to register for VAT. | Topical facets: Business format: Freelance; Business activity: Teaching; Taxation: VAT registering. Non-topical facets: Information type: Legal guidances, business guidances, legislation

Performance is another prevalent variable type in IR evaluation studies (cf. Kelly, 2009, p. 106 ff.). Established performance measures are used to quantify and compare the performance of IR systems. We have already mentioned precision and recall (section 5.2.4). Other examples include discounted cumulative gain (DCG), a measure that takes the ranking of documents into account (Järvelin & Kekäläinen, 2002), and mean average precision (MAP), which averages, over a set of queries, the precision values obtained each time a relevant document is retrieved (Voorhees, 2000). However, the form of the log file did not permit these calculations, as it did not store the documents retrieved. We did, however, measure query success in terms of the query’s ability to retrieve relevant documents, as outlined in the previous section. For the purpose of performance measurement, we defined a successful search as a query retrieving at least one document with a relevance score of 2 or 3 (on a scale from 0 to 3, where 3 denotes full relevance). Score 2 was included in the definition of success because the test persons on several occasions stopped searching once a level 2 document had been retrieved. For example, two test persons stopped with the following statements:

“Well, I didn’t find anything that states exactly how to do it, but I have found 

something indicating where I might find the rules.” (TP1, line 243-244), and 

“[I still think it is a 2..] because I do get some information about the tax 

rules… But of course you do need to go one level deeper in order to hit a 3” (TP6, line 

49-52). 

Another reason for including level 2 documents in the definition of a successful query is that the documents were assessed from the metadata contained in the result lists, and not from the full document. As one test person puts it:

“...I would not give it a 3. Actually, I would probably give 1 to both of them, because I can’t know if it is what I need, before I get in and see if it is correct. But those are the ones, I would choose. Unless I can see that I can move on...” (TP3, line 73-76).

The quote shows that documents might be rated lower because the test persons do not get to assess the full version of the documents.
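The success criterion can be stated compactly in code. The Python sketch below assumes that each query is represented by the list of relevance scores (0-3) given to the documents assessed from its result list. The function names and example assessments are hypothetical, and a query without assessed documents, such as one with 0 hits, is assumed to count as unsuccessful.

```python
def query_successful(scores: list[int]) -> bool:
    """A query is successful if at least one assessed document scored 2 or 3
    on the 0-3 relevance scale (3 = full relevance)."""
    if any(s not in (0, 1, 2, 3) for s in scores):
        raise ValueError("relevance scores must be on the 0-3 scale")
    return any(s >= 2 for s in scores)


def percentage_successful(queries: list[list[int]]) -> float:
    """Percentage of successful queries; a query with no assessed
    documents counts as unsuccessful."""
    if not queries:
        raise ValueError("no queries")
    return 100 * sum(query_successful(q) for q in queries) / len(queries)


# Hypothetical relevance assessments for four queries
assessed = [[3, 1], [1, 0], [2], []]
print(percentage_successful(assessed))  # 50.0
```

Lowering the threshold from 3 to 2 is the single parameter that encodes the decision, discussed above, to treat level 2 documents as a satisfactory outcome.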

At session level, a number of interaction variables were also included: the number of sessions with reformulations, the number of reformulations, and session success. The reformulations basically provide the same information as at query level, though the picture might change slightly at session level, which is why they were included here too. Likewise, session success condenses the query level so that the search tasks can be compared at a more overall level. In Kelly’s (2009, pp. 104-105) terms, the remaining three variables at session level are characterized as information need variables. Here we measured the test persons’ assessments of the search tasks (for simulated search tasks only) in terms of the level of difficulty, their insight into the topic, and the similarity of the task to genuine work tasks. Despite the risk of receiving highly subjective answers in this type of assessment, we included them to obtain some indication of the test persons’ perception of the simulated search tasks.
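The three session-level interaction variables can be derived from the query-level data. The Python sketch below is illustrative only: it assumes that a session is a list of queries and a query is the list of 0-3 relevance scores given to its assessed documents, and that every query after the first in a session counts as one reformulation. The function name and example sessions are hypothetical, and session success is reported here as a percentage.

```python
def session_variables(sessions: list[list[list[int]]]) -> dict[str, float]:
    """Session-level variables (cf. Table 6.6).

    A session is a list of queries; a query is the list of 0-3 relevance
    scores given to its assessed documents.
    Assumption: every query after the first is one reformulation.
    """
    if not sessions:
        raise ValueError("no sessions")
    n = len(sessions)
    # Sessions with reformulations = sessions containing more than one query
    with_reformulations = sum(1 for s in sessions if len(s) > 1)
    reformulations = sum(max(len(s) - 1, 0) for s in sessions)
    # A session is successful if any of its queries retrieved a 2 or a 3
    successful = sum(1 for s in sessions
                     if any(score >= 2 for query in s for score in query))
    return {
        "pct_sessions_with_reformulations": 100 * with_reformulations / n,
        "avg_reformulations_per_session": reformulations / n,
        "pct_successful_sessions": 100 * successful / n,
    }


# Hypothetical example: three sessions with 2, 1 and 3 queries
example = [[[0, 1], [2]], [[3]], [[1], [0], [1]]]
print(session_variables(example))
```

The average number of reformulations is taken over all sessions, including those with a single query; averaging only over sessions with reformulations would be an equally defensible reading of Table 6.6.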

Subsequent to the registration of data, statistical analyses were carried out. The analysis consisted of univariate and bivariate statistics: frequencies, means, and correlations. In addition, inferential statistics were applied where relevant. We used Pearson’s r for interval- and scale-level data and the chi-square (χ²) test for nominal-level data.
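What the two statistics compute can be shown in a short, dependency-free Python sketch. The analyses in the study were carried out in SPSS; the functions below are illustrative, return only the test statistics rather than p-values, and use invented example data. For p-values, SciPy provides `scipy.stats.pearsonr` and `scipy.stats.chi2_contingency`, the latter applying Yates' continuity correction to 2 × 2 tables by default.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson's product-moment correlation coefficient r."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with n >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def chi_square(table: list[list[int]]) -> tuple[float, int]:
    """Pearson's chi-square statistic and degrees of freedom for an
    r x c contingency table of observed counts (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Invented example data
print(round(pearson_r([1, 2, 3, 4], [2, 4, 5, 9]), 3))  # 0.965
stat, dof = chi_square([[10, 20], [20, 10]])
print(round(stat, 3), dof)  # 6.667 1
```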

6.5 Limitations 

It is recognized that the search test has limitations. As the test is designed as a controlled laboratory test, it does not necessarily reflect the everyday seeking behaviour of the employees. Also, the test persons searched on the basis of three simulated search tasks. The challenges of designing suitable tasks for professional users have been outlined above. The results presented in Chapter 8 show that the searchers’ handling of the genuine search tasks differs in some respects from their handling of the simulated search tasks. However, in most cases the differences are minor. Further, the correspondence between the facets of simulated and genuine search tasks shows that the tasks were realistic to the employees in this respect. In addition, the test persons made their own interpretations of the search tasks when constructing queries, providing 128 sessions and 564 queries. In this respect, the test provides knowledge of the test persons’ understanding of, and ability to incorporate, elements of a search interface into their queries. Lastly, we want to address the state of the prototype used for the test. Since the system included only about a fourth of the documents of the running intranet, documents known to the test persons may not have been included. In addition, the training of the categorization was not final at the time of the test, which at times challenged the test persons and may have affected the search log data. However, the search interviews provided valuable qualitative data to explain and understand the nature of the challenges and the test persons’ use of the prototype.

6.6 Relation between research method and research questions 

In the previous sections we have outlined the research methods forming the basis for the collection and analysis of data. We close the present chapter by connecting the research method with the research questions guiding the thesis, in order to clarify the purpose of the specific elements of the research method. The relations between the research questions and their empirical basis are outlined in Table 6.8. As the table shows, RQ 1.1-1.4 and 2.1-2.9 are empirically based, while RQ 1.5 and RQ 2.10 put the empirical findings into perspective. Next, we present the results of the domain study.


Table 6.8 Outline of the relation between research questions and empirical data

Research question | Empirical basis
RQ1: What characterizes the e-government employee’s information seeking behaviour in relation to:
1.1 Their use of information sources? | Survey questionnaire and focus group interviews (RQ 1.1-1.4)
1.2 Their frequency of information seeking? |
1.3 Their information needs? |
1.4 Their metadata preferences? |
1.5 How does the seeking behaviour affect demands for indexing? | The empirical findings of RQ 1.1-1.4 are analysed from an indexing perspective. The response to the question is analytical.
RQ2: How do automatic extracted indexing and automatic categorization perform in relation to the identified domain characteristics as to:
2.1 Number of queries in sessions? | Search log supported by search interviews (RQ 2.1-2.9)
2.2 Number of terms in queries? |
2.3 Number of concepts in queries? |
2.4 The type of search operator applied? |
2.5 The use of document type filters? |
2.6 Number of reformulations? |
2.7 Types of reformulations? |
2.8 Degree of search success in queries and sessions? |
2.9 Overall performance measured by performance measures |
2.10 Which implications does the performance of different indexing methods have for future indexing and indexing guidelines in the domain of e-government? | The empirical findings of RQ 2.1-2.9 are analysed in terms of their implications. The response to the question is analytical.


Chapter 7 


7 Domain study results 


The purpose of the domain study is to answer the research questions regarding the information seeking behaviour of e-government employees and how the domain characteristics affect demands for indexing within the domain (research question 1), as outlined in Chapter 1. The investigation of seeking behaviour in the domain served several purposes in the project. Most importantly, the domain study should inform the design of the subsequent search test so that it reflects the behaviour within the domain. Secondly, we wanted to validate the relevance of the system chosen for the search test (a prototype of a future version of the intranet at SKAT, see section 6.4.1).

The results of the questionnaire (see section 6.2) and the focus groups (see section 6.3) form the basis for the domain study. The chapter opens with a presentation of the questionnaire respondents and the focus group participants (sections 7.1 and 7.2). Next follow the results and analysis of the empirical data regarding research questions 1.1-1.4, the purpose being to characterize the seeking behaviour of e-government employees in the case study. We have divided the analysis into two parts. The first section concerns the findings related to the general seeking behaviour of the employees (section 7.3). The subsequent section is concerned with the results generating demands for indexing (section 7.4). The chapter closes with a summary.

7.1 Questionnaire respondents, their background and work tasks 

340 respondents completed the questionnaire, resulting in a response rate of 42.6% (see Appendix 21). This was an increase compared to the pilot test, where the response rate was 29% (see Appendix 5). The degree of response of the remaining 57% also appears from Appendix 21. As only the full responses are used as the basis for the data analysis, the 42.6% are the focal point of the remainder of this chapter as regards the questionnaire.


Table 7.1 Distribution of respondents as to their education (percentages)

Education | # | %
Internal clerk programme | 97 | 28.5
Administrative assistant | 95 | 27.9
Other vocational education and training | 26 | 7.6
Upper secondary education | 10 | 2.9
Short-cycle higher education | 10 | 2.9
Bachelor degree | 7 | 2.1
Medium-cycle higher education | 26 | 7.6
Long-cycle higher education | 43 | 12.6
Master’s programme | 26 | 7.6
Total | 340 | 100

The age of the respondents ranges between 19 and 68 years. The average age is slightly above 47 years with a standard deviation of 9.5 years, which reflects the population figures (see Appendix 23). Overall, the respondents have quite a long length of service in the organization (see Appendix 24). Accordingly, the respondents’ experience with the individual work tasks is also extensive, measured as the number of years the respondents have been working with the task (see Appendix 22). Thus, employee turnover is limited, and the respondents tend to continue carrying out the same work tasks for some time. However, internal circulation of employees in the organization also takes place: both the focus group interviews and the search test revealed employees who had carried out numerous different and diverse tasks during their time of service. The majority of the respondents are educated within the organization or are administrative assistants (see Table 7.1). Another large group has completed a higher education or a master’s programme. In sum, the respondents may be characterized as employees of a certain age who can be expected to have considerable insight into organizational matters and topics, due to their generally long length of service and an educational background that in many cases can be considered organization-specific.

The respondents could select among 19 different generic work tasks in the questionnaire. There were neither upper nor lower limits to the number of selections. The frequencies are shown in Table 7.2. We have already discussed the 10.9% of the respondents not selecting any work tasks in section 6.2.6 and will not elaborate further on this issue here. Respondents most frequently chose one (27.9%) or two (35.8%) work tasks. From three upwards, the number of respondents decreases. The number of work tasks selected shows that employees predominantly carry out a few work tasks during their working day. This corresponds to the task-oriented organization structure mentioned in section 2.4. Further, the size of the organization allows for highly specialized employees.

One may ask how one person can handle as many as six generic work tasks. The answer may lie precisely in the word generic: the generic descriptions of the work tasks (see Appendix 1) may have made it difficult for the respondents to identify their exact work area, when the actual work area consists of a combination of several work tasks.¹¹ Further, the respondents were asked to “pick also work tasks that you carry out elements of” (page 12 of the questionnaire, see Appendix 4). However, the majority of the respondents selected between one and three work tasks. The distribution of work tasks among the respondents is shown in Table 7.3. In total, the respondents answered questions about 655 work tasks distributed among the 19 generic work tasks. The table demonstrates the relative extent of the work tasks among the respondents. The most dominant work task is Instruction, which differs from most of the other work tasks: according to its definition, Instruction represents a different layer, because it operates at a meta level, basically concerning contact with clients, whether citizens or businesses. Instruction does not refer to specific subject areas in the organisation, which is the case for the remainder of the work tasks.

Table 7.2 Number of work tasks selected by respondents

Number of work tasks carried out by employees | # (web questionnaire, all employees, N=340) | %
0 WT | 37 | 10.9%
1 WT | 95 | 27.9%
2 WT | 122 | 35.8%
3 WT | 52 | 15.2%
4 WT | 20 | 5.9%
5 WT | 10 | 2.9%
6 WT | 5 | 1.5%
Total | 340 | 100%

Table 7.3 Ranked frequency of work tasks in questionnaire results

Work task carried out by employees | # (web questionnaire, all respondents, n=340) | %
Instruction | 181 | 53%
Inspection: common | 61 | 18%
Settlement: preliminary assessment of income/personal taxes | 57 | 17%
Settlement: business relations | 57 | 17%
Processes of support: legal support | 45 | 13%
Collection | 39 | 12%
Management and development: development | 27 | 8%
Settlement: corporation taxes | 25 | 7%
Settlement: common | 20 | 6%
Settlement: vehicles | 18 | 5%
Inspection: customs | 16 | 5%
Management and development: strategy | 16 | 5%
Processes of support: internal activities | 15 | 4%
Settlement: estate | 14 | 4%
Processes of support: IT service and administration | 14 | 4%
Processes of support: HR and education | 14 | 4%
Management and development: business management | 14 | 4%
Settlement: customs | 12 | 4%
Processes of support: minister service | 10 | 3%
Total | 655 |
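The total of 655 work tasks in Table 7.3 follows from Table 7.2: each respondent who selected k work tasks contributes k answers. The small Python check below recomputes that total from the counts in Table 7.2 and divides it by the N = 340 respondents stated in the table. It is an illustrative calculation, not part of the original analysis, and the variable names are our own.

```python
# Table 7.2: number of work tasks selected -> number of respondents
distribution = {0: 37, 1: 95, 2: 122, 3: 52, 4: 20, 5: 10, 6: 5}
N = 340  # respondents, as stated in Table 7.2

# Each respondent selecting k work tasks contributes k work-task answers
selections = sum(k * n for k, n in distribution.items())
mean_tasks = selections / N

print(selections)            # 655, the total in Table 7.3
print(round(mean_tasks, 2))  # 1.93
```

The mean of roughly two work tasks per respondent includes the respondents who selected none, and is in line with the observation that most respondents chose one or two tasks.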

7.2 Characteristics of focus group participants 

The participants in the focus groups were assembled to represent the six main processes in the business model of the organization. As can be seen from Appendix 25, all six main processes were represented by the participants. It turned out, however, that several of the participants covered more than one of the work tasks, which is in line with the questionnaire results just mentioned. As a consequence, some participants appear in several places in the table. Instruction constitutes a special case, since in some sense it is part of most participants’ daily work alongside their other primary functions. The six participants placed here are those who took part in the focus group specifically concerning Instruction.

The participants represented a number of different educational backgrounds. When classified according to the categories of the questionnaire, the participants are distributed as shown in Table 7.4. Some of the educations mentioned in the questionnaire were not represented in the focus groups. We do not consider this a problem, since these are the same educations that are less frequent in the questionnaire results (see Table 7.1). Also, the aim of the focus groups was not necessarily to be representative of the organization as to the level of education. The participants’ length of service within the organization ranges from a few months up to about 40 years. Thus, both highly experienced employees and newcomers are represented in the groups.

11 For instance, one combination in the questionnaire represents three tasks: Settlement: Preliminary assessment of income/personal taxes, Inspection: Common, and Processes of support: Legal support. Thus, the work area concerns inspections and legal support with regard to income taxes.

Table 7.4 Focus group participants' educational background

Title of education | # (focus group participants, N=35) | %
Internal clerk programme | 19 | 54
Administrative assistant | 3 | 9
Other vocational education and training | - | -
Upper secondary education | - | -
Short-cycle higher education | - | -
Bachelor degree | - | -
Medium-cycle higher education | - | -
Long-cycle higher education | 8 | 23
Master’s programme | 2 | 6
Could not be placed | 3 | 9
Total | 35 |

7.3 Results regarding professional e-government seeking behavior 

The purpose of section 7.3 is to present the general seeking behavior found in the 

questionnaire and the focus group interviews. The section addresses the employees’ 

seeking behavior in terms of information sources applied. 


7.3.1 Use of information sources 


The respondents’ selection of sources appears in Table 7.5. The questionnaire does not reveal the relative importance of the listed sources for solving particular work tasks, as this was not part of the questionnaire design: we asked which sources the respondents use, but not how often each source is used. The content of Table 7.5 therefore expresses the range of information sources. The questionnaire allowed the respondents to propose additional sources besides the predefined ones, and the focus groups contributed supplementary sources and verified the sources mentioned by the respondents. The organization demonstrates a very broad use of information sources. The percentages in the bottom row of Table 7.5 show that the extent to which the predefined sources are used varies considerably. The intranet is the predominant source of information for the employees: on average, it is used for problem solving in 85% of all work tasks. The WWW and reference works are also important to the employees.

The predefined sources can be arranged in three overall groups: reference works, various web sites, and internal systems. The groups are not mutually exclusive, but are used to characterize the systems applied. A fourth group emerged from the open questions of the questionnaire and from the focus groups: colleagues as sources of information. The results regarding this particular source of information are presented in section 7.3.2. The groups guide the analysis of the use of sources in the following sections. The additional sources cover internal systems apart from the predefined sources, other specialized systems, specific websites, and colleagues (see Appendix 26). The appendix reflects the myriad of sources used in a large, specialized organization such as SKAT. These sources are included in the relevant sections below where appropriate.

7.3.1.1 Reference works 

Due to its area of function, SKAT is to a large extent guided by legislation and rules. In this section, reference works denote both digital and printed reference works. The importance of reference works, whether printed or digital, is reflected in Table 7.5, which emphasizes the importance of the legal basis of the organization. In general terms, the employees use reference works to a large extent: both types were used in about 40% of the work tasks.

The distinction between printed and electronic sources addresses a general change in organizations whereby printed books are phased out in favour of digital editions. Focusing on the work tasks with more than 50 respondents, the digital versions have a slightly higher score (see Table 7.5). However, printed versions are still important to the employees. The participants mentioned different reasons for still needing printed versions of reference works, all falling under the overall label of practical matters. The label covers different motivations. One is the nature of the work task: some employees in the organization carry out parts of their work away from their desk due to meetings, either internally or externally when visiting citizens, businesses, and other authorities. This supports the recommendation by Garcia et al. (2006) that the implementation of technology in workplaces should be closely related to how work tasks are carried out in particular settings. Some participants also found printed versions easier to read, and lastly it was mentioned that printed versions are easier to search. Regarding searching, however, opinions differed in the focus groups, which may also explain the even use of the two. For example:

“I use electronic reference works a lot. I believe that I find far the most here. If I search right, I will get it. But they also contain cross references to all the things on the intranet, and it is supposed to get what is on the Internet too, through the Parliament and the like.” (R16, p. 5), and

”…as long as you have a printed reference work, they are easier to consult. 

That is, if you know where to look.” (R23, p. 11). 

The two quotes illustrate how the selection of either electronic or printed sources is a 

matter of the user’s preferences and experience. 

7.3.1.2 Web sites 

The predefined websites covered by the heading “Web sites” are the homepage of the Danish Parliament, ministry homepages, borger.dk¹², “Retsinformation”¹³ and the internet in general. As Table 7.5 shows, the internet is the preferred resource of the five listed, and it is the second most used information source of all the predefined types. Of course, the designation of the source may play a part in its predominance, since in principle the internet in general could include the remaining web-based sources listed in the questionnaire.

12 Borger.dk is the common Danish portal for communication between citizens and public authorities. The portal enables self-service for citizens, but also has an area for public authorities. See www.borger.dk.

13 Retsinformation is the official Danish website containing the acts, their legislative history, historical law, and the like. The database is located at www.retsinfo.dk.

Using Google for searching was brought up several times during the focus groups. The search engine was used for explorative searches and as a gateway to searching other systems such as the intranet. For example:

“For me, if I need rulings, I use Google even though I know I can access 

Retsinformation and Thomson too. But I search Google, because I find the electronic 

reference works too bad. Then I find the ruling in Google and then I might get referred 

to one of those pages that we are perhaps supposed to use, but I simply find their search 

functionalities too bad.” (R11, p. 5) 

Furthermore, websites are an example of sources that are closely related to specific work tasks. Retsinformation, ministry homepages, and the homepage of the Danish Parliament are all used far more widely in the main processes “Processes of support” and “Management and development”. This increased use suggests that the employees in these processes are, to a larger extent than other employees, engaged in detailed legal matters in the organization.

7.3.1.3 Internal systems 

Among the predefined sources, the intranet and Captia (an electronic case management system) represent the internal sources of the organization. In addition to these two, the questionnaire and focus groups revealed a number of further internal systems. To a large extent, these are systems equivalent to Captia, for example Dipsy, KMD, Remedy, TST, and DR. The systems serve different purposes in the organization, but have one thing in common: they are all systems for registering cases, requests, or other data. The systems mentioned reflect local differences resulting from the mergers, but also include highly specialized systems supporting the professional activities of the employees (see Appendix 26). In preparation for the search test, we were interested in finding out whether intranet use differed across work tasks, that is, whether some work tasks were more appropriate for the search test than others. It turned out that the intranet was the most frequently used source in all work tasks except one (Inspection: common) (see Table 7.5), and it may thus be considered the most important source of information in the organization overall.

Table 7.5 Respondents' use of predefined information sources (number of respondents, with percentages in parentheses)

Work task                               Intra.    DRW       PRW       Parl.     Captia    Min.      Borger    Rets.     Internet
Instruction                             154 (85)  102 (56)  98 (54)   17 (9)    47 (26)   31 (17)   21 (12)   42 (23)   89 (49)
Settlement: common                      16 (80)   11 (55)   9 (45)    2 (10)    6 (30)    2 (10)    2 (10)    3 (15)    8 (40)
Settlement: preliminary assessment of   50 (88)   42 (74)   33 (58)   1 (2)     6 (11)    4 (7)     6 (11)    5 (9)     22 (39)
  income/personal taxes
Settlement: business relations          46 (81)   34 (60)   33 (58)   8 (14)    14 (25)   8 (14)    4 (7)     12 (21)   24 (42)
Settlement: corporation taxes           21 (84)   21 (84)   17 (68)   4 (16)    10 (40)   4 (16)    -         6 (24)    16 (64)
Settlement: customs                     9 (75)    1 (8)     7 (58)    -         2 (17)    -         -         -         2 (17)
Settlement: vehicles                    15 (83)   3 (17)    4 (22)    2 (11)    3 (17)    2 (11)    3 (17)    3 (17)    10 (56)
Settlement: estate                      11 (79)   8 (57)    9 (64)    1 (7)     6 (43)    5 (36)    3 (21)    5 (36)    9 (64)
Inspection: common                      49 (80)   51 (84)   43 (71)   6 (10)    14 (23)   4 (7)     5 (8)     19 (31)   35 (57)
Inspection: customs                     11 (69)   1 (6)     5 (31)    1 (6)     4 (25)    1 (6)     -         1 (6)     4 (25)
Collection                              32 (82)   16 (41)   11 (28)   2 (5)     16 (41)   3 (7)     6 (15)    8 (21)    13 (33)
Processes of support: legal support     43 (96)   36 (80)   32 (71)   17 (38)   17 (38)   18 (40)   3 (7)     22 (49)   20 (44)
Processes of support: minister service  9 (90)    5 (50)    3 (30)    8 (80)    4 (40)    6 (60)    -         5 (50)    5 (50)
Processes of support: IT service and    11 (79)   1 (7)     2 (14)    -         3 (21)    -         1 (7)     -         9 (64)
  administration
Processes of support: HR and education  12 (86)   3 (21)    1 (7)     -         3 (21)    3 (21)    -         1 (7)     9 (64)
Processes of support: internal          13 (87)   1 (7)     -         -         5 (33)    -         1 (7)     2 (13)    12 (80)
  activities
Management and development: strategy    16 (100)  3 (19)    5 (31)    5 (31)    1 (6)     6 (38)    1 (6)     3 (19)    12 (75)
Management and development: business    14 (100)  1 (7)     4 (29)    2 (14)    3 (21)    2 (14)    1 (7)     1 (7)     8 (57)
  management
Management and development: development 26 (96)   4 (15)    6 (22)    8 (30)    6 (22)    7 (26)    -         9 (33)    19 (70)
Total average percentage                (85)      (39)      (40)      (15)      (26)      (17)      (7)       (19)      (52)

Legend: The table states the number and percentage of respondents who use a specific information source for a certain work task. Since the table reflects a wide variation between work tasks, at least for some information sources, the last row summarizes the total average percentage across all work tasks reported. Abbreviations: Intra. = Intranet; DRW = Digital reference works; PRW = Printed reference works; Parl. = Homepage of the Danish Parliament; Min. = Ministry homepages; Borger = Borger.dk; Rets. = Retsinformation; Internet = The Internet in general.

Taking the focus groups into account, it appears that the intranet serves different functions for the participants. Key functions are as a library of internal messages and documents, as a tool for staying updated on topics of interest, and as a library of specialist information.

To illustrate: 

“…so the information need that I have [regarding the intranet] is more aimed towards changes in new court decisions, new legislation, and we usually get that from the intranet. That means that I go there every morning to see if anything new has come in relation to collection, and that is how I stay updated” (R33, p. 1)

Although the participants regard the intranet as the most important general source of information, they also consider it a challenging system to use. The challenges fall into three overall categories: too much information, irrelevant information, and difficulty locating relevant information. Two quotes exemplify this:

“So the intranet, it is our common notice board. And that also decides the 

search results. You might even get recipes, if they have been published.” (R7, p. 5), and 

“It is very often, when we are answering agent telephones. For instance when 

e-income was new, they would ask us “how do you do a correction of wrongly stated 

taxes”, and we also didn’t know many of the questions and then we could search the 

intranet, but we gave up. We had to pass them on to someone dealing with it, because it 

took us too long, and it was confusing to search the intranet. We couldn’t find the 

answers we needed, because you got page after page containing the least bit about e-income,

that’s what you get.” (XX, settlement, p. 3) 

The participants solve these problems in different ways. For documents that are also available on the official website of the organization, several participants mention that they find the website easier to navigate than the intranet. Others perform a Google search, either on the web as a whole or limited to the domain www.skat.dk. A third common solution is to ask a colleague for help. Regardless of the approach taken to overcome the intranet search problems, the participants stress the importance of qualified indexing.

To sum up, the employees use a variety of sources and show creativity in finding information when needed. With regard to the search test, we learned that, apart from “Settlement: Customs”, the intranet was used extensively across the organization. This supported our choice of system for the search test.
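The legend of Table 7.5 describes the last row as the total average percentage across all reported work tasks, without stating how it was computed. The short Python sketch below assumes that this row is an unweighted mean of the 19 per-work-task percentages, with cells marked “-” counted as 0. Under that assumption it reproduces the printed totals for, for example, the intranet (85), Borger.dk (7), and the Internet in general (52). The data are copied from Table 7.5; the names `TABLE_7_5` and `total_average` are illustrative only.

```python
from statistics import mean

# Per-work-task percentages from Table 7.5, in the table's row order
# (Instruction, Settlement: common, ..., Management and development: development).
# Cells shown as "-" are entered as 0 here (an assumption).
TABLE_7_5 = {
    "Intranet": [85, 80, 88, 81, 84, 75, 83, 79, 80, 69,
                 82, 96, 90, 79, 86, 87, 100, 100, 96],
    "Borger.dk": [12, 10, 11, 7, 0, 0, 17, 21, 8, 0,
                  15, 7, 0, 7, 0, 7, 6, 7, 0],
    "The Internet in general": [49, 40, 39, 42, 64, 17, 56, 64, 57, 25,
                                33, 44, 50, 64, 64, 80, 75, 57, 70],
}


def total_average(percentages):
    """Unweighted mean across work tasks, rounded to a whole percentage."""
    return round(mean(percentages))


for source, values in TABLE_7_5.items():
    assert len(values) == 19  # one value per work task
    print(f"{source}: {total_average(values)}%")
```

An unweighted mean gives every work task equal weight regardless of how many respondents it has (from 10 in “Minister service” to 181 in “Instruction”); a respondent-weighted mean would be the alternative if the row were meant to describe respondents rather than work tasks.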

7.3.2 Colleagues as sources of information 

One general characteristic across all of the focus groups is the importance of colleagues as information sources. We did not ask about this particular type of source in the questionnaire. However, a number of respondents mentioned colleagues and neighbor training as additional sources in the open text box below the predefined answer options for information sources. Colleagues as information sources have previously been investigated in LIS research, both in the public domain (e.g., Hazlett, McAdam & Beggs, 2008; Woudstra & van den Hooff, 2008) and in other professional contexts (e.g., Herzum et al., 2002; Herzum & Pejtersen, 2000; Xu, Tan & Yang, 2006). In the present study, the employees put forward two main reasons for using colleagues.

For reasons of efficiency:

“Well, if it is tasks within special problem areas, and we know we have 

colleagues with knowledge about it, then it is tempting to go ask, because the person is 

likely to know the latest decisions in the area. Instead of starting to… It is also a 

matter of time. You can save time by…” (XX, Guidance, p. 11), 

And for reasons of validation:

”Well, I do prefer to consult the customs guidance in the first place, and then… 

if I am not really sure if anything new has come, then I will check out the electronic and 

stuff. And then I always go ask…” (R19, p. 5) 

To sum up, colleagues are important information sources for the employees, in this study as in related studies. To some extent, however, this reliance is due to ineffective retrieval systems, which emphasizes the need for improved indexing practice.


7.4 Seeking results regarding demands for indexing in e-government

In the following sections, we report the findings concerning the demands for indexing in e-government.

7.4.1 The frequency of information seeking

The need for information seeking was documented in the questionnaire by question 17 (see Appendix 4), which concerned the frequency of information seeking. The question was not aimed specifically at the intranet; rather, it was formulated broadly in order to investigate information seeking in general. The question thus maps information seeking regardless of the source applied. The distribution for the individual work tasks appears in Table 7.6. The table shows that the most common frequency of information seeking is every third or fourth time (the third frequency column): in 12 of the 19 work tasks, this is the frequency with the highest score. The general picture thus appears to be that information seeking takes place rather frequently, but not necessarily every time a work task is solved.

Table 7.6 Questionnaire results regarding the frequency of information seeking (number and percentage of respondents)

Work task                               Every time  Every second time  Every 3rd or 4th time  Practically never
Instruction                             33 (18%)    20 (11%)           86 (48%)               42 (23%)
Settlement: common                      5 (25%)     2 (10%)            5 (25%)                8 (40%)
Settlement: preliminary assessment of   5 (9%)      6 (11%)            27 (47%)               19 (33%)
  income/personal taxes
Settlement: business relations          10 (18%)    6 (11%)            27 (47%)               14 (25%)
Settlement: corporation taxes           10 (40%)    4 (16%)            10 (40%)               1 (4%)
Settlement: customs                     2 (17%)     1 (8%)             3 (25%)                6 (50%)
Settlement: vehicles                    4 (22%)     2 (11%)            7 (39%)                5 (28%)
Settlement: estate                      4 (29%)     1 (7%)             6 (43%)                3 (21%)
Inspection: common                      18 (30%)    13 (21%)           24 (39%)               5 (8%)
Inspection: customs                     5 (31%)     -                  4 (25%)                7 (44%)
Collection                              8 (21%)     -                  14 (36%)               17 (44%)
Processes of support: legal support     14 (31%)    9 (20%)            18 (40%)               4 (9%)
Processes of support: minister service  5 (50%)     1 (10%)            2 (20%)                2 (20%)
Processes of support: IT service and    4 (29%)     1 (7%)             5 (36%)                4 (29%)
  administration
Processes of support: HR and education  2 (14%)     3 (21%)            6 (43%)                3 (21%)
Processes of support: internal          3 (20%)     1 (7%)             5 (33%)                6 (40%)
  activities
Management and development: strategy    2 (13%)     2 (13%)            7 (44%)                5 (31%)
Management and development: business    3 (21%)     2 (14%)            5 (36%)                4 (29%)
  management
Management and development: development 6 (22%)     5 (19%)            14 (52%)               2 (7%)
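The statement that “Every 3rd or 4th time” has the highest score in 12 of the 19 work tasks can be checked against the percentages in Table 7.6. The Python sketch below assumes that a work task only counts when one category has a strictly higher share than all others, so the 40%/40% tie in “Settlement: corporation taxes” is not counted. It also assumes that cells shown as “-” are 0. The names used are illustrative only.

```python
from collections import Counter

# Column order of Table 7.6 (percentages of respondents per work task).
CATEGORIES = ("Every time", "Every second time",
              "Every 3rd or 4th time", "Practically never")

# Percentages copied from Table 7.6; cells shown as "-" are entered as 0.
TABLE_7_6 = {
    "Instruction": (18, 11, 48, 23),
    "Settlement: common": (25, 10, 25, 40),
    "Settlement: preliminary assessment of income/personal taxes": (9, 11, 47, 33),
    "Settlement: business relations": (18, 11, 47, 25),
    "Settlement: corporation taxes": (40, 16, 40, 4),
    "Settlement: customs": (17, 8, 25, 50),
    "Settlement: vehicles": (22, 11, 39, 28),
    "Settlement: estate": (29, 7, 43, 21),
    "Inspection: common": (30, 21, 39, 8),
    "Inspection: customs": (31, 0, 25, 44),
    "Collection": (21, 0, 36, 44),
    "Processes of support: legal support": (31, 20, 40, 9),
    "Processes of support: minister service": (50, 10, 20, 20),
    "Processes of support: IT service and administration": (29, 7, 36, 29),
    "Processes of support: HR and education": (14, 21, 43, 21),
    "Processes of support: internal activities": (20, 7, 33, 40),
    "Management and development: strategy": (13, 13, 44, 31),
    "Management and development: business management": (21, 14, 36, 29),
    "Management and development: development": (22, 19, 52, 7),
}


def modal_category(row):
    """Return the category with the strictly highest share, or None on a tie."""
    top = max(row)
    winners = [cat for cat, pct in zip(CATEGORIES, row) if pct == top]
    return winners[0] if len(winners) == 1 else None


modes = {task: modal_category(row) for task, row in TABLE_7_6.items()}
counts = Counter(modes.values())
print(f"{counts['Every 3rd or 4th time']} of {len(TABLE_7_6)} work tasks")
```

The same tally gives “Practically never” as the most frequent answer in five work tasks (Settlement: common, Settlement: customs, Inspection: customs, Collection, and Internal activities), and “Every time” only in “Minister service”.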

In the focus groups, the frequency of information seeking received some attention, because some of the frequencies from the questionnaire did not match the participants' own frequencies. The participants therefore discussed what constitutes information seeking. Some participants intuitively understood information seeking as mere look-ups in an information system. One participant says:

“If a client reports to the counter out here, you ask for his civil registration 

number and log into his information. This is the first information. You cannot serve a 

client unless you seek information at least once... But if someone asks a specialist 

question, then the need for information is not nearly as substantial. Because then you 

answer on the basis of something you know like the back of your hand...The only 

requests that do not require information are the ones asking for direction to the motor 

unit. They are handed over an instruction. Everyone else involves look-ups.” (R7, pp. 2-3)

Another participant (R33) adds:

“We cannot do anything without having the ICT-based possibilities of looking up companies, demands, what does this company owe, this person, what does he or she owe. We need to access the network all the time.”

In other words, if the respondents understand information seeking as mere look-ups in some sort of information system, then information seeking occurs very frequently, if not every time a work task is solved.

Information seeking triggered by an information need occurs less frequently, and its frequency is affected by different conditions. One condition is the number of self-service solutions developed in the organization. Self-service implies that citizens handle a range of tasks by themselves. One consequence is that some of the employees' knowledge areas are no longer maintained, because the work tasks that used to keep this knowledge up to date are now handled by the citizens themselves. For the frequency of information seeking, this means that the frequency increases, because information needs now emerge in situations that the employees used to handle from memory. The following quote illustrates this:

“...when we were employed by the municipality, our job was to assess as many 

people as possible, that is going through their tax return to see, whether they did it right 

or wrong... this means, that back then we gained experience all the time and kept up 

with what happened in this area and this area... now we need to make people use self-service and make error lists, so we keep losing what we once used to know by memory. I certainly feel that many of the questions I used to answer just like that now require

reading. Just to be brought up to date and see, if something new has occurred since the 

last time.” (R10, p. 4) 

This discussion may also explain why several work tasks in Table 7.6 have peaks at both “Every 3rd or 4th time” and “Every time”.

Another condition is prior knowledge of the case being handled. According to R35, information seeking only takes place if:

“... you are handling a completely new case. Then, obviously, I need to seek 

more information about this company. If it is a company, I know in advance, I might 

just check, what has been declared and what has been paid. But no matter what, I 

always seek before I am going to talk to a company.” 

Some work tasks deviate from the general tendency for “Every 3rd or 4th time” to be the most common frequency. Using the percentage distribution as an indicator, seven work tasks show noticeably more or less frequent information seeking than this general tendency.

Within “Processes of support”, two work tasks differed from the overall pattern. “Minister service” generated a higher frequency of information seeking, with the largest percentage share on “Every time” of all work tasks. “Internal activities”, on the other hand, had a lower frequency than the general picture, with the majority of respondents seeking information every third or fourth time or practically never. Neither work task had many respondents. Still, the focus groups and the descriptions of the work tasks added to our understanding of the respondents' behaviour in these two work tasks.

“Minister service” was not directly represented in the focus group interviews, but was discussed in the focus group for “Processes of support”. The interview supported the view that “Minister service” is a special case with regard to the frequency of information seeking, due to the content of the work task. R14 compares it to the other work tasks within “Processes of support” as follows:

“My spontaneous explanation for “Minister service” is, that, well, so much 

more is at stake, when servicing the minister. You need to be so much more certain. 

...with “Minister service”, you need to be 100% certain. Of course you need to in other 

cases as well, but more is just at stake with “Minister service”... You need to be 100% 

sure, that what you write and produce and contribute with is correct.” (R14, p. 3) 

Thus, it seems that correct information becomes even more important when it is passed on to the target group of “Minister service”.

“Internal activities”, on the other hand, generated a frequency of information seeking that was fairly low compared to the general picture: the majority of respondents selected either “Every 3rd or 4th time” or “Practically never” to describe their frequency. In the focus groups, the two representatives of “Internal activities” had rather different work. One (R32) handled mail that could not be delivered directly to the relevant party, while the other (R28) worked with communication. This difference in work tasks may explain the distribution of frequencies of information seeking. For R32, what lowered the frequency of information seeking was the kind of information needed:

“In our group, experience is more important. We almost need to know, what 

the different departments are doing, and that is what we try to... But all the time it is 

what you remember. He has got something to do with this and he has got something to 

do with that. You can practically not look it up anywhere” (R32, p. 4) 

However, when asked about information sources later on in the interview, it appeared 

that a number of sources were considered highly necessary in order to solve the work 

task at hand. 

R28, on the other hand, considered “Every 3rd or 4th time” an insufficient description of her own frequency of information seeking. When asked whether she looked for information more often than “Every 3rd or 4th time”, she replied:

“Yes, I think so, because I also use it to orientate myself about some things 

before I show up or answer an e-mail... But it is also related to how I understand a task 


because to me the intranet and seeking is a part of my job all the time about, well both 

SKAT as a business but also the subject area, I am working with. So you somehow 

either seek information or have signed up for a news mail... And all that information 

aids to how a task is solved one way or the other.”(R28, p. 6) 

It thus seems that information seeking is in fact rather frequent within “Processes of support”. The high share of “Practically never” for “Internal activities” may be explained by the sub-tasks that are also included in the overall description of “Internal activities”, for instance purchasing and administering goods, services, and buildings. These are not work tasks that necessarily generate frequent information seeking.

Other work tasks generated less information seeking than the general tendency, with “Practically never” as the most frequently indicated frequency of information seeking. These work tasks are “Settlement: common”, “Settlement: customs”, “Inspection: customs”, and “Collection”.

“Customs”, whether in the main process “Settlement” or “Inspection”, was discussed in the fourth focus group. All participants turned out to have their primary function within the main process “Settlement”, but they also had some insight into “Inspection”. The participants had difficulty relating to the fact that half of the respondents in “Settlement: customs” had answered “Practically never” to represent their frequency of information seeking. They did provide examples of work tasks that did not require information seeking, either because the information needed was well known to them or because it was already part of the papers provided for the case. However, a large part of the work tasks carried out required some kind of information seeking. The participants did not reach full agreement on the correct frequency, but the group agreed that “Practically never” did not give an adequate picture of the actual frequency.

A hypothesis for the domain study was that seeking behaviour would differ depending on the work task at hand. Table 7.6 confirms this hypothesis to some extent: a few work tasks stand out with more frequent information seeking, others with less frequent information seeking. The general impression, though, is that the employees look for information regularly, but not every time they are engaged in a work task. However, disagreements about the general figures surfaced in the focus groups, indicating that the figures in the table are aggregate percentages and that individual differences occur. In recruiting test persons for the search test, we wanted to reflect this individuality. To do so, we decided to let the individual frequency of intranet use guide the selection of test persons.

Table 7.7 Distribution of indicators of information needs (number and percentage of respondents; n = number of respondents per work task)

Work task                               n     1          2          3          4          5          6          7
Instruction                             181   38 (21%)   84 (46%)   46 (25%)   26 (14%)   48 (27%)   51 (28%)   117 (65%)
Settlement: common                      20    5 (25%)    11 (55%)   6 (30%)    4 (20%)    5 (25%)    4 (20%)    13 (65%)
Settlement: preliminary assessment of   57    16 (28%)   26 (46%)   17 (30%)   4 (7%)     19 (33%)   20 (35%)   36 (63%)
  income/personal taxes
Settlement: business relations          57    17 (30%)   27 (47%)   15 (26%)   8 (14%)    13 (23%)   15 (26%)   27 (47%)
Settlement: corporation taxes           25    3 (12%)    13 (52%)   6 (24%)    3 (12%)    5 (20%)    6 (24%)    18 (72%)
Settlement: customs                     12    -          3 (25%)    4 (33%)    1 (8%)     2 (17%)    2 (17%)    6 (50%)
Settlement: vehicles                    18    8 (44%)    9 (50%)    5 (28%)    3 (17%)    4 (22%)    3 (17%)    11 (61%)
Settlement: estate                      14    5 (36%)    8 (57%)    4 (29%)    1 (7%)     3 (21%)    4 (29%)    9 (64%)
Inspection: common                      61    13 (21%)   37 (61%)   18 (30%)   16 (26%)   24 (39%)   26 (43%)   42 (69%)
Inspection: customs                     16    4 (25%)    5 (31%)    8 (50%)    3 (19%)    3 (19%)    4 (25%)    5 (31%)
Collection                              39    6 (15%)    13 (33%)   7 (18%)    3 (8%)     6 (15%)    12 (31%)   23 (59%)
Processes of support: legal support     45    14 (31%)   26 (58%)   16 (36%)   5 (11%)    16 (36%)   14 (31%)   34 (76%)
Processes of support: minister service  10    -          4 (40%)    3 (30%)    5 (50%)    5 (50%)    4 (40%)    5 (50%)
Processes of support: IT service and    14    4 (29%)    7 (50%)    1 (7%)     1 (7%)     3 (21%)    6 (43%)    7 (50%)
  administration
Processes of support: HR and education  14    2 (14%)    8 (57%)    5 (36%)    1 (7%)     4 (29%)    6 (43%)    9 (64%)
Processes of support: internal          15    3 (20%)    8 (53%)    5 (33%)    4 (27%)    2 (13%)    4 (27%)    11 (73%)
  activities
Management and development: strategy    16    3 (19%)    8 (50%)    5 (31%)    8 (50%)    8 (50%)    9 (56%)    10 (63%)
Management and development: business    14    3 (21%)    6 (43%)    3 (21%)    5 (36%)    4 (29%)    7 (50%)    7 (50%)
  management
Management and development: development 27    5 (19%)    13 (48%)   5 (19%)    9 (33%)    10 (37%)   15 (56%)   17 (63%)

Legend:
1) I know exactly which documents I need in order to solve the work task
2) I need to find a document I have used before
3) I pretty much know which documents exist on the subject
4) I am working with a new project within a subject area well known to me. I would like to acquaint myself with the part that is new to me
5) I am looking for documents for a new work task within a subject area that is familiar to me
6) I am working with a subject area that I have not been working with before
7) I know the subject well but need a specific piece of information

7.4.2 Types of information needs 

Types of information needs were investigated in the questionnaire through a number of indicators for each of the three types of information needs employed in the thesis. It is important to keep in mind that the question about information needs was formulated specifically with regard to the intranet, because of the search test. Given the range and diversity of sources used by the employees (see Section 7.3.1), specific sources may be used for certain information needs. What we report in the present section are therefore the information needs that are addressed using the intranet.

Information needs were represented in the questionnaire by a number of indicators representing the information needs suggested by Ingwersen (1992). The distribution of the respondents across work tasks and information need indicators appears from Table 7.7. Two indicators in particular describe the situation of the respondents across the work tasks, namely indicator 2 (I need to find a document I have used before) and indicator 7 (I know the subject well but need a specific piece of information). In most of the 19 work tasks, these are the most frequently occurring situations triggering an information need.

This distribution corresponds well with the focus group results. Several participants state that seeking on the intranet is usually focused, and that more open searches are carried out elsewhere. According to R23:

“I do not use [the intranet] to seek without a specific goal. I would at Google, 

otherwise not. Used or seen before… It is possible, that you have not used the 

document before, but you have seen it before at the least.” (R23, p. 12)

R18 agrees: 

“I know this document is in there and I need to use it now. Or: I know this 

court ruling exists and I need to find it now. Or something else... Typically probably 

something I have seen before, that I need to use again.” (R18, p. 5) 

A third indicator is common in the work tasks belonging to the main process “Management and development”, namely indicator 6 (I am working with a subject area that I have not been working with before). The focus group on management and development clarifies why: “Management and development” is a main process in which new projects are planned, developed, and launched, and looking for inspiration was the participants' explanation for the higher frequency of indicator 6. For “Inspection: customs”, the most frequent indicator is number 3 (I pretty much know which documents exist on the subject). This may be related to the frequency of information seeking for this work task reported above (Section 7.4.1). It seems that this particular work task, which deals with field work related to controlling goods and means of transportation, is routine, and that the employees know the sources needed to solve it.

On the other hand, two indicators generally have low frequencies, namely 4 (I am working with a new project within a subject area well known to me. I would like to acquaint myself with the part that is new to me) and 1 (I know exactly which documents I need in order to solve the work task). Indicator 4 may have a low frequency because starting up new projects is less common than dealing with routine tasks.

We previously outlined the information needs corresponding to the indicators (see Table 6.1). Translating Table 7.7 into the information needs underlying the indicators shows the predominant information needs of the respondents. Table 7.8 displays the average percentage distribution of the three types of information needs underlying the indicators in Table 7.7. Again, we see that verificative and conscious topical needs are the most common information needs.

Table 7.8 Average percentage distribution of verificative needs (VN), conscious topical needs (CTN), and muddled topical needs (MTN)

Work task                               n     VN    CTN   MTN
Instruction                             181   38%   39%   21%
Settlement: common                      20    40%   40%   20%
Settlement: preliminary assessment of   57    37%   42%   21%
  income/personal taxes
Settlement: business relations          57    39%   32%   20%
Settlement: corporation taxes           25    32%   39%   18%
Settlement: customs                     12    13%   33%   13%
Settlement: vehicles                    18    47%   37%   17%
Settlement: estate                      14    46%   38%   18%
Inspection: common                      61    41%   46%   34%
Inspection: customs                     16    28%   33%   22%
Collection                              39    24%   31%   19%
Processes of support: legal support     45    44%   49%   21%
Processes of support: minister service  10    20%   43%   45%
Processes of support: IT service and    14    39%   26%   25%
  administration
Processes of support: HR and education  14    36%   43%   25%
Processes of support: internal          15    37%   40%   27%
  activities
Management and development: strategy    16    34%   48%   53%
Management and development: business    14    32%   33%   43%
  management
Management and development: development 27    33%   40%   44%

Legend: The table displays the mean percentage of occurrences across the indicators representing each type of information need.
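Table 7.8 is described as a translation of the indicator percentages in Table 7.7 into the three types of information needs. The mapping itself is given in Table 6.1, which is not reproduced in this section. The Python sketch below therefore assumes, as an inference from the published rows, that:

- verificative needs correspond to indicators 1 and 2;
- conscious topical needs correspond to indicators 3, 5, and 7;
- muddled topical needs correspond to indicators 4 and 6;
- each need score is the plain mean of its indicators' percentages.

Under these assumptions the sketch reproduces, for example, the rows for Settlement: common (40/40/20) and Settlement: business relations (39/32/20 after rounding). The names `NEED_INDICATORS` and `need_distribution` are illustrative only.

```python
# Assumed mapping from questionnaire indicators (numbered as in the legend of
# Table 7.7) to Ingwersen's need types; inferred, not quoted from Table 6.1.
NEED_INDICATORS = {
    "VN": (1, 2),      # verificative needs
    "CTN": (3, 5, 7),  # conscious topical needs
    "MTN": (4, 6),     # muddled topical needs
}


def need_distribution(indicator_pct):
    """Average the indicator percentages that belong to each need type."""
    return {
        need: sum(indicator_pct[i] for i in indicators) / len(indicators)
        for need, indicators in NEED_INDICATORS.items()
    }


# Percentages for indicators 1-7, copied from two rows of Table 7.7.
settlement_common = dict(zip(range(1, 8), (25, 55, 30, 20, 25, 20, 65)))
business_relations = dict(zip(range(1, 8), (30, 47, 26, 14, 23, 26, 47)))

print(need_distribution(settlement_common))
# {'VN': 40.0, 'CTN': 40.0, 'MTN': 20.0}
print(need_distribution(business_relations))
# {'VN': 38.5, 'CTN': 32.0, 'MTN': 20.0}
```

Table 7.8 reports whole percentages, so a computed value such as 38.5 appears there as 39.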


Some work tasks have a different ranking of information needs. Three work tasks in particular stand out, namely those belonging to the main process “Management and development”. Here, muddled topical needs are the most frequently occurring needs.

One aspect of information seeking on the intranet at SKAT is not triggered by a specific work task, namely the seeking carried out by the employees in order to keep their knowledge about their work tasks current. Besides seeking information in relation to specific work tasks, the employees also use the intranet to stay updated on recent developments within topics of interest to them. The intranet is continuously updated with news. This flow of information is partly caused by the nature of the organization's foundation: the work at SKAT is largely guided by legal rules that constantly evolve, which means that the employees' knowledge needs to be kept up to date. Several of the participants state that they consult the intranet daily for updates within their working areas.

We cannot verify this particular behaviour on the basis of the questionnaire responses, since the questionnaire was not designed to investigate this kind of behaviour. Instead, this characteristic of the employees' seeking behaviour was revealed during the focus group interviews.

”...I think, that the intranet and seeking is a part of my work all the time, also 

just keeping myself updated on, well both on SKAT as a business but also the field, I am 

working with. So you somehow either seek information or have signed up for a 

newsletter, and then you receive the information that way. And all that information 

helps you solve the work task one way or the other.” 

Another participant (R28) agrees: 

“It is also the place where control signals and the like are coming. What we

need to obey within the business. And also… the directions, the legal directions. When 

they are updated, they are published there too. So there is a lot to be attentive to there, 

really. You cannot avoid it. It would be scary, if it was not at 100 %, our intranet. You 

sort of need to be in there to be able to do your job.” (R28, p. 10-11) 


This behavior is not specific to the domain in question here; similar findings have been made in other domains. Within engineering, Bigdeli (2007) found that developing knowledge and expertise was among the most important motivations to look for information. Further, Del Fiol et al. (2008) include knowledge update as a criterion for success in their evaluation of an information system for clinicians. Information needs that are not directly tied to a work task thus also occur in professional user groups other than the one studied in this thesis.

To sum up, the most frequently occurring information needs on the intranet are verificative and conscious topical needs. Again, “Minister service” stands out with a high score on all indicators. However, this reflects the high frequency of information seeking for this work task reported in the prior section; in terms of information seeking, this work task differs from the others.

7.4.3 Preferred metadata 

Each group of questions regarding a work task in the questionnaire closed by asking the respondents which metadata they would like to be able to apply when searching the intranet 14. Table 7.9 shows the distribution of the respondents' preferences. It is evident that the most desired type of metadata among the employees concerns the topic of the document. Although the percentages vary, the metadata “subject” has the highest occurrence in 16 out of 19 work tasks. The importance of a well-functioning description of the subjects of the documents is obvious (with an emphasis on well-functioning), as a focus group participant points out:

“It all depends how good you are at describing the subject. Which terms are used? Who divides it into the superior subjects that can be searched for? It all depends on the quality of what is there. And the people, who uploaded it.” (R1, p. 10).

The orientation towards the subject and content of documents is hardly surprising. What is interesting, though, is that requests for superior subjects (the upper level of the taxonomy) are far less common. We interpret this as a request for metadata supporting highly specific searches in a system that tends to overload the users with many irrelevant documents.

14 The list of preferred metadata and their probes from the questionnaire appears in Table 6.2.

Table 7.9 Metadata preferences distributed across work tasks

Work tasks Metadata 1 2 3 4 5 6 7 8 9 10 11 12 13

Instruction 38 (21%) 31 (17%) 117 (65%) 65 (36%) 77 (43%) 59 (33%) 20 (11%) 23 (13%) 24 (13%) 60 (33%) 53 (29%) 18 (10%) 83 (46%)

Settlement: common 3 (15%) 3 (15%) 12 (60%) 8 (40%) 10 (50%) 8 (40%) 2 (10%) 2 (10%) 2 (10%) 8 (40%) 5 (25%) 2 (10%) 5 (25%)

Settlement: preliminary assessment of income/personal taxes 21 (37%) 14 (25%) 36 (63%) 19 (33%) 24 (42%) 12 (21%) 5 (9%) 8 (14%) 9 (16%) 18 (32%) 17 (30%) 7 (12%) 24 (42%)

Settlement: business relations 17 (30%) 10 (18%) 33 (58%) 21 (37%) 22 (39%) 16 (28%) 8 (14%) 4 (7%) 10 (18%) 17 (30%) 14 (25%) 7 (12%) 19 (33%)

Settlement: corporation taxes 4 (16%) 9 (36%) 15 (60%) 14 (56%) 9 (36%) 5 (20%) 2 (8%) 3 (12%) 3 (12%) 12 (48%) 11 (44%) 2 (8%) 9 (36%)

Settlement: customs 1 (8%) 1 (8%) 4 (33%) 3 (25%) 1 (8%) 2 (17%) - - - 2 (17%) 2 (17%) - 5 (42%)

[label missing] (17%) 5 (28%) 9 (50%) 7 (39%) 11 (61%) 2 (11%) 2 (11%) 2 (11%) 1 (6%) 1 (6%) 1 (6%) 1 (6%) 6 (33%)

[label missing] (21%) 3 (21%) 12 (86%) 7 (50%) 8 (57%) 10 (74%) 3 (21%) 2 (14%) 1 (7%) 7 (50%) 6 (43%) 1 (7%) 5 (36%)

Inspection: common 13 (21%) 19 (31%) 44 (72%) 30 (49%) 24 (39%) 17 (28%) 1 (2%) 8 (13%) 11 (18%) 29 (48%) 27 (44%) 5 (5%) 23 (38%)

Inspection: customs (25%) 2 (13%) 8 (50%) 5 (31%) 3 (19%) 6 (38%) 3 (19%) 3 (19%) 2 (13%) 6 (38%) 4 (25%) 1 (6%) 9 (56%)

Collection 6 (15%) 8 (21%) 23 (59%) 8 (21%) 15 (39%) 16 (41%) 3 (8%) 4 (10%) 3 (8%) 12 (31%) 7 (18%) 2 (5%) 18 (46%)

Processes of support: legal support 12 (27%) 16 (36%) 35 (78%) 29 (64%) 18 (40%) 12 (27%) 5 (11%) 11 (24%) 5 (11%) 24 (53%) 19 (42%) 5 (11%) 22 (49%)

minister service (40%) 6 (60%) 8 (80%) 6 (60%) 6 (60%) 4 (40%) 3 (30%) 3 (30%) 5 (50%) 8 (80%) 4 (40%) 2 (20%) 6 (60%)

IT service and administration (21%) 5 (36%) 6 (43%) 4 (29%) 3 (21%) 4 (29%) 4 (29%) 3 (21%) 6 (43%) 3 (21%) 2 (14%) 2 (14%) 5 (36%)

HR and education (21%) 4 (29%) 9 (64%) 2 (14%) 3 (21%) 1 (7%) 2 (14%) 5 (36%) 2 (14%) 1 (7%) 2 (14%) 2 (14%) 6 (43%)

Processes of support: internal activities 2 (13%) 5 (33%) 9 (60%) 2 (13%) 2 (13%) 2 (13%) 3 (20%) 3 (20%) 4 (27%) 3 (20%) 1 (7%) 3 (20%) 9 (60%)

Management and development: strategy 4 (25%) 6 (38%) 13 (81%) 3 (19%) 6 (38%) 7 (44%) 4 (25%) 10 (63%) 9 (56%) 6 (38%) 3 (19%) 3 (19%) 7 (44%)

Management and development: business management 6 (43%) 8 (57%) 10 (71%) 5 (36%) 6 (43%) 8 (57%) 3 (21%) 7 (50%) 4 (29%) 4 (29%) 2 (14%) 2 (14%) 6 (43%)

Management and development: development 7 (26%) 12 (44%) 18 (67%) 10 (37%) 9 (33%) 11 (41%) 8 (30%) 13 (48%) 15 (56%) 6 (22%) 4 (15%) 4 (15%) 11 (41%)

Legend: The table displays the total numbers of respondents within a work task choosing a certain type of metadata. The percentages are shares of all respondents within the work task in the questionnaire. A dash marks an empty cell. The column numbers represent the metadata contained in the questionnaire, namely: 1) Target group, 2) Superior subject, 3) Subject, 4) Name of legal text or court decision, 5) Object, 6) Activity, 7) Geographic data, 8) Responsible institution or department, 9) Project, 10) Document type, 11) Document number, 12) Document ID, 13) Work task.

Another type of metadata is frequently requested by the employees: work task. Work task metadata is defined as “searching for colleagues engaged in a particular service or task regardless of location” (from Table 6.2). Three work tasks of the questionnaire ranked it as the most important metadata (“Settlement: customs”, “Inspection: customs”, and “Processes of support: internal activities”), while the remaining work tasks ranked it in the middle of the spectrum of importance. In the focus groups, work task metadata also received considerable attention: regardless of their work tasks, the participants called for better possibilities to locate colleagues across the organization. As shown in section 7.3.2, colleagues are widely used as information sources across the organization, and the focus on work task metadata in the focus groups corresponds well to this role of colleagues as information sources.

Geographic data, document number, and document ID are found at the lower end of the requirements for metadata. These metadata types were not mentioned in the focus groups either. We interpret this as an indication that the employees would not commonly use these metadata actively to retrieve information. One thing is surprising, though. Document types were ranked middle to low in the work tasks, compared with the other metadata types. In the focus groups, however, document types received more attention and were assessed as an important type of metadata. To exemplify:

“Often you go look for, well, decisions, orders or judgments in the equivalent area. And then you actively go search for judgments or orders, so it is exclusively the document type in the first place, that you know that you want. But it is not because it is the most important thing, but it is a part of what we use exactly when handling that case.” (R7, p. 8).

As the search test in the following chapter shows, document types were used as an important filter there too. On this basis we consider document type an important type of metadata, in particular within a domain with as great a variety of document types as e-government. The remaining types showed medium or low frequencies in the table, suggesting that all types could be relevant at some point, but that not all of them should necessarily be included in a default search interface.
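The statement that “subject” has the highest occurrence in 16 of 19 work tasks rests on a row-by-row comparison in Table 7.9. The sketch below illustrates that comparison for three rows only; it is not the procedure used in the thesis. It treats an empty cell as missing and returns every metadata type that shares the row maximum, so that ties are kept visible rather than resolved arbitrarily. The names `METADATA`, `ROWS`, and `most_requested` are hypothetical.

```python
# Illustrative sketch (not the thesis's own procedure): find the most
# requested metadata type(s) per work task from Table 7.9 percentages.
METADATA = [
    "Target group", "Superior subject", "Subject",
    "Name of legal text or court decision", "Object", "Activity",
    "Geographic data", "Responsible institution or department",
    "Project", "Document type", "Document number", "Document ID",
    "Work task",
]

# Percentages for columns 1-13 of three rows; None marks an empty cell.
ROWS = {
    "Instruction": [21, 17, 65, 36, 43, 33, 11, 13, 13, 33, 29, 10, 46],
    "Settlement: customs": [8, 8, 33, 25, 8, 17, None, None, None, 17, 17, None, 42],
    "Processes of support: internal activities": [13, 33, 60, 13, 13, 13, 20, 20, 27, 20, 7, 20, 60],
}


def most_requested(shares):
    """Return all metadata types that share the highest percentage (ties kept)."""
    top = max(s for s in shares if s is not None)
    return [METADATA[i] for i, s in enumerate(shares) if s == top]


if __name__ == "__main__":
    for task, shares in ROWS.items():
        print(f"{task}: {', '.join(most_requested(shares))}")
```

For “Settlement: customs” the sketch returns work task, while for internal activities it returns both subject and work task. Whether such a tie counts towards “subject” is a choice the sketch leaves open rather than makes.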


7.5 Summary and implications for indexing 

We introduce the present section with two quotes from the focus groups, emphasizing the role of information in e-government:

“There is a high, high frequency of information seeking. It is indeed necessary and important that everything going out from here is correct. Whether a rate or a reference for a paragraph or whatever, it just needs to be in order.” (R26, p. ), and:

“...you cannot memorize all the rules. That is why you go in and read them.” (R16, p. ).

The quotes emphasize two important aspects of information use in SKAT: the information passed on to customers, such as citizens or governments, must be accurate, and the area is controlled by so many rules that it is not possible to memorize everything. The purpose of the present section is to summarize the findings of the chapter and to draw out the implications of the seeking behavior identified above for the requirements for indexing.

The employees' preferences for information sources showed that the intranet was an important source of information. However, several participants in the focus groups expressed dissatisfaction with the system's ability to retrieve relevant information. Along with the intranet and the internet, colleagues were also found to be important sources of information, used to validate findings and to save time searching.

The frequency of information seeking varied between work tasks, but the most common frequency in the questionnaire was every third or fourth time a task was solved. This indicates a frequent seeking behavior and suggests that the employees are experienced information searchers. For indexing practice, this also suggests that the employees are able to perform exact searches if they have the right options available, and that they are able to assess the consequences of a query. The predominant information needs among the employees were verificative and conscious topical needs. A single work task stood out, but it had few cases and does not change the general picture. To meet these information needs, indexing must be able to support verificative searches by adding or drawing metadata from the documents, since verificative information needs are guided by some kind of known bibliographic information about the document. The conscious topical needs should be supported by sufficient and high-quality metadata describing the content of documents, which is in line with the employees' demands for metadata. However, the reduced interest in superior subjects indicates that subject metadata must have a certain level of specificity in order to match the employees' detailed insight into their work areas.

Lastly, the employees stated requirements regarding which metadata should be available. Apart from subject metadata, work task metadata was highly desired by the employees, indicating the importance of being able to locate topic experts in the national organization. Document types did not receive much attention in the questionnaire, but in the focus groups the participants emphasized document type as an important type of metadata. None of the metadata listed in the questionnaire were dismissed. However, the metadata varied in importance to the employees, indicating that the needs of the particular work area in question must be explored when developing metadata in e-government.

8 Search test results 


In the search test we carried out an experimental test in a prototype of SKAT's future intranet. Two systems were tested: system A and system B (for screen dumps, see section 6.4.1). System A is a free-text, web-based search interface that makes it possible to limit search results by document type and to adjust search results by means of search operators. System B extends system A's search facilities by offering a subject-based categorization of search results (see section 6.4.1). In the present chapter we present the findings of the search test.

8.1 The test persons 

32 test persons participated in the search test, 11 males and 21 females. The mean age of the test persons was 47, and their average length of service was approximately 22 years (see Appendix 27). The age distribution corresponds to that of the population and of the survey questionnaire respondents (see Appendix 23). The majority of the test persons either had an academic education or had been educated within the organization. The same pattern appeared in the questionnaire results of the domain study, though the share of persons with an academic education was slightly higher in the search test than in the domain study (see sections 7.1 and 7.2). In our selection of test persons we made sure that they used the current intranet with a certain frequency. This is mirrored in the frequency of intranet use depicted in Table 8.1: 25 of the 32 participants estimate that they use the intranet on a daily basis or even several times a day, while the remaining 7 consult the system on a weekly basis.

Table 8.1 Frequency of test persons' intranet use 

Frequency of use Number Percent of N=32

Several times a day 18 56.3 

On a daily basis 7 21.9 

On a weekly basis 7 21.9 

Total 32 100.0 

Legend: The intranet use frequency of the test persons participating in the search test. N=32.


Table 8.2 Ranking of test persons' most important information sources 

Information sources Frequency Percent of N=32 

Intranet 29 91% 

Internal systems 19 60% 

The Internet 18 56% 

Electronic reference works 15 47% 

Colleagues (including the staff register) 13 41% 

Legend: The table depicts the information systems most frequently mentioned by the test persons 

as being among the three most important systems in terms of solving daily work tasks. Systems 

mentioned by less than 40 % of the respondents have been excluded from the table. N=32. 

In the recruitment questionnaire the prospective test persons were asked to point out their three most important information sources from a predefined list. An open field provided the option of indicating additional sources.

The sources important to most of the test persons are listed in Table 8.2. As the table shows, the vast majority of the test persons list the intranet as being among their three most important sources of information, followed by internal systems and the Internet. To sum up, the test persons are experienced users of the current intranet, and we can expect them to have a good idea of what can be found there and how. This also makes it possible to compare the test persons' use of the experimental system in the search test with their genuine use of the running intranet.

Three simulated search tasks (sim1, sim2, and sim3) and one genuine information need (NWT) formed the basis of the search test. Sim1 was concerned with the fiscal conditions of selling an apartment, sim2 with the taxation of e-commerce, and sim3 with VAT registration of freelance teachers. To control for the possible influence of the simulated search tasks on the test results, the test persons carried out a short evaluation every time a task had been completed. The evaluation was measured on a 5-point Likert scale, and the general scores appear in Table 8.3. Across all sessions, the tasks were assessed just below the midpoint of the scale on all three questions. The test persons rated their insight into the task topics at just below 3. Given an average resemblance with daily work tasks of 2.34 and the test persons' long average length of service, we assume that the test persons estimated their knowledge of the work tasks to be general, but not detailed. The average of 2.59 for the difficulty of the search tasks indicates that the tasks were neither too hard nor too easy to solve using the two systems. However, there is a fairly large gap in the level of difficulty between system A (2.19) and system B (3.00). It appears that the system has had some influence on the test persons' perception of the level of difficulty.

Table 8.3 General evaluation of simulated search tasks in system A, system B, and total (averages)

System A (N=48) System B (N=47, one missing) All sessions with SWT (N=95, one missing)

Difficulty of search task 2.19 3.00 2.59

Insight into the topic of the search task 2.88 2.85 2.86

Resemblance with daily tasks 2.40 2.28 2.34

Legend: In total, 64 sessions were carried out in each system, including the genuine work tasks. However, we did not ask the test persons to evaluate their own search tasks. Therefore, N=48 when calculated for the respective systems.

Table 8.4 is more specific and distinguishes the task assessments between the three simulated search tasks. Here, minor differences exist as to insight into the search tasks and their resemblance with the test persons' genuine work tasks. Again, system B differs most markedly from system A regarding the level of difficulty. The largest difference between the two systems concerns sim2 (e-commerce), and the assessment of sim2 in system B is well in line with the trouble the test persons experienced when solving the task. We will explore possible explanations for the differences in assessment between system A and system B later in this chapter.

Table 8.4 Evaluation of simulated search tasks specified to single simulated search tasks (averages)

Sim1 Sim2 Sim3 Total

SysA (n=16) SysB (n=16) SysA (n=16) SysB (n=16) SysA (n=16) SysB (n=15, 1 missing) SysA SysB

Difficulty 2.13 2.44 2.44 4.06 2.00 2.47 2.19 3.00

Insight 3.06 3.44 2.75 1.88 2.81 3.27 2.88 2.85

Resemblance 2.31 2.25 2.38 2.00 2.50 2.60 2.40 2.28


8.2 Overall searching behaviour and performance 

The search test provides data on the searching behaviour in the two test systems, system 

A and system B. The empirical data supporting the remainder of the chapter comprises 

the search log, search interviews, and relevance assessments. In total, 128 sessions 

consisting of 564 queries were undertaken by the 32 test persons, 64 sessions in each of 

the two systems. Table 8.5 summarizes the general findings. 

The average number of terms per query in the test is 2.25 for system A and slightly higher, 2.43, for system B. This corresponds to the average number of terms found in similar studies. For instance, Jansen, Spink & Saracevic (2000, p. 214) measured an average of 2.21 terms in their analysis of search logs in Excite. In a log analysis of a university OPAC, Lau & Goh (2006, p. 1322) found the average query length to be 2.86. In a clustering search engine (vivisimo.com), Koshman, Spink & Jansen (2006, p. 1879) found an average of 3.13, also based on log analysis. Some years later, Hochstotter & Koch (2009, p. 55) identified a slightly lower average (between 1.6 and 1.8) in their study based on live tickers in a number of general and meta Web search engines. More recently, Lykke, Price & Delcambre (2012) showed averages of 1.5 and 2.0 in their comparative search test of a web-based health portal. Lastly, in a study comparing categorized searches with non-categorized searches, Käki (2005b, p. 136) found an average of 2.10 for the former type and 2.04 for the latter. Our findings thus correspond to those of this highly similar study, supporting the observation that, on average, more search terms are applied in categorized queries than in non-categorized queries.

Table 8.5 General findings of variables in search test 

Variables System A System B 

Sessions N=64 Sessions N=64 

Queries N=229 Queries N=335 

Number of terms in queries (average) 2.25 2.43 

Number of search keys in queries (average) 1.67 1.90

Search filter “document type” applied (percentage) 43.2 31.6 

Number of sessions with reformulations (percentage) 65.6 82.8 

Number of reformulations in sessions (average) 2.58 4.23 

Query success (percentage) 30.6 21.5 

Session success (percentage) 89.1 84.4 


As regards search keys, the slightly higher average number of terms in system B is reflected in the average number of search keys: system B queries contain 1.90 search keys compared to 1.67 in system A. For comparison, the differences between the average number of terms and search keys in Lykke, Price & Delcambre's (2012) study were slightly smaller than in the present results. Thus, the test persons used more terms to represent a search key in the present test.

Both systems offered filtering by document type. The filter was used in 42.3 % of queries in system A and in 31.6 % of queries in system B. This distribution was expected, as system A has fewer query specification options. Reformulations took place in both systems. However, in system A the share of sessions with reformulations was 65.6 %, while 82.8 % of the sessions in system B required reformulations. In addition, the average number of reformulations was notably higher in system B (4.23) than in system A (2.58). This means that an average session in system A contains 3.58 queries, while the corresponding number for system B is 5.23. These averages are slightly above the findings of similar studies of web search engines and web portals. For comparison, Lykke, Price & Delcambre (2012) found averages of 2.5 and 3.2 queries per session, and Koshman, Spink & Jansen's (2006, p. 1879) average was marginally higher: 3.37. To sum up, the present study, and in particular system B, has a higher number of queries per session than similar studies.
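The step from reformulations to queries per session is simple arithmetic, sketched below under the assumption that every session consists of one initial query plus its reformulations. Dividing the query totals from Table 8.5 by the number of sessions gives the per-session averages reported above. The function name `per_session` is hypothetical.

```python
# Sketch: average queries and reformulations per session from the totals
# in Table 8.5. Assumes each session = 1 initial query + reformulations.
def per_session(queries, sessions):
    """Return (queries per session, reformulations per session)."""
    queries_per_session = queries / sessions
    reformulations_per_session = queries_per_session - 1  # minus the initial query
    return queries_per_session, reformulations_per_session


TOTALS = {"A": (229, 64), "B": (335, 64)}  # (queries, sessions)

if __name__ == "__main__":
    for system, (queries, sessions) in TOTALS.items():
        q, r = per_session(queries, sessions)
        print(f"System {system}: {q:.2f} queries, {r:.2f} reformulations per session")
```

Run as a script, it prints 3.58 queries and 2.58 reformulations for system A and 5.23 and 4.23 for system B, which matches Table 8.5 and the text.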

The success of sessions and queries is summarized in Table 8.6 and Table 8.7. The total success at session level slightly favours system A, with relevant documents found in 89.1 % of all sessions, while system B succeeded in 84.4 % of sessions. A breakdown by search task reveals a fairly even distribution of successful sessions between the two systems except in sim2. For the remaining tasks, the systems perform equally, even with a minor advantage for system B. In sim2, 1 session failed in system A, while 7 sessions failed in system B. This may well explain part of why the test persons assessed the task as markedly more difficult, as we have just seen.

Table 8.6 Session success (percentages)

Sim1 Sim2 Sim3 NWT Total

SysA SysB SysA SysB SysA SysB SysA SysB SysA SysB

Session succeeded 15 (93.8) 16 (100.0) 15 (93.8) 9 (56.3) 16 (100.0) 16 (100.0) 11 (68.8) 13 (81.3) 57 (89.1) 54 (84.4)

Session failed 1 (6.3) 0 (0.0) 1 (6.3) 7 (43.8) 0 (0.0) 0 (0.0) 5 (31.3) 3 (18.8) 7 (10.9) 10 (15.6)

Total 16 16 16 16 16 16 16 16 64 64

Table 8.7 Query success (percentages)

Sim1 Sim2 Sim3 NWT Total

SysA SysB SysA SysB SysA SysB SysA SysB SysA SysB

Query succeeded 18 (58.1) 23 (33.3) 17 (30.4) 11 (9.7) 20 (27.8) 22 (25.6) 15 (21.4) 16 (23.9) 70 (30.6) 72 (21.5)

Query failed 13 (41.9) 46 (66.7) 39 (69.6) 102 (90.3) 52 (72.2) 64 (74.4) 55 (78.6) 51 (76.1) 159 (69.4) 263 (78.5)

Total 31 (100.0) 69 (100.0) 56 (100.0) 113 (100.0) 72 (100.0) 86 (100.0) 70 (100.0) 67 (100.0) 229 (100.0) 335 (100.0)

At query level, the total number of successful queries is fairly even between the two systems. However, the number of failed queries is markedly higher in system B than in system A, particularly for sim1 and sim2, and consequently also when compared at system level in the last two columns of Table 8.7. Thus, measured at query level, the performance gap in favour of system A widens compared to the more even performance at session level. In short, the two systems provide approximately the same number of successful queries; system B just requires more failed queries to get there.
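The widening gap between session and query level can be made concrete by recomputing the success rates from the counts in Tables 8.6 and 8.7. The sketch assumes the published percentages were rounded half up to one decimal. It uses Python's `decimal` module because the built-in `round()` rounds exact ties to even: 18 of 32 users in Table 8.1 is 56.25 %, which `round()` would turn into 56.2 rather than the 56.3 shown. The helper name `pct` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def pct(part, whole):
    """Percentage of part in whole, rounded half up to one decimal."""
    value = Decimal(part) * 100 / Decimal(whole)
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


# Successful and total counts from Tables 8.6 (sessions) and 8.7 (queries).
COUNTS = {
    "A": {"sessions": (57, 64), "queries": (70, 229)},
    "B": {"sessions": (54, 64), "queries": (72, 335)},
}

if __name__ == "__main__":
    for level in ("sessions", "queries"):
        a = pct(*COUNTS["A"][level])
        b = pct(*COUNTS["B"][level])
        print(f"{level}: A {a} %, B {b} %, gap {a - b} points")
```

The session-level gap comes out at 4.7 percentage points and the query-level gap at 9.1 points. Because system B's 72 successful queries are spread over 335 queries rather than 229, its rate drops even though its count of successes is similar.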

To sum up, the overall comparison of the two test systems shows a slight advantage for system A at session level in terms of the ability to retrieve relevant documents. The advantage of system A increases when measured at query level. In addition, system A differs from system B in that fewer terms are needed in queries, and the share and number of reformulations are lower. In the sections to follow we explore the nature and causes of the difference in performance between the two systems: what characterizes the search situation (section 8.2.1), the number and types of reformulations carried out (section 8.2.2), and the unintended use of system A in system B searches (section 8.2.3).

8.2.1 The search situation 


The search situation is characterized by different components. In the dataset we have identified four components that guide this presentation of results: sessions, queries, search operators, and filtering by document type. We present the results in that order.

8.2.1.1 Sessions 

128 sessions were carried out in the search test, 64 in system A and 64 in system B. As the total numbers show, more queries are executed in system B than in system A. This is also the case at task level (see Table 8.8), where the average number of queries needed to solve a task differs between the systems by almost 2 queries (the last column). As regards the individual search tasks, the genuine information need has a slightly lower average in system B than in system A, suggesting that the genuine information need actually benefitted from the categories. In the remaining search tasks, system B is above system A in terms of averages. It has already been shown that sim1 and sim2 in particular contained a markedly higher share of failed queries in system B than in system A. This also appears in the present table, where sim1 and sim2 executed in system B have an average number of queries about twice as large as in system A. In terms of variance, the standard deviation of the two systems is practically the same. At task level, the two systems differ more, with the highest maximum of system A in sim3 (27 reformulations) and the highest maximum of system B in sim1 (18 reformulations) (see Table 1, Appendix 28). Thus, the variances within both systems are fairly large. For sim1, the difference is caused by a very high success rate in system A. A further explanation could be that sim1 in system A was assessed as below average as regards difficulty, and that the test persons had rather good prior knowledge of its topic (cf. Table 8.4).

Table 8.8 Number of queries in sessions at task level (averages)

Sim1 Sim2 Sim3 NWT Total

System A 1.94 (n=16) 3.50 (n=16) 4.50 (n=16) 4.38 (n=16) 3.58 (n=64)

System B 4.31 (n=16) 7.06 (n=16) 5.38 (n=16) 4.19 (n=16) 5.23 (n=64)

Total 3.13 (n=32) 5.28 (n=32) 4.94 (n=32) 4.28 (n=32) 4.41 (n=128)


Table 8.9 Number of queries in sessions as to success or failure (averages)

Sim1 Sim2 Sim3 NWT Total

SysA SysB SysA SysB SysA SysB SysA SysB SysA SysB

Session succeeded 2.00 (n=15) 4.31 (n=16) 2.93 (n=15) 6.78 (n=9) 4.50 (n=16) 5.38 (n=16) 3.73 (n=11) 3.46 (n=13) 3.28 (n=57) 4.83 (n=54)

Session failed 1.00 (n=1) - 12.00 (n=1) 7.43 (n=7) - - 5.80 (n=5) 7.33 (n=3) 6.00 (n=7) 7.40 (n=10)

Total 1.94 (n=16) 4.31 (n=16) 3.50 (n=16) 7.06 (n=16) 4.50 (n=16) 5.38 (n=16) 4.38 (n=16) 4.19 (n=16) 3.58 (n=64) 5.23 (n=64)

Table 8.9 shows the number of queries in successful and failed sessions respectively. The majority of the sessions ended with the retrieval of one or more relevant documents. From the table it is clear that, apart from one exception (sim1, system A), sessions with fewer reformulations are more likely to succeed. In the three simulated search tasks, system A is superior to system B, as system A sessions have a lower average number of queries. For the genuine information need (NWT) the average is a little lower for system B sessions than for system A sessions, but not enough to change the overall impression of system A as the more efficient system in terms of a low average number of queries per session. To sum up, at session level the test persons put more effort, in terms of the number of queries, into sessions that remain unsolved in the end. The average number of queries is higher in system B searches, and a session is more likely to succeed if it is solved with fewer queries.
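The “Total” figures in Tables 8.9 and 8.10 are consistent with averages weighted by the number of observations behind each group, rather than simple averages of the group means. The minimal sketch below assumes only the rounded means and n values printed in the tables, so its results may differ from the exact totals in the last decimal. `pooled_mean` is a hypothetical helper.

```python
def pooled_mean(groups):
    """Weighted mean of group means, given (mean, n) pairs."""
    total_n = sum(n for _, n in groups)
    return sum(mean * n for mean, n in groups) / total_n


if __name__ == "__main__":
    # Table 8.9: successful and failed sessions per system.
    system_a = pooled_mean([(3.28, 57), (6.00, 7)])
    system_b = pooled_mean([(4.83, 54), (7.40, 10)])
    # Table 8.10: average number of terms per query in each system.
    terms_all = pooled_mean([(2.25, 229), (2.43, 335)])
    print(f"{system_a:.2f} {system_b:.2f} {terms_all:.2f}")
```

The weighted results are 3.58, 5.23, and 2.36, matching the published totals. A simple average of 3.28 and 6.00 would instead give 4.64, because it ignores that only 7 of the 64 system A sessions failed.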

8.2.1.2 Queries 

As reported above, the average number of terms per query is 2.25 for system A and 2.43 for system B. In Table 8.10 these averages are calculated at task level. The table shows that more terms were entered in system B queries than in system A queries for all tasks except sim3, where the average number of terms is notably lower in system B than in system A. One possible reason may again be found in the test persons' assessments of the task: sim3 in system B received the highest score of all on resemblance with the test persons' daily work tasks. Because of sim3, the average number of search terms in queries is not consistently higher in system B than in system A. Overall, however, when measured by the number of search terms entered in the respective systems, system A comes out ahead.

Table 8.10 Number of search terms in queries (averages)

Sim1 Sim2 Sim3 NWT Total
System A 2.32 (n=31) 2.39 (n=56) 2.42 (n=72) 1.94 (n=70) 2.25 (N=229)
System B 2.54 (n=69) 2.88 (n=113) 1.79 (n=86) 2.39 (n=67) 2.43 (N=335)
Total 2.47 (n=100) 2.72 (n=169) 2.08 (n=158) 2.16 (n=137) 2.36 (N=564)
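Because the four task columns rest on different numbers of queries, the totals in Table 8.10 are weighted (pooled) means rather than simple means of the task averages; a simple mean of the four system A cells would give 2.27 instead of 2.25. The minimal Python sketch below, added for illustration only, recomputes the margins from the rounded cell values in the table.

```python
# Task-level cells of Table 8.10, copied from the table:
# each entry is (average number of search terms, number of queries n).
table_8_10 = {
    "System A": [(2.32, 31), (2.39, 56), (2.42, 72), (1.94, 70)],
    "System B": [(2.54, 69), (2.88, 113), (1.79, 86), (2.39, 67)],
}

def pooled_mean(cells):
    """Combine subgroup averages into one average, weighting each by its n."""
    total_n = sum(n for _, n in cells)
    return sum(avg * n for avg, n in cells) / total_n, total_n

for system, cells in table_8_10.items():
    mean, n = pooled_mean(cells)
    print(f"{system}: {mean:.2f} (N={n})")

overall, n_all = pooled_mean(table_8_10["System A"] + table_8_10["System B"])
print(f"Total: {overall:.2f} (N={n_all})")

# For comparison: the unweighted mean of the four system A task averages.
simple_a = sum(avg for avg, _ in table_8_10["System A"]) / len(table_8_10["System A"])
print(f"Unweighted mean of system A cells: {simple_a:.2f}")
```

Each cell average is multiplied by its n, the products are summed, and the sum is divided by the total number of queries. Running the script prints 2.25 (N=229), 2.43 (N=335), and 2.36 (N=564), matching the Total column and row, while the unweighted mean of the system A cells comes out at 2.27.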

Table 8.11 Number of search keys in queries (averages)

Sim1 Sim2 Sim3 Total
System A 1.29 (n=31) 1.82 (n=56) 1.72 (n=72) 1.67 (N=159)
System B 1.97 (n=69) 2.12 (n=113) 1.56 (n=86) 1.90 (N=268)
Total 1.76 (n=100) 2.02 (n=169) 1.63 (n=158) 1.81 (N=427)

Legend: The table reflects the figures from the simulated search tasks only, as we could not perform the query search key analysis for the genuine search tasks. That explains the reduced N compared to Table 8.10.

Table 8.12 Number of search terms in queries as to success or failure (averages)

Sim1 Sim2 Sim3 NWT Total
SysA SysB SysA SysB SysA SysB SysA SysB SysA SysB
Query success 2.28 (n=18) 2.30 (n=23) 2.35 (n=17) 3.00 (n=11) 2.20 (n=20) 1.77 (n=22) 1.93 (n=15) 2.13 (n=16) 2.20 (n=70) 2.21 (n=72)
Query failure 2.38 (n=13) 2.65 (n=46) 2.41 (n=39) 2.86 (n=102) 2.50 (n=52) 1.80 (n=64) 1.95 (n=55) 2.47 (n=51) 2.28 (n=159) 2.49 (n=263)
Total 2.32 (n=31) 2.54 (n=69) 2.39 (n=56) 2.88 (n=113) 2.42 (n=72) 1.79 (n=86) 1.94 (n=70) 2.39 (n=67) 2.25 (n=229) 2.43 (n=335)

Table 8.11 outlines the number of search keys used in the queries of the individual search tasks. Overall, the average number of search terms is above the average number of search keys in queries, which means that on average each search key was represented by more than one term. The figures include cases of synonymous terms for the same concept as well as phrases such as “when to become VAT registered”. The average number of search keys to some extent mirrors the average number of search terms just identified: the average is higher in system B for sim1 and sim2, while for sim3 it is higher in system A. The low average in sim1, system A reflects that the search task had one very significant word, parents’ purchase (forældrekøb), which, when used as a query term, listed a highly relevant citizen booklet that most test persons assessed as relevant. As can be seen from the table, more concepts have been used in system B than in system A. This may, at least partially, be explained by the test persons’ lack of insight into the test system. In a number of cases the test persons composed a full query that was able to retrieve the documents wanted, and were then required to filter by a category. In some of these cases they chose categories representing a search key already covered by the search terms. In other cases an additional search key was added to the query by the choice of a category. These latter cases partly explain the generally higher number of search keys in system B searches.
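The difference between counting search terms and counting search keys can be made concrete with a small sketch, given some mapping from terms to the concepts they express. The mapping below is entirely hypothetical and only illustrates why an average of more than one term per key arises when synonyms or multiword expressions are used for one concept; it is not the coding scheme used in the analysis.

```python
# Hypothetical mapping from query terms to the search key (concept) each expresses.
# Terms and concept labels are invented for illustration only.
TERM_TO_KEY = {
    "forældrekøb": "parents' purchase",
    "parents": "parents' purchase",
    "purchase": "parents' purchase",
    "vat": "VAT registration",
    "registration": "VAT registration",
    "registered": "VAT registration",
}

def count_terms_and_keys(query):
    """Return (number of search terms, number of distinct search keys).
    A term missing from the mapping is counted as a search key of its own."""
    query_terms = query.lower().split()
    keys = {TERM_TO_KEY.get(term, term) for term in query_terms}
    return len(query_terms), len(keys)

print(count_terms_and_keys("forældrekøb"))                 # (1, 1)
print(count_terms_and_keys("parents purchase deduction"))  # (3, 2)
print(count_terms_and_keys("VAT registration"))            # (2, 1)
```

A query whose terms all express different concepts has as many keys as terms; every synonym or phrase component that maps onto an already represented concept raises the terms-per-key ratio above one.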

Table 8.12 lists the average number of search terms in successful and in failed queries. With the exception of sim2, system B, successful queries consistently contained fewer search terms than failed queries. In terms of search keys, the overall picture is the same (see Table 8.13): successful queries consistently represent fewer search keys than failed queries. Thus, in the present database a query based on few search terms and search keys is more likely to retrieve relevant documents. Part of the explanation could be the relatively small database behind the prototype: the more search terms entered, the fewer documents may match them. This is supported by a correlation analysis showing a statistically significant relation between the number of search terms entered and the number of hits (see table 4, Appendix 28). Further, the order in which the search tasks were carried out did not have an effect on the number of search terms applied (see table 4, Appendix 28); that is, there was no significant relation between the succession of search tasks and the number of terms entered in the query field. Another reason for the higher success of queries with fewer search terms and search keys may be the test persons’ professional background. In a number of queries the test persons entered specific and correct search terms that efficiently retrieved documents, whereas entering more terms or search keys at the same time made the number of search results very limited. To conclude, the experience gained during the test did not change how test persons composed their queries, at least in terms of the number of terms entered. Also, queries with fewer terms and concepts were superior, most likely because the test persons’ insight into the general topic made them enter qualified search terms, and because fewer terms and concepts did not restrict the number of results too much. More terms and concepts were applied in system B, partially because categories were added in system B to queries that at times were complete without the category. When the category was added, it occasionally represented a new concept, which increased the average number of concepts in system B.

Table 8.13 Number of search keys in queries as to success or failure (averages)

Sim1 Sim2 Sim3 Total
SysA SysB SysA SysB SysA SysB SysA SysB
Query success 1.28 (n=18) 1.57 (n=23) 1.53 (n=17) 2.09 (n=11) 1.65 (n=20) 1.55 (n=22) 1.49 (n=55) 1.66 (n=56)
Query failure 1.31 (n=13) 2.17 (n=46) 1.95 (n=39) 2.12 (n=102) 1.75 (n=52) 1.56 (n=64) 1.77 (n=104) 1.96 (n=212)
Total 1.29 (n=31) 1.97 (n=69) 1.82 (n=56) 2.12 (n=113) 1.72 (n=72) 1.56 (n=86) 1.67 (N=159) 1.90 (N=268)

Table 8.14 Distribution of search operator in queries (percentages)

Sim1 Sim2 Sim3 NWT Total
SysA SysB SysA SysB SysA SysB SysA SysB SysA SysB
Free text 16 (51.6) 22 (31.9) 27 (48.2) 61 (54.0) 21 (29.2) 62 (72.1) 38 (54.3) 32 (47.8) 102 (44.5) 177 (52.8)
Pages containing all words 13 (41.9) 45 (65.2) 28 (50.0) 48 (42.5) 46 (63.9) 23 (26.7) 23 (32.9) 29 (43.3) 110 (48.0) 145 (43.3)
This exact sentence 2 (6.5) - 1 (1.8) 4 (3.5) 5 (6.9) 1 (1.2) 5 (7.1) 6 (9.0) 13 (5.7) 11 (3.3)
At least one of the words - 2 (2.9) - - - - 4 (5.7) - 4 (1.7) 2 (0.6)
Total 31 69 56 113 72 86 70 67 229 335

Legend: The AW operator retrieves documents containing all search terms. FT retrieves documents that contain most, but not necessarily all, search terms. ES corresponds to applying quotation marks. The OW operator retrieves documents containing at least one of the typed search terms. In the search test all search results were ranked by best match (relevance).


8.2.1.3 Search operators 

In the search interface the default setting of the search operator field is “Free text” (FT). As the test persons had no prior experience with the test systems, it was plausible that the default FT operator would be the most frequently used operator in the test, since users tend to use the default settings put forward by the system (Markey, 2007a, p. 1077). As expected, the FT operator had a high frequency across the queries, though along with the “Pages containing all words” (AW) operator. Unexpectedly, however, the two systems differ: in system A the AW operator was used slightly more often than the default FT operator (48.0 % vs. 44.5 % of the queries), whereas in system B the FT operator clearly dominated (52.8 % vs. 43.3 %) (see Table 8.14). We have previously mentioned that the AW operator is the more restrictive of the two (see section 6.4.1). Where it is combined with the mandatory categorization in system B, it is likely to result in large differences between the sizes of search results in the two systems.
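To make the difference in restrictiveness concrete, the sketch below models the four operators as simple matching rules over whitespace-tokenised documents. It is not the prototype’s implementation: the tokenizer, the mini-collection, and especially the FT threshold `min_share` (here: at least half of the query terms) are our assumptions, since the legend of Table 8.14 only states that FT retrieves documents containing most, but not necessarily all, search terms. Ranking is ignored.

```python
import math

def terms(text):
    """Lower-cased whitespace tokens (a deliberately crude tokenizer)."""
    return text.lower().split()

def all_words(query, doc):
    """AW: every query term must occur in the document."""
    return set(terms(query)) <= set(terms(doc))

def free_text(query, doc, min_share=0.5):
    """FT, assumed model: 'most, but not necessarily all' terms must occur.
    min_share is a hypothetical threshold, not a documented prototype setting."""
    q = set(terms(query))
    return len(q & set(terms(doc))) >= math.ceil(min_share * len(q))

def exact_sentence(query, doc):
    """ES: the query terms must occur as one contiguous phrase (like quotation marks)."""
    q, d = terms(query), terms(doc)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))

def at_least_one(query, doc):
    """OW: a single matching query term is enough."""
    return bool(set(terms(query)) & set(terms(doc)))

OPERATORS = {"AW": all_words, "FT": free_text, "ES": exact_sentence, "OW": at_least_one}

# Hypothetical mini-collection and query, for illustration only.
DOCS = [
    "freight costs and customs value",
    "customs value of imported goods",
    "freight rates for shipping",
    "vat registration for new businesses",
]
QUERY = "freight customs value"

for name, matches in OPERATORS.items():
    hits = [doc for doc in DOCS if matches(QUERY, doc)]
    print(name, len(hits))
```

With the three-term query, AW keeps one document, FT two, and OW three, while ES finds none because no document contains the exact phrase. Under this model every AW hit is also an FT hit, which is the sense in which AW is the more restrictive of the two operators.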

Table 8.15 Number of search terms used with search operators in queries (averages)

Sim1 Sim2 Sim3 NWT Total
SysA SysB SysA SysB SysA SysB SysA SysB SysA SysB
Free text (FT) 2.25 (n=16) 2.41 (n=22) 2.30 (n=27) 2.64 (n=61) 1.57 (n=21) 1.58 (n=62) 1.76 (n=38) 2.03 (n=32) 1.94 (n=102) 2.13 (n=177)
Pages containing all words (AW) 2.62 (n=13) 2.62 (n=45) 2.46 (n=28) 3.10 (n=48) 2.85 (n=46) 2.17 (n=23) 1.83 (n=23) 2.76 (n=29) 2.51 (n=110) 2.74 (n=145)
This exact sentence (ES) 1.00 (n=2) - 3.00 (n=1) 3.75 (n=4) 2.00 (n=5) 6.00 (n=1) 2.40 (n=5) 2.50 (n=6) 2.08 (n=13) 3.27 (n=11)
At least one of the words (OW) - 2.00 (n=2) - - - - 3.75 (n=4) - 3.75 (n=4) 2.00 (n=2)
Total 2.32 (n=31) 2.54 (n=69) 2.39 (n=56) 2.88 (n=113) 2.42 (n=72) 1.79 (n=86) 1.94 (n=70) 2.39 (n=67) 2.25 (n=229) 2.43 (n=335)

One explanation for the unexpected distribution of the FT and AW operators between system A and system B is that some test persons had trouble using the two operators and telling them apart. Thus, test persons intermittently wondered why search terms did not occur in their result list when using the FT operator. To exemplify:

“Yes, but on the other hand it could also give… free text… then they all ought 

to come…” (TP15, line 306) 

In addition, the test persons consistently used more search terms when applying the AW operator than when using the FT operator (see Table 8.15), resulting in a gap in the number of documents retrieved with the two preferred operators. To illustrate, an average search in system A using the FT operator retrieved 548 documents, while the AW operator in the same system on average retrieved 121 documents. In system B, average FT searches retrieved 25 documents, while average AW searches retrieved 10 documents (see Table 3, Appendix 28). Thus, the searches carried out in system B were significantly narrower than the broader system A searches, as the search results were additionally filtered by subject. In terms of Boolean logic, the addition of a category corresponds to combining a query with an additional term, and in some cases an additional concept, by a Boolean “AND”. Again, it appears that some test persons had trouble fully understanding the comparative implications of the two search operators. Unfortunately, it has not been possible to deduce causes for the difference in the use of search operators between the two systems from the search interviews, as the test persons did not address it during their searches.
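In set terms, choosing a category intersects the result set with the documents assigned to that category, so the filtered list can only stay the same size or shrink. The sketch below illustrates this with a hypothetical four-document index and invented category assignments (the category labels merely echo those mentioned by test persons); it is not the prototype’s code.

```python
# Hypothetical mini-index: document id -> set of index terms.
INDEX = {
    1: {"freight", "customs", "value"},
    2: {"customs", "value", "import"},
    3: {"freight", "shipping"},
    4: {"customs", "tariff", "export"},
}
# Hypothetical category assignments: category -> set of document ids.
CATEGORIES = {
    "Business imports": {2},
    "Shipping": {1, 3},
    "Customs": {1, 2, 4},
}

def all_words(query_terms):
    """AW-style query: documents containing every query term (terms ANDed)."""
    return {doc for doc, terms in INDEX.items() if query_terms <= terms}

def filter_by_category(hits, category):
    """Choosing a category intersects the hits with the category's documents,
    i.e. it ANDs one more condition onto the query."""
    return hits & CATEGORIES[category]

hits = all_words({"customs", "value"})
print(sorted(hits))                                  # [1, 2]
print(sorted(filter_by_category(hits, "Customs")))   # [1, 2]
print(sorted(filter_by_category(hits, "Shipping")))  # [1]
```

Because the filter is an intersection, the filtered list can never contain documents that the unfiltered query did not retrieve; at best it leaves the result set unchanged, as with “Customs” here.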

Lastly, the search operator field was rarely used to adjust search results in reformulations (cf. Table 8.21). This indicates that the test persons did not feel sufficiently confident using the operators for reformulations and instead preferred other types of reformulations (we analyse reformulations more closely in section 8.2.2). That end users find Boolean operators challenging corresponds to the findings of similar studies (e.g. Markey, 2007a).

As is evident from the above, the use of search operators has resulted in significant differences in the numbers of hits retrieved in the two test systems. To identify which system performed better, however, we need to consider the success of the queries by search operator, which appears from Table 8.16. The success rate of system A is generally higher than that of system B. System A queries have a slightly higher success rate in FT searches than in AW searches, and, as appeared from Table 8.15, the number of search terms was lower in FT searches than in AW searches. This suggests that system A performs best in open searches (using the FT operator and fewer terms), which also indicates a well-functioning relevance ranking within the system. As regards system B, the success of FT and AW is fairly even, but below the level of system A. Apparently, the queries in system B have filtered out too many relevant hits. To conclude, the best performance was found in system A using the FT operator: system A managed to perform better on the basis of the broader queries applied. The test persons had difficulties applying and understanding the search operators correctly, which led to a weaker performance of system B due to a majority of small result sets.

Table 8.16 Success of search operators (percentages)

System A System B
FT AW ES OW FT AW ES OW
Success 33 (32.4) 30 (27.3) 7 (53.8) - 38 (21.5) 32 (22.1) 1 (9.1) 1 (50.0)
Failure 69 (67.6) 80 (72.7) 6 (46.2) 4 (100.0) 139 (78.5) 113 (77.9) 10 (90.9) 1 (50.0)
Total 100.0 100.0 100.0 100.0 100.0 100.0 100.0 100.0

Legend: FT=Free text, AW=Pages containing all words, ES=This exact sentence, OW=At least one of the words.
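The percentages in Table 8.16 are shares within each operator column: successes divided by all queries that used that operator in the given system. The Python sketch below, added only as an illustration, recomputes them from the counts in the table and also pools the four operators into an overall query success rate per system; the pooled figures are our own arithmetic on the table counts rather than values reported in the table.

```python
# Success and failure counts per operator, copied from Table 8.16:
# each value is (successful queries, failed queries).
TABLE_8_16 = {
    "System A": {"FT": (33, 69), "AW": (30, 80), "ES": (7, 6), "OW": (0, 4)},
    "System B": {"FT": (38, 139), "AW": (32, 113), "ES": (1, 10), "OW": (1, 1)},
}

def success_rate(success, failure):
    """Share of queries (in %) that retrieved at least one relevant document."""
    return 100 * success / (success + failure)

for system, operators in TABLE_8_16.items():
    rates = {op: round(success_rate(s, f), 1) for op, (s, f) in operators.items()}
    successes = sum(s for s, _ in operators.values())
    queries = sum(s + f for s, f in operators.values())
    pooled = success_rate(successes, queries - successes)
    print(system, rates, f"pooled: {pooled:.1f} %")
```

Running it reproduces the percentages of Table 8.16 and gives pooled rates of 30.6 % for system A and 21.5 % for system B. It also makes visible that the ES and OW cells rest on between 2 and 13 queries, so those percentages are far less stable than the FT and AW cells.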

8.2.1.4 Filtering by metadata 

The documents in the test system were marked up with the document type they belong to. This makes it possible to include the metadata field “document type” when building queries. Document type metadata are a powerful retrieval tool in a collection like the test collection, with its many heterogeneous document types, since the document type filter removes many irrelevant documents on the basis of their type. Requesting a specific document type was optional in the prototype. From the comments given by test persons during their search sessions it is clear that the specification of document types is an important option in the search interface. The test persons did not comment on the possibility of specifying document types as such, but used it as a natural function in queries. In addition, document types were mentioned as one among several important metadata in the domain study (see section 7.4.2). In particular, legal guidances were emphasized as important to the employees’ work. For instance:

“Well, it is the common assessment guidance, the one we refer to as our Bible, 

you can say, where you need to go check, if it is right in the legal rules…” (TP02, line 

112-113). 

The statement is supported by another test person:

“And that is just the problem: Is it a business or is it not? And I know the 

assessment guidance so well that I know that all facets are included here. There might 

be four or seven sub divisions to that document, but it is in there. It is just a matter of 

clicking further and further down until you find it…” (TP05, line 65-68). 

From Table 8.17 it appears that, overall, a larger share of system A queries than of system B queries used the document type filter. When the document type filter was applied, legal guidances were the preferred document type across both systems. This emphasizes the importance of this document type in the test persons’ daily work, which was also expressed during the interviews. Further, legal guidances are listed as one of several relevant document types in the non-topical facet for all three simulated search tasks (see Table 6.7).

At search task level it becomes evident that the overall figures (the rightmost columns in Table 8.17) conceal considerable variation. In system A, the overall share of 56.8 % for “None chosen” covers a range from 83.9 % in sim1 to 29.2 % in sim3. The differences in system B are smaller, but still of interest: here the highest share is 86.6 % in the genuine information need and the lowest 55.1 % in sim1. The lowest use of the document type filter is thus found in sim1, system A and in the genuine search task, system B, where less than a quarter of the queries applied the filter. The biggest difference between the two systems appears in sim3, where the filter was used in approximately 70 % of the system A queries, but in only about a quarter of the system B queries. The generally higher use of the document type filter in system A may be explained by the lower number of filtering possibilities in system A compared to system B.

When compared to the facet analysis of the simulated search tasks (section 6.4.7; Table 6.7), the largest share of correct document type filter settings was used in sim1. Here legal guidances, legislation, and citizen booklets were listed as correct document types for the task, and all queries used either no filter or one of these three types. The same was true for sim3, system B, where only document types listed as correct for that task were applied. For sim3, system A and for sim2, a greater variety of document types was applied in the queries. Considering the findings

Table 8.17 Document type filter used in queries (percentages) 

None 

chosen 

Legal 

guidances 

Business 

guidances 

Citizen 

booklets 

Legislation 

Internal 


Internal 

guidances 


newsletters 



26 

(83.9) 

2 

(6.5) 

3 

(9.7) 

38 

(55.1) 

26 

(37.7) 

- - 

5 

(7.2) 

33 

(58.9) 

11 

(19.6) 

5 

(8.9) 

4 

(7.1) 

- - - 

71 

(62.8) 

16 

(14.2) 

18 

(15.9) 

- 

5 

(4.4) 

21 

(29.2) 

10 

(13.9) 

11 

(15.3) 


3 

(4.2) 

13 

(18.1) 

62 

(72.1) 

8 

(9.3) 

3 

(3.5) 

- 

13 

(15.1) 

- - - - - - 

- - 

1 

(1.8) 

- - - 

1 

(0.9) 

2 

(1.8) 

5 

(6.9) 

5 

(6.9) 

50 

(71.4) 

4 

(5.7) 

4 

(5.7) 

6 

(8.6) 

58 

(86.6) 

8 

(11.9) 

- - 

- 

- - 

- - - 

- - - 

- 

130 

(56.8) 

27 

(11.8) 

16 

(7.0) 

14 

(6.1) 

13 

(5.7) 

6 

(2.6) 

6 

(2.6) 

5 

(2.2) 

229 

(68.4) 

58 

(17.3) 

Case law - - - - - - 4 (5.7) - 4 (1.7) - 

Others - - 2 (3.6) - 1 (1.4) - - - 3 (1.3) - 

Legislative 

materials 

Forms 

SKAT 

circulars 

21 

(6.3) 

5 

(1.5) 

18 

(5.4) 

- 

1 

(0.3) 

2 

(0.6) 

- - - - 2 (2.8) - - - 2 (0.9) - 

- - - - - - 2 (2.9) 

1 

(1.5) 

2 (0.9) 

1 

(0.3) 

- - - - 1 (1.4) - - - 1 (0.4) - 

Total 31 69 56 113 72 86 70 67 229 335


Table 8.18 Search success for the document type filter in system A and system B queries (percentages)

System A System B
Success Failure Total system A Success Failure Total system B
None chosen 52 (40.0) 78 (60.0) 130 (100.0) 59 (25.8) 170 (74.2) 229 (100.0)
Legal guidances 2 (7.4) 25 (92.6) 27 (100.0) 6 (10.3) 52 (89.7) 58 (100.0)
Legislation 0 13 (100.0) 13 (100.0) 2 (11.1) 16 (88.9) 18 (100.0)
Business guidances 7 (43.8) 9 (56.3) 16 (100.0) 3 (14.3) 18 (85.7) 21 (100.0)

newsletters 

0 5 (100.0) 5 (100.0) 0 2 (100.0) 2 (100.0) 

Internal 

guidances 

Citizen 

booklets 

Internal 


1 (16.7) 5 (83.3) 6 (100.0) 0 1 (100.0) 1 (100.0) 

6 (42.9) 8 (57.1) 14 

(100.0) 

2 (40.0) 3 (60.0) 5 (100.0) 

1 (16.7) 5 (83.3) 6 (100.0) - - -

Legend: Document types that have been applied less than 5 times in total across the two systems 

have been omitted from the table. 

regarding sessions (section 8.2.1.1), where the same two simulated search tasks had the highest average number of reformulations, it appears that a wrong choice of document type led to more reformulations in both systems. For sim2 and sim3 the correct document types (in terms of the facet analysis) were legal guidances, legislation, and business guidances. To conclude, the use of the document type filter is overall higher in system A. Further, how often the filter is applied depends very much on the specific task at hand. Lastly, if a wrong document type has been chosen, the query is likely to result in an unsatisfactory search result and a subsequent reformulation.

The influence of the document type filter on search success appears from Table 8.18. Table 8.5 showed that system A had the highest share of successful sessions and queries, and that the difference between the two systems was slightly higher for queries than for sessions. This general result is reflected in the breakdown by use of the document type filter.

For system A searches, business guidances and citizen booklets in particular were helpful in retrieving relevant documents. Both document types were mentioned as one among several possibilities in the aforementioned facet analysis of the simulated search tasks. Queries that did not include the document type filter also performed well: at least 40 % of the queries using these settings in system A retrieved relevant documents. At a general level, the percentages of successful search results are lower in system B. The highest share of successful queries (25.8 %) was found in queries that did not include the document type filter. One exception is the document type “Citizen booklet”, but this result is less reliable, as the filter was used in only 5 queries. Apart from “Citizen booklet”, the share of successful queries decreases when the document type filter is included. We have already mentioned the over-specification of queries in system B, and the results in Table 8.18 support this impression. Thus, using the document type filter in system A helps specify and reduce search results, while the filter in system B tends to limit the search results too much.

8.2.2 Reformulations 

Reformulations are interesting because they can inform us about whether and how searchers try to correct a query on the basis of an unsatisfying search result. Table 8.5 demonstrated frequent reformulations in both systems, though with a higher frequency in system B than in system A. System B also accounts for the highest average number of reformulations. Table 8.19 and Table 8.20 specify the figures at task level. From the first of the two tables it appears that the general figures are mirrored at session level with one exception: the genuine search task is the only task having fewer sessions with reformulations in system B than in system A. However, the number of reformulations is still high. Further, sim1, system A is the only example of a task where the number of sessions without reformulations surpasses the number of sessions with reformulations.

Table 8.19 Number of sessions with query reformulations (percentages)

Sim1 Sim2 Sim3 NWT Total
SysA SysB SysA SysB SysA SysB SysA SysB SysA SysB
Reformulations 6 (37.5) 11 (68.8) 12 (75.0) 16 (100.0) 10 (62.5) 15 (93.8) 14 (87.5) 11 (68.8) 42 (65.6) 53 (82.8)
No reformulations 10 (62.5) 5 (31.3) 4 (25.0) 0 (0.0) 6 (37.5) 1 (6.3) 2 (12.5) 5 (31.3) 22 (34.4) 11 (17.2)
Total 16 16 16 16 16 16 16 16 64 64

In sessions with reformulations the overall average number of reformulations was 4.59, lower for system A searches (3.93) and higher for system B searches (5.11) (see Table 8.20, rightmost column). Table 8.20 also shows that sim3 had a higher average number of reformulations in system A, whereas for the remaining tasks system B had the highest average number of reformulations. As concluded in section 8.2.1.1, more queries were required to retrieve relevant documents in system B.

Table 8.20 Number of reformulations in sessions

Sim1 Sim2 Sim3 NWT Total
SysA 2.50 (n=6) 3.33 (n=12) 5.60 (n=10) 3.86 (n=14) 3.93 (n=42)
SysB 4.82 (n=11) 6.06 (n=16) 4.67 (n=15) 4.64 (n=11) 5.11 (n=53)
Total 4.00 (n=17) 4.89 (n=28) 5.04 (n=25) 4.20 (n=25) 4.59 (N=95)

Legend: Sessions without reformulations have been excluded from the present table, which makes N=95. That is, of the 128 sessions in the search test, 95 had reformulations.

Types of reformulations add to our understanding of the search moves carried out by the test persons. We have analysed reformulations as to whether the category, the search terms, the document type, or the search operator was changed, whether several parameters were changed, or whether no reformulation occurred (mostly in the first query of a session) (see Table 8.21). As mentioned before, changes of search operators are rare in both systems. In system A the overall preferred reformulation is a change of search terms, followed by a change of the document type and by a simultaneous change of two or more parameters. As discussed in section 8.2.1.3, the search operator is rarely used as a single reformulation move. The document type filter is used far more in system A than in system B, most likely because, apart from changing the search terms or the search operator, it is the only way of reducing search results in system A. Thus, the test persons actually used the available options for modifying their search results. Further, the regular use of the document type filter emphasizes the importance and relevance of the filter.

Table 8.21 Types of reformulations for all queries (percentages)

Sim1 Sim2 Sim3 NWT Total
SysA SysB SysA SysB SysA SysB SysA SysB SysA SysB
No reformulation 16 (51.6) 16 (23.2) 16 (28.6) 15 (13.3) 20 (27.8) 15 (17.4) 17 (24.3) 16 (23.9) 69 (30.1) 62 (18.5)
Category - 15 (21.7) - 46 (40.7) - 41 (47.7) - 12 (17.9) - 114 (34.0)
Query terms 11 (35.5) 15 (21.7) 28 (50.0) 6 (5.3) 23 (31.9) 8 (9.3) 35 (50.0) 18 (26.9) 97 (42.4) 47 (14.0)
Document type - - 4 (7.1) 6 (5.3) 18 (25.0) 1 (1.2) 6 (8.6) 1 (1.5) 28 (12.2) 8 (2.4)
Search operators 1 (3.2) 1 (1.4) 3 (5.4) 2 (1.8) - - 4 (5.7) 2 (3.0) 8 (3.5) 5 (1.5)
>1 types simultaneously 3 (9.7) 22 (31.9) 5 (8.9) 38 (33.6) 11 (15.3) 21 (24.4) 8 (11.4) 18 (26.9) 27 (11.8) 99 (29.6)
Total 31 (100) 69 (100) 56 (100) 113 (100) 72 (100) 86 (100) 70 (100) 67 (100) 229 (100) 335 (100)

In system B the preferred reformulation was a change of categories, closely followed by a combination of two or more parameters. Next in terms of frequency came a change of query terms, while the document type and search operators were rarely used as query modifiers. Here it is evident that categories are important, which is to be expected as they were mandatory in system B. In addition, categories were to a large extent combined with other parameters, most commonly with a change of search terms (see table 6, Appendix 28). This reflects the design of the system, where only categories with content were shown to the searchers. Thus, when search terms were changed, the available categories were likely to change as well, as the categories reflected the list of retrieved documents. This also explains the importance of a change of query terms as a reformulation.
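Because system B only showed categories that contained retrieved documents, the category list is in effect a function of the result list: change the query terms, and the set of offered categories changes with it. The sketch below illustrates that dependency with a hypothetical mapping from documents to categories; the document ids and labels are invented for illustration, and showing a hit count per category is our assumption about the display, not a documented feature of the prototype.

```python
from collections import Counter

# Hypothetical result documents: document id -> categories assigned to it.
# Ids and labels are invented for illustration only.
DOC_CATEGORIES = {
    101: {"Employers", "A-taxes"},
    102: {"Arrears"},
    103: {"Employers"},
    104: {"Customs", "Business imports"},
}

def visible_categories(result_ids):
    """Return only the categories that contain at least one retrieved document,
    with the number of retrieved documents per category."""
    counts = Counter()
    for doc_id in result_ids:
        counts.update(DOC_CATEGORIES[doc_id])
    return dict(sorted(counts.items()))

print(visible_categories([101, 102, 103]))  # {'A-taxes': 1, 'Arrears': 1, 'Employers': 2}
print(visible_categories([104]))            # {'Business imports': 1, 'Customs': 1}
```

Calling the function on a different result list yields a different category list, which is why a change of query terms in system B so often entailed a change of category as well.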

The breakdown by search task in Table 8.21 reveals some task-specific characteristics. One is the use of categories across system B queries: categories were used approximately twice as often in sim2 and sim3 as in sim1 and the genuine work task. As the categories were not combined with other modification tools, these figures refer to queries where the test persons clicked different categories on the basis of the same query terms to find relevant documents.

Table 8.22 Query success on the basis of types of reformulations (percentages)

System A System B
Success Failure Total system A Success Failure Total system B
Category - - - 24 (21.1) 90 (78.9) 114 (100.0)
Query terms 22 (22.7) 75 (77.3) 97 (100.0) 5 (10.6) 42 (89.4) 47 (100.0)
Document type 9 (32.1) 19 (67.9) 28 (100.0) 1 (12.5) 7 (87.5) 8 (100.0)
Search operators 1 (12.5) 7 (87.5) 8 (100.0) 1 (20.0) 4 (80.0) 5 (100.0)
>1 types simultaneously 11 (40.7) 16 (59.3) 27 (100.0) 19 (19.2) 80 (80.8) 99 (100.0)

The success of the respective types of reformulations is summed up in Table 8.22. Overall, system A has a higher share of successful reformulations than system B. At the level of types of reformulations, the best performance in system A is achieved by changing two or more parameters simultaneously: about 40 % of these queries manage to retrieve relevant documents. Next follows a change of the settings of the document type filter. In system B the variance in performance was smaller than in system A. Here the test persons had less success in improving their outputs by changing the query terms or the document type, while categories, search operators, and a combination of query modifiers had the best performance within the system, although below the percentages gained in system A. The two most frequent reformulation types in system B, a change of category and a combination of modifiers, thus accounted for fairly similar shares of successful queries. Within system B we may conclude that reformulations based on a change of categories perform slightly better than the remaining modification tools, whereas system A reformulations were most successful when they consisted of a combination of several parameters. However, the share of successful reformulations leaves room for improvement in both systems.


8.2.3 Combined system B sessions and queries 

During the search test, test persons occasionally ended up assessing documents before choosing a category in system B queries. This behavior had different causes. One cause was the speed of the system: while waiting for the system to categorize the search results, some test persons began to review the documents found on the basis of the initial query. On other occasions the test persons saw the document they were looking for in the results list before even deciding on a category to narrow the results by, and ended up assessing the initial search results without filtering by a category. We denote these searches as combined system B searches. The following quote serves as an illustration of combined system B searches:

“But the first time I searched, I got an e-commerce handbook. I would have 

preferred that to going down there [“down there” refers to the categorization window 

on the right hand side of the screen]” (TP10, line 10-11). 

In several cases where a highly relevant document had been discovered before the choice of a category in system B, the test persons could not locate the document in the categories, which occasionally led to frustration. To exemplify:

”It is just as bad, because it says “Arrears”. And “Employers”, and it is 

neither of them. So let’s see about “Employers”… Because it says “Employers and A-taxes”

And it is withhold by the A-taxes, just like our employers withhold our taxes. I 

simply can’t find it. I know it is in there. But on the basis of this, I can’t get in there. 

Because when I know where it is at, I would go directly for it instead.” (TP05, line 113- 

117). 

A third type of behavior also triggered combined system B queries. It has previously been observed that system B searches tended to be narrow. When the initial query resulted in very few search results, it did not seem natural to the test persons to reduce the already limited set of results further. Some test persons undertook the categorization despite the few results, while others omitted the categorization and assessed the results retrieved on the basis of the remaining search options.

“It says just that... Well, the costs to the European border should be included 

in the customs value. The other one regarding transportation, I can see that it is 


explained with great precision. But in this case I did not search for “Customs” down 

here [in the categories]. I got it by searching for freight and customs value and “pages 

with all words”. And then I got the customs guidance, which is also the one referring to 

the customs codes treating the rules about the amount of carriage to add. So this 

[document] is a three then. But I didn’t get it by searching for “Business imports” or 

“Shipping” or “Exports” [referring to categories]” (TP32, line 295-301) 

The quote illustrates how, in a combined system B search with just two retrieved documents, the test person ends up assessing the documents retrieved without categorization.

The combined system B queries and sessions were coded as system B searches inasmuch as the test persons had access to the taxonomy and could be influenced by it. Methodologically, however, the extent of these queries must be documented. To do this, additional codes were added to separate them from the pure system B queries. Reporting on the extent of combined system B sessions is the purpose of the present section. Table 8.23 lists the share of combined system B sessions. The table shows that about 60 % of the system B sessions contained one or more queries omitting categories. It is also evident from the table that approximately 60 % of the successful sessions in system B had at least one query that did not include the choice of a category. Thus, a substantial share of the sessions bypassed the categorization to some degree.

Table 8.23 Sessions carried out in system B, or in a combination of system B and system A: Frequency and success (percentages)

Session type | Number of sessions in system B | Number of successful sessions in system B
System B | 26 (40.6) | 22 (40.7)
Combined system B sessions | 38 (59.4) | 32 (59.3)
Total | 64 (100.0) | 54 (100.0)

Legend: “System B” denotes sessions carried out exclusively in system B. “Combined system B sessions” refers to sessions that should have been carried out in system B, but in which test persons assessed the relevance of documents found in both system A and system B.

Table 8.24 elaborates on the combined system B sessions by showing which system delivered the successful results for the queries contained in each session. Although a combined system B session included queries conducted in both system A and system B, both systems did not necessarily provide useful search results. Thirteen sessions were solved by omitting the categories, while 15 sessions succeeded by including categories in their queries. Only 4 sessions found relevant documents by means of both systems, and in 6 sessions the task was not solved. At session level, the share of success is thus fairly even between the two systems. It also means that the test persons may have omitted the categorization in some queries of a session, while the relevant documents were still found by means of the categorization.

Table 8.24 System of successful queries in combined system B sessions

Successful system | Frequency | Percent
Task not solved | 6 | 15.8
System A | 13 | 34.2
System B | 15 | 39.5
Both systems applied | 4 | 10.5
Total | 38 | 100.0

Legend: The table lists the systems that provided documents with a relevance score of 2 or 3 in combined system B sessions; hence N = 38.

Table 8.25 extends the prior table and presents the share of successes at query level. The table presents all queries carried out in system B, both distinct system B queries and combined system B queries. Although the test persons in a number of cases found the categorization irrelevant, it was still used in approximately two thirds of the queries (215 of 335; see the Total column). In addition, when calculated as the share of successful queries, queries including categories performed better (24.2 % of queries successful) than queries omitting categorization (16.7 % successful).

Table 8.25 System B queries: Frequency of category use and query success (percentages)

Query type | Success | Failure | Total
Queries with categories | 52 (24.2) | 163 (75.8) | 215 (100.0)
Queries without categories | 20 (16.7) | 100 (83.3) | 120 (100.0)
Total | 72 | 263 | 335

Legend: The table contains all queries processed in system B, both regular system B queries and combined system B queries (N = 335).

In summary, more than half of the system B sessions included system A queries to some extent. At query level, however, system B queries that included a category had a better chance of succeeding than queries that essentially corresponded to system A queries.
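The percentages in Table 8.25 follow directly from the raw counts. As a check, the minimal Python sketch below recomputes the row-wise success rates and the share of system B queries that used a category. The dictionary layout and the function name success_rate are our own illustration and not part of the test setup.

```python
# Counts copied from Table 8.25 (all system B queries, N = 335).
TABLE_8_25 = {
    "Queries with categories": {"success": 52, "failure": 163},
    "Queries without categories": {"success": 20, "failure": 100},
}


def success_rate(row):
    """Percentage of successful queries in one row of the table."""
    return 100 * row["success"] / (row["success"] + row["failure"])


# Row totals summed over both query types, and the row total for category use.
total_queries = sum(r["success"] + r["failure"] for r in TABLE_8_25.values())
with_categories = sum(TABLE_8_25["Queries with categories"].values())

for label, row in TABLE_8_25.items():
    print(f"{label}: {success_rate(row):.1f} % successful")
print(f"Category use: {with_categories} of {total_queries} queries "
      f"({100 * with_categories / total_queries:.1f} %)")
```

Run as is, the sketch prints 24.2 % and 16.7 %, matching the table, and shows that 215 of 335 queries (64.2 %) used a category, which is the "approximately two thirds" reported in the text.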

In the post-search interviews, the test persons were asked to assess system B (see the interview guide in Appendix 19). Their responses indicate when the categorization was useful and when it was not. These answers are analysed in the present section to elaborate on the search log results presented above.

The test persons generally agreed that the categorization was mainly useful when they had a large set of results. TP21 said on the basis of a query with 14 results:

“It did not help me so much there, because the query didn’t have that many 

results. It was possible to cope with the documents there, whether the categorization 

had been there or not. Only 14 documents were retrieved. You could cope with that. It 

is mainly helpful, when you get large results, a thousand documents or so” (TP21, line 

257-260) 

The result set size at which the categorization became useful varied: some mentioned 40 documents, while others, like TP21, mentioned far more. Categorization was also found useful for generating new perspectives on the composition of a query and for understanding the facets of the search task, which supports the decision to code combined system B queries and sessions as system B queries and sessions in the overall coding of the search log. One example is TP02, who would have liked access to the categorization in a system A session:

“At the end I would have liked to be able to go over there [into the 

categorization], because no matter what I did, I could not find anything. And then I need 

somewhere else to search, where I have the option of seeing other sub-topics, in order 

to perhaps access it that way.” (TP02, line 625-633). 

TP09 supports TP02's statement when discussing a system B session:


“It worked well there, because suddenly I found a principal topic that I could 

click on. And that gave me that… Hey! Yes! That has to do with company taxation. So it 

also helped me thinking what this is at all” (TP09, line 553-555) 

The findings confirm Käki’s finding (though based on extracted categorization, see section 5.4.1.5) that categorization is helpful when “…the original query was vague, broad, general, or contained words that have multiple meanings...” (Käki, 2005b, p. 138). Still, the test persons in the present search test discussed whether the categorization was more useful to people with some insight into the topic of the tasks or to people with none. TP06 knew what to look for in one of the tasks:

“I knew that if I was to look for something about the taxation, then I would 

also know something about independent businesses. And then I could go in there faster. 

So I knew that I should choose “Personal incomes” over “Capital income” [examples 

of categories]. I know the tax rules. So it is easier to choose between the categories, 

when the answer is known in advance” (TP06, line 392-395) 

TP20 on the other hand did not find much help in the categorization: 

“But I don’t know, if I would ever start going through all this [the categories]. I 

think it takes more time, because I don’t know what is behind. If I was a specialist in 

SKAT and knew all about company tax settlements or the like, then it [the 

categorization] might be perfect for me. Because then I would know that I can go in 

there exactly, click that, and get the documents out. But I don’t know if it would forget 

about some documents that I need, if it limits the results too much” (TP20, line 339-344)

TP24 sums up the usefulness both for users with extensive knowledge of the task topic and for users with less knowledge:

“If I know what I am looking for, or at least think I know where to go [in the categories], then it is really good. But when I don’t know, it might also be good, because you get to try out different keywords [taxonomy terms]. But if you have the wrong keyword, you will definitely not find it that way.” (TP24, line 320-323)


The difference of opinion may be due to a lack of insight into the functionalities of the system and into the structure and content of the taxonomy. Thus, a considerable number of the test persons cited their lack of experience with the test system as an important reason when they had difficulties locating relevant documents. The difficulties can also be read from Table 8.21 above, where 34 % of all system B reformulations consist of a change of category, meaning that test persons clicked around between categories without simultaneously changing the remaining search options. In other cases, the trouble experienced by the test persons was caused by apparently odd categorizations offered by system B. One example was the presence of the taxonomy term “Tonnage taxes” in a query regarding property gain taxes (TP13). We have already mentioned the varying sizes of the documents in the collection and the importance to the employees of the document type “directions”, a very large document type. The finding suggests that in collections with large documents, the documents should be indexed in smaller units to obtain precise search results. On the other hand, when search results that are already very limited are categorized, as was the case in many system B searches, the results of the categorization may also be skewed. Lack of experience with the categorization in system B, overly narrow queries, and odd category suggestions are all considered explanations for the generally increased number of queries in system B sessions described in section 8.2.1.1.

TP14 summarizes the discussion by saying: 

“Once you begin to get an idea, what the categories are, what they stand for… 

Then you fumble, until you find out what it is. Are there more roads leading to Rome, 

or which is the fastest, or… Well, it is an adaptation with some things. What is the 

wisest thing to do…” (TP14, line 493-495) 

8.3 Summary and performance implications for future indexing in e-government 

The purpose of the present chapter was to answer research questions 2 and 3 regarding the comparative performance of automatic indexing in terms of extracted versus assigned indexing, and the implications for future indexing guidelines in e-government. So far, the focus has been on research question 2 and the results of the search test. In this summary we unify the conclusions drawn in the respective sections of the chapter in order to state the implications of the results for e-government indexing guidelines (research question 3). The general figures of the test (section 8.2) demonstrated a better performance of system A in terms of fewer terms and concepts in queries, fewer sessions with reformulations, fewer queries in sessions with reformulations, and a higher share of success in sessions and queries. However, a more detailed analysis of figures and interviews provided a more differentiated picture. When broken down by search task, system B performed equal to or better than system A in some tasks. This was the case for the genuine work tasks in terms of session success, query success, the number of queries in sessions, and the number of queries in successful sessions. In sim3, system B outperformed system A with a lower average number of search terms and search keys in queries, both in total and in successful queries. The analysis also detected a higher use of the document type filter in system A, which was explained by the reduced number of query composition tools in system A. In addition, the search log showed that more than half of the system B sessions included one or more queries omitting categorization. The search log and the search interviews revealed different reasons for the omissions: the result set was too small; a relevant document was discovered in the result list while the test person waited for the system to categorize the results; or, relatedly, a highly relevant document was found among the first results before a category was chosen.

Different causes were found for the lower general performance of system B. One reason was the test persons’ difficulties in handling the search operators available in the prototype. Significantly more restrictions were applied in system B queries, at times resulting in very few search results and thus reducing the number of documents available for assessment. Another reason was found in the post-search interviews, where lack of experience with the categorization features of system B was a frequent explanation for the difficulties experienced. Furthermore, some test persons found it difficult to tell from the label which documents a category contained. The findings emphasize the importance of users’ familiarity with the design and functionality of retrieval systems. The difficulties were also reflected in the types of reformulations carried out: about one third of all reformulations in system B consisted of a change of category alone.

The test persons also held opposing views, though. Overall, categorization was useful when there was a certain number of documents to categorize; with few results, it was easier to look through the documents manually. System B was also useful when the employees had some knowledge of the search task topic. It was then considered easier to assess the relevance of the categories, as the category labels made sense. However, the categorization of system B was also beneficial when test persons had limited knowledge of the search task at hand. In those cases, categories helped the test persons discover and understand facets contained in the task. Here it is important to make clear that limited knowledge should be understood as generalist knowledge of the organization’s topics.

As these concluding remarks show, users do not use or omit categorization in the same way when solving search tasks, even when they work within the same domain, as in the case study carried out in the thesis. On the basis of the search test it is concluded that users at times prefer free text indexing as represented in system A. This is particularly the case when they know precisely what to look for. In these situations, metadata such as the document type help in composing queries of high precision. When a query yields few, highly precise documents, the employees prefer searching by metadata. The test also made evident the employees’ emphasis on document types, both when composing queries and when assessing query outputs. The employees had extensive insight into the range of documents on the intranet, as specific document types were often the outcome of a work process. This stresses the importance of metadata in e-government, both for composing queries and in the document snippets of search results.

The overall implication of the search test for indexing guidelines in e-government is that both extracted and assigned automatic indexing should be present in professional e-government systems. As shown in chapter 5, categorization has primarily been tested on the www, but as the search test demonstrates, smaller and more specialized systems may also benefit from this indexing approach. From the domain study we learned that verificative information needs and conscious topical information needs are prevalent among e-government employees. Extracted indexing in the form of free text indexing has proven particularly useful for verificative information needs, while categorization was used more when the test persons needed ideas for search terms or perspectives on the work task at hand, that is, for conscious topical needs.

Assigned indexing in the form of categorization assisted the users in their information seeking when they needed ideas for query reformulation or when they had difficulties interpreting the concepts contained in a search task. In future e-government indexing, both indexing approaches should be represented in order to meet the diversity of information need types identified in the domain study. The search test has also emphasized the importance of users’ familiarity with the KOS (in this case the taxonomy). When the categories did not match the employees’ understanding, it was easier to go through a number of results manually than to click through a number of categories to find something relevant to the task at hand.


9 Conclusion and recommendations for future work 


Chapter 9 

The purpose of the thesis was to investigate whether and how automatic indexing can improve professional e-government users’ access to digitized, work-based information. To do this, the preceding chapters have reviewed, investigated, and analysed indexing, information seeking, and searching in e-government from a professional, user-based perspective. Chapter 2 explained the methodological standpoint of the thesis. Chapters 3, 4, and 5 reviewed the e-government domain, e-government seeking behaviour, and indexing methods, respectively. The review chapters served to guide the empirical investigations. In chapter 6 we outlined the empirical designs, data collection, and analysis of the two overall empirical studies of the thesis, the domain study and the search test. The results of the studies were reported and analysed in Chapters 7 and 8. Chapter 7 addressed research question 1 concerning professional e-government seeking behaviour and the related indexing demands by accounting for the results of the domain study. Chapter 8 reported on the search test and thereby answered research questions 2 and 3 concerning the domain-specific performance of two indexing methods and the related implications for indexing guidelines within the domain. The purpose of the present chapter is to draw together the threads of the thesis in order to answer the research questions put forward in Chapter 1. In section 9.1 we summarize the empirical findings of the thesis. Section 9.2 outlines the contributions of the thesis, and section 9.3 makes recommendations for future work.

9.1 Summary of empirical findings 

The domain study found that the e-government employees used a myriad of mainly electronic information sources in their daily work. The predominant source was the intranet. It had the highest use across all work tasks, while the use of other types of sources depended on the work task at hand. The general prevalence of the intranet supports our choice of the intranet as the test system for the search test. Apart from direct information sources, both the responses in the open field of the questionnaire and the focus group participants indicated extensive use of colleagues as sources of information.

The employees had extensive work experience within SKAT. Given their long length of service in the organization, they predominantly sought information at regular intervals, though not all the time. On the basis of their experience, the employees demonstrated good basic insight into their work topics. However, the employees engaged in citizen service in particular had experienced that their work tasks changed with the introduction of self-service. Because the employees were in less contact with citizens, they memorized fewer rules and regulations, which increased their need for verifying and updating information. Beyond that, employee routine and topic insight account for the general frequency of information seeking, which occurred approximately every 3rd or 4th time a work task was handled. To conclude, the study confirms the expected changes in employees’ work tasks with the introduction of e-government, at least for employees occupied with serving citizens.

The main reasons for consulting the intranet were verificative and conscious topical information needs. A few work tasks from the administrative parts of the organization stood out with a high share of more complex information needs in the form of muddled topical needs, but they were exceptions to the general picture. It must be taken into account, though, that the questionnaire items on information needs specifically concerned the intranet, and that other information needs may arise in relation to other information sources. However, the results correspond well to the experience and insight of the employees and to the conclusion drawn above that employees often check information and rules to make sure they are up to date. In addition, the results regarding information needs were confirmed by the focus groups.

Concerning metadata, the domain study found an extensive need for metadata among the employees. Part of the reason for requiring more and higher quality metadata was a general difficulty locating relevant documents in the current intranet. The difficulties often made the employees consult a colleague instead of the intranet. Content metadata in the form of subject metadata were particularly requested for the intranet, but many employees also asked for other types. Thus, there was a general interest in metadata among the employees. The findings emphasize the importance of high quality markup of documents for effective knowledge sharing in e-government.

On the basis of the findings of the domain study, the following demands for indexing were deduced. As both verificative and conscious topical needs were dominant among the employees when consulting the intranet, both contextual and content metadata should be represented in the indexing. Part of the definition of a verificative information need is that the user wants to locate a document on the basis of some kind of known bibliographic information. This calls for contextual metadata. Conscious topical needs, in turn, are met by exploring aspects of a known subject matter, for which both content and contextual metadata are relevant. To conclude, e-government employees can benefit from both types of metadata in terms of their information needs, which is why both should be represented in the indexing. In addition, the dissatisfaction with the search outcomes of the current intranet was striking, and in many cases the employees gave up and consulted colleagues instead. For indexing guidelines, this emphasizes that metadata are not only needed but must also be added carefully to ensure quality, which is a prerequisite for employees to carry out effective and efficient information seeking.

The search test focused mainly on content metadata in the form of the subject categorization tested. However, metadata in the form of document types also turned out to be important to the test persons during the test. On the basis of the domain study findings regarding information needs, three low-complexity simulated search tasks guided the test searches, along with one genuine information need brought by each test person. Both simulated and genuine search tasks were simple in terms of the number of concepts included; all tasks consisted of three topical concepts or fewer.

At a general level, the search test found that system B (comprising categorization) had more terms in queries on average (2.43 compared with 2.25 in system A), more concepts in queries on average (1.90 compared with 1.67), and a lower share of queries applying the document type filter (31.6 % compared with 43.2 %). Furthermore, success in system B required more work from the test persons: the share of sessions with reformulations was 82.8 % compared with 65.6 % in system A, and the average number of reformulations was higher (4.23 compared with 2.58 in system A). At session level, system B was equal to or above system A in 3 of the 4 tasks in terms of the number of successful sessions. At query level, the total number of successful queries was fairly even between the two systems, though the number of failed queries was significantly higher in system B than in system A. To conclude, the effort required to locate relevant documents in system B was significantly higher.

Further, a general finding of the study was that queries with fewer terms were more likely to succeed. This indicates that the test persons are very good at finding relevant search terms. It also means that combining the terms with a category risks over-restricting the results, which suggests that a less specific taxonomy could be useful to the employees, at least in a relatively small database such as the one tested here.


Different causes were found for the increased effort to retrieve relevant documents in system B. The test persons consistently used more search terms with the more restrictive search operator “Pages containing all words” and fewer search terms with the less restrictive search operator “Free text”. This shows that the test persons had difficulties understanding the meaning of the two predominant operators of the system. In terms of search results, however, the categorization step in system B reduced the results further in order to complete the query, resulting in very limited search results, whereas the same operation in system A led to faster retrieval of relevant documents because no further restrictions were added to the query. Further, some test persons reported trouble finding categories that matched their queries in system B because they lacked knowledge of the taxonomy. The trouble was also evident in the analysis of reformulation types in system B, where a change of category alone accounted for 40 % of all reformulations in sim2, 47 % in sim3, and 34 % overall. The results stress the importance of an appropriate and meaningful level of detail in controlled vocabularies. However, the results also show that although the employees are considered experienced information searchers, they may be confused by the meaning of Boolean operators. By comparison, the average number of queries using the document type filter was higher in system A, though with large variations at task level. In both systems, however, the use of the document type filter reflected a better understanding of this filter as a query tool than of the search operators.
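To illustrate why combining the “Pages containing all words” operator with a category choice can over-restrict results, the minimal Python sketch below models a hypothetical four-document collection. The documents, their words and categories, and the function names are invented for illustration. “Pages containing all words” is modelled as a Boolean AND, and “Free text” is assumed to behave as a Boolean OR; the prototype's actual matching and ranking may differ.

```python
from collections import Counter

# Hypothetical toy collection: document id -> (set of indexed words, taxonomy category).
# Documents, words and categories are invented for illustration only.
DOCS = {
    "d1": ({"freight", "customs", "value"}, "Business imports"),
    "d2": ({"customs", "value", "guidance"}, "Customs"),
    "d3": ({"freight", "shipping"}, "Shipping"),
    "d4": ({"customs", "codes", "carriage"}, "Customs"),
}


def all_words(terms):
    """'Pages containing all words': every term must occur (Boolean AND)."""
    wanted = set(terms)
    return {doc for doc, (words, _) in DOCS.items() if wanted <= words}


def free_text(terms):
    """'Free text', assumed here to mean that any term suffices (Boolean OR)."""
    wanted = set(terms)
    return {doc for doc, (words, _) in DOCS.items() if wanted & words}


def categorize(hits):
    """Group a result set by taxonomy category and count documents per category."""
    return Counter(DOCS[doc][1] for doc in hits)


def in_category(hits, category):
    """Choosing a category acts as one more restriction on the result set."""
    return {doc for doc in hits if DOCS[doc][1] == category}


query = ["freight", "customs"]
strict = all_words(query)  # only d1 contains both terms
loose = free_text(query)   # all four documents contain at least one term

print(sorted(strict), dict(categorize(strict)))
print(sorted(loose), dict(categorize(loose)))
print(sorted(in_category(strict, "Customs")))  # category on top of AND: nothing left
print(sorted(in_category(loose, "Customs")))
```

With the terms freight and customs, the AND operator returns a single document, and choosing the category “Customs” then leaves nothing, whereas the OR operator returns all four documents, two of which fall under “Customs”. Under these assumptions, adding a term can only shrink an AND result set and only enlarge an OR result set.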

The test persons’ challenges resulted in the categorization being omitted in about one third of the system B queries. Analyses of the queries carried out in system B showed a fairly even distribution of successful sessions with respect to whether the session had been solved by means of categorization or not. At query level, 24.2 % of the queries that included a category were successful, compared with 16.7 % of the queries that omitted categories. In the interviews, the test persons explained why they omitted categories. Categorization was not supportive in queries where a highly relevant document appeared among the first results, nor when a very small set of results was retrieved. In those cases the categorization was considered inconvenient to the retrieval process, as it was easier to look through the results manually than to decide on the correct category. On the other hand, categorization was useful for suggesting new search terms for a query.

Overall, it is concluded that there is a basis for implementing categorization in information systems supporting professional e-government users. Metadata-based extracted indexing is also important for successful retrieval in the domain, as it supports the verificative information needs found there.

9.2 Contributions of the thesis 

The contributions of the thesis to the theoretical and empirical framework are the following: 

A confirmation of the previously unverified assumption in the e-government domain that the work tasks of e-government employees change as a result of increased self-service among external stakeholders (Snellen, 2002; Dörfler, 2003; Marchionini, Samet & Brandt, 2003; Brown, 2005; Landsforeningen af Kommunale Servicecentre, 2005; Mahler & Regan, 2005). The results have shown that, at least for employees engaged in serving citizens, the need to verify information has increased, as less is memorized due to less routine. For LIS, the consequences of increased information seeking in order to stay up to date are important. 

As regards the information seeking of professional e-government users, it has been outlined that the user group is not well researched. In light of the changes in work tasks just mentioned, updated knowledge of e-government employees was needed. The thesis has added to our knowledge of the user group in terms of their: 

use of information sources 

frequency of information seeking 

metadata preferences 

predominant types of information needs and how these needs are met by means of contextual and content metadata 

searching behavior 

Regarding the performance of automatic categorization in the domain, the following has been learned: 

Categorization supports users in tasks where a new perspective on the task is needed, either by suggesting new search terms or by offering an understanding of the facets contained in the search task. In addition, categorization helps users reduce large result sets. In verificative searches, categorization is less useful if highly relevant documents are retrieved quickly; in those cases categorization reduces efficiency. 

Categorization has primarily been tested in larger collections than the present test collection. The present results show that categorization is also useful in smaller document collections. 

However, to support the user group, the KOS must express an appropriate level of specificity. In addition, the categorization of documents must be correct and match the employees’ understanding of the domain. 

9.3 Recommendations for future work 

Following the conclusions drawn above, the present section makes recommendations for future work. Although the thesis has added to our knowledge about professional e-government information seeking and the ability of automatic categorization to support this behavior, new questions have arisen that remain to be answered. The suggestions are divided into two groups: suggestions regarding the empirical setting and suggestions regarding the tools applied in the study. 

The empirical setting of the thesis was a case study of SKAT, the largest government agency in Denmark. We have touched upon the employees’ long length of service and its implications for information needs and seeking behavior. This is not necessarily a general tendency. It would therefore be interesting to investigate whether the behavior differs in smaller government agencies. 

As a consequence of the information needs characterized in the domain study, low-complexity simulated search tasks were used as the point of departure for the search test, along with one genuine search task. System B was found to perform better on some variables for the genuine search task, and for that task only. It would therefore be enriching to explore the performance of e-government categorization in a study designed to reflect genuine search tasks to a larger extent. In this connection another question arises: it was not within the aim of the test design to establish the performance of categorization for more complex information needs. A study focusing specifically on complex information needs in e-government would add further to our knowledge of the performance of categorization in the domain. Lastly, regarding the empirical setting, the search test provided insight into employees’ short-term use of a system that was new to them. This has advantages, which have been outlined in the empirical framework. However, a study investigating categorization in a more natural setting could add other perspectives to our understanding of the field. 

The search test applied several tools, and these have also raised questions for future research. The categorization used a two-level taxonomy to arrange search results. The test persons expressed different opinions about the specificity of the taxonomy. As addressing this question was not part of the purpose and design of the search test, we have not been able to establish the cause of these differences. For professional users, it would therefore be interesting to learn more about the appropriate specificity and choice of concepts in KOS such as taxonomies. In addition, the project has investigated automatic assigned indexing in the form of automated categorization and found it supportive of e-government information seeking behavior. Investigations of other types of assigned indexing would increase our knowledge of the relative performance of different assigned indexing methods in the domain.

This thesis adds to our knowledge about professional e-government information seeking behavior and has increased our understanding of how this behavior can be supported by automatic indexing.

10 References 


Abecker, A., Bernardi, A., Hinkelmann, K., Kühn, O. & Sintek, M. (1998). Toward a 

technology for organizational memories. IEEE Intelligent Systems, 13(3), 40-48. 

Ahlgren, P. & Kekäläinen, J. (2007). Indexing strategies for Swedish full text retrieval 

under different user scenarios. Information Processing & Management, 43(1), 

81-102. 

Aitchison, J. (1992). Indexing languages and indexing. In: Dossett, P. (Ed.), Handbook 

of Special Librarianship and Information Work (6. ed., pp. 191-233). London: 

Aslib. 

Alasem, A. (2009). An overview of e-Government metadata standards and Initiatives 

based on Dublin Core. Electronic Journal of e-Government, 7(1), 1-10. 

Alavi, M. & Leidner, D.E. (2001). Review: Knowledge management and knowledge 

management systems: Conceptual foundations and research issues. MIS 

Quarterly, 25(1), 107-136. 

Albrechtsen, H. (1993). Subject analysis and indexing: from automated indexing to 

domain analysis. The Indexer, 18(4), 219-224. 

Andersen, K.V., Grönlund, Å., Moe, C.E. & Sein, M.K. (2005). Introduction to the 

special issue. Scandinavian Journal of Information Systems, 17(2), 3-10. 

Andersen, K.V. & Kraemer, K.L. (1994). Information technology and transitions in the 

public service: A comparison of Scandinavia and the United States. 

Scandinavian Journal of Information Systems, 6(1), 3-24. 

Anderson, J.D. & Pérez-Carballo, J. (2001a). The nature of indexing: How humans and machines analyze messages and texts for retrieval. Part I: Research, and the nature of human indexing. Information Processing & Management, 37(2), 231-254.

Anderson, J.D. & Pérez-Carballo, J. (2001b). The nature of indexing: How humans and 

machines analyze messages and texts for retrieval. Part II: Machine indexing, 

and the allocation of human versus machine effort. Information Processing & 

Management, 37(2), 255-277. 

Anderson, J.D. & Pérez-Carballo, J. (2005). Information Retrieval Design: Principles 

and Options for Information Description, Organization, Display, and Access in 

Information Retrieval Databases, Digital Libraries, Catalogs, and Indexes. St. 

Petersburg: Ometeca Institute. 

Andrews, J. & Duhon, L. (1997). GILS, Government Information Locator Service: 

Blending old and new to access U.S. governmental information. The Serials 

Librarian, 31(1-2), 327-333. 

Apté, C., Damerau, F. & Weiss, S.M. (1994). Automated learning of decision rules for 

text categorization. ACM Transactions on Information Systems, 12(3), 233-251. 

Arellano-Gault, D. & del Castillo-Vega, A. (2004). Maturation of public administration 

in a multicultural environment: Lessons from the Anglo-Saxon, Latin, and 

Scandinavian political traditions. International Journal of Public 

Administration, 27(7), 519-528. 

Askim, J. (2007). How do politicians use performance information? An analysis of the 

Norwegian local government experience. International Review of Administrative 

Sciences, 73(3), 453-472. 

Askim, J. (2009). The demand side of performance measurement: Explaining 

councillors' utilization of performance information in policymaking. 

International Public Management Journal, 12(1), 24-47.


Attar, K.E. (2006). Why appoint professionals? A student cataloguing project. Journal 

of Librarianship and Information Science, 38(3), 173-185. 

Attfield, S., Blandford, A. & Makri, S. (2010). Social and interactional practices for 

disseminating current awareness information in an organisational setting. 

Information Processing & Management, 46(6), 632-645. 

Bates, M.J. (1979). Information search tactics. Journal of the American Society for 

Information Science, 30(4), 205-214. 

Becker, J., Pfeiffer, D. & Räckers, M. (2007). Domain specific process modelling in 

public administrations: The PICTURE approach. In: Wimmer, M.A., Scholl, 

H.J. & Grönlund, Å., (Eds.), EGOV 2007, (pp. 68-79). Berlin: Springer. 

Beghtol, C. (1986). Bibliographic classification theory and text linguistics: aboutness 

analysis, intertextuality and the cognitive act of classifying documents. Journal 

of Documentation, 42(2), 84-113. 

Bekkers, V. & Homburg, V. (2007). The myths of e-government: Looking beyond the assumptions of a new and better government. The Information Society, 23(5), 373-382.

Belkin, N.J. & Croft, W.B. (1992). Information filtering and information retrieval: Two 

sides of the same coin? Communications of the ACM, 35(12), 29-38. 

Belkin, N.J., Oddy, R.N. & Brooks, H.M. (1982). ASK for information retrieval: Part 1. 

Background and theory. Journal of Documentation, 38(2), 61-71. 

Bellamy, C. (2002). From automation to knowledge management: Modernizing British 

government with ICTs. International Review of Administrative Sciences, 68(2), 

213-230. 

Bellamy, C. & Taylor, J.A. (1998). Governing in the Information Age. Buckingham: 

Open University Press. 

Berrios, D.C., Cucina, R.J. & Fagan, L.M. (2002). Methods for semi-automated 

indexing for high precision information retrieval. Journal of the American 

Medical Informatics Association, 9(6), 637-652. 

Bertot, J.C., Jaeger, P.T. & Grimes, J.M. (2010). Using ICTs to create a culture of 

transparency: E-government and social media as openness and anti-corruption 

tools for societies. Government Information Quarterly, 27(3), 264-271. 

Beynon-Davies, P. (2007). Models for e-government. Transforming Government: 

People, Process and Policy, 1(1), 7-28. 

Bigdeli, Z. (2007). Iranian engineers' information needs and seeking habits: An agro-industry 

company experience. Information Research, 12(2). 

Blair, D.C. (2002). The challenge of commercial document retrieval, Part I: Major 

issues, and a framework based on search exhaustivity, determinacy of 

representation and document collection size. Information Processing & 

Management, 38(2), 273-291. 

Blair, D.C. & Maron, M.E. (1985). An evaluation of retrieval effectiveness for a full-text 

document-retrieval system. Communications of the ACM, 28(3), 289-299. 

Blomgren, L., Vallo, H. & Byström, K. (2004). Evaluation of an information system in 

an information seeking process. In: Heery, R. & Lyon, L. (Eds.), ECDL 2004 

(pp. 57-68). Berlin: Springer. 

Bloomfield, M. (2002). Indexing: Neglected and poorly understood. Cataloging & 

Classification Quarterly, 33(1), 63-75. 

Bloor, M., Frankland, J., Thomas, M. & Robson, K. (2001). Focus Groups in Social 

Research. London: Sage. 

Borko, H. (1977). Toward a theory of indexing. Information Processing & 

Management, 13, 355-365. 

Borlund, P. (2000). Experimental components for the evaluation of interactive 

information retrieval systems. Journal of Documentation, 56(1), 71-90. 


Borlund, P. (2003a). The concept of relevance in IR. Journal of the American Society 

for Information Science and Technology, 54(10), 913-925. 

Borlund, P. (2003b). The IIR evaluation model: A framework for evaluation of 

interactive information retrieval systems. Information Research, 8(3). 

Borlund, P. & Ingwersen, P. (1997). The development of a method for the evaluation of 

interactive information retrieval systems. Journal of Documentation, 53(3), 225-250.

Borlund, P. & Schneider, J.W. (2010). Reconsideration of the simulated work task 

situation: A context instrument for evaluation of information retrieval 

interaction. In: Belkin, N.J. & Kelly, D. (Eds.), IIiX 2010. New Brunswick, New 

Jersey: ACM. 

Bountouri, L., Papatheodorou, C., Soulikias, V. & Stratis, M. (2009). Metadata 

interoperability in public sector information. Journal of Information Science, 

35(2), 204-231. 

Box, R.C. (1999). Running government like a business: Implications for public 

administration theory and practice. The American Review of Public 

Administration, 29(1), 19-43. 

Brown, D. (2005). Electronic government and public administration. International 

Review of Administrative Sciences, 71(2), 241-254. 

Buckingham, A. & Saunders, P. (2004). The Survey Methods Workbook: From Design 

to Analysis. Cambridge: Polity Press. 

Byström, K. (1997). Municipal administrators at work: Information needs and seeking 

(IN&S) in relation to task complexity: A case-study amongst municipal officials, 

Information Seeking in Context. Tampere, Finland: Taylor Graham. 

Byström, K. (1999). Task Complexity, Information Types and Information Sources. 

Unpublished Doctoral dissertation, University of Tampere, Tampere. 

Byström, K. (2002). Information and information sources in tasks of varying 

complexity. Journal of the American Society for Information Science and 

Technology, 53(7), 581-591. 

Byström, K. & Hansen, P. (2005). Conceptual framework for tasks in information 

studies. Journal of the American Society for Information Science and 

Technology, 56(10), 1050-1061. 

Byström, K. & Järvelin, K. (1995). Task complexity affects information seeking and 

use. Information Processing & Management, 31(2), 191-213. 

Carey, M.A. & Smith, M.W. (1994). Capturing the group effect in focus groups: A 

special concern in analysis. Qualitative Health Research, 4(1), 123-127. 

Carmines, E.G. & Woods, J.A. (2005). Reliability assessment. In: Encyclopedia of 

Social Measurement (Vol. 3, pp. 361-365). 

Case, D.O. (2006). Information behavior. Annual Review of Information Science and 

Technology, 40, 293-327. 

Case, D.O. (2007). Looking for Information: A Survey of Research on Information 

Seeking, Needs, and Behavior. Amsterdam: Elsevier. 

Center for effektivisering og digitalisering (2002). Prospekt for FESD (Fællesoffentlig 

Elektronisk Sags- og Dokumenthåndtering). Retrieved 13-03, 2011, from 

http://modernisering.dk/fileadmin/user_upload/documents/Projekter/FESD/Baggrund/FESD-prospekt.pdf.

Chau, M., Fang, X. & Sheng, O.R.L. (2007). What are people searching on government 

web sites? A study of search activity on the Utah.gov web site. Communications 

of the ACM, 50(4), 87-92. 

Chaudhry, A.S. (2010). Assessment of taxonomy building tools. The Electronic 

Library, 28(6), 769-788.


Chen, H. (1995). Machine learning for information retrieval: Neural networks, symbolic 

learning, and genetic algorithms. Journal of the American Society for 


Choi, Y. (2010a). Enhancing access to the Web: Vocabulary analysis on users' tags and 

professionals' index terms, iConference. University of Illinois at Urbana- 

Champaign, Illinois, U.S.A. 

Choi, Y. (2010b). Traditional versus emerging knowledge organization systems: 

Consistency of subject indexing of the Web by indexers and taggers, ASIST 

2010. Pittsburgh, PA, USA. 

Choo, C.W. (2006). The Knowing Organization: How Organizations Use Information 

to Construct Meaning, Create Knowledge, and Make Decisions (2. ed.). New 

York: Oxford University Press. 

Choo, C.W., Furness, C., Paquette, S., van den Berg, H., Detlor, B., Bergeron, P. & 

Heaton, L. (2006). Working with information: Information management and 

culture in a professional services organization. Journal of Information Science, 

32(6), 491-510. 

Chowdhury, G.G. (2003). Natural language processing. Annual Review of Information 

Science and Technology, 37, 51-89. 

Chowdhury, G.G. (2004). Introduction to Modern Information Retrieval (2. ed.). 

London: Facet. 

Christian, E. (1999). Experiences with information locator services. Journal of 

Government Information, 26(3), 271-285. 

Christian, E. (2001). A metadata initiative for global information discovery. 

Government Information Quarterly, 18(3), 209-221. 

Clark, H.H. & Schober, M.F. (1992). Asking questions and influencing answers. In: 

Tanur, J.M. (Ed.), Questions about Questions: Inquiries into the Cognitive Bases 

of Surveys (pp. 15-48). New York: Russell Sage Foundation. 

Cleverdon, C. (1967). The Cranfield tests on index language devices. Aslib 

Proceedings, 19(6), 173-194. 

Cleverdon, C. & Keen, M. (1966). Aslib Cranfield research project. Factors 

determining the performance of indexing systems. Volume 2: Test results. 

Cranfield: College of Aeronautics. 

Cleverdon, C.W. (1960). ASLIB Cranfield Research Project: Report on the first stage of 

an investigation into the comparative efficiency of indexing systems. Cranfield: 

College of Aeronautics. 

Codagnone, C. & Wimmer, M.A., (Eds.). (2007). Roadmapping eGovernment 

Research: Visions and Measures towards Innovative Governments in 2020. 

[Koblenz]: eGovRTD2020 Project Consortium. 

Cole, C. & Leide, J. (2006). A cognitive framework for human information behavior: 

The place of metaphor in human information organizing behavior. In: Spink, A. 

& Cole, C. (Eds.), New Directions in Human Information Behavior (Vol. 8, pp. 

171-202). Netherlands: Springer. 

Cong, X. & Pandya, K.V. (2003). Issues of knowledge management in the public sector. 

Electronic Journal of Knowledge Management, 1(2), 25-33. 

Connaway, L.S., Dickey, T.J. & Radford, M.L. (2011). "If it is too inconvenient I'm not 

going after it": Convenience as a critical factor in information-seeking 

behaviors. Library & Information Science Research, 33(3), 179-190. 

Cook, C., Heath, F. & Thompson, R. (2000). A meta-analysis of response rates in Web- 

or internet-based surveys. Educational and Psychological Measurement, 60(6), 

821-836. 

Cooper, W.S. (1969). Is interindexer consistency a hobgoblin? American 

Documentation, 20(3), 268-279. 


Courtright, C. (2007). Context in information behavior research. Annual Review of 

Information Science and Technology, 41, 273-306. 

Cousins, S.A. (1992). Enhancing subject access to OPACs: Controlled vocabulary vs. 

natural language. Journal of Documentation, 48(3), 291-309. 

Coyle, K. (2008). Machine indexing. The Journal of Academic Librarianship, 34(6), 

530-531. 

Crawford, J. & Irving, C. (2009). Information literacy in the workplace: A qualitative 

exploratory study. Journal of Librarianship and Information Science, 42(1), 29-38.

Croft, W.B., Turtle, H.R. & Lewis, D.D. (1991, October 13-16). The use of phrases and structured queries in information retrieval. In: Bookstein, A., Chiaramella, Y., Salton, G. & Raghavan, V.V. (Eds.), Proceedings of the 14th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 32-45). Chicago, Illinois, USA. New York: ACM.

Cuillier, D. & Piotrowski, S.J. (2009). Internet information-seeking and its relation to 

support for access to government records. Government Information Quarterly, 

26(3), 441-449. 

Cunningham, S.J., Littin, J. & Witten, I.H. (1997). Applications of Machine Learning in 

Information Retrieval (Working Paper 97/6). Hamilton, New Zealand: The 

University of Waikato, Department of Computer Science. 

Davies, K. (2007). The information-seeking behaviour of doctors: A review of the 

evidence. Health Information and Libraries Journal, 24(2), 78-94. 

Dawes, S.S. (2009). Governance in the digital age: A research and action framework for 

an uncertain future. Government Information Quarterly, 26(2), 257-264. 

de Groot, D. (2003). Vigorous knowledge management in the Dutch public sector. In: 

Wimmer, M.A. (Ed.), 4th IFIP International Working Conference, KMGov 2003 

(pp. 94-99): Springer. 

de Jong, M. & Lentz, L. (2006). Municipalities on the Web: User-Friendliness of 

Government Information on the Internet. In: Wimmer, M., Scholl, H., Grönlund, 

Å. & Andersen, K. (Eds.), Electronic Government, 5th International 

Conference, EGOV 2006 (pp. 174-185). Berlin: Springer. 

De Mey, M. (1977). The cognitive viewpoint: Its development and its scope. In: De 

Mey, M., Pinxten, R., Poriau, M. & Vandamme, F. (Eds.), International 

Workshop on the Cognitive Viewpoint (pp. xvi-xxxii). Ghent, Belgium: 

University of Ghent. 

de Vaus, D. (2002b). Surveys in Social Research (5. ed.). London: Routledge. 

Del Fiol, G., Haug, P.J., Cimino, J.J., Narus, S.P., Norlin, C. & Mitchell, J.A. (2008). 

Effectiveness of topic-specific infobuttons: A randomized controlled trial. 

Journal of the American Medical Informatics Association, 15(6), 752-759. 

Dempsey, L. & Heery, R. (1998). Metadata: A current view of practice and issues. 

Journal of Documentation, 54(2), 145-172. 

Dias, C. (2001). Corporate portals: A literature review of a new concept in Information 

Management. International Journal of Information Management, 21(4), 269-287.

Dietterich, T.G. (1997). Machine-learning research: Four current directions. AI 

Magazine, 18(4), 97-136. 

du Plessis, T. & du Toit, A.S.A. (2006). Knowledge management and legal practice. 

International Journal of Information Management, 26(5), 360-371. 

Dubois, C.P.R. (1987). Free text versus controlled vocabulary. Online Review, 11(4), 

243-253. 

Dumais, S., Platt, J., Heckerman, D. & Sahami, M. (1998). Inductive learning 

algorithms and representations for text categorization. In: Makki, K. & 

Bouganim, L. (Eds.), CIKM '98 Proceedings of the seventh international


conference on Information and knowledge management (pp. 148-155). New 

York: ACM. 

Dörfler, A. (2003). Business process modelling and help systems as part of KM in e-government. In: Wimmer, M.A. (Ed.), KMGov (pp. 297-303).

Edmiston, K.D. (2003). State and local e-government: Prospects and challenges. The 

American Review of Public Administration, 33(1), 20-45. 

Edmunds, A. & Morris, A. (2000). The problem of information overload in business 

organisations: a review of the literature. International Journal of Information 

Management, 20(1), 17-28. 

Efron, M., Elsas, J., Marchionini, G. & Zhang, J. (2004). Machine learning for 

information architecture in a large governmental website. In: Proceedings of the 

4th ACM/IEEE-CS joint conference on Digital libraries (pp. 151-159). Tucson, AZ, USA.

El-Sherbini, M. & Klim, G. (2004). Metadata and cataloging practices. The Electronic 

Library, 22(3), 238-248. 

Ellis, D. (1989). A behavioural approach to information retrieval system design. Journal 


Elwood, S. (2008). Grassroots groups as stakeholders in spatial data infrastructures: 

Challenges and opportunities for local data development and sharing. 

International Journal of Geographical Information Science, 22(1), 71-90. 

Ely, M. (1991). Doing Qualitative Research: Circles Within Circles. London: 

Routledge. 

Eppler, M.J. & Mengis, J. (2004). The concept of information overload: A review of 

literature from organization science, accounting, marketing, MIS, and related 

disciplines. The Information Society, 20(5), 325-344. 

Evans, J.R. & Mathur, A. (2005). The value of online surveys. Internet Research, 15(2), 

195-219. 

Fagin, R., Kumar, R., McCurley, K.S., Novak, J., Sivakumar, D., Tomlin, J.A. & 

Williamson, D.P. (2003, May 20–24). Searching the workplace web. In: 

WWW2003: Proceedings of the 12th international conference on World Wide 

Web, (pp. 366-375). Budapest, Hungary. 

Fang, Z. (2002). E-government in digital era: Concept, practice, and development. 

International Journal of The Computer, The Internet and Management, 10(2), 1-22.

Fangmeyer, H. (1974). Semi Automatic Indexing: State of the Art. Neuilly Sur Seine, 

France: North Atlantic Treaty Organization. 

Feldman, S. & Sherman, C. (2001). The High Cost of Not Finding Information. 

Retrieved 21-03, 2010, from 

http://www.ejitime.com/materials/IDC%20on%20The%20High%20Cost%20Of%20Not%20Finding%20Information.pdf.

Fidel, R. (1994). User-centred indexing. Journal of the American Society for 


Floropoulos, J., Spathis, C., Halvatzis, D. & Tsipouridou, M. (2010). Measuring the 

success of the Greek Taxation Information System. International Journal of 

Information Management, 30(1), 47-56. 

Ford, F.N. (1985). Decision support systems and expert systems: A comparison. 

Information & Management, 8(1), 21-26. 

Foster, A. & Ford, N. (2003). Serendipity and information seeking: An empirical study. 

Journal of Documentation, 59(3), 321-340. 

Fourie, I. (2009). Learning from research on the information behaviour of healthcare 

professionals: A review of the literature 2004–2008 with a focus on emotion. 

Health Information and Libraries Journal, 26(3), 171-186. 


Fox, C. (1989). A stop list for general text. Newsletter ACM SIGIR Forum, 24(1-2), 19-35.

Fox, C. (1992). Lexical analysis and stoplists. In: Frakes, W.B. & Baeza-Yates, R. 

(Eds.), Information Retrieval: Data Structures & Algorithms (pp. 102-130). 

Englewood Cliffs, New Jersey: Prentice Hall. 

Frakes, W.B. (1992). Stemming algorithms. In: Frakes, W.B. & Baeza-Yates, R. (Eds.), 

Information Retrieval: Data Structures & Algorithms (pp. 131-160). Englewood 

Cliffs, New Jersey: Prentice Hall. 

Frankfort-Nachmias, C. & Nachmias, D. (1996). Research Methods in the Social 

Sciences (5. ed.). London: Arnold. 

Freund, L., Toms, E.G. & Waterhouse, J. (2005). Modeling the information behaviour 

of software engineers using a work-task framework. Proceedings of the 

American Society for Information Science and Technology, 42(1). 

Fu, J.-R., Farn, C.-K. & Chao, W.-P. (2006). Acceptance of electronic tax filing: A 

study of taxpayer intentions. Information & Management, 43(1), 109-126. 

Fugmann, R. (1993). Subject Analysis and Indexing: Theoretical Foundation and 

Practical Advice. Frankfurt/Main: Indeks Verlag. 

Galvez, C., de Moya-Anegon, F. & Solana, V.H. (2005). Term conflation methods in 

information retrieval: Non-linguistic and linguistic approaches. Journal of 


Garcia, A.C., Dawes, M.E., Kohne, M.L., Miller, F.M. & Groschwitz, S.F. (2006). 

Workplace studies and technological change. Annual Review of Information 

Science and Technology, 40(1), 393-437. 

Gil-Garcia, J.R. & Martinez-Moyano, I.J. (2007). Understanding the evolution of e-government: 

The influence of systems of rules on public sector dynamics. 

Government Information Quarterly, 24, 266-290. 

Gilchrist, A. (2001). Corporate taxonomies: Report on a survey of current practice. 

Online Information Review, 25(2), 94-103. 

Gilchrist, A. (2003). Tesauri, taxonomies and ontologies: An etymological note. Journal 


Gilliland-Swetland, A. (2005). Electronic records management. Annual Review of 


Gilliland, A.J. (2008). Setting the stage. In: Baca, M. (Ed.), Introduction to Metadata 

(Online version, ver. 3.0 ed.). 

Glassey, O. (2002). A one-stop government prototype based on use cases and scenarios. 

In: Traunmüller, R. & Lenk, K. (Eds.), EGOV 2002 (pp. 116-123). 

Glassey, O. (2004). Developing a one-stop government data model. Government 

Information Quarterly, 21(2), 156-169. 

Glazer, R. (1993). Measuring the value of information: The information-intensive 

organization. IBM Systems Journal, 32(1), 99-110. 

Goh, D.H.-L., Chua, A.Y.-K., Luyt, B. & Lee, C.S. (2008). Knowledge access, creation 

and transfer in e-government portals. Online Information Review, 32(3), 348-369. 

Golder, S.A. & Huberman, B.A. (2006). Usage patterns of collaborative tagging 

systems. Journal of Information Science, 32(2), 198-208. 

Golub, K. (2006). Automated subject classification of textual web documents. Journal 


Golub, K. (2007). Automated Subject Classification of Textual Documents in the 

Context of Web-Based Hierarchical Browsing. Unpublished PhD thesis, Lund 

University, Lund. 

Gomez, L.M., Lochbaum, C.C. & Landauer, T.K. (1990). All the right words: Finding 

what you want as a function of richness of indexing vocabulary. Journal of the 

American Society for Information Science, 41(8), 547-559.


Gouscos, D., Lambrou, M., Mentzas, G. & Georgiadis, P. (2003). A methodological 

approach for defining one-stop e-government service offerings. In: Traunmüller, 

R. (Ed.), Electronic Government (pp. 173-176). Berlin: Springer. 

Grant, G. & Chau, D. (2005). Developing a generic framework for e-government. 

Journal of Global Information Management, 13(1), 1-30. 

Greenbaum, T.L. (1993). The Handbook for Focus Group Research (Revised and 

expanded ed.). New York: Lexington. 

Gross, T. & Taylor, A.G. (2005). What have we got to lose? The effect of controlled 

vocabulary on keyword searching results. College & Research Libraries, 66(3), 

212-230. 

Grundén, K. (2009). A social perspective on implementation of e-government: A 

longitudinal study at the County Administration of Sweden. Electronic Journal 

of e-Government, 7(1), 65-76. 

Grönlund, Å. (2003). Emerging electronic infrastructures: Exploring democratic 

components. Social Science Computer Review, 21(1), 55-72. 

Grönlund, Å. (2005). What's in a field: Exploring the eGovernment domain, 

Proceedings of the 38th Hawaii International Conference on System Sciences. 

Grönlund, Å. (2010). Ten years of e-government: The 'end of history' and new 

beginning. In: Wimmer, M.A. et al. (Eds.), Electronic Government (pp. 13-24): 

Springer. 

Grönlund, Å. & Horan, T.A. (2004). Introducing e-gov: history, definition, and issues. 

Communications of the Association for Information Systems, 15, 713-729. 

Gunnlaugsdottir, J. (2008). Registering and searching for records in electronic records 

management systems. International Journal of Information Management, 28(4), 

293-304. 

Ha, L. & Zenebe, A. (2008). Knowledge management in government, The 2nd 

International Conference on Knowledge Generation, 

Communication and Management. Orlando, Florida: International Institute of 

Informatics and Systemics. 

Halcomb, E.J. & Davidson, P.M. (2006). Is verbatim transcription of interview data 

always necessary? Applied Nursing Research, 19(1), 38–42. 

Halkier, B. (2008). Fokusgrupper (2. ed.). Frederiksberg: Samfundslitteratur. 

Hammarström, H. (2006). A naive theory of affixation and an algorithm for extraction. 

In: Wicentowski, R. & Kondrak, G. (Eds.), SIGPHON '06: Proceedings of the 

Eighth Meeting of the ACL Special Interest Group on Computational Phonology 

and Morphology (pp. 79-88). Stroudsburg: Association for Computational 

Linguistics. 

Harman, D.K. & Voorhees, E.M. (2006). TREC: An overview. Annual Review of 


Hawking, D. (2004). Challenges in enterprise search. In: Proceedings of the 15th 

Australasian database conference, (pp. 15-24). Dunedin, New Zealand. 

Hayes, P.J. & Weinstein, S.P. (1990). Construe-TIS: A system for content-based 

indexing of a database of news stories. In: Rappaport, A. & Smith, R. (Eds.), 

The Second Conference on Innovative Applications of Artificial Intelligence 

(IAAI), Washington, DC. Menlo Park, California: AAAI Press. 

Haynes, D. (2004). Metadata for Information Management and Retrieval. London: 

Facet. 

Hazlett, S.A., McAdam, R. & Beggs, V. (2008). An exploratory study of knowledge 

flows: A case study of Public Sector Procurement. Total Quality Management, 

19(1-2), 57-66. 

He, J., Shu, B., Li, X. & Yan, H. (2010). Effective Time Ratio: A Measure for Web Search Engines with Document Snippets. In: Cheng, P.-J., Kan, M.-Y., Lam, W. & Nakov, P. (Eds.), Information Retrieval Technology: 6th Asia Information Retrieval Societies Conference, AIRS 2010, Taipei, Taiwan, December 1-3, 2010. Proceedings (Vol. 6458, pp. 73-84). Berlin: Springer.

Healey, J.F. (2007). The Essentials of Statistics: A Tool for Social Research. Belmont, 

CA: Thomson Higher Education. 

Hedlund, T. (2002). Compounds in dictionary-based cross-language information 

retrieval. Information Research, 7(2). 

Heeks, R. & Bailur, S. (2006). Analyzing eGovernment Research: Perspectives, 

Philosophies, Theories, Methods and Practice (Vol. 16, iGovernment Working 

Paper Series). Manchester: University of Manchester, Institute for Development 

Policy and Management. 

Heeks, R. & Bailur, S. (2007). Analyzing e-government research: Perspectives, 

philosophies, theories, methods, and practice. Government Information 

Quarterly, 24(2), 243-265. 

Helbig, N., Dawes, S.S., Mulki, F.H., Hrdinova, J.L. & Cook, M.E. (2008). 

International Digital Government Research: A Reconnaissance Study: Center 

for Technology in Government, University at Albany, SUNY. 

Henriksen, H.Z. & Damsgaard, J. (2006). The rise and descent of visions for e-government. In: Donnellan, B., Larsen, T.J., Levine, L. & DeGross, J.I. (Eds.), The Transfer and Diffusion of Information Technology for Organizational Resilience: IFIP TC8 WG 8.6 International Working Conference, June 7-10, 2006, Galway, Ireland (pp. 275-289). New York: Springer.

Hertzum, M., Andersen, H.H.K., Andersen, V. & Hansen, C.B. (2002). Trust in information sources: Seeking information from people, documents, and virtual agents. Interacting with Computers, 14(5), 575-599.

Hertzum, M. & Pejtersen, A.M. (2000). The information-seeking practices of engineers: searching for documents as well as for people. Information Processing & Management, 36(5), 761-778.

Hjørland, B. (2002). Domain analysis in information science: Eleven approaches, traditional as well as innovative. Journal of Documentation, 58(4), 422-462.

Hjørland, B. & Albrechtsen, H. (1995). Toward a new horizon in information science: 

Domain analysis. Journal of the American Society for Information Science, 

46(6), 400-425. 

Höchstötter, N. & Koch, M. (2009). Standard parameters for searching behaviour in 

search engines and their empirical evaluation. Journal of Information Science, 

35(1), 45-65. 

Hodge, G.M. (1994). Computer-assisted database indexing: The state-of-the-art. The 

Indexer, 19(1), 23-27. 

Homburg, V. (2004). E-government and NPM: a perfect marriage? In: Janssen, M., Sol, 

H.G. & Wagenaar, R.W., (Eds.), ICEC '04 Proceedings of the 6th international 

conference on Electronic commerce, (pp. 547-555). New York: ACM. 

Hovy, E. (2008a). An outline for the foundations of digital government research. In: 

Chen, H., Brandt, L., Gregg, V., Traunmüller, R., Dawes, S., Hovy, E., 

Macintosh, A. & Larson, C.A. (Eds.), Digital Government: E-government 

Research, Case studies, and Implementation (pp. 43-59). New York: Springer. 

Hu, G., Pan, W. & Wang, J. (2010). The distinctive lexicon and consensual conception 

of e-Government: an exploratory perspective. International Review of 

Administrative Sciences, 76(3), 577-597. 

Hu, P.J.-H., Brown, S.A., Thong, J.Y.L., Chan, F.K.Y. & Tam, K.Y. (2008). 

Determinants of service quality and continuance intention of online services: The 

case of eTax. Journal of the American Society for Information Science and 

Technology, 60(2), 292-306.


Hu, P.J.-H., Hsu, F.-M., Hu, H.-f. & Chen, H. (2010). Agency satisfaction with 

electronic record management systems: A large-scale survey. Journal of the 

American Society for Information Science and Technology, 61(12), 2559-2574. 

Huang, J. & Efthimiadis, E.N. (2009). Analyzing and evaluating query reformulation 

strategies in web search logs. In: Proceedings of the 18th ACM conference on 

Information and knowledge management. Hong Kong, China: ACM. 

Humphrey, S.M. (1989). Research on interactive knowledge-based indexing: The 

MedIndEx prototype. Proceedings of the Annual Symposium on Computer 

Application in Medical Care, 527-533. 

Hunter, J. (2009). Collaborative semantic tagging and annotation systems. Annual 

Review of Information Science and Technology, 43, 187-239. 

Iivonen, M. (1995). Consistency in the selection of search concepts and search terms. 


Ingwersen, P. (1986a). Cognitive analysis and the role of the intermediary in 

information retrieval. In: Davies, R. (Ed.), Intelligent Information Systems: 

Progress and Prospects (pp. 206-237). Chichester: Horwood. 

Ingwersen, P. (1992). Information Retrieval Interaction. London: Taylor Graham. 

Ingwersen, P. (1994). Systemudvikling i et in-house miljø: Folketingets 

emneordssystem som case-studie. Biblioteksarbejde, 41, 5-23. 

Ingwersen, P. (1996). Cognitive perspectives of information retrieval interaction: 

Elements of a cognitive IR theory. Journal of Documentation, 52(1), 3-50. 

Ingwersen, P. (1999). Cognitive information retrieval. Annual Review of Information 


Ingwersen, P. (2000). Users in context. In: Agosti, M., Crestani, F. & Pasi, G. (Eds.), 

Lectures on Information Retrieval (pp. 157-178): Springer. 

Ingwersen, P. & Järvelin, K. (2005). The Turn: Integration of Information Seeking and 

Retrieval in Context. Dordrecht: Springer. 

Ingwersen, P. & Järvelin, K. (2007). On the holistic cognitive theory for information 

retrieval: Drifting outside the cave of the laboratory framework. In: Dominich, 

S. & Kiss, F. (Eds.), International Conference on the Theory of Information 

Retrieval (pp. 135-147). Budapest, Hungary: Foundation for Information 

Society. 

Ingwersen, P. & Wormell, I. (1989). Modern indexing and retrieval techniques 

matching different types of information needs. In: Koskiala, S. & Launo, R. 

(Eds.), Proceedings of the forty-fourth FID Congress held in Helsinki, Finland, 

28 August-1 September, 1988 (pp. 79-90). Amsterdam: Elsevier. 

ISO. (1985). Documentation: Methods for Examining Documents, Determining Their 

Subjects and Selecting Indexing Terms (ISO 5963-1985). Geneva: International 

Organization for Standardization. 

Israel, G.D. (1992). Determining Sample Size (Fact Sheet PEOD-6). Gainesville, FL: 

University of Florida. 

Jaeger, P.T. (2003). The endless wire: E-government as global phenomenon. 

Government Information Quarterly, 20, 323-331. 

Jaeger, P.T. & Thompson, K.M. (2004). Social information behavior and the democratic 

process: Information poverty, normative behavior, and electronic government in 

the United States. Library & Information Science Research, 26(1), 94-107. 

Jain, A.K., Murty, M.N. & Flynn, P.J. (1999). Data clustering: A review. ACM 

Computing Surveys, 31(3), 264-323. 

Jansen, B.J. (2006). Search log analysis: What it is, what's been done, how to do it. 

Library & Information Science Research, 28(3), 407-432. 

Jansen, B.J. & Pooch, U. (2001). A review of web searching studies and a framework 

for future research. Journal of the American Society for Information Science and 

Technology, 52(3), 235-246. 

Jansen, B.J., Spink, A. & Saracevic, T. (2000). Real life, real users, and real needs: A 

study and analysis of user queries on the web. Information Processing & 

Management, 36(2), 207-227. 

Johansen, H.C. (2007). Dansk skattehistorie: Indkomstskatter og offentlig vækst 1903-2005 

(Vol. 6): Told- og Skattehistorisk Selskab. 

Johnson, J.D., Donohue, W.A., Atkin, C.K. & Johnson, S. (1995). A comprehensive 

model of information seeking: Tests focusing on a technical organization. 

Science Communication, 16(3), 274-303. 

Johnston, J. (2004). Public administration: Organizational aspects. In: International 

Encyclopedia of the Social & Behavioral Sciences (pp. 12507-12512). 

Johnston, J. & Callender, G. (1997). Vulnerable governments: Inadvertent de-skilling in 

the new global economic and managerialist paradigm? International Review of 

Administrative Sciences, 63(1), 41-56. 

Jones, W.P. & Furnas, G.W. (1987). Pictures of relevance: A geometric analysis of 

similarity measures. Journal of the American Society for Information Science, 

38(6), 420-442. 

Järvelin, K. (2007). An analysis of two approaches in information retrieval: From 

frameworks to study designs. Journal of the American Society for Information 


Järvelin, K. & Kekäläinen, J. (2002). Cumulated gain-based evaluation of IR 

techniques. ACM Transactions on Information Systems, 20(4), 422-446. 

Kavadias, G. & Tambouris, E. (2003). GovML: A markup language for describing 

public services and life events. In: Wimmer, M.A. (Ed.), Knowledge 

Management in Electronic Government. Proceedings of the 4th IFIP 

International Working Conference, KMGov 2003, Rhodes, Greece, May 26–28, 

2003. Berlin: Springer. 

Kelly, D. (2009). Methods for evaluating interactive information retrieval systems with 

users. Foundations and Trends in Information Retrieval, 3(1-2), 1-224. 

Kent, A., Berry, M.M., Luehrs, F.U. & Perry, J.W. (1955). Machine literature searching 

VIII. Operational criteria for designing information retrieval systems. American 


Kettunen, K. & Henttonen, P. (2010). Missing in action? Content of records 

management metadata in real life. Library & Information Science Research, 

32(1), 43-52. 

Kipp, M.E.I. (2005). Complementary or discrete contexts in online indexing: A 

comparison of user, creator, and intermediary keywords. Canadian Journal of 

Information and Library Science, 29(4), 419-436. 

Klischewski, R. (2006). Ontologies for e-document management in public 

administration. Business Process Management Journal, 12(1), 34-47. 

Kopackova, H., Michalek, K. & Cejna, K. (2010). Accessibility and findability of local 

e-government websites in the Czech Republic. Universal Access In The 

Information Society, 9(1), 51-61. 

Korfhage, R.R. (1997). Information Storage and Retrieval. New York: Wiley. 

Koshman, S., Spink, A. & Jansen, B.J. (2006). Web searching on the Vivisimo search 

engine. Journal of the American Society for Information Science and 

Technology, 57(14), 1875-1887. 

Kotsiantis, S.B., Zaharakis, I.D. & Pintelas, P.E. (2006). Machine learning: A review of 

classification and combining techniques. Artificial Intelligence Review, 26(3), 

159-190. 

Kraemer, K.L. & Dedrick, J. (1997). Computing and Public Organizations. Journal of 

Public Administration Research and Theory, 7(1), 89-112. 

Kraemer, K.L. & King, J.L. (1986). Computing and public organizations. Public 

Administration Review, 46(6), 488-496.


Krippendorff, K. (2004). Content Analysis: An Introduction to Its Methodology (2. ed.). 

Thousand Oaks: Sage. 

Krueger, R.A. (1998). Developing Questions for Focus Groups (Vol. 3). Thousand 

Oaks: Sage. 

Kuhlthau, C.C. & Tama, S.L. (2001). Information search process of lawyers: A call for 

'just for me' information services. Journal of Documentation, 57(1), 25-43. 

Kules, B. & Shneiderman, B. (2004). Categorized graphical overviews for web search 

results: An exploratory study using U. S. government agencies as a meaningful 

and stable structure. In: Proceedings of the Third Annual Workshop on HCI 

Research in MIS. Washington, D.C. 

Kules, B. & Shneiderman, B. (2005). Using meaningful and stable categories to support 

exploratory web search: Two formative studies (HCIL Technical Report 2005-31). 

Maryland: Human-Computer Interaction Laboratory, University of 

Maryland. 

Kvale, S. & Brinkmann, S. (2009). Interviews: Learning the Craft of Qualitative 

Research Interviewing (2. ed.). Los Angeles: Sage. 

Käki, M. (2005a). Enhancing Web Search Result Access with Automatic Categorization. 

Unpublished Doctoral Dissertation, Department of Computer Sciences, 

University of Tampere, Tampere, Finland, from http://acta.uta.fi/pdf/951-44-6490-7.pdf. 

Käki, M. (2005b). Findex: Search result categories help users when document ranking 

fails. In: Proceedings of the SIGCHI conference on Human factors in 

computing systems (pp. 131-140). Portland, Oregon: ACM. 

Käki, M. & Aula, A. (2005). Findex: Improving search result use through automatic 

filtering categories. Interacting with Computers, 17(2), 187-206. 

Lancaster, F.W. (2003). Indexing and Abstracting in Theory and Practice (3. ed.). 

London: Facet. 

Landsforeningen af Kommunale Servicecentre, A.o.I. (2005). LKS: Projekt 

Borgerbetjening 2007: Rapport fra arbejdsgruppen om IT. 

Large, A., Tedd, L.A. & Hartley, R.J. (2001). Information Seeking in the Online Age: 

Principles and Practice. München: K. G. Saur. 

Lau, E.P. & Goh, D.H.-L. (2006). In search of query patterns: A case study of a 

university OPAC. Information Processing & Management, 42, 1316-1329. 

Layne, K. & Lee, J. (2001). Developing fully functional E-government: A four stage 

model. Government Information Quarterly, 18, 122-136. 

Leckie, G.J., Pettigrew, K.E. & Sylvain, C. (1996). Modeling the information seeking of 

professionals: A general model derived from research on engineers, health care 

professionals, and lawyers. Library Quarterly, 66(2), 161-193. 

Levine, M.M. (1974). Information Needs in Milwaukee: Agencies and Groups 

(ED 089 769). Milwaukee: Milwaukee Urban Observatory. 

Levy, P.S. & Lemeshow, S. (2008). Sampling of Populations: Methods and 

Applications (4. ed.). Hoboken, New Jersey: Wiley. 

Lips, M. (1998). Reorganizing public service delivery in an information age. In: 

Snellen, I.T.M. & van de Donk, W.B.H.J. (Eds.), Public Administration in an 

Information Age (pp. 325-339). Amsterdam: IOS. 

Liu, Y., Zhu, L. & Gorton, I. (2007). Performance Assessment for e-Government 

Services: An Experience Report. In: Schmidt, H.W., Crnkovic, I., Heineman, 

G.T. & Stafford, J.A. (Eds.), Component-Based Software Engineering. 10th 

International Symposium, CBSE 2007, Medford, MA, USA, July 9-11, 2007 (pp. 

74-89). Berlin: Springer. 

Lovins, J.B. (1968). Development of a stemming algorithm. Mechanical Translation 

and Computational Linguistics, 11(1-2), 22-31. 

Lu, L. & Yuan, Y.C. (2011). Shall I google it or ask the competent villain down the 

hall? The moderating role of information need in information source selection. 

Journal of the American Society for Information Science and Technology, 62(1), 

133-145. 

Luhn, H.P. (1957). A statistical approach to mechanized encoding and searching of 

literary information. IBM Journal of Research and Development, 1(4), 309-317. 

Luhn, H.P. (1958a). The automatic creation of literature abstracts. IBM Journal of 

Research and Development, 2(2), 159-165. 

Luhn, H.P. (1961). The automatic derivation of information retrieval encodements from 

machine-readable texts. In: Kent, A. (Ed.), Information Retrieval and Machine 

Translation (Vol. 3, pt. 2, pp. 1021-1028). New York: Interscience. 

Lykke, M., Price, S. & Delcambre, L. (2012). How doctors search: A study of query 

behaviour and the impact on search results. Information Processing & 

Management (in press). 

MacMullin, S.E. & Taylor, R.S. (1984). Problem dimensions and information traits. The 

Information Society, 3(1), 91-111. 

Mahler, J.G. & Regan, P.M. (2005). Agency internets and the changing dynamics of 

congressional oversight. In: Garson, G.D. (Ed.), Handbook of Public 

Information Systems (2. ed., pp. 559-568). Boca Raton: Taylor & Francis. 

Mai, J.-E. (2004b). The future of general classification. Cataloging & Classification 

Quarterly, 37(1 & 2), 3-12. 

Mai, J.E. (2000). Deconstructing the indexing process. Advances in Librarianship, 23, 

269-298. 

Mai, J.E. (2005). Analysis in indexing: Document and domain centered approaches. 

Information Processing & Management, 41, 599-611. 

Makri, S., Blandford, A. & Cox, A.L. (2008a). Investigating the information-seeking 

behaviour of academic lawyers: From Ellis' model to design. Information 

Processing & Management, 44(2), 613-634. 

Mandersloot, W.G.B., Douglas, E.M.B. & Spicer, N. (1970). Thesaurus control: The 

selection, grouping, and cross-referencing of terms for inclusion in a coordinate 

index word list. Journal of the American Society for Information Science, 21(1), 

49-57. 

Marcella, R., Baxter, G., Davies, S. & Toornstra, D. (2007). The information needs and 

information-seeking behaviour of the users of the European Parliamentary 

Documentation Centre: A customer knowledge study. Journal of 


Marchionini, G., Samet, H. & Brandt, L. (2003). Digital government. Communications 

of the ACM, 46(1), 25-27. 

Marini, F. (2000). Public administration. In: Shafritz, J.M. (Ed.), Defining Public 

Administration: Selections from the International Encyclopedia of Public Policy 

and Administration (pp. 3-16). Jaipur: Rawat. 

Markey, K. (2007a). Twenty-five years of end-user searching, part 1: Research findings. 

Journal of the American Society for Information Science and Technology, 58(8), 

1071-1081. 

Martin, B. (2008). Knowledge management. Annual Review of Information Science and 


Martinez, C., Lucey, J. & Linder, E. (1987). An expert system for machine-aided 

indexing. Journal of Chemical Information and Computer Sciences, 27(4), 158-162. 

Meijer, A.J. & Homburg, V.M.F. (2008). Introduction: Zooming in and zooming out on 

electronic government. International Journal of Public Administration, 31(7), 

707-710.


Miles, M.B. & Huberman, A.M. (1994). Qualitative Data Analysis: An Expanded 

Sourcebook (2. ed.). Thousand Oaks: Sage. 

Millard, J. (2003). ePublic services in Europe: Past, present and future. Research 

findings and new challenges. Aarhus: Danish Technological Institute. 

Milstead, J.L. (1992). Methodologies for subject analysis in bibliographic databases. 


Milstead, J.L. (1994). Needs for research in indexing. Journal of the American Society 

for Information Science, 45(8), 577-582. 

Ministry of Finance. (2001). IT, the Internet, and the Public Sector. Copenhagen: 

Ministry of Finance. 

Moen, W.E. (2001). The metadata approach to accessing government information. 


Moen, W.E. & McClure, C.R. (1997). An Evaluation of the Federal Government's 

Implementation of the Government Information Locator Service (GILS): Final 

Report. Washington, DC.: General Services Administration Office of 

Information Technology Integration. 

Moens, M.-F. (2000). Automatic Indexing and Abstracting of Document Texts. Boston: 

Kluwer. 

Morgan, D.L. (1996). Focus groups. Annual Review of Sociology, 22, 129-152. 

Mukherjee, R. & Mao, J. (2004). Enterprise search: Tough stuff. Queue, 2(2), 36-46. 

Nakash, R.A., Hutton, J.L., Lamb, S.E., Gates, S. & Fisher, J. (2008). Response and 

non-response to postal questionnaire follow-up in a clinical trial: A qualitative 

study of the patient's perspective. Journal of Evaluation in Clinical Practice, 14, 

226-235. 

National Archives of Australia (2010). Development history. Retrieved 01-04, 2011, 

from http://www.agls.gov.au/about/. 

National IT and Telecom Agency. (2009). Overordnede Principper og Best Practice: 

Version 1.0. Copenhagen: National IT and Telecom Agency. 

Neuendorf, K.A. (2002). The Content Analysis Guidebook. Thousand Oaks: Sage. 

Nicholas, D. & Colgrave, K. (1996). Councillors and information: A study of 

information needs and information provision. Aslib Proceedings, 48(2), 37-46. 

Nielsen, J.A., Kræmmergaard, P., Nielsen, P.A. & Bjørnholt, B. (2009). Det kommunale 

digitaliseringslandskab 2009: Status og udfordringer. Aalborg: Aalborg 

University. 

Nielsen, M.L. (2001). A framework for work task based thesaurus design. Journal of 


Nielsen, M.L. (2004). Task-based evaluation of associative thesaurus in real-life 

environment. Proceedings of the 67th ASIS&T Annual Meeting, 41, 437-447. 

Nikoi, S.K. (2008). Information needs of NGOs: A case study of NGO development 

workers in the northern region of Ghana. Information Development, 24(1), 44-52. 

NISO (2004). Understanding Metadata. Retrieved 23-03, 2011, from 

http://www.niso.org/publications/press/UnderstandingMetadata.pdf. 

OECD. (2010). Denmark: Efficient E-government For Smarter Public Service Delivery: 

Preliminary Copy. Paris, France: OECD. 

Oh, C.H. (1996). Information searching in governmental bureaucracies: An integrated 

model. The American Review of Public Administration, 26(1), 41-70. 

Olsen, H. (1997). Tal taler ikke uden ord. Politica, 29(3), 295-310. 

Orton, R., Marcella, R. & Baxter, G. (2000). An observational study of the information 

seeking behaviour of Members of Parliament in the United Kingdom. Aslib 


Palkovits, S., Woitsch, R. & Karagiannis, D. (2003). Process-based knowledge 

management and modelling in e-government: An inevitable combination. In: 

Wimmer, M.A. (Ed.), KMGov 2003 (pp. 213-218): Springer. 

Pedersen, B.S., Navarretta, C. & Hansen, D.H. (2005). Ontologibaseret teksthåndtering: 

Med sprogteknologi (VID-rapport no. 6). Copenhagen: Center for 

Sprogteknologi. 

Pedersen, B.S., Navarretta, C. & Henriksen, L. (2004). Building business ontologies 

with language technology techniques: The VID project. In: OntoLex 2004 

Proceedings (pp. 30-35). Paris: European Language Resources Association. 

Peel, M. & Rowley, J. (2010). Information sharing practice in multi-agency working. 

Aslib Proceedings, 62(1), 11-28. 

Peres, M., Guzmán, F. & Valbuena, T. (2009). Online government strategy 

development model for interactional and transactional phases in the territorial 

order. In: The 3rd International Conference on Theory and Practice of Electronic 

Governance. Bogotá, Colombia: ACM. 

Peristeras, V., Tarabanis, K. & Goudos, S.K. (2009). Model-driven eGovernment 

interoperability: A review of the state of the art. Computer Standards & 

Interfaces, 31(4), 316-328. 

Personalestyrelsen (2010). Forhandlingsdatabasen. Retrieved 26-01, 2010, from 

http://perst.dk/Arbejdspladsen/Ledelsesinformation%20og%20statistik/Ledelsesinformation%20og%20lonstyring/Forhandlingsdatabasen.aspx. 

Philipson, K.B. (2008). Indekseringsprocessen: Konsistensmål til sammenligning af 

tilgange til emnebestemmelse og emnebeskrivelse. Dansk Biblioteksforskning, 

4(3), 57-71. 

Poland, B.D. (2003). Transcription quality. In: Holstein, J.A. & Gubrium, J.F. (Eds.), 

Inside Interviewing: New Lenses, New Concerns (pp. 267-287). Thousand Oaks: 

Sage. 

Porter, M.F. (1980). An algorithm for suffix stripping. Program: Electronic Library and 

Information Systems, 14(3), 130-137. 

Porter, M.F. (2001). Snowball: A language for stemming algorithms. Retrieved 19-08, 

2011, from http://snowball.tartarus.org/texts/introduction.html. 

Price, S.L., Nielsen, M.L., Delcambre, L.M.L. & Vedsted, P. (2007). Semantic 

components enhance retrieval of domain-specific documents. In: CIKM '07: 

Proceedings of the sixteenth ACM conference on Conference on information and 

knowledge management (pp. 429-438). New York: ACM. 

Price, S.L., Nielsen, M.L., Delcambre, L.M.L., Vedsted, P. & Steinhauer, J. (2009). 

Using semantic components to search for domain-specific documents: An 

evaluation from the system perspective and the user perspective. Information 

Systems, 34, 724-752. 

Project Digital Government & The Digital Taskforce (2002). Towards E-Government: 

Vision and Strategy for the Public Sector in Denmark. Retrieved 13-07, 2010, 

from http://www.epractice.eu/files/media/media_362.pdf. 

Quam, E. (2001). Informing and evaluating a metadata initiative: Usability and 

metadata studies in Minnesota's Foundations Project. Government Information 

Quarterly, 18(3), 181-194. 

Quirchmayr, G. & Traunmüller, R. (1991). Expert systems in law and public 

administration: Recent developments and future prospects. In: Traunmüller, R. 

(Ed.), Governmental and Municipal Information Systems, II: Proceedings of the 

2nd IFIP TC8/WG8.5 Working Conference on Governmental and Municipal 

Information Systems, Balatonfüred, Hungary, 3-6 June (pp. 145-163). 

Amsterdam: Elsevier. 

Rafferty, P. & Hidderley, R. (2007). Flickr and Democratic Indexing: Dialogic 

approaches to indexing. Aslib Proceedings, 59(4/5), 397-410.


Rains, S.A. (2008). Health at high speed: Broadband internet access, health 

communication, and the digital divide. Communication Research, 35(3), 283-297. 

Rasmussen, E. (1992). Clustering algorithms. In: Frakes, W.B. & Baeza-Yates, R. 

(Eds.), Information Retrieval: Data Structures & Algorithms (pp. 419-442). 

Englewood Cliffs, New Jersey: Prentice Hall. 

Rasmussen, E.M. (2003). Indexing and retrieval for the Web. Annual Review of 


Reddick, C.G. (2005). Citizen interaction with e-government: From the streets to 

servers? Government Information Quarterly, 22(1), 38-57. 

Ren, W.-H. (1999). Self-efficacy and the search for government information. Reference 

& User Services Quarterly, 38(3), 283-291. 

Robbin, A., Courtright, C. & Davis, L. (2004). ICTs and political life. Annual Review of 

Information Science and Technology, 38(1), 411-482. 

Robertson, S.E. & Hancock-Beaulieu, M.M. (1992). On the evaluation of IR systems. 


Roitblat, H.L., Kershaw, A. & Oot, P. (2010). Document categorization in legal 

electronic discovery: Computer classification vs. manual review. Journal of the 

American Society for Information Science and Technology, 61(1), 70-80. 

Rolling, L. (1981). Indexing consistency, quality and efficiency. Information 

Processing & Management, 17(2), 69-76. 

Rouse, W.B. & Rouse, S.H. (1984). Human information seeking and design of 

information systems. Information Processing & Management, 20(1-2), 129-138. 

Rowley, J. (1988). Abstracting and Indexing (2. ed.). London: Clive Bingley. 

Rowley, J. (1994). The controlled versus natural indexing language debate revisited: A 

perspective on information retrieval practice and research. Journal of 


Rowley, J. (2011). e-Government stakeholders: Who are they and what do they want? 

International Journal of Information Management, 31(1), 53-62. 

Rowley, J. & Hartley, R. (2008). Organizing Knowledge: An Introduction to Managing 

Access to Information (4. ed.). Hampshire: Ashgate. 

Rubin, H.J. & Rubin, I.S. (2005). Qualitative Interviewing: The Art of Hearing Data (2. 

ed.). Thousand Oaks: Sage. 

Sabucedo, L.Á. & Rifón, L.A. (2006). Semantic Service Oriented Architectures for 

eGovernment Platforms. Retrieved 08-01, 2010. 

Salton, G. (1970). Automatic text analysis. Science, 168(3929), 335-343. 

Salton, G. (1986a). Another look at automatic text-retrieval systems. Communications 

of the ACM, 29(7), 648-656. 

Salton, G. (1988). Automatic indexing and abstracting. In: Willett, P. (Ed.), Document 

Retrieval Systems (pp. 42-80). London: Taylor Graham. 

Salton, G. (1989). Automatic Text Processing: The Transformation, Analysis, and 

Retrieval of Information by Computer. Reading, Massachusetts: Addison-Wesley. 

Salton, G. (1991). Developments in automatic text retrieval. Science, 253(5023), 974-980. 

Salton, G. & Buckley, C. (1988). Term-weighting approaches in automatic text 

retrieval. Information Processing & Management, 24(5), 513-523. 

Salton, G. & McGill, M.J. (1983). Introduction to Modern Information Retrieval. New 

York: McGraw-Hill. 

Salton, G., Wong, A. & Yang, C.S. (1975). A vector space model for automatic 

indexing. Communications of the ACM, 18(11), 613-620. 

Salton, G., Yang, C.S. & Yu, C.T. (1975). A theory of term importance in automatic 

text analysis. Journal of the American Society for Information Science, 26(1), 

33-44. 

Saracevic, T. (1996). Relevance reconsidered '96. In: Ingwersen, P. & Pors, N.O. (Eds.), 

Colis 2 - Second International Conference On Conceptions Of Library And 

Information Science: Integration In Perspective, Proceedings (pp. 201-218). 

Copenhagen S: Royal School of Librarianship. 

Saracevic, T., Kantor, P., Chamis, A. & Trivison, D. (1987). Experiments on the 

Cognitive Aspects of Information Seeking and Information Retrieving. Final 

Report and Appendices. Washington, D.C.: National Science Foundation, Div. 

of Information Science and Technology. 

Savolainen, R. (1995). Everyday life information seeking: Approaching information 

seeking in the context of "way of life". Library & Information Science 

Research, 17(3), 259-294. 

Savolainen, R. (2006). Time as a context of information seeking. Library & Information 

Science Research, 28(1), 110-127. 

Savoy, J. (2005). Bibliographic database access using free-text and controlled 

vocabulary: An evaluation. Information Processing & Management, 41(4), 873-890. 

Saxena, K.B.C. & Aly, A.M.M. (1995). Information technology support for 

reengineering public administration: A conceptual framework. International 

Journal of Information Management, 15(4), 271-293. 

Schamber, L., Eisenberg, M.B. & Nilan, M.S. (1990). A re-examination of relevance: 

Toward a dynamic, situational definition. Information Processing & 

Management, 26(6), 755-776. 

Schellong, A. (2007). Crossing the boundary: Why putting the e in government is the 

easy part. In: PNG Working Paper Series, PNG07-002. Retrieved 18-01, 2010, 

from http://www.hks.harvard.edu/netgov/files/png_workingpaper_series/PNG07-002_WorkingPaper_crossing_the_boundary_schellong.pdf. 

Schultz, C.K. (1970). Cost-effectiveness as a guide in developing indexing rules. 

Information Storage and Retrieval, 6(4), 335-340. 

Schwartz, D.G., Divitini, M. & Brasethvik, T. (2000). Internet-Based Organizational 

Memory and Knowledge Management. Hershey, USA: Idea Group. 

Sebastiani, F. (1999). A tutorial on automated text categorisation. In: Amandi, A. & 

Zunino, A. (Eds.), Proceedings of the 1st Argentinian Symposium on Artificial 

Intelligence (ASAI'99) (pp. 7-35). Buenos Aires, AR. 

Sebastiani, F. (2002). Machine learning in automated text categorization. ACM 

Computing Surveys, 34(1), 1-47. 

Serola, S. (2006). City planners' information seeking behavior: Information channels 

used and information types needed in varying types of perceived work tasks. In: 

Ruthven, I. (Ed.), IIiX: Proceedings of the 1st International Conference on 

Information Interaction in Context (pp. 42-45). New York: ACM. 

Shropshire, K.O., Hawdon, J.E. & Witte, J.C. (2009). Web survey design: Balancing 

measurement, response, and topical interest. Sociological Methods & Research, 

37(3), 344-370. 

Siegel, S. & Castellan, N.J. (1988). Nonparametric Statistics for the Behavioral 

Sciences (2. ed.). New York: McGraw Hill. 

Silvester, J.P., Genuardi, M.T. & Klingbiel, P.H. (1994). Machine-aided indexing at 

NASA. Information Processing & Management, 30, 631-645. 

Silvester, J.P. & Klingbiel, P.H. (1993). An operational system for subject switching 

between controlled vocabularies. Information Processing & Management, 29(1), 

47-59.


Singhal, A., Salton, G., Mitra, M. & Buckley, C. (1996). Document length 

normalization. Information Processing & Management, 32(5), 619-633. 

SKAT (2009). Årsrapport 2008. Retrieved 04-02, 2010, from 

http://www.skat.dk/SKAT.aspx?oId=1809360&vId=0. 

SKAT (2010). About us. Retrieved 26-01, 2010, from 

http://www.skat.dk/SKAT.aspx?oId=1826783&vId=0. 

Skov, M. (2009). The Reinvented Museum: Exploring Information Seeking Behaviour in 

a Digital Museum Context. Unpublished Doctoral dissertation, Research 

Programme Information Interaction and Information Architecture, Royal School 

of Library and Information Science, Copenhagen. 

Snellen, I.T.M. (2002). Electronic governance: Implications for citizens, politicians and 

public servants. International Review of Administrative Sciences, 68(2), 183-198. 

Soergel, D. (1985). Organizing Information: Principles of Data Base and Retrieval 

Systems. San Diego, CA: Academic Press. 

Soergel, D. (1994). Indexing and retrieval performance: The logical evidence. Journal 

of the American Society for Information Science, 45(8), 589-599. 

Soergel, D. (1999). The rise of ontologies or the reinvention of classification. Journal of 

the American Society for Information Science, 50(12), 1119-1120. 

Solomon, P. (1997a). Discovering information behavior in sense making. 1. Time and 

timing. Journal of the American Society for Information Science, 48(12), 1097-1108. 

Solomon, P. (1997b). Discovering information behavior in sense making. 2. The social. 

Journal of the American Society for Information Science, 48(12), 1109-1126. 

Solomon, P. (1997c). Discovering information behavior in sense making. 3. The person. 


Sormunen, E. (2002). Liberal relevance criteria of TREC - counting on negligible 

documents? In: SIGIR '02: Proceedings of the 25th annual international ACM 

SIGIR conference on Research and development in information retrieval (pp. 

324-330). August 11-15, 2002, Tampere, Finland: ACM. 

Southon, F.C.G., Todd, R.J. & Seneque, M. (2002). Knowledge management in three 

organizations: An exploratory study. Journal of the American Society for 


Sparck Jones, K. (1973). Index term weighting. Information Storage and Retrieval, 

9(11), 619-633. 

Sparck Jones, K. (1981). The Cranfield tests. In: Sparck Jones, K. (Ed.), Information 

Retrieval Experiment (pp. 256-284). London: Butterworths. 

Sprehe, J.T., McClure, C.R. & Zellner, P. (2002). The role of situational factors in 

managing U.S. federal recordkeeping. Government Information Quarterly, 

19(3), 289-305. 

Steinmark, C. (2005). EDM in the Danish public sector: The FESD project. Aslib 


Stenmark, D. (2005). How Intranets differ from the Web: Organisational culture's effect 

on technology. In: Bartmann, D., Rajola, F., Kallinikos, J., Avison, D.E., 

Winter, R., Ein-Dor, P., Becker, J., Bodendorf, F. & Weinhardt, C. (Eds.), 

European Conference on Information Systems ECIS 05. 

Stewart, D.W., Shamdasani, P.N. & Rook, D.W. (2007). Focus Groups: Theory and 

Practice (2. ed.). Thousand Oaks: Sage. 

Strader, C.R. (2009). Author-assigned keywords versus Library of Congress Subject 

Headings implications for the cataloging of electronic theses and dissertations. 

Library Resources & Technical Services, 53(4), 243-250. 

Strzalkowski, T., Lin, F., Wang, J. & Perez-Carballo, J. (1999). Evaluating natural 

language processing techniques in information retrieval. In: Strzalkowski, T. 

(Ed.), Natural Language Information Retrieval (pp. 113-145). Dordrecht: 

Kluwer. 

Suomela, S. & Kekäläinen, J. (2005). Ontology as a search-tool: A study of real users' 

query formulation with and without conceptual support. In: Losada, D.E. & 

Fernandez-Luna (Eds.), ECIR proceedings 2005 (pp. 315-329): Springer. 

Suomela, S. & Kekäläinen, J. (2006). User evaluation of ontology as query construction 

tool. Information Retrieval, 9, 455-475. 

Svenonius, E. (1986). Unanswered questions in the design of controlled vocabularies. 


Talja, S., Tuominen, K. & Savolainen, R. (2005). "Isms" in information science: 

Constructivism, collectivism and constructionism. Journal of Documentation, 

61(1), 79-101. 

Tambouris, E., Manouselis, N. & Costopoulou, C. (2007). Metadata for digital 

collections of e-government resources. The Electronic Library, 25(2), 176-192. 

Taylor, R.S. (1968). Question-negotiation and information seeking in libraries. College 

& Research Libraries, 29(3), 178-194. 

Taylor, R.S. (1991). Information use environments. In: Dervin, B. & Voigt, M.J. (Eds.), 

Progress in Communication Sciences (Vol. 10, pp. 217-255). Norwood, NJ: 

Ablex. 

Tenopir, C. (1985). Full text database retrieval performance. Online Information 

Review, 9(2), 149-164. 

The Danish Government, Local Government Denmark & Danish Regions. (2010). 

Mandate: New Common Public Strategy for Digitalization 2011-2015: Local 

Government Denmark. 

The Danish Government, Local Government Denmark, Danish Regions, Copenhagen 

Municipality & Frederiksberg Municipality (2004). The Danish eGovernment 

Strategy 2004-2006: Realising the Potential. Retrieved 13-07, 2010, from 

http://www.epractice.eu/files/media/media_275.pdf. 

The Danish Government, Local Government Denmark (LGDK) & Danish Regions 

(2007). The Danish E-Government Strategy 2007-2010: Towards Better Digital 

Service, Increased Efficiency and Stronger Collaboration. Retrieved from http://www.modernisering.dk/fileadmin/user_upload/documents/Projekter/digitaliseringsstrategi/Danish_E-government_strategy_2007-2010.pdf.

Thomas, M., Caudle, D.M. & Schmitz, C.M. (2009). To tag or not to tag? Library Hi 

Tech, 27(3), 411-434. 

Trant, J. (2009). Studying social tagging and folksonomy: A review and framework. 

Journal of Digital Information, 10(1). 

Turpin, A., Scholer, F., Järvelin, K., Wu, M. & Culpepper, J.S. (2009). Including 

summaries in system evaluation. In: Allan, J. (Ed.), Proceedings of the 32nd 

international ACM SIGIR conference on Research and development in 

information retrieval. New York: ACM. 

United Nations. (2012). E-Government Survey 2012: E-Government for the People. 

New York: United Nations. 

Vakkari, P. (1999). Task complexity, problem structure and information actions: 

Integrating studies on information seeking and retrieval. Information Processing 

& Management, 35, 819-837. 

Vakkari, P. (2003). Task-based information searching. Annual Review of Information Science and Technology.


van de Donk, W.B.H.J. & Snellen, I.T.M. (1989). Knowledge-based systems in public 

administration: Evolving practices and norms. In: Snellen, I.T.M., van de Donk, 

W.B.H.J. & Baquiast, J.-P. (Eds.), Expert Systems in Public Administration: 

Evolving Practices and Norms (pp. 3-22). Amsterdam: Elsevier.


van Deursen, A. & van Dijk, J. (2010). Civil servants' internet skills: Are they ready for

e-government? In: Wimmer, M.A., Chappelet, J.-L., Janssen, M. & Scholl, H.J. 

(Eds.), Electronic Government. 9th IFIP WG 8.5 International Conference, 

EGOV 2010, Lausanne, Switzerland, August 29 - September 2, 2010. 

Proceedings (pp. 132-143). Berlin: Springer. 

Veal, D.C. (2001). Techniques of document management: A review of text retrieval and 

related technologies. Journal of Documentation, 57(2), 192-217. 

Veenema, F. (1996). To index or not to index. Canadian Journal of Information and 

Library Science, 21(2), 1-22. 

Vellucci, S.L. (1998). Metadata. Annual Review of Information Science and Technology.


Voorhees, E. & Pazienza, M. (1999). Natural language processing and information 

retrieval. In: Lecture Notes in Computer Science: Information Extraction (Vol. 

1714, pp. 32-48). Berlin: Springer. 

Voorhees, E.M. (2000). Variations in relevance judgments and the measurement of 

retrieval effectiveness. Information Processing & Management, 36, 697-716.

Wacholder, N., Kelly, D., Kantor, P., Rittman, R., Sun, Y., Bai, B., Small, S., Yamrom, 

B. & Strzalkowski, T. (2007). A model for quantitative evaluation of an end-to-end

question-answering system. Journal of the American Society for Information Science and Technology.


Walden, G.R. (2006). Focus group interviewing in the library literature: A selective 

annotated bibliography 1996-2005. Reference Services Review, 34(2), 222-241. 

Wang, P. (1999). Methodologies and methods for user behavioral research. Annual 

Review of Information Science and Technology, 34, 53-99. 

Wang, Y.-S. & Shih, Y.-W. (2009). Why do people use information kiosks? A 

validation of the unified theory of acceptance and use of technology. 


Weibel, S. (1997). The Dublin Core: A simple content description model for electronic 

resources. Bulletin of the American Society for Information Science, 24(1), 9-11. 

White, M. (2005). The Content Management Handbook. London: Facet. 

Wilbur, W.J. & Sirotkin, K. (1992). The automatic identification of stop words. Journal 

of Information Science, 18(1), 45-55. 

Willett, P. (2006). The Porter stemming algorithm: Then and now. Program: Electronic 

Library and Information Systems, 40(3), 219-223. 

Wilson, T.D. (1980). Information system design implications of research into the information behaviour of social workers and social administrators. In: Harbo, O. & Kajberg, L. (Eds.), Theory and application of information research: Proceedings of the Second International Research Forum on Information Science, 3-6 August 1977, Royal School of Librarianship, Copenhagen (pp. 198-213). London: Mansell.

Wilson, T.D. (1981). On user studies and information needs. Journal of Documentation, 

37(1), 3-15. 

Wilson, T.D. (1999). Models in information behaviour research. Journal of 


Wilson, T.D. & Streatfield, D.R. (1977). Information needs in local authority social 

services departments: an interim report on project INISS. Journal of 


Wimmer, M.A. (2007). eGovernment as a multidisciplinary research field. In: 

Codagnone, C. & Wimmer, M.A. (Eds.), Roadmapping eGovernment Research: 

Visions and Measures towards Innovative Governments in 2020 (pp. 12-14). 

[Koblenz]: eGovRTD2020 Project Consortium.


Woudstra, L. & van den Hooff, B. (2008). Inside the source selection process: Selection 

criteria for human information sources. Information Processing & Management, 

44(3), 1267-1278. 

Xu, Y.C., Tan, C.Y.B. & Yang, L. (2006). Who will you ask? An empirical study of 

interpersonal task information seeking. Journal of the American Society for Information Science and Technology.


Yang, D., Tong, L., Ye, Y. & Wu, H. (2006). Supporting effective operation of e-governmental

services through workflow and knowledge management. In: 

Aberer, K., Peng, Z., Rundensteiner, E.A., Zhang, Y. & Li, X. (Eds.), Web Information Systems: WISE 2006, 7th International Conference on Web Information Systems Engineering, Wuhan, China, October 23-26, 2006 (pp. 102-113).

Berlin: Springer. 

Yildiz, M. (2007). E-government research: Reviewing the literature, limitations, and 

ways forward. Government Information Quarterly, 24, 646-665. 

Zamir, O. & Etzioni, O. (1999). Grouper: A dynamic clustering interface to Web search 

results. Computer Networks, 31(11-16), 1361-1374.

Zeng, M.L. (2008). Knowledge Organization Systems (KOS). Knowledge Organization, 

35(2/3), 160-182. 

Zikmund, W.G. (2000). Business Research Methods (6. ed.). Fort Worth: Harcourt.

Zipf, G.K. (1949). Human Behavior and the Principle of Least Effort. Cambridge: 

Addison-Wesley. 

Zunde, P. & Dexter, M.E. (1969). Indexing consistency and quality. American Documentation.


Østergaard, M. & Olesen, J.D. (2004). Digital forkalkning: En debatbog om digital 

forvaltning i Danmark. Frederikshavn: Dafolo. 

Åström, F. (2007). Changes in the LIS research front: Time-sliced cocitation analyses of 

LIS journal articles, 1990-2004. Journal of the American Society for Information 

Science and Technology, 58(7), 947-957.

List of abbreviations 

ARIST Annual Review of Information Science and Technology 

ICT Information and Communication Technology 

IDF Inverse document frequency 

IIR Interactive Information Retrieval 

IR Information Retrieval 

KOS Knowledge Organization Systems

LCSH Library of Congress Subject Headings 

LIS Library and Information Science 

MAI Machine aided/assisted indexing 

RSLIS Royal School of Library and Information Science 

TF Term frequency 


Appendices 


List of abbreviations ................................................................................................................................. 245 

Appendices ............................................................................................................................................... 247 

Appendix 1: Generic work tasks at SKAT ............................................................................................... 249 

Appendix 2: Distribution of employees across main processes in the business model ............................ 253 

Appendix 3: E-mail invitation to employees ............................................................................................ 255 

Appendix 4: Questions contained in questionnaire .................................................................................. 257 

Appendix 5: Questionnaire pilot test data ................................................................................................ 259 

Appendix 6: Link to questionnaire ........................................................................................................... 261 

Appendix 7: Dates for the conduct of focus group interviews ................................................................. 263 

Appendix 8: Example of the slides guiding a focus group interview ....................................................... 265 

Appendix 9: Focus group interview guide................................................................................................ 275 

Appendix 10: Transcription conventions.................................................................................................. 277 

Appendix 11: Verbatim Danish versions of quotes used in the thesis ...................................................... 279 

Appendix 12: E-mail invitation to participate in search test..................................................................... 287 

Appendix 13: Questionnaire for recruiting test persons for the search test .............................................. 291 

Appendix 14: Simulated search tasks ....................................................................................................... 293 

Appendix 15: Test persons’ insight into simulated search tasks .............................................................. 295 

Appendix 16: E-mail concerning naturalistic information needs ............................................................. 297 

Appendix 17: Instructions for search test persons .................................................................................... 299 

Appendix 18: Rotation of search tasks ..................................................................................................... 303 

Appendix 19: Search test interview guide ................................................................................................ 305 

Appendix 20: Judgement of the relevance of retrieved documents in search test .................................... 307 

Appendix 21: Completeness degree of questionnaire responses .............................................................. 309 

Appendix 22: Respondents’ experience with work tasks ......................................................................... 311 

Appendix 23: Age distribution of population, respondents and test persons ............................................ 313 

Appendix 24: Respondents’ length of service in the organization ........................................................... 315 

Appendix 25: Focus group participants' work tasks ................................................................................ 317 

Appendix 26: Additional sources mentioned by respondents ................................................................... 319 

Appendix 27: Test persons’ background data .......................................................................................... 325 

Appendix 28: Supplementary search test tables ....................................................................................... 327

Appendix 1: Generic work tasks at SKAT 


This appendix summarizes and explains the content of the 19 generic work tasks 

constituting the version of the business model that formed the basis of the survey 

questionnaire.


Main process / Work task: Description

Instruction
Common: Answering requests, whether written, in person, or by phone. Marketing, guidance, and outgoing service.

Settlement
Common: Handling payments, settlements, certifications, access to records or registration, expenditures, reimbursements, or dealing with complaints.
Preliminary assessment of income/personal taxes: Preliminary income assessments, annual tax statements and returns of personal taxes, family allowance, gift taxes, taxation of estates of deceased persons, undivided estates, and settlements regarding personal taxes.
Business relations: Handling and making decisions about businesses regarding VAT settlements, excise duties, retirement benefits, taxes on labor costs, A tax, and differences of income.
Corporation taxes: Preliminary income assessments, annual tax statements, dealing with applications, and making decisions, all regarding foundations, associations, and companies.
Customs: Registration of imports and exports, customs procedures for private persons and companies, making decisions about areas of customs, and dealing with applications for customs licenses and customs permissions.
Vehicles: Processing of vehicles and license plates, handling procedures concerning duty exemption, assessments, and monthly specifications.
Estate: Assessments (depreciations) of estate, handling assessment communications, taxation on the basis of the law of assessed valuation, recalculation of taxes, and registration of property.

Inspection
Common: Handling criminal cases and cases of liability, including the right to operate, divided estates, and inspections.
Customs: Customs inspections (goods and means of transportation) of citizens and businesses.

Collection
Common: Collection tasks, including enforced payments, administration of estates, and handling complaints about collection.

Processes of support
Legal support: Dissemination of rules, instructions, information, and interpretation of practice, rules, and laws.
Secretary service: Preparation of draft ministerial replies to the parliament and citizens, of memos and analyses, and submission of hearing statements for legislation and ministerial responses for the Fiscal Affairs Committee.
IT service and administration: Maintaining processes to ensure and document the best IT support of SKAT's processes through appointments and contact with the business. System ownership, platform ownership, and change management.
HR and education: Administration to ensure proper staff conditions and the treatment of employees according to current rules. Examples include recruitment, hiring, training and development, payroll, and absenteeism.
Internal activities: Procurement and administration of goods, services, and buildings, accounting, communications, press, and secretarial service.

Management and development
Strategy and development: Processes through which strategies for SKAT are planned by means of sight lines, overall objectives, development of strategic initiatives, and their prioritization.
Business management: Administration of grants and contracts, production planning, management of contracts and vendors, and IT architecture management.
Development: Maintaining tasks that support the legislative process or participation in development projects.

Appendix 2: Distribution of employees across main processes in the business model 

The figures in the table below originate from e-mail correspondence with SKAT. They reflect the distribution of full-time equivalents across the six main processes of SKAT's business model. 

Main process: full-time equivalents (share)

Instruction: 881 (11.1%)

Settlement: 2355 (29.8%)

Inspection: 2321 (29.3%)

Collection: 1020 (12.9%)

Processes of support: 819 (10.4%)

Management and development: 516 (6.5%)

Total: 7912 (100%)
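As a quick consistency check, the share column can be recomputed from the full-time-equivalent counts. The following Python sketch is illustrative only; it assumes nothing beyond the six counts in the table above, and the variable names are hypothetical:

```python
# Full-time equivalents per main process, as reported in the table above.
fte = {
    "Instruction": 881,
    "Settlement": 2355,
    "Inspection": 2321,
    "Collection": 1020,
    "Processes of support": 819,
    "Management and development": 516,
}

# Total number of full-time equivalents across all six main processes.
total = sum(fte.values())

# Share of each main process in percent, rounded to one decimal place.
shares = {process: round(100 * n / total, 1) for process, n in fte.items()}

for process, pct in shares.items():
    print(f"{process}: {fte[process]} FTE, {pct}%")
print(f"Total: {total} FTE")
```

Each share is the process count divided by the total of 7912, rounded to one decimal place; the six rounded shares add up to 100%.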

Appendix 3: E-mail invitation to employees 

Subject: Invitation to answer a questionnaire 

Dear SKAT employee, 

I am a PhD student at the Royal School of Library and Information Science (Danmarks Biblioteksskole). As part of a larger research project carried out in collaboration with the National IT and Telecom Agency (IT & Telestyrelsen), I am currently conducting a study of how SKAT employees use various information sources in connection with their different work tasks. The purpose is to examine how employees' searching for information when solving different work tasks can be improved. 

Among other methods, I am using a questionnaire to collect data. I am therefore writing to ask whether you would contribute to the study by answering the questionnaire. 

It takes approximately 10 minutes to answer the questionnaire, which is available online. Your response will, of course, be treated confidentially. This means that the results will only be reported in such a way that individual responses cannot be identified. 

I hope you will help the project by answering the questionnaire. The survey runs until 18/12-2008 at 18:00. 

To get started, click the following link: 

http://kalus3.kalus.dk/l?d=3LPNCV2EeE3E 

You are, of course, welcome to contact me if you have comments, questions, or the like. Thank you in advance for your time and help. 

Kind regards 

Tanja Svarre 

Tanja Svarre, PhD student 

Danmarks Biblioteksskole, Aalborg-afdelingen, Frederik Bajers Vej 7K, 9220 Aalborg Øst 

Tel. 9815 7922, fax 9815 1042 

E-mail: tas@db.dk

Appendix 4: Questions contained in questionnaire 

Question number, title of question, and page(s) of reference in the web questionnaire: 

1. Age: 1a 

2. Gender: 1b 

3. Education: 2a 

4. Title of education: 2b 

5. Place of employment: 3a 

6. Further comments to place of employment: 3b 

7. Departmental affiliation: 4-10 

8. Length of service in organization: 11 

9. Work tasks within instruction: 12 

10. Work tasks within settlement: 16 

11. Work tasks within inspection: 38 

12. Work tasks within collection: 45 

13. Work tasks within processes of support: 49 

14. Work tasks within management and development: 65 

15. Frequency of work task: 13a, 17a, 20a, 23a, 26a, 29a, 32a, 35a, 39a, 42a, 46a, 50a, 53a, 56a, 59a, 62a, 66a, 69a, 72a 

16. Work task experience: 13b, 17b, 20b, 23b, 26b, 29b, 32b, 35b, 39b, 42b, 46b, 50b, 53b, 56b, 59b, 62b, 66b, 69b, 72b 

17. Need for information to solve work task: 14a, 18a, 21a, 24a, 27a, 30a, 33a, 36a, 40a, 43a, 47a, 51a, 54a, 57a, 60a, 63a, 67a, 70a, 73a 

18. Information sources: 14b, 18b, 21b, 24b, 27b, 30b, 33b, 36b, 40b, 43b, 47b, 51b, 54b, 57b, 60b, 63b, 67b, 70b, 73b 

19. Information needs: 15a, 19a, 22a, 25a, 28a, 31a, 34a, 37a, 41a, 44a, 48a, 52a, 55a, 58a, 61a, 64a, 68a, 71a, 74a 

20. Metadata: 15b, 19b, 22b, 25b, 28b, 31b, 34b, 37b, 41b, 44b, 48b, 52b, 55b, 58b, 61b, 64b, 68b, 71b, 74b 

21. Closure and further contact: 75 

Appendix 5: Questionnaire pilot test data 

Pilot recipients: 89 (100%) 

Logged into pilot questionnaire: 41 (46%) 

Finished pilot questionnaire: 26 (29%)
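The two percentages are shares of the 89 pilot recipients, rounded to whole numbers. A minimal Python sketch reproducing them, assuming only the three counts from the pilot test table (the names are illustrative):

```python
# Counts from the questionnaire pilot test.
recipients = 89
logged_in = 41
finished = 26


def share(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


logged_in_pct = share(logged_in, recipients)
finished_pct = share(finished, recipients)
print(f"Logged in: {logged_in_pct}%, finished: {finished_pct}%")
```

Both rates use all pilot recipients as the denominator, which matches the 100% given for the recipient count.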

Appendix 6: Link to questionnaire 

The full version of the questionnaire can be found at the following link: 

http://kalus3.kalus.dk/l?d=zTBK24SAF6ep 


Appendix 7: Dates for the conduct of focus group interviews 

Date Work task 

9/6-2009 Settlement 

11/6-2009 Instruction 

22/6-2009 Processes of support 

22/6-2009 Inspection: Customs 

23/6-2009 Inspection: Common 

29/6-2009 Management and development 

1/7-2009 Collection 


Appendix 8: Example of the slides guiding a focus group interview 


The following slides guided the focus group interview on management and development. Fifteen slides were presented to the participants, introducing the purpose of the interview and presenting results from the questionnaire. The slides for the remaining focus group interviews followed the same form as those in this appendix.



Appendix 9: Focus group interview guide 

The focus group interview guide followed the structure of the questionnaire and the succession of the focus group slide shows. Below, the interview guide questions are listed according to the slides they supported (cf. the previous appendix). 

Corresponding slides / Interview guide questions 

Slide 5 
The interview started out with a short presentation of each participant as to: 
- their concrete work task within the main process of the business model, 
- their experience with the work task, 
- their educational background, 
- how often they carry out the work task, and 
- whether they carry out work tasks other than the one discussed today. 

Slides 6-7 
- Does the frequency of information seeking depend on the concrete work tasks? How? Why? 
- Is it possible that the answers from the survey express average frequencies? If so, what is the real frequency? What is the actual variation? 
- How often do you seek information? 
- Are you seeking information for certain work tasks? 

Slides 8-9 
- Is there a difference in the way you seek information depending on the work task in question? 
- What sources are used when, and why? And for which work tasks? 
- What is the frequency of use of concrete information sources? 

Slides 10-11 
- Do the results reflect your everyday information needs? How? Are there any differences? 

Slides 12-14 
- Can you try to explain when which metadata could be of use? 
  - Is it in certain situations? 
  - For certain work tasks? 
  - For certain information needs? 
  - For certain types of documents? 
- How do you think information seeking on the intranet is working at present? 

Appendix 10: Transcription conventions 


Verbatim transcription was carried out for the focus group interviews in the domain study and the individual interviews in the search test. To ensure consistent transcription, a set of guidelines was developed ahead of the transcription process (cf. Poland, 2003). Some guidelines applied to the transcriptions of both the focus group and the individual interviews. In both cases, topics irrelevant to the theme of the interview were omitted from the transcription, along with laughter, interjections, and the like. Omitted passages were marked with “...”. 

Since the two interview forms differ at some points with respect to transcription, type-specific guidelines supplemented the common guidelines described above. Verbatim transcription is a challenging task (Halcomb & Davidson, 2006), and transcription of focus group interviews is particularly challenging: apart from identifying and typing individual words and statements, the transcriber must identify who said what, and when. In addition, the participants occasionally spoke at the same time. All this considered, it was decided to transcribe the focus group interviews without outside assistance. To keep the focus on the content of the conversations, affirmative remarks from fellow participants were omitted from the transcriptions. 

For the individual search test interviews, an external transcriber was hired. In addition to the common omissions, further elements were systematically left out of these transcriptions. These elements comprise: 

- Introductions to the search test (see Appendix 17 for a description of the introduction delivered to all test persons ahead of the search test), whether at the beginning of the search test or midway, when introducing the part concerning the categorization. 
- Conversations during the search test that were considered irrelevant to the content of the thesis. Such conversations took place especially when the system had a long response time to a request. 
- Clarifying comments, questions, and related responses concerning the execution of the test. 

All transcriptions were set up with line numbers to enable accurate reference.



Appendix 11: Verbatim Danish versions of quotes used in the thesis 


This appendix reports the original Danish wording of the quotes used in the thesis. The quotes originate from the focus group and search test interviews. The full transcriptions have been enclosed for the assessment committee; other interested readers are referred to the author for details on the interviews. 

The identification of quotes follows the structure below: 

Focus group transcriptions: (R1, p. 3). R1 refers to the focus group participant and p. 3 to the page of the transcription referred to. 

Test person transcriptions: (TP1, line 2-6). TP1 refers to the test person delivering the quote and line 2-6 to the lines of the transcription referred to. 

Quote (Id.) and original Danish wording of questions and responses in questionnaire and focus group data: 

R16, p. 5 Jeg bruger elektroniske opslagsværker rigtig meget. Jeg synes jeg finder langt det meste 

på de elektroniske opslagsværker. Hvis jeg søger rigtigt, så får jeg det. Men det 

krydshenviser jo også til både alt det, der ligger på intranettet og det skulle jo også gerne 

fange det, der ligger på Internettet, nemlig via folketingets hjemmeside og den slags ting. 

R23, p. 11 Men så længe, man har et trykt opslagsværk, så er det jo nemmere at slå op i. Hvis man 

lige ved, hvor man skal lede. 

R11, p. 5 For mit vedkommende hvis jeg skal bruge nogle afgørelser, så bruger jeg Google, selvom 

jeg ved jeg kan gå ind i retsinformation og Thomson også. Men jeg søger på Google, for 

jeg synes de der elektroniske opslagsværker, som vi har, de er simpelthen for dårlige. Så 

finder jeg afgørelsen inde på Google og så kan det godt være jeg bliver henvist til en af de 

sider, som vi måske egentlig ret beset burde bruge, men jeg synes deres søgefunktioner er 

simpelthen for dårlige. 

R33, p. 1 Så det informationsbehov, jeg har, det er jo måske mere målrettet på det, som ændrer sig i 

nye retsafgørelser, ny lovgivning, og det får vi jo normalt via intranettet og det vil jo sige, 

at så går jeg ind hver morgen og ser, er der kommet noget, der relaterer sig til inddrivelse, 

og det er den måde, jeg holder mig ajour på. 

R7, p. 5 Så intranettet det er jo vores alle sammens opslagstavle og så er søgeresultatet jo altså 

også derefter. Du får jo bageopskrifterne også, hvis de er lagt ind. 

XX, 

afregning, p. 

3 

xx, 

guidance, p. 

11 

R19 

(0:33:27):, 

p. 5 

Det er tit når vi sidder på agenttelefonerne, f. eks da e-indkomst var nyt, så kunne de 

spørge os ”hvordan laver man en efterangivelse” og vi var også i tvivl om mange af 

spørgsmålene, så kunne vi gå ind og søge på intranettet, men vi opgav. Vi blev nødt til at 

stille dem videre til nogle af dem, der sad med det, for det tog for lang tid og det var 

uoverskueligt at søge på intranettet. Vi kunne ikke finde de svar, vi havde brug for. Fordi 

du fik side op og side ned og alt der stod bare med den mindste om e-indkomst, det 

kommer jo med. 

Jamen hvis det er opgaver indenfor specielle problemstillinger, hvor vi ved at vi har 

kolleger, der har nogle spidskompetencer der, så er det jo fristende at gå hen og spørge, 

fordi vedkommende mange gange også kender måske de sidste afgørelser, der ligger på 

det område. Frem for at begynde at… der er også en tidsfaktor i det. Man kan spare en del 

tid ved at… 

Jo, men der har jeg det også lidt sådan at, jeg kan egentlig lidt bedst lide at slå op i 

toldvejledningen i første omgang og så… hvis det ikke rigtigt jeg synes at jeg er sikker på 

om der nu er kommet noget nyt, så går jeg ind og roder lidt og ser den elektroniske og 


sådan noget. Og så går jeg altid over og spørger… 

R1, p. 10 Det kommer jo helt an på hvor god man er til at beskrive det emne. Hvad er det for ord, 

man bruger? Hvem er det, der deler det op i de hovedemner, der kan søges? Det kommer 

helt an på kvaliteten af det, der ligger der. Og dem, der har lagt det ind. 

R7, p. 8 Mange gange i forbindelse med sagsbehandling så går du jo også ind og leder efter jamen 

er der afgørelser, kendelser eller domme på tilsvarende område. Og så går du jo positivt 

ind og søger i domme og kendelser, så det er udelukkende dokumenttypen i første 

omgang, som at du ved at det er sådan en, du vil have fat i. men det er ikke fordi det er det 

vigtigste, men det er en del af det, vi bruger i lige præcis den salgsbehandling. 

R7, p. 2-3 Hvis der kommer en kunde herude, og henvender sig ved skranken, så beder du om at få 

vedkommendes cpr-nummer og går ind på deres oplysninger. Det er den først information. 

Du kan ikke ekspedere en kunde andet end at du søger information mindst en gang. Og så 

er spørgsmålet at hvis folk har troet, at informationen var først på det tidspunkt, at der 

blev stillet et spørgsmål, at man så gik ind og brugte det. Men information er jo allerede, 

når vi henter data frem på skatteyderen. Når vi skifter billede, så henter vi en ny 

information. 

R10, p. 4 Det er fordi vores… da vi var kommunalt ansatte, der gik vores opgave ud på at ligne så 

mange folk som muligt, altså gennemse deres selvangivelse og se, om de gjorde det rigtigt 

eller forkert. Det er så lavet om efter vi er kommet til staten, og det vil sige dengang der 

fik vi en erfaring hele tide og holdt ved lige med hvad sker der på det område og det 

område. Men efter vi er kommet til staten, der er det ikke første prioritet, det er tvært imod 

nok lavest prioriteret, nu der er det at vi skal sørge for at få folk til at bruge tast selv og 

lave fejllister, så derfor mister vi hele tiden noget af det, vi engang bare kunne på 

rygraden. Jeg kan i hvert fald mærke med mig selv, at mange af de spørgsmål, jeg førhen 

bare havde sådan der, det skal du altså ind og læse om nu her. For lige at ajourføre og se 

er der kommet noget nyt siden. 

R14, p. 3 Min umiddelbare forklaring på ministerbetjening vil være, at jamen der er der så meget 

mere på spil, når man betjener ministeren, at man skal være så meget mere sikker i sin 

sag. Det er min umiddelbare vurdering af det, hvor imod jamen altså den paratviden vi har 

som juridiske eksperter på hvert vores område gør, at vi meget ofte kan klare et spørgsmål 

eller et problem med et skud fra hoften med den viden, vi har og så nogle gange, jamen så 

har man brug for lige at slå ting efter. Men altså med ministerbetjening, der skal man være 

100 % sikker, det skal man selvfølgelig også i andre sager, men der er bare mere på spil




med ministerbetjening. 

R32, p. 4 Nu nede i vores gruppe, der er det mere erfaring. Der skal vi vide, hvad den afdeling laver 

næsten, og den afdeling laver og det prøver vi også. Nu skal vi have møde igen på fredag, 

hvor det skal gøres yderligere bemærket med, hvad de enkelte afdelinger laver. Men det er 

jo altså hvad man kan huske hele tiden. Det har den der noget med at gøre og det har den 

der noget med at gøre. Du kan næsten ikke slå det op nogen steder. 

R23, p. 12 Jeg bruger det ikke bare til at søge ud i den blå luft. Det ville jeg gøre på google. Ikke 

ellers. Brugt eller set før. Det er ikke sikkert, man har brugt det, men man har i hvert fald 

set det. 

R28, p. 10- 

11 

TP1, line 

243-244 


49-52 


73-76 

FOC 3, Line 

200-202 

(FOC 6, 

Line 145- 

That is also where the steering signals (styresignaler) and all sorts of things come out, the things we have to follow within the business. And also... the guidelines, the legal guidelines, when they are updated, that comes out there too. So really there is a great deal that you keep up with. You cannot avoid it. It would be frightening if our intranet were not 100% up to date. In a way, you are in there in order to be able to do your job.

Well, I haven't found anything that states explicitly how to do it, but I have found something that perhaps indicates that that is where I can find the rules.

IP: But it is still a two for you?

TP06: Yes, I think it is, because you do learn a bit about what the taxation rules are… But of course you have to go a level further down to hit a three.

I wouldn't give it a three. I would actually probably give both of them a one, because I can only know whether it is the right one when I go in and see whether it is actually what I need. But those are the ones I would choose, unless I can see that I can go further.

...it is completely useless. You can't find anything. Well, yes you can, you can find 5,000 hits on something or other. You can't use it for anything. That is also why I think many people like to have books. It is because they are fairly confident in those indexes...

It is a very, very, very high frequency of information seeking. It is absolutely essential that everything we send out from here is simply correct. Whether it is a rate or a reference to a section of the law or whatever the heck it is, it simply has to be right.

... you cannot remember all the rules by heart, so you go in and read up on them.

Appendices

146 


120-145 


216-218 


285-288 


159-173 


273-277 


281-294 

If a customer comes out here and approaches the counter, you ask for the person's CPR number and look up their details. That is the first information. You cannot serve a customer without searching for information at least once.... But if someone comes and asks you a professional question, the need is not nearly as great, because then you have something you know by heart that you answer from... The only enquiries that do not require information are those asking for directions to the motor office; they are handed a guide. For all the others, a look-up is involved.

We can't do anything without having the IT means to go in and query a company, claims: what does this company, this person owe, what does he or she owe. We have to go online all the time.

In my view, it is only if you have to handle a completely new case. Then of course I have to search a bit more about the company, and if it is a company I already know, I may just go in and check what has been declared and what has been paid. But no matter what, I always go in and search before I talk to a company.

My immediate explanation regarding minister service would be that, well, there is so much more at stake when you serve the minister that you have to be that much more certain of your case... with minister service you have to be 100% certain. Of course you have to be in other cases too, but there is simply more at stake with minister service.... You have to be 100% certain that what you write, provide and contribute is correct.

Down in our group, it is more a matter of experience. We more or less have to know what this department does and what that department does, and we try to... But it really comes down to what you can remember all the time: that one deals with this, and that one deals with that. You can hardly look it up anywhere.

Yes, I think so, because I think you also use it to get oriented about things before you show up or before you write the e-mail... it also depends on how you view a task, because I tend to think that the intranet and searching are a constant part of my work, including just keeping myself informed, well, both about SKAT as a business and about the subject area I work in. So in one way or another you either search for information or have subscribed to a newsletter... And all that information contributes to how you can solve a task in one way or another.

FOC 4, Line ...in general they do not search very much. They may phone if there is something.

67-68 


877-878) 


371-373 


208-216 


289-294 


554-558 


10-11 


113-117 

I don't just use it to search at random. I would do that on Google. Not otherwise. Used or seen before. You may not necessarily have used it, but you have at least seen it.

I know there is some document or other that I need to use right now. Or I know that this particular ruling exists, and I need to dig it out. Or something like that. Typically something I have seen before but now need to use again.

...when we were employed by the municipality, our job was to assess as many people as possible, that is, go through their tax returns and see whether they had done it right or wrong... that is to say, back then we were constantly gaining experience and keeping up with what was happening in this area and that area... now our job is to get people to use self-service (tast selv) and to make error lists, so we keep losing some of what we once simply knew by heart. I can certainly feel it myself: many of the questions I used to know the answer to off-hand, I now have to go in and read up on. Just to update myself and see whether anything new has come out since.

...because I tend to think that the intranet and searching are a constant part of my work, including just keeping myself informed, well, both about SKAT as a business and about the subject area I work in. So in one way or another you either search for information or have subscribed to a newsletter, through which you then receive it. And all that information contributes to how you can solve a task in one way or another.

“That is also where the steering signals (styresignaler) and all sorts of things come out, the things we have to follow within the business. And also... the guidelines, the legal guidelines, when they are updated, that comes out there too. So really there is a great deal that you keep up with. You cannot avoid it. It would be frightening if our intranet were not 100% up to date. In a way, you are in there in order to be able to do your job.”

But the first time I searched, a handbook on e-commerce came up. I would rather have chosen that than go down there.

It is just as poor, because it says arrears. And employers, and it is neither of those. Let's see with employers… because it says employers and A-skat. And it is withheld as A-skat, just as our employer withholds our tax. I simply cannot find it. I know it is in there. But from this I will never get in there. Because when I know where it is, I would go straight for that one instead.

TP15, line Yes, but on the other hand it could also give… free text… then they should all come up.


306 


295-301 


257-260 


625-633 


553-555 


392-395 

It says precisely that… Well, costs up to the EU border must be included in the customs value. The other one, concerning transport, I can see that this one explains it quite precisely here. But I haven't gone in and searched for “told” [customs] down here either. It came up simply because I searched for “fragt og toldværdi” [freight and customs value] and “pages with all words”. And then I got into the customs guide, which is also the one that refers to the Customs Code, which deals with the rules on how much freight has to be added. So this one is a three. But I didn't get to it by searching for “erhvervsmæssig import” [commercial import] or “forsendelse” [shipment] or “eksport” [export].

TP21: It didn't help much there, because there weren't that many documents anyway. There you could get an overview of the documents that were there, whether you had had it or not. Only 14 documents came up. You would be able to get an overview of those. It will probably mainly be a help when you get up to the large volumes, that is, 1,000 documents and that sort of thing.

Towards the end I sat here and would have liked to switch over. Because no matter what I did, I couldn't find it. And then I need another place to search where I have the option of seeing some other sub-items, so that I might get in that way. So in the last one I think I missed it.

IP: As in, for generating ideas about what you could search for, or?

TP02: Yes, because I think that what I entered… Maybe it is called something else in the VAT Act than what I entered. I need to find that out. In that respect, I missed this one there. It annoyed me that there was a note. Because no matter what I did, I couldn't get it up.

TP09: It worked well there, because all of a sudden I found a broader topic that I could then click into. And that gave me… oh, yes, it has something to do with corporate taxation. So it helped me a bit, also in thinking about what this is actually about.

TP06: Yes, I had. I knew that if I had to go in and look at something about the taxation, I also knew something about self-employment. And then I could get in there faster… Then I knew I had to go in under personal income and not capital income. I know the tax rules. Then it is easier to go into the categories when you basically know the answer in advance.


TP20, line 339-344

TP14, line 493-495

But I don't know if I would ever start running through all of that. Because I think it takes longer for me, since I don't know enough about what lies behind it. If I were a specialist at Skat and knew everything about business tax schemes and the like, it might well be brilliant for me. Because I would know that I can go right in and click on that one and get the documents up. But I don't know whether it might leave out some documents that I need. Whether it restricts too much.

Once you have made a first start on what kinds of categories they are, what they stand for and cover, so that… Then you fumble until you find out what it is. Are there several roads to Rome, or which is the fastest, or… yes. Some things take getting used to. What is the smartest thing to do…


Appendix 12: E-mail invitation to participate in search test 


The test persons for the search test were contacted by e-mail. The e-mail informed the employees about the purpose and procedure of the search test, as well as about privacy issues. The e-mail, originally written in Danish, is reproduced below.


Subject: Would you like to help improve SKAT's intranet? 

Dear SKAT employee 

As part of a larger research project on search options on SKAT's intranet, we will in the coming period carry out an evaluation of SKAT's intranet solution. The purpose of the evaluation is to investigate how employees' searching for information when solving various work tasks can be improved. 

In this connection, we need your help. The intranet will be tested by a selection of SKAT employees. The test will be carried out at “location 1” and “location 2”, respectively. It takes approx. 1½ hours and consists of you being given some search tasks, which serve as your starting point for the search test. Searches are carried out in both the current and the new intranet solution. The search test concludes with a short interview. Your participation will of course be treated confidentially, and the results will be reported in a way that does not allow you to be identified. 

If you would like to take part, please indicate when you are available and how we can contact you by clicking this link: [logindata] 

You will also be asked a few questions about your job function and your use of the intranet. Answering takes about 3 minutes. 

We would very much like to hear from you as soon as possible and no later than Thursday 20 May 2010 at 18:00. 

The research project is carried out as a collaboration between Danmarks Biblioteksskole, IT & Telestyrelsen and SKAT. At SKAT, the project is based in Projektenheden, the project unit (Ebbe Tor Andersen). The search test has been approved by deputy tax director Kaj Kirkegaard. If you have comments, questions or the like, you are welcome to contact Tanja Svarre (contact details below). 

Many thanks in advance for your time and help. 

Kind regards 

Ebbe Tor Andersen (Communication, SKAT) and 

Tanja Svarre (Danmarks Biblioteksskole) 


Tanja Svarre, PhD student 

Danmarks Biblioteksskole, Aalborg branch, Fredrik Bajers Vej 7K, 9220 Aalborg Øst 

Tel. 9815 7922, fax 9815 1042 

E-mail: tas@db.dk 

Ebbe Tor Andersen, special consultant 

Projektenheden, SKAT 

Tel. 7 17 02 

E-mail: Ebbe.Tor@Skat.dk



Appendix 13: Questionnaire for recruiting test persons for the search test 

This appendix presents the questionnaire used to recruit test persons for the search test. The questionnaire was prepared in Kalus and is available at: 

http://kalus3.kalus.dk/l?d=RmMBCvq24teH



Appendix 14: Simulated search tasks 


This appendix presents the three simulated search tasks that formed the basis for the controlled searches in the search test.


SIM 1: Sale of a parent-purchased apartment 

Search case: 

Kirsten has sold an apartment bought as a parental purchase (forældrekøb). She made a loss on the sale and, in the same connection, had expenses for an estate agent and for renovating the apartment. Can she now deduct the loss and the expenses for tax purposes? 

Search task: 

Find documents stating the tax rules concerning a parental purchase. 

SIM 2: Taxation of e-commerce 

Search case: 

A personally owned one-person publishing business wants to sell its own English-language books through e-commerce on websites in the USA and other countries, for example Amazon.com and Smashwords.com. The business has a permanent establishment in Denmark. How should the owner handle the taxation of the income from the sales? 

Search task: 

Find documents stating how e-commerce with a permanent establishment in Denmark is taxed. 

SIM 3: Freelancer 

Search case: 

Jens teaches freelance for a company but is about to expand with more customers. If all the preliminary agreements work out, he will earn around 100,000 a year. He is now unsure whether, in that case, he can continue as an employee or whether he has to start a business and register for VAT. 

Search task: 

Find documents stating the rules for when you must register for VAT. 


Appendix 15: Test persons’ insight into simulated search tasks 


Each time a search task had been completed, the test persons answered a short on-screen questionnaire capturing their insight into the subject of the task. The questionnaire was embedded in Morae and contained the following questions: 

1. My insight into the subject of the work task: No insight 1 2 3 4 5 Great insight 

2. How difficult was the task? Very easy 1 2 3 4 5 Very difficult 

3. How closely did the task resemble the work tasks you deal with day to day? No overlap 1 2 3 4 5 Great overlap



Appendix 16: E-mail concerning naturalistic information needs 


A few days before the search test, the test persons received an e-mail asking them to bring a search task to the test session. We intentionally sent the e-mail shortly before the test to make sure that the test persons actually remembered to bring the task when they showed up. An alternative would have been to mention it in the e-mail confirming the appointment. However, some test persons received the confirmation e-mail weeks before the appointment, which could have caused them to forget about the extra task. A further benefit was that the test persons were reminded of their upcoming appointment. The text of the e-mail appears below: 

From: Tanja Svarre Jonasen 

Sent: Wed 16-06-2010 10:17 

Subject: Regarding the search test 

Dear test person, 

When you come to the search test in the coming days, please bring a problem or a search task that you have recently solved by searching the current intranet. The task should preferably be typical of your use of the intranet. 

See you in room F-1-46 (the room next to the video conference room). 

Best regards, 

Tanja Svarre 

Tel. 9877 3025



Appendix 17: Instructions for search test persons 


This appendix presents the elements contained in the instruction given to the test 

persons in advance of the search test.


Instructions for test persons 

Test procedure: 

You will receive a set of search tasks. The search tasks are divided into two groups, each to be searched using its own search functionality. 

Please find the documents you need to solve the search task you are working on. Maybe you know the answer to the task in advance, but please search until you have the document or the documents that can answer the task. 

Documents are assessed on a 4-point relevance scale (the description is presented and handed out to the test person in print so that it can be consulted during the test). 

When you make a search, please print out the search results. They will be used for relevance judgments afterwards. 

After completing each task, you fill out a questionnaire on the screen concerning the task, and finally I have some general questions about the system you have been searching. 

Presentation of the prototype: 

System functionalities 

o Automatic truncation 

o Search type: Explanation of the different possibilities 

o Document types contained: The types are presented and a printed 

overview is handed out. 

o Categorization (is presented as the test person starts out using it, either 

initially or on the way) 

o Time of publishing 

The system is a prototype. This means that you need to be aware of the following: 

o Please disregard the numbers stated after the categories; they are not exact at present. 

o The database was generated in the fall of 2009, which means that the latest documents are not included. If you are looking for instructions or similar documents that have been updated recently, it is sufficient to find the latest document in the collection that can answer your request. 


o At present the system does not link the result lists to the full text of the documents. Therefore, please judge the relevance of your search results on the basis of the hit lists. 

o You need to use your mouse to click “search”. Pressing the Enter key will take you to simple search.



Appendix 18: Rotation of search tasks 


A set of principles was established for constructing the rotations. The first unique successions were generated by listing the work tasks by number in ascending and descending order, respectively. Next, all work tasks were moved one position to the right. The following rotation moved the last task to position number two. Lastly, a rotation was generated by moving the task at position number three to position number one. All rotations were generated twice, starting with categorization and without it, respectively. By these means, 42 unique rotations were formed. Of these, 32 were needed for the search test. These rotations are listed in the table below.


Rotation 1st position 2nd position 3rd position 4th position 
1 1 (Sys B) 2 (Sys B) 3 (Sys A) 4 (Sys A) 
2 1 (Sys A) 2 (Sys A) 3 (Sys B) 4 (Sys B) 































Legend: Sys A and Sys B refer to the designations of the two parts of the test system (see section 6.4.1). The columns list the search tasks by their position in the order of succession. 
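The construction rules above can be sketched in code. This is a minimal sketch under assumptions that the appendix does not state: four tasks numbered 1-4, each rule applied once to each of the two base orders (ascending and descending), and the labels “Sys A” and “Sys B” covering the first two and the last two positions, as in the two rows shown. Under this reading the rules yield 16 rotations rather than the 42 reported, so the sketch illustrates the kind of operations involved rather than the exact procedure used; its first two rotations do reproduce the two rows in the table. All function names are hypothetical.

```python
# Hypothetical sketch of the rotation rules described in Appendix 18 (not the thesis's own code).
# Assumptions: four tasks numbered 1-4; each rule is applied once to each base order;
# "Sys A"/"Sys B" cover the first two and last two positions, as in the rows shown.

Order = tuple[int, ...]


def shift_right(order: Order) -> Order:
    """Move every task one position to the right; the last task wraps to the front."""
    return order[-1:] + order[:-1]


def last_to_second(order: Order) -> Order:
    """Move the last task to position two."""
    return order[:1] + order[-1:] + order[1:-1]


def third_to_first(order: Order) -> Order:
    """Move the task at position three to position one."""
    return order[2:3] + order[:2] + order[3:]


def build_rotations(n_tasks: int = 4) -> list[list[tuple[int, str]]]:
    ascending: Order = tuple(range(1, n_tasks + 1))
    descending: Order = ascending[::-1]
    orders: list[Order] = []
    for base in (ascending, descending):
        for candidate in (base, shift_right(base), last_to_second(base), third_to_first(base)):
            if candidate not in orders:  # keep only unique successions
                orders.append(candidate)
    half = n_tasks // 2
    system_patterns = (
        ["Sys B"] * half + ["Sys A"] * (n_tasks - half),  # Sys B first
        ["Sys A"] * half + ["Sys B"] * (n_tasks - half),  # Sys A first
    )
    # Every task order is generated twice, once per system pattern.
    return [list(zip(order, pattern)) for order in orders for pattern in system_patterns]


if __name__ == "__main__":
    for number, rotation in enumerate(build_rotations(), start=1):
        print(number, " ".join(f"{task} ({system})" for task, system in rotation))
```

Run as a script, it prints one line per rotation in the layout of the table, starting with `1 1 (Sys B) 2 (Sys B) 3 (Sys A) 4 (Sys A)`.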


Appendix 19: Search test interview guide 


This appendix presents the interview guide used to conclude the test sessions. The questions are grouped into three main categories. 

Perception of the test system 

When was the categorization (not) helpful to you during searching? 

In which way? 

What was it about the categorization that made it (un)helpful to you? 

In which situations did you not need the categorization? 

Present use of the intranet 

What characterizes your use of the present intranet (situations where it is not used, documents you look for on the intranet, and the like)? 

Categorization in your daily work 

I would like you to describe a typical situation from your daily work, where you make 

use of the intranet. 

In that situation, how would categorization be useful to you? 

In that situation, in what way would categorization not be relevant to you?



Appendix 20: Judgement of the relevance of retrieved documents in search test 

This appendix presents the four degrees of relevance that the test persons could use when assessing retrieval sets. The explanations of the individual degrees are based on Sormunen (2002). In the test situation, the content of the appendix was explained to the test persons. Further, a printout of the explanations was placed next to the test machine so that the test persons could consult it whenever needed.


Relevance of documents 

0: The document contains no information about the topic. 

1: The document points to the topic but contains neither more nor other information than the topic description, typically one sentence or one fact. 

2: The document contains more information than the topic description, but not exhaustively. If the topic has several facets, only some subthemes or viewpoints are covered. Typically a text passage, a few sentences or facts. 

3: The document discusses the themes of the topic exhaustively. If the topic has several facets, all or nearly all facets or viewpoints are covered by the document. Typically several text passages or many sentences or facts. 


Appendix 21: Completeness degree of questionnaire responses 


Response status | # | % of 798
Completed the questionnaire | 340 | 42.6%
Answered beyond Inspection (page 45 in the questionnaire) but quit somewhere thereafter | 27 | 3.4%
Stopped when the questions regarding Inspection (page 44 in the questionnaire) had finished | 66 | 8.3%
Stopped when the work tasks started (before page 12 in the questionnaire) | 13 | 1.6%
Started the questionnaire but quit before stating their place of employment (before page 3 in the questionnaire) | 14 | 1.8%
Signed into the questionnaire but did not answer any questions | 36 | 4.5%
Did not log in at all | 302 | 37.8%
Total | 798* | 100%

Legend: The questionnaire was sent to 799 respondents. However, one response could not be included due to errors and was deleted. Therefore, the number of respondents adds up to 798.
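Each percentage in the table is the row count divided by the 798 respondents kept. The sketch below recomputes them from the counts. It assumes the table rounds half up to one decimal place, which the appendix does not state, and the shortened row labels are introduced here for illustration only.

```python
from decimal import Decimal, ROUND_HALF_UP

# Counts copied from the table above; labels are hypothetical shorthand for the table rows.
COUNTS = {
    "completed": 340,
    "answered past page 45, then quit": 27,
    "stopped after Inspection (page 44)": 66,
    "stopped before page 12": 13,
    "quit before page 3": 14,
    "signed in, no answers": 36,
    "never logged in": 302,
}


def share(count: int, total: int) -> Decimal:
    """Percentage of total, rounded half up to one decimal (assumed rounding rule)."""
    return (Decimal(count) * 100 / Decimal(total)).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


total = sum(COUNTS.values())  # should equal 798, the number of respondents kept
shares = {label: share(n, total) for label, n in COUNTS.items()}

if __name__ == "__main__":
    for label, pct in shares.items():
        print(f"{label:<36} {COUNTS[label]:>4} {pct}%")
    print(f"{'total':<36} {total:>4} {sum(shares.values())}%")
```

The rounded shares happen to add up to exactly 100.0%, so no rounding adjustment is needed for this table.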



Appendix 22: Respondents’ experience with work tasks 

Work task / experience with work task | 0-6 months | 7-11 months | 1-2 years | 3-5 years | More than 5 years
Instruction | 6 (3%) | 13 (7%) | 33 (18%) | 21 (12%) | 108 (60%)
Settlement: common | | 1 (5%) | 4 (20%) | 3 (15%) | 12 (60%)
Settlement: preliminary assessment of income/personal taxes | 2 (4%) | 2 (4%) | 6 (11%) | 4 (7%) | 43 (75%)
Settlement: business relations | | 4 (7%) | 13 (23%) | 2 (4%) | 38 (67%)
Settlement: corporation taxes | 1 (4%) | 3 (12%) | 7 (28%) | 2 (8%) | 12 (48%)
Settlement: customs | | 3 (25%) | 1 (8%) | 2 (17%) | 6 (50%)
Settlement: vehicles | 1 (6%) | 6 (33%) | 8 (44%) | 1 (6%) | 2 (11%)
Settlement: estate | 2 (14%) | | 1 (7%) | 3 (21%) | 8 (57%)
Inspection: common | 1 (2%) | 1 (2%) | 5 (8%) | 5 (8%) | 49 (80%)
Inspection: customs | 1 (6%) | 2 (13%) | 1 (6%) | 1 (6%) | 11 (69%)
Collection | 4 (10%) | | 7 (18%) | 9 (23%) | 19 (49%)
Processes of support: legal support | | 4 (9%) | 7 (16%) | 10 (22%) | 24 (53%)
Processes of support: minister service | 3 (30%) | | | 2 (20%) | 5 (50%)
Processes of support: IT service and administration | | 3 (21%) | 3 (21%) | 4 (29%) | 4 (29%)
Processes of support: HR and education | | 1 (7%) | 2 (14%) | 5 (36%) | 6 (43%)
Processes of support: internal activities | 4 (27%) | | 3 (20%) | 5 (33%) | 3 (20%)
Management and development: strategy | 2 (13%) | 1 (6%) | 1 (6%) | 7 (44%) | 5 (31%)
Management and development: business management | 1 (7%) | 3 (21%) | 1 (7%) | 5 (36%) | 4 (29%)
Management and development: development | 3 (11%) | 3 (11%) | 7 (26%) | 6 (22%) | 8 (30%)


Appendix 23: Age distribution of population, respondents and test persons 

For comparison, the total figures for SKAT at the time were: 

Age | 17-18 | 19-25 | 26-35 | 36-45 | 46-55 | 56-68
Total numbers | 3 | 91 | 950 | 2180 | 2972 | 2473
% | 0% | 1% | 11% | 25% | 34% | 28%


Questionnaire respondent distribution 

N Valid 340 

Missing 0 

Mean 47.29 

Median 47.00 

Mode 44 

Std. Deviation 9.537 

Skewness -.354 

Std. Error of Skewness .132 

Population distribution: 

N Valid 8681 


Mean 48.44 

Median 49.00 

Mode 58 



Std. Error of Skewness .026 


Test person distribution 

N Valid 31 


Mean 46.45 

Median 48.00 

Mode 48 



Std. Error of Skewness .421
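The Std. Error of Skewness in each of the three distributions above depends only on N. Assuming the standard small-sample formula (an assumption, since the appendix does not state how SPSS computes it), the reported values can be reproduced with a short sketch:

```python
import math


def se_skewness(n: int) -> float:
    """Standard error of skewness for a sample of size n:
    sqrt(6n(n - 1) / ((n - 2)(n + 1)(n + 3)))."""
    if n < 3:
        raise ValueError("the standard error of skewness needs n >= 3")
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


if __name__ == "__main__":
    for label, n in (("Questionnaire respondents", 340), ("Population", 8681), ("Test persons", 31)):
        print(f"{label}: N = {n}, std. error of skewness = {se_skewness(n):.3f}")
```

Run as a script, it prints 0.132, 0.026 and 0.421 for N = 340, 8,681 and 31, matching the values reported above. The standard error shrinks roughly as the square root of 6/N, which is why the population figure is so much smaller than the test person figure.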

Appendix 24: Respondents’ length of service in the organization 


Appendix 25: Focus group participants work tasks 


This appendix shows the distribution of participants across the 19 generic work tasks. In the introductory part of the focus group interviews, the participants were asked to present themselves, and these introductions served as the basis for the table below. We compared the participants’ descriptions of their work areas with the work task descriptions from the questionnaire and, on this basis, assigned each participant to their primary work task. In some cases, in particular in the interview concerning Instruction, other, more important areas of responsibility emerged during the interview. In those cases we assessed which work task was more important and let that assessment guide the placement of the participant.


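Main process | Work task | Participants
Instruction | Instruction | R7, R8, R9, R10, R11, R12
Settlement | Common | 
Settlement | Preliminary assessment of income/personal taxes | R4
Settlement | Business relations | R3, R6
Settlement | Corporation taxes | R2
Settlement | Customs | R18, R19, R20, R21, R22
Settlement | Vehicles | 
Settlement | Estate | 
Inspection | Common | R23 R24, R25, R26, R27
Inspection | Customs | 
Collection | Collection | R33, R34, R35
Processes of support | Legal support | R1, R13, R14, R15, R16, R17
Processes of support | Minister service | 
Processes of support | IT service and administration | R28
Processes of support | HR and education | R1
Processes of support | Internal activities | R32, R28
Management and development | Strategy | R29, R30
Management and development | Business management | 
Management and development | Development | R29, R31

Legend: R20 refers to the specific focus group participant. 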


Appendix 26: Additional sources mentioned by respondents 


This appendix reports the sources listed by the respondents to supplement the predefined list of sources used for information seeking in relation to certain work tasks. The sources are listed by the work task in connection with which they were mentioned.

Automatic Indexing 

Work task: Additions to predefined sources 

Instruction: 
The BIQ system 
Executive orders, including the registration order (registreringsbekendtgørelsen) and the Registration Tax Act (registreringsafgiftsloven) 
SKAT's general systems, e.g. DR, TStele, KMD, etc. 
Database on detained shipments 
DetailCOR (= income information for the current year); Google Maps for driving directions 
www.europa.eu 
www.skat.dk 
Learning from neighbouring colleagues (nabolæring) in case of doubt 
kmd-skat 
Best Practice guides 
Various internal search systems 
skat.dk 
Inddrivelsesvejledningen (the collection guide) 
skat ligning, ts tele, business register (virksomhedsreg.), remedy 
Google search 
The police's IT programs 
BIQ and SØS 
Colleagues 
In general I use all of them, but the intranet and Captia are the most used 
Internal systems at Skat. 
Tele, KMD, etc. 
In particular websites with maps and aerial photos 
Remedy, KMD, Dipsy, DR-sys, TP-sys and others 
Various course materials 
Printed collections of laws. TfS. Textbooks and the like. 
Remedy 
CVR.dk 
SKAT's internal IT systems 
TP, TS-tele, Remedy, Sap, KMD, etc. 
SKAT's website; not the intranet but the publicly available website www.skat.dk 
At times I use Captia; otherwise I use none apart from SKAT's systems. 
SKAT's own IT systems 
SKAT's general systems 
Programming specifications, programs, etc. 
I use SKAT's lawyers and otherwise read bills/acts/explanatory notes and the assessment guide (ligningsvejledning) 
Sharepoint 
The EU's databases, sites of other countries' tax authorities, websites of interest organisations and companies 
Depends on the specific task 

Settlement: common: 
The police's IT systems 
KMD's and the state's systems 



Skat's DR system, SAP and other tax systems 
It depends on the situation. It is mostly foreign income I deal with, and that often requires further investigation. 

Settlement: preliminary assessment of income/personal taxes: 
Settlement: business relations: 
Settlement: corporation taxes: 
KMD SkatLigning (case-handling system). Captia is junk to work with for retrieving documents (unless I have not yet figured out what is clever about Captia) 
skat.dk 
Google search 
Colleagues 
My own folders of information/guidelines that I have collected over time or that we have agreed on ourselves in the department. 
Memory and many years of experience with tax are the most important sources in everyday work. 
The state's systems and KMD Skatligning 
Remedy 
It depends a lot on which type of income or situation I am dealing with. It is mostly foreign income and moving abroad that I work with. 
Colleagues 
Discussions with colleagues 
Skattenyt – Schultz 
Google 
SKAT's internal IT systems: Business Object, KMD Skat Ligning, KMD Skat Forskud, the TP system, Remedy, the CPR system, Dipsy, Erhvervssystemet 
Newspapers 
I am head of a department of approx. 20 employees; I spend practically all my working time on staff management. 
I am a head of department, so I do not do casework directly 
Momsmanual (the VAT manual) 
None of them. Now and then I help with entering VAT returns. Day to day I work with EU sales list returns (listeangivelser) concerning EU sales 
SKAT's general systems 
BIQ and SØS 
Colleagues 

Settlement: customs: 
www.europa.eu 
EU regulations concerning shipments 
The EU's electronic reference works 
Toldsystemet (the customs system) 
EUR-lex 

Settlement: vehicles: 
Own notes 

Settlement: estate: 
BBR, Kort- og Matrikelstyrelsen, electronic notifications 
In particular websites with maps and aerial photos 
Danmarks Arealinfo, Danmarks Statistikbank, Kort & Matrikelstyrelsen, OIS, BBR, the municipalities' websites with local plans, GEO, Plansystem 
Planning systems, map look-ups, realkreditrådet 

Inspection: common: 
BIQ 
Asking colleagues 
Nabolæring (learning from neighbouring colleagues) 
SRF course materials, etc. 
KMD-SkatLigning, Remedy, etc. 
Specific information from SKAT's own systems: TS-tele, KMD skat/ligning, Remedy. That is, system-generated, entered and internal documents, working papers, etc. Control information on R-75 and more. 
Various other internal search systems 
Amadeus database, Kob, Biq 
Colleagues 
Newspapers 
I am not a caseworker 
Remedy, KMD, Dipsy, DR-sys, TP-sys, and others 

Inspection: customs: 
FødevareErhverv's website. 
FødevareErhverv's website; EU-tidende (the Official Journal of the EU) 
Captia is only used for the department's case 
The FEOGA handbook. EU regulations 
Toldsystemet (the customs system) 

Collection: 
VAT programs 
Teaching material from my studies and relevant books from my studies 
Domstole.dk 
Discussions with colleagues 
Own systems 


Legal support

IT service and internal activities

Skattemappen 

kmd-skat 

Skattemappen 


One word: Google (by the way, I have no financial interest in promoting Google over others; it is just that Google works, every time)

None

Our own SharePoint solution (Sysmod phase 1), documents in the file structure

Microsoft, also as printed media

Programs and programming specifications

SharePoint as the most important tool

Old e-mails, programming specifications, etc.

KMD SKAT LIGNING

The CVR register




development: 


google 

The department's shared drive

strategy 


development: 

development 

KMD-SKAT LIGNING 

Tools and guidelines stored on the H drive

Data warehouse, KMD Skat Ligning (case management), Dipsy

www.skat.dk

Our SAP system, extracts of various reports

The department's shared drive


Appendices



Appendix 27: Test persons’ background data 


This appendix contains tables with background data on the test persons in the search test: gender, age (year of birth), length of service, and education. The tables were generated in SPSS and are listed in that order, with age and length of service sharing one table. In all three tables, one person did not answer the relevant questions. This explains the difference in N: N=32 in the search test, but N=31 in this appendix.

Gender distribution

| | | Frequency | Percent | Valid Percent | Cumulative Percent |
| Valid | Male | 10 | 31.3 | 32.3 | 32.3 |
| | Female | 21 | 65.6 | 67.7 | 100.0 |
| | Total | 31 | 96.9 | 100.0 | |
| Missing | System | 1 | 3.1 | | |
| Total | | 32 | 100.0 | | |

Legend: The table displays the distribution of men and women in the group of test persons. N=31. 

The test persons’ year of birth and length of service 

| | | Year of birth | Length of service |
| N | Valid | 31 | 31 |
| | Missing | 1 | 1 |
| Mean | | 1963.23 | 21.68 |
| Median | | 1962.00 | 24.00 |
| Minimum | | 1949 | 4 |
| Maximum | | 1980 | 43 |

Legend: Mean, median, minimum, and maximum year of birth and length of service of the test persons. Length of service denotes the number of years the test persons have been working in the organization. N=31.


Latest education of the test persons

| | | Frequency | Percent | Valid Percent | Cumulative Percent |
| Valid | Internal clerk programme | 6 | 18.8 | 19.4 | 19.4 |
| | Administrative assistant | 4 | 12.5 | 12.9 | 32.3 |
| | Other vocational education and training | 2 | 6.3 | 6.5 | 38.7 |
| | Bachelor degree | 1 | 3.1 | 3.2 | 41.9 |
| | Medium-cycle higher education | 1 | 3.1 | 3.2 | 45.2 |
| | Long-cycle higher education | 9 | 28.1 | 29.0 | 74.2 |
| | Master's programme | 8 | 25.0 | 25.8 | 100.0 |
| | Total | 31 | 96.9 | 100.0 | |
| Missing | System | 1 | 3.1 | | |
| Total | | 32 | 100.0 | | |
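The Percent, Valid Percent, and Cumulative Percent columns in the SPSS frequency tables differ only in their base. Percent divides by all 32 test persons, including the one non-respondent. Valid Percent and Cumulative Percent divide by the 31 persons who answered. The sketch below recomputes the columns of the education table from its frequencies. It rounds halves upwards to one decimal; this is an assumption about the rounding rule, and it reproduces the printed values. The function and variable names are illustrative only.

```python
from decimal import Decimal, ROUND_HALF_UP


def round1(x: float) -> float:
    """Round to one decimal, rounding halves away from zero (assumed rule)."""
    return float(Decimal(str(x)).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP))


def frequency_table(counts: dict[str, int], missing: int) -> list[tuple]:
    """Rebuild SPSS-style frequency columns from raw category counts."""
    valid_n = sum(counts.values())
    total_n = valid_n + missing
    rows, cum_freq = [], 0
    for category, freq in counts.items():
        cum_freq += freq
        rows.append((
            category,
            freq,
            round1(100 * freq / total_n),      # Percent: base includes the missing case
            round1(100 * freq / valid_n),      # Valid Percent: base excludes it
            round1(100 * cum_freq / valid_n),  # Cumulative Percent of valid cases
        ))
    return rows


education = {
    "Internal clerk programme": 6,
    "Administrative assistant": 4,
    "Other vocational education and training": 2,
    "Bachelor degree": 1,
    "Medium-cycle higher education": 1,
    "Long-cycle higher education": 9,
    "Master's programme": 8,
}

for row in frequency_table(education, missing=1):
    print(row)
```

The same function applied to the gender counts (10 and 21, one missing) yields the percentages in the gender table.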

Appendix 28: Supplementary search test tables 

Table 1: Reformulations in sessions

| | (n=32) | (n=32) | (n=32) | (n=32) | (n=128) |
| System A | Min: 0, Max: 5, SD=1.5 | Min: 0, Max: 11, SD=2.8 | Max: 27, SD=6.6 | Max: 9, SD=2.7 | Max: 27, SD=4.0 |
| System B | Min: 0, Max: 18, SD=5.2 | Max: 10, SD=2.9 | Max: 15, SD=3.6 | Max: 10, SD=3.4 | Max: 18, SD=3.9 |
| Total | Min: 0, Max: 18, SD=3.9 | Max: 11, SD=3.3 | Max: 27, SD=5.3 | Max: 10, SD=3.1 | Max: 27, SD=4.0 |

Table 2: Correlations of the number of search terms in queries and number of hits

Correlations

| | | No. of terms in query | No. of hits |
| No. of terms in query | Pearson Correlation | 1 | .200** |
| | Sig. (2-tailed) | | .002 |
| | N | 229 | 229 |
| No. of hits | Pearson Correlation | .200** | 1 |
| | Sig. (2-tailed) | .002 | |
| | N | 229 | 229 |

**. Correlation is significant at the 0.01 level (2-tailed).

Legend: The table displays the correlations in system A, as the number of hits in system B is much lower due to categorization. Therefore: N=229.


Table 3: Correlations between number of terms in queries and the succession of search tasks.

Correlations

| | | No. of terms in query | Succession of search task |
| No. of terms in query | Pearson Correlation | 1 | .037 |
| | N | 564 | 564 |
| Succession of search task | Pearson Correlation | .037 | 1 |
| | N | 564 | 564 |
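Tables 2 and 3 report SPSS Pearson correlations together with their N. The sketch below is a hedged check of how r and N relate to the two-tailed significance level. It computes r from paired observations and the usual test statistic t = r·sqrt((n−2)/(1−r²)). It then approximates the two-tailed p-value with the standard normal distribution. SPSS uses Student's t distribution with n−2 degrees of freedom; the normal approximation is an assumption made to keep the example dependency-free, and it differs little at N=229 or N=564. The function names are hypothetical.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation of two equally long samples."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two samples of equal length, n >= 3")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


def two_tailed_p_normal(t: float) -> float:
    """Two-tailed p-value using the standard normal distribution as an
    approximation to Student's t (assumption: adequate for large n)."""
    return math.erfc(abs(t) / math.sqrt(2))


# r and N as reported in Tables 2 and 3.
for label, r, n in [("terms vs. hits", 0.200, 229),
                    ("terms vs. task order", 0.037, 564)]:
    t = t_statistic(r, n)
    print(f"{label}: t = {t:.2f}, approx. p = {two_tailed_p_normal(t):.3f}")
```

For r=.200 and N=229, this gives t≈3.08 and an approximate p of .002, in line with Table 2. For r=.037 and N=564, the approximate p is about .38, far from conventional significance levels.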

Table 4: Number of documents retrieved in system A and system B using the possible search operators (averages)

| | FT | AW | ES | OW | Total |
| Number of documents retrieved: System A | 548.3 (n=102) | 120.8 (n=110) | 27.5 (n=13) | 332.6 (n=4) | 309.6 (N=229) |
| Number of documents retrieved: System B | 24.7 (n=133) | 10.2 (n=78) | 1.5 (n=2) | 18 (n=2) | 19.2 (N=215) |

Legend: FT=Free text, AW=Pages containing all words, ES=This exact sentence, OW=At least one of the words. For system B, N designates the number of system B searches actually carried out in system B (cf. section 8.2.3).
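The Total column of Table 4 appears to be the average over all queries in each system. The per-operator averages rest on very different numbers of queries. Such a total is therefore the n-weighted mean of the operator averages, not their simple mean (which would be 257.3 for system A). The sketch below recomputes the totals from the rounded values in the table, reading the system A OW value as 332.6. The result matches the reported totals to one decimal. The function name `pooled_mean` is illustrative.

```python
def pooled_mean(groups: list[tuple[float, int]]) -> tuple[float, int]:
    """Combine per-group means into one mean, weighting each by its n."""
    total_n = sum(n for _, n in groups)
    weighted_sum = sum(mean * n for mean, n in groups)
    return weighted_sum / total_n, total_n


# (average number of documents retrieved, n) for FT, AW, ES, OW in Table 4
system_a = [(548.3, 102), (120.8, 110), (27.5, 13), (332.6, 4)]
system_b = [(24.7, 133), (10.2, 78), (1.5, 2), (18.0, 2)]

for name, groups in [("System A", system_a), ("System B", system_b)]:
    mean, n = pooled_mean(groups)
    print(f"{name}: pooled mean {mean:.1f} (N={n})")
```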

Table 5: Combinations of category reformulations with other types of reformulations in system B queries (percentages)

| Reformulations | Query terms | Document type | Search operator |
| Share of N=83 | 66 (79.5) | 18 (21.7) | 17 (20.5) |

Legend: The queries included in the table are system B queries containing category type 

reformulations in combination with the remaining types of reformulations. N=83.
