31.07.2015 Views

Google Scholar's Ranking Algorithm: The Impact of ... - Jöran Beel



caused by the way search queries were created. They were created automatically by combining different words from a word list, which resulted in some senseless search queries such as 'finish father' or 'excessive royalty'. While plenty of documents exist in which, for instance, the words 'finish' and 'father' occur somewhere in the full text, no documents exist that include these words in the title.

Table 2: Amount of Search Results by Number of Search Terms (Title Search)

                         Number of Search Results
                         [0,1]  [2,10]  [11,50]  [51,250]  [251,1000]  [1001,10000]  [10001,*]  Total
Single Term   Absolute       0       1        1        12          23           102        211    350
              Relative    0.0%    0.3%     0.3%      3.4%        6.6%         29.1%      60.3%   100%
Double Term   Absolute     166      89       54        27          11             3          0    350
              Relative   47.4%   25.4%    15.4%      7.7%        3.1%          0.9%       0.0%   100%
Triple Term   Absolute     345       5        0         0           0             0          0    350
              Relative   98.6%    1.4%     0.0%      0.0%        0.0%          0.0%       0.0%   100%
Total         Absolute     511      95       55        39          34           105        211   1050
              Relative   48.7%    9.0%     5.2%      3.7%        3.2%         10.0%      20.1%   100%

Overall, data from 1,561 search queries (1,050 searches in the full text and 511 searches in the title) was used for further analysis. The 1,561 search queries returned a total of 1,364,757 results (1,032,766 articles for full-text searches and 331,991 articles for title searches). For 810,793 of the 1,032,766 articles retrieved via full-text search and 288,956 of the 331,991 articles retrieved via title search, Google Scholar displayed the publication year. Those years and the articles' rankings were stored and analyzed. To verify correct execution of the Google Scholar parser, spot checks were performed.

All results of the search queries were visualized as graphs to recognize patterns. In addition, the mean, median, and mode of the publication year at each position were calculated and displayed in a graph.
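The per-position statistics described above could be computed roughly as follows. This is only a minimal sketch, not the parser or analysis code used in the study: the function name `year_stats_by_position`, the `(position, year)` input format, and the sample values are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, median, multimode


def year_stats_by_position(results):
    """Summarize publication years per ranking position.

    `results` is an iterable of (position, publication_year) pairs pooled
    over all search queries; the year is None when none was displayed.
    """
    years_at = defaultdict(list)
    for position, year in results:
        if year is not None:  # skip results without a displayed year
            years_at[position].append(year)

    stats = {}
    for position in sorted(years_at):
        years = years_at[position]
        stats[position] = {
            "n": len(years),             # sample size behind the averages
            "mean": mean(years),
            "median": median(years),
            "modes": multimode(years),   # all most frequent years
        }
    return stats


# Hypothetical sample data, not taken from the study.
sample = [(1, 1990), (1, 1994), (1, 1990), (2, 2001), (2, None), (2, 1999)]
stats = year_stats_by_position(sample)
```

Keeping the sample size `n` next to each average makes it easy to spot positions whose statistics rest on only a few results.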
Overall, a total of 1,567 graphs were created and inspected individually.

5. Results

At first glance, the results of the current study seem to confirm our previous results. Graphs of individual search queries show no significant interdependency between an article's age and its ranking in Google Scholar (see also [10]). This is true for all kinds of search queries, such as searches in full text or title and searches with single-word, double-word and triple-word queries. The graphs show that publications from all years are evenly distributed throughout the result list (see Figure 3, Figure 4 and Figure 5)⁵.

⁵ The graphs also show that Google Scholar has far more documents from the 90s and the current decade in its database than from the decades before. However, this is outside the scope of the current study.

However, looking at the average age gives a different impression. Figure 6 displays the average publication year (mean) for each position in Google Scholar. It clearly shows that articles in the top positions are on average older than articles in the remaining positions⁶.

A look at the numbers confirms this impression. While papers ranked at position 1 by Google Scholar were on average published in 1992, papers at position 5 were on average published in 1993, papers at position 100 in 1994 and papers at position 500 in 1995 (see Table 3).
Graphs for title searches look similar (see Figure 7), and no significant differences occurred between single-word, double-word and triple-word search queries⁷.

Figure 3: Search Query 'Future' (x-axis: position in Google Scholar, 0 to 1000; y-axis: publication year, 1958 to 2008)

Figure 4: Search Query 'Google Scholar' (x-axis: position in Google Scholar, 0 to 1000; y-axis: publication year, 1958 to 2008)

⁶ Graphs for the mode and median publication year look similar.

⁷ In all graphs, some outliers can be observed in the very last positions. This is because Google Scholar often does not return the very last results. The means for the last positions were therefore based on few data points, so a few outliers could distort the results.
