Prof. Anne-Wil Harzing, University of Melbourne
Web: www.harzing.com
Email: pop@harzing.com
© Copyright 2007-2008 Anne-Wil Harzing. All rights reserved.
Document link: http://www.harzing.com/pop_gs.htm
Seventh version, 5 February 2008.
Instead of the Thomson ISI Web of Science, Publish or Perish uses Google Scholar data to calculate its various statistics. An important practical reason for this is that Google Scholar is freely available to anyone with an Internet connection and is generally praised for its speed (Bosman et al. 2006). The Web of Science is only available to those academics whose institutions are able and willing to bear the (quite substantial) subscription costs of the Web of Science and other databases in Thomson ISI's Web of Knowledge. As Pauly & Stergiou (2005:34) indicate, "free access to [...] data provided by Google Scholar provides an avenue for more transparency in tenure reviews, funding and other science policy issues, as it allows citation counts, and analyses based thereon, to be performed and duplicated by anyone". Alastair Smith (2008) compared citation counts from Google Scholar to the research output from universities under New Zealand's PBRF (Performance Based Research Funding) research assessment exercise and found a very high (0.94) correlation between the PBRF output (defined as PBRF quality score times the FTE staff size) and the total number of citations returned by Google Scholar. Beyond free access, there are several other good reasons to use Google Scholar to perform citation analyses, which will be covered in this note.
The output of Publish or Perish is only as good as its input. Whilst I do believe that in most cases Google Scholar presents a more complete picture of an academic's impact than the Thomson ISI Web of Science, all databases have their own limitations, most of which are discussed in detail below. More generally, citations are subject to many forms of error, from typographical errors in the source paper, to errors in Google Scholar's parsing of the reference, to errors due to nonstandard reference formats. Publications such as books or conference proceedings are treated inconsistently, both in the literature and in Google Scholar. Thus citations to these works can be complete, completely missing, or anywhere in between.
Several academics have been very critical of Google Scholar. Péter Jacsó in particular has published some highly critical papers in Online Information Review (Jacsó, 2005, 2006a/b) discussing a limited number of Google Scholar failures in great detail. Although some of his critique is no doubt justified, I was unable to reproduce most of the Google Scholar failures detailed in his papers, suggesting that either they resulted from faulty searches or that Google Scholar has rectified these failures. Jacsó's claim that Google Scholar reports higher citation counts for certain disciplines, but not for the Social Sciences and Humanities, is certainly inaccurate, as much larger-scale studies (Bosman et al. 2006; Kousha & Thelwall, 2007) find the opposite result. Most importantly, the bulk of Jacsó's critique is leveled at inconsistent results for keyword searches, which are not relevant for the author and journal impact searches conducted with Publish or Perish. In addition, the summary metrics in Publish or Perish (e.g. h-index, g-index) are fairly robust and insensitive to occasional errors.
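To illustrate why the h-index and g-index are insensitive to occasional errors, the following is a minimal sketch using the standard definitions of both metrics. It is not the code used in Publish or Perish; the function names and the citation counts are hypothetical and chosen only for illustration.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the top g papers together have at least g*g citations."""
    ranked = sorted(citations, reverse=True)
    total = 0
    g = 0
    for rank, count in enumerate(ranked, start=1):
        total += count
        if total >= rank * rank:
            g = rank
    return g


# Hypothetical citation counts for one author, one entry per paper.
papers = [20, 12, 9, 7, 6, 6, 3, 1, 0, 0]

# The same record with one spurious citation added to a low-cited paper.
papers_with_error = [20, 12, 9, 7, 6, 6, 3, 2, 0, 0]

print(h_index(papers), g_index(papers))
print(h_index(papers_with_error), g_index(papers_with_error))
```

In this made-up record the stray citation changes the total citation count but leaves both summary metrics unchanged, which is the kind of robustness referred to above.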
When using Publish or Perish for citation analyses, I would like to suggest the following general rule of thumb. If an academic shows good citation metrics, it is very likely that he or she has made a significant impact on the field. However, the reverse is not necessarily true. If an academic shows weak citation metrics, this may be caused by a lack of impact on the field. However, it may also be caused by working in a small field, publishing in a language other than English (LOTE), or publishing mainly (in) books. Although Google Scholar performs better than the Web of Science in this respect, it is still not very good at capturing LOTE articles and citations, or citations in books or book chapters. As a result, citation metrics in the Social Sciences, and even more so in the Humanities, will always be underestimated, as in these disciplines publications in LOTE and books/book chapters are more likely than in the Sciences.
The major disadvantage of the Web of Science is that it may provide a substantial underestimation of an individual academic's actual citation impact. This is true both for the general search function and for the Web of Science cited reference function, the two functions most generally used to perform citation analyses. However, the Web of Science general search function performs more poorly in this respect than the cited reference function. For example, the current (August 2007) number of citations to my own work is around 120 with the general search function, around 310 with the cited reference function and 803 with Google Scholar. My h-index is 7 with the general search function, 12 with the cited reference function and 15 with Google Scholar.
Differences will not be as dramatic for all scholars, but many academics show a substantially higher number of citations in Google Scholar than in the Web of Science. For instance, Nisonger (2004) found that (excluding self-citations) Web of Science captured only 28.8% of his total citations, 42.2% of his print citations, 20.3% of his citations from outside the United States, and a mere 2.3% of his non-English citations. He suggests that librarians and faculty should not rely solely on Web of Science author citation counts, especially when demonstration of international impact is important. Nisonger also summarises several other studies that found Web of Science citation data to be incomplete.
Meho & Yang (2007) conducted a large-scale comparison between Web of Science, Scopus (Elsevier's alternative to Thomson ISI's Web of Science) and Google Scholar covering citations of over 1,000 scholarly works of all 15 faculty members of the School of Library and Information Science at Indiana University Bloomington between 1996 and 2005. They found the overlap in citations between the three databases to be rather small. The overlap between Web of Science and Scopus was 58.2%. The overlap between Google Scholar and the union of Web of Science and Scopus was only 30.8%. This small overlap is largely caused by the fact that Google Scholar produced more than twice as many citations as Web of Science and nearly twice as many citations as Scopus. Many of those additional citations came from conference papers, doctoral dissertations, master's theses, and books and book chapters.
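The overlap figures above can be made concrete with a small sketch. It assumes that overlap is measured as the share of all distinct citations (the union of two sources) that appear in both sources; this note does not restate the exact definition used in the study, so treat that as an assumption. The citation identifiers and the function name are hypothetical.

```python
def overlap_share(source_a, source_b):
    """Share of the union of two citation sets that is found in both sets.

    Assumed definition: |A intersect B| / |A union B|.
    """
    union = source_a | source_b
    if not union:
        return 0.0
    return len(source_a & source_b) / len(union)


# Hypothetical citing documents found by each database for one author.
wos = {"c1", "c2", "c3", "c4"}
scopus = {"c2", "c3", "c4", "c5"}
scholar = {"c1", "c4", "c6", "c7", "c8", "c9", "c10"}

# Overlap between the two subscription databases.
print(overlap_share(wos, scopus))

# Overlap between Google Scholar and the union of the other two.
print(overlap_share(scholar, wos | scopus))
```

With this definition, a source that finds many citations the others miss pulls the overlap down even when it also finds a good part of what the others report, which matches the explanation given above for the low Google Scholar overlap.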
At the same time, both sources (Web of Science and Google Scholar) have been shown to rank specific groups of scholars in a relatively similar way. Saad (2006) found that for his subset of 55 scientists in consumer research, the correlation between the two h-indices was 0.82. Please note that this does not invalidate the earlier argument, as it simply means most academics' h-indices are underestimated by a similar magnitude by Web of Science. Meho & Yang (2007) also found that when Google Scholar results were added to those of Web of Science and Scopus separately, its results did not significantly change the ranking of the 15 academics in their survey. The correlation between Google Scholar and Web of Science was 0.874, between Google Scholar and the union of Web of Science and Scopus 0.976.
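The point that a high correlation is compatible with systematic underestimation can be shown with a short sketch. It uses the Pearson correlation coefficient as an assumption; the studies cited above may have used a different (for example rank-based) coefficient. The h-index values are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical h-indices for five academics in the two sources.
# Every Web of Science value is roughly half the Google Scholar value.
scholar_h = [15, 22, 9, 30, 12]
wos_h = [7, 12, 4, 16, 5]

r = pearson(scholar_h, wos_h)
print(round(r, 3))
```

In this invented sample the two sources order the academics identically and the correlation is close to 1, even though every Web of Science h-index is far below its Google Scholar counterpart.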
Meho & Yang (2007) conclude that Google Scholar can help identify a significant number of unique citations. These unique citations might not significantly alter one's citation ranking in comparison to other academics in the same field and might not all be of the same quality as those found in the Web of Science or Scopus. However, they can be very useful in showing evidence of broader intellectual and international impact than is possible with Web of Science and Scopus. Hence they conclude that Google Scholar could be particularly helpful for academics seeking promotion, tenure, faculty positions, research grants, etc.
There are a number of reasons for the underestimation of citation impact by Thomson ISI Web of Science.
There are, however, some disadvantages to the use of Google Scholar that are not shared by Thomson ISI Web of Science:
Google Scholar sometimes includes non-scholarly citations such as student handbooks, library guides or editorial notes. However, incidental problems in this regard are unlikely to distort citation metrics, especially robust ones such as the h-index. An inspection of my own papers shows that in general more than 75% of the citations are in academic journals, with the remainder appearing in books, conference papers, working papers and student theses. Few non-scholarly citations were found. Moreover, I would argue that even a citation in a student handbook, library guide or editorial note shows that the academic has an impact on the field, even if the field is not narrowly defined as the academic's scholarly colleagues.
In a similar vein, Vaughan and Shaw (2008) argue that 92% of the citations identified by Google Scholar in the field of library and information science represented intellectual impact, primarily citations from journal articles.
Although, for reasons discussed above, Google Scholar generally provides a higher citation count than ISI, this might not be true for all fields of study. The social sciences, arts and humanities, and engineering in particular seem to benefit from Google Scholar's better coverage of (citations in) books, conference proceedings and a wider range of journals. The Natural and Health Sciences are generally well covered in ISI and hence Google Scholar might not provide higher citation counts. In addition, for some disciplines in the Natural and Health Sciences Google Scholar's journal coverage seems to be very patchy. This leads to citation counts in these areas that might actually be much lower than those in ISI. In a systematic comparison of 64 articles in different disciplines, Bosman et al. (2006) found overall coverage of Google Scholar to be comparable with both Web of Science and Scopus and slightly better for articles published in 2000 than in 1995. However, huge variations were apparent between disciplines, with Chemistry and Physics in particular showing very low Google Scholar coverage and Science and Medicine also showing lower coverage than in Web of Science.
Based on a sample of 1650 articles, Kousha & Thelwall (2007, 2008) found Google Scholar coverage to be less comprehensive than ISI in the three Science disciplines included in their study (Biology, Chemistry and Physics), with Google Scholar showing a particularly low coverage for Chemistry. Google Scholar coverage for the four Social Sciences included in their study (Education, Economics, Sociology and Psychology) as well as Computing was significantly higher than ISI coverage. Similarly, Bar-Ilan (2008) finds the number of Google Scholar citations to be substantially higher than the WoS and Scopus counts for mathematicians and computer scientists, but lower for high-energy physicists.
More detailed comparisons by academics working in the respective areas would be necessary before we can draw general conclusions. However, as a general rule of thumb, I would suggest that using Google Scholar might be most beneficial for three of the Google Scholar categories: Business, Administration, Finance & Economics; Engineering, Computer Science & Mathematics; Social Sciences, Arts & Humanities. Although broad comparative searches can be done for other disciplines, I would not encourage heavy reliance on Google Scholar for individual academics working in other areas without verifying results with either Scopus or Web of Science.
Archambault, E.; Gagné, E.V. (2004) The Use of Bibliometrics in Social Sciences and Humanities, Montreal: Social Sciences and Humanities Research Council of Canada (SSHRCC), August 2004.
Bar-Ilan, J. (2008) Which h-index? - A comparison of WoS, Scopus and Google Scholar, Scientometrics, vol. 74, no. 2, pp. 257-271.
Belew, R.K. (2005) Scientific impact quantity and quality: Analysis of two sources of bibliographic data, arXiv:cs.IR/0504036 v1, 11 April 2005.
Bosman, J.; Mourik, I. van; Rasch, M.; Sieverts, E.; Verhoeff, H. (2006) Scopus reviewed and compared. The coverage and functionality of the citation database Scopus, including comparisons with Web of Science and Google Scholar, Utrecht: Utrecht University Library, http://igitur-archive.library.uu.nl/DARLIN/2006-1220-200432/Scopus doorgelicht & vergeleken - translated.pdf.
Butler, L. (2006) RQF Pilot Study Project History and Political Science Methodology for Citation Analysis, November 2006, accessed from: http://www.chass.org.au/papers/bibliometrics/CHASS_Methodology.pdf, 15 Jan 2007.
Jacsó, P. (2005) Google Scholar: the pros and the cons, Online Information Review, vol. 29, no. 2, pp. 208-214.
Jacsó, P. (2006a) Dubious hit counts and cuckoo's eggs, Online Information Review, vol. 30, no. 2, pp. 188-193.
Jacsó, P. (2006b) Deflated, inflated and phantom citation counts, Online Information Review, vol. 30, no. 3, pp. 297-309.
Kousha, K.; Thelwall, M. (2007) Google Scholar Citations and Google Web/URL Citations: A Multi-Discipline Exploratory Analysis, Journal of the American Society for Information Science and Technology, vol. 58, no. 7, pp. 1055-1065.
Kousha, K.; Thelwall, M. (2008) Sources of Google Scholar citations outside the Science Citation Index: A comparison between four science disciplines, Scientometrics, vol. 74, no. 2, pp. 273-294.
Meho, L.I.; Yang, K. (2007) A New Era in Citation and Bibliometric Analyses: Web of Science, Scopus, and Google Scholar, Journal of the American Society for Information Science and Technology, vol. 58, no. 13, pp. 2105-2125.
Nisonger, T.E. (2004) Citation autobiography: An investigation of ISI database coverage in determining author citedness, College & Research Libraries, vol. 65, no. 2, pp. 152-163.
Noruzi, A. (2005) Google Scholar: The New Generation of Citation Indexes, LIBRI, vol. 55, no. 4, pp. 170-180.
Pauly, D.; Stergiou, K.I. (2005) Equivalence of results from two citation analyses: Thomson ISI's Citation Index and Google Scholar's service, Ethics in Science and Environmental Politics, December, pp. 33-35.
Roediger III, H.L. (2006) The h index in Science: A New Measure of Scholarly Contribution, APS Observer: The Academic Observer, vol. 19, no. 4.
Saad, G. (2006) Exploring the h-index at the author and journal levels using bibliometric data of productive consumer scholars and business-related journals respectively, Scientometrics, vol. 69, no. 1, pp. 117-120.
Smith, A.G. (2008) Benchmarking Google Scholar with the New Zealand PBRF research assessment exercise, Scientometrics, vol. 74, no. 2, pp. 309-316.
Testa, J. (2004) The Thomson Scientific Journal Selection Process, http://scientific.thomson.com/free/essays/selectionofmaterial/journalselection/, accessed 15 Jan 2007.
Vaughan, L.; Shaw, D. (2008) A new look at evidence of scholarly citations in citation indexes and from web sources, Scientometrics, vol. 74, no. 2, pp. 317-330.
Copyright © 1997-2008 by Anne-Wil Harzing. All rights reserved. This page was last modified on 24/06/08 12:38.