Reflections on Google Scholar
Prof. Anne-Wil Harzing, University of Melbourne
Web: www.harzing.com
Email: pop@harzing.com
© Copyright 2007 Anne-Wil Harzing. All rights reserved.
Document link: http://www.harzing.com/pop_gs.htm
First version, 17 January 2007
Introduction
Instead of the Thomson ISI Web of Science (WoS), Publish or Perish uses Google
Scholar (GS) data to calculate its various statistics. An important practical
reason for this is that GS is freely available to anyone with an Internet
connection. The WoS is only available to those academics whose institutions
are able and willing to bear the (quite substantial) subscription costs of
the WoS and other databases in Thomson ISI's Web of Knowledge. As Pauly
& Stergiou (2005:34) indicate, "free access to [...] data provided
by GS provides an avenue for more transparency in tenure reviews, funding
and other science policy issues, as it allows citation counts, and analyses
based thereon, to be performed and duplicated by anyone". However, there
are several other good reasons to use GS to perform citation analyses, which
will be covered in this note.
General caveat
The output of Publish or Perish is only as good as its input. Whilst I do believe
that GS presents a more complete picture of an academic's impact than
the Thomson ISI WoS, all databases have their limitations. I would like to
suggest the following general rule of thumb. If an academic shows good citation
metrics, it is very likely that he or she has made a significant impact on
the field. However, the reverse is not necessarily true. If an academic shows
weak citation metrics, this may be caused by a lack of impact on the field. However,
it may also be caused by working in a small field, publishing in a language
other than English (LOTE), or publishing mainly (in) books. Although GS performs
better than the WoS in this respect, it is still not very good at capturing
LOTE articles and citations, or citations in books or book chapters. As a
result, citation metrics in the Social Sciences, and even more so in the Humanities,
will always be underestimated, because publications in LOTE and in books or
book chapters are more common in these disciplines than in the Sciences.
The disadvantage of using Thomson ISI Web of Science for citation analyses
The major disadvantage of the WoS is that it may provide a substantial underestimation
of an individual academic's actual citation impact. This is true both
for the WoS general search function and for the WoS cited reference
function, the two functions most commonly used to perform citation analyses.
However, the general search function performs more poorly
in this respect than the cited reference function. For example,
the current (January 2007) number of citations to my own work is 97 with the
general search function, 287 with the cited reference
function and 658 with GS. My h-index is 7 with the general search
function, 10 with the cited reference function and 13 with GS.
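For readers unfamiliar with the h-index: an academic has index h if h of his or her papers have at least h citations each. The calculation can be sketched in a few lines of Python. This is a minimal illustration only; the function name and the citation counts are hypothetical and are not taken from any of the databases discussed in this note.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical citation counts per paper (illustrative only).
example_counts = [25, 8, 5, 3, 3, 1, 0]
print(h_index(example_counts))  # three papers have at least 3 citations -> 3
```

Because the same list of papers can carry different citation counts in different databases, the same academic can end up with a different h-index depending on the source used.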
Differences will not be as dramatic for all scholars, but virtually all academics
show a substantially higher number of citations in GS than in the WoS. For
instance, Nisonger (2004) found that (excluding self-citations) WoS captured
only 28.8% of his total citations, 42.2% of his print citations, 20.3% of
his citations from outside the United States, and a mere 2.3% of his non-English
citations. He suggests that librarians and faculty should not rely solely
on WoS author citation counts, especially when demonstration of international
impact is important. Nisonger also summarises several other studies that found
WoS citation data to be incomplete.
Meho & Yang (2006) conducted a large-scale comparison between WoS, Scopus
(Elsevier's alternative to Thomson ISI's WoS) and GS covering citations
of over 1,000 scholarly works of all 15 faculty members of the School of Library
and Information Science at Indiana University Bloomington between 1996 and
2005. They found the overlap in citations between the three databases to be
rather small. The overlap between WoS and Scopus was 58.2%. The overlap between
GS and the union of WoS and Scopus was only 30.8%. This small overlap is largely
caused by the fact that GS produced more than twice as many citations as WoS
and nearly twice as many citations as Scopus. Many of those additional citations
came from conference papers, doctoral dissertations, master's theses,
books and book chapters.
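To see why a source that finds many more citations tends to show a small overlap with the others, it can help to treat each database's citations as a set. The Python sketch below assumes that overlap is measured as the share of all distinct citations found by both sources; whether Meho & Yang used exactly this definition is an assumption on my part, and the function names and citation identifiers are hypothetical.

```python
def overlap_share(a, b):
    """Share of all distinct citations (the union) that appear in both sources."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def missed_share(reference, other):
    """Share of the citations in `reference` that `other` fails to find."""
    reference, other = set(reference), set(other)
    if not reference:
        return 0.0
    return len(reference - other) / len(reference)


# Hypothetical citing-document identifiers (illustrative only).
source_a = {"c1", "c2", "c3", "c4"}
source_b = {"c3", "c4", "c5", "c6", "c7", "c8"}

print(overlap_share(source_a, source_b))  # 2 shared out of 8 distinct -> 0.25
print(missed_share(source_a, source_b))   # 2 of the 4 in source_a missed -> 0.5
```

In this toy example the larger source finds half of the smaller source's citations, yet the overlap is only a quarter, simply because the larger source contributes so many citations of its own.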
At the same time both sources (WoS and GS) have been shown to rank specific
groups of scholars in a relatively similar way. Saad (2006) found that for
his subset of 55 scientists in consumer research, the correlation between
the two h-indices was 0.82. Please note that this does not invalidate the
earlier argument, as it simply means that most academics' h-indices are underestimated
by a similar magnitude by WoS. Meho & Yang (2006) also found that adding
GS results to those of WoS and Scopus separately did
not significantly change the ranking of the 15 academics in their survey.
The correlation between GS and WoS was 0.874, between GS and the union of
WoS and Scopus 0.976.
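The point that two sources can rank scholars similarly even when one of them systematically reports lower figures can be illustrated with a correlation coefficient. The Python sketch below computes a Pearson correlation by hand. This note does not record which coefficient the cited studies used, so Pearson is an assumption, and the h-index values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical h-indices for five scholars in two sources (illustrative only).
# The second source reports roughly double the first for every scholar.
wos_h = [4, 6, 7, 9, 12]
gs_h = [7, 11, 13, 16, 22]

print(round(pearson(wos_h, gs_h), 3))  # close to 1 despite the lower WoS values
```

A high correlation therefore says that the ordering of scholars is similar in both sources. It says nothing about whether the absolute citation counts agree.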
Meho & Yang (2006) conclude that GS can help identify a significant number
of unique citations. These unique citations might not significantly alter
one's citation ranking in comparison to other academics in the same field
and might not all be of the same quality as those found in the WoS or Scopus.
However, they can be very useful in showing evidence of broader intellectual
and international impact than is possible with WoS and Scopus. Hence they
conclude GS could be particularly helpful for academics seeking promotion,
tenure, faculty positions, research grants, etc.
Why Thomson ISI Web of Science underestimates true citation impact
There are a number of reasons for the underestimation of citation impact by
Thomson ISI WoS.
- In the General Search function WoS only includes citations to journal
articles published in ISI-listed journals (Roediger, 2006). Citations
to books, book chapters, dissertations, theses, working papers, reports,
conference papers, and journal articles published in non-ISI journals
are not included. Whilst in the Natural Sciences and Engineering (NSE)
this may give a fairly comprehensive picture of an academic's total
output, in the Social Sciences and Humanities (SSH) only a limited number
of journals are ISI-listed. Also, in both the Social Sciences and the
Humanities books and book chapters are very important publication outlets.
GS includes citations to all academic publications regardless of whether
they appeared in ISI-listed journals (Belew, 2005; Meho & Yang, 2006).
- In the Cited Reference function WoS does include citations to non-ISI
publications. However, it only includes citations from journals that are
ISI-listed (Meho & Yang, 2006). As indicated before, in SSH only a
limited number of journals are ISI-listed. Butler (2006) analysed the
distribution of publication output by field for Australian universities
between 1999 and 2001. She found that whereas for the Chemical, Biological,
Physical and Medical/Health sciences between 69.3% and 84.6% of the publications
are in ISI-listed journals, for Social Sciences such as Management, History,
Education and Arts only 4.4%-18.7% of the publications are published in
ISI-listed journals. ISI estimates that of the 2000 new journals reviewed
annually only 10-12% are selected to be included in the WoS (Testa, 2004).
Archambault & Gagné (2004) found that US and UK-based journals
are both significantly over-represented in the WoS in comparison to Ulrich's
journal database. This over-representation was stronger for the Social
Sciences and Humanities than for the Natural Sciences and Engineering.
In contrast to the WoS, GS includes citations from all academic publications
regardless of where they appeared. However, it must be acknowledged that
although GS captures more citations in books and book chapters than the
WoS (which captures none), it is by no means comprehensive in this respect.
Google Book Search may provide a better alternative for book searches.
- In the General Search function WoS does not include citations to the same
work that have small mistakes in their referencing (which especially for
books and book chapters occurs very frequently). In the Cited Reference
function WoS does include these citations, but they are not aggregated
with the other citations. GS appears to have a better aggregation mechanism
than WoS. Even though duplicate publications that are referenced in a
(slightly) different way still occur, GS has a grouping function that
resolves the worst ambiguities. For instance, my 1996 publication with
Geert Hofstede in the research annual Research in the Sociology of Organizations
draws 15 WoS citations, but these are spread over 7 different appearances.
GS shows 23 citations and has only one appearance for the publication.
Belew (2005) confirms that GS has lower citation noise than WoS. In the
WoS only 60% of the articles were listed as unique entries (i.e. no citation
variations), while for GS this was 85%. None of the articles in his sample
had more than five separate listings within GS, while 13% had five or
more entries in the WoS.
- Whilst the Cited Reference function of WoS does include citations to non-ISI
journals, it only includes these publications for the first author. Hence
any publications in non-ISI journals where the academic in question is
the second or further author are not included. GS includes these publications
for all listed authors. For instance, my 2003 publication with Alan Feely
in Cross Cultural Management shows no citations in the WoS for my name,
whilst it shows 10 citations in GS.
- The WoS includes only a very limited number of journals in languages other
than English (LOTE) and hence citations in non-English journals are generally
not included in any WoS citation analysis. Whilst GS's LOTE coverage
is far from comprehensive, it does include a larger number of publications
in other languages and indexes documents in French, German, Spanish, Italian
and Portuguese (Noruzi, 2005). Meho & Yang (2006) found that 6.94%
of GS citations were from LOTE, while this was true for only 1.14% of
WoS citations and 0.70% of Scopus citations. Archambault & Gagné (2004) found
that Thomson ISI's journal selection favours English, a situation
attributable to ISI's inability to analyse the content of journals
in LOTE.
- The WoS only includes citations in published journal articles. GS also
includes citations in conference papers, working papers, and pre-prints
of articles to appear in journals (Meho & Yang, 2006). As a result,
GS provides a more comprehensive picture of recent impact, especially
for the Social Sciences and Humanities where more than five years can
elapse between research appearing as a working or conference paper and
research being published in a journal. This also means that GS usually
gives a more accurate picture of impact for junior academics.
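The aggregation problem described in the list above, where one publication is cited under several slightly different reference strings, can be illustrated with a simple normalisation step. The Python sketch below is purely illustrative: it is not how GS or the WoS group citations (neither documents its method in the sources cited here), and the function names, reference strings and counts are hypothetical.

```python
import re
from collections import defaultdict


def normalise(reference):
    """Lower-case the reference, drop punctuation and collapse whitespace."""
    # Note: this crude rule also drops accented letters, so it is only
    # suitable for a toy example with plain ASCII reference strings.
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", reference.lower())
    return " ".join(cleaned.split())


def aggregate(variant_counts):
    """Sum citation counts over reference variants that normalise identically."""
    totals = defaultdict(int)
    for reference, count in variant_counts:
        totals[normalise(reference)] += count
    return dict(totals)


# Three hypothetical spellings of the same reference (illustrative only).
variants = [
    ("Author, A. (1996) Example Title.", 6),
    ("AUTHOR A (1996) Example title", 5),
    ("Author, A (1996): Example  Title", 4),
]

print(aggregate(variants))  # {'author a 1996 example title': 15}
```

Without such a grouping step the three variants would appear as three separate publications with 6, 5 and 4 citations, which would affect both total counts per publication and metrics such as the h-index.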
The disadvantage of using Google Scholar for citation analyses
There are, however, some disadvantages to the use of GS that are not shared
by Thomson ISI WoS:
- GS sometimes includes non-scholarly citations, such as student handbooks,
library guides or editorial notes. However, incidental problems in this
regard are unlikely to distort citation metrics, especially robust ones
such as the h-index. A casual inspection of my most cited paper (The persistent
myth of high expatriate failure) shows that more than 75% of the citations
are in academic journals, with the remainder appearing in books, conference
papers, working papers and student theses. No non-scholarly citations
were found. Moreover, I would argue that even a citation in a student handbook,
library guide or editorial note shows that the academic has an impact
on the field, even if the field is not narrowly defined as the academic's
scholarly colleagues.
- Not all scholarly journals are indexed in GS. Unfortunately, GS is not
very open about its coverage and hence it is unclear what its sources
are. It is generally believed that Elsevier journals are not included
(Meho & Yang, 2006), because Elsevier has a competing commercial product
in Scopus. However, I was able to find all Elsevier journals I have published
in. On the other hand, Meho & Yang (2006) did find that GS missed
40.4% of the citations found by the union of WoS and Scopus, suggesting
that GS does miss some important refereed citations. It must also be said
though that the union of WoS and Scopus misses 61.04% of the citations
in GS. Further, Meho & Yang (2006) found that most of the citations
uniquely found by GS are from refereed sources.
- GS does not perform as well for older publications as these publications
and the publications that cite them have not (yet) been posted on the
web. Pauly & Stergiou (2005) found that GS had less than half of the
citations of the WoS for a specific set of papers published in a variety
of disciplines (mostly in the Sciences) between 1925 and 1989. However, for
papers published in the 1990-2004 period both sources gave similar citation
counts. The authors expect GS's performance to improve for old articles
as journals' back issues are posted on the web. Meho & Yang (2006)
found the majority of the citations from journals and conference papers
in GS to be from after 1993. Belew (2005) found GS to be competitive in
terms of coverage for references published in the last 20 years, but the
WoS superior before then. This means that GS might underestimate the impact
of scholars who have mainly published before 1990.
- GS's processing is done automatically without manual cleaning and
hence sometimes provides nonsensical results. For instance, one of the
citations to my Managing the Multinationals book lists as its title "K.,
1999". The author of the citing paper listed my initials with a comma
after the first two initials and hence GS interpreted the third initial
and year as the title. However, incidental mistakes like this are unlikely
to have a major impact on citation metrics, especially those as robust
as the h-index. Moreover, GS is committed to fixing mistakes (GS help function).
References
Archambault, E.; Gagné, E.V. (2004) The Use of Bibliometrics in Social
Sciences and Humanities, Montreal: Social Sciences and Humanities Research
Council of Canada (SSHRCC), August 2004.
Belew, R.K. (2005) Scientific impact quantity and quality: Analysis of two
sources of bibliographic data, arXiv:cs.IR/0504036 v1, 11 April
2005.
Butler, L. (2006) RQF Pilot Study Project History and Political Science
Methodology for Citation Analysis, November 2006, accessed from: http://www.chass.org.au/papers/bibliometrics/CHASS_Methodology.pdf,
15 Jan 2007.
Meho, L.I.; Yang, K. (2006) A New Era in Citation and Bibliometric Analyses:
Web of Science, Scopus, and Google Scholar, under review at Journal
of the American Society for Information Science and Technology, accessed
from: http://dlist.sir.arizona.edu/1695/,
15 Jan 2007.
Nisonger, T.E. (2004) Citation autobiography: An investigation of ISI database
coverage in determining author citedness, College & Research Libraries,
vol. 65, no. 2, pp. 152-163.
Noruzi, A. (2005) Google Scholar: The New Generation of Citation Indexes,
LIBRI, vol. 55, no. 4, pp. 170-180.
Pauly, D.; Stergiou, K.I. (2005) Equivalence of results from two citation
analyses: Thomson ISI's Citation Index and Google Scholar's service,
Ethics in Science and Environmental Politics, December, pp. 33-35.
Roediger III, H.L. (2006) The h index in Science: A New Measure of Scholarly
Contribution, APS Observer: The Academic Observer, vol. 19, no.
4.
Saad, G. (2006) Exploring the h-index at the author and journal levels using
bibliometric data of productive consumer scholars and business-related journals
respectively, Scientometrics, vol. 69, no. 1, pp. 117-120.
Testa, J. (2004) The Thomson Scientific Journal Selection Process, http://scientific.thomson.com/free/essays/selectionofmaterial/journalselection/,
accessed 15 Jan 2007.