Sacrifice a little accuracy for a lot more comprehensive coverage

I wrote this prologue for La revolución Google Scholar: Destapando la caja de Pandora académica (The Google Scholar revolution: Opening the academic Pandora's box) by Enrique Orduña-Malea, Alberto Martín-Martín, Juan M. Ayllón and Emilio Delgado López-Cózar before I started blogging. I thought it would be useful to feature it during my "August blog holiday".


If I were asked to summarise my “verdict” on Google Scholar in just 10 words, the title above would be a likely candidate. I have been following Google Scholar since its introduction in 2004, but my interest grew in 2006, when I needed to make a case for research impact in my application for promotion to full professor.

Working in International Business, I found that very few of the journals I had published in were listed in the Web of Science. Those that were listed had low journal impact factors, largely because many of the journals citing them were not (yet) included in the Web of Science. As a result, ten years ago it was very difficult for an International Business scholar to convince a promotion panel that they had achieved just as much impact as, for instance, academics working in the neighbouring disciplines of Economics and Psychology, where a far higher proportion of academic journals was ISI-listed.

Enter Google Scholar and Publish or Perish!

Enter Google Scholar! When searching for my work in Google Scholar, my case immediately looked much brighter. It turned out my work was actually very highly cited and even my non-traditional publications such as books, book chapters, and a journal ranking list had a strong impact in terms of citations. Unfortunately, the Google Scholar interface didn’t make it very easy to aggregate citation metrics in an accessible form that could be compared across academics.

Thus – in October 2006 – Publish or Perish (PoP) was born. PoP is a software program that retrieves and analyses academic citations. It uses Google Scholar to obtain the raw citations, then analyses these and presents a very wide range of metrics. For me, it was also the start of a new “research hobby”, doing research in the field of bibliometrics. Since then I have published about a dozen articles in journals such as Scientometrics, Journal of Informetrics, and Journal of the Association for Information Science & Technology. Most of these articles have used Google Scholar as a data source.

Google Scholar developments in the last decade

Google Scholar has come a long way since the early days. Its coverage has improved dramatically and is now better than that of either the Web of Science or Scopus in all disciplines (Harzing & Alakangas, 2016). As a result, many bibliometric studies now rely on Google Scholar (usually with Publish or Perish) as their data source. A Google Scholar search with "all of the words" set to Harzing "Publish or Perish" returns nearly 2,000 hits.

Discipline         Scopus citations as % of     Web of Science citations as %
                   Google Scholar citations     of Google Scholar citations
Humanities         11.5%                        7.0%
Social Sciences    30.0%                        22.7%
Engineering        57.6%                        45.7%
Sciences           64.2%                        65.6%
Life Sciences      70.5%                        66.8%

The table above summarises part of the results of our recent study of 146 academics in the Life Sciences, Sciences, Engineering, Social Sciences and Humanities (Harzing & Alakangas, 2016). As is immediately apparent, both the Web of Science and Scopus miss a huge number of citations in the Humanities and Social Sciences, mostly because they do not include book publications and, in some disciplines, cover only a fraction of the journals. However, even in Engineering, the Sciences and the Life Sciences, Google Scholar reports roughly one-and-a-half to twice as many citations as the Web of Science and Scopus.
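To make the "one-and-a-half to twice as many" comparison concrete, the percentages in the table can be turned into multiples: if a database captures p% of Google Scholar's citations, Google Scholar reports 100/p times as many. The short sketch below does that arithmetic. It assumes each percentage can be read as a simple ratio of citation counts; the study may have aggregated across academics differently, and the names `COVERAGE` and `gs_multiple` are purely illustrative.

```python
# Figures copied from the table above (assumed to be simple count ratios):
# discipline -> (Scopus %, Web of Science %) of Google Scholar citations.
COVERAGE = {
    "Humanities": (11.5, 7.0),
    "Social Sciences": (30.0, 22.7),
    "Engineering": (57.6, 45.7),
    "Sciences": (64.2, 65.6),
    "Life Sciences": (70.5, 66.8),
}


def gs_multiple(percent: float) -> float:
    """Return how many times more citations Google Scholar reports than a
    database whose count is `percent`% of the Google Scholar count."""
    if percent <= 0:
        raise ValueError("percent must be positive")
    return 100.0 / percent


for discipline, (scopus, wos) in COVERAGE.items():
    print(f"{discipline:<16} Scopus x{gs_multiple(scopus):.1f}   "
          f"Web of Science x{gs_multiple(wos):.1f}")
```

For Engineering this gives about 1.7 (Scopus) and 2.2 (Web of Science); for the Life Sciences about 1.4 and 1.5. In the Humanities the multiples jump to roughly 8.7 and 14.3.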

Google Scholar and Publish or Perish have democratised citation analysis

I would be the first to acknowledge that Google Scholar does have some important drawbacks, the most important of which are summarised in my Publish or Perish tutorial. There is also a non-negligible danger that easy access to bibliometric tools brings with it a certain level of inexpert or plainly ignorant use.

PoP used by academics, librarians, governments, grant agencies, and laboratories

However, in my view Google Scholar has played a major role in “democratising” citation analysis (see also Harzing & Mijnhardt, 2015). With Publish or Perish, anyone with a computer and Internet access can run bibliometric searches. Not surprisingly, the software is used all over the world: from individual academics and librarians in more than 80 countries to government departments (e.g. US Dept of Energy, US Environmental Protection Agency, US Agency for International Development), and from grant-giving agencies (e.g. SSHRC in Canada, CNRS in France) to research laboratories (e.g. Microsoft, Hewlett Packard, IBM).


PoP used at both highly ranked universities and universities in emerging countries

It is particularly gratifying to note that the software is widely used at highly ranked universities such as Harvard, Stanford, MIT, Oxford, and Cambridge, which have comprehensive access to commercial alternatives. It is even more satisfying, however, to see it used just as widely at under-resourced universities in countries such as Armenia, Botswana, Mongolia, Paraguay, Tajikistan, and Uruguay. More generally, over a thousand libraries worldwide list the software as a free alternative to Scopus and the Web of Science. Google Scholar and Publish or Perish clearly fill a need!

PoP used to improve transparency and meritocracy

Closer to (my) home, the software and Google Scholar are particularly popular in Italy, Greece, and Poland, countries in which many universities do not have access to Scopus or the Web of Science either. What I find particularly pleasing is that in both Italy and Greece, the software has been used regularly to promote transparency and meritocracy in university appointments.

PoP used to cover non-English language publications

Moreover, in these and other countries, academics who publish in their own languages rather than in English need to rely on Google Scholar if they want more than incidental coverage of their work. Just like East Asian academics, they might also find that Thomson Reuters Web of Knowledge isn’t particularly good at accurately distinguishing academics with non-English names (see Harzing, 2015).


The verdict?

Using Google Scholar means sacrificing a certain level of accuracy. In most bibliometric analyses, especially those at higher levels of aggregation and those focusing on robust metrics such as the h-index, Google Scholar’s inevitable slip-ups will not significantly influence the results. However, this is little solace for the (rare) individuals who are “robbed” of their most-cited publication because Google Scholar doesn’t parse their name correctly. If 99-100% accuracy is required, Google Scholar will always need to be triangulated with other data sources.

However, as a scholar in the Social Sciences, I am more than happy to accept the occasional Google Scholar lapse in return for coverage that is vastly superior to that of Scopus and the Web of Science and that does not discriminate against publication practices outside the (Life) Sciences (see also Harzing, 2013). Hence my opening quote: “Sacrifice a little accuracy for a lot more comprehensive coverage”. I might add: … “and a lightning-fast response”. By the time I have started up the Web of Knowledge or Scopus website, remembered my password and logged in, I have already finished my searches in Publish or Perish!
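The verdict above rests partly on the claim that robust metrics such as the h-index are barely affected by occasional Google Scholar slip-ups. The h-index is the largest h such that h publications each have at least h citations. The sketch below computes it from a list of citation counts and compares it with total citations when one record goes missing. The seven citation counts are entirely hypothetical, chosen only to illustrate the point; they are not taken from any real author or from Publish or Perish.

```python
def h_index(citations: list[int]) -> int:
    """Largest h such that h publications each have at least h citations."""
    ranked = sorted(citations, reverse=True)  # most-cited first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break  # counts only decrease from here, so no larger h is possible
    return h


# Hypothetical citation counts for one author's seven publications.
record = [50, 30, 12, 8, 5, 3, 1]

print(h_index(record), sum(record))              # full record
print(h_index(record[1:]), sum(record[1:]))      # most-cited paper missing
print(h_index(record[:-1]), sum(record[:-1]))    # least-cited paper missing
```

In this made-up record, losing a rarely cited paper leaves the h-index at 5, and even losing the most-cited paper only lowers it from 5 to 4, while total citations almost halve (109 to 59). That asymmetry is why the “robbed” individual still has cause to complain, even though aggregate h-index results hold up.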

Why do we need this book?

Although many researchers have used Google Scholar as a data source, none has been as diligent in building a clearer picture of Google Scholar as the EC3 Research Group. Since 2008, Emilio and his team have worked tirelessly to explore the inner workings of Google Scholar. I am therefore delighted that the experience they have gained to date has now been brought together in a monograph. My only regret is that it is not (yet) available in English!