Running the REF on a rainy Sunday afternoon: Do metrics match peer review?

White paper that proposes replacing the REF with a metrics-based exercise

Anne-Wil Harzing - Wed 2 Aug 2017 14:15 (updated Sat 15 Apr 2023 07:40)

Prof. Anne-Wil Harzing, Middlesex University
Web: www.harzing.com, Email: anne@harzing.com

First version, 2 August 2017 with minor additions on 14 and 16 August.

Abstract

This white paper focuses on a single key issue: the feasibility of replacing the cumbersome, convoluted, and contested REF process with a much simpler exercise based on metrics. Using the free Publish or Perish software to source citation data from Microsoft Academic (MA), I was able to create a citation-based ranking of British universities that correlates with the REF power ranking at 0.97. The whole process took me less than two hours of data collection and less than two hours of analysis.

So I suggest we replace the REF with metrics and use the time and money thus saved more productively. Let's stop wasting our time on evaluating individuals for REF submission or serving on REF panels that evaluate the “end products” of our colleagues' research activity. Instead, let's spend our “evaluation time” where it really matters, i.e. in reading papers and funding applications of our (junior) colleagues before submission, and in carefully evaluating their cases for tenure and promotion. Wouldn’t that create a far more positive and productive research culture?

Introduction

For those blissfully unaware of what the REF (Research Excellence Framework) is, please see this Guardian article for a good summary of the process. Two other Guardian articles provide good summaries of its five key problems and its devastating effects on staff morale and research cultures. The Council for the Defence of British Universities, whose founding members include a dozen past and current presidents of Britain’s foremost learned societies, Nobel prize winners, former Vice-Chancellors and cabinet ministers, and “national treasures” such as David Attenborough and Alan Bennett, has also published an excellent critique of the REF.

This white paper instead focuses on a single key issue: the feasibility of replacing the cumbersome, convoluted, and contested REF process with a much simpler exercise based on metrics. As the quotes below show, I am by no means the first to argue for ditching the current REF process in favour of a simpler way to distribute Quality Related (QR) research funding.

“The research assessment exercise started life as a simple measurement tool. It has grown into something completely different, a powerful and often perverse driver of academic behaviour. We need to get back to first principles, and design an assessment system that is at once simpler and more open.” [Peter Scott, Professor of Higher Education Studies, UCL Institute of Education, University of London in the Guardian]

“If metrics-based measures can produce much the same results as those arrived at through an infinitely more costly, laborious and time-consuming process of ‘expert review’ of individual outputs, there is a compelling reason to go with the metrics; not because it is necessarily a valid measure of anything but because it is as reliable as the alternative (whose validity […] is no less dubious for different reasons) and a good deal more cost-efficient.” [Derek Sayer, 2015:89, Emeritus Professor of Sociology, University of Alberta]

"Having a large group of experts reading vast numbers of outputs and trying to compare their value is costly and often inaccurate. We therefore need to develop systems that are sensitive to quality without involving enormous efforts and complexity. For this reason, journal publications and citation counts can hardly be avoided. Peer reviews [AWH: as in peer review for journals] and citations generally reflect the judgments of fellow academics. They are less sensitive to capricious opinions, biased judgments, local conventions and preferences, and nepotism. The expert opinions of evaluation panels, whose members have their own likes and dislikes, specialisms and blind spots, must be balanced with some more impersonal criteria, like the ability to generate research funds and earn citations from other scholars." [Alvesson, Gabriel & Paulsen, 2017:125]

I am not the first to conduct empirical research on the extent to which metrics match the results of peer review either. For a detailed literature review, see The Metric Tide report; a few key examples can be found in blogposts by Dorothy Bishop and Jon Clayden and in journal articles in Scientometrics (Mryglod, Kenna, Holovatch & Berche, 2013) and the British Journal of Management (Taylor, 2011). However, these studies typically use limited samples and/or employ fairly time-consuming data collection methods. They also rely on commercial, subscription-based databases such as the Web of Science or Scopus that have limited coverage in the Social Sciences and Humanities and, to a lesser extent, Engineering (Harzing & Alakangas, 2016).

Most recently, Mingers, O’Hanley & Okunola (2017) used a freely accessible database – Google Scholar – and created a university ranking based on citations drawn from Google Scholar Citation Profiles. Although inventive, this method is still fairly time-consuming and relies on academics having created a Google Scholar Citation Profile. As my research has shown (Harzing & Alakangas, 2017a), uptake of these profiles is by no means universal. Uptake might also depend heavily on university policies promoting their use, thus leading to unbalanced comparisons between universities.

Hence, I took a different approach and used Publish or Perish to source citation data from another freely accessible database – Microsoft Academic (MA) – which has been shown to have comprehensive coverage across disciplines (Harzing & Alakangas, 2017a, 2017b) and, unlike Google Scholar, allows searching by affiliation and field.

For those of you who can’t wait, I’ll give you my key take-away right now: On a rainy Sunday afternoon, I was able to create a citation-based ranking of British universities that correlates with the REF power ranking at 0.97. It took me less than two hours of data collection and less than two hours of analysis. For the full story, read on…

Nine methodological differences between REF and citation ranking

The full discussion of my methods – which is likely to bore the hell out of most people – can be found later on in this white paper. Briefly, both the REF ranking and my proposed ranking evaluate publications published between 2008 and 2013. However, the first does so through peer review of these publications, the second through citations to these publications. In addition, there are eight other ways in which these two approaches differ.

1. The REF assesses not just publications, but also non-academic impact, through impact case studies, and the research environment, through a combination of metrics and narrative. My approach only looks at [citations to] publications.
2. The REF only includes publications by academics that were selected by their institution for submission to the REF. My approach includes publications by all academics in the institution.
3. The REF includes a maximum of four publications for each academic. My approach includes all publications for each academic, regardless of whether she has published one, two, three, four or 50 papers over the six-year period.
4. The REF only includes academics employed at the institution at the census date. My approach includes all publications that carry the university’s affiliation. If “4* academics” move institutions or are given a fractional appointment just before the census date, this influences the REF ranking, but not my citation ranking.
5. The REF requires academics to be submitted in a particular disciplinary area (unit of assessment or UoA). My university-wide citation ranking doesn't necessitate any disciplinary choices as it evaluates all of the university's output. I also conduct detailed disciplinary rankings, however, and here the match between REF UoA and MA subject field might not be perfect.
6. The REF allowed submission of publications that were accepted in 2013 (or earlier), even if they were actually published after 2013. My approach only includes articles that were actually published between 2008 and 2013.
7. The REF assessment was conducted in 2014. My approach used citation counts as of July 2017.
8. The REF output was predominantly made up of journal publications. My approach includes all publications that are covered in the MA database. This means, for instance, that a fair number of books, book chapters, conference papers, and software are also included (see Harzing & Alakangas, 2017b).

It is useful to point out at this stage that REF 2021 will differ from REF 2014 in that it will include all academics not employed on a teaching-only contract [point 2] and will allow more (and fewer) than four publications per academic [point 3]. Regarding transferability of publications [point 4], it is likely that the Stern report recommendation for publications to stay with the institution will be implemented, though with some transitional arrangements. As a result, the rules currently proposed for REF 2021 are closer to my citation-based approach than the rules that were applied for REF 2014. Thus, we can expect any positive correlation between the REF peer review ranking and an MA citation ranking to be stronger for REF 2021 than for REF 2014.

Surely the many methodological differences mean correlations are weak?

Given these differences, we can’t honestly expect a very strong correlation between a REF peer review ranking and an MA citation ranking, can we? As both purport to measure the same underlying construct (research performance of the institution), we would obviously expect a positive correlation. We should also realise that at aggregated levels some of the differences highlighted above might not have such a big impact. For instance, with regard to point 4, the difference would only matter if turnover was high and staff characteristics at the census date were very different from those during the 2008-2013 period.

So maybe we could expect a correlation in the 0.7 to 0.8 range? Or is that too optimistic? Would you believe me if I said the actual correlation between the REF power rating and the total number of citations is 0.9695, so nearly 0.97? Probably not! I didn’t either, so I checked and checked, but the outcome remained the same. This high correlation, however, might have been partly driven by the extremely high scores at the top of the ranking. So what if we used rank correlations instead? Well, the correlation does decline, but only by 0.001 to 0.9685, which for all practical purposes is still 0.97 :-). Figure 1 shows the regression plot for the correlation between the REF Power Rank and the MA Citation Rank. Most universities cluster around the regression line and the average difference in rank is only 6.8 (on a ranking of 118 universities).

Figure 1: REF power rank by MA citation rank
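The three statistics quoted above (a Pearson correlation on the raw power ratings and citation totals, a rank correlation, and the average difference in rank) can be computed with a few lines of code. The sketch below uses only the Python standard library; the five universities and their values are hypothetical, chosen purely for illustration, and tied ranks are ignored for simplicity.

```python
from statistics import mean


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def ranks(values):
    """Rank 1 = highest value. Assumes no ties, for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for position, i in enumerate(order, start=1):
        result[i] = position
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks (no ties)."""
    return pearson(ranks(x), ranks(y))


def mean_abs_rank_diff(x, y):
    """Average absolute difference between the two rankings."""
    return mean(abs(a - b) for a, b in zip(ranks(x), ranks(y)))


# Hypothetical values for five universities (not the REF or MA data)
ref_power = [100.0, 95.0, 60.0, 40.0, 20.0]
ma_citations = [500_000, 520_000, 250_000, 90_000, 100_000]

print(f"Pearson r:            {pearson(ref_power, ma_citations):.3f}")
print(f"Spearman rho:         {spearman(ref_power, ma_citations):.3f}")
print(f"Mean rank difference: {mean_abs_rank_diff(ref_power, ma_citations):.1f}")
```

Pearson's r reacts strongly to the extreme values at the top of the distribution, which is why checking the rank correlation as well is a useful robustness test.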

But hold on… I do see some big differences for some universities!

Indeed! Although most universities cluster around the regression line, there are some notable deviations. Apart from the real possibility that either peer review or metrics provides a superior way to measure the underlying construct, these deviations can largely be categorized into three key sources of problems. The first relates to problems with the MA data, which in turn lie in three areas.

First, MA has problems in correctly identifying two universities in our sample: Queen’s University Belfast and the Open University (coloured red in Figure 1). Queen’s University Belfast sees quite a number of its highly cited papers attributed to Queen’s University (Canada). The British Open University experiences the opposite problem: a substantial portion of its highly cited papers were in fact produced by academics affiliated with the Dutch or Israeli Open University.
Second, although the MA attribution of affiliation to academics has improved tremendously since my last test in February 2017 - the number of papers with affiliation more than doubled - it is not perfect. There are still papers where some of the authors do not have an associated affiliation. Another problem is that some academics see some of their publications from a previous employment attributed to their current affiliation. Finally, MA generally attributes only one affiliation to an author, so if an author publishes a paper listing multiple affiliations only the first affiliation might be captured. This is likely to particularly disadvantage universities that have a lot of visiting professors on fractional appointments, a fairly common strategy to boost REF performance.
Third, a minor problem is that MA – just like Google Scholar – generally aggregates citations for books to the latest edition of a book and sometimes attributes citations to a review of the book. This makes sense from a search perspective. Someone searching for a book would generally favour its latest edition or would prefer to see a review of the book if the book wasn’t available online. However, from a research evaluation perspective this is obviously not desirable. Fortunately, these occurrences are rare and unlikely to make a big difference at an aggregate university level unless a university’s citation counts are low overall.

Whilst these problems are unfortunate, they are eminently solvable. Since its launch just over a year ago, MA has been actively improving its matching algorithm. In addition, every single issue I have reported to them has eventually been resolved and used for further fine-tuning. Hence, I do expect these MA-related problems to become less and less prominent.

The second category relates to universities that employ one academic, or a small group of academics, participating in huge consortia doing research in, for instance, particle physics or gene technology. Publications resulting from these collaborations might have over a thousand authors and a huge number of citations (as every author tends to cite these publications in their other work). Although these publications might have been highly rated in the REF, they would have made up only a small proportion of the institution's REF submission. In contrast, they are likely to represent a disproportionate share of citations, especially for smaller institutions. Universities coloured orange in Figure 1 all share this problem to varying extents. A solution might be to remove mega-author papers from the comparison. Times Higher Education (THE) decided to do so after seeing a fairly obscure university storm up its rankings, simply because of one individual staff member’s involvement in this type of paper.
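A minimal sketch of such an exclusion, assuming each publication record carries an author count. The `Paper` type, the data, and the 1,000-author cut-off are hypothetical illustrations; they are not the threshold used by THE.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    title: str
    n_authors: int
    citations: int


def total_citations(papers, max_authors=None):
    """Sum citations, optionally dropping papers with more than max_authors authors.

    max_authors is a hypothetical cut-off; None keeps every paper.
    """
    return sum(
        p.citations
        for p in papers
        if max_authors is None or p.n_authors <= max_authors
    )


# Hypothetical small institution: one consortium paper dominates its citations
papers = [
    Paper("Consortium detector result", n_authors=2_900, citations=9_000),
    Paper("Local study A", n_authors=3, citations=120),
    Paper("Local study B", n_authors=2, citations=80),
]

print(total_citations(papers))                     # 9200: all papers
print(total_citations(papers, max_authors=1_000))  # 200: mega-author paper removed
```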

A related problem is the more general issue that citation practices differ by discipline. Citation levels tend to be much higher in the Sciences and Life Sciences than in the Social Sciences and Humanities, with Engineering falling between these two extremes (Harzing & Alakangas, 2016). Thus, universities that have a heavy concentration in the (Life) Sciences (coloured orange in Figure 1) are likely to outperform universities that have a strong presence in the Social Sciences and Humanities (coloured purple in Figure 1) in terms of citations. London Business School (LBS) is an exception to this rule. Its research output, which appears almost exclusively in the top US journals, tends to be very highly cited, so its citation rank even exceeds its REF rank. However, the School of Oriental and African Studies (SOAS) and, to a lesser extent, the London School of Economics (LSE) clearly suffer from this, as does Warwick, a university whose strong presence in Physics is counteracted by a strong presence in the Social Sciences.

To appreciate the effect this might have, consider for instance Warwick and Birmingham, which are ranked similarly (#14 and #15 respectively) on REF rank. On citation rank, however, Birmingham (#8) substantially outranks Warwick (#21). The same is true for Liverpool and LSE, which are ranked #22 and #23 respectively on REF rank, but #15 and #33 respectively on citation rank. In their study, Mingers, O’Hanley & Okunola (2017) signalled the same problem and applied a correction for disciplinary composition; this catapulted LSE to the top of their ranking. In the next section, I will suggest an alternative correction.

A third category of problems mainly involves post-92 universities, composed of two distinct groups. First, there is a group of universities (marked black in Figure 1) that is fairly highly ranked in the REF without having a correspondingly high level of citations. Many of these universities had a substantially lower REF ranking in 2008 than in 2014. A second group of universities (marked green in Figure 1) is ranked higher on citation count than on the REF rating. These discrepancies might have been caused by any combination of the following five reasons.

1. Given the small number of publications involved, the previously listed MA inaccuracies regarding author affiliation might have a relatively strong positive or negative impact for both of these groups.
2. The relatively high citation rank for the green universities might have been caused by textbooks, which would not have been submitted as REF publications. Their citations might also have been inflated if the textbooks had multiple editions, as MA generally attributes all citations to the latest edition.
3. The relatively high citation rank for the green universities might have been caused by a single author, or a small group of authors, making up a huge chunk of the institution’s citation record. In the REF ranking their influence would have been limited to four publications each. Wolverhampton is a case in point; excluding just one academic would lead the institution to drop 6 places in the citation ranking.
4. The green universities might have had a larger turnover of academics just before the census date than the black universities, thus losing their output (and lowering their REF rating), but keeping their citations. Conversely, the black universities might have hired more academics shortly before the REF census date, thus improving their REF rating, but not their citations.
5. The black universities might have had better scores for impact than the green universities, thus leading to a better REF ranking than citation ranking. The fact that the black universities have generally improved their REF ranking substantially since 2008 (when impact was not included) seems to point in that direction.

Many of these reasons are aggravated by the fact that this particular group of universities has low citation levels and REF ratings overall, so that individual idiosyncrasies have a disproportionate impact and might lead to substantial volatility in the rankings. However, a manual verification of the most highly cited papers would allow us to spot most of the problems listed under the first three points fairly easily. Attributing publications to the universities where the research was conducted is already likely to happen in REF 2021, thus resolving point 4. The last point suggests that a separate exercise to assess non-academic impact might be worthwhile.
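Some of these checks lend themselves to a quick automated screen before the manual verification. As one illustration for the single-author effect, the sketch below estimates how much of an institution's citation count is attributable to its most-cited author. It assumes publication records with author names are available; the function, the data, and the 50% threshold are all hypothetical.

```python
from collections import defaultdict


def top_author_share(papers):
    """Return (author, share) for the author credited with the most citations.

    papers: list of (author_names, citations). Every co-author is credited with
    the paper's full citation count, so shares of different authors can overlap.
    """
    total = sum(citations for _, citations in papers)
    if total == 0:
        return None, 0.0
    per_author = defaultdict(int)
    for authors, citations in papers:
        for name in authors:
            per_author[name] += citations
    name, cites = max(per_author.items(), key=lambda item: item[1])
    return name, cites / total


# Hypothetical publication records for one small institution
papers = [
    (["Author X"], 600),
    (["Author X", "Author Y"], 200),
    (["Author Z"], 200),
]
author, share = top_author_share(papers)
if share > 0.5:  # hypothetical threshold for triggering a manual check
    print(f"Check manually: {author} accounts for {share:.0%} of citations")
```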

But what if we want to compare disciplines?

So far, my analysis has been at the level of the university as a whole, whereas the REF was conducted by discipline. The ostensible aim of the REF, however, is to distribute QR funding. QR funding goes to universities, not disciplines, and certainly not to individual researchers. So if the outcome variable is defined at the university level, why do we go through the tremendous effort of evaluating universities by discipline? Even worse, why do we insist on evaluating individual academics for REF entry? As Sayer argues:

Metrics will not allow us to drill down to individuals (or possibly even to UoAs) with any degree of validity, but they do not need to if the REF is merely a funding mechanism. Any such additional information – and the time, money and effort spent in gathering it – would be superfluous to what is required. (Sayer, 2015: 89)

Why not simply assess universities as a whole and let them decide how to distribute the money internally, based on their own strategic priorities? This would also solve the problem of assessing multidisciplinary research in one fell swoop.

If universities should need assistance in deciding how to divvy up the money, they could run citation-based analyses by field to establish the relative merits of their individual schools or departments. This would be as easy as adding a field label to the analyses I conducted above. To test the feasibility of this option, I conducted field-level analyses for Business & Management, Chemistry, and Computer Science. Rank correlations for Chemistry and Computer Science, for which I only conducted marginal cleaning (see methods section for details), were 0.94. In my own discipline (Business & Management), for which I conducted more substantive cleaning, the rank correlation was 0.97. So clearly running this analysis at a disciplinary level - if so desired - is feasible.

But what if we wanted to provide a comparison across universities as a whole, taking the differential disciplinary mix into account? In that case, we could use a metric that corrects for the number of co-authors, a simple, but quite effective, way to address disciplinary differences. One of these metrics, the hIa (Harzing, Alakangas & Adams, 2014), is an annualised single-author equivalent h-index; it shows the average number of impactful single-author equivalent publications a university publishes a year. As Figure 2 shows, a ranking based on the hIa-index correlates with the REF power rank at 0.9654. For many universities a hIa ranking reduces deviation from the REF rank when compared with a rank based on raw citations; the average difference in rank declines from 6.8 to 6.4.
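Harzing, Alakangas & Adams (2014) define the hIa as the hI,norm (an h-index computed after dividing each paper's citations by its number of authors) divided by academic age. A minimal sketch of that logic follows. For a university-level search the annualising denominator is simply passed in as `years`, which is an assumption about how the annualisation is applied here, and `author_cap=50` reflects my reading of the Publish or Perish default that only the first 50 authors are taken into account. The data are hypothetical.

```python
def h_index(scores):
    """Largest h such that h items have a score of at least h."""
    h = 0
    for rank, score in enumerate(sorted(scores, reverse=True), start=1):
        if score >= rank:
            h = rank
        else:
            break
    return h


def hi_norm(papers, author_cap=50):
    """hI,norm: h-index on citations divided by the number of authors.

    papers: list of (citations, n_authors). The divisor is capped at
    author_cap (assumed to mirror the Publish or Perish default of 50).
    """
    return h_index(c / min(n, author_cap) for c, n in papers)


def hia(papers, years, author_cap=50):
    """Annualised single-author-equivalent h-index: hI,norm divided by years."""
    return hi_norm(papers, author_cap) / years


# Hypothetical (citations, number of authors) records
papers = [(30, 1), (20, 2), (12, 3), (10, 10), (6, 1), (3, 1)]
print(hi_norm(papers))       # normalised citations 30, 10, 6, 4, 3, 1 -> h = 4
print(hia(papers, years=6))
```

Because a 1,000-author paper contributes at most its citations divided by the author cap, large consortium papers lose most of their weight, which is what moves the (Life) Sciences-heavy universities closer to the regression line.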

Figure 2: REF power rank by MA hIa rank

As Figure 2 shows, most of the orange-coloured universities have moved closer to the regression line. Remember these were universities with a higher rank for citations than for REF power, where high citation levels were caused primarily by their concentration in the (Life) Sciences. In fact, Birmingham and Liverpool even cross the line and are now ranked lower on citations than on REF power. Looking at the two pairs discussed above, Warwick and Birmingham now score very similarly, both on REF power (#14 and #15) and on citations (#15 and #17). In this case, the disciplinary correction has brought their citation rank very close to their REF rank. LSE and Liverpool, however, show a complete reversal of fortunes. Whereas for raw citations LSE ranked 18 places below Liverpool, on discipline-corrected citations it ranks 23 places above Liverpool.

LSE, LBS and SOAS are the three biggest winners of a disciplinary correction, as all three institutions only produce work in the Social Sciences. Their published work is generally highly cited, but has an average of only 1.5-3 authors per publication, both for their overall publication record and for their top 1,000 most cited publications. Birmingham and Liverpool, on the other hand, are two of the biggest losers. They both have a significant concentration in disciplines with large author teams, with an average of 7-8 authors per publication for their overall publication record and an average of 14-15 authors per publication for their most highly cited publications.

Obviously, I am not suggesting LSE, LBS and SOAS should receive far more money than they do, although it would be nice to see the Social Sciences being given preferential treatment for once! The actual size of the disciplinary correction should also be up for discussion. The current default setting for Publish or Perish is to only take up to 50 authors into account, but this can easily be changed in the preferences. Most importantly, our results show that – if so desired – a disciplinary correction can be applied easily.

Methods

The REF power rating was taken from Research Fortnight’s calculation. Research Fortnight first created a quality index reflecting the funding formula, with 4* research counting three times as much as 3* research, and weighted this index by staff numbers. It subsequently created a power rating by relating this staff-weighted quality index to that of the best-performing university (Oxford), which was scored 100.
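A minimal sketch of this kind of calculation. It assumes the quality index is 3 × (share of 4* research) + 1 × (share of 3* research), with 2* and below weighted zero, and that staff numbers are FTEs; Research Fortnight's exact formula may differ, and the institutions and profiles below are hypothetical.

```python
def power_ratings(profiles):
    """Relative 'power' per institution, scaled so the top scorer gets 100.

    profiles maps name -> (share_4star, share_3star, staff_fte).
    Assumed quality index: 3 x share of 4* + 1 x share of 3*; 2* and below count 0.
    """
    power = {
        name: (3 * s4 + s3) * fte
        for name, (s4, s3, fte) in profiles.items()
    }
    top = max(power.values())
    return {name: 100 * value / top for name, value in power.items()}


# Hypothetical quality profiles and staff numbers
profiles = {
    "University A": (0.50, 0.40, 2000),
    "University B": (0.30, 0.50, 1000),
}
for name, rating in power_ratings(profiles).items():
    print(f"{name}: {rating:.1f}")
```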

Citation data were collected through a Publish or Perish Microsoft Academic search, including the name of the university in the Affiliation field and restricting the publication years to 2008-2013. As Business Schools often have a name that is distinct from the university, I constructed OR queries such as “University of Oxford OR Said Business School” and “University of Kent OR Kent Business School”. I excluded all universities and institutes that were entered in only one UoA; most of these were art schools that had a limited publication output anyway. I also excluded specialised institutes such as the Institute of Cancer Research and the London School of Hygiene and Tropical Medicine, both of which are small and were only entered in two REF units of assessment. This left me with 118 universities.
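The query construction and sample rules described above are simple enough to script. A minimal sketch: `affiliation_query` and `select_sample` are hypothetical helpers (not part of Publish or Perish), and the institutions and UoA counts in the example are invented placeholders.

```python
def affiliation_query(*names):
    """Combine alternative affiliation names into one OR query string."""
    return " OR ".join(names)


def select_sample(uoa_counts, excluded=(), min_uoas=2):
    """Keep institutions entered in at least min_uoas units of assessment,
    minus any explicitly excluded specialised institutes."""
    return sorted(
        name
        for name, n in uoa_counts.items()
        if n >= min_uoas and name not in excluded
    )


print(affiliation_query("University of Kent", "Kent Business School"))

# Hypothetical UoA counts, not real REF entries
uoa_counts = {"University A": 20, "Art School B": 1, "Specialist Institute C": 2}
print(select_sample(uoa_counts, excluded={"Specialist Institute C"}))
```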

Although Publish or Perish calculates a wide variety of metrics, including various metrics dependent on the number of publications, I decided to focus on citation-based metrics. The number of publications is a sign of productivity, but this doesn’t necessarily translate into research quality (Harzing, 2005). Although citations are by no means a perfect measure of academic impact, they are a generally accepted proxy.

I thus collected all citations for the 2008-2013 period, which was a fairly time-consuming process [approx. 10 hours] as the total number of publications amounted to well over a million. However, as citation data tend to be highly skewed, I also collected data for the top 1,000 publications for each university (or all of them if the university had fewer than 1,000), for a total of well over 100,000 publications. This process took little more than an hour!

Correlations between the REF power rank and the citation rank are almost identical regardless of whether I use only the top 1,000 publications [0.965] or the total number of publications [0.969] for each university, and correlations between the REF power rating and the number of citations are very similar too [0.961 vs. 0.970]. For completeness' sake I used the total number of publications, but focusing on the top 1,000 publications only would obviously reduce the time spent on data collection significantly.
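The comparison between the full citation record and a top-k subset can be sketched as follows. This is an illustration, not the analysis itself: the three universities and their citation counts are hypothetical, and `top_k=2` stands in for the 1,000-publication cut-off.

```python
def citation_totals(citations_by_uni, top_k=None):
    """Total citations per university, optionally using only its top_k papers.

    Slicing with top_k=None keeps every paper.
    """
    return {
        uni: sum(sorted(cites, reverse=True)[:top_k])
        for uni, cites in citations_by_uni.items()
    }


def rank_by(scores):
    """Map name -> rank, where rank 1 has the highest score (ties broken by name)."""
    ordered = sorted(scores, key=lambda name: (-scores[name], name))
    return {name: position for position, name in enumerate(ordered, start=1)}


# Hypothetical, highly skewed per-paper citation counts
data = {
    "Uni A": [900, 400, 50, 5, 2, 1],
    "Uni B": [300, 200, 100, 60, 40, 30],
    "Uni C": [120, 80, 20, 10, 5, 0],
}
all_papers = citation_totals(data)
top_only = citation_totals(data, top_k=2)
print(rank_by(all_papers))
print(rank_by(top_only))
```

Because the few most-cited papers dominate each total, truncating the long tail of rarely cited papers tends to leave the ranking largely intact.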

For the disciplinary comparisons I used the MA field of research. This was very straightforward for Chemistry and Computer Science, where I simply searched for these terms. It was a bit more complicated for Business & Management, as this discipline includes a wide range of sub-disciplines. Hence, for Business & Management I used the field search term “Management OR Finance OR Accounting OR Marketing OR Operations Management OR Business OR Commerce”.

Even for Chemistry and Computer Science, the MA searches will draw in publications in fields that would typically have been submitted to other UoAs, such as Environmental Science and Materials Science for Chemistry, and Mathematics/Statistics and Electrical Engineering for Computer Science. However, it took me less than a minute per university to exclude these publications. I didn't attempt any further cleaning as I do not have sufficient familiarity with these fields to do so reliably.

For Business & Management, the necessarily broad search string resulted in the inclusion of many publications in Economics, Education, Public Policy, Geography, Industrial Engineering, as well as a range of other disciplines. I followed a dual strategy to exclude as many of these publications as possible. First, I sorted the results for each university by publication outlet as that allowed me to exclude whole swaths of publications in one go. Second, I manually reviewed all publications with more than 50 citations and excluded those not in Business & Management. Although this might sound like a time-consuming process, it typically didn’t take me more than 2-10 minutes per university.
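The two-pass cleaning described above could be partly scripted on exported search results. A minimal sketch, assuming each record has a title, an outlet, and a citation count; the function name, the outlet names, and the data are hypothetical, and the final judgement on the flagged papers remains manual.

```python
def clean_field_results(papers, excluded_outlets, review_threshold=50):
    """Two-pass cleaning of a broad field search (hypothetical helper).

    Pass 1 drops every paper published in an excluded outlet.
    Pass 2 flags the remaining papers with more than review_threshold
    citations, which then need a manual in-or-out decision.
    """
    kept = [p for p in papers if p["outlet"] not in excluded_outlets]
    to_review = [p for p in kept if p["citations"] > review_threshold]
    return kept, to_review


# Hypothetical exported records for one university
papers = [
    {"title": "Paper 1", "outlet": "Management Journal X", "citations": 120},
    {"title": "Paper 2", "outlet": "Economics Journal Y", "citations": 300},
    {"title": "Paper 3", "outlet": "Marketing Journal Z", "citations": 40},
    {"title": "Paper 4", "outlet": "Education Journal W", "citations": 75},
]
kept, to_review = clean_field_results(
    papers, excluded_outlets={"Economics Journal Y", "Education Journal W"}
)
print([p["title"] for p in kept])       # papers surviving the outlet filter
print([p["title"] for p in to_review])  # highly cited papers to check by hand
```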

Some caveats

Whenever metrics are proposed as an alternative to peer review, academics are quick to point out the manifold flaws of metrics. In principle, I agree with nearly all of these reservations. However, as I and many others have argued before, these arguments usually compare a “reductionist” version of metrics with an idealised version of peer review rather than a “responsible and comprehensive” version of metrics with the reality of peer review.

It is true that citations might be biased against certain individuals, research approaches, publication types, types of journals, or disciplines, but peer review is equally likely to be subject to these biases. Interrater reliability of peer review is notoriously low (see e.g. Nicolai, Schmall & Schuster, 2014). There is even evidence that rankings based on article-level citations are more inclusive than rankings based on journal rankings (Harzing & Mijnhardt, 2015). As we have found in prior research (Harzing & Alakangas, 2016; Harzing & Alakangas, 2017a), disciplinary biases can be reduced through the use of MA and/or Google Scholar, which capture many of the non-journal publications that are important in the Social Sciences and Humanities, as well as in some Engineering disciplines. As Alvesson, Gabriel & Paulsen (2017:113) aptly summarize: "An advantage of citation metrics is that they are fairly format-neutral - a good book or an innovative article in a lesser journal may score higher than a mundane 'contribution' in a star journal."

Citation databases, including commercial ones, might occasionally make big mistakes, but then again so might peer review. Citations might be easy to game, but then again the whole REF process has been argued to be a game. If deemed necessary, self-citations and systemic levels of cross-citation through citation cartels could be excluded. The key issue to remember, though, is that citation biases typically exert a limited influence at higher levels of analysis, such as the university level. Although the level of agreement between peer review and citations might be limited at the individual or journal level or even the research group level, it tends to be very strong at aggregate levels such as universities, where what we lose in terms of individual nuances is compensated by the law of large numbers. My views with regard to the preferred level of analysis thus contrast with the conclusions in The Metric Tide report, which considers a detailed analysis of correlations at the level of individual research outputs to be superior to an analysis at an aggregated level (see also Harzing & Mijnhardt, 2015). As detailed below, I argue that conducting a metrics-based analysis at the aggregate university level allows us to spend our peer "evaluation time" where it really matters.
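The aggregation argument can be illustrated with a toy simulation. It is not evidence about the REF: it simply assumes, hypothetically, that peer-review scores and citations are both noisy readings of the same underlying paper quality, with independent errors, and shows what averaging over many papers per university does to the correlation. All parameters are invented.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def simulate(n_unis=40, papers_per_uni=200, seed=1):
    """Toy model: each signal = true paper quality + independent noise.

    Returns (paper-level correlation, university-level correlation).
    """
    rng = random.Random(seed)
    peer, cites, uni_peer, uni_cites = [], [], [], []
    for _ in range(n_unis):
        uni_quality = rng.gauss(0, 1)  # hypothetical university-level quality
        p_scores, c_scores = [], []
        for _ in range(papers_per_uni):
            quality = uni_quality + rng.gauss(0, 1)     # true quality of one paper
            p_scores.append(quality + rng.gauss(0, 1))  # noisy peer-review judgement
            c_scores.append(quality + rng.gauss(0, 1))  # noisy citation signal
        peer.extend(p_scores)
        cites.extend(c_scores)
        uni_peer.append(mean(p_scores))
        uni_cites.append(mean(c_scores))
    return pearson(peer, cites), pearson(uni_peer, uni_cites)


paper_r, uni_r = simulate()
print(f"paper level: r = {paper_r:.2f}; university level: r = {uni_r:.2f}")
```

Under these assumptions the paper-level correlation is expected to sit around two-thirds, while the university averages correlate far more strongly, because independent errors largely cancel out when hundreds of papers are averaged.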

Obviously, the use of citations for research evaluation is inappropriate in some disciplines. They are clearly irrelevant in the performing arts and other disciplines that do not have publication outputs. They are also problematic in some of the Humanities disciplines in which citations might be used to critique rather than endorse a publication. However, rather than using these arguments to discard citation analysis for all disciplines, why not simply use citation analysis for the 95% of disciplines where it does work and spend the huge amount of time saved to come up with creative ideas for the remaining disciplines? To me, using different methods to evaluate different disciplines doesn't mean we treat them unequally; it simply means recognising that disciplines are unique.

In sum: peer review or metrics?

When using citation analysis our work is evaluated by hundreds of thousands of peers, including international ones (remember the REF process was designed to establish whether British research is world leading, i.e. 4*, or at least internationally excellent, i.e. 3*). Collecting citations in the way I proposed in this white paper is also a process that is transparent and replicable in just a few hours by everyone with a computer, Internet access and the free Publish or Perish software.

In the current REF process, our work is evaluated by a select group of academics: those who volunteered to serve on the REF panels. And as Sayer indicates: “the REF’s panels give extraordinary gatekeeping power to a disproportionately older, male, white – and overwhelmingly Russell Group and former 1994 Group – academic elite”. Some of these panel members have to read more than 1,000 articles, often outside their immediate area of expertise, collectively spending (wasting?) over 1,000 years of productive research time. They also have to “burn their papers” after the panel meeting.

So what would your choice for the REF be: peer review or metrics? Mine would be metrics, hands down. This doesn’t mean that metrics should be our primary choice when evaluating individuals, although even there they are sometimes preferable to peer review, especially in countries that suffer from academic nepotism. Nor does it mean that we should give up completely on evaluating research quality through peer review. But letting metrics do the bulk of the work frees up a staggering amount of time and would allow us to come up with more creative and meaningful ways to build in a quality component.

From insidious competition to supportive research cultures

By using metrics we all win back the time we waste on evaluating individuals for REF submission, a soul-destroying activity in the first place. Instead, we can spend our “evaluation time” where it really matters, i.e. in reading papers and funding applications of our (junior) colleagues before their submission, and in reading their cases for tenure and promotion. Academics who served on the REF panels win back even more time. Imagine if those 1,000 eminent scholars, rather than spending all of their time evaluating the “end products” of their colleagues' research activity, spent that time mentoring their junior colleagues instead. Wouldn’t that create a far more positive and productive research culture?

More generally, rather than spending such a significant part of our academic lives giving “scores” to people and finding faults in their work, why don’t we spend more of our time truly mentoring and developing junior academics? As someone who performs exactly this role for some 50 junior academics at Middlesex University, I can assure you it is also a far more interesting and rewarding job! So let’s move from an environment where we all compete against each other for that elusive 4* hit to one where we support each other to do meaningful research (see also Adler & Harzing, 2009 and Alvesson, Gabriel & Paulsen, 2017).

References

Adler, N.; Harzing, A.W. (2009) When Knowledge Wins: Transcending the sense and nonsense of academic rankings, Academy of Management Learning & Education, vol. 8, no. 1, pp. 72-95.
Alvesson, M.; Gabriel, Y.; Paulsen, R. (2017) Return to Meaning: A Social Science with Something to Say, Oxford University Press.
Harzing, A.W. (2005) Australian research output in Economics & Business: High volume, low impact?, Australian Journal of Management, vol. 30, no. 2, pp. 183-200.
Harzing, A.W.; Alakangas, S.; Adams, D. (2014) hIa: An individual annual h-index to accommodate disciplinary and career length differences, Scientometrics, vol. 99, no. 3, pp. 811-821.
Harzing, A.W.; Mijnhardt, W. (2015) Proof over promise: Towards a more inclusive ranking of Dutch academics in Economics & Business, Scientometrics, vol. 102, no. 1, pp. 727-749.
Harzing, A.W.; Alakangas, S. (2016) Google Scholar, Scopus and the Web of Science: A longitudinal and cross-disciplinary comparison, Scientometrics, vol. 106, no. 2, pp. 787-804.
Harzing, A.W.; Alakangas, S. (2017a) Microsoft Academic: Is the Phoenix getting wings?, Scientometrics, vol. 110, no. 1, pp. 371-383.
Harzing, A.W.; Alakangas, S. (2017b) Microsoft Academic is one year old: the Phoenix is ready to leave the nest, Scientometrics, vol. 112, no. 3, pp. 1887-1894.
Mingers, J.; O’Hanley, J.R.; Okunola, M. (2017) Using Google Scholar Institutional Level Data to Evaluate the Quality of University Research, Working paper, DOI: 10.13140/RG.2.2.25603.91686.
Mryglod, O.; Kenna, R.; Holovatch, Y.; Berche, B. (2013) Comparison of a citation-based indicator and peer review for absolute and specific measures of research-group excellence, Scientometrics, vol. 97, no. 3, pp. 767-777.
Nicolai, A.T.; Schmal, S.; Schuster, C.L. (2015) Interrater reliability of the peer review process in management journals, in Incentives and Performance: Governance of Research Organizations, pp. 107-119, Springer International Publishing.
Sayer, D. (2015) Rank Hypocrisies: The Insult of the REF, Sage Publications.
Taylor, J. (2011) The Assessment of Research Quality in UK Universities: Peer Review or Metrics?, British Journal of Management, vol. 22, no. 2, pp. 202-217.


Anne-Wil Harzing is Professor of International Management at Middlesex University, London and visiting professor of International Management at Tilburg University. She is a Fellow of the Academy of International Business, a select group of distinguished AIB members who are recognized for their outstanding contributions to the scholarly development of the field of international business. In addition to her academic duties, she also maintains the Journal Quality List and is the driving force behind the popular Publish or Perish software program.
