Research excellence: getting better all the time – or is it?

Simon Marginson

Research assessment is only partly reliable as an indicator of the real quality of the work going on in higher education. It has a dual character. On one hand it is rooted in material facts and objective methods. Strong research quality and quantity should be and are rewarded in the UK Research Excellence Framework (REF), the results of which have just been published.

But the outcome is also shaped by the universities that select and fashion data for competitive purposes and the subject area panels that define research judged to be outstanding on a global scale.

Total research activity can never be fully captured in performance data. Some things, such as citations in top journals, are easier to measure than others, such as the long-term impacts of research on policy and professional practice. Experienced players are best at gaming the system in their own interest.

A very strong overall REF performance signifies a large concentration of outstanding work. It is an unambiguous plus. All the same, precise league table positions in the REF, indicator by indicator, should be taken with a grain of salt.

The impact of impact

In the REF the indicators for ‘impact’, which are new to the 2014 assessment, are the least objectively grounded and most vulnerable to manipulation. This is because of the intrinsic difficulty of measuring the changes to society, economy and policy induced by new knowledge, especially in the long term, and because of the kind of crafted ‘impact-related’ data that are collected during the REF assessment process. A sophisticated industry has already emerged to manufacture the relevant ‘evidence’ of impact. Thus the REF assesses simulations of impact, rather than actual impact.

At best, this gets everyone thinking about real connectivity with the users of research, which is one (though only one) of the starting points when producing the impact documentation. At worst, it leads to data that bear as much relation to reality as the output statements of Russian factories responding to Soviet-era targets. Inevitably, the universities most experienced and adept at managing their response to performance measures of all kinds perform especially well in producing impact documentation. There is also a ‘halo’ effect, of the kind that affects all measures contaminated by prior reputation. Research at, say, Imperial is seen to have impact precisely because it is research from Imperial.

The REF indicators that are the most meaningful are those related to ‘output’ quality, such as the grade point average (GPA) and the proportion of research outputs rated 4*, the top mark. These are grounded in considered judgments of real research work by panels with significant expertise. All the same, the standardised value of the output indicators, as measures of comparative quality, is subject to two caveats.

‘Getting better all the time’—or is it?

First, between the 2008 RAE and the 2014 REF there has been a notable inflation of the proportion of UK research outputs judged to be ‘world leading’ (rated 4*) and ‘internationally excellent’ (rated 3*).

In 2008, just 14% of research outputs were judged to be 4* and 37% were judged to be 3*, a total of 51% in the top two categories. In 2014, the proportion of the work judged to be outstanding had somehow jumped to 72%, with 22% judged to be 4* and another 50% judged to be 3*. This phenomenal improvement happened at a time when resources in higher education were constrained by historical standards.

While genuine improvement has no doubt occurred in at least some fields, the scale and speed of this improvement beggar belief. It reflects a combination of factors that generate boosterism. Higher education institutions (HEIs) have a vested interest in maximising their apparent quality; subject area panels have a vested interest in maximising the world-class character of their fields; and UK higher education and its institutions are competing with other nations, especially the United States, for research rankings, doctoral students and offshore income.

The inflation of 4*s and 3*s is a worrying sign of a system in danger of becoming too complacent about its own self-defined excellence. This is not the way to drive long-term improvement in UK research. Less hubris and more hard-nosed Chinese-style realism would produce better outcomes. It would be better to rely less on self-regulation, enhance the role of international opinion, and spotlight areas where improvement is most needed, not collapse into boosterism.

The selectivity game

Second, HEIs can readily game the assessment of output quality, by being highly selective about whose work they include in the assessment. Including only the best researchers pushes up the average GPA and the proportion of research ranked 4*. HEIs that do this pay a financial price, in that their apparent volume of research is reduced, and their subsequent funding will fall. Nevertheless, it is good for reputation. That has many long-term spinoffs, including financial benefits.
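The trade-off described above can be made concrete with a toy example. The Python sketch below uses an entirely hypothetical five-person unit (the names A to E and their star ratings are invented for illustration), treats each researcher as 1.0 FTE submitting four outputs, takes GPA as the plain mean star rating of the submitted outputs, and takes research power as staff multiplied by GPA, as the post defines it. It is not how any HEI or the REF actually computes its figures, only a sketch of the direction of the effects.

```python
"""Toy illustration of selective vs inclusive REF-style submission.

All staff names and star ratings are hypothetical. Each researcher is assumed
to be 1.0 FTE with four outputs rated 1*-4*. GPA is the plain mean rating of
the submitted outputs; research power is FTE x GPA.
"""

from statistics import mean

# Hypothetical unit: four output ratings per researcher.
STAFF = {
    "A": [4, 4, 3, 3],
    "B": [4, 3, 3, 3],
    "C": [3, 3, 3, 2],
    "D": [3, 2, 2, 2],
    "E": [2, 2, 2, 1],
}


def submission_profile(staff, names):
    """Summarise a submission made up of the listed researchers."""
    names = list(names)
    outputs = [rating for name in names for rating in staff[name]]
    gpa = mean(outputs)
    return {
        "fte": len(names),
        "gpa": gpa,
        "share_4star": outputs.count(4) / len(outputs),
        "research_power": len(names) * gpa,
    }


# Inclusive strategy: submit everyone.
inclusive = submission_profile(STAFF, STAFF)

# Selective strategy: keep only the three researchers with the best mean rating.
best_three = sorted(STAFF, key=lambda name: mean(STAFF[name]), reverse=True)[:3]
selective = submission_profile(STAFF, best_three)

for label, p in [("inclusive", inclusive), ("selective", selective)]:
    print(f"{label}: FTE={p['fte']} GPA={p['gpa']:.2f} "
          f"4*={p['share_4star']:.0%} power={p['research_power']:.1f}")
```

With these invented numbers, dropping the two weakest researchers lifts the GPA from 2.70 to about 3.17 and the 4* share from 15% to 25%, while research power falls from 13.5 to 9.5: better reputational metrics, lower volume.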

While some HEIs have chosen to approach the REF on an inclusive basis, others have pursued highly tailored entries designed to maximise average output quality and impact.

With the data from each HEI incomplete as a census of all research activity, and individual HEIs pursuing a variety of strategies, essentially the REF does not compare like with like. This undermines the validity of the REF as a league table of system performance, though everyone treats it that way. The same factor also undermines the value of performance comparisons between the 2008 RAE and the 2014 REF. The trend to greater selectivity, manifest in some but not all HEIs, is no doubt one of the factors that has inflated the incidence of 4*s and 3*s.

REF results in Education and the effect of IOE’s inclusive approach

Both of these tendencies (the inflation of outstanding performance, and the gaming of the system by being highly selective about the research on which the institution is judged) are apparent in the field of Education. In Education the proportion of work judged to be at 4* level doubled in the six years between research assessments, from 11% in 2008 to 22% in 2014. There were also changes in the ordering of institutions by quality of outputs, driven by the gaming strategies of institutions.

The UCL Institute of Education (IOE) again submitted by far the largest entry, with 219 full-time equivalent (FTE) staff, much the same as the 218 in 2008. The IOE took the inclusive approach to research assessment, and in that sense its REF results are a more accurate indicator of real research quality than those of some other HEIs. In terms of total ‘research power’, the number of staff multiplied by the average assessment of quality (the GPA), the IOE achieved 703 points in the 2014 REF, more than four times the level of the number two institution in the field of Education, the Open University (164). Oxford was third at 140, followed by Edinburgh at 128 and King’s College London at 124. As in 2008, the IOE is again confirmed as perhaps the world’s most important producer of globally significant research in the field of Education.
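Because research power is simply FTE multiplied by GPA, the two IOE figures given above imply a GPA of about 3.21, and the gap to the other institutions can be expressed as a ratio. A minimal Python sketch, assuming the published 703 is itself rounded, so the implied GPA is approximate; the function names are illustrative only, and the other institutions' FTE and GPA are not stated in the post, so only their totals are compared.

```python
"""Research power as described in the post: FTE staff x GPA.

Only the IOE's FTE (219) and research power (703) are given, so its GPA is
back-calculated; for the other institutions only the totals are compared.
"""

IOE_FTE = 219
IOE_POWER = 703

# Education research-power totals, 2014 REF, as quoted in the post.
POWER = {
    "IOE": 703,
    "Open University": 164,
    "Oxford": 140,
    "Edinburgh": 128,
    "King's College London": 124,
}


def research_power(fte: float, gpa: float) -> float:
    """Research power = number of FTE staff multiplied by the unit's GPA."""
    return fte * gpa


def implied_gpa(power: float, fte: float) -> float:
    """Invert research_power() to recover the GPA behind a reported total."""
    return power / fte


ioe_gpa = implied_gpa(IOE_POWER, IOE_FTE)
print(f"IOE implied GPA: {ioe_gpa:.2f}")

# How many times larger the IOE's total is than each other institution's.
for name, power in POWER.items():
    if name != "IOE":
        print(f"IOE / {name}: {POWER['IOE'] / power:.2f}x")
```

The ratios run from about 4.3 (Open University) to about 5.7 (King's College London), which is what "more than four times" summarises.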

However, whereas in the 2008 RAE the IOE was ranked equal first in terms of the quality of research outputs, in the 2014 REF it had slipped to equal 11th position. This was not due to any decline in the quality of outputs. In 2014 the proportion of IOE research judged to be at 4* level was 28%, up from 19% in 2008, in line with the overall trend from the 2008 RAE to the 2014 REF and with the trend in the field of Education. The proportion of work ranked at 3* also rose, from 38% to 40%, and 74% of the IOE’s research was ranked at the maximum possible level for Impact. The IOE prepared 23 cases for Impact evaluation; the next largest submission in the field of Education included only six cases.

Most of the HEIs that equalled or overtook the IOE in 2014 on average output quality in Education submitted more selective staff lists than in 2008. Edinburgh cut its staff input from 85 FTE in 2008 to 40 FTE in 2014, Nottingham from 51 FTE to 25, Birmingham from 47 to 24, Cambridge from 50 to 34, Bristol from 43 to 35, Durham from 31 to 25 and Sheffield from 24 to only 15.
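The scale of these cuts is easier to compare in relative terms. This short Python sketch computes the percentage change in submitted FTE for each institution named above, using only the figures quoted in the post, with the IOE's 218 to 219 included as the inclusive baseline.

```python
"""Percentage change in FTE staff submitted to Education, 2008 RAE vs 2014 REF.

Figures are those quoted in the post; the IOE is included for comparison.
"""

FTE = {
    # institution: (2008 FTE, 2014 FTE)
    "IOE": (218, 219),
    "Edinburgh": (85, 40),
    "Nottingham": (51, 25),
    "Birmingham": (47, 24),
    "Cambridge": (50, 34),
    "Bristol": (43, 35),
    "Durham": (31, 25),
    "Sheffield": (24, 15),
}


def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in per cent."""
    return 100.0 * (after - before) / before


changes = {name: pct_change(before, after) for name, (before, after) in FTE.items()}

# Print the steepest cuts first.
for name, change in sorted(changes.items(), key=lambda item: item[1]):
    print(f"{name:<11} {change:+6.1f}%")
```

On these figures Edinburgh and Nottingham roughly halved their submitted staff and Birmingham nearly did, while Bristol and Durham cut by about a fifth; the IOE's entry rose by less than half a percent.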

Only Oxford, Exeter and King’s College London slightly increased their staff numbers in Education, though all three remained relatively ‘boutique’ in character, with 20 per cent or less of the IOE staff complement.

Oxford and King’s improved their overall REF performance in many fields of study, lifting their position within the top group of UK HEIs. This indicates either genuine research improvement, or more careful vetting of the best four publications per staff member that are the basis of the evaluation of outputs.

However, the largest volume of high-quality research, 5.33% of total UK ‘research power’, was generated at the IOE’s parent university, University College London. Like the IOE, UCL takes the inclusive approach to research assessment. UCL’s share of research power rose sharply from its previous level of 3.83% in 2008. Following mergers with the School of Pharmacy and the IOE, UCL is now the largest fish in the UK pond. Oxford is second at 5.19%, Cambridge third at 4.49%, followed by Edinburgh (3.60%) and Manchester (3.18%).

 

Posted in Further higher and lifelong education, Research matters
6 comments on “Research excellence: getting better all the time – or is it?”
  1. […] This article by Simon Marginson originally appeared at The Conversation, a Social Science Space partner site, under the title “Game-playing of the REF makes it an incomplete census.” In turn, this is an extract of an article published on the IOE London blog […]

    • Prof James M Scobbie says:

      I spy straw-man arguments. The press are more obsessed with REF’s % results than even university management, and certainly more than academics, but Prof Marginson complains about REF’s inability to make like-for-like comparisons, then falls into his own trap. I’m interested in absolute numbers, while happy to enjoy the PR when it’s good, for the sake of my own colleagues – who wouldn’t be?!

      So, in my group, the absolute number of 4* outputs went up from 6 to 14, and at my university, the number of 4* outputs tripled. I’m therefore rather interested in whether these improvements are due to grade inflation or a genuine improvement in research. Gut feeling tells me the latter. I couldn’t easily find the answer to the comparable national question in this article. It’s full of proportional comparisons and notes of the FTE staff returned by individual groups, plus vague national claims: that REF submissions were more selective, that institutions gamed the process.

      In fact:
      FTE is pretty stable across the UK, though the number of outputs is down a bit. That is hardly enough to cook the books, so a normalised comparison will be roughly equivalent to a total count of quality, as follows.

      REF2014
      a. FTE return was 52,061 category A with 191,150 outputs. (ref.ac.uk) – or
      b. FTE return was 52,077 category A with 191,232 outputs (from our central research office some months ago, maybe an early estimate)

      RAE2008
      a. FTE “more than 50,000” (rae.ac.uk)
      b. FTE category A 52,401 with 215,507 outputs (ref.ac.uk)
      c. FTE total sums up to 52,409 FTE from the rae results spreadsheet
      % of 4* and 3* outputs turned into an absolute number (i.e. ignoring esteem, impact and environment)

      REF2014 (ref.ac.uk)
      4* outputs 22.4% = 42,827
      3* outputs 49.5% = 94,619
      total = 137,446
      RAE2008 (ref.ac.uk cos rae.ac.uk doesn’t seem to easily provide the numbers)
      4* outputs 14% = 30,170
      3* outputs 37% = 79,737
      total = 109,907

      So that’s a rise of 27,539 3*/4* outputs, in absolute terms (a rise of 25%).

      There was an FTE drop of 324 (<1%).
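The conversion from percentages to counts above can be rechecked in a few lines. This minimal Python sketch multiplies each exercise's output total by its rounded quality shares, using the 2014 total from source a (191,150 outputs) and the 2008 total of 215,507. Because the published percentages are rounded, its counts land within about a dozen outputs of the totals quoted above, and the relative rise comes out at about 25% either way.

```python
"""Turn published quality shares into approximate absolute output counts.

Totals and shares are those quoted in the comment (2014: 191,150 outputs,
22.4% 4*, 49.5% 3*; 2008: 215,507 outputs, 14% 4*, 37% 3*). The shares are
rounded, so the resulting counts are approximate.
"""

EXERCISES = {
    # exercise: (total outputs, share rated 4*, share rated 3*)
    "RAE2008": (215_507, 0.14, 0.37),
    "REF2014": (191_150, 0.224, 0.495),
}


def top_two_counts(total, share_4, share_3):
    """Approximate numbers of 4* and 3* outputs, rounded to whole outputs."""
    return round(total * share_4), round(total * share_3)


counts = {name: top_two_counts(*row) for name, row in EXERCISES.items()}
top_two = {name: sum(pair) for name, pair in counts.items()}

rise = top_two["REF2014"] - top_two["RAE2008"]
print(counts)
print(f"absolute rise: {rise:,} ({rise / top_two['RAE2008']:.0%})")
```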

      I have no idea what the total amount of research output of all levels of quality was from UK HEIs in that period, nor the total number of people of all types doing research through the period or on census day. So who knows if fewer or more people are producing fewer-better outputs, or more-better outputs, or what.

      Perhaps there is grade inflation and the REF panels were 25% more generous at 3*/4* than the RAE panels. Really? Or perhaps we researchers feel a lot more pressure to publish outputs, and to attempt to produce better ones too. I know the latter is true. I can't comment on the former, except to say that the people I know on these panels all have the highest integrity and are the same people who peer review our papers and grant submissions.

      So what is more likely?

      REF isn't a selective cheat. Are we (the REF panels) really 25% more complacent, giving ourselves higher grades without trying? Or are we (the researchers contributing to REF) more pressured to produce better stuff? Gosh, perhaps we are just better at doing it all efficiently, or something! Me, I think REF reflects excellence fairly well. It's useless for a whole lot of other things, like apportioning funding or comparing institutions in absolute terms across disciplines, or disciplines against each other… but in terms of discipline-specific excellence, I think the standards are up. My local gut feeling and the size of the national result support that. Sure, REF doesn't compare like with like, but to compare REF against RAE at a national level and then put the increase in the proportion of 4* and 3* outputs down to "boosterism" and gaming is pretty poor analysis, given the 25% rise in absolute, not relative, standards. Blame the panels for corrupt self-promotion if you want (I don't, not for outputs at least), but don't put it down to gaming.

      sources –
      http://www.ref.ac.uk/…/REF%2001%202014%20-%20full%20documen…
      http://www.rae.ac.uk/results/outstore/RAEOutcomeFull.pdf
      http://www.ref.ac.uk/…/analys…/comparisonwith2008raeresults/

      • Simon Marginson says:

        Prof Scobbie’s two posts together confirm my original analysis. In the 2014 REF the proportion of work graded at 4* and 3* has risen from 51% to 72%. This creates a misleading impression of the rate of improvement of UK research, one likely to induce complacency. I speculate that the rise in the proportion of work graded as 4* or 3* derives from three possible factors.

        1. Distortion due to inflation of the value of outputs, without improvement in the quality, so that for this reason alone the number of papers ranked in the top two categories has increased, and the number of lower ranked papers has fallen. We can call this REF ‘grade inflation’.

        2. Increased selectivity, due to gaming of the REF by institutions, so that the number and proportion of weak papers has fallen. Prof Scobbie confirms the increase in selectivity: there were 24,275 fewer outputs submitted. However, this alone is insufficient to account for the lift of 21 percentage points in excellent work.

        3. Genuine improvement in the quality of the work, so that the number of papers ranked in the top two categories has increased for this reason. This is plausible, but the scale of the improvement is such that it cannot be an exhaustive explanation.
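The arithmetic behind point 2 can be checked with the figures quoted in this thread. The Python sketch below makes a deliberately generous assumption, that all 24,275 outputs not submitted in 2014 would have been rated 1* or 2*, and asks how far removing them from the 2008 pool could raise the 3*/4* share. It holds the 2008 quality mix fixed and says nothing about how the remaining lift splits between grade inflation and genuine improvement.

```python
"""Upper bound on how far dropping outputs alone could lift the 3*/4* share.

Assumption (generous to the selectivity explanation): every one of the 24,275
outputs not submitted in 2014 would have been rated 1* or 2*, so removing them
leaves the 2008 count of 3*/4* outputs unchanged.
"""

TOP_TWO_2008 = 109_907   # 3* + 4* outputs, RAE 2008 (figure from the thread)
OUTPUTS_2008 = 215_507   # all outputs, RAE 2008
FEWER_OUTPUTS = 24_275   # drop in outputs submitted between 2008 and 2014
SHARE_2014 = 0.72        # reported 3*/4* share for REF 2014

share_2008 = TOP_TWO_2008 / OUTPUTS_2008
share_if_only_selective = TOP_TWO_2008 / (OUTPUTS_2008 - FEWER_OUTPUTS)

explained = share_if_only_selective - share_2008
total_lift = SHARE_2014 - share_2008

print(f"2008 share:                {share_2008:.1%}")
print(f"after dropping weak items: {share_if_only_selective:.1%}")
print(f"share of lift explained:   {explained / total_lift:.0%}")
```

Under that assumption the share rises only from about 51.0% to about 57.5%, roughly 31% of the gap to 2014's 72%.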

        Such is the scale of the increase in excellent work that it looks to me as though the first factor – grade inflation – is in play. Factors 2 and 3 are unlikely to be sufficient to explain an increase in excellent work on this scale.

        The best way to avoid the second distortion, that derived from gaming on the basis of enhanced selectivity, is to require all institutions to submit census style comprehensive returns of their research rather than offer them the option of presenting only their very best researchers. Clearly, the REF would then provide a more accurate picture of UK research, and comparisons between institutions and over time would become stronger.

        The best way to avoid the first distortion, that related to grade inflation, would be to use panels composed primarily of international researchers, rather than panels composed primarily of UK-based researchers, some of whom may be affected by a vested interest in boosting the UK’s reputation for excellence in research, and boosting the position of their own disciplinary fields within UK research. If the last suggestion is considered too radical, then panels that are 50% UK and 50% international would produce a more objective result than the present arrangements.

        Simon Marginson
        Professor of International Higher Education
        UCL Institute of Education

  2. Prof James M Scobbie says:

    I forgot to say, there were 24,275 fewer outputs submitted in REF. That’s a game-playing bit, if they were all 1* and 2*.

    But that just affects the proportions of 3*/4*, not the absolute numbers, which is why focusing on proportions to support a game-playing analysis is a straw-man argument.

    Thanks for comments to come,
    Jim

  3. […] This piece was first published on the Institute of Education London blog. […]

