How open is Cambridge?

As part of Open Access Week 2016, the Office of Scholarly Communication is publishing a series of blog posts on open access and open research. In this final OAWeek post Dr Arthur Smith analyses how much Cambridge research is openly available.

For us in the Office of Scholarly Communication it’s important that, as much as possible, the University’s research is made Open Access. While we can guarantee that research deposited in the University repository Apollo will be made available in one way or another, it’s not clear how other sources of Open Access contribute to this goal. This blog is an attempt to quantify the amount of Cambridge research that is openly available.

In mid-August I used Cottage Labs’ Lantern service in an attempt to quantify just how open the University’s research really is. Lantern uses DOIs, PMIDs or PMCIDs to match publications in a variety of sources such as CORE and Europe PMC, to determine the Open Access status of a publication – it will even try to look at a publisher’s website to determine an article’s Open Access status. This process isn’t infallible, and it relies heavily on DOI matching, but it provides a good insight into the possible sources of Open Access material.

To determine the base list of publications against which the analysis could be run,  I queried Web of Science (WoS) and Scopus to obtain a list of publications attributed to Cambridge authors. In 2015, the University published 9069 articles, reviews and conference papers according to Web of Science. Scopus returned a slightly lower figure of 7983 publications. Combining these two publication lists, and filtering to only include records with a DOI, produced one master list of 9714 unique publications (that’s ~26 publications/day!).
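The combining step described above can be sketched roughly as follows. This is only an illustration, not the script that was actually used: the record layout (dictionaries with a `doi` key), the function names and the sample DOIs are all assumptions made for the example.

```python
def normalise_doi(doi):
    """Lower-case a DOI and strip common URL or 'doi:' prefixes; return None if empty."""
    if not doi:
        return None
    doi = doi.strip().lower()
    prefixes = (
        "https://doi.org/",
        "http://doi.org/",
        "https://dx.doi.org/",
        "http://dx.doi.org/",
        "doi:",
    )
    for prefix in prefixes:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi or None


def merge_publication_lists(*sources):
    """Combine several record lists into one set of unique DOIs, skipping records without a DOI."""
    unique = set()
    for records in sources:
        for record in records:
            doi = normalise_doi(record.get("doi"))
            if doi is not None:
                unique.add(doi)
    return unique


# Hypothetical sample records standing in for the WoS and Scopus exports.
wos = [{"doi": "10.1000/abc"}, {"doi": "10.1000/DEF"}, {"doi": None}]
scopus = [{"doi": "https://doi.org/10.1000/def"}, {"doi": "10.1000/ghi"}]

master = merge_publication_lists(wos, scopus)
print(len(master))  # 3 unique DOIs in this toy example
```

Normalising before comparing matters because the same DOI can be exported in different letter case or as a full URL, and would otherwise be counted twice.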

In 2015 the Open Access team processed 2746 HEFCE eligible submissions, so naïvely speaking, the University achieved a 28.3% HEFCE compliance rate. That’s not bad, especially because the HEFCE policy had not yet come into force, but what about other Open Access sources? We know that other universities in the UK are also depositing papers in their repositories, and some researchers make their work ‘gold’ Open Access without going through the Open Access team, so the total amount of Open Access content must be higher.

In addition to the Lantern analysis, I also exported all available DOIs from Apollo and matched these to the DOIs obtained from WoS/Scopus. WoS also classifies some publications as being Open Access, and I included these figures too. If a publication was found in at least one potentially Open Access source I classified it as Open Access. Here are the results:

Figure 1. Of 9714 DOIs analysed by Lantern, 51.8% appear in at least one open access source.
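The "at least one source" rule behind this figure could be expressed along the following lines. The source names, the DOI sets and the function name are hypothetical stand-ins for the Lantern output, the Apollo export and the WoS Open Access flag, and the percentage printed is for the toy data only.

```python
def sources_listing(doi, sources):
    """Return the names of every source whose DOI set contains this DOI."""
    return [name for name, dois in sources.items() if doi in dois]


# Hypothetical DOI sets for each potentially Open Access source.
sources = {
    "Apollo": {"10.1000/abc"},
    "Europe PMC": {"10.1000/abc", "10.1000/def"},
    "WoS OA flag": set(),
}
master = ["10.1000/abc", "10.1000/def", "10.1000/ghi"]

# A publication counts as Open Access if any source lists it.
open_access = [doi for doi in master if sources_listing(doi, sources)]
share = 100 * len(open_access) / len(master)
print(f"{share:.1f}% found in at least one source")
```

Because a paper is counted once however many sources list it, the overall figure is smaller than the sum of the per-source figures.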

It is pleasing that our naïve estimate of 28.3% HEFCE compliance closely matches the number of records found in Apollo (26.2%). The discrepancy is likely due to a number of factors, including publications received by the Open Access Team that were actually published in 2014 or 2016, but submitted in 2015, and Apollo records that don’t have a publisher DOI to match against. However, the most important point to note is the overall open access figure – in 2015 more than 50% of the University’s scholarly publications with a DOI were available in at least one “open access” source.

Let’s dig a little deeper into the analysis. Using everyone’s favourite metric, the journal impact factor (JIF), the average JIF of articles in Apollo was 5.74 compared to 4.33 for articles that were not OA. Other repositories and Europe PMC achieved even higher average JIFs. On average, Open Access publications by Cambridge authors have a higher JIF (6.04) than articles that are not OA, which suggests that researchers are making value judgements on what to make Open Access based on journal reputation. If a paper appears in a low(er) impact journal, it’s less likely to be made Open Access. Anecdotally this is something we have experienced at Cambridge.

Figure 2. Average 2015 JIF of papers classified according to their open access status.

The WoS and Scopus exports contain citation information at the article level, so we can also look at direct citations received by these publications (up to 16 August 2016) rather than relying on the JIF. I found that Open Access articles, on average, received 1.5 to 2 more citations than articles that are not Open Access. However, is this because authors are making their higher impact articles Open Access (which one might expect to receive more citations anyway) and are not bothering with the rest? Or is this effect due entirely to the greater accessibility offered by Open Access publication? Could the differences arise because of different researcher behaviour across different disciplines?

My feeling is that we have reached a turning point – the increased citation rate of Open Access material is not caused by the articles being Open Access, as these articles would naturally have received more citations anyway. Instead of looking at formal literature citations, the benefits of Open Access need to be measured outside of academia in areas that would not contribute to an article’s citations.

Figure 3. Average citations received by papers according to their open access source.

Breaking it down by the source of Open Access reveals that articles that appear in other repositories receive significantly more citations than those from any other source. This potentially reveals that collaborative papers between researchers at different institutions are likely to have greater impact than papers produced solely at one institution (Cambridge). However, a more thorough analysis that looks at author affiliations would be needed to confirm this.

If we focus on the WoS citation distribution the difference in average citations becomes clearer. Of 8348 WoS articles, not only are there fewer Open Access articles with no citations (14% vs 17%), but Open Access articles also receive more citations in general.

Figure 4. Citation distribution of papers found in WoS depending on their open access status.
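The two measures compared here, the average number of citations and the share of papers with no citations at all, can be computed as in the sketch below. The citation counts are made up for illustration and are not the Cambridge data; the function name is likewise an assumption.

```python
from statistics import mean


def summarise(citations):
    """Return (mean citations, share of papers with zero citations)."""
    uncited = sum(1 for count in citations if count == 0)
    return mean(citations), uncited / len(citations)


# Hypothetical citation counts per paper, NOT the real WoS export.
oa = [0, 2, 3, 5, 10]
non_oa = [0, 0, 1, 2, 2]

oa_mean, oa_uncited = summarise(oa)
non_mean, non_uncited = summarise(non_oa)

print(f"OA: mean {oa_mean:.1f}, uncited {oa_uncited:.0%}")
print(f"Not OA: mean {non_mean:.1f}, uncited {non_uncited:.0%}")
```

Looking at both measures together is useful because a mean on its own can be pulled up by a handful of very highly cited papers.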

What can we take away from this analysis? Firstly, Lantern is a valuable tool for discovering other sources of Open Access content. It identified over a thousand articles by Cambridge researchers in other institutional repositories that we did not know existed. When it comes time for the next REF, these other repositories may prove a vital lifeline in determining whether a paper is HEFCE compliant.

Secondly, more than 50% of the University’s 2015 research publications are potentially Open Access. Hopefully a similar analysis of 2016’s papers will show that even more of the University’s research is Open Access this year. And finally, although Open Access articles receive more citations than articles that are not Open Access, it is no longer clear whether this is caused by the article being Open Access, disciplinary differences, or if authors are more likely to make their best work Open Access.

Published 28 October 2016
Written by Dr Arthur Smith

Could the HEFCE policy be a Trojan Horse for gold OA?

The HEFCE Policy for open access in the post-2014 Research Excellence Framework kicks in 9 weeks from now.

The policy states that, to be eligible for submission to the post-2014 REF, authors’ final peer-reviewed manuscripts of journal articles and conference proceedings with an ISSN must have been deposited in an institutional or subject repository on acceptance for publication. Deposited material should be discoverable, and free to read and download, for anyone with an internet connection.

The goal of the policy is to ensure that publicly funded (by HEFCE) research is publicly available. The means HEFCE have chosen to favour is the green route – by putting the AAM into a repository. This does not involve any payment to the publishers. The timing of the policy – at acceptance – is to give us the best chance of obtaining the author’s accepted manuscript (AAM) before it is deleted, forgotten or lost by the author.

Universities across the UK have been preparing. Cambridge has had the ‘Accepted for publication? Send us your manuscript‘ campaign running since May 2014 with a very simple and well liked interface allowing researchers to submit their work. The Open Access team then deposits the item, checks for funding and the publisher policies and then organises payment for open access publication if required.

To give an idea of the numbers we are dealing with at Cambridge, during 2015 the Open Access team deposited 2553 articles into our repository Apollo.

Compliance levels

We have been reporting to Wellcome Trust and the RCUK over the past few years to indicate compliance levels with their policies. However the ‘compliance level’ for the HEFCE policy is a slippery concept. For a start, the policy has not yet come into force. Another complicating factor is the long term nature of the ‘reporting’. We will not truly know how compliant we have been until the time comes to submit to REF – whenever that will be (currently it seems 2021).

At Cambridge we have been working on the assumption that, because we do not know which outputs will be the ones that we will claim, we should collect all eligible articles. However, the number of deposited articles the Open Access team received over the past year represents approximately 30% of the full eligible output of the University. This might seem concerning in some ways, but it must be remembered that each researcher in the University will only be reporting four research outputs for the REF.

There are some articles that are obvious contenders for REF. By concentrating on researchers who are publishing in very high impact journals we have been trying to catch those articles we are extremely likely to claim.

During the course of 2015 we discovered 93 papers published in Nature, Science, Cell, The Lancet and PNAS. 33% of these papers were already HEFCE compliant. Of the remaining non-compliant papers we contacted 47 authors, made them aware of the HEFCE open access policy, and invited them to submit their accepted manuscript to the Open Access Service. Less than 40% of those authors who were contacted responded with their accepted manuscript. Therefore, even after direct intervention only 49% of papers were HEFCE compliant, which means that still more than half of all eligible papers published in Nature, Science, Cell, The Lancet and PNAS during this period would not have been HEFCE compliant had the policy been in place.

The lack of engagement by members of the academic community with this process is a serious concern – and potentially due to four reasons:

  • Lack of awareness of the policy
  • Putting it off until the policy is in place
  • Deliberately choosing not to submit a work because it is not considered important enough or they do not consider their contribution to be significant enough
  • Some form of conscientious objection to the policy

We should note that the third reason is a matter of some concern to the University as it is not the researcher who decides which articles are put forward for REF. In addition, the University is interested in having a high overall level of compliance for REF as it considers making the research output of the institution available to be important.

Temporary reprieve

Cambridge is no island when it comes to facing significant challenges in capturing all outputs in preparation for HEFCE’s policy. While the highly devolved nature of the institution and the sheer volume of publications may be a problem unique to Cambridge and Oxford, other institutions are still developing the technology they intend to use or are facing staffing issues.

In a concession to serious concern across the sector about the ability to meet the deadline, on 24 July 2015 HEFCE announced that there was a temporary modification to the policy. They now allow research outputs to be made open access up to three months after publication until at least April 2017 (and until such time that the systems to support deposit at acceptance are in place).

This means for the first year of the policy we have a small window after publication to locate articles, determine if they are in our repositories, and if not chase the authors for the Author’s Accepted Manuscript.

The trick is knowing that an article has been published. At Cambridge our ‘best bet’ is to use Symplectic which scrapes various aggregating sources such as Scopus. However Symplectic is hindered by the efficiency of its sources. There is no guarantee that a given article will appear in Symplectic within three months of publication. And even if it does, we have already discussed the low engagement by the research community to approaches from the Open Access team for AAMs.

Subject based repositories

So far this blog has been talking about using institutional repositories for compliance. But the policy specifically states: “The output must have been deposited in an institutional repository, a repository service shared between multiple institutions, or a subject repository“.

The oldest, most established subject repository is arXiv.org and it makes sense for us to consider using arXiv as part of Cambridge’s compliance strategy. After all, some areas of high energy physics, most of computer science and much of mathematics use arXiv as a means to share their research papers. In 2014, the number of articles that were deposited into arXiv.org and subsequently picked up in Symplectic and approved by researchers was 582 – approximately 6.5% of Cambridge’s total eligible articles.

If we are able to claim these articles for HEFCE compliance without any behaviour change requirement from our academic staff then this is an ideal situation. But how do we actually do this? There is a footnote to the HEFCE statement above which says that: “Individuals depositing their outputs in a subject repository are advised to ensure that their chosen repository meets the requirements set out in this policy.” And this is the crunch point. arXiv does not currently identify which version of the work has been deposited, nor does it record the acceptance date of the work. Because of this we are currently not able to simply use the work being uploaded to arXiv.

There is work underway to look at this possibility and what would be required to allow us to use the subject based repositories as a means for compliance. HEFCE themselves have identified under ‘Further areas of work‘ that  “measures to support compliance in subject repositories” is an area of uncertainty and they will work with the community to address this.

Alternative approach?

It is possibly a good moment to take a step back from the minutiae of the means and the timing of the HEFCE policy and focus on the goal that publicly funded research is publicly available. We are in a complex policy environment. HEFCE affects all researchers but many researchers are also funded through COAF or the RCUK with their respective (gold leaning) Open Access policies.

Of the HEFCE eligible articles submitted to the Open Access team in 2015, after working through all the different funder requirements, there was a split of 44% gold Open Access and 56% green Open Access. Of the gold payments the split is approximately 74% for hybrid journals and 26% for fully open access journals. That said, the three journals with which we have published the most – PLOS ONE, Nature Communications and Scientific Reports – are fully Open Access journals with APCs of $1495, $5200 and $1495 respectively.
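Multiplying these two splits together gives a rough idea of how the submissions divide overall. The sketch below assumes that each gold article corresponds to exactly one payment, which the figures above do not confirm, so the results are indicative only. The variable names are invented for the example.

```python
# Shares quoted in the text above.
gold_share = 0.44        # gold Open Access, as a share of eligible submissions
green_share = 0.56       # green Open Access
hybrid_of_gold = 0.74    # share of gold payments going to hybrid journals
full_oa_of_gold = 0.26   # share of gold payments going to fully open access journals

# Assumption: one payment per gold article, so payment shares equal article shares.
hybrid_overall = gold_share * hybrid_of_gold
full_oa_overall = gold_share * full_oa_of_gold

print(f"Hybrid gold:   {hybrid_overall:.1%} of eligible submissions")
print(f"Fully OA gold: {full_oa_overall:.1%} of eligible submissions")
print(f"Green:         {green_share:.1%} of eligible submissions")
```

On that assumption, roughly a third of the eligible submissions ended up as paid hybrid Open Access.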

A highly relevant question is – outside of the efforts by our Open Access compliance teams, how much Cambridge research is being made open access anyway?

Open access articles

The Web of Science (WoS) allows a filter on ‘Open Access’. It does not appear to list articles that are made open access on a hybrid basis, only picking up fully open access journals. While these are not definitive numbers, it does give us some idea of the scale we are looking at. In 2014 WoS gives us a figure of 981 articles published as open access by a University of Cambridge author in a fully open access journal.

The Springer Compact to which many institutions (including Cambridge) have signed up means that now all articles published by that research community will be made open access. In 2014, the Open Access Service had paid for 21 articles to be made open access. In the same period across the institution we had published 695 articles with Springer. (Note that in 2015 we paid 51 Springer  APCs). This means that for the cost of the Springer subscription and our APC payments for the previous year we will have a good proportion of Cambridge articles published as open access articles.

These two sets of numbers only allow for articles published either in fully open access journals or with Springer. It does not account for the articles where the University (or a Department or individual) pays an APC to make an article available in a hybrid (non Springer) journal. The upshot is – a significant proportion of Cambridge research is published open access.

Skip the AAM on acceptance part?

So what does this published open access research mean for compliance with the HEFCE policy? The updated HEFCE policy has addressed this:

“… we have decided to introduce an exception to the deposit requirements for outputs published via the gold route. This may be used in cases where depositing the output on acceptance is not felt to deliver significant additional benefit. We would strongly encourage these outputs to be deposited as soon as possible after publication, ideally via automated arrangements, but this will not be a requirement of the policy.”

This makes sense from an administrative perspective if the article appears in a journal where there is an embargo period on making the AAM available, forcing the University to pay an APC to make the work Open Access to meet RCUK requirements. It would avoid the palaver of:

  • obtaining the AAM from the author
  • depositing it into the repository
  • having to check to see when the article has been published
  • updating the details and
  • either setting the embargo on the AAM or changing the attachment in the record to the Open Access final published version

However, a journal where there is an embargo period on making the AAM available, forcing an APC payment, is in fact almost a definition of a hybrid journal. We know there are issues with hybrid – of the extra expense, of double dipping, of the higher APC charges for hybrid over fully Open Access journals. Putting these aside, what this HEFCE policy change means is that publishers have effectively shifted the HEFCE policy away from a green open access policy to a gold one for a significant proportion of UK research. This is a deliberate tactic, along with the unsubstantiated campaign that green Open Access poses a major threat to scholarly publishing and therefore embargoes should be even longer.

We are already facing the problem that hybrid journals are forcing the move towards green open access being ‘code’ for a 12 month delay. This is the beginning of a very slippery slope. We have been outplayed. It really is time for the RCUK and Wellcome Trust to stop paying for hybrid Open Access.

But I digress.

The cons

The message is confusing enough – three sets of policies and three different requirements in terms of the timing and the means to make work compliant and available. We are trying to make it as simple as possible for researchers – with limited success.

The move to widespread Open Access in the UK is a huge shift for the research community and those that support them. It would be very difficult to debate the ‘against’ argument for the statement that publicly funded research should be publicly available but the devil is very much in the detail.

It would be an incredible shame if the HEFCE policy is hijacked into a partial gold OA policy, but as administrators we are drowning in compliance. There needs to be a broad discussion across the funders to try and address the conflicting compliance requirements and the potentially negative effect these policies are having on the future of open scholarly publishing. 

We welcome the opportunity to discuss these issues with HEFCE, Wellcome Trust and the RCUK. There’s plenty to talk about.

Published 25 January 2016
Written by Dr Danny Kingsley

Good news stories about data sharing?

We have been speaking to researchers around the University recently to discuss the expectations of their funders in relation to data management. This has raised the issue of how best to convince people this is a process that benefits society rather than a waste of time or just yet another thing they are being ‘forced to do’ – which is the perspective of some that we have spoken with.

Policy requirements

In general most funders require a Research Data Management Plan to be developed at the beginning of the project – and then adhered to. But the Engineering and Physical Sciences Research Council (EPSRC) have upped the ante by introducing a policy requiring that papers published from May 2015 onwards resulting from funded research include a statement about where the supporting research data may be accessed. The data needs to be available in a secure storage facility with a persistent URL, and it must be available for 10 years from the last time it was accessed.

Carrot or stick?

While having a policy from funders does make researchers sit up and listen, there is a perception in the UK research community that this is yet another impost on time-poor researchers. This is not surprising. There has recently been an acceleration of new rules about sharing and assessing research.

The Research Excellence Framework (REF) occurred last year, and many researchers are still ‘recuperating’. Now the Higher Education Funding Council for England (HEFCE) is introducing a policy in April 2016 under which any peer reviewed article or conference paper that is to be included in the post-2014 REF must have been deposited in the institution’s repository within three months of acceptance or it cannot be counted. This policy is a ‘green’ open access policy.

The Research Councils UK (RCUK) have had an open access policy in place for two years, introduced on 1 April 2013 as a result of the 2012 Finch Report. The RCUK policy states that funded research outputs must be available open access, and it is permitted to make them available through deposit into a repository. At first glance this seems to align with the HEFCE policy. However, restrictions on the allowed embargo periods mean that in practice most articles must be made available gold open access – usually with the payment of an accompanying article processing charge. While these charges are supported by a block grant fund, there is considerable impost on the institutions to manage these.

There is also considerable confusion amongst researchers about what all these policies mean and how they relate to each other.

Data as a system

We are trying to find some examples of how making research data available can help research and society. It is unrealistic to hope for something along the lines of Jack Andraka’s breakthrough in developing a diagnostic test for pancreatic cancer using only open access research.

That’s why I was pleased when Nicholas Gruen pointed me to a report he co-authored: Open for Business: How Open Data Can Help Achieve the G20 Growth Target – A Lateral Economics report commissioned by Omidyar Network – published in June 2014.

This report is looking primarily at government data but does consider access to data generated in publicly funded research. It makes some interesting observations about what can happen when data is made available. The consideration is that data can have properties at the system level, not just the individual  level of a particular data set.

The point is that if data does behave in this way, once a collection of data becomes sufficiently large then the addition of one more set of data could cause the “entire network to jump to a new state in which the connections and the payoffs change dramatically, perhaps by several orders of magnitude”.

Benefits of sharing data

The report also refers to a 2014 report The Value and Impact of Data Sharing and Curation: A synthesis of three recent studies of UK research data centres. This work explored the value and impact of curating and sharing research data through three well-established UK research data centres – the Archaeology Data Service, the Economic and Social Data Service, and the British Atmospheric Data Centre.

In summarising the results, Beagrie and Houghton noted that their economic analysis indicated that:

  • Very significant increases in research, teaching and studying efficiency were realised by the users as a result of their use of the data centres;
  • The value to users exceeds the investment made in data sharing and curation via the centres in all three cases; and
  • By facilitating additional use, the data centres significantly increase the measurable returns on investment in the creation/collection of the data hosted.

So clearly there are good stories out there.

If you know of any good news stories that have arisen from sharing UK research output data we would love to hear them. Email us or leave a comment!