Cambridge University hosted Ben Ryan and Amanda Chmura from the Engineering and Physical Sciences Research Council (EPSRC) on Friday 15 May for a discussion about how the University is meeting the EPSRC expectations for sharing research data.
We started the conversation with a demonstration of the services we offer our researchers including our Research Data Management website, and talked about the open data sessions and other training events we have been holding. So far we have managed to speak to 764 researchers about data sharing requirements (the numbers continue to grow).
In 2011 EPSRC published nine key expectations on research data management. The expectations are directed principally at research organisations and highlight their role in supporting researchers to ensure research data is properly managed. EPSRC set a deadline, 1 May 2015, for research organisation compliance with their expectations.
One of the expectations is that data supporting publications arising from funded research is openly available – this reflects the Common Principles on Data Policy published by RCUK (2011) and the Royal Society’s subsequent (2012) report ‘Science as a Public Enterprise’. To monitor compliance with this expectation EPSRC have said that this autumn they will conduct checks of papers published after 1 May 2015 to ensure these provide appropriate directions to the supporting data.
Ben clarified that the checks will help to determine the level of awareness of the policy and expectations. He noted that there is a balance in what the EPSRC is trying to do. They are trying to create a new research culture, and they are primarily focused on what the institution should be doing to support that.
According to the EPSRC policy, in situations where research arises from collaborations, or from work partially funded by commercial partners, any potential problems with research data sharing should be addressed before the start of the project, in a data management plan. We therefore asked Ben why the EPSRC – of all the RCUK funding bodies – don’t require researchers to create a data management plan. Ben indicated that the main value in data management planning is to the researcher and the research organisation. Adding plans to EPSRC’s funding submission process would simply add to the admin and peer review burden, and it is not clear how peer reviewers could properly judge them, because they don’t know the infrastructure available where the research is being conducted.
The question arose of whether a single RCUK policy on research data might be possible. Ben noted that the different councils fund different types of work, which informs their individual policies, and explained that although a single policy might be achievable it would require every council to change their existing policy and would be very disruptive of current processes across the whole system. As such he felt it would need a ‘very strong steer externally’ to drive such a change.
However, the research councils recognise the need for more guidance and are about to publish cross-council guidelines presenting a collective position on what should be done with particular types of data.
A question that often arises from researchers is ‘what data are we expected to keep and make available?’ We were able to get confirmation that it is:
- the data that underpins publications
- the data that validates research findings
- the data that is worth keeping
All questions should be answered by considering the principles behind the policy. The default position is data should be open – in a way that does not damage the research process. The important thing is that the validity of the published research findings is testable.
An example of the way this principle can be used is when considering another common question – what to do in the situation where several papers are expected to come out of the one set of data. Researchers are concerned that if they release the data on the first publication it jeopardises their subsequent publications as they may be scooped. Ben acknowledged this is a concern but asked whether it is reasonable to sit on data for, say, five years so that other people end up being funded to generate the same data again.
He pointed out that the RCUK Common Principles state that those who undertake Research Council funded work may be entitled to a limited period of privileged use of the data they have collected to enable them to publish the results of their research. However, the length of this period varies by research discipline.
There is also the consideration of the way another user can access the data and reproduce results. The question is – how far do we go to enable a user to reproduce the work? The minimum is that we should provide the information that someone would need to be able to validate published work – this is also critical to maximise the impact of publicly funded research and to maintain public trust in science and research.
The software situation
We had representatives from Cambridge Enterprise and from the School of Technology at the meeting who had specific questions about sharing software. While Ben indicated he might need to reflect on some of the questions, we did come to some clarification on others.
Although software is different from other forms of intellectual property the same basic question arises: “is the institution best served by making it freely available or by commercialising it?” Both approaches can lead to the creation of jobs and economic impact. EPSRC is clear that the choice of exploitation strategy rests with the research organisation.
The EPSRC does not have an expectation about the licence under which software should be released.
It was agreed that if there is material that is potentially commercial, then we should take the steps to make it available and commercialise the software. It was confirmed we are able to make software arising from a research project available free for non-commercial re-use by other researchers (within the academic community) while at the same time making it available to others under a commercial licence.
One can argue that since the taxpayer funded the work in the first place the taxpayer should not have to pay for it again, but this position, taken to its natural conclusion, of course would mean that no commercialisation of funded research should ever occur.
There is also the situation where a researcher has put their ‘life and soul’ into generating outputs and naturally feels they have some ownership of the work. Ben agreed that many of these questions are ‘very challenging’, but noted that researchers seldom ‘own’ their outputs – under RCUK grant conditions the research organisation owns all the intellectual assets arising from the funded research and is responsible for seeing that they are used to the benefit of society and the economy. Some of these questions stem from a mindset that insufficiently recognises the importance of ensuring that the economy and society as a whole benefit from publicly funded research, and a culture change is needed in addition to new processes.
The EPSRC do wish to avoid people sitting on data indefinitely because they don’t want to release their software. Ben said that in principle it is permissible for people to make software available through GitHub, but he would need to investigate how sustainable it is and how it is governed before being able to say whether GitHub is a reasonable option in terms of meeting EPSRC expectations.
Addressing (some) concerns
Time prevented us covering all of the topics we wished to raise. Many Cambridge researchers have raised questions about sharing data from collaborations – with concern that non-UK partners who do not have a data sharing requirement may find the UK requirements onerous and that this could decrease the number of international collaborations in which UK institutions are involved.
There was also no magic bullet for the challenge of paying the not insignificant cost of storing research data safely for 10 years+. The problem is that where researchers were unaware of this expectation at the time they applied for their grant there is no allowance for it in their budget. This will not be an issue in the future as current grants are approved, but we are in a transition period now as the research from existing grants is published and the supporting data is being made available and stored. When we discussed this, Ben explained that the EPSRC does not have any additional funds to support this transition period, and that the costs need to be found within existing resources.
There have been some challenges with communication of the EPSRC policy. Many researchers at the University of Cambridge have said they would have liked to be informed about it directly by EPSRC (as they would expect to have been by, for example, the Wellcome Trust). Ben explained that the approach had deliberately been to communicate the policy through research organisation senior managers (e.g. ProVCs Research), and that this was because the expectations are addressed principally to research institutions, which have primary responsibility for ensuring that researchers manage their data effectively and have access to appropriate facilities to do so. However, he acknowledged that EPSRC could have communicated more with researchers and undertook to explore how more information could be made available directly to researchers.
Therefore it was helpful to be able to express some of the concerns and fears amongst the research community. We have been collating the questions that people have asked during our sessions and will compile a FAQ from these that will appear on our Research Data Management website. Ben indicated that there might be a possibility of a selection of these FAQs also appearing on the RCUK website to help address the universal questions about sharing research data. This step would be welcomed by the University.