In May 2017 the Office of Scholarly Communication organised a workshop with Paola Quattroni from Cancer Research UK (CRUK) focusing on data sharing policy and practices. It was a great opportunity for the funder to outline its policies and current initiatives on data sharing and for the Cambridge researchers to discuss the issues, suggest further solutions and give feedback to the funder about the changes they would like to see implemented. This blog highlights the main points of the workshop.
This session continued the conversation from February last year, when CRUK and the Wellcome Trust came to Cambridge to speak to our research community.
CRUK’s grand ambition
In her presentation “Data sharing in policy and practice with Cancer Research UK”, Paola Quattroni began with CRUK’s grand ambition: “To bring forward the day all cancers are cured” and “see three quarters of people surviving cancer within the next 20 years.”
Data sharing is one of the key elements in realising this ambition and maximising public benefit. CRUK firmly believes that transparency, research integrity, and the swift dissemination and reproducibility of research results are key ingredients for success.
“Our goal is to improve how research is carried out,” explained Paola, who is the Research Funding Manager – Data at CRUK. “We fund the best science and expect researchers to follow best practices… Improving patient benefit and health is our ambition.”
She emphasised the need to have ongoing discussions with the research community and to work together on how to overcome barriers to data sharing. Appropriate sharing and dissemination of research data are particularly important for CRUK, and good data management is the first step to getting the most from the data and facilitating sharing and re-use. In this context, CRUK is actively working to increase and improve data sharing by being instructive but not necessarily demanding in its requirements.
The majority of the attendees came from the fields of Biological Sciences and Clinical Medicine. When asked why they came to the workshop, most said they wanted to be informed about CRUK’s policy and the actions they needed to take. Examples of individual responses included:
- To learn how to fulfil funders’ requirements.
- To learn more about processing data.
- To know the policy on sharing code and data.
- To learn the difference between data sharing and open data.
- To discuss the costs of storing data and how to forecast costs for periods of more than 10 years.
- To learn more about contractual agreements.
- To learn what the funder expects regarding data sharing.
- To learn and inform other colleagues about it.
The workshop started with an icebreaker. The audience was asked to pinpoint why they came to the workshop and what they hoped to gain from it. Following that, Paola Quattroni presented CRUK’s policy on the management and sharing of data, explained why data sharing is important and what the barriers are, and outlined current initiatives to improve data sharing among researchers.
Paola highlighted some of the work CRUK is doing to increase data sharing, such as the recent signing of the San Francisco Declaration on Research Assessment (DORA) and the fact that CRUK is continuing to work with others to put it into practice. Other future activities include:
- Encouraging grant applicants to explain the significance and impact of their discoveries, publications and a broad range of other outputs (e.g. policy influence).
- Being more explicit about evaluating grant applicants’ publications according to their scientific content, rather than simply considering where they are published.
- Working with reviewers and committee members to evaluate the impact of all research outputs.
- Measuring the re-use of research.
- Encouraging replication studies.
- Recognising and rewarding researchers who share their data.
After the presentation, everybody split into groups and identified various challenges of data sharing which were then analysed by the teams and the trainers. The last part of the workshop concentrated on group feedback and suggestions from the audience on what funders could do to further enhance collaboration with the research community.
The workshop continued by splitting into groups. Each group identified challenges and problems of data sharing with regard to Publishing, Skills and Training, Rewards and Data Infrastructure:
Publishing
A recurring item among all groups was the fear of being scooped and the loss of publication opportunities. Another was that the impact factor is still the be-all and end-all. Other challenges included:
- Accepting citations of preprints as a metric of achievement – can be dangerous as groups can release non-peer-reviewed data online to discourage competitors from innovating.
- Range of requirements across different journals/publishers.
- Need to take care not to kill analytical innovation.
- The larger the collaboration the higher the importance of a standardised data format and analysis.
Skills & Training
The Skills and Training section concentrated on how to write data management plans and standardise laboratory notes as well as the necessary training to catch up with technology. Other points included:
- Lack of computer skills/knowledge to physically upload data.
- Formatting data.
- Version Control.
Rewards
It was apparent in most of the groups that time, cost and re-usability problems were significant inhibitors regarding rewards and incentives:
- There is a need to overcome the ‘time burden’ aspect of sharing.
- Cost and Time – solution: Electronic Laboratory Notebooks (ELN) – one or many? Public or private?
- New PI (Persistent Identifier) for metrics.
- Re-usability – how do you measure it?
- DMPs are required at the time of grant submission. However, the researcher needs to report after one year because various parameters can change and might need to be re-adjusted.
Data Infrastructure
The need for standardisation in data acquisition, storage and analysis methods and how ‘big data’ is handled by the funders were common themes in this category. In addition, it was pointed out that individual institutes should have the infrastructure to support data sharing and DMP writing.
Other data infrastructure challenges included:
- Data formats – for example there are so many different scanners for imaging, which all have different formats.
- EU project testing imaging modality across 20 sites where integrating the data is a challenge. The analogy is a clinical trial where protocols and practices have to produce comparable data.
- Cost of the software: there is open source imaging software available. However, you may need different imaging analysis tools.
Although there was not enough time to concentrate on all challenges, the ongoing discussions turned into ideas that provided the seeds for possible solutions or change of strategies regarding how data is being valued and shared.
For example, what if you are just scooped? Would citations help? One solution is that if you have a DOI stamp this can be evidence that you were first.
Currently, publications are considered to be the sole reward, so there is a widespread fear of losing publication opportunities. However, if your data is more valuable than the paper, then the dataset becomes the incentive and is highly valued. How can this be achieved? Micropublishing? If you could build a career on data publishing instead of papers, it would change the incentive strategy. Instead of relying on the old system where there is a big story, what about writing a small story or even data papers? Data in conjunction with data notes is a type of article. These kinds of outputs are valuable and publishers should consider them.
Although staff working for funders have often been researchers themselves, they could still visit researchers from different disciplines to get an idea of what is needed, especially with discipline-specific DMPs. Some participants suggested that DMPs should be discipline-specific and standardised. As an example, if preclinical and clinical data had the same format, such data could easily be compared.
Another solution proposed by the participants to the financial challenges associated with data sharing was an open access fund for data, similar to COAF, that supports the cost of infrastructure and rewards openness.
As already mentioned, the discussions evolved to the point that there was no time left to analyse all challenges and talk about practical issues.
For example, there was a clear need from the participants’ point of view for practical guidance on data plans and distinct approaches per field (STEM/HASS). Questions arose about the use and cost of ELNs and any implications in the future. Similar questions arose about what happens if data needs to be deposited somewhere else or in the middle of the plan. What would the rules be for additional funding midway in such instances? Lastly, the long-term preservation and infrastructure costs associated with projects were another big topic, as were funders’ future strategies regarding ‘big data’. (See this blog for a discussion on the cost issue.)
This workshop brought together researchers from different disciplines interested in learning more about data management and sharing at CRUK. From the funder’s perspective, it was a great opportunity to discuss policies and initiatives in data sharing and to hear directly from researchers about the main barriers to data sharing. CRUK strives to help researchers overcome these barriers and is actively working to facilitate the way research is carried out and ultimately shared.
It was agreed that this workshop was only the beginning and highlighted that collaboration is key to overcoming some of these challenges.
The main outcomes, however, were clear from the outset:
- There is a recognised need for ongoing collaboration between funders, researchers and institutions.
- A global view is required – all funders should have the same vision and aims regarding data sharing.
- Reporting and disseminating all data is key.
- Data needs to be available and reusable.
- We need to overcome the technical and infrastructure challenges of how to measure the “journey” of the data and its re-usability.