As part of the Office of Scholarly Communication Open Access Week celebrations, we are uploading a blog a day written by members of the team. Wednesday is a piece by Dr Marta Teperek reporting on the Software Licensing Workshop held on 14 September 2015 at Cambridge.
Uncertainties about sharing and licensing of software
If the questions that the Research Data Service Team have been asked during data sharing information sessions with over 1000 researchers at the University of Cambridge are any indicator, then there is a great deal of confusion about sharing source code.
There have been a wide range of questions during the discussions in these sessions, and the Research Data Service Team has recorded these. We are systematically ensuring that the information we are providing to our research community is valid and accurate. To address the questions about source code we decided to call in expert help. Shoaib Sufi and Neil Chue Hong* from the Software Sustainability Institute agreed to lead a workshop on Software Licensing in September, at the Computer Lab in Cambridge. Shoaib’s slides are here, and Neil’s slides on Open Access policies and software sharing are here.
We had over 50 researchers and several research data managers from other UK universities attending the Software Licensing workshop. The main questions we were trying to resolve was: Are researchers expected to share source code they used in their projects? And if so, under what conditions?
Is software considered as ‘research data’ and does it need to be shared?
The starting question in the discussion was whether software needed to be shared. Most public funders now require that research data underpinning publications is made available. What is the definition of research data? According to the EPSRC research data “is defined as recorded factual material commonly retained by and accepted in the scientific community as necessary to validate research findings”. Therefore, if software is needed to validate findings described in a publication, researchers are expected to make it available as widely as possible. There are some exceptions to this rule. For example, if there is an intention to commercialise the software there might not be a need to share it, but the default assumption is that the software should be shared.
The importance of putting a licence on software
It is important that before any software is shared, the creator considers what they would like others to be able to do with it. The way to indicate the intended reuse of the software is to place a licence on it. This governs the permission being granted to others with regards to source code by the copyright holder(s). A licence determines whether the person who wants to get hold of software is allowed to use, copy, resell, change, or distribute it. Additionally, a licence should also determine who is liable if something goes wrong with the software.
Therefore, a licence not only protects the intellectual property, but also helps others to use the software effectively. If people who are potentially interested in a given piece of software do not know what they are allowed to do with it, it is possible they will search for alternative solutions. As a consequence, researchers could lose important collaborators, buyers, or simply decrease the citation rate that could have been gained from people using and citing software in their publications.
Who owns the copyright?
The most difficult question when it comes to software licensing is determining who owns the copyright – who is allowed to license the software used in research? If this is software created by a particular researcher then it is likely that s/he will be the copyright owner. At the University of Cambridge researchers are the primary owners of intellectual property. This is however a very generous right – typically employers do not allow their employees to retain copyright ownership. Therefore, the issue of copyright ownership might get very complicated for researchers involved in multi-institutional collaborations. Additionally, sometimes funders of research will retain copyright ownership of research outputs.
Consequences of licensing
An additional complication with licensing software is that most licences cannot be revoked. Once something has been licensed to someone under a certain licence, it is not possible to take it back and change the licence. Moreover, if there is one licence for a set of software, it might not be possible to license a patch to the software under a different licence. The issue of licence compatibility sparked a lot of questions during the workshop, with no easy answers available. The overall conclusion was that whenever possible, mixing of licences should be avoided. If use of various licences is necessary, researchers are recommended to get advice from the Legal Services Office.
Good practice for software management
So what are the key recommendations for good practice for software management? Before the start of a research project, researchers should think about who the collaborators and funders are, and what the employer’s expectations are with regards to intellectual property. This will help to determine who will own the copyright over the software. Funders’ and institutional policies for research data sharing should be consulted for expectations about software sharing With this information it is possible to prepare a data management plan for the grant application.
During the project researchers need to ensure that their software is hosted in an appropriate code repository – for example, GitHub or Bitbucket. It is important to create (and keep updating!) metadata describing any generated data and software.
Finally, when writing a paper, researchers need to deposit all releases of data/software relevant to the publication in a suitable repository. It is best to choose a repository which provides persistent links e.g. Zenodo (which has a GitHub integration), or the University of Cambridge data repository (Apollo). It is important to ensure that software is licensed under an appropriate licence – in line with what others should be allowed to do with the software, and in agreement with any obligations there might be with any other third parties (for example, funders of the research). If there is a need to restrict the access to the software, metadata description should give reasons for this restriction and conditions that need to be met for the access to be granted.
Valuable resources to help make right decisions
Both Neil and Shoaib agreed that proper management and licensing of software might be sometimes complicated. Therefore, they recommended various resources and tools to provide guidance for researchers:
- Case studies guidance on what should be shared in research projects using various forms of software – prepared by the Software Sustainability Institute;
- Attorney-verified interpretation of what most popular license terms mean – prepared by tldrLegal;
- Frequently Asked Questions about software sharing – with answers validated by Ben Ryan from the EPSRC.
The workshop was organised in collaboration with Stephen Eglen from the Department of Applied Mathematics and Theoretical Physics (University of Cambridge) who chaired the meeting, and with Andrea Kells from the Computer Lab (University of Cambridge) who hosted the workshop.
The Research Data Service is also providing various other opportunities for our research community to pose questions directly of the funding bodies. We invited Ben Ryan from the EPSRC to come to speak to a group of researchers in May and the resulting validated FAQs are now published on our research data management website. Similarly, researchers met with Michael Ball from the BBSRC in August.
These opportunities are being embraced by our research community.
*About the speakers
Shoaib Sufi – Community Lead at the Software Sustainability Institute
Shoaib leads the Institute’s community engagement activities and strategies. Graduating in Computer Science from the University of Manchester in 1997, he has worked in the commercial sector as a systems programmer and then as software developer, metadata architect and eventually a project manager at the Science and Facilities Technologies Council (STFC).
Neil Chue Hong – Director at the Software Sustainability Institute
Neil is the founding Director of the Software Sustainability Institute. Graduating with an MPhys in Computational Physics from the University of Edinburgh, he began his career at EPCC, becoming Project Manager there in 2003. During this time he led the Data Access and Integration projects (OGSA-DAI and DAIT), and collaborated in many e-Science projects, including the EU FP6 NextGRID project.