When it comes to software management there are probably more questions than answers – that was the conclusion of a recent workshop hosted by the Office of Scholarly Communication (OSC) as part of a national series on software sustainability, sharing and management, funded by Jisc. The presentations and notes from the day are available, as is a Storify of the tweets.
The goal of these workshops was to flesh out the current problems in software management and sharing and try to identify possible solutions. The researcher-led nature of this event provided researchers, software engineers and support staff with a great opportunity to discuss the issues around creating and maintaining software collaboratively and to exchange good practice among peers.
Whilst this might seem like a niche issue, an increasing number of researchers are reliant on software to complete their research, and for them the paper at the end is merely an advert for the research it describes. Stephen Eglen described this in his talk as an ‘inverse problem’ – papers are published and widely shared but it is very hard to get to the raw data and code from this end product, and the data and code are what is required to ensure reproducibility.
These workshops were inspired by our previous event in 2015, where Neil Chue Hong and Shoaib Sufi spoke with researchers at Cambridge about software licensing and Open Access. Since then the OSC has had several conversations with Daniela Duca at Jisc and together we came up with an idea of organising researcher-led workshops across several institutions in the UK.
Opening up software in a ‘post-expert world’
We began the day with a keynote from Neil Chue Hong of the Software Sustainability Institute, who outlined the difficulties and opportunities of being an open researcher in a ‘post-expert world’ (the slides are available here). Reputation is crucial to a researcher’s role, so researchers seek to establish themselves as experts. That expert reputation can be tricky to maintain, however, because making mistakes is an inevitable part of research and discovery – something poorly understood outside of academia. Neil introduced Croucher’s Law to help us understand this: everyone will make mistakes, even an expert, but an expert will be aware of this and so will automate and share their work as much as possible.
Accepting that mistakes are inevitable makes sharing much less intimidating. Papers are regularly retracted due to errors, and Neil gave examples from a variety of disciplines and career stages where people were open about their errors and their communities accepted the mistakes. In fact, once you accept that we will all make mistakes, sharing becomes a good way to get feedback on your code and to help you fix bugs and errors.
This feeds into another major theme of the workshop which Neil introduced: that researchers need to stop aiming for perfect and adopt ‘good enough’ software practices for achievable reproducibility. This recognises that one of the biggest barriers to sharing is the time it takes to learn software skills and prepare data to the ‘best’ standards. Good enough practices mean accepting that your work may not be reproducible forever, but that it is more important to share your code now so that it is at least partially reproducible now. Stephen Eglen built on this with his paper ‘Towards standard practices for sharing computer code and programs in neuroscience’, which includes providing data, code and tests for your code, and using licences and DOIs.
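To make the ‘good enough’ idea concrete, here is a minimal sketch of what a shared analysis function with a test might look like. The function and its behaviour are purely illustrative assumptions, not taken from the paper – the point is simply that even one plain-assert test makes shared code far easier for others to trust and reuse.

```python
# A hedged sketch of a 'good enough' practice: a small, hypothetical
# analysis function paired with a plain-assert test. Names are illustrative.

def normalise(values):
    """Scale a list of numbers so they sum to 1.0."""
    total = sum(values)
    if total == 0:
        raise ValueError("cannot normalise an all-zero list")
    return [v / total for v in values]

def test_normalise():
    # One small, readable check is better than no check at all.
    result = normalise([2, 2, 4])
    assert result == [0.25, 0.25, 0.5]
    assert abs(sum(result) - 1.0) < 1e-9

if __name__ == "__main__":
    test_normalise()
    print("all tests passed")
```

A test this small costs minutes to write, yet it documents the expected behaviour and catches regressions when the code is next edited – exactly the kind of achievable, imperfect-but-useful practice the speakers advocated.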
Both speakers and the focus groups in the afternoon highlighted that political work is needed, as well as cultural change, to normalise code sharing. Many journals now ask for evidence of the data which supports articles and the same standards should apply to software code. Similarly, if researchers ask for access to data when reviewing articles then it makes sense to ask for the code as well.
Automating your research: Managing software
Whilst sharing code can be seen as the end of the research software lifecycle, writing code with the intention of sharing it was repeatedly highlighted as a good way to make sure it is well written and documented. This was one of several ‘selfish’ reasons to share: sharing also helps the management of software through better collaboration, easier tracking of your work, and the ability to use students’ work after they leave.
Croucher’s Law demonstrates one of the main benefits of automating research through software: the ability to trace mistakes, which improves reproducibility and makes fixing them easier. Many tools to assist with managing software were mentioned throughout the day, from the well-known version control and collaboration platform GitHub to more dynamic tools such as Jupyter notebooks and Docker. As well as these technical tools, there was discussion of more straightforward methods to maintain software, such as finding a code buddy who can test your code and creating appropriate documentation.
Despite all of these tools and methods to improve software management it was recognised by many participants that automating research through software is not a panacea; the difficulties of working with a mix of technical and non-technical people formed the basis of one of the focus groups.
Managing software appropriately allows it to be shared, but re-using it in the long (or even medium) term means putting time into sustaining code and making sure it is written in a way that is understandable to others. The main recommendations from our speakers and focus groups to ensure sustainability were to use standards, create thorough documentation and embed extensive comments within your code.
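The documentation advice might look like the sketch below: a function whose docstring states its inputs, outputs and assumptions so that others (and your future self) can re-use it without reading every line. The function itself is a made-up example, not from the workshop.

```python
# A hypothetical sketch of 'thorough documentation': a docstring that
# records parameters, return value and constraints. Illustrative only.

def moving_average(values, window):
    """Return the simple moving average of a sequence.

    Parameters
    ----------
    values : list of float
        The input series, oldest value first.
    window : int
        Number of points per average; must satisfy 1 <= window <= len(values).

    Returns
    -------
    list of float
        One average per full window, of length len(values) - window + 1.
    """
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]
```

Documentation written at this level – what goes in, what comes out, what is assumed – is usually what decides whether code gets reused or quietly rewritten from scratch.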
As well as the technical aspects of sustaining software, there was discussion of what motivates people to make their code re-usable. Contributing to a community seemed to be a big driver for many participants, so finding appropriate collaborators is important. However, larger incentives are needed: creating and maintaining software is not currently well rewarded as an academic endeavour. Suggestions to rectify this included more software-oriented funding streams, counting software as an output when assessing academics, and creating a community of software champions to mirror the Data Champions scheme we recently started in Cambridge.
This workshop was part of a national discussion around research software, so we will be looking at the outcomes of the other workshops and at wider actions the Office of Scholarly Communication can support to facilitate sharing and sustaining research software. Apart from Cambridge, five other institutions held similar workshops (Bristol, Birmingham, Leicester, Sheffield, and the British Library). As a next step, the organisers of these events want to meet to discuss the key issues raised by researchers, to see what national steps should be taken to better support the community of researchers and software engineers, and to consider if there are any remaining problems with software which could require a policy intervention.
However, following the maxim to ‘think global, act local’, Neil’s closing remarks urged everyone to consider the impact they can have by influencing those directly around them to make a huge difference to how software is managed, sustained and shared across the research community.