
‘Paperless research’ solutions – Electronic Lab Notebooks

The Office of Scholarly Communication started 2017 with a discussion about ‘going digital’ – on 13 January 2017 we organised an event at Cambridge University’s Department of Engineering to flesh out the problems preventing researchers from implementing Electronic Lab Notebook solutions. Chris Brown from Jisc wrote an excellent blog post with his reflections on the event* and agreed for us to re-blog it here.

For researchers working in laboratories, the importance of recording experiments, results, workflows and so on in a notebook is ingrained from their student days. However, these paper-based solutions are not ideal when it comes to sharing and preservation. They pile up on desks and shelves, vary in quality and often include printed data stuck in. To improve on this situation and resolve many of these issues, e-lab notebooks (ELNs) have been developed. Jisc has been involved in this work in the past through funding projects such as CamELN and LabTrove. Recently, interest in this area has been renewed with the Next Generation Research Environment co-design challenge.

On Friday 13 January I attended the E-Lab Notebooks workshop at the University of Cambridge, organised by the Office of Scholarly Communication. Its purpose was to open up the discussion about how ELNs are being used in different contexts and formats, and the concerns and motivations of people working in labs. A range of perspectives and experience was shared through presentations, group and panel discussions. The audience were mostly from Cambridge, but there was representation from other parts of the UK, as well as Denmark and Germany. A poll at the start showed that the majority of the audience were researchers (57%).

Institutional and researchers’ perspective on ELNs at Cambridge

The first part of the workshop focussed on the practitioners’ perspective, with presentations from the School of Biological Sciences. Alastair Downie (Gurdon Institute) talked about their requirements for an ELN as well as the anxieties and risks of adopting a particular system. Research groups currently use a variety of tools, such as Evernote and Dropbox, and often these are trusted more than ELNs. The importance of trust frequently came up during the day. Alastair conducted a survey to gather more detail on the use and requirements of ELNs and received an impressive 345 responses. Cost and complexity were given as the main reasons not to use ELNs. However, when asked about the most important features, respondents rated cost lower and ease of use highest. Researchers want training, voice recognition and remote access. There is clear interest across the school at all levels, but it requires a push with guidance and direction.

Marko Hyvönen (Dept of Biochemistry) gave the PI perspective and the issues with an ELN for a biochemical lab. He reinforced what Alastair had said about ELNs. He showed how paper log books pile up, deteriorate over time and sometimes include printed information. They are hard to read, easy to destroy, a poor return on effort, often go missing and are not searchable. It was interesting to hear about bad habits such as storing data in non-standardised ways, missing data, and printing out Word documents and sticking them into the lab books.

With 99% of their data electronic, many of the issues in the use of lab books are really about data management in general, not ELNs. An ELN solution should be easy to use, cross-platform, have a browser front end, be generic/adaptable, allow sharing of data and experiments, enforce Standard Operating Procedures when needed, have templates for standard work to minimise repetition, and allow input of data from phones and other non-specialist devices. What they don’t want are the “bells and whistles” features they don’t use. Getting buy-in from people is the top issue to overcome in implementing an ELN.

Views on ELNs from outside the UK

Jan Krause from the École Polytechnique Fédérale de Lausanne (EPFL) gave a non-UK perspective on ELNs. He described a study, carried out as part of a national RDM project, in which they separated ELNs (75 proprietary, 12 open source – 91 features) from Lab Info Management Systems (LIMS) (281 proprietary, 9 open source – 95 features) and compared their features. The two tools most used in Switzerland are SLims (a commercial solution) and openBIS (a home-grown tool). To decide which tool to use they undertook a three-phase selection process. The first selection was based on disciplinary and technical requirements. The second involved detailed analysis based on user requirements (interviews and an evaluation weighted by feature) and price. The third was tendering and live demos.
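
The second-phase evaluation, weighted by feature, is essentially a decision matrix. A minimal sketch of the idea in Python – the features, weights and scores below are purely illustrative, not EPFL’s actual criteria:

    # Decision-matrix sketch for a feature-weighted tool evaluation.
    # Features, weights and scores are illustrative, not EPFL's actual criteria.

    # Weight of each requirement (e.g. derived from user interviews), summing to 1.0
    weights = {"ease_of_use": 0.4, "data_export": 0.3, "api_access": 0.2, "price": 0.1}

    # Each candidate tool scored against every requirement on a 0-5 scale
    candidates = {
        "tool_a": {"ease_of_use": 4, "data_export": 3, "api_access": 5, "price": 2},
        "tool_b": {"ease_of_use": 5, "data_export": 2, "api_access": 3, "price": 4},
    }

    # Weighted score per tool: the sum of weight * score over all requirements
    for tool, scores in candidates.items():
        total = sum(weights[feature] * scores[feature] for feature in weights)
        print(f"{tool}: {total:.2f}")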

Data storage, security and compliance requirements

When using and sharing data you need to make sure your data is safe and secure. Kieren Lovell, from the University Information Services, talked about how researchers should keep their data and accounts safe. Since he started in May 2015, all successful hacks on the university have been due to human error, such as unpatched servers, failures in processes, bad password management and phishing. Even if you think your data and research aren’t important, the reputational damage to the university from security attacks is huge. He recommended sharing research data through cloud providers rather than email, and never trusting public wifi – it is not secure, so use Cambridge’s VPN service instead. If using a local machine you should encrypt your hard drive.


Providers’ perspective

In the afternoon, presentations came from the providers’ perspective. Jeremy Frey, from the University of Southampton, talked about his experience of developing an open source ELN to support open and interdisciplinary science. He works on getting the people and the technology to work together. It’s not just about recording what you have done: you need to include the narrative behind what you do. This is critical for understanding, and ELNs are one part of the digital ecosystem in the lab. The solution they’ve developed is LabTrove, partly funded by Jisc, a flexible open source web-based solution. Allowing pictures to be added to the notes has really helped with accessibility and usability, for example for researchers with dyslexia. Sustainability came up, as is often the case, along with the need for a community to support such a system; it also needs to expand beyond Southampton. Finally, Jeremy used Amazon Echo to query the temperature within part of his lab. He hopes that this will be used more in the lab in the future, once it can recognise each researcher’s voice.

In the next two presentations, it was over to the vendors to show the advantages of adopting RSpace (Rory Macneil) and Dotmatics (Dan Ormsby). The functionality on offer in these types of solutions is attractive for scientists, and RSpace showed how it links to the most common file stores. Any ELN should enhance researchers’ workflows and integrate with the tools they already use.

Removing the barriers

After lunch there were three parallel focus group discussions. I attended the one on sustainability, something that comes up frequently in discussions, particularly when weighing open source against proprietary solutions. Each group reported back as follows:

Focus group 1: Managing the supplier lock in risk

Stories of use need to be shared. The PDF is not a great format for sharing. Vendors tell the truth the way estate agents do. We have to accept that no solution will offer 100% export functionality, so the minimum acceptable level needs to be decided. Determine specific users’ requirements.

Focus group 2: Sustainability of ELN solutions

What is the lifetime of an ELN? How long should everything be accessible? Various needs come from group and funder requirements, e.g. 10 years. There is concern about relying on one commercial solution, as companies can fold – so how can you guarantee the data will remain available? Have exit policies, and support standards and interoperability so data can be moved across ELNs. Broken links and expiring file formats are not just an ELN problem, but relate to the archiving of data in general. Should selection and support of an ELN happen at group, department, institution or national level? It is difficult within a single group, as adopting any technical solution requires support to be in place; institutional-level support is needed.
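
As a side note on the interoperability point, a minimal sketch of what a tool-neutral export might look like – one notebook entry serialised to open, human-readable JSON. The field names are hypothetical, not a published ELN exchange standard:

    import json
    from datetime import date

    # One notebook entry in an illustrative, tool-neutral structure.
    # The field names are hypothetical, not a published ELN standard.
    entry = {
        "title": "Protein expression test, construct 42",
        "author": "A. Researcher",
        "date": date(2017, 1, 13).isoformat(),
        "protocol": "Standard induction at 18 degrees C overnight",
        "attachments": ["gel_image_001.png"],  # files stored alongside the JSON
        "tags": ["expression", "e-coli"],
    }

    # An open format keeps the record readable even if the vendor disappears.
    with open("entry_0001.json", "w") as f:
        json.dump(entry, f, indent=2)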

Focus group 3: Human element of ELN implementation

The biggest hurdle is culture change and showing the benefits of using an ELN. Training and technical support cost money and time: an ELN costs more initially but becomes more efficient. You can incentivise people by having champions. There are different needs across a large institution, and you may join a lab and find its ELN is not adequate. Legal issues around sensitive data complicate matters. You need to believe it will save time. Long-term solutions include using cloud-based solutions, even MS Office, but what happens when people leave? Support is needed from a higher level. Functionality should be based on user requirements. A start would be to set up a mailing list of people interested in ELNs.

Remaining barriers to wide ELN adoption

Finally, I chaired a panel session with all the presenters. Marta Teperek had kindly asked me to give a short presentation on what Jisc does, as many researchers don’t know (in fact I was asked “what’s Jisc?” in the focus group), and to promote the Next Generation Research Environment co-design challenge. Following my presentation the discussion was prompted by questions from the audience and remotely via sli.do. Much of the discussion reiterated what had been said in the presentations, such as the importance of an ELN that meets the requirements of researchers. It should allow integration with other tools and export of the data for use in other ELNs. Getting ELNs used within a department is often difficult, so it does need institution-level commitment and support. Without this, ELNs are unlikely to be adopted within an institution, never mind nationally. One size does not fit all, and we should not try to build a single ELN that satisfies the different needs of various disciplines. A modular system that integrates with the tools and systems already in use would be a better solution. Much of what was said tallied with the feedback received for the Next Generation Research Environment co-design challenge.

Closing remarks

Ian Bruno closed the workshop, reiterating what was said in the panel discussion. I found the event extremely helpful and it provided lots of useful information to feed into the Next Generation Research Environment work. I’d like to thank Marta Teperek for inviting me to chair the panel and for all her hard work putting the event together with @CamOpenData. Marta has collected the tweets from the day in a Storify. All notes and presentations from the event are now published in Apollo, the University of Cambridge’s research repository.

Follow-up actions at the University of Cambridge – give it a go!

Those of you who are interested in ELNs and based at the University of Cambridge might like to know that we are planning trial access to Electronic Lab Notebooks (ELNs). The purpose of this trial will be to test out several ELNs and decide which solutions might best meet the requirements of the research community. A mailing list has been set up for people who are interested in being part of this pilot or would like to be involved in these discussions. If you would like to be added to the mailing list, please sign up here: https://lists.cam.ac.uk/mailman/listinfo/lib-eln

*Originally published by Jisc on 18 January 2017.

Published on 29 January 2017
Written by Chris Brown
Creative Commons License

Open Data – moving science forward or a waste of money & time?

On 4 November 2015, the Research Data Facility at Cambridge University invited some inspirational leaders in the area of research data management and asked them to address the question: “is open data moving science forward or a waste of money & time?”. Below are Dr Marta Teperek’s impressions from the event.

Great discussion

Want to initiate a thought-provoking discussion on a controversial subject? The recipe is simple: invite inspirational leaders and bright people with curious minds, and have an excellent chair. The outcome is guaranteed.

We asked some truly inspirational leaders in data management and sharing to come to Cambridge to talk to the community about the pros and cons of data sharing. We were honoured to have with us:

  • Rafael Carazo-Salas, Group Leader, Department of Genetics, University of Cambridge; @RafaCarazoSalas
  • Sarah Jones, Senior Institutional Support Officer from the Digital Curation Centre; @sjDCC
  • Frances Rawle, Head of Corporate Governance and Policy, Medical Research Council; @The_MRC
  • Tim Smith, Group Leader, Collaboration and Information Services, CERN/Zenodo; @TimSmithCH
  • Peter Murray-Rust, Molecular Informatics, Dept. of Chemistry, University of Cambridge, ContentMine; @petermurrayrust

The discussion was chaired by Dr Danny Kingsley, the Head of Scholarly Communication at the University of Cambridge (@dannykay68).

What is the definition of Open Data?

The discussion started off with a request for a definition of what “open” meant. Both Peter and Sarah explained that ‘open’ in science was not simply a piece of paper saying ‘this is open’. Peter said that ‘open’ meant free to use, free to re-use, and free to re-distribute without permission. Open data needs to be usable, it needs to be described, and it needs to be interpretable. Finally, if data is not discoverable, it is of no use to anyone. Sarah added that sharing is about making data useful. Making it useful also involves the use of open formats, and implies describing the data. Context is necessary for the data to be of any value to others.

What are the benefits of Open Data?

Next came a quick question from Danny: “What are the benefits of Open Data?”, followed by an immediate riposte from Rafael: “What aren’t the benefits of Open Data?”. Rafael explained that open data led to transparency in research, re-usability of data, benchmarking, integration and new discoveries and, most importantly, that sharing data kept it alive. If data was not shared and instead simply kept on a computer’s hard drive, no one would remember it months after the initial publication. Sharing is the only way in which data can be used, cited and built upon years after the publication. Frances added that research data originating from publicly funded research was paid for by taxpayers, and therefore its value should be maximised. Data sharing is important for research integrity and reproducibility, and for ensuring better quality science. Sarah said that the biggest benefit of sharing data was the wealth of re-uses of research data, which often could not be imagined at the time of creation.

Finally, Tim concluded that sharing of research is what makes the wheels of science turn. He inspired further discussion with a strong statement: “Sharing is not an if, it is a must – science is about sharing, science is about collectively coming to truths that you can then build on. If you don’t share enough information so that people can validate and build up on your findings, then it basically isn’t science – it’s just beliefs and opinions.”

Tim also stressed that if open science became institutionalised, and mandated through policies and rules, it would take a very long time before individual researchers would fully embrace it and start sharing their research as the default position.

I personally strongly agree with Tim’s statement. Mandating sharing without providing the support for it will lead to a perception that sharing is yet another administrative burden, and researchers will adopt the ‘minimal compliance’ approach towards sharing. We often observe this attitude amongst EPSRC-funded researchers (EPSRC is one of the UK funders with the strictest policy for sharing of research data). Instead, institutions should provide infrastructure, services, support and encouragement for sharing.

Big data

Data sharing is not without problems. One of the biggest issues nowadays is the sharing of big data. Rafael stressed that with big data, it was extremely expensive not only to share, but even to store the data long-term. He stated that the biggest bottleneck in progress was bridging the gap between the capacity to generate the data and the capacity to make it useful. Tim admitted that sharing of big data was indeed difficult at the moment, but that the need would certainly drive innovation. He recalled that in the past people did not think it would one day be possible to stream videos instead of buying DVDs; nowadays technologies exist which allow millions of people to watch the webcast of a live match at the same time – the need drove the development of the tools. More and more people are looking at new ways of chunking and parallelising data downloads. Additionally, there is a change in the way analysis is done – more and more of it happens remotely on central servers, which eliminates the technical barriers of access to data.
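
As a rough illustration of the chunked, parallel download idea, here is a minimal Python sketch using HTTP Range requests – the URL is a placeholder, and real data services may not support range requests at all:

    import concurrent.futures
    import requests

    URL = "https://example.org/big_dataset.bin"  # placeholder, not a real service

    def fetch_chunk(start, end):
        # Fetch one byte range; the server must support HTTP Range requests.
        resp = requests.get(URL, headers={"Range": f"bytes={start}-{end}"})
        resp.raise_for_status()
        return start, resp.content

    # Find the file size, split it into four ranges, fetch them in parallel.
    size = int(requests.head(URL).headers["Content-Length"])
    step = -(-size // 4)  # ceiling division
    ranges = [(i, min(i + step - 1, size - 1)) for i in range(0, size, step)]

    with concurrent.futures.ThreadPoolExecutor() as pool:
        chunks = sorted(pool.map(lambda r: fetch_chunk(*r), ranges))

    # Reassemble the chunks in order of their starting offset.
    data = b"".join(content for _, content in chunks)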

Personal/sensitive data

Frances mentioned that in the case of personal and sensitive data, sharing was not as simple as in basic science disciplines. Especially in medical research, it often required provision of controlled access to data. It was important not only who would get the data, but also what they would do with it. Frances agreed with Tim that perhaps what was needed was a paradigm shift – that questions should be sent to the data, and not the data sent to the questions.

Shades of grey: in-between “open” and “closed”

Both the audience and the panellists agreed that almost no data was completely “open” and almost no data was completely “closed”. Tim explained that anything that gets research data off the laptop and into a shared environment, even if it is shared only with a certain group, is already a massive step forward. Tim said: “Open Data does not mean immediately open to the entire world – anything that makes it off from where it is now is an important step forward and people should not be discouraged from doing so, just because it does not tick all the other checkboxes.” And this is yet another point on which I personally agreed with Tim: institutionalising data sharing and policing the process is not the way forward. On the contrary, researchers should be encouraged to make small steps at a time, in the hope that the collective move forward will help achieve a cultural change embraced by the community.

Open Data and the future of publishing

Another interesting topic of the discussion was the future of publishing. Rafael started by explaining that the way traditional publishing works had to change, as data was no longer two-dimensional and in the digital era could no longer be shared on a piece of paper. Ideally, researchers should be able to continue re-analysing the data underpinning figures in publications. Research data underpinning figures should be clickable, re-formattable and interoperable – alive.

Danny mentioned that the traditional way of rewarding researchers was based on publishing and on journal impact factors. She asked whether publishing data could help to start rewarding the process of generating data and making it available. Sarah suggested that rather than having formal peer review of data, it would be better to have an evaluation structure based on the re-use of data – for example, valuing data which was downloadable, well-labelled and re-usable.

Incentives for sharing research data

The final discussion was around incentives for data sharing. Sarah was the first to suggest that the most persuasive incentive for data sharing is seeing the data being re-used and getting credit for it. She also stated that funders and institutions have an important role in incentivising data sharing: if they wish to mandate sharing, they also need to reward it. Funders could do so when assessing grant proposals; institutions could do so when looking at academic promotions.

Conclusions and outlooks on the future

This was an extremely thought-provoking and well-coordinated discussion. Perhaps because many of the questions asked remained unanswered, both the panellists and the attendees enjoyed a long networking session with wine and nibbles after the discussion.

From my personal perspective, as an ex-researcher in life sciences, the greatest benefit of open data is the potential to drive a cultural change in academia. Current academic career progression is based almost solely on the impact factor of publications. The ‘prestige’ of your publications determines whether you will get funding, whether you will get a position, and whether you will be able to continue your career as a researcher. This, combined with a frequently broken peer-review process, leads to a lot of frustration among researchers. What if you are not from the world’s top university or from a famous research group? Will you still be able to publish your work in a high impact factor journal? What if somebody scooped you when you were about to publish the results of your five-year-long study? Will you be able to find a new position?

As Danny suggested during the discussion, if researchers start publishing their data in the ‘open’, there is a chance that the whole process of doing valuable research and making it useful and available to others will be rewarded and recognised. This fits well with Sarah’s ideas about an evaluation structure based on the re-use of research data. In fact, more and more researchers go ‘open’ and use blog posts and social media to talk about their research and to discuss the work of their peers. With the use of persistent links, research data can now be easily cited, and impact can be built directly on data citation and re-use; one could also imagine some sort of badges for sharing good research data, awarded directly by the users. Perhaps in 10 or 20 years’ time the whole evaluation process will be done online, directly by peers, and researchers will be valued for their true contributions to science.

And perhaps the most important message for me, this time as a person who supports research data management services at the University of Cambridge, is to help researchers really embrace the open data agenda. At the moment, open data is too frequently perceived as a burden, which, as Tim suggested, is most likely due to imposed policies and the institutionalisation of the agenda. Instead of a stick, which results in a minimal-compliance attitude, researchers need to see the opportunities and benefits of open data in order to sign up to the agenda. The institution therefore needs to provide support services to make data sharing easy, but it is the community itself that needs to drive the change to ‘open’ – and the community needs to be willing and convinced to do so.

Further resources

  • The full recording of the Open Data Panel Discussion is available online.
  • A storified version of the event, prepared by Kennedy Ikpe from the Open Data Team, is also available.

Thank you

We would also like to say a special ‘thank you’ to Dan Crane from the Library at the Department of Engineering, who helped us with all the logistics for the event and made it happen.

Published 27 November 2015
Written by Dr Marta Teperek
Creative Commons License