Tag Archives: Open Research

Open at scale: sharing images in the Open Research Pilot

Dr Ben Steventon is one of the participants in the Open Research Pilot. He is working with the Office of Scholarly Communication to make his research process more open and here reports on some of the major challenges he perceives at the beginning of the project.

The Steventon Group is a new group, established last year, which looks at embryonic development with a particular focus on the zebrafish. To investigate problems in this area, the group uses time-lapse imaging and tracks cells in 3D visualisations. This presents many challenges when it comes to data sharing, which they hope to address through the Wellcome Trust Open Research Project. Whilst the difficulties this group are facing are specific to a particular type of research, they highlight some challenges common across open research: sharing large files, dealing with proprietary software and joining up the different outputs of a group.

Sharing imaging data 

The data created by time-lapse imaging and cell tracking is frequently on a scale that presents a technical, as well as a financial, challenge. The raw data consists of several terabytes of film, which is then compressed for analysis into 500GB files. These compressed files are of high enough quality to be used for analysis, but they are still too large to be shared easily. The group also generates spreadsheets of tracking data, which can be shared easily but are meaningless without the original imaging files and the specific software that allows the two pieces of data to be connected. One solution we are considering is the Image Data Resource, which is working to make imaging datasets in the life sciences that could not previously be shared because of their size available to the scientific community for re-use.

Making it usable

The software used in this type of research is a major barrier to making the group’s work reproducible. The Imaris software the group uses costs thousands of pounds, so anything shared in its proprietary formats is only accessible to an extremely small group of researchers at wealthier institutions, which is in direct opposition to the principles of Open Research. It is possible to use Fiji, an open source alternative, to recreate the tracking from the imaging files and tracking spreadsheets; however, the data annotation originally performed in Imaris is lost when the images are not saved in the proprietary formats.

An additional problem in such analyses is sharing the protocols that detail the methodologies applied, from the preparation of the samples all the way through data generation and analysis. This is a common problem with standard peer-review journals, which are often limited in the space available for describing methods. The group are exploring new ways to communicate their research protocols and have created an article for the Journal of Visualized Experiments, but such articles are time-consuming to create and so are not always possible. Open peer-review platforms potentially offer a faster way to share detailed protocols, as do specialist platforms such as Wellcome Open Research and Protocols.io.

Increasing efficiency by increasing openness

Whilst the file sizes and proprietary software in this type of research present some barriers to sharing, sharing also offers opportunities to improve practice across the community. Several different software packages are currently used for visualisation and tracking, so sharing more imaging data would allow groups to try out different types of images on different tools and make better-informed purchasing decisions with their grant money. Furthermore, there is great frustration in this area that many people are working on different algorithms for different datasets; greater sharing of these algorithms could reduce the time wasted creating new algorithms when it might be possible to adapt a pre-existing one.

Shifting models of scholarly communication

As we move towards a model of greater openness, research groups are facing a new difficulty in working out how best to present their myriad outputs. The Steventon group intends to publish data (in some form), protocols and a preprint at the same time as submitting their papers to a traditional journal. This will make their work more reproducible, and it also allows researchers who are interested in different aspects of their work to access the parts that interest them. These outputs will link to one another through citations, but following these links relies on close reading of the different outputs and checking references. The Steventon group would like to make the links between the different aspects of their work more obvious and browsable, so that the context is clear to anyone interested in the lab’s work. As the group’s research is so visual, it would be appropriate to represent the different aspects of their work in a more appealing form than a list of links.
The Steventon lab is attempting to link and contextualise their work through their website, and it is possible to cross-reference resources in many repositories (including Cambridge’s Apollo), but they would like a more sustainable solution. They work in areas that cross over into other disciplines: some people may be interested in their methodologies, others in the particular species they work on, and others still in the particular developmental processes they are researching. There are opportunities here for openness to increase the discoverability of interdisciplinary research, and we will be exploring this, as well as the issues around sharing images and proprietary software, as part of the Open Research Pilot.

Published 8 May 2017
Written by Rosie Higman and Dr Ben Steventon

Creative Commons License

“Become part of the research process” – observations from RLUK2017

When is a librarian not a librarian? Rather than a bad joke, this was one of the interesting underlying discussions at the 2017 RLUK conference held earlier in March. The conference Twitter hashtag was #rluk17 and the videos are now available. The answer, it appears, is when we start talking about partnerships with, rather than support of, our research community.

As always with my conference write-ups, these are simply the parts that resonated with me and the impression I walked away with. This write-up will be very different from anyone else’s from the conference, such as this blog from Lesley Pitman, and the RLUK conference report.

I have also written a sister blog describing the workshop I co-presented on the topic of Text and Data Mining.

Libraries’ role in research

The role of libraries and the people who work in them was the theme of one session – with arguments that libraries should be central to the research process.

Masud Khokhar, the Head of Digital Innovation and Research Services at Lancaster University, gave a talk on the Role of research libraries in a technological future. He said we need to get out of the culture of researchers only coming to the library with research outputs/outcomes. Language matters, he said. Lancaster University has made a deliberate decision not to use the word ‘support’, because “we have bigger aims than that”. Partnership, rather than just collaboration, is the future for libraries. We need to be creative co-developers working with the research community if we are to be a research library.

We need to generate a culture of experimentation: “Be creative, experiment fast, succeed or fail fast and learn from both”. It is a good challenge for us librarians to be more creative and less passive. We should embed the library in research questions and processes.

The issue of how we present information to our clients came up, with Khokhar saying that consistency in search should no longer be important; instead, what we present should depend on the context of the searcher. “Content might be king, but context is the kingdom”, he said.

He also showed evidence that data visualisation can lead to more downloads of data, suggesting that it may be even more important to data use than good metadata. Indeed, Lancaster University Library has allocated 10TB of server space to analytics of library data alone, because this is a growing area that is important for driving decision making.

This perspective was also put forward by Patrick McCann from the University of St Andrews Library. He talked about the new role of Research Software Engineers, who work with the research community to develop research solutions and research outputs. St Andrews has a senior librarian for digital humanities and research computing. He noted: “we are part of the research process”.

A comment was made during the conference that many speakers had identified themselves as ‘not a librarian’. There was a call for us to open the idea of what a librarian is. Masud Khokhar suggested he would consider himself to be an ‘honorary’ librarian.

But whether someone is a ‘librarian or not’ is an interesting question. William Nixon from the University of Glasgow noted that their Research Data Management team are not librarians, saying “it is a skill set in itself”. Khokhar argued that we need to develop digital leaders for libraries. Are these people who are already in libraries and whom we train up, or are they people with these skill sets whom we bring in and introduce to library culture?

Libraries’ role in the Open Science agenda

The message from presentations about activities in Europe and Canada was that libraries are the central pivot point for the move to open research across the world. This fits with the narrative that libraries should be driving the agenda rather than reacting to it.

Susan Reilly, the outgoing Executive Director of LIBER, talked about re-imagining the library space in the context of open science as she presented the LIBER 2020 vision.

Open Science (the term used in Europe for ‘open research’) is on the European agenda: every single member state has signed up to developing the necessary skills and the open science cloud, and there has been an 80 million Euro investment in this. Given that LIBER is a group of libraries with a common mission to enable world-class research, the question is whether LIBER should make its whole strategy about open science.

Reilly noted that libraries have been ‘bold’ on open science for years but have been held back by faculty and publishers. She argued that we must be resilient on this agenda and that libraries need to take a leadership role in all research. “Libraries need to get into the researchers’ lifecycle”, she argued. They should provide tools throughout the research lifecycle to ensure ‘open science’. To achieve this, we need digital skills, which underpin a more open and transparent research lifecycle.

The end goal, said Reilly, is world-class research, and open science supports it by facilitating collaboration and ensuring the sustainability of research. The 2020 vision is: “Libraries powering sustainable knowledge in the digital age”.

The proposal is that by 2022, open access will be the predominant form of publishing and research data will be Findable, Accessible, Interoperable and Reusable (F.A.I.R.). Reilly noted that research data management is “where we get the most pushback”, an experience reflected in many other institutions.

Libraries can provide platforms for innovative scholarly communications. They can facilitate open access to research publications, with services ranging from paying APCs to becoming a publisher. Libraries also offer research data management, innovative metrics and innovative peer review.

This is an opportunity for libraries to disrupt the scholarly communications system. To achieve this goal, we need research skills that underpin a more open and transparent research lifecycle, and so we need to equip researchers to do this.

Reilly noted that when LIBER went out to stakeholders – “they bought into the vision”. To achieve these goals, Reilly said it is important for libraries to have a strong relationship with institutional leadership. There needs to be transparency around the cost of publications.

We need to work on diversifying librarians’ skills and researchers’ skills. This is a matter of ‘compete or fail’, otherwise Elsevier could take over what libraries do. We need to get into the research workflow.

LIBER’s outcomes from their consultation with stakeholders were:

  • Importance of libraries having a strong relationship with institutional leadership
  • Transparency around the cost of publications
  • Working on diversifying librarians’ skills AND researchers’ skills
  • Be clear about what the role of libraries is/should be
  • Compete or fail
  • Get into the research workflow
  • Opportunity for libraries to disrupt scholarly communications system

It was interesting (for me) to note how similar these are to the Strategic Goals of the Office of Scholarly Communication.

The Open Scholarship theme was continued in a presentation by representatives of RLUK’s sister organisation, the Canadian Association of Research Libraries (CARL). This is a leadership organisation thinking of ways to enhance its members’ capacity and leadership in this environment. Martha Whitehead, the President of CARL, and Susan Haigh, the Executive Director, presented the Canadian Roadmap for advancing Scholarly Communication.

There are issues with open access, they noted. Repositories need to improve in two major areas: we need to improve their functionality, and to support and encourage the development of value-added services such as peer review and tools.

Discussions with publishers about maximising openness have been challenging and have become ‘somewhat fraught’. Libraries are working with Canadian journals to develop, assess and adopt sustainable open access funding models. The idea is that the model will be non-profit, with the money reinvested. While it is not clear whether the discussions will coalesce around anything new and bold, there is value in bringing the communities together.

The Canadians presented an initiative related to Research Data Management (RDM) called Portage, which is designed to help with RDM across the country. It has a director, and because it is an organisation with a facility, the library voice is well respected around the table. Experts are contributing to it. There is also a Federated Research Data Repository, a joint software development project with Compute Canada, and the Scholars Portal Dataverse offers data deposit and sharing at no charge to researchers.

New challenges for libraries

Torsten Reimer spoke about the new focus of the British Library on ‘everything accessible’. He discussed the implications for libraries as we move towards a more open access future. We need to change focus, he argued, with new skills and areas, and we should be working together with the research community.

As more material becomes openly available, what is the role of a national library? Reimer asked. Perhaps libraries need to provide infrastructure and focus on preservation and adding value. Given that the majority of academics use software in their projects, should libraries be supporting, integrating and preserving it?

The ‘just in case’ model is no longer feasible for libraries. The British Library is looking at partnerships in content creation, research and infrastructure. Examples include plans to expose the EThOS API to allow machine consumption of data about theses. They are also looking to replace the current “hand knitted” preservation system with a more robust, scalable and shareable solution.

Collaborate or die?

The opening keynote was by John MacColl, the University Librarian & Director of Library Services at the University of St Andrews (and outgoing president of RLUK). MacColl spoke about the ‘research commons’.

He referred to the ‘tragedy of the commons’, an argument put forward in 2003 that individuals cancelling subscriptions for the Big Deal had led to a 129% increase in the cost of accessing the literature. Publishers are creating an ‘artificial scarcity’ of the literature, which means they can charge as they please. This is a ransacking of the commons.

It is not just about cost: these Big Deals have meant that most collections are becoming the same and we are losing access to other resources. MacColl also noted that the need for bibliographers has been lost. But his call was that research libraries face the challenge of re-appropriating responsibility for preserving key scholarly objects held on the servers of publishers and other vendors worldwide.

So, argued MacColl, we need to work collectively to ‘find means of getting around being held ransom by publishers’. We need a ‘post-collective Big Deal world’. This is Plan B, where we take back control, find post-cancellation access, and arrange document delivery and green open access.

But this is not something we can do individually. MacColl asked: “When we are doing things in our own institutions, who are we letting down by not thinking of the wider community?” We need some sort of formal governance to make that happen. The challenge is that Higher Education is a very conservative world, where people will not take a step unless convinced it is a sensible one to take.

We need to focus on the global – where libraries collaborate on shared bibliographic data and create a ‘collective collection’. Plan B needs to be national.

So much more

This blog has glossed over many very interesting presentations and talks. I do, however, wish to mention the last session of the event, which broadened the discussion beyond the library to the issue of ‘inclusion’ in the Higher Education sector. Libraries, as neutral ‘safe’ places on campus, of course have a big role to play in this. As has been the case in every meeting I have attended since November last year, the double threats of Brexit and Trump have never been far from the discussion, and never more so than in the context of inclusion.

Darren Lund, a ‘middle aged white guy from Canada’ spoke very entertainingly about his work on diversity, making the point that if you have privilege you should use it to make positive change.

The final talk was a sobering walk through research into the racial diversity of universities, with plenty of data showing that universities are not as liberal as we perceive them to be. Statistics such as the fact that 92% of professors in the UK are white, and that there are only three Vice Chancellors from the black and minority ethnic community in the UK, supported Professor Kalwant Bhopal’s argument that we need to actively address the issue of inclusion.

Summary

This blog began with a fairly provocative statement – that people do not identify themselves as librarians when we start talking about partnerships with, rather than support of, our research community. This is an interesting question. Many librarians feel that their role is to support, not lead. Yet others argue that unless we do take a leading role we will become redundant.

So what is the solution? Do we widen the definition of a library? Do we widen the definition of a librarian? Or are we happy with the ‘honorary librarian’ solution? These are some of the questions that need further teasing out. One thing is sure, the landscape is changing rapidly and we need to change with it.

Published 30 March 2017
Written by Dr Danny Kingsley
Creative Commons License

Open Research Project, first thoughts

Dr Laurent Gatto is one of the participants in the Office of Scholarly Communication’s Open Research Pilot. He has recently blogged about his first impressions of the pilot. With his permission we have re-blogged it here.

I am proud to be one of the participants in the Wellcome Trust Open Research Project (and here). The call was initially opened in December 2016 and was pitched like this:

Are you in favour of more transparency in research? Are you concerned about research reproducibility? Would you like to get better recognition and credit for all outputs of your research process? Would you like to open up your research and make it more available to others?

If you responded ‘yes’ to any of these questions, we would like to invite you to participate in the Open Research Pilot Project, organised jointly by the Open Research team at the Wellcome Trust and the Office of Scholarly Communication at the University of Cambridge.

This of course sounded like a great initiative for me and I promptly filed an application.

We had our kick-off meeting on the 27th January, with the aim of getting to know each other and defining/clarifying some of the objectives of the project. This post summarises my take on it.

Here’s how I introduced myself.

Who are you?

Laurent Gatto, Senior Research Associate in the Department of Biochemistry, physically located in Systems Biology and the Maths Department. SSI fellow and Software/Data Carpentry instructor and generally involved in the Open community in Cambridge, such as OpenConCam and Data Champions initiative.

What is your research about and what kind of data does your research generate?

My area of research is computational biology, with special focus on high-throughput proteomics and integration of different data and annotations. I use raw data produced by third parties, in particular the Cambridge Centre for Proteomics (mass spectrometry data), and produce processed/annotated/interactive data and a lot of software (and also here).

What motivated you to participate in the Pilot?

Improve openness/transparency (and hence reproducibility/rigour) in my research and communication, and participate in improving openness (and hence reproducibility/rigour) more widely.

What kind of outputs are you planning to share? Do you foresee any difficulties in sharing?

My direct outputs are systematically shared openly early on: open source software (before publication), pre-prints, improved data (as data packages). Difficulties, if any, generally stem from collaborators less willing to share early and openly.

A personal take on the project

It is a long project (2 years), and hence a rather ambitious one of a unique kind. As a result, we will have to define its overall goals as we go. The continued involvement of the participants over time will play a major role in the project’s success.

What are attainable goals?

It is important to note that there is no funding for the participants. We are driven by a desire to be open, by the benefits of being open and the visibility we can gain through the project, and by the prospect that the Wellcome Trust will learn from our experience and implement any lessons learnt. We get to interact with each other and with research support librarians, who will help us throughout the project. We also commit to sharing research outputs beyond traditional publications and to engaging with the Project by participating in Project meetings and contributing to Project publications.

A lot of our initial discussions centred on rewards for open research or, actually, the lack thereof and the perceived associated risks. Indeed, the traditional academic reward system and the competitiveness of research leave little room for reproducibility and openness. It is, I believe, every participant’s hope that this project will benefit us in some form or another.

A critical missing piece is the academic promotion of open research and open researchers, as a way to promote a more rigorous and sound research process and tackle the reproducibility crisis. What should the incentives be? How can we make sure that the next generation of academics genuinely values openness and transparency as a foundation of rigorous research?

Some desired outputs

Ideally, I would like the Wellcome Trust’s famous Research investigator awards to be de facto Open research investigator awards. There’s currently a split (opposition?) between doing research and supporting open science when doing research. In every grant I have written, I had to demonstrate that the team had a track record, or was in a good position to successfully pursue the proposed project. Well, how about demonstrating a track record in opening and sharing scientific outputs? Every researcher submitting a grant should convincingly demonstrate that they are, have been and/or will be a proactive open researcher and openly disseminate all their outputs. This is something the Wellcome Trust could take away from this Open Research Project, leading by example.

Unfortunately, it is a fact that open science is not on the agenda of many (most?) more senior researchers, who are neither in a position to be open nor see open science as a priority at all. I find it particularly disheartening that many senior academics (i.e. those who will sit on the panel deciding if I’m worth my next job) consider time invested in open science and its promotion as time taken away from actually doing research. It is a bit like how time spent on outreach and promoting science to the wider public is sometimes looked down on as not being the real stuff.

Another desire is that this project will enable us to influence funders, such as the Wellcome Trust, of course, but also more widely the research councils.

As a concrete example, I would like all accepted grants to be openly published beyond the daft layman summary. Grants published after acceptance should include the data management plan, the pathway to impact and possibly more, and these could then be used to assess to what extent the project delivered as promised.

This serves at least two purposes. First, it is a way to promote transparency and accountability towards the funder, the scientific community and the public. Second, it is a great resource for early career researchers. Unless there is specific support in place, writing a first grant is not an easy job, especially given the multitude of documents to prepare in addition to the scientific case for support. And even for more experienced researchers, it can’t harm to explore different approaches to grant writing.

Another concrete output is the requirement for a dedicated software management plan for each grant that involves any software development. I certainly consider my software to be equivalent to data and document it as such in my DMPs, but there seems to be a need for clarification.

I believe that I do a pretty decent job in conducting open science: pre-prints, open access, release data, … In the frame of this project, I shall do a better job at promoting open science for its own sake.

I also hope that by bringing some of my projects under the umbrella of the Open Research Project, I will benefit from a broader dissemination that will, directly or indirectly, be beneficial for my career (see the importance of benefits and rewards above).

Next steps

It is important to make the most out of this unique opportunity. We need to create a momentum, define ambitious goals, and work hard to reach them. But I also think that it is important to get as much input as possible from the community. Nothing beats collective intelligence for such open-ended projects, in particular for open projects.

So please, do not hesitate to comment, discuss on Twitter or elsewhere, or email me directly if you have ideas you would like to promote and/or discuss.

Published 08 March 2017
Written by Dr Laurent Gatto
Creative Commons License