
Forget compliance. Consider the bigger RDM picture

The Office of Scholarly Communication sent Dr Marta Teperek, our Research Data Facility Manager, to the International Digital Curation Conference (IDCC), held in Amsterdam on 22-25 February 2016. This is her report from the event.

Fantastic! This was my first IDCC meeting and already I can’t wait for next year. Not only was there amazing content in high-quality workshops and conference papers, there was also a great opportunity to network with data professionals from across the globe. And it was so refreshing to set aside our UK problem of compliance with data sharing policies and instead really focus on the bigger picture: why it is so important to manage and share research data, and how best to do it.

Three useful workshops

The first day started really intensely – the plan was for one full-day or two half-day workshops, but I managed to squeeze three workshops into one day.

Context is key when it comes to data sharing

The morning workshop was entitled “A Context-driven Approach to Data Curation for Reuse” by Ixchel Faniel (OCLC), Elizabeth Yakel (University of Michigan), Kathleen Fear (University of Rochester) and Eric Kansa (Open Context). We were split into small groups and asked to decide which information about datasets was most important from the re-user’s point of view. Would the re-user care about the objects themselves? Would s/he want to get hints about how to use the data?

We all had difficulty arranging the information in order of usefulness. We were then asked to re-order it according to its importance to repository managers. The take-home message was that, for all of the groups, the information about datasets required by the re-user was not the same as that required by the repository.

In addition, the presenters provided discipline-specific context based on interviews with researchers: depending on the research discipline, different information about datasets was considered most important. For example, information about specimens was very important to zoologists but of negligible importance to social scientists. So context is crucial for collecting appropriate metadata, and without sufficient contextual information data is not useful.

So what can institutional repositories do to address these issues? If research carried out within a given institution covers only certain disciplines, then its repository could relatively easily contextualise the metadata being collected and presented for discovery. However, repositories hosting research from many different disciplines will find this much more difficult. For example, the Cambridge repository has to host research spanning particle physics, engineering, economics, archaeology, zoology, clinical medicine and many, many others. This makes it much more difficult (if not impossible) to contextualise the metadata.

It is not surprising that the information most important from the repository’s point of view is different from the information most important to data re-users. To ensure that research data can be effectively shared and preserved in the long term, repositories need to collect a certain amount of administrative metadata: who deposited the data, what the file formats are, what the data access conditions are, etc. However, repositories should collect as much of this administrative metadata as possible automatically. For example, if a user logs in to deposit data, all the relevant information about that user should be harvested automatically via feeds from human resources systems.

EUDAT – Pan-European infrastructure for research data

The next workshop was about EUDAT, the collaborative Pan-European infrastructure providing research data services, training and consultancy for researchers. EUDAT is an impressive project funded by a Horizon 2020 grant, and it offers five different types of services to researchers:

  • B2DROP – a secure and trusted data exchange service to keep research data synchronized, up-to-date and easy to exchange with other researchers;
  • B2SHARE – a service for storing and sharing small-scale research data from diverse contexts;
  • B2SAFE – a service to store research data safely by replicating it and depositing copies at multiple trusted repositories (additional data backups);
  • B2STAGE – a service to transfer datasets between EUDAT storage resources and high-performance computing (HPC) workspaces;
  • B2FIND – a discovery service harvesting metadata from research data collections in EUDAT data centres and other repositories.

The project has a wide range of services on offer and is currently looking for institutions to pilot them. I personally think these services (if successfully implemented) would be of great value to the Pan-European research community.

However, I have two reservations about the project:

  • Researchers are being encouraged to use EUDAT’s platforms to collaborate on their research projects and to share their research data. However, the funding for the project runs out in 2018. The EUDAT team is now investigating options to ensure the sustainability and future funding of the project, but what will happen to researchers’ data if the funding is not secured?
  • Perhaps, if funding is limited, it would be more useful to focus the offering on services that are not provided elsewhere. For example, Zenodo, another EC-funded project, already offers a user-friendly repository for research data, and the Open Science Framework offers a platform for collaboration and easy exchange of research data. By contrast, a Pan-European service harvesting metadata from various data repositories and enabling data discovery is clearly much needed and would be extremely useful to have.

Jisc Shared RDM Services for UK institutions

I then attended the second half of the Jisc workshop on shared Research Data Management services for UK institutions. The University of York and the University of Cambridge are two of the 13 institutions participating in the pilot. Jenny Mitcham from York and I gave presentations on our institutional perspectives on the pilot project: where we are at the moment and what our key expectations are. Jenny gave an overview of the impressive work she and her colleagues have done on addressing data preservation gaps at the University of York. Data preservation is one of the areas in which Cambridge hopes to get help from the Jisc RDM shared services project. Additionally, as we described before, Cambridge would greatly benefit from solutions for big data and for personal/sensitive data. My presentation from the session is available here.

Presentations were followed by breakout group discussions. Participants were asked to identify priority areas for the Jisc RDM pilot. The top priority identified by all the groups seemed to be solutions for personal/sensitive data and for effective data access management. This was very interesting to me because at similar workshops held by Jisc in the UK, breakout groups prioritised interoperability with their existing institutional systems and cost-effectiveness. This could be one of the unforeseen effects of strict funders’ research data policies in the UK, which required institutions to provide local repositories to share research data.

As a result of these policies, many institutions were tasked with creating institutional data repositories from scratch in a very short time. Most UK universities now have institutional repositories which allow research data to be uploaded and shared. However, very few universities have their repositories well integrated with other institutional systems. The absence of such policy pressure in other countries perhaps allowed institutions there to think more strategically about developing their RDM service provision and to ensure that the services they develop are well embedded within the existing institutional infrastructure.

Conference papers and posters

The following two days were full of excellent talks. My main problem was deciding which sessions to attend: from talking with other attendees, I know that the papers presented in the parallel sessions were also extremely useful. If the budget allows, I certainly think it would be useful for more participants from each institution to attend the meeting to cover more of the parallel sessions.

Below are my main reflections from keynote talks.

Barend Mons – Open Science as a Social Machine

This was a truly inspirational talk, prompting a lot of thought-provoking discussion. Barend started by reflecting that ever more brilliant brains, with ever more powerful computers and billions of smartphones, have created a single, interconnected social super-machine. This machine generates data – vast amounts of data – which is difficult to comprehend and work with unless proper tools are used.

Barend mentioned that, at the current rate at which new knowledge is generated and papers are published, it is simply impossible for human brains to assimilate the constantly expanding amount of new knowledge. Brilliant brains need powerful computers to process the growing amount of information. But for science to be accessible to computers, we need to move away from PDFs. Our research needs to be machine-readable. And perhaps, if publishers do not want to support machine-readability, we need to move away from the current publishing model.

Barend also stressed that if data is to be useful and correctly interpretable, it needs to be accessible not only to machines but also to humans, and that effort is needed to make data well described. Barend said that research data without a proper metadata description is useless (if not harmful). So how do we make research data meaningful? Barend proposed a very compelling solution: no more research grants should be awarded without 5% of the money dedicated to data stewardship.

I could not agree more with everything that Barend said. I hope that research funders will also support Barend’s statement.

Andrew Sallans – nudging people to improve their RDM practice

Andrew started his talk with the reflection that, to improve our researchers’ RDM practice, we need to do better than talking about compliance and about making data open. How is a researcher supposed to make data accessible if the data was not properly managed in the first place? The Open Science Framework has been created with three mission statements:

  • Technology to enable change;
  • Training to enact change;
  • Incentives to embrace change.

So what is the Open Science Framework (OSF)? It is an open source platform that supports researchers during the entire research lifecycle: from the start of the project, through data creation, editing and sharing with collaborators, to data publication. What I find most compelling about the OSF is that it allows one to easily connect, in one place, the various storage platforms and tools where researchers collaborate on their data: researchers can easily plug in resources stored on Dropbox, Google Drive, GitHub and many others.

To incentivise behavioural change among researchers, the OSF team came up with two other initiatives:

Personally, I couldn’t agree more with Andrew that enabling good data management practice should be the starting point. We can’t expect researchers to share their research data if we have not helped them by providing tools and support for good data management. However, I am not so sure about the idea of cash rewards.

Ultimately, researchers become researchers because they want to share the outcomes of their research with the community. This is the principle behind academic research – the only way of moving ideas forward is to exchange findings with colleagues. Do researchers need to be paid extra to do the right thing? I personally do not think so, and I believe that whoever decides to pursue an academic career is prepared to share. It is our task to make data management and sharing as easy as possible, and the OSF will certainly be of great help to the community.

Susan Halford – the challenge of big data and social research

The last keynote was from Susan Halford. Susan’s talk was again very inspirational and thought-provoking. She talked about the growing excitement around big data and how fashionable it has become, to the point of almost being perceived as a solution to every problem. However, Susan also pointed out the problems with big data. Simply increasing computational power without fully understanding the questions and the methodology used can lead to serious misinterpretation of results. Susan concluded that when doing big data research one has to be extremely careful to choose a proper methodology for data analysis, reflecting on both the type of data being collected and (inter)disciplinary norms.

Again – I could not agree more. Asking the right question and choosing the right methodology are key to drawing the right conclusions. But are these problems new to big data research? I personally think that we are all quite familiar with these challenges. Questions about the right experimental design and the right methodology have been with us for as long as the scientific method has been used.

Researchers have always needed to design studies carefully before starting their experiments: what will the methodology be, what are the necessary controls, what should the sample size be, what needs to happen for the study to be conclusive? To me this is not a problem of big data; it is a problem that every researcher needs to address from the very start of the project, regardless of the amount of data the project generates or analyses.

Birds of a Feather discussions

I had not experienced Birds of a Feather (BoF) discussions at a conference before, and I was absolutely amazed by the idea. Before the conference started, attendees were invited to propose ideas for discussions, bearing in mind that BoF sessions might have the following scope:

  • Bringing together a niche community of interest;
  • Exploring an idea for a project, a standard, a piece of software, a book, an event or anything similar.

I proposed a session about sharing personal/sensitive data. Luckily, the topic was selected for discussion and I co-chaired the session together with Fiona Nielsen from Repositive. We both thought that the discussion was great, and our blog post from the session is available here.

And again, I was very sorry to be the only attendee from Cambridge at the conference. There were four parallel discussions and since I was chairing one of them, I was unable to take part in the others. I would have liked to be able to participate in discussions on ‘Data visualisation’ and ‘Metadata Schemas’ as well.

Workshops: Appraisal, Quality Assurance and Risk Assessment

The last day was again devoted to workshops. I attended an excellent workshop from the Pericles project on appraisal, quality assurance and risk assessment in research data management. The project looked at how an institutional repository should audit data when accepting deposits, and at how to measure the risk of datasets becoming obsolete.

These are extremely difficult questions and, because of their complexity, very difficult to address. Still, the project leaders recognised the importance of addressing them systematically and, ideally, in a (semi-)automated way, using specialised software to help repository managers make the right preservation decisions.

In a way I felt sorry for the presenters – their progress and ambitions were so advanced that probably none of us attendees was able to contribute critically to the project. We were all deeply impressed by the high level of the questions asked, but our own experience with data preservation and policy automation was nowhere near the level demonstrated by the workshop leaders.

My take-home message from the workshop is that proper audit of ingested data is of crucial importance. Even if automated risk assessment is not possible, repository managers should at least collect information about the files being deposited, so that they can assess the likelihood of their obsolescence in the future, or at the very least identify key file formats and software types as preservation targets to ensure that the key datasets do not become obsolete. For me the workshop was a real highlight of the conference.
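To make the "at least collect information about files being deposited" step concrete, here is a minimal sketch of the simplest possible version: tallying a deposit's files by extension and flagging any that appear on a repository-maintained watch list. This is an illustration, not the Pericles project's approach; it assumes that the file extension is a good-enough proxy for format (dedicated format-identification tools inspect file contents instead), and the watch list and function names are hypothetical.

```python
from collections import Counter
from pathlib import PurePath

# Hypothetical watch list of formats a repository has chosen to monitor.
# The entries are illustrative assumptions, not a recommendation.
AT_RISK_FORMATS = {".sav", ".wpd", ".xls"}


def format_inventory(filenames):
    """Count deposited files by lower-cased extension (the last suffix only)."""
    counts = Counter()
    for name in filenames:
        ext = PurePath(name).suffix.lower() or "(none)"
        counts[ext] += 1
    return counts


def flag_at_risk(counts, watch_list=AT_RISK_FORMATS):
    """Keep only the extensions that appear on the watch list."""
    return {ext: n for ext, n in counts.items() if ext in watch_list}


deposit = ["survey.SAV", "notes.txt", "results.csv", "old_report.wpd", "README"]
inventory = format_inventory(deposit)
print(inventory)
print(flag_at_risk(inventory))  # {'.sav': 1, '.wpd': 1}
```

In practice the file names could come from walking the deposit directory, for example `[p.name for p in Path(deposit_dir).rglob("*") if p.is_file()]`. Because only the last suffix is used, `data.tar.gz` is counted as `.gz`, which is one reason to prefer content-based identification for real audits.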

Networking and the positive energy

Lots of useful workshops, plenty of thought-provoking talks. But for me one of the most important parts of the conference was meeting great colleagues and having fascinating discussions about data management practices. I never thought I could spend an evening (night?) with people who would be willing to talk about research data without the slightest sign of boredom. And the most joyful and refreshing part of the conference was that, because we came from across the globe, our discussions moved away from the compliance aspect of data policies. Free from policy, we were able to address how best to support research data management: how best to help researchers, what our priority needs are, and what data managers should do first with our limited resources.

I am looking forward to catching up next year with all the colleagues I met in Amsterdam, and to seeing what progress we have all made with our projects and what our collective next moves should be.

In summary, I came back with lots of new ideas, full of energy and with a positive attitude – ready to advocate for the bigger picture and the greater good. I came back exhausted, but I cannot imagine spending four days more productively and fruitfully than at IDCC.

Thanks so much to the organisers and to all the participants!

Published 8 March 2016
Written by Dr Marta Teperek

Creative Commons License

‘It is all a bit of a mess’ – observations from Researcher to Reader conference

“It is all a bit of a mess. It used to be simple. Now it is complicated.” This was the conclusion of Mark Carden, the coordinator of the Researcher to Reader conference, after two days of discussion, debate and workshops about scholarly publication.

The conference bills itself as: ‘The premier forum for discussion of the international scholarly content supply chain – bringing knowledge from the Researcher to the Reader.’ It was unusual because it mixed ‘tribes’ who usually go to separate conferences. Publishers made up 47% of the group, libraries were next with 17%, technology 14% and distributors 9%, and there were a small number of academics and others.

In addition to talks and panel discussions, there were workshops in which smaller groups met three times and were asked to come up with proposals. To keep this blog post to a manageable length, it does not cover the discussions from the workshops.

The talks were filmed and will be made available. There was also a very active Twitter discussion at #R2RConf. This blog post is my attempt to summarise the points that emerged from the conference.

Suggestions, ideas and salient points that came up

  • Journals are dead – the publishing future is the platform
  • Journals are not dead – but we don’t need issues any more as they are entirely redundant in an online environment
  • Publishing in a journal benefits the author not the reader
  • Dissemination is no longer the value added offered by publishers. Anyone can have a blog. The value-add is branding
  • The drivers for choosing research areas are what has been recently published, not what is needed by society
  • All research is generated from what was published the year before – and we can prove it
  • Why don’t we disaggregate the APC model and charge for sections of the service separately?
  • You need to provide good service to the free users if you want to build a premium product
  • The most valuable commodity as an editor is your reviewer time
  • Peer review is inconsistent and systematically biased
  • The more novel the work, the more likely it is to receive a negative review
  • Poor academic writing is rewarded

Life After the Death of Science Journals – How the article is the future of scholarly communication

Vitek Tracz, the Chairman of the Science Navigation Group, which produces the F1000Research series of publishing platforms, was the keynote speaker. He argued that we are coming to the end of journals. One of the issues with journals is that their essence is selection. The referee system is secret – the editors won’t usually tell the author who the referee is, because the referee is working for the editor, not the author. The main task of peer review is to accept or reject the work – there may be some suggestions for improving the paper. But that decision is taken not by the referees but by the editor, who has the Impact Factor to consider.

This system allows information to be published that should not be published – eventually every paper will find somewhere to be published. Even in high-level journals, many papers cannot be replicated. A survey by PubMed found there was no correlation between impact factor and the likelihood of an abstract being looked at on PubMed.

Readers can now get the papers they want by themselves and create their own collections of work that interests them. But authors need journals because the Impact Factor (IF) is so deeply embedded. Placement in a prestigious journal doesn’t increase readership, but it does increase the likelihood of getting tenure. So authors need journals; readers don’t.

Vitek noted that F1000Research “are not publishers – because we do not own any titles and don’t want to”. Instead they offer tools and services. It is not publishing in the traditional sense, because there is no decision to publish or not publish something – that process is completely driven by authors. He predicted that science publishing will shift from journals to services (with more tools and more publishing directly on funder platforms).

In response to a question about impact factors and changing author motivation, Vitek said “the only way of stopping impact factors as a thing is to bring the end of journals”. This aligns with the conclusions of a paper I co-authored some years ago, ‘The publishing imperative: the pervasive influence of publication metrics’.

Author Behaviours

Vicky Williams, the CEO of research communications company Research Media, discussed “Maximising the visibility and impact of research” and talked about the need to translate complex ideas in research into understandable language.

She noted that the public does want to engage with research. A large percentage of the public want to know about research while it is happening. However, they see communication about research as poor, and there is low trust in science journalism.

Vicky noted the different funding drivers – funding is now very heavily distributed, and research institutions have to look at alternative funding options. Students are now consumers – they are mobile and create demand. Traditional content formats are being challenged.

As a result, institutions need to compete for talent. They need to build relationships with industry – and promotion is a way of achieving that. Most universities have a strong emphasis on outreach and engagement.

This means we need a different language, a different tone and a different medium. However, academic outputs are written for other academics, and most research is impenetrable to other audiences. This has long been a bugbear of mine (see ‘Express yourself scientists, speaking plainly isn’t beneath you’).

Vicky outlined some steps to showcase research – having a communications plan, networking with colleagues, creating a lay summary, using visual aids and engaging. She argued that this acts as a research CV.

Rick Anderson, an Associate Dean at the University of Utah, talked about the Deeply Weird Ecosystem of publishing. Rick noted that publication is deeply weird, with many different players – authors (send papers out), publishers (send out publications), readers (demand subscriptions) and libraries (subscribe or cancel). All players send signals out into the scholarly communications ecosystem, and when we send signals out we get partial and distorted signals back.

An example is that publishers set prices without knowing the value of the content. The content they control is unique – there are no substitutable products.

He also noted the growing prevalence of funding with strings attached. Funders are now imposing conditions on how you publish, covering not just the narrative of the research but also the underlying data. In addition, the institution you work for might have rules requiring you to publish in particular ways.

Rick urged authors to answer the question ‘what is my main reason for publishing?’ (as opposed to writing). In reality, it is primarily to achieve high-impact publication. By choosing to publish in a particular journal, an author is casting a vote for their future. ‘Who has power over my future – do they care about where I publish? I should take notice of that’. He said that ‘If I publish with Elsevier I turn control over to them; publishing in PLOS turns control over to the world’.

Rick mentioned some journal selection tools. JANE is a system (oriented to the biological sciences) where authors can paste an abstract into a search box; it analyses the language and suggests a list of journals. The Committee on Publication Ethics (COPE) member list provides a ‘white list’ of publishers. Journal Guide helps researchers select an appropriate journal for publication.

A tweet noted that “Librarians and researchers are overwhelmed by the range of tools available – we need a curator to help pick out the best”.

Peer review

Alice Ellingham, Director of Editorial Office Ltd, which runs online journal editorial services for publishers and societies, discussed ‘Why peer review can never be free (even if your paper is perfect)’. Alice discussed the different processes involved in securing and chasing peer review.

She said the unseen cost of peer review is communication: providing assistance to all participants. She estimated that managing the peer review of each submission takes about 45-50 minutes.

Editorial Office tasks include checking the scope of a paper and the submission policy, checking ethics, and checking declarations such as competing interests and funding requests. Then they organise the review, assist the editors in making a decision, and do the copy editing and technical editing.

Alice used an animal analogy, with the cheetah representing the speed of peer review that authors would like to see and the tortoise representing what they actually experience. This was very interesting given the Nature news piece published on 10 February, “Does it take too long to publish research?”

Will Frass, a Research Executive at Taylor & Francis, discussed the findings of a T&F study, “Peer review in 2015 – A global view”. This is a substantial report and I won’t be able to do his talk justice here; there is some information about the report here, and a news report about it here.

One of the comments that struck me was that researchers in the sciences are generally more comfortable with single-blind review than those in the humanities. Will noted that because there are small niches in STM, double-blind review often becomes single-blind anyway, as everyone knows each other.

One question from the floor pointed out that reviewers spend eight hours on a paper and that their time is more important than publishers’, and asked what publishers can do to support peer review. While this was not really answered in the room*, it did cause a flurry on Twitter, with a discussion about whether the time spent is five hours or eight, quoting different studies.

*As a general observation, given that half of the participants at the conference were publishers, they were very underrepresented in the comments and discussion. This included the numerous times when a query or challenge was put to the publishers in the room. As someone who works collaboratively and openly, I found this somewhat frustrating.

The Sociology of Research

Professor James Evans, a sociologist at the University of Chicago who studies the science of science, spoke about how research scientists actually behave as individuals and in groups.

His work focuses on using data from the publication process, which can tell rich stories about the process of science. James presented some recent research results relating to the reading and writing of science, including peer review, publication, and how science is rewarded.

James compared the effect of writing styles to see what is effective in terms of reward (citations). He pitted ‘clarity’ (using few words and sentences, the present tense, and keeping the message on point) against ‘promotion’ (where the author claims novelty and uses superlatives and active words).

The research found that writing with clarity is associated with fewer citations, and writing in a promotional style with more. So redundancy, long clauses and mixed metaphors end up enhancing a paper’s searchability. This harks back to the conversation about poor academic writing the day before – bad writing is rewarded.

Scientists write to influence the reviewers and editors in the process. They understand strategically the class of people who will review their work, and know those reviewers will be flattered to see their own research cited. They use strategic citation practices.

James noted that even though peer review is the gold standard for evaluating the scientific record, his research shows that, when it comes to determining the importance or significance of scientific works, peer review is inconsistent and systematically biased. The greater the distance between the reviewer and the work, the more positive the review. This is possibly because a person reviewing work close to their own speciality can see all the criticisms. The more novel the work, the more likely it is to receive a negative review. It is possible to ‘game’ this by driving the peer review panels. James expressed his dislike of the institution of suggesting reviewers: suggested reviewers provide more positive, more influential and worse reviews (according to the editors).

Scientists understand the novelty bias, so they downplay the new elements relative to the old ones. James discussed Thomas Kuhn’s concept of the ‘essential tension’ between ‘career considerations’, which bring job security, publication and tenure (following the crowd), and ‘fame’, which brings Nature papers and, hopefully, a Nobel Prize.

This is a challenge because the optimal question for science is not the optimal question for a scientific career. Because of career considerations, we are sacrificing the pursuit of a diffuse range of research areas in favour of hubs of research.

The centre of the research cycle is publication rather than the ‘problems in the world’ that need addressing. Publications bear the seeds of discovery and represent how science as a system thinks. Data from the publication process can be used to tune, critique and reimagine that process.

James presented research which clearly shows that research today is driven by last year’s publications. Literally. The work takes a given paper, extracts the authors, the diseases, the chemicals etc., and then runs a ‘random walk’ program. The result predicts 95% of the combinations of authors, diseases and chemicals in the following year.

However scientists think they are getting their ideas, the actual origin is traceable in the literature. This means that research directions are not driven by global or local health needs for example.

Panel: Show me the Money

I sat on this panel discussion about ‘The financial implications of open access for researchers, intermediaries and readers’, which made it challenging to take notes (!), but two things struck me in the discussion:

Rick Anderson suggested that when people talk about ‘percentages’ in terms of research budgets, they don’t want you to think about the absolute number, noting that 1% of the Wellcome Trust research budget is $7 million and 1% of the NIH research budget is $350 million.

Toby Green, the Head of Publishing for the OECD, put out a challenge to the publishers in the audience. He noted that airlines have split up the cost of travel into different components (you pay for food or luggage etc., or can choose not to), and suggested that publishers split APCs to pay for different aspects of the service they offer and allow people to choose different elements. The OECD has moved to a Freemium model, in which payments from a small number of premium users fund the free side.

As – rather depressingly – is common in these kinds of discussions, the general feeling was that open access is all about compliance and is too expensive. While I am on the record as saying that the way the UK is approaching open access is not financially sustainable, I do tire of the ‘open access is code for compliance’ conversation. This is one of the unexpected consequences of the current UK open access policy landscape. I was forced to yet again remind the group that open access is not about compliance, it is about providing public access to publicly funded research so people who are not in well resourced institutions can also see this research.

Research in Institutions

Graham Stone, the Information Resources Manager at the University of Huddersfield, talked about work he has done on the life cycle of open access for publishers, researchers and libraries. His slides are available.

Graham discussed how to make open access work to our advantage, saying we need to get it embedded. OAWAL (Open Access Workflows for Academic Libraries) is trying to get librarians who have had nothing to do with OA involved in it.

Graham talked the group through the UK Open Access Life Cycle, which maps the research lifecycle for librarians and repository managers, research managers, authors (who think magic happens) and publishers.

My talk was titled ‘Getting an Octopus into a String Bag’. This discussed the complexity of communicating with the research community across a higher education institution. The slides are available.

The talk discussed the complex policy landscape, the tribal nature of the academic community, the complexity of the structure in Cambridge and then looked at some of the ways we are trying to reach out to our community.

There was nothing really new here from my perspective: it is well known in research management circles that communicating with the research community, an independent and autonomous group, is challenging. This is of course further complicated by the structure of Cambridge. But in preliminary discussions about the conference, Mark Carden, the conference organiser, assured me that this would be news to the large number of publishers and others in the audience who do not work in a higher education institution.

Summary: What does everybody want?

Mark Carden summarised the conference by talking about the different things the different stakeholders in the publishing game want.

Researchers/Authors – mostly they want to be left alone to get on with their research. They want to get promoted and get tenure. They don’t want to follow rules.

Readers – want content to be free or cheap (or really expensive, as long as someone else is paying). Authors (who are also readers) do care about journals being cancelled if they have published in them. They want a nice, clear, easy interface because they are accessing research on different publishers’ webpages. They don’t think about ‘you get what you pay for.’

Institutions – don’t want to be in trouble with the regulators, want to look good in league tables, don’t want to get into arguments with faculty, don’t want to spend any money on this stuff.

Libraries – hark back to the good old days. They wanted manageable journal subscriptions, free stuff, and expensive subscriptions that justified ERM (electronic resource management). Now libraries are reaching out for new roles and asking: should we be publishers, take over the Office of Research, run a repository or manage APCs?

Politicians – want free public access to publicly funded research. They love free stuff to give away (especially other people’s free stuff).

Funders – want to be confusing, want to be bossy or directive. They want to mandate the output medium and mandate copyright rules. They possibly want to become publishers. Mark noted there are some state-controlled issues here.

Publishers – “want to give huge piles of cash to their shareholders and want to be evil” (a joke). Want to keep their business model – there is a conservatism in there. They like to be able to pay their staff. Publishers would like to realise their brand value, attract paying subscribers, and go on doing most of the things they do. They want to avoid Freemium. Publishers could be a platform or a mega journal. They should focus on articles and forget about issues and embrace continuous publishing. They need to manage versioning.

Reviewers – apparently want to do less copy editing, but this is a lot of what they do. Reviewers are conflicted. They want openness and anonymity, slick processes and flexibility, fast turnaround and lax timetables. Mark noted that while reviewers want credit or points or money or something, you would need to pay peer reviewers a lot for it to be worthwhile.

Conference organisers – want the debate to continue. They need publishers and suppliers to stay in business.

Published 18 February 2016
Written by Dr Danny Kingsley
Creative Commons License

Open Data – moving science forward or a waste of money & time?

On 4 November the Research Data Facility at Cambridge University invited some inspirational leaders in the area of research data management and asked them to address the question: “is open data moving science forward or a waste of money & time?”. Below are Dr Marta Teperek’s impressions from the event.

Great discussion

Want to initiate a thought-provoking discussion on a controversial subject? The recipe is simple: invite inspirational leaders, bright people with curious minds and have an excellent chair. The outcome is guaranteed.

We asked some truly inspirational leaders in data management and sharing to come to Cambridge to talk to the community about the pros and cons of data sharing. We were honoured to have with us:

  • Rafael Carazo-Salas, Group Leader, Department of Genetics, University of Cambridge; @RafaCarazoSalas
  • Sarah Jones, Senior Institutional Support Officer from the Digital Curation Centre; @sjDCC
  • Frances Rawle, Head of Corporate Governance and Policy, Medical Research Council; @The_MRC
  • Tim Smith, Group Leader, Collaboration and Information Services, CERN/Zenodo; @TimSmithCH
  • Peter Murray-Rust, Molecular Informatics, Dept. of Chemistry, University of Cambridge, ContentMine; @petermurrayrust

The discussion was chaired by Dr Danny Kingsley, the Head of Scholarly Communication at the University of Cambridge (@dannykay68).

What is the definition of Open Data?

The discussion started off with a request for a definition of what “open” meant. Both Peter and Sarah explained that ‘open’ in science was not simply a piece of paper saying ‘this is open’. Peter said that ‘open’ meant free to use, free to re-use, and free to re-distribute without permission. Open data needs to be usable, it needs to be described, and to be interpretable. Finally, if data is not discoverable, it is of no use to anyone. Sarah added that sharing is about making data useful. Making it useful also involves the use of open formats, and implies describing the data. Context is necessary for the data to be of any value to others.

What are the benefits of Open Data?

Next came a quick question from Danny: “What are the benefits of Open Data?”, followed by an immediate riposte from Rafael: “What aren’t the benefits of Open Data?”. Rafael explained that open data led to transparency in research, re-usability of data, benchmarking, integration, new discoveries and, most importantly, sharing data kept it alive. If data was not shared and instead simply kept on a computer’s hard drive, no one would remember it months after the initial publication. Sharing is the only way in which data can be used, cited, and built upon years after the publication. Frances added that research data originating from publicly funded research was funded by taxpayers. Therefore, the value of research data should be maximised. Data sharing is important for research integrity and reproducibility and for ensuring better quality of science. Sarah said that the biggest benefit of sharing data was the wealth of re-uses of research data, which often could not be imagined at the time of creation.

Finally, Tim concluded that sharing of research is what made the wheels of science turn. He inspired further discussion with a strong statement: “Sharing is not an if, it is a must – science is about sharing, science is about collectively coming to truths that you can then build on. If you don’t share enough information so that people can validate and build up on your findings, then it basically isn’t science – it’s just beliefs and opinions.”

Tim also stressed that if open science became institutionalised, and mandated through policies and rules, it would take a very long time before individual researchers would fully embrace it and start sharing their research as the default position.

I personally strongly agree with Tim’s statement. Mandating sharing without providing the support for it will lead to a perception that sharing is yet another administrative burden, and researchers will adopt the ‘minimal compliance’ approach towards sharing. We often observe this attitude amongst EPSRC-funded researchers (EPSRC is one of the UK funders with the strictest policy for sharing of research data). Instead, institutions should provide infrastructure, services, support and encouragement for sharing.

Big data

Data sharing is not without problems. One of the biggest issues nowadays is the sharing of big data. Rafael stressed that with big data, it was extremely expensive not only to share, but even to store the data long-term. He stated that the biggest bottleneck to progress was bridging the gap between the capacity to generate the data and the capacity to make it useful. Tim admitted that sharing of big data was indeed difficult at the moment, but that the need would certainly drive innovation. He recalled that in the past people did not think that one day it would be possible just to stream videos instead of buying DVDs. Nowadays technologies exist which allow millions of people to watch the webcast of a live match at the same time: the need developed the tools. More and more people are looking at new ways of chunking and parallelising data downloads. Additionally, there is a change in the way in which the analysis is done: more and more of it is done remotely on central servers, and this eliminates the technical barriers of access to data.

Personal/sensitive data

Frances mentioned that in the case of personal and sensitive data, sharing was not as simple as in basic science disciplines. Especially in medical research, it often required provision of controlled access to data. It was not only important who would get the data, but also what they would do with it. Frances agreed with Tim that perhaps what was needed was a paradigm shift: questions should be sent to the data, and not the data sent to the questions.

Shades of grey: in-between “open” and “closed”

Both the audience and the panellists agreed that almost no data was completely “open” and almost no data was completely “shut”. Tim explained that anything that gets research data off the laptop to a shared environment, even if it was shared only with a certain group, was already a massive step forward. Tim said: “Open Data does not mean immediately open to the entire world – anything that makes it off from where it is now is an important step forward and people should not be discouraged from doing so, just because it does not tick all the other checkboxes.” This is yet another point on which I personally agree with Tim: institutionalising data sharing and policing the process is not the way forward. On the contrary, researchers should be encouraged to take small steps at a time, in the hope that the collective move forward will help achieve a cultural change embraced by the community.

Open Data and the future of publishing

Another interesting topic of the discussion was the future of publishing. Rafael started by explaining that the way traditional publishing works had to change, as data was not two-dimensional anymore and in the digital era it could no longer be shared on a piece of paper. Ideally, researchers should be able to continue re-analysing the data underpinning figures in publications. Research data underpinning figures should be clickable, re-formattable and interoperable: alive.

Danny mentioned that the traditional way of rewarding researchers was based on publishing and on journal impact factors. She asked whether publishing data could help to start rewarding the process of generating data and making it available. Sarah suggested that rather than having formal peer review of data, it would be better to have an evaluation structure based on the re-use of data, for example valuing data which was downloadable, well-labelled and re-usable.

Incentives for sharing research data

The final discussion was around incentives for data sharing. Sarah was the first to suggest that the most persuasive incentive for data sharing is seeing the data being re-used and getting credit for it. She also stated that there was an important role for funders and institutions to incentivise data sharing. If funders and institutions wished to mandate sharing, they also needed to reward it. Funders could do so when assessing grant proposals; institutions could do it when considering academic promotions.

Conclusions and outlooks on the future

This was an extremely thought-provoking and well-coordinated discussion. Perhaps because many of the questions asked remained unanswered, both the panellists and the attendees enjoyed a long networking session with wine and nibbles after the discussion.

From my personal perspective, as an ex-researcher in life sciences, the greatest benefit of open data is the potential to drive a cultural change in academia. The current academic career progression is almost solely based on the impact factor of publications. The ‘prestige’ of your publications determines whether you will get funding, whether you will get a position, whether you will be able to continue your career as a researcher. This, combined with a frequently broken peer-review process, leads to a lot of frustration among researchers. What if you are not from the world’s top university or from a famous research group? Will you still be able to publish your work in a high impact factor journal? What if somebody scooped you when you were about to publish the results of your five-year study? Will you be able to find a new position? As Danny suggested during the discussion, if researchers start publishing their data in the ‘open’, there is a chance that the whole process of doing valuable research and making it useful and available to others will be rewarded and recognised. This fits well with Sarah’s ideas about an evaluation structure based on the re-use of research data. In fact, more and more researchers are going ‘open’ and using blog posts and social media to talk about their research and to discuss the work of their peers. With persistent links, research data can now be easily cited, and impact can be built directly on data citation and re-use; one could also imagine some sort of badges for sharing good research data, awarded directly by the users. Perhaps in 10 or 20 years’ time the whole evaluation process will be done online, directly by peers, and researchers will be valued for their true contributions to science.

And perhaps the most important message for me, this time as a person who supports research data management services at the University of Cambridge, is that we need to help researchers really embrace the open data agenda. At the moment, open data is too frequently perceived as a burden, which, as Tim suggested, is most likely due to imposed policies and institutionalisation of the agenda. Instead of a stick, which results in the minimal compliance attitude, researchers need to see the opportunities and benefits of open data in order to sign up to the agenda. Therefore, the institution needs to provide support services to make data sharing easy, but it is the community itself that needs to drive the change to “open”. And the community needs to be willing and convinced to do so.

Further resources

  • Click here to see the full recording of the Open Data Panel Discussion.
  • And here you can find a storified version of the event prepared by Kennedy Ikpe from the Open Data Team.

Thank you

We also want to express a special ‘thank you’ to Dan Crane from the Library at the Department of Engineering, who helped us with all the logistics for the event and who made it happen.

Published 27 November 2015
Written by Dr Marta Teperek
Creative Commons License