Milestone – 1000 datasets in Cambridge’s repository

Last week, Cambridge celebrated a huge milestone – the deposit of the 1000th dataset to our repository Apollo since the launch of the Research Data Facility in early 2015. This is the culmination of a huge amount of work by the team in the Office of Scholarly Communication in developing systems, workflows and policies, and in running an extensive advocacy campaign. The Research Data team have run 118 events over the past couple of years and published 39 blogs.

In the past 12 months alone there have been 26000 downloads of the data in Apollo. Some datasets have been downloaded many times, 117 in some cases, and the data has featured in news, blogs and on Twitter.

An event was held at Cambridge University Library last week to celebrate this milestone.


Opening remarks

The Director of Library Services, Dr Jess Gardner, opened proceedings with a speech in which she noted “the Research Data Services and all who sail in her are at the core of our mission in our research library”.

Dr Gardner referred to the library’s long and proud history of collecting and managing research data that “began on vellum, paper, stone and bone”. The research data of luminaries such as Isaac Newton and Charles Darwin was on paper and, she noted, “we have preserved that with great care and share it openly on line through our digital library.”

Turning to the future, Dr Gardner observed: “But our responsibility now is today’s researcher and today’s scientists and people working across all disciplines across our great university. Our preservation stewardship of that research data from the digital humanities across the biomedical is a core part of what we now do.”

“In the 21st century our support and our overriding philosophy is all about supporting open research and opening data as widely as possible,” she noted. “It is about sharing freely wherever it is appropriate to do so”. [Dr Gardner’s speech is in full at the end of this post.]

Perspectives from a researcher

The second speaker was Zoe Adams, a PhD student at Cambridge who talked about the work she has done with Professor Simon Deakin on the Labour Regulation Index in association with the Centre for Business Research.

Ms Adams noted it was only in retrospect she could “appreciate the benefit of working in a collaborative project and open research generally”. She discussed how helpful it had been as an early career researcher to be “associated with something that was freely available”. She observed that few of her peers had many citations, and the reason she did was because “the dataset is online, people use the data, they cite the data, and cite me”.

Working openly has also improved the way she works, she explained, saying “It has given me a new perspective on what research should be about. …  It gives me a sense that people are relying on this data to be accurate and that does change the way you approach it.”

View from the team

The final speaker was Dr Lauren Cadwallader, Joint Deputy Head of the OSC with responsibility for the Research Data Facility, who discussed the “showcase dataset of the data that we can produce in the OSC”, which is taken from usage of our Request a Copy service.

Dr Cadwallader noted there has been an increase in the requests for theses over time. “This is a really exciting observation because the Board of Graduate studies have agreed that all students should deposit a digital copy of their thesis in our repository,” she said. “So it is really nice evidence that we can show our PhD students that by putting a copy in the repository people can read it and people do want to read theses in our repository.”

One observation was that several of the theses that were requested were written 60 years ago, so the repository is sharing older research as well. The topics of these theses included algebra and Yorkshire evangelists, and one of the oldest requested theses, about the Falkland Islands, was written in 1927. “So there is a longevity in research and we have a duty to provide access to that research,” she said.

Thanks go to…

The dataset itself is one created by the OSC team looking at the usage of our Request a Copy service. The analysis was undertaken by Peter Sutton Long, and we recently published a blog post about the findings.

The music played at the event was compiled by Tony Malone and covers almost 1000 years of music, from Laura Cannell’s reworking of Hildegard of Bingen, to Jane Weaver’s Modern Cosmology. There are acknowledgments to Apollo, and Cambridge too. The soundtrack is available for those interested in listening.

This achievement is entirely due to the incredible work of the team in the Research Data Facility and their ability to engage with colleagues across the institution, the nation and the world. In particular, the vision and dedication of Dr Marta Teperek cannot be overstated.

In the words of Dr Gardner: “They have made our mission different, they have made our mission better, through the work they have achieved and the commitment they have.”

The event was supported by the Arcadia Fund, a charitable fund of Lisbet Rausing and Peter Baldwin.



Published 21 September 2017
Written by Dr Danny Kingsley
Creative Commons License

Speech by Dr Jess Gardner

First let us begin with some headline numbers. One thousand datasets. This is hugely significant and a very high level when looking at research repositories around the country. There is every reason to be proud of that achievement and what it means for open research.

There have been 26000 downloads of that data in the past 12 months alone – that is about use and reuse of our research data and is changing the face of how we do research. Some of these datasets have been downloaded 117 times and used in news, blogs and Twitter. The Research Data team have written 39 blogs about research data and have run 118 events, most of them with researchers.

While the headline numbers give us a sense of volume, perhaps let’s talk about the underlying rationale and philosophy behind this, which is core.

Cambridge University Library has a 600-year-old history we are very proud of. In that time we have had an abiding responsibility to collect, care for and make available for use and reuse, information and research objects that form part of the intrinsic international scholarly record of which Cambridge has been such a strong part. And the ability for those ideas to inspire new ideas. The collection began on vellum, paper, and stone and bone.

And today much of that of course is digital. You can’t see that in the same way you can see the manuscripts and collections. It is sometimes hard to grasp when we are in this grand old dame of a building that I dare you not to love. It is home to the physical papers of such greats as Isaac Newton and Charles Darwin. Their research data was on paper and we have preserved that with great care and share it openly on line through our digital library. But our responsibility now is today’s researcher and today’s scientists and people working across all disciplines across our great university. Our preservation stewardship of that research data from the digital humanities across the biomedical is a core part of what we now do.

And the people in this room have changed that. They have made our mission different, they have made our mission better through the work they have achieved and the commitment they have.

Philosophically this is a very natural extension of what we have done in the Library and the open library and its great research community for which this very building is designed. Some of you may know there is a philosophy behind this building and the famous ‘open library Cambridge’. In the 19th century and 20th century that was mostly about our open stack of books and we have quite a few of them, we are a little weighed down by them.

Our research data weighs less but it is just as significant and in the 21st century our support and our overriding philosophy is all about supporting open research and opening data as widely as possible. It is about sharing freely wherever it is appropriate to do so and there are many reasons why data isn’t open sometimes, and that is fine. What we are looking for is managing so we can make those choices appropriately, just as we have with the archive for many, many years.

So whilst there is a fantastic achievement to mark tonight with those 1000 datasets, and it really is significant, we are really celebrating a deeper milestone with our research partners, our data champions, our colleagues in the research office and in the libraries across Cambridge, and that is about the changing role in research support and library research support in the digital age. I think that is something we should be very proud of in terms of what we have achieved at Cambridge. I certainly am.

I am relatively new here at Cambridge. One of the things that was said to me when I was first appointed to the job was how lucky I was to be working at this University but also with the Office of Scholarly Communication in particular and that has proved to be absolutely true. I would like to take this opportunity to note that achievement of 1000 datasets and to state very publicly that the Research Data Services and all who sail in her are at the core of our mission in our research library. But also to thank you and the teams involved for your superb achievements. It really is something to be very proud of and I thank you.


Data management – one size does not fit all

As the Research Data Facilitator at the University of Cambridge, I am part of the team establishing a Research Data Management (RDM) Facility at the University. This blog is a note of my impressions from the Digital Curation Centre (DCC) meeting held in London on the 28th April 2015: Preparing Data for Deposit.

As always, the DCC meeting was extremely useful for networking. I met people in similar roles at other institutions. And again, the breakout sessions were invaluable – they allowed us to exchange precious experience, feedback gained and lessons learnt while developing RDM services.

What could have been better, though, is an appreciation of the differences between universities.

Unrealistic staffing

The talk from the keynote speaker, Louise Corti, the Associate Director at the UK Data Service, was very inspirational. I loved the uplifting expression that RDM supporters are like artists evangelising researchers. It was great to hear about RDM solutions available at the UK Data Service, and the professional approach to research data, with every aspect of data curation addressed by the excellent team of 70 dedicated people, with precise workflows for data processing.

However, how realistic is it for a university to develop similar solutions locally? Which university would be able to dedicate a similar amount of resources to the development of an RDM facility?

At the University of Cambridge, I am the only full-time employee dedicated to establishing and providing RDM services for our researchers. There is a team of people supporting the facility but these staff are shared with other projects. I would have very much appreciated hearing what scalable solution the UK Data Service would recommend that universities develop, knowing that the resources available are nowhere near what a 70-person team could offer.


On the other hand, we had a presentation from the University of Loughborough. The University, represented by Gary Brewerton, teamed up with Figshare and Arkivum (Mark Hahnel and Matthew Addis, respectively). The three of them explained to us the infrastructure developed to support RDM at the University of Loughborough. The University data repository, DSpace, has been equipped with archival storage provided by Arkivum, which guarantees 100% data integrity. Additionally, researchers at the University of Loughborough can benefit from the use of Figshare, which provides them with a user-friendly research data sharing platform.

These systems seemed to offer excellent solutions to researchers, but somehow I could not help having the impression of listening to sales pitches. Are there any disadvantages of these solutions? Are there any alternatives?

Figshare charges for file transfer (downloading of openly accessible data is actually not free for institutions). How substantial would these charges be for bigger institutions, producing huge amounts of valuable research data, frequently sought after and downloaded by others? Would institutions be able to sustain the cost of data access to their most valuable research datasets?

Risk management

The Loughborough solutions do not appear to take into account the risks associated with implementing services from third-party providers at bigger, research-intensive universities. At the University of Cambridge we have almost 300 EPSRC-funded research grants. In April this year alone our data repository received 40GB of research data deposits coming from EPSRC-funded projects. Producing valuable research outputs is business-critical for universities.

What would be the costs associated with the data transfer of supposedly open-access datasets if these were available via Figshare? Is there any upper limit on possible transfer charges?

What is the long-term risk of handing over a university’s research data holdings to a third-party service provider? Note that some UK research funders expect data to be stored long-term, and in some cases in perpetuity (10 years from the last access). What will be the conditions for research data storage offered by these external providers in 10, 20 or 30 years’ time? How will the cost change? Will it be easy, or even possible, to transfer all research data somewhere else?

Figshare has recently entered into a legal partnership with Macmillan (you can read more about it in a blog post from Dr Peter Murray-Rust) – how will this partnership evolve in the future?


It would be extremely valuable if RDM solutions proposed at DCC meetings could be discussed taking into account the size of the institution, the amount of research conducted at the University, and the size of the RDM team locally available to work on the implementation of the solution.

One size does not and will not fit all, and a better recognition of differences between organisations would greatly help in developing optimal solutions for each individual institution. Additionally, it seems to me of key importance to talk openly about the drawbacks of each solution so that universities can efficiently mitigate future risks.

Published 14 May 2015
Written by Dr Marta Teperek
Creative Commons License