
Milestone – 1000 datasets in Cambridge’s repository

Last week, Cambridge celebrated a huge milestone – the deposit of the 1000th dataset in our repository, Apollo, since the launch of the Research Data Facility in early 2015. This is the culmination of a huge amount of work by the team in the Office of Scholarly Communication in developing systems, workflows and policies, and in running an extensive advocacy campaign. The Research Data team have run 118 events over the past couple of years and published 39 blog posts.

In the past 12 months alone there have been 26000 downloads of the data in Apollo. Some datasets have been downloaded 117 times, and the data has featured in news, blogs and on Twitter.

An event was held at Cambridge University Library last week to celebrate this milestone.

   

Opening remarks

The Director of Library Services, Dr Jess Gardner, opened proceedings with a speech in which she noted that “the Research Data Services and all who sail in her are at the core of our mission in our research library”.

Dr Gardner referred to the library’s long and proud history of collecting and managing research data that “began on vellum, paper, stone and bone”. The research data of luminaries such as Isaac Newton and Charles Darwin was on paper and, she noted, “we have preserved that with great care and share it openly online through our digital library.”

Turning to the future, Dr Gardner observed: “But our responsibility now is today’s researcher and today’s scientists and people working across all disciplines across our great university. Our preservation stewardship of that research data from the digital humanities across the biomedical is a core part of what we now do.”

“In the 21st century our support and our overriding philosophy is all about supporting open research and opening data as widely as possible,” she noted.  “It is about sharing freely wherever it is appropriate to do so”. [Dr Gardner’s speech is in full at the end of this post.]

Perspectives from a researcher

The second speaker was Zoe Adams, a PhD student at Cambridge, who talked about the work she has done with Professor Simon Deakin on the Labour Regulation Index in association with the Centre for Business Research.

Ms Adams noted it was only in retrospect that she could “appreciate the benefit of working in a collaborative project and open research generally”. She discussed how helpful it had been as an early career researcher to be “associated with something that was freely available”. She observed that few of her peers had many citations, and the reason she did was that “the dataset is online, people use the data, they cite the data, and cite me”.

Working openly has also improved the way she works, she explained, saying “It has given me a new perspective on what research should be about. …  It gives me a sense that people are relying on this data to be accurate and that does change the way you approach it.”

View from the team

The final speaker was Dr Lauren Cadwallader, Joint Deputy Head of the OSC with responsibility for the Research Data Facility, who discussed the “showcase dataset of the data that we can produce in the OSC”, which is taken from usage of our Request a Copy service.

Dr Cadwallader noted there has been an increase in the requests for theses over time. “This is a really exciting observation because the Board of Graduate Studies have agreed that all students should deposit a digital copy of their thesis in our repository,” she said. “So it is really nice evidence that we can show our PhD students that by putting a copy in the repository people can read it and people do want to read theses in our repository.”

One observation was that several of the requested theses were written 60 years ago, so the repository is sharing older research as well. The topics of these theses included algebra and Yorkshire evangelists, and one of the oldest requested theses, written in 1927, was about the Falkland Islands. “So there is a longevity in research and we have a duty to provide access to that research,” she said.

Thanks go to…

The dataset itself was created by the OSC team and looks at the usage of our Request a Copy service. The analysis was undertaken by Peter Sutton Long, and we recently published a blog post about the findings.

The music played at the event was compiled by Tony Malone and covers almost 1000 years of music, from Laura Cannell’s reworking of Hildegard of Bingen, to Jane Weaver’s Modern Cosmology. There are acknowledgments to Apollo, and Cambridge too. The soundtrack is available for those interested in listening.

This achievement is entirely due to the incredible work of the team in the Research Data Facility and their ability to engage with colleagues across the institution, the nation and the world. In particular, the vision and dedication of Dr Marta Teperek cannot be overstated.

In the words of Dr Gardner: “They have made our mission different, they have made our mission better, through the work they have achieved and the commitment they have.”

The event was supported by the Arcadia Fund, a charitable fund of Lisbet Rausing and Peter Baldwin.

 

 

Published 21 September 2017
Written by Dr Danny Kingsley
Creative Commons License

Speech by Dr Jess Gardner

First let us begin with some headline numbers. One thousand datasets. This is hugely significant and a very high level when looking at research repositories around the country. There is every reason to be proud of that achievement and what it means for open research.

There have been 26000 downloads of that data in the past 12 months alone – that is about use and reuse of our research data and is changing the face of how we do research. Some of these datasets have been downloaded 117 times and used in news, blogs and Twitter. The Research Data team have written 39 blogs about research data and have run 118 events, most of them with researchers.

While the headline numbers give us a sense of volume, perhaps let’s talk about the underlying rationale and philosophy behind this, which is core.

Cambridge University Library has a 600-year-old history we are very proud of. In that time we have had an abiding responsibility to collect, care for and make available for use and reuse, information and research objects that form part of the intrinsic international scholarly record of which Cambridge has been such a strong part. And the ability for those ideas to inspire new ideas. The collection began on vellum, paper, and stone and bone.

And today much of that of course is digital. You can’t see that in the same way you can see the manuscripts and collections. It is sometimes hard to grasp when we are in this grand old dame of a building that I dare you not to love. It is home to the physical papers of such greats as Isaac Newton and Charles Darwin. Their research data was on paper and we have preserved that with great care and share it openly online through our digital library. But our responsibility now is today’s researcher and today’s scientists and people working across all disciplines across our great university. Our preservation stewardship of that research data from the digital humanities across the biomedical is a core part of what we now do.

And the people in this room have changed that. They have made our mission different, they have made our mission better through the work they have achieved and the commitment they have.

Philosophically this is a very natural extension of what we have done in the Library and the open library and its great research community for which this very building is designed. Some of you may know there is a philosophy behind this building and the famous ‘open library Cambridge’. In the 19th century and 20th century that was mostly about our open stack of books and we have quite a few of them, we are a little weighed down by them.

Our research data weighs less but it is just as significant and in the 21st century our support and our overriding philosophy is all about supporting open research and opening data as widely as possible. It is about sharing freely wherever it is appropriate to do so and there are many reasons why data isn’t open sometimes, and that is fine. What we are looking for is managing so we can make those choices appropriately, just as we have with the archive for many, many years.

So whilst there is a fantastic achievement to mark tonight with those 1000 datasets, and it really is significant, we are really celebrating a deeper milestone with our research partners, our data champions, our colleagues in the research office and in the libraries across Cambridge, and that is about the changing role in research support and library research support in the digital age, and I think that is something we should be very proud of in terms of what we have achieved at Cambridge. I certainly am.

I am relatively new here at Cambridge. One of the things that was said to me when I was first appointed to the job was how lucky I was to be working at this University but also with the Office of Scholarly Communication in particular and that has proved to be absolutely true. I would like to take this opportunity to note that achievement of 1000 datasets and to state very publicly that the Research Data Services and all who sail in her are at the core of our mission in our research library. But also to thank you and the teams involved for your superb achievements. It really is something to be very proud of and I thank you.

 

Mission Open Access: the Apollo repository launches

To celebrate Open Access Week 2016, the Office of Scholarly Communication (OSC) officially launched ‘Apollo’, the University of Cambridge’s upgraded open access repository.

Researchers, University research staff and librarians gathered at the University’s Engineering Department to see a demonstration of the new features of Apollo, speak to some of the University’s Open Access Champions and raise a glass to launch the service.

The repository stores a range of content and provides different levels of access, but its primary focus is on providing open access to the University’s research publications.  Apollo forms an important part of the University’s provision for meeting research funder requirements for open access, enabling ‘Green’ access to publications.  The launch of the upgrade comes at an exciting time for the Office of Scholarly Communication, as the repository has recently received its 10,000th upload.

The Cambridge University Office of Scholarly Communication looks after all aspects of scholarly communication within the University. This ranges across the entire research lifecycle from searching for information and collaborators, through to authoring and copyright issues and finally the publication and dissemination process, leading into assessment. The OSC has responsibility for the open access and open data programs at the University in terms of compliance with funders’ policies, and delivers and manages the University’s digital repository, Apollo.

Cambridge University was one of a handful of ‘testbed’ institutions that participated in the early deployment and development of DSpace, and has been running a DSpace repository for over a decade. Over that time, Apollo has participated in a number of externally funded projects intended to better understand researcher requirements or improve the services it offers. These include Incremental, DataTrain and PrePARe, which developed resources to support research data management, and EPIC and Keeping Research Data Safe (KRDS), which focused on the repository’s preservation services.

Upgraded features

With the support of RCUK, the OSC have spent £43,000 to upgrade the repository. Cambridge is now leading the country by running DSpace Version 5.4, the most recent and most stable version of the application. This has given Apollo a modern and improved user-friendly interface.

Since the upgrade in May 2016, the repository has had close to 2 million views from actual people (not machines!).

The upgrade means we can now increase the services offered by the repository.  Digital Object Identifiers, or DOIs, can be minted in-house. The Open Access team has minted over 6000 DOIs since May for articles, theses, datasets and other research outputs.

In addition, people identifiers – author ORCIDs – are now displayed in the repository. The repository is interoperable with other systems and sends ORCIDs to DataCite, which might allow repository items to be automatically populated into authors’ ORCID profiles in the future.

Perhaps the most exciting integration is with the University’s publication management system Symplectic, allowing for easier reporting of Open Access compliance.

Request a Copy

Part of the upgrade involved the introduction of a new feature called ‘Request a Copy’, designed to open up the University’s most current research to a wider audience. ‘Request a Copy’ operates on the principle of peer-to-peer sharing – if an item in Apollo is not yet available to the public, a repository user can ask the author for a copy of the item. Authors sharing copies of their work on an individual basis falls outside the publisher’s copyright restrictions; here, the repository is acting as a facilitator to a process which happens anyway.

The Request a Copy button has been much more successful than we anticipated, particularly because there is no actual ‘button’. By the end of September 2016 (four months after the introduction of ‘Request a Copy’), we had received 1120 requests (approximately 280 requests per month), with two thirds for articles. Apart from a small number of requests for datasets, the remaining third were for theses.

Of the requests for articles during this period, 38% were fulfilled by the author sending a copy via the repository, and 4% were rejected by clicking the ‘Don’t send a copy’ button.

Of the articles requested during this period 45% were yet to be published.  The large number of requests made prior to publication indicates the value of having a policy where articles are submitted to the repository on acceptance rather than publication – there is clearly interest in quickly accessing this research, rather than waiting for publication.

Open Access Week

The Apollo launch was the closing event of Open Access Week at the OSC.  Established by SPARC and partners in the student community in 2008, International Open Access Week is an opportunity to take action in making openness the default for research—to raise the visibility of scholarship, accelerate research, and turn breakthroughs into better lives.  The OSC also released a daily programme of announcements, blog posts and live-streamed events, which are spotlighted on the OA Week webpage, and celebrated this year’s theme of ‘Open in Action’.

Stay in touch with news from the OSC through the monthly newsletter

Published 28 October 2016
Written by Hannah Haines


Theses – releasing an untapped resource

As part of Open Access Week 2016, the Office of Scholarly Communication is publishing a series of blog posts on open access and open research. In this post Dr Matthias Ammon looks at theses and their use.

It may sound obvious, but PhD theses are a huge reservoir of original research content, given that each thesis represents at least three or four years’ focussed engagement with a specialised research topic. Traditionally, however, the results of this work have not been easily accessible.

A print copy of the approved thesis would be deposited in the library of the university where the PhD was undertaken, so that access was mainly restricted to other members of that university. Interested readers had to travel to visit the library or rely on frequently costly interlibrary loans. While some of the research contained in theses would be published in articles or monographs, this still means that an enormous amount of research was and is effectively locked away.

Increasing access

With the changes in technology in recent decades allied with the rise of Open Access and institutional repositories, the accessibility of PhD theses in general has improved. In Australia, the Australian Digital Theses program began in 1998, expanding to the Australasian Digital Theses program in 2005. This used VT-ETD software to host digital theses at individual institutions which were collated into one search engine. The ADT website, a central metadata repository, was hosted at the University of New South Wales. This was decommissioned in 2011 as theses were migrated to their various institutional repositories. All Australian theses are now findable in the National Library of Australia’s Trove service. There are 334,000 theses listed in Trove, of which over 119,000 are available online.

A significant number of UK universities now require the deposit of a digital copy of a thesis in the university’s repository as a condition for awarding the PhD degree. Usually this entails making the thesis openly available although embargoes may be placed for reasons of confidentiality or commercial concerns. In addition, PhD students funded by any of the UK research councils under the RCUK Training Grant are required to make their theses available Open Access.

Although it is not yet mandatory at the University of Cambridge for PhD students to provide a digital copy of their thesis, students can voluntarily upload their approved dissertations to the institutional repository, Apollo. Approximately one in 10 PhD students do so. In the next couple of weeks, the Office of Scholarly Communication is embarking on a pilot for the systematic submission of digital theses with selected departments.

Finding theses

There are national and international repositories that aggregate access to PhD theses, such as the British Library’s EThOS (for the UK) or DART-Europe (for European universities), making it easier for interested researchers to find relevant material without having to trawl through individual repositories.

Open Access Theses and Dissertations aims to be the best possible resource for finding open access graduate theses and dissertations published around the world. Metadata (information about the theses) comes from over 1100 colleges, universities, and research institutions. OATD currently indexes 3,422,634 theses and dissertations.

NDLTD, the Networked Digital Library of Theses and Dissertations, provides information and a search engine for electronic theses and dissertations (ETDs), whether they are open access or not. The service also provides ‘Guidance Briefs’ on topics such as Copyright and Preserving and Curating ETD Research Data and Complex Digital Objects.

ProQuest Theses and Dissertations (PQDT) is a database of dissertations and theses published digitally or in print. Note that these are made available for a fee that does not benefit the author. In addition, access to PQDT may be limited depending on local library licensing arrangements.

Looking to the past

So while it is looking likely that most future PhD theses will be available online (either freely or requestable), what about the vast number of PhD theses written up to this point? For context, Cambridge alone holds over 40,000 printed theses, with approximately 1100 being added every year. Approximately 2,000 of these have been digitised at the request of individuals wishing to have access to the theses.

Last year we ran an ‘Unlocking Theses’ project to increase the number of Open Access theses in the repository, which stood at about 600 at the beginning of 2015. The Library also held over 1200 scanned theses on an internal server. The Unlocking Theses project added all of these scanned theses held by the Library into the University repository. The Development and Alumni Office were able to provide contact details for just over 600 of these authors. The majority of these authors have now been contacted and we have had a 35% positive response rate from them.

As of today we hold 2257 theses in the repository, of which half are Open Access. The remaining theses are currently held in a Restricted Theses Collection but the bibliographic information about these theses is searchable. Approximately one third of the requests we receive through our Request a Copy service are for these theses. In addition, some authors have found their restricted thesis online and requested that we open access to it.

Cambridge is currently working with the British Library to digitise some of the 14,000 Cambridge theses they hold on microfilm. Our finances do not stretch to the whole corpus, so we have decided to digitise ten percent. This has meant devising a process to determine which theses to digitise. Considerations have included the quality of digitisation from microfilm for typeset versus typewritten theses (and indeed whether the thesis is printed single or double sided, because of shadowing). We have also chosen theses from the disciplines that are most highly requested from our Digital Content Unit. This has proved to be challenging, not least because of the difficulty of determining the disciplines of theses from our library catalogue.

We are hoping to upload these theses to the repository towards the end of the year. Together with several hundred theses from the Digital Content Unit that have been digitised this year, this will double the number of theses we hold in the repository.

Considerations

There are several issues that need to be considered before theses can be made available openly. The first concerns third party copyright, that is to say the inclusion of quotations, images, photographs or other material that is not the original work of the thesis author but has been taken from previously published work. There is generally no problem with including such material in the copy of the thesis submitted for examination and the print version deposited in the University library, but making the thesis freely available online constitutes a change of use and requires separate permissions. This is a problem that applies to both current and older theses and requires checks on the part of the author and possibly the library.

Another issue related to copyright is the author’s permission to make the thesis available, which is necessary because the author retains the copyright for their work. For current theses, this permission can be incorporated into the submission process, either as part of the requirement for the PhD or by the author signing an agreement when the thesis is voluntarily uploaded.

However, it is not so easy to obtain permission for retrospective digitisation, as we discovered during our Unlocking Theses project. The contact details of alumni are not always known and in cases where the original author is deceased it may be challenging to establish the copyright holder, making it difficult to obtain an explicit ‘opt-in’ permission. Finally, there are financial considerations, as the digitisation of a large number of theses requires a significant outlay for staff, equipment and administrative costs.

Big projects

In recent years, a number of universities have undertaken large-scale digitisation projects of their holdings of PhD theses and have dealt with the permission issue in different ways.

The experience of these UK universities also appears to indicate that alumni are for the most part happy to see their theses made openly available. If more institutions follow suit and dedicate funding to opening up the research undertaken by generations of students this large reservoir of research will no longer remain untapped.

There are other challenges related to digital theses that still remain to be solved, such as the problem of linking theses to their associated data and the question of persistent identifiers to seamlessly integrate the output of both individual researchers and institutions. In the future, consideration should be given to non-text or multimedia PhDs, as was debated at a recent panel discussion at the British Library.

For now though, opening up access to decades’ or even centuries’ worth of scholarship sitting on university library shelves in the form of physical copies of PhD theses sounds like a good start.

Published 26 October 2016
Written by Dr Matthias Ammon and Dr Danny Kingsley