Tag Archives: altmetrics

Walking the talk: reflections on working ‘openly’

As part of Open Access Week 2016, the Office of Scholarly Communication is publishing a series of blog posts on open access and open research. In this post Dr Lauren Cadwallader discusses her experience of researching openly.

Earlier this year I was awarded the first Altmetric.com Annual Research grant to carry out a proof-of-concept study looking at using altmetrics as a way of identifying journal articles that eventually get included into a policy document. As part of the grant condition I am required to share this work openly. “No problem!” I thought, “My job is all about being open. I know exactly what to do.”

However, it’s been several years since I last carried out an academic research project and my previous work was carried out with no idea of the concept of open research (although I’m now sharing lots of it here!). Throughout my project I kept a diary documenting my reflections on being open (and researching in general) – mainly the mistakes I made along the way and the lessons I learnt. This blog summarises those lessons.

To begin at the beginning

I carried out a PhD at Cambridge not really aware of scholarly best practice. The Office of Scholarly Communication didn’t exist. There wasn’t anyone to tell me that I should share my data. My funder didn’t have any open research related policies. So I didn’t share because I didn’t know I could, or should, or why I would want to.

I recently attended The Data Dialogue conference and was inspired to hear many of the talks about open data but also realised that although I know some of the pitfalls researchers fall into I don’t quite feel equipped to carry out a project and have perfectly open and transparent methods and data at the end. Of course, if I’d been smart enough to attend an RDM workshop before starting my project I wouldn’t feel like this!

My PhD supervisor and the fieldwork I carried out had instilled in me some practices that are useful for carrying out open research:

Lesson #1. Never touch your raw data files

This is something I learnt from my PhD and found easy to apply here. Altmetric.com sent me the data I requested for my project and I immediately saved it as the raw file and saved another version as my working file. That made it easy when I came to share my files in the repository as I could include the raw and edited data. Big tick for being open.
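That habit is easy to automate. As a minimal sketch (the filenames are hypothetical, not the ones from my project), one might copy the raw export to a working file and then mark the original read-only so later edits can only touch the copy:

```python
import os
import shutil
import stat

def branch_from_raw(raw_path: str, working_path: str) -> None:
    """Copy the raw data file to a working copy, then mark the raw
    file read-only so accidental edits can only hit the copy."""
    shutil.copy2(raw_path, working_path)   # all analysis edits go here
    os.chmod(raw_path, stat.S_IREAD)       # guard the raw file

# Hypothetical usage:
# branch_from_raw("altmetric_export_raw.csv", "altmetric_export_working.csv")
```

Even this two-line routine removes the temptation to "just quickly fix" something in the raw file.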

Getting dirty with the data

Lesson #2. Record everything you do

Another thing I was told to do during my PhD lab work was to record everything you do. And that is all well and good in the lab or the field, but what about when you are playing with your data? As I started cleaning up the spreadsheet Altmetric.com sent, I went from having 36 columns to just 12, but I hadn’t documented my reasons for excluding large swathes of data. So I took a step back and filled out my project notebook explaining my rationale. Documenting every decision at the time felt a little bit like overkill, but if I need to articulate my decisions for excluding data from my analysis in the future (e.g. during peer review) then it will be helpful to know what I based my reasoning on.

Lesson #3. Date things. Actually, date everything

I’d been typing up my notes about why some data was excluded and other data wasn’t, so they would inform my final data selection, and I noticed that I’d been making decisions and notes as I went along but not recording when. If I’m trying to unpick my logic at a later date it is helpful to know when I made a decision. Which decision came first? Did I have all my ‘bright ideas’ on the same day, and is the reason they don’t look so bright now that I was sleep deprived (or hungover, in the case of my student days) and not thinking straight? Recording dates is actually another trick I learnt as a student – data errors can be picked up as lab or fieldwork errors if you can work back and see what you did when – but had forgotten to apply thus far. In fact, it was only at this point that I began dating my diary entries…

Lesson #4. A tidy desk(top) is a tidy mind

I was working on this project just one day a week over the summer, so every week I was having to refresh my mind as to where I stopped the week before and what my plans were that week. I was, of course, now making copious notes about my plans and dating decisions, so this was relatively easy. However, upon returning from a week’s holiday, I opened my data files folder and was greeted by 10 different spreadsheets and a few other files. It took me a few moments to work out which files I needed to work on, which made me realise I needed to do some housekeeping.

Aside from making life easier now, it will make the final write up and sharing easier if I can find things and find the correct version. So I went from messy computer to tidy computer and could get back to concentrating on my analysis rather than worrying if I was looking at the right spreadsheet.


Lesson #5. Version control

One morning I had been working on my data adding in information from other sources and everything was going swimmingly when I realised that I hadn’t included all of my columns in my filters and now my data was all messed up. To avoid weeping in my office I went for a cup of tea and a biscuit.

Upon returning to my desk I crossed my fingers and managed to recover an earlier version of my spreadsheet using a handy tip I’d found online. Phew! I then repeated my morning’s work. Sigh. But at least my data was once again correct. Instead of relying on handy tips discovered by frantic Googling, just use version control. Archive your files periodically and start working on a new version. Tea and biscuits cannot solve everything.
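Short of adopting a full version control system such as git (the more robust option), the "archive periodically, then keep working" routine can be scripted. A minimal sketch, with an arbitrary archive folder name of my own choosing:

```python
import shutil
import time
from pathlib import Path

def archive_version(path: str, archive_dir: str = "archive") -> Path:
    """Copy a file into an archive folder under a timestamped name,
    so this morning's mistakes cannot destroy last week's version."""
    src = Path(path)
    dest_dir = Path(archive_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"{src.stem}_{stamp}{src.suffix}"
    shutil.copy2(src, dest)   # preserves file metadata as well as content
    return dest
```

Running this once at the end of each working day gives you a dated trail of versions to fall back on; a proper version control tool adds diffs and commit messages on top.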

Getting it into the Open

After a couple more weeks of problem free analysis it was time to present my work as a poster at the 3:AM Altmetrics conference. I’ve made posters before so that was easy. It then dawned on me at about 3pm the day I needed to finish the poster that perhaps I should share a link to my data. Cue a brief episode of swearing before realising I sit 15ft away from our Research Data Advisor and she would help me out! After filling out the data upload form for our institutional repository to get a placeholder record and therefore DOI for my data, I set to work making my spreadsheet presentable.

Lesson #6. Making your data presentable can be hard work if you are not prepared

I only have a small data set, but it took me a lot longer than I thought it would to make it sharable. Part of me was tempted just to share the very basic data I was using (the raw file from Altmetric.com plus some extra information I had added), but that would not support reproducibility. People need to be able to see my workings, so I persevered.

I’d labelled the individual sheets and the columns within those sheets in a way that was intelligible to me but not necessarily to other people so they all needed renaming. Then I had to tidy up all the little notes I’d made in cells and put those into a Read Me file to explain some things. And then I had to actually write the Read Me file and work out the best format for it (a neutral text file or pdf is best).

I thought I was finished but as our Research Data Advisor pointed out, my spreadsheets were returning a lot of errors because of the formula I was using (it was taking issue with me asking it to divide something by 0) and that I should share one file that included the formulae and one with just the numbers.
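The same divide-by-zero pitfall crops up whenever ratios are computed in code rather than a spreadsheet. One common guard looks like this (the choice of default substitute value is mine, and is worth documenting in a Read Me):

```python
def safe_ratio(numerator: float, denominator: float, default: float = 0.0) -> float:
    """Return numerator / denominator, or a documented default when the
    denominator is zero (the code equivalent of Excel's #DIV/0! error)."""
    return numerator / denominator if denominator else default

safe_ratio(6, 3)   # normal case: 2.0
safe_ratio(5, 0)   # zero denominator: returns the default instead of erroring
```

Whether the right default is 0, "not applicable", or simply excluding the row depends on what the ratio means, which is exactly the kind of decision Lesson #2 says to write down.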

If I’d had time, I would have gone for a cup of tea and a biscuit to avoid weeping in the office but I didn’t have time for tea or weeping. Actually producing a spreadsheet without formulae turned out to be simple once I’d Googled how to do it and then my data files were complete. All I then needed to do was send them to the Data team and upload a pdf of my poster to the repository. Job done! Time to head to the airport for the conference!

Lesson #7. Making your work open is very satisfying.

Just over three weeks have passed since the conference and I’m amazed that already my poster has been viewed on the repository 84 times and my data has been viewed 153 times! Wowzers! That truly is very satisfying and makes me feel that all the effort and emergency cups of tea were worth it. As this was a proof-of-concept study I would be very happy for someone to use my work, although I am planning to keep working on it. Seeing the usage stats of my work and knowing that I have made it open to the best of my ability is really encouraging for the future of this type of research. And of course, when I write these results up with publication in mind it will be as an open access publication.

But first, it’s time for a nice relaxed cup of tea.

Published 25 October 2016
Written by Dr Lauren Cadwallader
Creative Commons License

What is ‘research impact’ in an interconnected world?

Perhaps we should start this discussion with a definition of ‘impact’. The term is used by many different groups for different purposes, and, much to the chagrin of many researchers, it is increasingly a factor in the Higher Education Funding Council for England’s (HEFCE) Research Excellence Framework. HEFCE defined impact as:

‘an effect on, change or benefit to the economy, society, culture, public policy or services, health, the environment or quality of life, beyond academia’.

So we are talking about research that effects change beyond the ivory tower. What follows is a discussion about how to strengthen the chances of increasing the impact of research.

Is publishing communicating research?

Publishing a paper is not a good way of communicating work. There is some evidence that much published work is not read by anyone other than the reviewers. During an investigation of claims that huge numbers of papers were never cited, Dahlia Remler found that:

  • Medicine – 12% of articles are never cited
  • Humanities – 82% of articles are never cited. Note, however, that the humanities’ most prestigious research is published in books, and many books are rarely cited too.
  • Natural Sciences – 27% of articles are never cited
  • Social Sciences – 32% of articles are never cited

Hirsch’s 2005 paper, ‘An index to quantify an individual’s scientific research output’, proposed the h-index, defined as the number of papers with citation number ≥ h. So an h-index of 5 means the author has at least 5 papers with at least 5 citations each. Hirsch suggested this as a way to characterise the scientific output of researchers, noting that after 20 years of scientific activity an h-index of 20 marks a ‘successful scientist’. When you think about it, 20 citing papers is not that many people finding the work useful. And that ignores those people who are not ‘successful’ scientists but who, regardless, continue to publish.
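Hirsch’s definition lends itself to a few lines of code. A minimal sketch in Python, with invented citation counts for illustration:

```python
def h_index(citations: list[int]) -> int:
    """Largest h such that at least h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank      # this paper still meets the threshold
        else:
            break
    return h

# Invented example: five papers with these citation counts.
# Four papers have at least 4 citations each, so h = 4.
print(h_index([10, 8, 5, 4, 3]))
```

The ranking trick works because, once the citation counts are sorted in descending order, the h-index is just the last rank at which the count still matches or exceeds the rank.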

Making the work open access is not necessarily enough

Open access is the term used for making the contents of research papers publicly available – either by publishing them in an open access journal or by placing a copy of the work in a subject or institutional repository. There is more information about open access here.

I am a passionate supporter of open access. It breaks down cost barriers to people around the world, allowing a much greater exposure of publicly funded research. There is also considerable evidence showing that making work open access increases citations.

But is making the work open access enough? Is a 9.5MB pdf downloadable onto a telephone, or over a dial-up connection? If the download fails at 90% you get nothing. Some publishing endeavours have recognised this as an issue, such as the Journal of Humanitarian Engineering (JHE), which won the Australian Open Access Support Group‘s 2013 Open Access Champion award for its approach to accessibility.
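The dial-up concern is easy to sanity-check with back-of-envelope arithmetic. The sketch below assumes a classic 56 kbps modem (my figure, not one from the post) and ignores protocol overhead:

```python
def download_minutes(size_mb: float, speed_kbps: float) -> float:
    """Rough download time in minutes for a file of size_mb megabytes
    over a link of speed_kbps kilobits per second (ignores overhead)."""
    size_kilobits = size_mb * 8 * 1024   # 1 MB = 8192 kilobits
    return size_kilobits / speed_kbps / 60

# A 9.5 MB pdf over a 56 kbps dial-up modem takes roughly 23 minutes -
# and if the connection drops at 90%, all of that time is wasted.
print(round(download_minutes(9.5, 56)))  # -> 23
```

Twenty-odd minutes per paper, with no resume on failure, is a real access barrier even for work that is nominally open.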

Language issues

The primary issue, however is the problem of understandability. Scientific and academic papers have become increasingly impenetrable as time has progressed. It’s hard to believe now that at the turn of last century scientific articles had the same readability as the New York Times.

‘This bad writing is highly educated’ is a killer sentence from Michael Billig’s well researched and written book ‘Learn to Write Badly: How to Succeed in the Social Sciences‘.  This phenomenon is not restricted to the social sciences, specialisation and a need to pull together with other members of one’s ‘tribe‘ mean that academics increasingly write in jargon and specialised language that bears little resemblance to the vernacular.

There are increasing arguments for public science communication to be part of formal training. In a previous role I was involved in such a program through the Australian National Centre for the Public Awareness of Science. Certainly the opportunities for PhD students to share their work more openly have never been more plentiful. There are many three minute thesis competitions around the world. Earlier this year the British Library held a #ShareMyThesis competition where entrants were first asked to tweet why their PhD research is/was important using the hashtag #ShareMyThesis. The eight shortlisted entrants were then asked to write a short article (up to 600 words) elaborating on their tweet and explaining why their PhD research is/was important in an engaging and jargon-free way.

Explaining work in understandable language is not ‘dumbing it down’.  It is simply translating it into a different language. And students are not restricted to the written word. In November the eighth winner of the annual ‘Dance your PhD‘ competition sponsored by Science, Highwire Press and the AAAS will be announced.

Other benefits

There is a flow-on effect from communicating research in understandable language. In September the Times Higher Education published an article ‘Top tips for finding a permanent academic job‘ whose advice can be summarised as ‘communicate more’.

The Thinkable.org group’s aim is to widen the reach and impact of research projects using short videos (three minutes or less). The goal of each video is to engage a wide audience with the research. The Thinkable Open Innovation Award is a research grant that is open to all researchers in any field around the world and awarded openly, by allowing Thinkable researchers and members to vote on their favourite idea. The winner of the award receives $5000 to help fund their research. This is deliberately the antithesis of the usual research grant process, where grants “are either restricted by geography or field, and selected via hidden panels behind closed doors”.

But the benefit is more than the prize money. This entry from a young University of Manchester biomedical PhD student did not win, but thousands of people engaged with her work in just a few weeks of voting.

Right. Got the message. So what do I need to do?

Researcher Mike Taylor pulled together a list of 20 things a researcher needs to do when they publish a paper.  On top of putting a copy of the paper in an institutional or subject repository, suggestions include using various general social media platforms such as Twitter and blogs, and also uploading to academic platforms.

The 101 Innovations in Scholarly Communication research project, run from the University of Utrecht, is attempting to determine scholarly use of communication tools. It analyses the tools researchers use through the different phases of the research lifecycle – Discovery, Analysis, Writing, Publication, Outreach and Assessment – through a worldwide survey of researchers. Cambridge scholars can use a dedicated link to the survey.

There is a plethora of scholarly peer networks, which all work in slightly different ways and have slightly different foci. You can display your research in your Google Scholar or CarbonMade profile. You can collate the research you are finding in Mendeley or Zotero. You can also create an environment for academic discourse or job searching with Academia.edu, ResearchGate and LinkedIn. Other systems include Publons – a tool to register peer reviewing activity.

Publishing platforms include blogging (as evidenced here), Slideshare, Twitter, figshare and Buzzfeed. Remember, this is not about broadcasting: successful communicators interact.

Managing an online presence

Kelli Marshall from DePaul University asks ‘how might academics—particularly those without tenure, published books, or established freelance gigs—avoid having their digital identities taken over by the negative or the uncharacteristic?’

She notes that as an academic or would-be academic, you need to take control of your public persona and then take steps to build and maintain it. If you do not have a clear online presence, you are allowing Google, Yahoo, and Bing to create your identity for you. There is a risk that the strongest ‘voices’ will be ones from websites such as Rate My Professors.

Digital footprint

Many researchers belong to an institution, a discipline and a profession. If these change, the online identities associated with them will also change. What is your long-term strategy? One thing to consider is obtaining a persistent unique identifier such as an ORCID – which is linked to you, not your institution.

When you leave an institution, you not only lose access to the subscriptions the library has paid for, you also lose your email address. This can be a serious challenge when your online presence on academic social media sites like Academia.edu and ResearchGate is linked to that email address. What about content in a specific institutional repository? Brian Kelly discussed these issues at a recent conference.

We seem to have drifted a long way from impact?

The thing is that if it can be measured, it will be. And digital activity is fairly easily measured; there are systems in place now to look at this kind of activity. Altmetrics.org moves beyond the traditional academic internal measures of peer review, the Journal Impact Factor (JIF) and the h-index. There are many issues with the JIF, not least that it measures the vessel, not the contents. For these reasons there are now initiatives such as the San Francisco Declaration on Research Assessment (DORA), which calls for the scrapping of the JIF as a means of assessing a researcher’s performance. Altmetrics.org measures the article itself, not where it is published. And it measures the activity of the articles beyond academic borders – where the impact is occurring.

So if you are serious about being a successful academic who wants to have high impact, managing your online presence is indeed a necessary ongoing commitment.

NOTE: On 26 September, Dr Danny Kingsley spoke on this topic to the Cambridge University Alumni festival. The slides are available in Slideshare. The Twitter discussion is here.

Published 25 September 2015
Written by Dr Danny Kingsley
Creative Commons License

FORCE2015 observations & notes

This blog first appeared on the FORCE2015 website on the 14 January 2015

First a disclaimer. This blog is not an attempt to summarise everything that happened at FORCE2015 – I’ll leave that to others. The Twitter feed using #FORCE2015 contains an interesting side discussion, and the event was livestreamed, with recordings of individual sessions due to be available here within two weeks – so you can always check bits out for yourself.

So this is a blog about the things that I, as a researcher in scholarly communication working in university administration (with a nod to my previous life as a science communicator), found interesting. It is a small, representative slice of the whole.

This was my first FORCE event; the conference has been held annually since the first event, FORCE11, which took place in August 2011 after a “Beyond the pdf” workshop in January that year. It was nice to have such a broad group of attendees. There were researchers and innovators (often people were both), research funders, publishers, geeks and administrators all sharing their ideas. Interestingly, there were only a few librarians – this in itself makes the conference stand out. Sarah Thomas, Vice President of Harvard Library, observed this, noting that she is usually shocked to find only librarians at the table at this sort of event.

To give an idea of the group – when the question was asked about who had received a grant from the various funding bodies, I was in a small minority by not putting up my hand. These are actively engaged researchers.

I am going to explore some of the themes of the conference here, including:

  • Library issues
  • The data challenge
  • New types of publishing
  • Wider scholarly communication issues, and
  • The impenetrability of scientific literature

Bigger questions about effecting change

Responsibility

Whose responsibility is it to effect change in the scholarly communication space? Funders say they are looking to the community for direction. Publishers say they are looking to authors and editors for direction. Authors say they are looking to find out what they are supposed to do. We are all responsible: funding is not the domain of the funders alone; it is interdependent.

What is old is still old

The Journal Incubator team asked the editorial board members of the new journal “Culture of Community” to identify what they thought would attract people to their journal. None mentioned the modern and open technology of its publishing practices. All the points they identified were traditional: peer review, high indexing, pdf formatting etc. Take home message – authors are not interested in the back-end technology of a journal; they just want the thing to work. This underlines the need to ENABLE, not ENGAGE.

The way forward

The way forward is threefold, incorporating Community, Policy and Infrastructure. Moving forward we will require initiatives focused on sustainability, collaboration and training.

Library issues

Future library

Sarah Thomas, the Vice President of the Harvard Library spoke about “Libraries at Scale or Dinosaurs Disrupted”. She had some very interesting things to say about the role of the library into the future:

  • Traditional libraries are not sustainable. Acquisition, cataloguing and storage of publications doesn’t scale.
  • We need to operate at scale, and focus on this century’s challenges, not last century’s, by developing new priorities and reallocating resources to them. Use approaches that dramatically increase outputs.
  • There is very little outreach from libraries into the community – we are not engaging broadly, except in the mode of “we are the experts: you come to us and we will tell you what to do”.
  • We must let go of our outdated systems – such as insufficiently automated practices, redundant effort and ‘just in case’ coverage.
  • We must let go of our outdated practices – a competitive, proprietary approach. We need to engage collaborators to advance goals.
  • Open up hidden collections and maximise access to what we have.
  • Start doing research into what we have and illuminate the contents in ways we never could in a manual world, using visualization and digital tools.

Future library workforce

There was also some discussion about the skills a future library workforce needs to have:

  • We need an agile workforce – skills training (data science, social media etc.) helps promote quality work. Put it in performance goals.
  • We need to invest in 21st century skillsets. The workforce we should be hiring includes:
    • Metadata librarian
    • Online learning librarians
    • Bioinformatics librarians
    • GIS specialist
    • Visualization librarian
    • Copyright advisor
    • Senior data research specialist
    • Data curation experts
    • Scholarly communications librarian
    • Quantitative data specialist
    • Faculty technology specialist
    • Subject specialist

Possible solution?

The Council on Library and Information Resources offers postdoctoral fellowships: CLIR Postdoctoral Fellows work on projects that forge and strengthen connections among library collections, educational technologies and current research. The program offers recent PhD graduates the chance to help develop research tools, resources and services while exploring new career opportunities.

Possible opportunity to observe change?

In summing up the conference, Phil Bourne said there is an upcoming major opportunity point – both the European Bioinformatics Institute in the EU and the National Library of Medicine in the US will soon have new leadership, and both are receiving recommendations on what the institution of the future should look like.

The library has a tradition of supporting the community, being an infrastructure to maintain knowledge and, in the case of the National Library of Medicine, setting policy. If they are going to reinvent this institution, we need to watch what it will look like in the future.

The future library (or whatever it will be called) should curate, catalog, preserve and disseminate the complete digital research lifecycle. This is something we need to move towards. The fact that there is an institution that might move towards this is very exciting.

The data challenge

Data was discussed at many points during the conference, with some data solutions/innovations showcased:

  • Harvard has the Harvard Dataverse Network – a repository to share data. “Data Management at Harvard” sets out Harvard guidelines and policies, cranking up investment in managing data.
  • The Resource Identification Initiative is designed to help researchers sufficiently cite the key resources used to produce the scientific findings reported in the biomedical literature.
  • bioCADDIE is trying to do for data what PubMed Central has done for publications, using a Data Discovery Index. The goal of this project is to engage a broad community of stakeholders in the development of a biomedical and healthCAre Data Discovery and Indexing Ecosystem (bioCADDIE).

The National Science Foundation data policy

Amy Friedlander spoke about The Long View. She posed some questions:

  • Must managing data be co-located with storing the data?
  • Who gets access to what, and when?
  • Who and what can I trust?
  • What do we store it in? Where do we put things, where do they need to be?

The NSF doesn’t make a policy for each institution; it makes one NSF Data Sharing Policy that works more or less well across all disciplines. There is a diversity of sciences with heterogeneous research results, a range of institutions, professional societies, stewardship institutions and publishers, and multiple funding streams.

There are two contact points – when a grant is awarded, and when researchers report. If we focus on publications we can develop the architecture and then extend it to other kinds of research products, integrating the internal systems within the enterprise architecture to minimise the burden on investigators and program staff.

Take home message: The great future utopia (my word) is: We want to upload once to use many times. We want an environment in which all publications are linked to the underlying evidence (data) analytical tools, and software.

New types of publishing

There were several publishing innovations showcased.

Journal Incubator

The University of Lethbridge has a ‘journal incubator’, developed with the goal of sustaining scholarly communication and open, digital access. The incubator trains graduate students in the tasks of journal editorship so that journals can be provided completely free of charge.

F1000 Research Ltd – ‘living figures’

Research is an ongoing activity, but you would not know it from the way we publish, which is still very much built around the static print object. F1000 has the idea that data is embedded in the article, and has developed a tool that lets you see the data sitting behind the article.

Many figures don’t need to exist as static objects – what you need is the underlying data. With living figures in the paper, research labs can submit data directly on top of a figure, to show whether the result is reproducible or not. This offers interesting potential opportunities – bringing static research figures to life as a “living collection”, with articles from different labs built around that data. The tools and technologies are already out there.

Collabra – giving back to the community

The new University of California Press open access journal, Collabra, will share a proportion of its APC with researchers and reviewers. Of the $875 APC, $250 goes into a fund. Editors and reviewers earn money into the fund, and there is a payout to the research community, which can decide what to do with it. The choices are to:

  • Receive it electronically
  • Pay it forward to pay APCS in future
  • Pay it forward to institution’s OA fund.

This is a journal where reviewers get paid – or can elect to pay their fee forward – so that everyone can benefit from the journal. There is no lock-in; the benefit comes through partnerships.

Future publishing – a single XML file

Rather than replicating old publishing processes electronically, the dream is to have one single XML file in the cloud. There is role-based access to modify the work (by editors, reviewers etc.), and at the end that version is the one that gets published. Everything is in the XML, with automatic conversion at the end. References become completely structured XML at the click of a button, with the tags coloured, and changes can be tracked. The journal editor gets a deep link to something to look at, and can accept or reject. The XML can then be converted to a pdf with high-level typography, fully automatically.
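As a toy illustration of the single-source idea (the element names below are invented for illustration, not any publisher’s actual schema), the whole article, including its structured references, can live in one tree from which other formats are generated:

```python
import xml.etree.ElementTree as ET

# One tree holds everything; pdf/HTML outputs would be generated from it.
article = ET.Element("article")
ET.SubElement(article, "title").text = "A single-source publishing sketch"

body = ET.SubElement(article, "body")
ET.SubElement(body, "p").text = "All edits, by any role, happen on this one file."

# A structured reference: each field is a tagged element, not free text.
refs = ET.SubElement(article, "references")
ref = ET.SubElement(refs, "ref")
ET.SubElement(ref, "author").text = "Hirsch, J. E."
ET.SubElement(ref, "year").text = "2005"

xml_string = ET.tostring(article, encoding="unicode")
print(xml_string)
```

Because every field is tagged, a converter can render the same tree as a typeset pdf, a web page, or machine-readable metadata without anyone retyping anything.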

Wider scholarly communication issues

This year is the 350th anniversary of the first scientific journal* Philosophical Transactions of the Royal Society. Oxford holds a couple of copies of this journal and there was an excursion for those interested in seeing it.

It is a good time to look at the future.

Does reproducibility matter?

Something that was widely discussed was the question of whether research should be reproducible, which raised the following points:

  • The idea of a single well defined scientific method and thus an incremental, cumulative, scientific process is debatable.
  • Reproducibility and robustness are slightly different. Robustness of the data may be key.
  • There are no standards with a computational result that can ensure we have comparable experiments.

Possible solution?

Later in the conference a new service that tries to address the diversity of existing lab software was showcased: Riffyn. It is a cloud-based software platform – a CAD for experiments. Researchers get a unified experimental view of all their processes and their data, and can update it themselves, without relying on IT staff.

Credit where credit is due

I found the reproducibility discussion very interesting, as was the discussion about authorship and attribution, which posed the following:

  • If it is an acknowledgement system everyone should be on it
  • Authorship is a proxy for scientific responsibility. We are using the wrong word.
  • When crediting research we don’t make distinctions between contributions. Citation is not the problem, contribution is.
  • Which building blocks of a research project do we not give credit for? And which ones only get indirect credit? How many skills would we expect one person to have?
  • The problem with software credit is we are not acknowledging the contributors, so we are breaking the reward mechanism
  • Of researchers in research-intensive universities, 92% are using software. Of those, 69% say their work would be impossible without software. Yet 71% of researchers have no formal software development training. We need standard research computing training.

Possible solutions

  • The Open Science Framework provides documentation for the whole research process, which in turn determines how credit should be apportioned.
  • Project CRediT has come up with a taxonomy of contribution terms, proposing to take advantage of infrastructure that already exists: Mozilla OpenBadges – if you hear or see the word ‘badges’, think ‘digital credit’.

The impenetrability of scientific literature

Astrophysicist Chris Lintott discussed citizen science, specifically the phenomenally successful program GalaxyZoo, which taps into a massive group of interested amateur astronomers to help classify galaxies by their shape. This is something humans do better than machines.

What was interesting was the challenge Chris identified: amateur astronomers become ‘expert’ amateurs quickly, and the system has built ways for them to communicate with each other and with scientists. The problem is that the astronomical literature is simply impenetrable to these (informed) citizens.

The scientific literature is the ‘threshold fear’ for the public. This raises some interesting questions about access – and the need for some form of moderator. One suggestion is some form of lay summary of the research paper – PLOS Medicine have an editor’s summary for papers. (Nature do this for some papers, and BMJ are pretty strong on this too).

Take home message – By designing a set of scholarly communication tools for citizen scientists we improve the communication for all scientists. This is an interesting way to think about how we want to browse scholarly papers as researchers ourselves.

*Yes, I know that the French Journal des scavans was published before this, but it was broader in focus, hence the qualifier ‘first scientific journal’.
Published 18 March 2015
Written by Dr Danny Kingsley
Creative Commons License