Researchers: Don't fight the data-management battle alone
What to do with data? Most academics have been trying to avoid the questions but they keep coming. Special issues in Nature and now Science highlight the excitement of new research paradigms in data mining, correlation and visualisation.
The bad news is that there are now very few who don’t worry about information overload and how to manage it all, from academic journals to micro-blogging, lab books to personal photo collections.
There is the more obvious and ever more important area of research-information management: citation; publication of underlying journal-paper datasets; impact and the Research Excellence Framework. But receiving far less attention is the fundamental business of managing research data itself. We’re squirrelling away ever more of the stuff. But this probably means using USB memory sticks and stand-alone hard drives as institutional IT quotas are reached and extra kit is deemed unaffordable. You might be lucky enough to work in a research group large enough to pay someone to look after the data for you, for now at least. More likely, the 'data manager' is an experienced researcher who would rather not spend time firefighting or running around with back-up tapes. They may not even have the experience really needed to cope.
But storage is cheap and getting ever cheaper, isn't it? As cheap as your time to store it properly? Can you find it when you need it? Would a colleague ever find it? Is it annotated in some way and reusable? Is it secure, especially if outsourced? Is it backed up? Aren’t you actually required by your funding council to make it available for reuse?
In the sciences, we all know the importance of reproducibility. Indeed, the mandates are getting stronger and more insistent. In the US, the National Science Foundation ignited a firestorm by insisting that proposals submitted to it must include a supplementary data-management plan, in two pages. The UK's Wellcome Trust now requires all researchers' proposals to include a plan for managing and sharing their data. The research-council community has been aware of the problem for a long time. The Science and Technology Facilities Council (STFC) and the Natural Environment Research Council fund data centres to curate major project datasets, while the UK Data Archive (UKDA) curates datasets in the economic and social sciences.
But the engagement of researchers outside the big projects is less noticeable and usually lacking. I’ve been spoiled. The challenge of big data has been around a long time in astronomy, and larger departments or research groups can pool project resources to provide in-house IT support. But as funding for infrastructure dwindles under the searchlight of STFC reviews and sexy-science contests, even large departments are buckling under the data-management strain. Meanwhile, demand from researchers increases exponentially in the era of petabyte astronomy, bioinformatics and data-driven research.
Through working in the UK e-science programme I became aware of how acute this problem is for all those going digital outside big science. The projects may be smaller but they involve many more lone researchers, often working on creaking old laptops using basic spreadsheets and database programs. They are definitely not getting enough help to manage it all.
But help is available, and the UK has some serious expertise. Enter the Joint Information Systems Committee (JISC), the Research Information Network (RIN) and the Higher Education Funding Council for England's (HEFCE) UK Research Data Service (UKRDS) initiative.
So, a many-body problem. Internationally, notably in the US and Australia but increasingly in the EU, nations are recognising the importance of joined-up thinking on data. Through organisations such as the JISC-funded Digital Curation Centre (DCC) we have long been studying this and, through the JISC Managing Research Data (#JISCMRD) programme, actually testing implementation. Perhaps all that is missing is the national coordination and engagement to provide practical services for the researcher. Enter too, we hope, the recently announced University Modernisation Fund for shared services in the cloud, brokered by JANET and mediated by data-management planning tools developed with the DCC. There is to be one year of funding from April to establish pilot services.
So passionately do I feel about all this that, after 20 years in astronomy research, I jumped ship in late 2009 to work with an inspired director of IT Services at Leicester to try to address the problem through its involvement as a pathfinder in the UKRDS. A key recommendation of UKRDS, specifically trialled through the JISC MRD programme, was the need for a research-liaison manager or facilitator with a strong research background and enough experience to coordinate expertise from central IT services, the library, finance and other research groups within and across institutes, and external providers such as the DCC and JISC.
And here one meets the institutional tensions between researchers; research groups and department; department and faculty; academic and management; researcher and research funder; institution and RCUK; RCUK and HEFCE. But the greatest challenge of all? The buck-passing between Government, HEFCE, RCUK, institutions and researchers about who is actually responsible for coordinating and paying for all this.
To be continued…