Administrative data and research data both need to be open

We live in a world where there is more information available at the tips of our fingers than even existed 10 or 20 years ago. Much of what we use to evaluate research today was built in a world where the underlying data was difficult and expensive to collect. Companies were built, massive data sets were collected and curated, and our whole edifice of reputation building and assessment grew up based on what was available. As the systems became more sophisticated new measures were incorporated, but the fundamental basis of our systems wasn’t questioned. Somewhere along the line we forgot that we had never actually been measuring what mattered, just what we could.

Today we can track, measure, and aggregate much more, and much more detailed information. It’s not just that we can ask how much a dataset is being downloaded but that we can ask who is downloading it, academics or school children, and more, we can ask who was the person who wrote the blog post or posted it to Facebook that led to that spike in downloads.

This is technically feasible today. And make no mistake it will happen. And this provides enormous potential benefits. But in my view it should also give us pause. It gives us a real opportunity to ask why it is that we are measuring these things. The richness of the answers available to us means we should spend some time working out what the right questions are.

There are many reasons for evaluating research and researchers. I want to touch on just three. The first is researchers evaluating themselves against their peers. While this is informed by data, it will always be highly subjective and vary discipline by discipline. It is worthy of study but not, I think, something that is subject to policy interventions.

The second area is in attempting to make objective decisions about the distribution of research resources. This is clearly a contentious issue. Formulaic approaches can be made more transparent and less open to legal attack but are relatively easy to game. A deeper challenge is that by their nature all metrics are backward-looking. They can only report on things that have happened. Indicators are generally lagging (this is true of most of the measures in wide current use) but what we need are leading indicators. It is likely that human opinion will continue to beat naive metrics in this area for some time.

Finally there is the question of using evidence to design the optimal architecture for the whole research enterprise. Evidence-based policy making in research policy has historically been sadly lacking. We have an opportunity to change that through building a strong, transparent, and useful evidence base but only if we simultaneously work to understand the social context of that evidence. How does collecting information change researcher behavior? How are these measures gamed? What outcomes are important? How does all of this differ across national and disciplinary boundaries, or amongst age groups?

It is my belief, shared with many who will speak today, that open approaches will lead to faster, more efficient, and more cost-effective research. Other groups and organizations have concerns around business models, quality assurance, and sustainability of these newer approaches. We don’t need to argue about this in a vacuum. We can collect evidence, debate what the most important measures are, and come to an informed and nuanced conclusion based on real data and real understanding.

To do this we need to take action in a number of areas:

1. We need data on evaluation and we need to be able to share it.

Research organizations must be encouraged to maintain records of the downstream usage of their published artifacts. Where there is a mandate for data availability this should include mandated public access to data on usage.

The commission and national funders should clearly articulate that provision of usage data is a key service for publishers of articles, data, and software to provide, and that where a direct payment is made for publication, provision for such data should be included. Such data must be technically and legally reusable.

The commission and national funders should support work towards standardizing vocabularies and formats for this data as well as critiquing its quality and usefulness. This work will necessarily be diverse with disciplinary, national, and object type differences but there is value in coordinating actions. At a recent workshop where funders, service providers, developers and researchers convened we made significant progress towards agreeing routes towards standardization of the vocabularies to describe research outputs.

2. We need to integrate our systems of recognition and attribution into the way the web works through identifying research objects and linking them together in standard ways.

The effectiveness of the web lies in its framework of addressable items connected by links. Researchers have a strong culture of making links and recognizing contributions through attribution and citation of scholarly articles and books but this has only recently been surfaced in a way that consumer web tools can view and use. And practice is patchy and inconsistent for new forms of scholarly output such as data, software and online writing.

The commission should support efforts to open up scholarly bibliography to the mechanics of the web through policy and technical actions. The recent Hargreaves report [1] explicitly notes limitations on text mining and information retrieval as an area where the EU should act to modernize copyright law.

The commission should act to support efforts to develop and gain wide community support for unique identifiers for research outputs, and for researchers. Again these efforts are diverse and it will be community adoption which determines their usefulness but coordination and communication actions will be useful here. Where there is critical mass, such as may be the case for ORCID and DataCite, this crucial cultural infrastructure should merit direct support.

Similarly the commission should support actions to develop standardized expressions of links, through developing citation and linking standards for scholarly material. Again the work of DataCite, CoData, Dryad and other initiatives as well as technical standards development is crucial here.

3. Finally we must closely study the context in which our data collection and indicator assessment develops. Social systems cannot be measured without perturbing them and we can do no good with data or evidence if we do not understand and respect both the systems being measured and the effects of implementing any policy decision.

We need to understand the measures we might develop, what forms of evaluation they are useful for and how change can be effected where appropriate. This will require significant work as well as an appreciation of the close coupling of the whole system.

We have a generational opportunity to make our research infrastructure better through effective evaluation and evidence-based policy making and architecture development. But we will squander this opportunity if we either take a utopian view of what might be technically feasible, or fail to act for fear of a dystopian future. The way forward is through a careful, timely, transparent and thoughtful approach to understanding ourselves and the system we work within.

The commission should act to ensure that current nascent efforts work efficiently towards delivering the technical, cultural, and legal infrastructure that will support an informed debate through a combination of communication, coordination, and policy actions.

Notes

On Monday 30 May 2011 I gave evidence at a European Commission hearing on Access to Scientific Information. This is the text that I spoke from. This was originally posted as “Evidence to the European Commission Hearing on Access to Scientific Information” to Science in the Open on 31 May 2011.

  1. Hargreaves, I. (2011), Digital Opportunity: A review of intellectual property and growth, Intellectual Property Office, UK, available online at http://webarchive.nationalarchives.gov.uk/20140603093549/http://www.ipo.gov.uk/ipreview-finalreport.pdf

Source: http://book-shaped-object.cameronneylon.net/wp/research-assessment-for-a-networked-world/administrative-data-and-research-data-both-need-to-be-open/