If the core of my personal narrative, the question of how we make research that is appropriate and relevant, was to be found in the last two sections, then this section is an attempt to show how those ideas developed in practice. Research assessment and monitoring is perhaps the place where the possibilities of technology meet not just the community of researchers but other communities too: institutions, funders, and wider publics. It is the place where small changes, new types and modes of assessment, can have big effects, for good or ill. It is also the place where we researchers are most sensitive. The things we study, systems, people, technologies, art: those are appropriate things for us to measure and critique. But when it comes to assessing ourselves we claim special privilege. The system is “too complex”, the value of our work “unmeasurable”.
As with the previous section, the chronology of these pieces starts from well before I understood how complex the interplay of culture, technology and social change is. Along with most advocates for Open Access in the mid-2000s I started with the assumption that change would happen once funders demanded it. Many conversations would start “if the funders would just…”. As I began to actually talk to people within funders the magnitude of that “just” started to become clear and I began to understand the limitations on what funders could do. Bear in mind this was 2005-10, long before most funders were even considering Open Access policies beyond “it would be nice if…”. The question was therefore how to tweak existing systems, to add to them in ways that would encourage not just Open Access but data sharing and better communication more generally.
The biggest challenge, then as now, was the prestige of the traditional publishing venue. Whether it is the journal for articles, or the publisher for books, the use of venue as a proxy for “quality” remains endemic. Challenging the status quo in scholarly communication is absolutely dependent on undermining the assumptions and systems built on the idea that venue is a viable proxy. One way to tackle that was to introduce measures of readership and engagement, metrics that would show not just how articles (or later books) themselves were doing, but would systematically work in favour of Open Access content. The development of Article Level Metrics within PLOS, the subsequent (and continuing) pitched battle against journal-level metrics in general, and the Thomson Reuters Journal Impact Factor in particular, and the San Francisco Declaration on Research Assessment have all been part of this larger program.
My original approach to this was from the angle of discovery, the idea that social tools would be a powerful way of bringing the right content to the right researcher at the right time. The first piece in this section is a perspective article written by myself and Shirley Wu for PLOS Biology. PLOS wasn’t the first publisher to collect or display some form of article metrics, but it was the first to push it as a major program, and this article was part of that launch. Ironically, for a while this was one of the most highly viewed PLOS articles. My own use of services like CiteULike showed that proxies like journal name were useless as tools to help me find what to read. A talk I was giving at the time was subtitled “Thanks Clay Shirky, but where’s my bloody filter?”. Our focus in this article remained search and discovery, but the motivation for PLOS was to demonstrate how specific articles within PLOS ONE were performing.
Looking back at this piece, as well as the Altmetrics Manifesto, it is interesting to see how we were clearly motivated to emphasise the way the value of individual articles could be measured in new ways, yet shied away from explicitly talking about assessment. We could see how measurement might effect cultural change but it seems we didn’t want to say that too publicly. It is interesting in that light to note that the article raises many of the concerns still voiced today about these metrics. While the range of measures has moved on, the debate over their technical merits has moved rather less.
But the reticence about the potential for tweaking research assessment didn’t last, and of course the presentation of options needed to be combined with direct attacks on traditional measures. While it doesn’t have the star quality of Stephen Curry’s 2012 “Sick of Impact Factors”, my own rant from 2010, which drew on earlier pieces by Björn Brembs, has the same target: the inconsistency of quantitative researchers using a flawed numerical measure. This remains one of the easiest, yet most commonly ignored, vectors of attack against the Impact Factor. But the deeper flaw it exposes, an incoherence in management targets, is perhaps more serious.
Here we see the beginning of the idea of arguing for internal consistency, an idea that extends through the rest of this section: combining new technical capabilities with an appeal to what researchers see as core values. Pointing out to a researcher that the systems by which they are assessed, and in which they assess others, don’t adhere to our own basic standards of data gathering and analysis doesn’t necessarily make you friends. But what it does do is raise questions that stick in the minds of critically trained people. Values and standards are strong elements of self-identity. So arguing for the application of our own standards of analysis to ourselves, while a long game, can be highly effective.
These four short pieces, “Warning”, “Metrics of re-use”, “Administrative Data..” and “Data Driven Approaches”, all pose the same question in different ways. They have different audiences and different targets, but the idea of self-consistency, of funders and institutions seeking to exemplify good practice, runs through all of them. Using the existing values of a community to persuade them to change, particularly when that builds on the technical possibilities they are advocating for, is the first strand that emerges from the developing understanding we saw in the previous section.
The second strand that emerges from that concern with relevance and broader communities is the question of what research is for. What is it that research assessment is trying to optimise, and how can it be measured? This is the implicit theme of “Warning…”. “Metrics of re-use” continues that idea, also building on the story of “Impact is Re-use”. We follow it to the logical conclusion: if re-use is an effective proxy for impact, then measuring re-use and rewarding researchers on that measure will drive the behaviour we are looking for.
The following pieces explore that second idea from “Warning…”, that our approaches should be internally consistent, in two different contexts. The first piece is the text I spoke from in giving oral evidence to a European Commission hearing on research assessment in 2011. I include it in part because it is a piece very directly targeted at a funder, but also because it is one of my earliest articulations of the need for data about the research enterprise to be open and to be seen as the basis for research on ourselves. The systems, processes, and evidence that support research assessment should be subject to the same level of critical analysis as any object of research. Aside from the rather naive notion that policy interventions would not affect how researchers evaluate each other, this piece still seems relevant, not least because funders in general have largely failed to live up to the standards they are increasingly imposing on researchers. There is a pretty straight line between this piece and the PLOS submission to the HEFCE enquiry on metrics in research assessment four years later.
If standards are required in one setting, they must necessarily be required in the other. With the growing interest in, and requirement for, Open Data by funders, the data about data sharing surely needs to be an exemplar of best practice. “Data Driven Approaches to Data Driven research” makes the point that neither the funders requiring improved data management, nor, more particularly, the institutions and services that are supposed to deliver research data sharing, are being held to the same standards when it comes to their own data. Where is the data about data management? Where are the requirements for collecting usage data and assessing metadata quality? If research data needs to be managed and curated to support policy goals, then surely the policy development around that data management and curation should also be supported by high quality data on what works and what doesn’t.
The three pieces in the following and final section will develop the question of what research is for and the idea of internal consistency, and connect them with the technical opportunities discussed in earlier sections. But we end this section with the inverse of the right incentives: a rant on how, as a community of researchers, we are ourselves complicit in the results of the perverse incentive systems we have built. It nonetheless completes the arc that began in this section with the description of new technological possibilities, leading through the way that measurement and assessment can define what success looks like, for good or ill. Here we end with one possible outcome, one in which measurement, and in particular an obsession with simple and easily gameable measures of productivity, leads us towards a dystopia. The demands for better measurement implicit in “Warning…” and the arguments for better data and evidence to support policy development throughout the other pieces were built explicitly on the idea that measuring a system can change it, and that the change can be for the better. But we always need to remember not just the law of unintended consequences, but also that systems made up of components that are themselves experts in the study of systems are more prone to unintended consequences than most.