The research paper is a comfortable and familiar form of communication for researchers in the sciences and humanities. The paper contains a narrative, often actually two narratives: a story about the findings themselves and a story about how those findings were discovered and tested. Both of these stories are necessarily fictions. The first is a model, a summary of the findings, and all models are incomplete. The second will, by convention, pose a clearly defined question which is, in turn, clearly answered by data. This is of course never how the actual study proceeded. Anita de Waard’s description of papers as “stories that persuade with data” and the related narrative analysis capture much of what is important here.
Humans are storytellers and, perhaps more critically, story absorbers. We construct our world view from local models, made up of stories. If we accept a view of research as a process of transferring these models, testing and modifying them locally, and then passing them on, then the use of storytelling as a transfer mechanism makes perfect sense. You can see these stories being told and being absorbed in teaching sessions, in books, in research seminars, and of course, in research papers.
These particular stories started their life as letters, sent from one researcher to another, sometimes collected into volumes and republished but usually passed from one person to one other. As the research community grew this became inefficient; centralizing the process of collecting and distributing these letters made a great deal of sense and the journal was born. It is important to note that this was an active and logical choice given the scale of the community, the complexity of the research being done, and the volume of data being created. With limited common frameworks in which to discuss results and with limited data collection ability, the descriptive narrative, the framing of the argument as a model (often being described for the first time), was crucial.
By the turn of the 20th century, the journal system was quite formalized and a growing professional research class was generating more and more papers. For the most part review was carried out by editors. Journals were starting to creak as the volume of papers increased but this remained manageable. Looking at papers from the 1930s and 40s, it is still common to find extensive tables of data; essentially the whole of the record that could be printed was published. In many cases the direct results of measurements are presented.
As the expansion of scientific research following the Second World War took hold, editors alone couldn’t cope, and a more structured system of peer review by external referees was formalized. At the same time the number of journals expanded rapidly, as did the size of existing journals. By the turn of the 21st century, the whole system had expanded further, exploiting the capacity of the Internet and the possibilities raised by online-only journals.
However, despite a huge rise in the number of “junk” journals, where peer review is virtually non-existent, and a massive expansion in the number of low prestige journals, scarcity remains a fundamental aspect of the current journal system; that is, limited space in which to publish and limited resources to review all the submissions made. The content of research publications, still fundamentally narratives, has been reduced to an extent where the paper is at best a summary of what has been found. In many cases neither data nor process is described in sufficient detail to allow a detailed critique, let alone replication. At the same time the scarcity of review resources leads to a huge reduction in the proportion of research results that are validated and made public. The time and effort of getting papers reviewed means that the majority of research results are likely never made available.
The issues of loss of access to negative results are well rehearsed. What is less well understood is how the lack of value given to maintaining comprehensive research records damages the access of research groups to their own data. In a world where the only measure of quality that matters is the venue in which a conventional research paper is published, much that is of potential value is lost completely. We have little idea what the opportunity costs of this loss are. What is clear is that we have a communications monoculture where the narrative is the only thing perceived as having value. The scarcity of these narratives is in part artificial, maintained so as to sustain the prestige factor that drives author behaviour, and in part a function of the lack of expert resource for review. But fundamentally we still live in a world of scarce narratives.
The abundance of the web
I described the scarcity as in part artificial because of the fundamental changes that the web brings. Given an existing online journal infrastructure, the marginal cost of making more papers available is as near to zero as makes no difference. Page limits, space limitations, and colour charges simply disappear. What does not disappear is the scarcity of resources to generate and structure these narratives and the cost of reviewing and reading them. It is trivial to post objects to the web; it is not trivial to create them or carry out quality assurance on them because these are fundamentally human processes and therefore processes that do not scale.
A working assumption in exploiting the abundance of web publication is that in the worst case, publishing anything does no harm. That is, any form of publication allows access to an upside: a potential unexpected use or repurposing of the published information or data, without any significant downside risk. Concerns are often raised about the potential for swamping our ability to find information but these are largely misguided. New channels for data and information will not oblige anyone to look at them. Even if the entirety of the research enterprise were dumped on the web, it would add only a small part to the web’s store of information and would arguably increase its average quality. Tools for discovery of the right information or data remain a challenge in the research space but I would argue that this is precisely because of the limited amounts of the research record currently published. Greater diversity in both type and quality of research information on the web would enhance the development of tools for discovery.
However, the assumption of this upside does not mean that the upside opportunities are necessarily large. As noted above, publication of anything remains an effort. The key question is how to maximize the potential upside while reducing the burden that publication creates for authors and reviewers. This is especially important when we face both a period of limited resources and a lack of clarity about where and how the potential benefits of more comprehensive publication of the research record might arise.
The costs of reviewing
If a wider range of scholarly outputs is to be published, then it follows that a significant component of any additional cost (probably the largest component) would be in reviewing this material. Peer review is costly in expert time and, because there is little tradition or culture guiding the review of non-traditional outputs, confusion and further costs may result. However, if it is accepted that some benefit can arise from more comprehensive publication, then there is a simple answer to reducing the potential cost of review. Don’t.
With limited demonstrable benefit of such review and no real agreement on how to proceed with effective review of data or workflows, there seems no value in incurring the large potential costs, at least until the benefits can be identified and the resource required can be focused where it can do the most good. In the meantime, as argued above, there is little downside risk to publishing more material.
There is no doubt that the quality of this material will be variable but low costs of publication mean that the only serious problem is one of enabling discovery of useful quality material amongst the rest. Automated review processes can be developed and applied cheaply, and while these will lack the nuance of expert review, they can effectively identify data and workflows that are of sufficient quality to be useful. Simply tracking the re-use of outputs provides a strong signal for usable material. This is not perfect; much good material may not surface because it is not discovered and used, but at least it is not also completely lost, as is the case now.
The web is not made more useful by removing pages from it. The web is made more useful by providing better discovery and summary systems, whether they are based on algorithms or tracking the actions of human readers. Our research discovery systems are woefully inadequate in large part because there is insufficient data available and legally usable to make them worth developing. To develop effective discovery engines, we need to provide much more data and crucially more diverse data to enable their optimization. The review problem on the web is not one of filtering prior to publication, but of discovery after it.
Reducing authoring costs
Having established the necessity of balancing the burden of publication with its potential benefits, the second question is how to reduce the costs of authoring. How can more of the research record be published without creating more costs than benefits? I would argue that the “Journal of Stuff” approach, in which a research output that would not traditionally be published is wrapped in a conventional journal article format, is amongst the worst possible approaches. Numerous attempts to create a “Journal of Negative Results” have failed because they do not recognize the balance of burden and reward. Generating a traditional paper to wrap a negative result, a data set, or software is a lot of work, and in most cases more work than it is worth.
It is clear that “wrapping” of these research outputs is necessary at some level. A minimum of contextual metadata is required to ensure that the published object is comprehensible, re-usable, and appropriately provenanced. At a minimum this will probably include authors, an address at which the object can be cited and/or retrieved, and some type information. In fact, the only required element is an address, as any additional information can be provided at that location. Indeed, it is preferable for such additional information to be provided by the object itself rather than carried around separately. However, the cultural traditions of citation are likely to require that the “address” include details such as date, author, and title.
Capturing the wrapper
The published address of a research artifact will be created by the act of publication. However, the additional “wrapping” that is required is not created at that point. We need to capture that information earlier, ideally in a completely automated fashion. The key principle in reducing the burden of publication is to capture as much contextual information as possible while bothering the author as little as possible.
When a research artifact is created, much of this information can be readily captured from the context of its creation or of the recording of its creation. When a digital file is created (a dataset, a piece of software, or a document) the date of creation is captured. The context in which it is created will in most cases also allow the identity of the author to be captured and recorded, either from the system account or from the identity of the machine where it is created. In some cases the author might be an automated system.
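As an illustration of how little the author needs to be bothered, the following is a minimal sketch of capturing who, what, when, and where for a file using only the Python standard library. The function name `capture_context` and the record layout are assumptions made for this example, not part of any existing system, and the file's modification time stands in for its creation time because the latter is not available on every platform.

```python
import getpass
import hashlib
import mimetypes
import socket
from datetime import datetime, timezone
from pathlib import Path


def _current_user():
    """Return the system account name, or 'unknown' if it cannot be determined."""
    try:
        return getpass.getuser()
    except Exception:
        return "unknown"


def capture_context(path):
    """Build a minimal who/what/when/where record for a file (hypothetical layout)."""
    p = Path(path).resolve()
    guessed_type, _ = mimetypes.guess_type(p.name)
    modified = datetime.fromtimestamp(p.stat().st_mtime, tz=timezone.utc)
    return {
        "who": _current_user(),               # author taken from the system account
        "machine": socket.gethostname(),      # or from the machine where it was made
        "what": guessed_type or "application/octet-stream",  # crude type guess by name
        "when": modified.isoformat(),         # modification time as a stand-in
        "where": p.as_uri(),                  # local address; publication would assign a public one
        "sha256": hashlib.sha256(p.read_bytes()).hexdigest(),  # ties the record to the content
    }
```

Nothing here requires any action from the researcher: the record could be generated by whatever process notices that the file exists.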
Capturing the type of object can be more challenging but there are at least two automated approaches that can provide significant coverage. Firstly, automated identification (or scrobbling), as exemplified by systems such as the Quixote project, can search for, classify, and even process files using templates that identify them by their contents. The second approach is to identify objects based on where they are created. If an instrument or process drops its output data files into a specific directory, those files can be collected and processed. This requires a (probably manual) setup process in which the location is identified, but with this in place, the capture of contextual metadata can be automated along with downstream processing.
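The two approaches can be combined in a few lines. The sketch below is an assumption-laden illustration, not a description of how Quixote or any other named system works: the `TEMPLATES` signatures are invented for the example, and a real deployment would use the actual headers of its own instrument formats. Files are first classified by content and fall back to the type implied by the directory they were dropped in.

```python
from pathlib import Path

# Hypothetical content templates: a type name mapped to the bytes a file of
# that type is assumed to start with. Real instrument formats will differ.
TEMPLATES = {
    "uv-vis-spectrum": b"#UVVIS",
    "plate-reader-export": b"Plate,Well,",
}


def identify_by_content(path):
    """Classify a file by comparing its first bytes against known templates."""
    with open(path, "rb") as handle:
        head = handle.read(64)
    for type_name, signature in TEMPLATES.items():
        if head.startswith(signature):
            return type_name
    return None


def scan_drop_directory(directory, default_type, seen):
    """Return records for files that have appeared since the previous scan.

    `default_type` is the type assigned by location: anything an instrument
    drops in this directory is assumed to be its output. `seen` is a set of
    file names already recorded, updated in place.
    """
    records = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file() or path.name in seen:
            continue
        seen.add(path.name)
        records.append({
            "path": str(path),
            "type": identify_by_content(path) or default_type,
            "modified": path.stat().st_mtime,
        })
    return records
```

Calling `scan_drop_directory` on a schedule is the only setup beyond choosing the directory; each returned record can then be handed to downstream processing.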
Physical objects in the research record
The capture of digital objects is relatively straightforward but the capture of physical objects is potentially more difficult. The existence of samples, materials, and stocks is frequently not recorded in detail in academic research labs. Important physical objects are recorded but practice varies greatly as to what is important. The emerging tools for a “web of things” can play a role in making this recording more effective and comprehensive.
The recording of physical objects is actually a microcosm of the whole challenge of improving scholarly communication. Currently, recording and publication of those records is poor. Part of the reason for this is that the perceived costs of comprehensive recording processes and systems are too high for the perceived benefits. Hidden costs result when samples are lost, recreated unnecessarily, or allowed to “go off”, but these opportunity costs are rarely visible when compared to the up-front costs of recording in systems and researcher time. Tools do exist that could help (QR codes, barcode readers, laboratory information management systems), but these are generally either expensive or not a good fit for the local conditions in a small-scale laboratory.
The challenge therefore is to reduce the costs of recording and publication of physical objects to as near zero as possible, and to realize the potential benefits that arise from this as rapidly as possible. The costs are best minimized by creating systems that embed the recording of these objects in tasks that are already being carried out. A simple example is to create sample records via the printing of labels. A simple system that provides a standardized (and appropriate) label for samples can easily be configured to create a record of those samples. Systems that generate samples should create a record of these automatically. Where preparations are made from component parts, simple tools to assist in their preparation (e.g., doing the calculation of how much of each specific reagent is required) can also act to create the record, and indeed print off the label. Very lightweight systems can do this at very low system cost, potentially saving the researcher time, and provide a record which is then searchable, enabling a wide range of downstream uses.
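A reagent calculator that records the sample as a side effect might look something like the following sketch. The function `prepare_solution`, the record fields, and the label format are all hypothetical; the only chemistry involved is mass (g) = concentration (mol/L) × volume (L) × molar mass (g/mol).

```python
import uuid
from datetime import date


def prepare_solution(name, recipe, volume_ml, sample_log):
    """Work out reagent masses for a solution and record the sample as a side effect.

    `recipe` maps a reagent name to a pair:
    (target concentration in mol/L, molar mass in g/mol).
    `sample_log` is any list-like store; a real system would use a database.
    """
    volume_l = volume_ml / 1000.0
    masses_g = {
        reagent: round(conc * molar_mass * volume_l, 3)
        for reagent, (conc, molar_mass) in recipe.items()
    }
    record = {
        "id": uuid.uuid4().hex[:8],          # short identifier to print on the label
        "name": name,
        "volume_ml": volume_ml,
        "masses_g": masses_g,
        "prepared": date.today().isoformat(),
    }
    sample_log.append(record)                # the record exists because the task was done
    label = f"{record['id']} | {name} | {record['prepared']}"
    return masses_g, label
```

For example, `prepare_solution("saline", {"NaCl": (0.15, 58.44)}, 500, log)` reports roughly 4.383 g of NaCl, returns the label text to print, and leaves a searchable entry in `log` without the researcher having done anything beyond the calculation they needed anyway.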
Capturing the wider context
We can imagine systems that capture the creation of objects, and enough of the who, what, when, and where, to be useful. To be used, these systems will need to be almost invisible to the user when activated but also provide high quality interfaces that enable that same user to find, interact with, organize, and publish those records. For publication to be worth the investment, it must be extremely easy and robust, particularly when, as noted above, we don’t know what the downstream returns on that publication are likely to be, either directly for the user, or more widely for the community that might use the published artifacts.
But while we do not know the specific uses to which these published objects might be put, we can easily see that simply publishing them, even with the metadata of who, what, where, and when, will not enable a very large number of use cases. Specifically, this bare metadata will not provide enough context to allow systems to understand where the published object came from: what workflow created it, what samples were used to generate the data, where the samples came from, how they were prepared, and whether the proper controls were run.
Publishing the artifacts of research is a step forward, and there are many potential uses of these, but it is not enough to support a detailed trust and understanding of the wider context. For that we need to understand the context of these artifacts in terms of the process within which they were created: the workflow, software, or environment which created a digital artifact, or the laboratory process which created a physical object.
Are workflows the answer to providing process information?
An enormous amount of work has been done on systems and pipelines for the automated processing of digital artifacts, and the effective recording of the running of those pipelines. Tools exist that enable the design, standardization, recording, provenancing, and enactment of defined processes. In addition, work on reporting systems that help to define and record the processes by which data is recorded can also provide much of this information in the specific cases where they are applicable.
Where these systems have been applied and where the investment in their setup is justified, they can supply exactly the information on process that is required. Additionally they can help to standardize the form in which that information is provided. The significant investment in systems and interfaces that make such workflow tools and standards systems easier to use and more generally applicable is also increasing the scope of such systems and reducing the cost required to set them up in any specific case.
However, there are many cases, perhaps even covering the majority of research activities, where it will be difficult to pin down in advance a specific pipeline that is applicable. Firstly, in most laboratories the recording is sparse and the benefits of recording workflows per se are probably minimal. Secondly, the contingent and responsive nature of most research processes means that the actual process can often only be defined in retrospect, as a way of making a record, rather than as a way of managing the actual process.
To give a trivial example, most data analysis in a research situation involves “fiddling”. A range of different approaches is taken, possibly using a diverse set of tools. A set of pieces is assembled and the researcher plays with them to see how they fit; gets a feel for how different parameters affect the result; looks for outliers that might suggest a course of action; dives into one area to look for clues before pulling back out and seeing how that fits the big picture. These processes are exploratory, difficult to record, and above all personal. These cases, along with the parallel case of “playing in the lab”, present the biggest challenges for effective recording. Workflows can help where they are applicable but they can only ever act as pieces of the solution unless we reject the notion that such “playful” approaches to research have their place.
A simple, but not practical, solution
This leaves us with a need for a system that enables us to capture process, the relationships between published research artifacts, in a way that doesn’t have significant costs for the user yet still provides benefits.
If we imagine a world in which we capture artifacts as they are created, a very simple way to capture the relationships between them, without significant cost to the user, is to use feeds. If we capture the creation of every artifact then, by definition, we have a feed of those records being created. The relationships between these artifacts are defined by processes, e.g., this procedure took that input material and created that sample; this instrument took that sample and created this data file; this analysis procedure took all of these data files and generated this graph. When a procedure is recorded (ideally by a system linked directly to carrying out that procedure) the feeds of all possible inputs are presented to the user, who then selects the correct ones. When the output artifact is created (and that creation captured and recorded) it should be automatically connected to the procedure that created it, and through that procedure to its inputs.
This may sound complicated but it’s actually extremely simple. If every system that captures object creation generates an RSS feed, then all that is required is that every system that enacts a process provides a drop-down menu that is populated from those RSS feeds. These can be filtered based on metadata within the feeds according to the specific process, but there is no need to standardize either the filtering or the provided metadata as this can be done locally. There are benefits to standardization in terms of wider re-use but it is not necessary. This creates a record of relationships that provides the context in which all of the research artifacts are created. In particular it enables an external user to traverse those relationships to identify the records of how each relevant object came to be created.
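To make the mechanics concrete, here is a minimal sketch of the feed approach, assuming plain RSS 2.0 `item` elements with `title` and `link` children. The function names and the dictionary used to hold relationships are assumptions for illustration only; they do not describe any particular deployed system.

```python
import xml.etree.ElementTree as ET


def feed_items(rss_xml):
    """Parse an RSS document into the choices a drop-down menu would offer."""
    root = ET.fromstring(rss_xml.strip())
    return [
        {"title": item.findtext("title"), "link": item.findtext("link")}
        for item in root.iter("item")
    ]


def record_process(graph, procedure, inputs, output):
    """Connect an output artifact to the procedure that made it and to its inputs."""
    graph[output] = {"procedure": procedure, "inputs": list(inputs)}


def provenance(graph, artifact):
    """Walk back from an artifact through every recorded procedure that led to it."""
    trail, stack, visited = [], [artifact], set()
    while stack:
        current = stack.pop()
        if current in visited or current not in graph:
            continue  # already handled, or a raw input with no recorded origin
        visited.add(current)
        step = graph[current]
        trail.append((step["procedure"], step["inputs"], current))
        stack.extend(step["inputs"])
    return trail
```

In use, the user would pick inputs from `feed_items(...)`, the enacting system would call `record_process` when the output appears, and an external reader could call `provenance` on any published address to see how the object came to be. Nothing in this requires agreement on metadata beyond an address for each item.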
The problem with such an approach is that it is only useful where it is sufficiently comprehensive, and making it comprehensive requires a complete reworking of virtually every process in every research chain, something which is clearly impractical. What may be practical, however, is to deploy such systems where they can find use locally. There are benefits to providing such systems within single laboratories and even within single processes in terms of better data management and records. They can also be almost invisible to the user, meaning the costs of adoption can be low. We have deployed such systems and users have found them useful in helping to create a more structured environment in which specific local records are kept. Thus this approach can provide a route towards capturing more in areas where the research is diffuse and responsive, while workflows, pipelines, and data standards can continue to expand their applicability to more structured systems.
The abundant fragments and the graph of their relationships
The central assumption of the current argument is that publishing more of the research record is better. To enable that, it is not sufficient to create more publication venues, because our traditional publication venues have very high costs of both authorship and review. Thus publishing more, and different, types of output requires new approaches to publication that specifically aim to reduce the impact of those costs. Specifically they must reduce the burden of authorship and preparation and reduce the burdens associated with review. Very few current proposals seriously address these issues.
In terms of review the simplest answer is simply to not bother. Given the lack of evidence on how review can enhance the quality of published outputs, and its significant cost, this is a reasonable approach. It does however require that serious attention be paid to the development of discovery tools and to automated systems that can assist in determining the quality and usability of published outputs. Such development will in fact benefit from a larger and more diverse range of published outputs; Google’s algorithms would not be improved by removing pages from the web. Actively deciding not to review new forms of research output will therefore provide a larger dataset on which to develop improved discovery systems.
Reducing authoring costs is more challenging. The use cases for simply “throwing stuff up on the web” exist but are limited. The absolute minimum metadata for both re-use and cultural reasons will be who, what, where, and when. If systems can be provided that capture this information at the instant of creation for digital objects, and at the instant a record is made for physical objects, then this minimal metadata can be captured without any significant additional burden for the user. The burden of publication, for the author, then becomes no more than making a decision to publish. The burden for the system is to provide automated systems that can take these objects, push them to the public web, and provide sufficient confidence in long term accessibility and curation.
However, to enable the wider range of use cases the publication of objects is not enough; they need to be placed in context. Workflow systems and data reporting standards can provide much of this for more structured forms of research. However, a significant proportion of research process is not easily captured within these systems and the costs of implementing structured pipelines for recording purposes may be too high. In these cases there are simple options but it will be difficult to apply these globally due to the change required in a wide range of systems. Nonetheless, implementing simple input-output recording systems can be valuable even in isolation and may provide a route towards capturing more context and supporting the development of systems that will work towards wider acceptance.
The return to the narrative
All of the above paints a picture in which conventional publication of papers carries on more or less as it has for the past 20 years while a light-weight, and light touch, ecosystem of publication of a wide range of objects continues. Because the cost of publication of these new objects needs to be kept down they are likely to be less useful, and probably less used, on a per object basis than traditional papers. On the other hand, their diversity and sheer potential quantity mean that even if usage rates per object are an order of magnitude or more lower, gross usage could easily be higher.
However, the bottom line is that for the system described above the narrative article remains a more efficient way of providing context for the human reader. We are storytelling and story-consuming machines, so despite its higher cost of authorship and reviewing the traditional narrative still has enormous value and clearly has a place in the scholarly communication ecosystem. Above all, nothing in the system described above provides any meaning. At best it is a record of objects that can be used to support statements, but it is not a system in which those statements have meaning. However, it is a system that could be used to support the making of statements and the telling of stories, and in doing so reduce both the costs of authorship and the costs of reviewing for those stories.
The human narrative
A key problem with the current narrative paper is that it is a monolithic object which is seen as having to be completely “new”. Introductions, methods, discussion and conclusion have to be written from scratch, unless the authors wish to be accused of plagiarism, and the whole has to be prepared and then often re-formatted for any other journals it is submitted to. Neither the re-use of text nor the idea of papers without introductions is likely to catch on soon, but in the meantime it is still possible to use previously (or simultaneously) published fragments to reduce the authoring burden.
Aggregation can reduce the authoring burden
It is not difficult to imagine authoring environments in which previously existing research objects, or collections, are “embedded” in a way which makes the authoring process more efficient. References to methods used, as well as the detailed record of carrying them out, can be dropped in to make up the methodology section; results can be pulled from the existing records, perhaps cleaned up a bit or reformatted but not requiring the arduous transcription and recreation of visualizations that are common today. Perhaps most appealingly, these fragments can pull all of their external references in with them, making it possible to autopopulate the reference list, the acknowledgement of funders, and other parts of the manuscript.
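The autopopulation step is simple once each fragment carries its own references with it. The sketch below assumes, purely for illustration, that a fragment is a dictionary with `id`, `references`, and `funders` fields; no existing authoring tool is being described.

```python
def assemble_manuscript(fragments):
    """Collect references and funders from embedded fragments, without duplicates.

    Each fragment is assumed to be a dict with an 'id' and optional
    'references' and 'funders' lists (a hypothetical layout).
    Order of first appearance is preserved.
    """
    references, funders = [], []
    for fragment in fragments:
        for ref in fragment.get("references", []):
            if ref not in references:
                references.append(ref)
        for funder in fragment.get("funders", []):
            if funder not in funders:
                funders.append(funder)
    return {
        "embedded": [fragment["id"] for fragment in fragments],
        "references": references,
        "funders": funders,
    }
```

Embedding a method record and a dataset that both cite the same protocol would then yield a single entry in the reference list, with the acknowledgements assembled in the same pass.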
Additionally, as there are more examples of this process of aggregating fragments into papers, the systems that support this authoring will get smarter. When an author selects a dataset, all the relevant pieces of methodology, or all the connected data, could be automatically pulled into the authoring environment. The chapters or the main events would be already in place waiting for the authors; the threads and the weft in place waiting for the authors to weave the narrative, to tell the story that they want to tell. Equally such systems could surface gaps in the evidence and warn the authors that a piece is missing, saving everyone’s time in both rewriting and reviewing.
Aggregation can reduce the reviewing burden
It is also possible for the prior publication of elements of the record, or even the capture of local re-use, to reduce the burden of review for traditional papers. Part of the concern in reviewing is which components of a paper can or should be reviewed and what level of trust can be given to the authors. For example, in many cases it may be desirable to manipulate images to illustrate a specific point, such as combining lanes from different gels or from different positions on the same gel. This, in and of itself, is not wrong; it is effective storytelling. But if the component original images can be easily identified and connected to, then there is a stronger basis for accepting that the manipulation is both useful and acceptable.
More generally, when a method or data or a material has been widely re-used there is less need to delve into the details of that part of the paper, as opposed to new methods or data which will require closer scrutiny and critique. Evidence of re-use is also good evidence that appropriate metadata has been provided, reducing the burden of checking for policy compliance. Ultimately it may be possible to carry out entirely automated technical reviews for whole classes of papers, reducing the burden dramatically.
Authoring via aggregation can enhance discovery
Thus far I have made the claim that “more data will support better discovery” but not really addressed how that might happen. In the short to medium term it is clear that a major route for initial discovery will be through conventional search and discovery approaches which are strongly biased towards conventional papers. That is, a major discovery route for published fragments is via references within narrative papers. An advantage of the narrative being aggregated from these fragments is exactly that it will strengthen those references and enable these discovery pathways.
¶ 46 Leave a comment on paragraph 46 0 Precisely because a significant quantity of work is going into categorizing, mining, and working with narrative papers an information framework is being built, a graph that will support search across all of these connected objects, albeit the first instance from a limited set of entry points. As the network is strengthened the connections between the non-narrative objects or fragments will become more common, providing new facets along which to search, ultimately allowing much more effective search strategies for specific use cases. In addition simply exposing these different object types will enable the research community to take more advantage of advances in consumer search as they arise, including image, data, and social search approaches.
As these fragments get re-used in multiple narratives, we start to see three key things happen, which mirror advances in information technology over the past twenty years. Firstly, the same fragment can be indexed in, and wired into, many narratives and categorized in many ways, just as it is no longer necessary to put a book or a file in a single place with a single index card. Secondly, we will see the sparse graph of the current literature, with its limited referencing and citation, become much thicker, finally enabling us to make effective use of the graph algorithms that make the web work. Google Scholar is apparently a relatively poor search engine primarily because the linkage graph of the research literature is so sparse. If we fix that, the value of existing consumer search tools for scientific content will increase dramatically.
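The first two of these effects can be illustrated with a small sketch. Everything in it is a hypothetical toy: the paper, dataset, and workflow names are invented, and the functions `mean_links`, `index_by_target`, and `related` are illustrative names rather than part of any existing tool. It compares a literature that links only through citations with the same literature once each narrative also links to the fragments it re-uses.

```python
from collections import defaultdict

# Hypothetical toy data: each narrative maps to the objects it links to.
citations_only = {
    "paper_a": set(),
    "paper_b": {"paper_a"},
    "paper_c": {"paper_a"},
}
with_fragments = {
    "paper_a": {"dataset_1"},
    "paper_b": {"paper_a", "dataset_1", "workflow_1"},
    "paper_c": {"paper_a", "dataset_1", "workflow_1"},
}

def mean_links(graph):
    """Average number of outgoing links per narrative."""
    return sum(len(targets) for targets in graph.values()) / len(graph)

def index_by_target(graph):
    """Invert the graph: map each linked object to the narratives that use it."""
    index = defaultdict(set)
    for narrative, targets in graph.items():
        for target in targets:
            index[target].add(narrative)
    return dict(index)

def related(graph, narrative):
    """Narratives sharing at least one linked object with `narrative`."""
    index = index_by_target(graph)
    shared = set()
    for target in graph[narrative]:
        shared |= index[target]
    shared.discard(narrative)
    return shared
```

In this toy example, `paper_a` has no related narratives in the citation-only graph, because it cites nothing. Once fragments are linked, it becomes discoverable from both other papers through the shared `dataset_1`. A real system would need ranking and far richer metadata, but the basic effect of a thicker graph is the same.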
Finally, mirroring the advances of the social web over the past five to ten years, the weaving of fragments into narratives will make social search much more effective for scientific content. Social search approaches are already gaining much ground for conventional literature but are limited by the monolithic nature of the narrative paper. If social search tools can be built around the way people interact with fragments of scientific content, they will become much more powerful. The fact that many people have re-used a workflow associated with a paper is much more useful than the fact that many people cite the paper.
The narrative remains expensive – invest in it wisely
All of the enhancements that reduce the cost of publishing fragments and increase the usability of those published fragments will ultimately not make massive inroads into reducing the costs of either authoring or reviewing. Both of these will remain costly however they are managed. Ideally such effort will be focused where it is needed for the specific use case and not, as is currently the case, wasted on what is effectively the publication of fragments, process, and data as if they were narratives.
My personal belief is that the scholarly publishing industry as currently configured is heading for a crash. The costs, and the increases in cost, are unsustainable. Scholarly publishing in its current form will simply cease to be profitable. This will drive lower-cost alternatives, but there will still be a place for the important narratives, the clash of ideas and concepts. There is a place for the paper, and there will always be a place for researchers telling stories. But in the future we will have to husband our resources, both in authoring and in reviewing, much more effectively. A real market in the time of experts will drive this rationalization, and it is unclear whether it will be managed or not.
Charting a way forward
What will actually happen? How do we move away from the current system towards a more diverse publishing ecosystem, and what are the steps that need to be taken? The first step is providing lightweight platforms for publication of diverse objects. It is clear that the exemplar here will be data publication, with many initiatives currently underway to support data publication in one form or another.
Most of these, however, still work from the view of pulling together the data for publication, and there are few if any systems that support structured publication as a side effect of simply recording data. The Publish@Source concept described ten years ago by Jeremy Frey remains largely unrealized beyond demonstration systems. Achieving the kind of lightweight publication and recording systems envisaged here will require much more integration between laboratory recording systems and publication systems. Work in this area is proceeding, and the recent JISC-managed University Modernisation Fund programme includes a number of relevant projects.
The web provides many examples of mechanisms for lightweight publication, and many of these can be adopted wholesale, particularly in the area of software publication, where repositories are already available and highly functional. However, the issue of long-term archival, and whether and when it should be undertaken, remains a serious question that must be tackled, and not simply on the assumption that we will somehow “keep everything”. We have not even managed to effectively archive the scholarly literature over the past twenty years. How can we address the long-term archival of a diverse graph of dynamic objects?
Authoring tools need to adopt a “copy by reference” or embedding approach that preserves the link to the published fragments that are being addressed. These links in turn need to be preserved through publishing mechanisms. The purpose of narratives is to connect things together, to link the disparate into a simplifying framework that lets us understand more. But without the links back to the supporting information, it is simply story telling and not science.
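As a rough illustration of what “copy by reference” might mean in an authoring tool, the following minimal sketch stores a narrative as a mixture of plain text and references to fragments, and resolves those references only at render time so that the links survive into the output. The `Fragment` and `Embed` types, the `render` function, and the identifier `frag:gel-42` are all hypothetical names invented for this example; no existing tool or identifier scheme is implied.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fragment:
    identifier: str  # hypothetical persistent identifier for a published fragment
    content: str

@dataclass(frozen=True)
class Embed:
    identifier: str  # points at a fragment rather than copying its content

def render(parts, store):
    """Return the rendered text plus the fragment links it depends on."""
    text, links = [], []
    for part in parts:
        if isinstance(part, Embed):
            fragment = store[part.identifier]
            # Keep the identifier next to the embedded content so the link survives.
            text.append(f"{fragment.content} [{fragment.identifier}]")
            links.append(fragment.identifier)
        else:
            text.append(part)
    return " ".join(text), links

# Hypothetical usage: one published fragment embedded in one sentence.
store = {"frag:gel-42": Fragment("frag:gel-42", "gel image, lanes 1 to 4")}
narrative = ["The band is visible in the", Embed("frag:gel-42")]
rendered_text, rendered_links = render(narrative, store)
```

The design point is that the narrative never holds its own copy of the fragment. The list of links returned alongside the text is what a publishing mechanism would need to carry through to the final version.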
As it is these links that carry the core of our scientific knowledge, we need to work to standardize and enrich them. Working to standardize citation mechanisms and tools will be a critical part of making the whole ecosystem work. It is the connections that will hold the whole fabric together, and we need to make them effective and usable.
And finally we need to tackle our addiction to the narrative paper as the only form of scholarly output that matters. This can be addressed, and is being addressed, top down by policy initiatives; it can be addressed by expanding the culture of citation to include a wider range of objects; but it also needs massive cultural change from “the middle out”, which will take time and effort. This requires both grass roots advocacy and leadership from scholarly societies, senior researchers, publishers, and funders. At one level it is just a matter of continuing to say the same things and of embedding those values deeply in research culture over time. This is where the real challenge lies. The rest is necessary but not sufficient.