If you were around either me or Deepak Singh around 2010 you would almost certainly have heard the Jeff Jonas/Jon Udell soundbite: ‘Data finds data. Then people find people’ [1]. Jonas is referring to data management frameworks and knowledge discovery, and Udell is referring to the power of integrated data to bring people together.
At some level Jonas’ vision (see his chapter with Lisa Sokol in Beautiful Data [2]) is what the semantic web ought to enable: the automated discovery of data or objects based on common patterns or characteristics. Thus far, in practical terms, we have signally failed to make this a reality, particularly for research data and objects.
Udell’s angle (or rather, my interpretation of his overall stance) is more linked to the social web – the discovery of common contexts through shared data frameworks. These contexts might be social groups, as in conventional social networks, a particular interest or passion, or – in the case of Jon’s championing of the iCalendar standard – a date and place, as demonstrated by the elm city project [3] supporting calendar curation and aggregation [4]. Shared context enables the making of new connections, the creation of new links. But these are still mainly links between people.
The naïve analysis of the success of consumer social networks and the weaknesses of science communication has led to efforts that almost precisely invert the Jonas/Udell concept. In most of these “Facebooks for Scientists” the idea is that people find people, and then they connect with data through those people.
My belief is that it is this approach that has led to the almost complete failure of these networks to gain traction. Services that place the research object at the centre (the reference management and bookmarking services, and to some extent Twitter and Friendfeed [5]) appear to gain much more real scientific use because they mediate the interactions that researchers are interested in: those between themselves and research objects. Friendfeed in particular seems to support this discovery pattern. Objects of interest are brought into your stream, which then leads to discovery of the person behind them. I often use Citeulike in this mode. I find a paper of interest, then identify the tags other people have used for it and the papers that share those tags. If these seem promising, I then might look at the library of the person, but I get to that person through the shared context of the research object, the paper, and the tags around that object.
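The object-first route described above (paper, then tags, then papers sharing those tags, and only then people) can be sketched in a few lines of Python. This is a toy illustration and not how Citeulike works internally: the bookmark tuples, the user and paper names, and the `discover` function are all invented for the example.

```python
from collections import defaultdict

# Hypothetical bookmarks as (user, paper, tag). All values are invented.
BOOKMARKS = [
    ("alice", "paper-A", "open-data"),
    ("alice", "paper-B", "open-data"),
    ("alice", "paper-B", "linked-data"),
    ("bob", "paper-A", "semantic-web"),
    ("bob", "paper-C", "semantic-web"),
    ("carol", "paper-D", "proteomics"),
]


def discover(start_paper, bookmarks):
    """Walk paper -> tags -> related papers -> people, in that order."""
    # Step 1: the tags anyone has attached to the paper of interest.
    tags = {tag for _, paper, tag in bookmarks if paper == start_paper}

    # Step 2: other papers carrying at least one of those tags.
    shared = defaultdict(set)
    for _, paper, tag in bookmarks:
        if paper != start_paper and tag in tags:
            shared[paper].add(tag)

    # Step 3: people come last, reached through the papers in their libraries.
    people = defaultdict(set)
    for user, paper, _ in bookmarks:
        if paper in shared:
            people[user].add(paper)

    return tags, dict(shared), dict(people)


tags, shared, people = discover("paper-A", BOOKMARKS)
print(sorted(shared))   # related papers, found through shared tags
print(sorted(people))   # people, found through those papers
```

The person is never the starting point of the lookup; they only appear once the shared context of the object has been established.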
A common complaint made of research data is that people don’t make it available. This is part of the problem, but increasingly it is a smaller part. It is easy enough to put data up that many researchers are doing so: in the supplementary data of journal articles, on personal websites, or on community or consumer sites. From a linked data perspective we ought to be having a field day with this, even if it represents only a small proportion of the total. However, little of this data is easily discoverable and most of it is certainly not linked in any meaningful way.
A fundamental problem that I feel like I’ve been banging on about for years now is the dearth of well-built tools for creating these links. Finally these tools are starting to appear, with Freebase Gridworks [6] being an early example. There is a good chance that it will become easier over time for people to create links as part of the process of making their own record. But the fundamental problems we always face, that this is hard work and often unrewarded work, are limiting progress.
Human interaction is unlikely to work at scale. We are going to need automated systems to wire the web of data together. The human process simply cannot keep up with the ongoing annotation and connection of data at the volumes that are being generated today. And we can’t afford not to keep up if we want to optimize the opportunities of research to deliver useful outcomes.
When we think about social networks we always place people at their centre. But there is nothing to stop us replacing people with data or other research objects: software that wants to find data, data that wants to find complementary or supportive data, or data that wants to find the right software to convert or analyze it. Instead of Farmville or Mafia Wars [7], imagine useful tools that make these connections, negotiate content, and identify common context. As pointed out to me by Paul Walk, this is very similar to what was envisioned in the 90s as the role of software agents [8]. In this view the human research users are the poorly connected users on the outskirts of the web.
The point is that the hard part of creating linked data is making the links, not publishing the data. The semantic web has always suffered from a chicken and egg problem: a lack of user-friendly ways to generate RDF, and few tools that could really use that RDF in exciting ways even if it did exist. I still can’t do a useful search on which restaurants in Bath will be open next Sunday [9]. The reality is that the innards of this should be hidden from the user. The making of connections needs to be automated as far as possible, and as natural as possible when the user has to be involved. As easy as hitting that “like” button, or right clicking and adding a citation.
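To make the restaurant test case concrete, here is a minimal sketch of what answering it looks like once the links exist. It uses plain Python tuples standing in for RDF triples rather than a real triple store, and every identifier in it (the `ex:` names, the predicates, the restaurants themselves) is invented for illustration. Real opening-hours data would be considerably messier.

```python
from datetime import date, timedelta

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical (subject, predicate, object) triples. Every name is invented.
TRIPLES = [
    ("ex:riverside-cafe", "ex:locatedIn", "ex:Bath"),
    ("ex:riverside-cafe", "ex:openOn", "Sunday"),
    ("ex:abbey-bistro", "ex:locatedIn", "ex:Bath"),
    ("ex:abbey-bistro", "ex:openOn", "Saturday"),
    ("ex:harbour-grill", "ex:locatedIn", "ex:Bristol"),
    ("ex:harbour-grill", "ex:openOn", "Sunday"),
]


def next_weekday(today, weekday):
    """First date strictly after `today` falling on `weekday` (Monday=0)."""
    days_ahead = (weekday - today.weekday() - 1) % 7 + 1
    return today + timedelta(days=days_ahead)


def subjects(triples, predicate, obj):
    """All subjects that have the given predicate and object."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def open_in(triples, place, day):
    """Things located in `place` that are open on the weekday of `day`."""
    day_name = WEEKDAYS[day.weekday()]
    located = subjects(triples, "ex:locatedIn", place)
    open_then = subjects(triples, "ex:openOn", day_name)
    return sorted(located & open_then)


sunday = next_weekday(date.today(), 6)
print(sunday, open_in(TRIPLES, "ex:Bath", sunday))
```

The query itself is a two-line set intersection. All of the difficulty sits in the `TRIPLES` list: someone, or something, has to have made those links in the first place.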
We have learnt a lot about the principles of when and how social networks work. If we can apply those lessons to the construction of open data management and discovery frameworks then we may stand some chance of actually making some of the original vision of the web work.
- [1] Udell, J (2007), Data finds data, then people find people, Jon Udell (blog), retrieved from http://blog.jonudell.net/2007/07/02/data-finds-data-then-people-find-people/
- [2] Jonas, J, Sokol, L (2009), Data finds data, in Beautiful Data, Segaran, T, Hammerbacher, J eds, Sebastopol, O’Reilly Media, available online at http://jeffjonas.typepad.com/DataFindsDataCreativeCommons.pdf
- [3] Udell, J (20), Elm City FAQ, Jon Udell (blog), retrieved from http://blog.jonudell.net/elmcity-project-faq/
- [4] The people part is demonstrated sadly by the fact that the Elm City project did not survive Jon’s move onto other things. The core person driving these projects turns out to be crucially important.
- [5] Friendfeed appears again. It was a crucial part of the development of many of these ideas, both the technological possibilities that it represented and the community that was discussing them.
- [6] Gridworks has long since disappeared, being incorporated into the project that first became GoogleRefine, after the purchase of Freebase by Google, and was later spun out as OpenRefine. OpenRefine is still a great tool but it doesn’t really represent the ideas being explored here, which were less about cleaning data and more about connecting it up.
- [7] At the time these were the most well-known games on Facebook. I’m not at all sure whether things have moved on or not.
- [8] See for example, Nwana, HS (1996), Software Agents: An overview, retrieved from http://www.sce.carleton.ca/netmanage/docs/AgentsOverview/ao.html
- [9] This remains my test case for the promise of the semantic web or linked data more generally. While the information needed to support this query is in principle available in the Google Knowledge Graph, a search for the query yields a TripAdvisor post from 2008.