Patrick Murray-John has been working tirelessly over the last month to realize an extremely exciting possibility for marrying the Semantic Web with WPMu, although this experiment is by no means limited to this application. What he has been doing is scraping the available data from the uber RSS feed of public blogs from the UMW Blogs Tags Site, and pulling it into a suite of semantic web tools provided by MIT’s Simile project (namely Exhibit and Timeline).
“Why?” you ask. Well Hondo, because these tools provide the means to visualize and connect the activity on UMW Blogs in new ways. Check out the Timeline of UMW Blogs posts over the last two weeks here. Or look at how a tool like Exhibit provides interesting ways of creating a more comprehensive directory of users, tags, and posts (something WPMu just can’t do extensively). See the alphabetized Bloggers Exhibit, which has a weighted tag cloud for each letter of the alphabet listing usernames, or take a peek at the Blogs Exhibit, which does the same thing with blog titles.
Moreover, we now have a way to collect all the images uploaded to UMW Blogs in one place, and a gallery of top ten lists for those blogs with the most images, audio files, or videos. What this means is we now have a series of alternative means for capturing and manipulating data for UMW Blogs that will allow us to search, discover, and make connections more easily than we could previously. We are at the beginnings of this experiment in some ways, yet in others we simply have to style and re-theme the data accordingly and we are ready to unleash it on the UMW Blogs community to see how they use it and what value it brings to further build upon this already robust publishing platform. Is this what the trendy discussions about Web 3.0 are all about (besides the pervasive idea of cloud computing, which is in many ways upon us)? Finding ways to marry the power, ease, and usability of Web 2.0 tools with the promise of discoverability, visualization, and deep connections that the Semantic Web has promised? I guess we’re about to find out here at UMW.
Thanks for the shout-out, Jim! Still lots to do, but it’s coming along happily!
“that will allow us to search, discover, and make connections more easily than we could previously”
how’s about doing something to the network graphs in
http://ouseful.wordpress.com/2008/10/13/visualising-the-ou-twitter-network/
or
http://ouseful.open.ac.uk/blogarchive/014984.html
or
http://ouseful.open.ac.uk/blogarchive/014840.html
I think I have a php script that will take an export from wp, mine the internal trackbacks and generate a dotfile for graphviz if you want it? (Not done one for scraping links out of RSS descriptions yet, to see how blogposts relate to each other that way. Maybe you know someone who could tweak the code to do that?)
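For readers unfamiliar with Graphviz dotfiles, here is a minimal sketch of the last step of that idea, in Python rather than PHP. It is not Tony's script: the `to_dot` helper, the graph name, and the sample trackback pairs are all made up for illustration, and it assumes the link pairs have already been mined from the export.

```python
def to_dot(edges, graph_name="blog_links"):
    """Render (source, target) pairs as a Graphviz DOT digraph."""
    def quote(text):
        # DOT string literals need backslashes and double quotes escaped.
        return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

    lines = ["digraph %s {" % graph_name]
    # Deduplicate and sort so the output is stable between runs.
    for source, target in sorted(set(edges)):
        lines.append("    %s -> %s;" % (quote(source), quote(target)))
    lines.append("}")
    return "\n".join(lines)


# Hypothetical trackback pairs: (linking post, linked post).
edges = [("post-a", "post-b"), ("post-c", "post-b"), ("post-a", "post-b")]
dot_text = to_dot(edges)
print(dot_text)
```

The resulting text can be saved to a file and rendered with the Graphviz `dot` command.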
Tony,
Very cool stuff…and highlights a big weakness of my approach of using Atom and RSS feeds: they don’t report those trackbacks (though I can get links just by looking at the content in the feed and scraping the a nodes). I discovered already that there’s not as much internal linking as I might have hoped.
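As a rough illustration of pulling links out of feed content by its a nodes, here is a minimal Python sketch using only the standard library. It is not the code behind the exhibits; the `LinkCollector` class, the `extract_links` helper, and the sample HTML are assumptions made up for the example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> element seen in a chunk of HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links


# Hypothetical post content as it might appear inside a feed item.
sample = (
    '<p>See <a href="http://example.org/one">this</a>, '
    '<a name="top">an anchor without a link</a>, and '
    '<a href="http://example.org/two">that</a>.</p>'
)
links = extract_links(sample)
print(links)
```

Anchors without an href are skipped, so only real outbound links end up in the list.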
I’m very torn about the fact that I’m focusing on the feeds. On one hand, I know I could get a lot more data, as you have, by working with a script that goes right to wp database. On another hand, I fear updating it as WP changes. On still another hand, I also want this to scrape data from Drupal, Blogger, Omeka, and any other future application we use. I can’t write a new script (and maintain it!) for each app, but I can count on a feed of one kind or another. So the Vishnu-factor of too many hands on one side is keeping me concentrated on the feeds.
Since I’m working in RDF, I’m hoping in the future to be able to play nicely with the plugins being produced by the SIOC (Semantically Interlinked Online Communities) folks. But there’s another hand, there — I’d be counting on _others_ to install those plugins. Until the virtues of installing those plugins are more widely embraced, I don’t want to count on them being there. Indeed, part of the sneaky mission of this is to demonstrate some of the virtues of making your data available in RDF to encourage that adoption.
That said, I am curious about the script you mention — I suspect it won’t be too hard to tweak it to report data as RDF and, at least for UMWBlogs, we might be able to bring it in that way.
Thanks!
Patrick
Patrick
The export I used for the digitalworlds graph was just the wordpress export of the Digital Worlds blog, that I then parsed and scraped.
The ORO author network was generated by screen scraping author names, but the repository has since opened up RSS feeds where I’ll scrape the authornames from next time; the twitter thing was done by grabbing data from repeated calls to the twitter api.
I guess one of the things it would be good to do would be to see who was linking to whom; this could be done by subscribing to and mining the feeds, building up a directory of original post URLs and the URLs that are linked out to from those posts over time?
tony
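One way the directory Tony describes might be built up over time is sketched below in Python. It assumes a plain RSS 2.0 feed whose `description` element carries the post HTML (many real feeds put the full post in a namespaced `content:encoded` element instead, which this sketch does not handle). The `update_directory` function and the sample feed are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class HrefParser(HTMLParser):
    """Gather href values from <a> elements."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.found.extend(v for k, v in attrs if k == "href" and v)


def update_directory(directory, rss_text):
    """Merge one poll of a feed into {post URL: set of outbound URLs}."""
    root = ET.fromstring(rss_text)
    for item in root.iter("item"):
        post_url = (item.findtext("link") or "").strip()
        if not post_url:
            continue
        parser = HrefParser()
        parser.feed(item.findtext("description") or "")
        parser.close()
        # Sets make repeated polls of the same item harmless.
        directory.setdefault(post_url, set()).update(parser.found)
    return directory


# Hypothetical feed with one post that links to another blog.
SAMPLE_RSS = """<rss version="2.0"><channel><title>Example</title>
<item><link>http://blog-a.example/post-1</link>
<description>&lt;a href="http://blog-b.example/post-9"&gt;a reply&lt;/a&gt;</description></item>
</channel></rss>"""

directory = {}
update_directory(directory, SAMPLE_RSS)
update_directory(directory, SAMPLE_RSS)  # polling again adds nothing new
print(directory)
```

Calling `update_directory` on each scheduled poll would let the directory grow as new posts appear in the feed.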
Tony,
That should be relatively easy to do. Right now the exhibits for blogs and posts include a list of everything that they link out to. There’s also the Link Friends Exhibit, which gives a list of all the URLs that more than one person has linked to, along with info about who linked to it.
The trickier thing that I’d like to get at is a “Good Neighbors” exhibit, which I think will move closer to what you’re getting at. This would pull out pairs (maybe groups) of blogs such that one or more posts within one of the pair link to one or more posts in the other blog. Pulling that data together is relatively straightforward; the trickier part has been figuring out how to display it in a readable way.
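The Link Friends idea boils down to inverting a "who linked to what" mapping and keeping only the URLs with more than one linker. A minimal Python sketch, assuming the links have already been collected per blogger (the `link_friends` function and the sample names and URLs are made up for illustration, not taken from the actual exhibit):

```python
from collections import defaultdict


def link_friends(links_by_blogger):
    """Return {url: sorted list of bloggers} for URLs linked by 2+ people."""
    linkers = defaultdict(set)
    for blogger, urls in links_by_blogger.items():
        for url in urls:
            linkers[url].add(blogger)
    return {
        url: sorted(people)
        for url, people in linkers.items()
        if len(people) > 1
    }


# Hypothetical data: blogger -> URLs found in their posts.
sample = {
    "alice": ["http://example.org/a", "http://example.org/b"],
    "bob": ["http://example.org/b"],
    "carol": ["http://example.org/b", "http://example.org/c"],
}
shared = link_friends(sample)
print(shared)
```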
Patrick
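The data-gathering half of the “Good Neighbors” idea described above might be sketched as follows in Python. This covers only the pairing, not the display problem. It assumes each blog lives on its own hostname, so the host part of a URL identifies the blog; the `good_neighbors` function and the sample URLs are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


def good_neighbors(post_links, known_blogs):
    """Group cross-blog links by unordered blog pair.

    post_links: {post URL: iterable of URLs that post links to}
    known_blogs: set of hostnames that count as blogs in the community
    """
    pairs = defaultdict(list)
    for post_url, targets in post_links.items():
        source = urlparse(post_url).netloc
        for target_url in sorted(targets):
            target = urlparse(target_url).netloc
            # Keep only links between two different known blogs.
            if target in known_blogs and target != source:
                pair = tuple(sorted((source, target)))
                pairs[pair].append((post_url, target_url))
    return dict(pairs)


# Hypothetical directory of posts and their outbound links.
post_links = {
    "http://a.example/p1": ["http://b.example/p7", "http://elsewhere.example/x"],
    "http://b.example/p8": ["http://a.example/p1"],
    "http://c.example/p2": ["http://c.example/p3"],
}
known = {"a.example", "b.example", "c.example"}
neighbors = good_neighbors(post_links, known)
print(neighbors)
```

Each key is a pair of blogs, and each value lists the post-to-post links that connect them in either direction.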