Gladly Eating Some Drupal Crow

Bill Fitzgerald has posted about a most impressive aggregator he put together with Drupal in about six hours. He documents the modules he used, and his creation slices and dices the content from various feeds in some really impressive ways.

This may very well be the beginnings of a more sophisticated tag cloud, directory, searchable archive, and filtering system. Bill and I have been jawing back and forth about Drupal and WPMu for a while now, and he’s an absolute pleasure to heckle. More than that, he has been doing some amazing work with DrupalEd, and this here aggregator site he put together finally shows what it means to wrap Drupal around a WPMu install (or any other series of feeds for that matter), an idea I could never quite get my head around last year.

Next step is tweaking this experiment to create OPML and a more sophisticated directory. I am really starting to believe that Drupal may be just the tool we are looking for to display the activity at UMW Blogs. Wait, did I just say what I think I just said?!

More to come on this very exciting development. There is no question I have a lot of crow eating to do when it comes to both Bill and Drupal, and given the season I think I am ready to eat another bird or two some time very soon.

Image of a cool cat eating crow. Image courtesy of tanakawho.

This entry was posted in drupal, wordpress multi-user, wpmu.

14 Responses to Gladly Eating Some Drupal Crow

  1. Hello, Jim,

    I’ve been thinking about different ways to skin this cat (sorry, given the photo in your post, I couldn’t resist 🙂 ) and this is where I’m at —

    D’Arcy mentioned the need for this to scale (http://bavatuesdays.com/umw-blogs-middlesell-sittin-in-a-tree/#comment-40603), and he’s right. With that said, I don’t think we need to have scalability to 100K students as a first goal. The beauty of the small pieces loosely joined is that it’s easy, and that it’s a step away from the monolithic LMS’s so beloved by so many —

    Toward that end, it’s good to consider what we’d need to carry from the blog to the aggregator in order to connect a student’s work with an institutional SIS/LMS. To start, I see two factors as essential: first, mapping a feed to a student, and second, mapping individual posts from within a feed to a course.

    The first piece is relatively straightforward: within the institutional aggregator, map each feed to a userid within the school’s system. This way, institutional IDs are not exposed via any type of feed, and the connection of student feed to institutional record occurs where it needs to: within the institutional aggregator.

    The next piece gets trickier: embedding course info into the feed. I actually think the easiest way to do this would be to use the Atom feed, as the Atom feed is designed to carry additional info (as an xml payload within the feed). Google is using Atom feeds this way on Open Social (http://code.google.com/apis/opensocial/docs/gdata/people/developers_guide_protocol.html — although for a far more complex implementation, carrying friend data), and given that WP already generates Atom feeds, it makes sense to leverage what’s already there.
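
    As a rough sketch of what carrying course info in an Atom entry could look like: the namespace `http://example.edu/ns/course`, the `crs:course` element, and its `id` and `role` attributes are all made up for illustration (Atom permits foreign-namespace elements, but no such course schema is defined anywhere in this thread). The reading side, using only the Python standard library, might be:

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# Hypothetical namespace for the course extension; not a real published schema.
COURSE_NS = "http://example.edu/ns/course"

SAMPLE_ENTRY = f"""<entry xmlns="{ATOM_NS}" xmlns:crs="{COURSE_NS}">
  <title>Week 3 reading response</title>
  <id>http://example.edu/blogs/student1/?p=42</id>
  <updated>2007-11-20T12:00:00Z</updated>
  <author><name>student1</name></author>
  <crs:course id="ENGL101-03" role="student"/>
</entry>"""


def read_course_info(entry_xml):
    """Return the post title plus the course id and role carried in the entry."""
    entry = ET.fromstring(entry_xml)
    title = entry.findtext(f"{{{ATOM_NS}}}title")
    course = entry.find(f"{{{COURSE_NS}}}course")
    if course is None:
        # Entries from blogs without the extension simply carry no course data.
        return {"title": title, "course_id": None, "role": None}
    return {"title": title, "course_id": course.get("id"), "role": course.get("role")}


info = read_course_info(SAMPLE_ENTRY)
print(info)
```

    Feed readers that don’t know the extension would be expected to ignore the extra element, which is part of why Atom looks attractive here.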

    So, on the WP side: some new code that creates a drop down list of course names keyed to course IDs. When a person is creating a blog post, they have an additional field containing a list filtered to their own courses. If we want to get really tricky, we could include whether the poster is a student, instructor, ta, etc, for a specific course. This would involve querying/syncing data out of the school’s Course Management System and exposing it via the WP UI.

    On the Drupal side, this data would need to be mapped into taxonomy terms (and this code already exists/is working on the feeds site). This mapped taxonomy term would automatically generate a feed from within Drupal of every post in the course, and these posts could also be displayed on a course by course basis, so we could filter by author, course, keyword, date, etc. Then, within Drupal, OPML feeds per course would need to be exposed to privileged users; these OPML feeds would be exportable, and would allow someone to subscribe to all the feeds in a single course at one time. Creating these OPML feeds would require new code. Alternately, it would be possible to create a page view of all the posts within each course using the views module, date filters, etc.
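
    To give a feel for how small the per-course OPML piece is, here is a sketch in plain Python rather than Drupal code. The function name `build_course_opml` and the example feed URLs are invented for illustration; the output follows the usual OPML shape of a `head` with a `title` and a `body` of `outline` elements carrying `xmlUrl`.

```python
import xml.etree.ElementTree as ET


def build_course_opml(course_title, feeds):
    """Build an OPML document listing every feed in one course.

    `feeds` is a list of (title, feed_url) pairs.
    """
    opml = ET.Element("opml", {"version": "2.0"})
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = course_title
    body = ET.SubElement(opml, "body")
    for title, feed_url in feeds:
        # One outline element per student feed in the course.
        ET.SubElement(body, "outline", {"type": "rss", "text": title, "xmlUrl": feed_url})
    return ET.tostring(opml, encoding="unicode")


opml_text = build_course_opml(
    "ENGL 101, section 03",
    [
        ("Student One", "http://example.edu/blogs/student1/feed/atom/"),
        ("Student Two", "http://example.edu/blogs/student2/feed/atom/"),
    ],
)
print(opml_text)
```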

    While there would still be more work to do after this, coding solutions for these two items (add course data to the atom feed from within WP, and generate the OPML feeds from within Drupal) would allow feeds from WPMU to be aggregated and sorted by student and course.

    The advantage of creating the drop down list is that the process of selecting/typing the correct tag is simplified. The disadvantage, though, is that any user not on a school-offered blog is out of luck. In order to support a wider variety of platforms basic keywords could be used on a course by course basis. Then, within Drupal, keywords that have been reserved for a specific course could be handled differently than other keywords. This system would be far more prone to user error (and would subsequently have issues scaling) but it has the additional advantage of working with any blogging platform that supports tags on posts. WP does that, right?
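
    The reserved-keyword idea could be sketched roughly like this. The `RESERVED_TAGS` table and the course identifiers in it are hypothetical, and this is plain Python, not how the Drupal side actually handles it:

```python
# Hypothetical reserved keywords, each mapped to a course identifier.
RESERVED_TAGS = {"engl101": "ENGL-101", "hist205": "HIST-205"}


def split_tags(post_tags):
    """Separate a post's tags into matched course ids and ordinary keywords."""
    courses, keywords = [], []
    for tag in post_tags:
        # Normalizing case and whitespace absorbs some (not all) user error.
        normalized = tag.strip().lower()
        if normalized in RESERVED_TAGS:
            courses.append(RESERVED_TAGS[normalized])
        else:
            keywords.append(normalized)
    return courses, keywords


courses, keywords = split_tags(["ENGL101 ", "poetry", "drupal"])
print(courses, keywords)
```

    A misspelled tag such as "engl-101" would still fall through as an ordinary keyword, which is the user-error problem in miniature.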

    🙂

    Also, re the title of your post, I make a habit of eating crow, but I like to do it in style.

    http://bertc.com/three_crows.htm

    Bon Appetit,

    Bill

  2. CAW! CAW!

    Excellent stuff…we’re getting closer and closer…

    I’ll throw in a thought about the two sides, and the implicit (and, I think, trickiest) third side: the school’s database of course info. I think that’s trickiest because that’s the most likely to be proprietary, and who knows what kind of hurdles will need to be vaulted to get at it all.

    I’m thinking of a mixer (yeah, if you know me you know what kind!) in the middle of all three that would be the store of the administrative data about courses (semester, teacher, enrollment if we’re lucky, etc.) and people (student/TA/prof, courses taken during X semester, etc.). The admin data would come in from the univ.’s database directly or indirectly. People data could come from a variety of sources (WPMU profiles, Drupal profiles, separate registration. . . ).

    Basic data about a post would come into the system through an Atom feed. That data could then be mixed with the additional data to give a beefier feed back to Drupal. That would delegate a lot of the data management to the centralized store. Thus, if the Atom feed coming out of WPMU carries the author and some course identifier for the post, the mixer could then add in additional metadata about both from the central store. That way WPMU gets to concentrate on just the post data, simplifying that end; Drupal gets a whole lot of data to play with for lovely views, aggregations, and feeds; and the task of gathering administrative data is kept separate. As a bonus, it might be easier for other platforms to play along.
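
    Stripped of the RDF, the mixing step might look something like the sketch below. It swaps in plain dictionaries for the central store, and the `PEOPLE` and `COURSES` records and their field names are invented for illustration:

```python
# Hypothetical central store: administrative data keyed by author and course id.
PEOPLE = {"student1": {"role": "student", "name": "Student One"}}
COURSES = {"ENGL101-03": {"semester": "Spring 2008", "teacher": "jimgroom"}}


def mix(post):
    """Return a copy of the post enriched with data from the central store."""
    enriched = dict(post)
    # Unknown authors or courses get empty metadata rather than an error.
    enriched["author_info"] = PEOPLE.get(post["author"], {})
    enriched["course_info"] = COURSES.get(post["course_id"], {})
    return enriched


post = {"title": "Week 3 reading response", "author": "student1", "course_id": "ENGL101-03"}
beefier = mix(post)
print(beefier)
```

    The incoming post only needs an author and a course identifier; everything else is looked up, so the WPMU side stays simple.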

    On the tricky part of getting all that admin data from the school’s system, no matter what there will be conversion to do (again, if that data is exposed at all). At worst, much of the data–for example schedule info–is probably exposed at least as a web page. A scraper could at least grab that, then stuff it someplace. There, too, is where the mixer would come in.

    I’m thinking of that central mixer being RDF-based (but many of you knew that already). Yes, that means some conversion into RDF, but in most cases that’s fairly straightforward and many good tools are available. RDF makes mixing a variety of different data easy, and spitting it back out as ATOM, JSON, or whatever is also fairly straightforward.

    Here’s a diagram of what I have in mind. (I don’t know how much HTML I’m allowed, so here’s also a link to it.)

  3. great post, jim. followed by epic comments by bill and patrick. awesome stuff. I think we’re slowly getting there 🙂

    The aggregation stuff is the first step, but to do it up right, this is almost more of an identity management problem than it is an aggregation problem. An individual needs to be able to say “This is me. I publish at these various places. I am a member of these communities. I am a student here, taking these courses, in these semesters, in these sections. I am a member of these school groups. I work here…” etc…

    I’m not sure how to best do that. Making every student come to one magical “portal” to register their various bits of online self might work, but it feels backward. There must be a way for an individual to manage their online identity in a decentralized way (ala OpenID) and extend that by associating their publishing, communities, etc…

    Patrick, that’s a great diagram that helps to flesh out some of the interactions. If it helps, I’ve put some (rather incoherent) sketches of my take on the concept, as it’s evolved. I’m overdue to update it though…

  4. I s’pose it might come down to the OpenID-ish (and RDF-ish) idea of the unique identifier for each person. Then, in whatever environment they add info about themselves, that info is assembled together around the identifier by the mixer. Something derived from an email address or university user id could do the trick–UMW still gives everyone a whopping 10MB of server space at the URL “http://people.umw.edu/~{userid}”. If the profile tool (assuming there is one) in whatever environment lets you put in some kind of id, and the environment can spit out profile info, then all the user would need to register at a centralized place is “My identifier (a URI) is this. Look for more info about me here, here, here, here, and here.” Alternatively, go nuts with each environment having its own separate identifier for you, and just have a registry of here/identifier pairs: “Look for more info about here w/ id X, here w/ id Y. . . ”

    That kinda reverses what you describe. Instead of “I publish at these various places,” it’s “These various places have info about me,” and that’s what gets (somehow) into a central registry.

    In this scenario, there are breakpoints due to the decentralization (can the profile carry that info? can the environment spit out that info?), and there’s still the need to register where the system should look. So the balance seems to me to be between the virtues of decentralization and the risk of breakpoints. There’s also the matter of people keeping their info current, especially when bits and pieces of it come from different sources.

    The identity management piece is certainly a big piece of getting this right. The identity management piece is also directly tied into barriers to entry/ease of use: what steps must a user take in order to participate in this system, and at what point does this shift from being an incarnation of small pieces, loosely joined to a reincarnation of a bigger, less flexible system?

    As I see it, there are two contexts where identity needs to be addressed: a person’s identity within the university (and this connects with the various mechanisms in place to record/track/assess their progress within the university) and a person’s identity outside the university (a blog, an Amazon wishlist, a Facebook account, etc). These identities come together when content from external sources (ie, the external identity) needs to be connected to the user’s internal identity.

    So, at the most basic, we need to connect a feed URI to a school-issued userid (or UID).

    1. UID –> Feed URI — one UID can be connected to many URI’s

    2. Ideally, we could also have info about users: UID –> role (student, teacher, ta, etc) — one uid can be associated with one or more roles.

    3. Then, we’d also need info on users and courses: UID –> CrseID — one UID to many courses, with perhaps an optional column to hold data on a user’s role in the course.

    4. We’d also need a means of assigning tags to courses: CrseID –> CrseTag

    People blogging externally would have a feed registered in the system so all posts coming into the system from their feed would automatically be associated with their URI.

    These feeds would contain the course info, in the form of the course tag. In a more tightly managed system (ie, the school issues the blogs) the course data (ie the tag) could be handled via a drop down select list that gets dynamically populated. In a more open system (ie, one that supported more platforms) this data could be entered manually by the student. More clunky, yes, but it also allows for a lower barrier to entry, and it also creates the potential for anyone, anywhere to participate using the tools they are already using.

    RE the rdf mixer and multiple methods of identifying users, and Patrick’s diagram: absolutely, and the key is getting these various forms of identification linked to a person’s UID. So, whether it’s OpenID with attribute exchange, FOAF data, etc, we could look to two final tables:

    5. UID –> identifier, identifier type, identifier source; and identifier –> attribute, note

    At the core, though, is the basic relation between UID and Feed URI. This is the lightest connection *necessary* for this to work, exposes the least amount of student data to the outside world, and has the fewest barriers to entry.
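
    Relations 1 through 4 above could be sketched as tables along these lines. SQLite is assumed purely for convenience, and every table name, column name, and sample value is hypothetical; the identifier tables from item 5 are left out to keep it short.

```python
import sqlite3

SCHEMA = """
CREATE TABLE user_feed   (uid TEXT NOT NULL, feed_uri TEXT NOT NULL UNIQUE);
CREATE TABLE user_role   (uid TEXT NOT NULL, role TEXT NOT NULL);
CREATE TABLE user_course (uid TEXT NOT NULL, crse_id TEXT NOT NULL, course_role TEXT);
CREATE TABLE course_tag  (crse_id TEXT NOT NULL, crse_tag TEXT NOT NULL UNIQUE);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO user_feed VALUES (?, ?)",
    ("u1001", "http://example.edu/blogs/student1/feed/atom/"),
)
conn.execute("INSERT INTO user_course VALUES (?, ?, ?)", ("u1001", "ENGL101-03", "student"))
conn.execute("INSERT INTO course_tag VALUES (?, ?)", ("ENGL101-03", "engl101_03_s08"))


def resolve(feed_uri, tag):
    """Map an incoming post (feed URI plus tag) to (uid, crse_id), or None."""
    return conn.execute(
        """SELECT f.uid, t.crse_id
           FROM user_feed f
           JOIN course_tag t ON t.crse_tag = ?
           JOIN user_course c ON c.uid = f.uid AND c.crse_id = t.crse_id
           WHERE f.feed_uri = ?""",
        (tag, feed_uri),
    ).fetchone()


print(resolve("http://example.edu/blogs/student1/feed/atom/", "engl101_03_s08"))
```

    Note that the UID never leaves this database: the feed itself only carries the tag.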

    Cheers,

    Bill

  6. and:

    Patrick, it heartened me to no end to see you blogging from a suitable platform. 🙂

    Cheers,

    Bill

  7. Excellent…thanks Bill, that parses things out very nicely!

    For a different project, I’ve worked a bit on a slice of this–associating a student with course info, as well as info about the course. The project is RDF-based so that’s where my brain is in this sketch, but hopefully some of the ideas and approaches will be general enough to keep the discussion moving in any direction.

    Each person has a URI, essentially a UID they choose attached to a namespace for people registered.

    Roles are tricky, at least in the most general cases (thankfully, I haven’t yet needed to deal with them). Think of a TA in grad school. In some courses, the role is student. In others, it’s instructor (or grader, or recitation instructor, or slave labo– you get the idea). So as in Bill’s #3, role would have to be a function of the course, not the individual.

    Also on #3, course info gets sticky when we deal with more than one semester, and when we deal with more than one section. I handled that by making two distinct types: one for ‘course’ in administrative usage, which I called a “Course”; and one for ‘course’ as in an actual classroom, which I called a “CourseManifestation” (as in an individual real-life manifestation of the abstract, administrative Course). (In RDF terms, the CourseManifestation is a subtype of a foaf:Group.)

    So a Course can have properties like ‘owner department’, ‘instructional level’, ‘requirements filled’, etc. A CourseManifestation, then, first has the property ‘manifestationOf (a Course)’. Then it has ‘semester’, ‘instructor’, ‘location’, and of course ‘foaf:member’s. (Again in RDF terms, the various roles might be subproperties of foaf:member, dunno about that).

    It sounds more complicated than is needed up until we get to #4, assigning tags. Tags would have to be to CourseManifestations, not Courses. The separation then also provides a mechanism for aggregating across different sections, instructors, or semesters if ya want.
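
    Outside of RDF, the Course / CourseManifestation split might be sketched with plain Python dataclasses as below. The field names, the `posts_for_course` helper, and the sample data are assumptions made for illustration, not the actual model from that project.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Course:
    """The administrative course, independent of any semester or section."""
    course_id: str
    owner_department: str


@dataclass
class CourseManifestation:
    """One real-life offering of a Course; tags and roles attach here."""
    manifestation_of: Course
    semester: str
    section: str
    tag: str
    members: dict = field(default_factory=dict)  # person id -> role in this offering


def posts_for_course(course, manifestations, posts):
    """Aggregate posts across every section and semester of one Course."""
    tags = {m.tag for m in manifestations if m.manifestation_of == course}
    return [p for p in posts if p["tag"] in tags]


engl101 = Course("ENGL101", "English")
sections = [
    CourseManifestation(engl101, "Spring 2008", "03", "engl101_03_s08",
                        {"patrickgmj": "student", "jimgroom": "instructor"}),
    CourseManifestation(engl101, "Spring 2008", "04", "engl101_04_s08"),
]
posts = [
    {"title": "A", "tag": "engl101_03_s08"},
    {"title": "B", "tag": "engl101_04_s08"},
    {"title": "C", "tag": "hist205_01_s08"},
]
matched = posts_for_course(engl101, sections, posts)
print([p["title"] for p in matched])
```

    Because roles live on the manifestation, the same person can be a student in one offering and an instructor in another.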

    For the course id’s and related (machine?) tags, the university gives one for administration purposes like registering, but what student (or teacher, for that matter), will remember that? This, I think, will be tricky to sort through if it’s going to be precise enough to work right and be simple enough to be usable.

    For the core issue of connecting users to their various blogs (and feeds), it seems like some human-based mechanism for collecting the list of students and their blog URLs (as most teachers now do), would gather the basic data — someone has to manually enter it at this step — then the technology can take over to find the feed URL(s) for the blog.
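
    The "technology takes over" step is typically feed autodiscovery: reading the `<link rel="alternate">` tags in a blog’s HTML head. A minimal sketch using only the standard library follows; the class and function names and the sample page are invented, and fetching the page over HTTP is deliberately left out.

```python
from html.parser import HTMLParser

FEED_TYPES = {"application/atom+xml", "application/rss+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect feed URLs advertised via <link rel="alternate"> tags."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        link_type = (attr.get("type") or "").lower()
        if "alternate" in rel and link_type in FEED_TYPES and attr.get("href"):
            self.feeds.append(attr["href"])


def find_feeds(html_text):
    """Return every feed URL advertised in the given HTML text."""
    finder = FeedLinkFinder()
    finder.feed(html_text)
    return finder.feeds


SAMPLE_PAGE = """<html><head>
<link rel="stylesheet" href="/style.css">
<link rel="alternate" type="application/atom+xml" href="http://example.edu/blogs/student1/feed/atom/">
</head><body></body></html>"""
print(find_feeds(SAMPLE_PAGE))
```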

    Just ‘cuz I’m loving diagrams this week, here‘s a sketch of parts of what I’m thinking, based on that project I mentioned and on what we’re talking about. Pretend that ‘patrickgmj’ is a student and ‘jimgroom’ is a teacher.

  8. jimgroom says:

    Wow,

    When the cat’s away the Drupal rats will play 🙂

    You all are rocking out; it took me a few hours to digest these comments, for there is so much here. Let me just say for the moment that the more abstract “One Feed Per Child” idea is really interesting to me. A student comes in not so much with a blog or wiki or anything like that, but with some kind of OPML-like feed that they can manipulate and add data to in order to make it do their bidding.

    I like this idea a lot. And there is a project Brian Lamb’s group was working on that in many ways addresses part of this OPML creation on the fly; I have to find the URL for that test project shortly.

    Whenever we talk about getting data from the centralized Banners of the world I always cringe a little bit. That said, what info we really need is a course list that can then become a drop-down menu from which students can choose on a post by post basis.

    Yet even that would be a pain in the ass. With the new tagging functionality in WPMu, I imagine a pilot where each class decides on a tag, allowing Drupal to slice, dice, and display these tags or create an OPML file around them. I opt for the potentially more error-prone option right now because the pilot is still not “enterprise” and I would be interested to see how effectively we could incorporate not only WPMu blogs, but also Blogger, Drupal (God knows why), Typepad, etc. Additionally, the overhead of banging your head against proprietary administrative systems might hinder experimentation that I really believe we would need to do with this before it becomes a fully scalable solution.

    The possibility of OpenID in all of this seems extremely forward thinking. Scott Leslie did a presentation recently about OpenID, and I haven’t looked at it yet, but I know I need to. How can we start thinking about a distributed identity management system that may help take some of the pressure off grabbing and filtering all our information about people from centralized systems?

    This is amazing stuff, and I do believe we should create a working group to pursue all of these issues more specifically together. Reason being, unless we set up some specific cases to test, all of these ideas may never have the opportunity to be closely examined. And I imagine we will have a pretty rich testing scenario in UMW Blogs next semester. Any takers?

  9. Tags and Mixers

    The attack of “each class decides on a tag” leaves me worried because it introduces a whole bunch of ambiguity, especially in the case of wanting to aggregate across more than one section of the same class. That makes a situation where, at the section level, there are different sections deciding on different tags, but on the class level they should all be able to refer to the same thing. Whatever might do that work would also need to know how to match up different section-defined tags with the over-arching tag for the class, and that would require a good deal of manual labor.

  10. jimgroom says:

    Patrick,

    We could just as easily come up with a consistent tag for each section, say engl101_03_s08 and have each student and professor enter this accordingly. It may be a bit of overhead at first, but for the pilot stage it may move the experimentation along quicker. Although, this may not be the case if the more centralized warehousing tool is easy to build and plays with Banner, Drupal, and WPMu nicely. Either way, these are all options we need to flesh out and experiment with to see what, indeed, works the best.
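
    A consistent tag like engl101_03_s08 is easy to take apart by machine. The sketch below assumes the convention is course, section, then a term letter with a two-digit year; that reading of the example, the pattern, and the function name are my own guesses rather than an agreed format.

```python
import re

# Assumed shape: course code, section number, term letter plus two-digit year.
SECTION_TAG = re.compile(r"^([a-z]+\d+)_(\d+)_([a-z])(\d{2})$")


def parse_section_tag(tag):
    """Split a section tag into its parts, or return None if it does not match."""
    match = SECTION_TAG.match(tag.strip().lower())
    if match is None:
        return None
    course, section, term, year = match.groups()
    return {"course": course, "section": section, "term": term, "year": year}


print(parse_section_tag("engl101_03_s08"))
print(parse_section_tag("poetry"))
```

    Since the course code is its own field, aggregating across sections of the same class means grouping on `course` and ignoring `section`.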

    I always have my ax to grind, but I am also up for a detente every so often 🙂

  11. Greetings, all,

    Some quick thoughts on this:

    Patrick’s middleware/translator between Banner (ot: Banner? Really? Banner? Why, why, why?) or any other system will be the key to the long term viability of this, as for this to scale to any degree it will be essential to have the kind of feed-student-specific course instance mapping we all discuss. This is a big project, and not something that will be coming together quickly. On a fast development timeline, I reckon that a team with dedicated resources and a clean spec could start working now, produce a working version by June, test over the summer, and deploy in the Fall of 08. And that would be working quickly on this, and developing a tool that had flexibility wrt collecting data and clearly defined formats for sharing data. Clearly documented APIs on this tool would be a must.

    WRT the pilot, I agree with Jim (and man, it hurts to say that 🙂 ): create a list of tags with a specific tag for each section of each course. A separate list will need to be compiled that maps each blog feed to a specific student account. In theory, a student could have more than one feed, but (and correct me if I’m wrong here) in this pilot this probably will not be the case.

    Within Drupal, we can create a nested taxonomy that positions each course-specific tag within the proper organizational structure. This would be a good organizational exercise, and it could also simplify any transition to a more robust (ie, automated/scalable, aka, the middleware) system later on.
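
    To make the nesting concrete, here is a sketch of the hierarchy as a plain Python dictionary. This is not Drupal’s taxonomy API; it only illustrates the shape, and it assumes tags of the form course_section_term as in Jim’s example.

```python
def build_taxonomy(section_tags):
    """Nest course_section_term tags as course -> term -> [sections]."""
    tree = {}
    for tag in section_tags:
        course, section, term = tag.split("_")
        tree.setdefault(course, {}).setdefault(term, []).append(section)
    return tree


taxonomy = build_taxonomy(["engl101_03_s08", "engl101_04_s08", "hist205_01_s08"])
print(taxonomy)
```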

    The downside of this approach is that it involves manual effort to perform tasks that can be automated. The upside to this approach is that, by proceeding with the pilot, we will get a better sense of what is essential for the middleware, and eventually build a cleaner, more targeted app.

    Cheers,

    Bill

  12. RE: OpenID

    OpenID could be used here to help move this system forward using the OpenID 2.0 spec with attribute exchange. If the university supplied the OpenIDs (ie, served as the identity provider) then any user’s attributes could potentially include their userid, etc, etc (ie, all the things that can be carried via the attribute exchange spec).

    External OpenID’s, to work within this system, would be treated differently, as there would be no guarantee that they would contain the same data as the university-issued OpenID’s.

    In this system, the OpenID identifier would probably resolve to a user’s blog address, which is how many people currently do it (via OpenID delegation). In some ways, using OpenID replaces the feed URI with the OpenID URI.

    Using OpenID also raises the question of how the uni will support the OpenIDs after a student graduates. But that’s a whole ‘nuther issue.

    In this instance, OpenID would solve some issues while raising others. If, however, there were going to be blog-based classes between two or more schools, then OpenID becomes a very attractive option.

    Cheers,

    Bill

  13. jimgroom says:

    Bill,

    Can you say you agree with me again? I really like the way that sounds.

    More seriously, this is amazing, and I wouldn’t mind mapping some of this stuff more specifically, outside of these comments, so that it can be organized according to long and short-term goals. I imagine Patrick could start testing the middleware (if you are so inclined, which I know you are 😉 ), while we test a more barebones version of Bill’s Drupal aggregator with some student and course feeds from UMW Blogs.

    WRT the “one blog per student” question, as it stands now, students and professors can create as many blogs (and by extension as many feeds) as they would like, making one URL (or feed) highly unlikely over the course of an undergraduate degree. This might be something we need to examine more specifically.

  14. Hello, all,

    I definitely think we’ve hashed out some of the preliminary details, and are ready to move on to the more substantial work of actually building this thing.

    So, let’s talk next steps — ping me offline or leave a comment here about how you’d like to move forward.

    Cheers,

    Bill
