A Ghost in the WordPress Machine: the Self-Referential RSS Feed

I’ve been meaning to write about this for a while now, but I guess the sheer absurdity of it has stopped me again and again. But it is time to move beyond that. We’ve been running two major WordPress Multi-Site installs (what was WPMu) since 2007, and in those five years the vast majority of our performance issues have been traced back to self-referential RSS feeds that loop endlessly and crash the server because they become so resource-intensive. This brought down UMW Blogs several times during the 2010-2011 academic year, and recently it has been making the ds106 server spin out of control.

What the hell am I talking about? Well, it’s simple: you call your own feed in an RSS widget or whatever. For example, if I called http://bavatuesdays.com/feed in my sidebar, this site would loop infinitely once it was called and make the server go batshit. You can see a graph of the ds106 server regularly going batshit until we realized there was a self-referential feed in the sidebar of pages on ds106.us.

So, given this is a really simple way to pull down an entire multi-site installation, why can’t we find any info about it? Has anyone else had this issue? Is there a way WordPress might actually disallow self-referencing feeds so that sites don’t crash and burn? This seems like a ridiculous issue to have gone unresolved for so long, but maybe part of the issue is that no one’s talking. Perhaps there’s a ring of silence around this one 🙂 But more seriously, does anyone have ideas about how to prevent it? And if so, is anyone interested in lobbying the core WP developers to see if we can’t get it committed to future releases to stop the madness once and for all?!

This entry was posted in WordPress, wordpress multi-user, wpmu.

3 Responses to A Ghost in the WordPress Machine: the Self-Referential RSS Feed

  1. Stephen Downes says:

    Too funny. I had exactly the same problem with Edu_RSS back in the day. It happened and was solved really quickly, though: the very first day I put up an ‘add-feed’ page, some wag submitted the Edu_RSS feed and the circle was on.

    The solution is to prevent the feed from ever being harvested in the first place. So buried deep in gRSShopper today is a line that says:
    return if ($feed->{url} =~ /$Site->{co_base}/);

    I use the cookie base because it’s an easy way to catch all sorts of permutations of the feed URL. The cookie base for my site is “downes.ca”, and the match operator (=~) searches for the presence of a substring in a string.

    In WP, probably the best place to put such a line is right at the feed harvester (though no doubt it could go elsewhere and be equally effective). I’d put it in fetch_feed(), which you can find around line 530 in wp-includes/feed.php.

    To fix yours with a simple hack, do the following:

    At line 532 (i.e., right before the line that says $feed = new SimplePie()), insert the following code:

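    $co_host = 'ds106.us';
    // Skip any feed whose URL contains this site's host. Return a WP_Error (not 0) so callers that check is_wp_error(), such as the RSS widget, bail out cleanly.
    if ( false !== strpos( $url, $co_host ) ) { return new WP_Error( 'self_referential_feed', 'Refusing to fetch a feed from this site.' ); }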

    That should solve your problem. If you’re finding other feeds harvested in error, add the same code for each feed.

    If you want it to be less of a hack and more of a fix for WP, then you would find a way to automatically detect the site base URL rather than simply defining it as $co_host, but that would take me more time than I want to spend on this.

    Anyhow, hope this helps!

  2. Reverend says:

    Stephen,

    That is awesome, thanks for this fix. I will be trying it out shortly. You are truly a renaissance man!

  3. Ted Mann says:

    I am with you… STOP THE MADNESS! 🙂
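Stephen’s first response ends by suggesting that a real fix would detect the site’s base URL automatically instead of hard-coding $co_host. Below is a minimal sketch of that idea as a standalone PHP function, assuming the site URL is passed in (inside WordPress it could come from home_url()). The function name is hypothetical and not part of WordPress. As a design choice, it compares hostnames rather than searching for a substring, so an unrelated domain like notds106.us, or an aggregator URL that merely mentions ds106.us in its query string, isn’t blocked by mistake, while http/https, www., and subdomain variants still are.

```php
<?php
// Hypothetical helper (not part of WordPress): returns true when $feedUrl
// points at the site whose base URL is $siteUrl, or at a subdomain of it.
function is_self_referential_feed(string $feedUrl, string $siteUrl): bool
{
    $siteHost = parse_url($siteUrl, PHP_URL_HOST);
    if (!is_string($siteHost) || $siteHost === '') {
        return false; // Cannot determine our own host, so block nothing.
    }
    $feedHost = parse_url($feedUrl, PHP_URL_HOST);
    if (!is_string($feedHost) || $feedHost === '') {
        return false; // No host to compare (relative or malformed URL).
    }

    // Hostnames are case-insensitive; also treat www.example.com as example.com.
    $normalize = function (string $host): string {
        return preg_replace('/^www\./', '', strtolower($host));
    };
    $feedHost = $normalize($feedHost);
    $siteHost = $normalize($siteHost);

    // Same host, or a subdomain of it (e.g. another site on a subdomain multisite network).
    $suffix = '.' . $siteHost;
    return $feedHost === $siteHost
        || substr($feedHost, -strlen($suffix)) === $suffix;
}
```

Wired into the hack from the first response, the hard-coded check would become something like `if ( is_self_referential_feed( $url, home_url() ) )`, with the same early return.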
