WPMu Text Encoding Hell

Important Update: In the following discussion I talk about how changing the utf8 setting in the wp-config.php file for WPMu to utf-8 fixed my encoding problem, and this is true. That said, this change has led to a far bigger problem, namely it has prevented me from creating any additional sites on my WPMu account. It returns the error “The page isn’t redirecting properly…” And don’t ask me why, but I am certain now that this has everything to do with the dash between the UTF and the 8 that I added to fix the encoding. I have blogged about this ordeal, but just wanted to clarify this here before someone says I am an idiot, which wouldn’t be totally untrue 🙂

I recently revisited the ELS Blogs installation that started the whole WordPress Multi-User craze at UMW. I figured it was time to upgrade this installation, and perhaps use it as a way of thinking about how we might archive this stuff, or even just let the students who have blogs on the system know they are still there and waiting to be reclaimed, exported, etc.

To my great chagrin, after upgrading this installation from WPMu 1.2.5a to WPMu 1.3 the text-encoding went batty. A number of blogs were littered randomly with an unattractive stray character, and foreign accents and apostrophes were delivering all sorts of bizarre symbol combinations. So, for about three hours today, I downloaded the database of ELS Blogs and tried to figure out if I could do a search and replace for the various symbols. I tried at length, but the database was too big to be manageable in Textmate (weighing in at 60 MB). I then cut it up into pieces and did search and replaces for all sorts of things, only to realize I was messing up the core SQL code because I am a hack.
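For anyone facing a dump of similar size, it is possible to gauge the damage without opening the file in an editor. Here is a minimal sketch in Python that streams a dump line by line and counts a few character sequences that typically appear when UTF-8 text is misread as Latin-1. The function name `count_markers` and the marker list are assumptions made for illustration; they are not part of WPMu or of the original troubleshooting.

```python
from collections import Counter

# Sequences that commonly show up when UTF-8 text is misread as Latin-1/cp1252.
# Illustrative only; a real dump may contain other combinations.
MARKERS = ("Ã", "Â", "â€")


def count_markers(path, markers=MARKERS):
    """Stream a text file line by line and count each marker's occurrences."""
    counts = Counter()
    # Read as UTF-8; bytes that cannot be decoded are replaced instead of
    # raising, so one bad byte does not abort the whole scan.
    with open(path, encoding="utf-8", errors="replace") as dump:
        for line in dump:
            for marker in markers:
                counts[marker] += line.count(marker)
    return counts
```

Because the file is read one line at a time, memory use stays small even for a dump of tens of megabytes. Calling `count_markers("dump.sql")` (a hypothetical file name) returns a `Counter` keyed by marker, which only reports on the damage and never modifies the SQL.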

So, finally, after pretending I know something about databases, I decided to turn to the forums (why didn’t I do this in the first place again?) to find a simple fix (at least for me) to a very annoying issue particular to the upgrade from WPMu 1.2.x to WPMu 1.3.x. In short, the database character set went from Latin-1 to UTF-8 between the versions, with very little said about it in the upgrade readme. In fact, the fix is in the wp-config.php file (one I often don’t overwrite when upgrading because I want to keep the db connection information intact). It seems that if you are going from 1.2.x to 1.3.x you have to use the upgraded wp-config that has the relevant information about the UTF-8 encoding.
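To make the symptom concrete, here is a small sketch in Python of how accents and apostrophes turn into symbol soup when UTF-8 bytes are interpreted as Latin-1. It assumes that MySQL's latin1 behaves like Windows-1252 (cp1252), which is how MySQL documents it; the sample string is made up for illustration and is not taken from the ELS Blogs data.

```python
# cp1252 stands in for MySQL's "latin1" here (an assumption noted above).
original = "café’s"

# Encode as UTF-8, then misread those bytes as cp1252: this is the garbling.
garbled = original.encode("utf-8").decode("cp1252")
print(garbled)  # cafÃ©â€™s

# Reversing the two steps recovers the text, as long as every garbled
# character maps back to a single cp1252 byte.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == original)  # True
```

This round trip only works when the text was garbled exactly once and every byte involved is defined in cp1252. Python's cp1252 codec leaves a handful of bytes undefined, so a repair over real data would need error handling rather than a blind pass.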

But my annoyance is not simply because I didn’t overwrite the wp-config file, for that is my fault and I can live with that. What gets me is that the way the encoding is defined in the wp-config file seems to be wrong, for example:


// ** MySQL settings ** //
...
define('DB_HOST', 'localhost'); // 99% chance you won't need to change this value
define('DB_CHARSET', 'utf8'); // <--WTF this should be utf-8
define('DB_COLLATE', '');
define('VHOST', 'VHOSTSETTING');
$base = '/';

The line
define('DB_CHARSET', 'utf8');
doesn’t fix the character encoding problem when updated; rather, you need to change it to
define('DB_CHARSET', 'utf-8');

I learned this all from the forum discussion here, and while I am excited it worked and I don’t have to search and replace thousands of bad symbols, I don’t know why such an important difference between the two versions wasn’t a bit more pronounced.


15 Responses to WPMu Text Encoding Hell

  1. PatrickGMJ says:

    I’ll second the comment in the code: WTF. That’s definitely not something that should have been missed on the developers’ parts.

    But I’m very glad that they have gone to serving up the UTF character set. That should be good news especially for the foreign languages dept. We’ll just have to watch that people have their browsers set to use the character set the page gives. (I.e., if anyone has their browser set to always expect Western (ISO 8859-1), they’ll get the same funny characters.) Going UTF-8 makes us Unicode-happier.

  2. Matt says:

    This is such a great web-design lesson, though — one absent dash can make all the difference!

    It’s never fun to go through an experience like that, but at least you’ve got a good story to tell . . . given the right audience.

  3. Reverend says:

    Patrick,

    When it comes to text-encoding, if you’re happy, I’m happy. Because I don’t know WTF I’m doing 😉

  4. Mario says:

    Jim;

    Thanks for sharing your experience with this upgrade. It will be very helpful for an upgrade that I need to do soon.

  5. Reverend says:

    @Matt — Isn’t it amazing that anyone would find this interesting? I just keep throwing this stuff out there in hopes fan boys & girls can benefit from my grave mistakes. Like, for example Mario (another commenter above) who has always been a mensch when it comes to sharing any and everything he learns about WPMu.

    @Mario: Always glad to help, and I would strongly encourage you take a look at this post as well, which is a more coherent and intelligent take on this database encoding concern when upgrading: http://alexking.org/blog/2008/03/06/mysql-latin1-utf8-conversion

  6. Andrea says:

    I dunno either. I’ve seen quite a few interesting typos and stray wp.com code get left in there. 🙂

  7. Thanks Reverend, I’ve just been going through the same thing, trying to sort this out after upgrading WPMU 1.2.5a to 1.3.3 yesterday.

    I’d even read the forum post and *thought* I’d tried the fix, which just shows how far down the slippery slope I am. Maybe I looked at a cached page? Today I’ve been restoring backups, checking encoding in the old database etc and was just at the point of backing the whole thing out and giving my head a rest.

    A final clutch at Google threw up your post, and sent me straight back again. Things now looking a lot happier.

    @Andrea: Thanks for the warning, to be honest I was reluctant to believe something so fundamental could be wrong. Now older and wiser.

  8. Pingback: Addendum to My WPMu Encoding Hell at bavatuesdays

  9. Pingback: » Addendum to My WPMu Encoding Hell WPMu Ed

  10. Pulsuz Sayt says:

    Hi,
    I also have a problem with collation or character set. I recently changed servers and everything is OK, but I can’t use Azerbaijani umlauts. My whole database is UTF8 and all new posts are saved correctly, but old posts aren’t. Is there any solution for this?

    Thanks.

  11. Reverend says:

    Pulsuz,

    This is a hack, but try going into the wp-config file and changing the character encoding from utf8 to utf-8. Let me know if that brings back the umlaut. It may not, but I’m curious to see.

    Best,
    Jim

  12. Pulsuz Sayt says:

    Thanks for the reply, but it did not work.

  13. Reverend says:

    Pulsuz,

    What version of WPMu are you using?

  14. Pulsuz Sayt says:

    That is WPMU 2.9.1 with multi-DB

  15. Pingback: Late to the Party: Migrating an outdated WPMu to WordPress Multisite | bavatuesdays
