Important Update: In the following discussion I talk about how changing the UTF8 setting in the wp-config.php file for WPMu to UTF-8 fixed my encoding problem, and this is true. That said, this change has led to a far bigger problem: it has prevented me from creating any additional sites on my WPMu account. It returns the error “The page isn’t redirecting properly…” And don’t ask me why, but I am now certain that this has everything to do with the dash I added between the UTF and the 8 to fix the encoding. I have blogged about this ordeal, but I just wanted to clarify this here before someone says I am an idiot, which wouldn’t be totally untrue 🙂
I recently re-visited the ELS Blogs installation that started the whole WordPress Multi-User craze at UMW. I figured it was time to upgrade this installation, and perhaps use it as a way of thinking about how we might archive this stuff, or even just let the students who have blogs on the system know they are still there and waiting to be re-claimed, exported, etc.
To my great chagrin, after upgrading this installation from WPMu 1.2.5a to WPMu 1.3 the text encoding went batty. A number of blogs were littered randomly with this unattractive character, Â, and foreign accents and apostrophes were delivering all sorts of bizarre symbol combinations. So, for about three hours today, I downloaded the database of ELS Blogs and tried to figure out if I could do a search and replace for the various symbols. I tried at length, but the database was too big to be manageable in TextMate (weighing in at 60 MB). I then cut it up into pieces and ran search-and-replace passes for all sorts of things, only to realize I was messing up the core SQL code because I am a hack.
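For anyone wondering where a character like Â comes from, here is a minimal sketch of one common cause: UTF-8 bytes being read as if they were Latin-1. I am assuming that is roughly what happened to this database, which I can't prove. The helper names `misread_as_latin1` and `undo_misread` are made up for illustration and are not part of WPMu. It needs PHP 7 or later with the mbstring extension.

```php
<?php
// Illustration only: how UTF-8 text turns into "Â"-style junk when its
// bytes are read as Latin-1 (ISO-8859-1) and then re-encoded as UTF-8.

// Hypothetical helper (not part of WPMu): treat every byte of a UTF-8
// string as one Latin-1 character and re-encode the result as UTF-8.
function misread_as_latin1(string $utf8): string
{
    return mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
}

// Hypothetical helper: undo exactly one round of that misreading.
function undo_misread(string $garbled): string
{
    return mb_convert_encoding($garbled, 'ISO-8859-1', 'UTF-8');
}

$original = "caf\u{00E9}";               // "café"; é is bytes C3 A9 in UTF-8
$garbled  = misread_as_latin1($original);

echo $garbled, PHP_EOL;                  // cafÃ©
echo undo_misread($garbled), PHP_EOL;    // café

// A non-breaking space (bytes C2 A0) is misread as "Â" followed by a
// non-breaking space, which is why a stray Â shows up in ordinary text.
$nbsp = misread_as_latin1("\u{00A0}");
echo bin2hex($nbsp), PHP_EOL;            // c382c2a0
```

Running a conversion like this over a whole SQL dump is a different matter, since only some of the text may be garbled, so treat it as an explanation of the symptom rather than a repair script.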
So, finally, after pretending I know something about databases, I decided to turn to the forums (why didn’t I do this in the first place again?) to find a simple fix (at least for me) to a very annoying issue particular to the upgrade from WPMu 1.2.x to WPMu 1.3.x. In short, the database character set went from latin to UTF-8 between the versions, with very little said about it in the upgrade readme. In fact, the fix is in the wp-config.php file (one I often don’t overwrite when upgrading because I want to keep the db connection information intact). It seems that if you are going from 1.2.x to 1.3.x you have to use the upgraded wp-config that has the relevant information about the UTF-8 encoding.
But my annoyance is not simply because I didn’t overwrite the wp-config file, for that is my fault and I can live with that. What gets me is that the way the encoding is defined in the wp-config file seems to be wrong, for example:
// ** MySQL settings ** //
define('DB_HOST', 'localhost'); // 99% chance you won't need to change this value
define('DB_CHARSET', 'utf8'); // <--WTF this should be utf-8
$base = '/';
This doesn’t fix the character encoding problem after the upgrade; rather, you need to change it to:
define('DB_CHARSET', 'utf-8');
I learned all this from the forum discussion here, and while I am excited it worked and I don’t have to search and replace thousands of bad symbols, I don’t know why such an important difference between the two versions wasn’t made a bit more prominent.