[Lessons learned: never do a victory lap on your blog before the migration is in the bag.]
It’s been well over a month in the making, and in many ways it started my descent into the Mouth of Madness that I’m currently ascending from—so let’s revisit, shall we.
Reclaim Hosting is in migration mode: we’re working to migrate larger WordPress Multisite instances and cPanel servers to newer infrastructure as we stare down the barrel of the upcoming CentOS 7 end of life. I took the occasion of getting ReclaimPress up and running to take on one of our more complex WordPress Multisite instances, Macaulay’s venerable Eportfolios, and move it into a containerized WP instance.
What’s more, part of the complexity of this site was that the MySQL database was running in a separate Digital Ocean droplet given how resource intensive it was when Tim migrated it 6 or 7 years ago. So, this migration would be moving it off a cPanel server into its own container, and also consolidating the database into the same container. We needed to make sure there would be no performance issues, which luckily there were not, and also ensure that we gave ourselves enough time for the migration, given it took 17+ hours to dump the database on the outdated Ubuntu server, which made this migration particularly onerous.
The other piece we wanted to solve was getting the over 700 GB of media off into an S3 bucket to save space on the server. As we run more and more in the cloud, offloading sites with a large amount of media (200+ GB) becomes more and more important given media eats up space on our dedicated, bare metal cloud servers (the cloud is just another server, it turns out). So, Eportfolios was going to be our first experiment with doing this in a larger WordPress Multisite besides ds106, and also our first time running the media through AWS’s S3 and its content delivery network (CDN), Cloudfront, given we’ve only ever done this previously with Cloudflare.
So, it took me some time, but I did get the consolidation of the database and the migration from cPanel to ReclaimPress figured out. What’s more, once I did, the site was running lightning fast in dev, so that piece seemed all set. One major issue I ran into was getting a PHP 7.4 LLSMP* container running easily on ReclaimPress; the other was timing the MySQL dump from the stand-alone MySQL Droplet. But once they were under control, the site ran cleanly and was quite fast, even with the consolidated database.†
So once all was set and the switch to the new environment happened, the site was loading quickly; the only issue was that all the media was broken. WTF?! Oh, probably just the upload media path…no! Wait, it has to be permissions…no!! OK, then really it HAS to be a .htaccess problem…NO!!! You get the idea: for about 12 hours Macaulay was not resolving images and media files while we frantically dug to figure out what the hell was happening. Luckily, Chris came in for the assist and did some next-level forensic analysis on one of the images that was not loading, scanning for any differences between the original that did load and the migrated image. Turns out there was a single character added to the beginning of every media file’s hex code that was corrupting anything trying to load.
Here is the hex code of an image that was loading cleanly on the old server, and we confirmed it’s the same as the image on the new server as well, so no issue with the image itself:
But when the image loaded, the characters 0A were added, essentially corrupting this file, and every other one.
Major kudos to Chris Blankenship for figuring this out using an image hex decoder that ultimately allowed us to put in a ticket to Litespeed, given the corruption was happening at the level of the web server, not the image. Turns out the file wp-includes/ms-files.php was adding this extra character, and once we resolved that the entire site was working as expected. Damn, that was rough. I was so convinced I had nailed that migration. What did I say about premature victory laps?
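For anyone chasing a similar bug, here is a minimal Python sketch of that kind of check: compare the bytes of the file on disk with the bytes the web server actually returns, and report anything that got prepended. The function names and the sample bytes (a stand-in JPEG header, not the actual Macaulay image) are made up for illustration, and this is not the tool Chris used.

```python
from typing import Optional


def find_prepended_bytes(on_disk: bytes, as_served: bytes) -> Optional[bytes]:
    """Return the bytes added in front of the file by the server.

    Returns b"" when both copies are identical, and None when the served
    copy differs in some other way than a simple prefix.
    """
    if as_served == on_disk:
        return b""
    if as_served.endswith(on_disk):
        return as_served[: len(as_served) - len(on_disk)]
    return None


def hex_preview(data: bytes, count: int = 8) -> str:
    """Show the first `count` bytes the way a hex viewer would."""
    return data[:count].hex(" ").upper()


# Stand-in data (assumption): a JPEG-style header plus filler bytes.
file_on_disk = bytes.fromhex("FFD8FFE000104A464946") + b"...image data..."
# Simulate what we saw in the browser: a single 0A byte (a newline) up front.
file_as_served = b"\n" + file_on_disk

print("on disk:  ", hex_preview(file_on_disk))
print("as served:", hex_preview(file_as_served))
print("prepended:", find_prepended_bytes(file_on_disk, file_as_served))
```

In practice the served bytes would come from downloading the image URL, while the on-disk bytes would be read straight from the server's filesystem. A leading 0A is a newline, which is why a stray blank line emitted before the file contents is enough to break every image.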
With that resolved and the site loading as expected, other priorities quickly took over, but we had pushed 700+ GB of media to an S3 bucket on AWS. We were not loading media from the S3 bucket yet, but that was the plan to free up almost a terabyte of space on the server. Just this week we returned to this part of the migration in earnest, and this was in many ways a first for our team: experimenting with mapping a domain to AWS’s Cloudfront so that all media would run off a Macaulay branded subdomain, files.eportfolios.macaulay.cuny.edu. This required copying everything in the wp-content/blogs.dir directory to a bucket named files.eportfolios.macaulay.cuny.edu. After that, we needed to create a distribution network on Cloudfront that used files.eportfolios.macaulay.cuny.edu as an alternate domain. We also needed to get an SSL certificate through Cloudfront using a CNAME certificate, which was set to expire after 72 hours if not activated, so a bit tricky to coordinate.
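Before pushing that much media anywhere, it can help to do a dry run that lists what would be copied and how big it is. Here is a rough Python sketch of that idea, assuming each object key is simply the file's path relative to blogs.dir. That key layout, the function names, and the demo files are all assumptions for illustration rather than a record of how we actually ran the copy, and nothing here talks to AWS.

```python
import os
import tempfile
from typing import Iterator, Tuple


def plan_uploads(root: str) -> Iterator[Tuple[str, str]]:
    """Yield (local_path, object_key) pairs for every file under `root`.

    The key is the path relative to `root`, with forward slashes.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            local_path = os.path.join(dirpath, name)
            key = os.path.relpath(local_path, root).replace(os.sep, "/")
            yield local_path, key


def summarize(root: str) -> Tuple[int, int]:
    """Return (file_count, total_bytes) for everything under `root`."""
    count = 0
    total = 0
    for local_path, _key in plan_uploads(root):
        count += 1
        total += os.path.getsize(local_path)
    return count, total


# Demo against a throwaway directory shaped like a tiny blogs.dir
# (hypothetical site IDs and filenames).
with tempfile.TemporaryDirectory() as demo_root:
    samples = {
        os.path.join("5", "files", "2019", "01", "photo.jpg"): b"jpeg",
        os.path.join("7", "files", "notes.pdf"): b"pdf",
    }
    for rel_path, payload in samples.items():
        full_path = os.path.join(demo_root, rel_path)
        os.makedirs(os.path.dirname(full_path), exist_ok=True)
        with open(full_path, "wb") as handle:
            handle.write(payload)

    demo_keys = sorted(key for _path, key in plan_uploads(demo_root))
    demo_count, demo_bytes = summarize(demo_root)

print(demo_keys)
print(demo_count, "files,", demo_bytes, "bytes")
```

Pointing `summarize` at the real wp-content/blogs.dir would give a file count and byte total to compare against what ends up in the bucket after the copy.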
The other big piece here is using WP Offload Media to make this all work, and that has been clutch. I wrote about that plugin while using it to offload media for both this blog and ds106.us, but this is the first time we did it for this many files and using Cloudfront, so definitely a brave new world for us. There may be a bit of clean-up this morning, but as of now this is a huge win for us, and a bigger proof of concept for offloading media for much larger, media-intensive sites.
*Getting a PHP 7.4 environment to run on ReclaimPress is an unnecessarily roundabout process right now given you are forced to import from an older container manifest, namely https://raw.githubusercontent.com/jelastic-jps/wordpress/v2.2.0/manifest.yml
†Dumping the database in the new environment took only 15 minutes (as opposed to 17 hours), and that alone was a huge win.