Macaulay Migrations, Random Litespeed Issues, and S3 Media Offloads – Oh My!

[Lessons learned: never do a victory lap on your blog before the migration is in the bag.]

It’s been well over a month in the making, and in many ways it started my descent into the Mouth of Madness that I’m currently ascending from, so let’s revisit, shall we?

In the Mouth of Madness GIF

Reclaim Hosting is in migration mode: we’re working to migrate larger WordPress Multisite instances and cPanel servers to newer infrastructure as we stare down the barrel of the upcoming CentOS 7 end of life. I took the occasion of getting ReclaimPress up and running to take on one of our more complex WordPress Multisite instances, Macaulay’s venerable Eportfolios, and move it into a containerized WP instance.

What’s more, part of the complexity of this site was that the MySQL database was running in a separate Digital Ocean droplet, given how resource intensive it was when Tim migrated it 6 or 7 years ago. So this migration would move the site off a cPanel server into its own container and also consolidate the database into that same container. We needed to make sure there would be no performance issues, which luckily there were not, and also ensure that we gave ourselves enough time for the migration, given it took 17+ hours to dump the database on the outdated Ubuntu server, which made this migration particularly onerous.

The other piece we wanted to solve was getting the 700+ GB of media off into an S3 bucket to save space on the server. As we run more and more in the cloud, offloading sites with a large amount of media (200+ GB) becomes more and more important, given media eats up space on our dedicated, bare metal cloud servers (the cloud is just another server, it turns out). So Eportfolios was going to be our first experiment with doing this in a larger WordPress Multisite besides ds106, and also our first time running the media through AWS’s S3 and its content delivery network (CDN), Cloudfront, given we’ve only ever done this previously with Cloudflare.

So, it took me some time, but I did get the consolidation of the database and the migration from cPanel to ReclaimPress figured out. What’s more, once I did, the site was running lightning fast in dev, so that piece seemed all set. One major issue I ran into was getting a PHP 7.4 LLSMP* container running easily on ReclaimPress; the other was timing the MySQL dump from the stand-alone MySQL droplet. But once those were under control, the site ran cleanly and was quite fast, even with the consolidated database.†
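
A minimal Python sketch of how a dump like this could be timed, for anyone planning a similar cutover. It is not the command used in this migration: the host, user, and database names are placeholders, and it assumes mysqldump is installed and that credentials come from something like ~/.my.cnf rather than the command line.

```python
import shlex
import subprocess
import time


def build_dump_command(host, user, database):
    """Return a mysqldump argument list (all values passed in are placeholders)."""
    return [
        "mysqldump",
        f"--host={host}",
        f"--user={user}",
        "--single-transaction",  # consistent snapshot of InnoDB tables without locking them
        "--quick",               # stream rows instead of buffering whole tables in memory
        database,
    ]


def timed_dump(command, outfile):
    """Run the dump, write it to outfile, and return the elapsed seconds."""
    start = time.monotonic()
    with open(outfile, "wb") as fh:
        subprocess.run(command, stdout=fh, check=True)
    return time.monotonic() - start


if __name__ == "__main__":
    # Hypothetical connection details, for illustration only.
    cmd = build_dump_command("db.example.internal", "wp_user", "eportfolios")
    print("Would run:", shlex.join(cmd))
    # On a host with mysqldump and credentials configured:
    # elapsed = timed_dump(cmd, "eportfolios.sql")
    # print(f"Dump took {elapsed / 60:.1f} minutes")
```

Timing a trial dump ahead of the real switch is what tells you how big a maintenance window to ask for.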

So once all was set and the switch to the new environment happened, the site was loading quickly. The only issue was that all the media was broken. WTF?! Oh, probably just the upload media path…no! Wait, it has to be permissions…no!! OK, then really it HAS to be a .htaccess problem…NO!!! You get the idea: for about 12 hours Macaulay was not resolving images and media files while we frantically dug to figure out what the hell was happening. Luckily, Chris came in for the assist and did some next-level forensic analysis on one of the images that was not loading, scanning for any differences between the original that did load and the migrated image. Turns out there was a single extra byte added to the beginning of every media file as it was served, and that was corrupting anything trying to load.

Here is the hex code of an image that was loading cleanly on the old server. We confirmed the file on the new server is identical, so there was no issue with the image itself:

Hex code of the uncorrupted image file from the old server before migration

But when the image loaded through the web server, the byte 0A (a line feed) was added to the beginning, essentially corrupting this file and every other one.

Hex code showing the extra 0A byte added to the beginning of all media, which was corrupting files loading on Eportfolios
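
The comparison described above can be sketched in a few lines of Python. This is an illustration rather than the tool Chris used: the function names are made up for this example, and the bytes are a stand-in JPEG/JFIF header, not a real file from the site. In practice `original` would be read from disk and `served` would be the body of the HTTP response for the same file.

```python
def leading_bytes_hex(data, count=16):
    """Return the first `count` bytes as space-separated hex, like one row of a hex viewer."""
    return " ".join(f"{b:02X}" for b in data[:count])


def find_prepended_bytes(original, served):
    """If `served` is `original` with extra bytes in front, return those bytes.

    Returns b"" when the two are identical and None when they differ in some other way.
    """
    if served == original:
        return b""
    if served.endswith(original):
        return served[: len(served) - len(original)]
    return None


if __name__ == "__main__":
    # Stand-in data: the start of a JPEG/JFIF header, used here purely for illustration.
    original = bytes.fromhex("FFD8FFE000104A464946")
    served = b"\x0a" + original  # simulate the stray line feed in front of the file

    print("on disk:", leading_bytes_hex(original))
    print("served: ", leading_bytes_hex(served))
    print("extra:  ", find_prepended_bytes(original, served))
```

Image decoders expect a format signature at the very first byte, which is why one stray byte in front is enough to break every file.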

Major kudos to Chris Blankenship for figuring this out using an image hex decoder, which ultimately allowed us to put in a ticket to Litespeed, given the corruption was happening at the level of the web server, not the image. Turns out the file wp-includes/ms-files.php was adding this extra character, and once we resolved that the entire site was working as expected. Damn, that was rough. I was so convinced I had nailed that migration. What did I say about premature victory laps?

With that resolved and the site loading as expected, other priorities quickly took over, but we had pushed 700+ GB of media to an S3 bucket on AWS. We were not loading media from the S3 bucket yet, but that was the plan to free up almost a terabyte of space on the server. Just this week we returned to this part of the migration in earnest, and this was in many ways a first for our team: mapping a domain to AWS’s Cloudfront so that all media would run off a Macaulay branded subdomain, files.eportfolios.macaulay.cuny.edu. This required copying everything in the wp-content/blogs.dir directory to a bucket named files.eportfolios.macaulay.cuny.edu. After that, we needed to create a distribution on Cloudfront that used files.eportfolios.macaulay.cuny.edu as an alternate domain. We also needed to get an SSL certificate through Cloudfront, validated by a CNAME record, which was set to expire after 72 hours if not activated, so a bit tricky to coordinate.
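
As a rough sketch of the copy step, here is how local files under wp-content/blogs.dir could be mapped to S3 object keys in Python. Treat it as an assumption-laden illustration, not what we actually ran: the helper names are hypothetical, the key prefix depends on how WP Offload Media is configured, and the upload function relies on the third-party boto3 library with AWS credentials already set up. For a transfer this size, a bulk tool such as `aws s3 sync` may well be the more practical choice.

```python
import os


def s3_key_for(local_path, local_root, key_prefix=""):
    """Map a file under local_root to an S3 key, optionally under key_prefix."""
    relative = os.path.relpath(local_path, local_root)
    # S3 keys always use forward slashes, whatever separator the local OS uses.
    parts = [key_prefix.strip("/"), *relative.split(os.sep)]
    return "/".join(part for part in parts if part)


def iter_upload_plan(local_root, key_prefix=""):
    """Yield (local_path, s3_key) pairs for every file under local_root."""
    for dirpath, _dirnames, filenames in os.walk(local_root):
        for name in sorted(filenames):
            local_path = os.path.join(dirpath, name)
            yield local_path, s3_key_for(local_path, local_root, key_prefix)


def upload_all(local_root, bucket, key_prefix=""):
    """Upload every planned file with boto3 (third-party; needs AWS credentials)."""
    import boto3  # imported here so the planning step works without boto3 installed

    s3 = boto3.client("s3")
    for local_path, key in iter_upload_plan(local_root, key_prefix):
        s3.upload_file(local_path, bucket, key)


if __name__ == "__main__":
    # Dry run: print the plan without touching AWS. The prefix is an assumption.
    for local_path, key in iter_upload_plan("wp-content/blogs.dir", "wp-content/blogs.dir"):
        print(local_path, "->", key)
```

Printing the plan first is a cheap way to confirm the keys line up with the URLs the plugin will rewrite to before moving hundreds of gigabytes.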

WP Offload Media interface, with media being delivered through Cloudfront

The other big piece here is using WP Offload Media to make this all work, and that has been clutch. I wrote about that plugin while using it to offload media for both this blog and ds106.us, but this is the first time we did it for this many files and using Cloudfront, so definitely a brave new world for us. There may be a bit of clean-up this morning, but as of now this is a huge win for us, and a bigger proof of concept for offloading media for much larger, media-intensive sites.
_________________________________________

*Getting a PHP 7.4 environment to run on ReclaimPress is an unnecessarily roundabout process right now, given you are forced to import from an older container manifest, namely https://raw.githubusercontent.com/jelastic-jps/wordpress/v2.2.0/manifest.yml

†Dumping the database in the new environment took only 15 minutes (as opposed to 17 hours), and that alone was a huge win.


One Response to Macaulay Migrations, Random Litespeed Issues, and S3 Media Offloads – Oh My!

  1. Eric Likness says:

    So this got me curious about ms-files.php. Did a lazy search on DuckDuckGo and found this from 13 years ago: https://core.trac.wordpress.org/ticket/19235. Looks like ms-files.php is something people were questioning the need for all the way back then for Multisite 3.0, or at least suggesting it be disabled by default for new installs. But I’m no WP expert, and would never have guessed it was able to re-write data and not simply URLs. So cray-cray-zay
