First there was WPMu, then WPMS, and now WPMR!
WordPress Multi-Region is more of a hosting feature than an application-specific one, and to be clear, this functionality is possible for applications beyond WordPress. But Jelastic, our cloud provider for Reclaim Cloud, has created a one-click application for installing a multi-region WordPress cluster that can replicate across various data centers in near real time. There are a few elements of this that are exciting to us as a hosting provider:
- With a one-click installer it’s easy to spin up a complex WordPress infrastructure across numerous regions
- It can route traffic so folks get lower latency by reaching the instance closest to them
- It bakes in failover, so that if a server in one region goes down the traffic is immediately redirected to another available data center to avoid downtime
These are all good reasons, but the last may be the most exciting because sites go down. Data centers catch fire, DDoS attacks happen, and servers will crash; it’s not a matter of if, only when. So, as more and more edtech infrastructure becomes mission critical, there need to be options to route around that painful reality, and failover is just that: it replicates a single-server setup across data centers in various regions (US-West, Canada, UK, etc.) to ensure there is no single point of failure for an enterprise-level service. That’s pretty exciting given this is something we’ve been dreaming about at Reclaim Hosting for a while, and given we manage quite a few large WordPress instances, this could be an immediate option for folks that want to ensure uptime.
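To make the routing-plus-failover idea concrete, here is a toy sketch of the decision being made for each visitor: send them to the closest region that is still up. This is only an illustration of the concept, not how Jelastic implements it; the region names and latency numbers are made up.

```python
from typing import Optional


def pick_region(latency_ms: dict[str, float], healthy: set[str]) -> Optional[str]:
    """Return the lowest-latency region that is currently healthy, or None."""
    candidates = {region: ms for region, ms in latency_ms.items() if region in healthy}
    if not candidates:
        return None  # every region is down: nothing left to fail over to
    return min(candidates, key=candidates.get)


# Hypothetical latencies (in milliseconds) as seen by one visitor.
latency = {"us-east": 20.0, "us-west": 80.0, "uk": 95.0}

# All regions up: the visitor lands on the closest one.
print(pick_region(latency, {"us-east", "us-west", "uk"}))

# The closest region is down: traffic moves to the next-closest healthy region.
print(pick_region(latency, {"us-west", "uk"}))
```

The point is simply that latency routing and failover are the same choice made over a shrinking list of healthy regions.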
So, that’s the logic behind WordPress Multi-Region clusters, and while in Nashville for the Reclaim Hosting team retreat Tim started playing with this setup to test failover. It worked in theory while we set it up, and then again in practice last week when our UK Cloud server had issues in the early morning. That reminded me that I was planning to play around with a WPMR setup for this modest standalone WP bava blog, ’cause the bava should never, ever go down … ever. After that, I’ll see if I can make ds106 a multi-region setup over the winter break to get a sense of how it works with a fairly intense WPMS instance. So everything hereafter is me jotting down my progress over the last two days.
I started by spinning up a multi-region cluster to host bavatuesdays. It was a 3-region cluster (US-East, US-West, and UK), and after figuring out permissions to rsync files across environments in Reclaim Cloud (it was harder than it should’ve been, thanks for the assist Chris Blankenship!) the migration was fairly straightforward. The Multi-Region setup across 3 regions has one primary cluster and two secondary clusters, and you rsync the files to the primary application environment as well as import the database to that environment. Soon after that it syncs with the secondary environments, and like magic the replica clusters have all the files and database settings, posts, comments, etc., imported to the primary cluster. The replication happens in less than 60 seconds, so while it may technically be asynchronous, it’s all but immediate for my purposes.
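For anyone wanting to script the file-copy half of that migration, here is a minimal sketch that wraps rsync over SSH. The source directory, SSH target, and destination path are hypothetical placeholders, not the actual Reclaim Cloud values, and it assumes SSH access to the primary environment is already sorted out (the permissions part that tripped me up). The database import is a separate step and is not covered here.

```python
import shlex
import subprocess


def build_rsync_command(src_dir: str, ssh_target: str, dest_dir: str) -> list[str]:
    """Build an rsync invocation that copies a WordPress tree over SSH."""
    return [
        "rsync",
        "-avz",                     # archive mode, verbose, compress in transit
        "-e", "ssh",                # use SSH as the transport
        src_dir.rstrip("/") + "/",  # trailing slash: copy the directory's contents
        f"{ssh_target}:{dest_dir}",
    ]


def sync_to_primary(src_dir: str, ssh_target: str, dest_dir: str,
                    execute: bool = False) -> list[str]:
    """Print the rsync command; only run it when execute=True."""
    cmd = build_rsync_command(src_dir, ssh_target, dest_dir)
    print(shlex.join(cmd))
    if execute:
        subprocess.run(cmd, check=True)  # raises CalledProcessError if rsync fails
    return cmd


# Placeholder values: swap in the real source path and primary environment.
sync_to_primary("/tmp/bava-export", "user@primary.example", "/path/to/webroot")
```

Only the primary environment is targeted, since the cluster handles pushing files out to the secondaries on its own.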
I did get bavatuesdays.com running in a WPMR setup for several hours yesterday while experimenting, but had to revert to the standalone instance given I ran into an issue creating new posts that I’m still investigating. But as you can see above the blog is running on the domain bavafail-1.us.reclaim.cloud, and there was another instance at bavafail-2.wc.reclaim.cloud, and a third at bavafail-3.uk.reclaim.cloud. You can see from the URLs they are in different regions: US (East Coast), WC (US West Coast), and the UK. These all worked perfectly, and the way to have bavatuesdays.com point to all of them was to add the public IP of each regional cluster’s load balancer as an A record in the DNS zone editor.
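A quick way to sanity-check that DNS step is to compare the A records a resolver returns against the load balancer IPs you expect. The sketch below does that with Python’s standard library. The IP addresses are placeholders from the reserved documentation ranges, not the real load balancer addresses, and a resolver may only hand back a subset of records at any given moment, so treat a "missing" result as a prompt to look closer rather than proof of a misconfiguration.

```python
import socket


def resolve_a_records(hostname: str) -> set[str]:
    """Return the IPv4 addresses the local resolver currently reports for hostname."""
    infos = socket.getaddrinfo(
        hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return {info[4][0] for info in infos}


def missing_a_records(expected: set[str], resolved: set[str]) -> set[str]:
    """Return the expected load balancer IPs that did not show up in DNS."""
    return expected - resolved


# Placeholder IPs, one per regional load balancer (documentation ranges).
EXPECTED_IPS = {"192.0.2.10", "198.51.100.20", "203.0.113.30"}

# Example with a hand-written resolver result, so nothing touches the network:
print(missing_a_records(EXPECTED_IPS, {"192.0.2.10", "203.0.113.30"}))
```

Against a live domain you would call `missing_a_records(EXPECTED_IPS, resolve_a_records("bavatuesdays.com"))` with the real IPs swapped in.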
Reclaim Cloud provisions the SSL certificates, and after clearing the cluster’s cache the 3 sites were loading as one, with failover and regional traffic routing working well. It was pretty awesome, but there was one small issue: I could not create new posts, which is kind of a deal breaker for a blog. So I had to revert to the old server environment until I figured that issue out.* I was using the failover and routing baked into Jelastic’s setup seamlessly and wanted to test out Cloudflare’s load balancing as well, but I’ll save those DNS explorations for another post. That said, Jelastic lays out the possibilities in their post on DNS load balancing for WordPress clusters quite well.
After setting up the A records and issuing SSL certs the bava was beaming across 3 regions. And when I turned one of the three regional clusters off, the site stayed online, so failover was working! The one issue, which Tim also ran into when testing in Nashville, has to do with writes: when the Primary cluster goes down, the secondary clusters are supposed to let you write to them. In other words, the WP authoring features accessed at /wp-admin should only work on the Primary cluster by default, but if it were to go down one of the other two secondary clusters should allow you to write. This would not only keep the site online, but also allow posting to continue without issue, all of which should then be synced seamlessly back to the primary cluster once it comes back online. I was not able to get this functionality to work. After stopping the primary cluster, the secondary clusters would throw 500 internal server errors when trying to access /wp-admin, so that is another issue to figure out.
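The failover test described above is easy to repeat by hand, but a small probe makes the before/after comparison less error-prone. This is a minimal sketch using only the standard library: it requests the front page and /wp-admin/ on each regional hostname and labels the result. The hostnames are the test environments named earlier in this post (since spun down), so substitute your own; the function and label names are just illustrative.

```python
import urllib.error
import urllib.request
from typing import Optional


def probe(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the final HTTP status code for url, or None if the request failed outright."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses arrive as exceptions
    except (urllib.error.URLError, OSError):
        return None  # DNS failure, refused connection, timeout, etc.


def classify(status: Optional[int]) -> str:
    """Turn a status code into a short human-readable label."""
    if status is None:
        return "unreachable"
    if status >= 500:
        return "server error"
    if status >= 400:
        return "client error"
    return "ok"


def check_regions(hosts: list[str]) -> dict[str, dict[str, str]]:
    """Probe the front page and /wp-admin/ on each host and label the results."""
    report = {}
    for host in hosts:
        report[host] = {
            "front": classify(probe(f"https://{host}/")),
            "admin": classify(probe(f"https://{host}/wp-admin/")),
        }
    return report


# Hostnames from the test setup in this post; replace with live environments.
REGION_HOSTS = [
    "bavafail-1.us.reclaim.cloud",
    "bavafail-2.wc.reclaim.cloud",
    "bavafail-3.uk.reclaim.cloud",
]
```

Running `check_regions(REGION_HOSTS)` before and after stopping the primary should show the symptom described here: front pages stay "ok" on the secondaries while /wp-admin/ comes back as "server error".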
I have since spun down the bavafail 3-region test instance after hosing the application servers trying to downgrade PHP from 8.0.10 to 7.4.25 to test out a bad theory, so the first attempt of operation bavafailover with WPMR is dead on the operating room table. Although hope springs eternal at the bava, so I have plans to resuscitate that WPMR setup given I believe it’s a permissions issue—which means I’ll be bothering Chris again.
In the interim, however, I’ve spun up a two-region WPMR setup using the domain bava.rocks as a way to ensure adding new posts works on a clean instance (it does), and also to see if you can access the secondary clusters to write to the database when the primary is down (you can’t). So there is definitely still more work to do on this, but it is really exciting that we are just a couple of issues away from offering enterprise-level traffic routing and failover for folks that need it. Reclaim Cloud is the platform that just keeps on giving in terms of next-level hosting options, and I love it.
*I was running into the same critical error that folks mention in this forum post, but after downgrading PHP from 8.0.10 to 7.4.25 on the WPMR cluster everything broke. I then tested PHP 8.0.10 on my LEMP environment for bavatuesdays (not a WPMR setup) and that worked fine. So I’m not sure if the issue is specific to the WPMR setup in Jelastic, which uses LiteSpeed whereas my current blog uses Nginx, but this is something I am going to have to revisit shortly.