I had followed with great interest the discussion on the Reclaim Hosting Community site about archiving a dynamic, database-driven site as static HTML files. I share Alan Levine’s passion for trying to archive as much of the work I’ve done online as possible; I’m just not nearly as good at it. That said, today I had an occasion to use SiteSucker, the Mac tool Tim Owens has been raving about for a while. The app costs $4.99, takes any URL, and packages up the entire site (including images and media) into local static HTML files.
I finally decided to try it when I was migrating a website built with another host’s custom webpage builder. There was no export tool (why you gotta be like that?), and I was not going to copy and paste scores of HTML pages. I was prepared to tell the Reclaimer the migration was a no-go, but then I remembered SiteSucker. Given this was a custom web tool and they’re planning on building a new site after the move, why not simply package it up with SiteSucker, which would provide them an interim home as well as an archive?
So, I did. And it was as awesome as Tim promised. I just added the URL as illustrated above, and three minutes later the entire site was downloaded as static HTML pages. I uploaded the entire archive to their Reclaim Hosting account, pointed the domain at our nameservers, and that was that. It’s crazy how simple it was; it makes me want to start working my way through a bunch of old WordPress sites I have and retiring them to HTML.
I don’t pay for that many applications, but this one was very much worth the $5 for me. I can see more than a few uses for my own sites, not to mention the many others I help support. And to reinforce that point, right after I finished sucking this site, a faculty member submitted a support ticket asking the best way to archive a specific moment of a site so that they could compare it with future iterations. One option is cloning the site in Installatron on Reclaim Hosting, but that requires a dynamic database for what is effectively a static copy, so why not just suck that site? And while cloning a site using Installatron is cheaper and easier given it’s built into Reclaim’s offerings, it’s not all that sustainable for us or for them. All those database-driven sites need to be updated, maintained, and protected from hackers and spam. For helping folks archive their work so that it remains accessible for the long term, something like SiteSucker makes a lot more sense than cloning a site, and building that feature into Reclaim Hosting’s services would be pretty cool.
Suck That Site! I like that slogan 🙂 I’ve been using this for a conference I help run, http://www.vsteconference.org/. We used to have a WP Multisite instance at http://www.vsteconference.org/ with previous years archived to subfolder sites, but then I realized that was lame since you had to manage plugin updates and things breaking. I started archiving the sites to HTML and putting the folders up on the server for each year, so http://www.vsteconference.org/2015, http://www.vsteconference.org/2014, etc. are all static sites that look practically identical to the dynamic versions and cause no maintenance overhead.
What about HTTrack?
Good call. That looks like a nice Linux/Windows tool that does basically the same thing; here is the link for anyone interested:
https://www.httrack.com/ Thanks for the pointer!
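For anyone who wants to try HTTrack from the command line, a minimal invocation looks something like the sketch below. This is just an illustration; the function name, URL, and output directory are placeholders, and it assumes `httrack` is installed.

```shell
# Sketch of mirroring a site with HTTrack (assumes httrack is installed).
# Wrapped in a function so nothing runs until you actually call it.
mirror_with_httrack() {
  local url="$1" outdir="$2"
  # --mirror crawls the site recursively; -O sets the local output directory.
  httrack "$url" -O "$outdir" --mirror
}

# Example call (placeholder URL and path):
# mirror_with_httrack "https://example.org/" ./site-archive
```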
I love SiteSucker, too. We had all these old MediaWiki sites that students had made long ago (maybe you even worked on some of them?). They were sitting out there accumulating spam and even some vandalism, so a couple of years ago I decided to convert them all to static HTML. SiteSucker, of all the tools I tried, was by far the best. Sucked them down and spat them out again perfectly.
There were a few broken links, but most of those seem to have come from students hotlinking images anyway. These were old MediaWiki and old WordPress installs from the days when each class had its own install (before the days of WPMU, and even longer before multisite).
It’s a great tool.
I love the idea of using SiteSucker on some old MediaWiki sites; I have a bunch of those, and it would solve a lot of legacy issues. Thanks for the tip, Joe!
Can you do it on the server using wget?
Absolutely, and if we can integrate this into Reclaim’s offerings, that is the way to go. I love the idea of allowing folks to archive an old site from cPanel. Though Tim, Alan, and many others already went through this in the forum, I am just still excited by it.
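For anyone curious what the server-side wget version of this looks like, the sketch below shows a common set of flags for mirroring a site to static HTML. It is not part of any Reclaim tooling; the function name, URL, and directory are placeholders.

```shell
# Sketch: mirror a live site to static HTML with wget.
# Wrapped in a function so nothing runs until you call it.
mirror_with_wget() {
  local url="$1" outdir="$2"
  # --mirror: recursive download with timestamping
  # --convert-links: rewrite links so the copy browses locally
  # --adjust-extension: save pages with .html extensions
  # --page-requisites: also fetch the images, CSS, and JS the pages need
  # --no-parent: stay within the starting path
  wget --mirror --convert-links --adjust-extension \
       --page-requisites --no-parent \
       --directory-prefix="$outdir" "$url"
}

# Example call (placeholder URL and path):
# mirror_with_wget "https://example.org/" ./site-archive
```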
Did you try using wget? Was it able to do the same job effectively?
I can only add: download the ImageOptim app on your Mac and just drag and drop the downloaded folder into it to further compress all the images.
Pingback: A Web Diet: Converting WordPress Sites Over to Static Sites – Adam Croom