I was doing a major migration of various sites for Gary Stanton, a Historic Preservation professor I worked with on and off for a decade at the University of Mary Washington. About the same time I was leaving he was retiring, and we had worked together on a ton of WordPress sites. He is a folklorist by training, and he has an unbelievably eclectic set of interests across all sorts of awesome vernacular American culture. When he creates class sites, they usually weigh in by the gigabyte given how many audio files, images, and documents he shares with his students. He’s been building sites like that for years, and when he asked if he could move his stuff to Reclaim Hosting after retirement I jumped at the chance. He has so much cool stuff to share; he’s one of those folks who make the web a better place by populating it with his closet of curiosities. And to think Reclaim can help make sure it’s online and stays around for the long haul is an honor and a privilege.
That said, I also knew there would be some snafus because I helped him architect it. In fact, I put off the migration for the last few weeks while traveling, but today was the day to sort it out. He had sites on UMW Domains, UMW Blogs, and a few other places. What’s more, he was changing URLs for all of the sites. I’m a fairly old hand at WordPress, so most of that worked out cleanly. I did find, however, that a bunch of media files are missing for one site, so we’ll have to dig deeper there, but I’m confident it’s around—and if not, knowing Gary, he probably has local copies.
The one set of files that had me stumped was the massive HTML resources he has hand-coded for decades now. Gary has been doing some really impressive work cataloguing old Fredericksburg newspapers, wills, insurance records, land records, etc. It’s an impressive legacy, and it needs to be cared for. It’s one of those niche archives that don’t necessarily have widespread interest, but tell a particular story of a place through artifacts and data. Anyway, many of the URLs in these HTML files were hardcoded, which could mean a ton of manual labor updating thousands of files. So I asked the oracle Tim Owens if there were any find and replace features on Linux. He responded, “anything is possible in linux.” And that is why we are partners, that is the answer you get from real genius 🙂 He pointed me to the following post “Replace a String in Multiple Files in Linux Using Grep and Sed.” And, as it turns out, you can indeed find and replace a string of characters like umwhisp.org and swap them out with stanton1946.com across thousands of files in a directory within seconds. So amazing.
After reading the post, I used the following command to replace umwhisp.org with stanton1946.com across multiple files:
grep -rl umwhisp.org ./resources | xargs sed -i 's/umwhisp.org/stanton1946.com/g'
And with that line of code tons of historical data about Fredericksburg was preserved for the web. In my mind this is exactly what Reclaim Hosting is about, keeping the vernacular culture of the web up and running one account at a time!
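For anyone borrowing that trick for their own migration, a slightly more cautious version of the same command is worth sketching out (the ./resources path and the domains are from my case, so swap in your own): do a dry run with grep first to see which files would change, escape the dots so they only match literal dots, and give sed a backup extension so every touched file gets a safety copy.

```shell
# Dry run first: just list which files contain the old domain
grep -rl 'umwhisp\.org' ./resources

# Then do the replacement, keeping a .bak copy of each changed file;
# escaping the dots keeps sed from treating them as "match any character"
grep -rl 'umwhisp\.org' ./resources | xargs sed -i.bak 's/umwhisp\.org/stanton1946.com/g'
```

Once you’ve eyeballed a few files and everything looks right, a quick `find ./resources -name '*.bak' -delete` cleans up the backups.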
Command line commando!
FWIW this is one of the reasons I still use the first text editor I started with in 1993, BBEdit: you can do grep search and replace across multiple files and directories (more or less what you did there). I’ve done it to update hundreds of files at once. It has saved my butt on many occasions.
Also useful is find and replace in MySQL databases http://www.electrictoolbox.com/mysql-find-replace-text/
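A minimal sketch of that database-side idea, using sqlite3 here just so it’s easy to try locally—MySQL’s REPLACE() function behaves the same way, and the table and column names are made up for the demo. (One caveat for WordPress databases specifically: REPLACE() on serialized option data can corrupt it when the string lengths differ, which is why tools like wp-cli’s search-replace exist.)

```shell
# Same find-and-replace idea inside a database; sketched with sqlite3 so it
# runs anywhere, but MySQL's REPLACE() works identically. Table and column
# names here are placeholders.
sqlite3 demo.db <<'SQL'
CREATE TABLE posts (content TEXT);
INSERT INTO posts VALUES ('records at umwhisp.org');
UPDATE posts SET content = REPLACE(content, 'umwhisp.org', 'stanton1946.com');
SELECT content FROM posts;
SQL
```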
xargs for the win,
That thing does some HEAVY lifting when you know 1.) what it is and 2.) how it works.
Eric,
Exactly, and thanks to Tim I learned both right quick 🙂