On Wednesday I was able to tune into the Decentralized Web Summit for a couple of hours thanks to the live stream. The event was hosted in the web’s chapel, the Internet Archive. The entire stream has been archived, and you can access the proceedings from both days on YouTube; here’s a link to Day 1. I was really taken with Vint Cerf’s talk “A Web that Archives Itself.” It starts about 23 minutes in, and it’s worth the half hour if you are interested in archiving the web.
This is the first time I have heard Vint Cerf speak, and I was really impressed. He did a brilliant job explaining the principles of the internet and the web, and was equal parts specific, general, and clear. It was apparent he was intentionally avoiding overly technical language while being careful not to oversimplify. It was a masterclass in making a complex process like archiving the web comprehensible. I have to go back and watch more of his talks, because his style has much to teach anyone trying to make this stuff accessible.
There were a few points in his talk that I found really compelling for the work I’m doing right now, and I wanted to get them down here before Dr. Oblivion takes over.
- The Domain Name System (DNS), as it is currently set up, is broken, an idea Tim Berners-Lee reiterated in his talk directly following Cerf’s. A lease-driven system that folks pay for is responsible for much of the link rot and the ephemeral nature of the web.* This is something I want to dig into further, because I know Dan Gillmor has discussed the deep dysfunction of ICANN on a few occasions.
- The idea that everyone should have a domain for life. This is similar to Jon Udell’s seminal (at least for me) 2007 talk on “The Disruptive Nature of Technology.” Udell wasn’t necessarily thinking in terms of a URL there, but rather of a hyper-secure repository in which we control the bits of our digital lives and which serves as a hub for sharing access, etc.: a more “integrated domain” for one’s digital identity. That said, the domain URL would be an important piece of this, and the idea that everyone would get one and keep it “forever” is very compelling from an archiving perspective. You could still have vanity domains, but they would just be temporary aliases, never understood as the canonical address (similar to the Digital Object Identifier system for published works). In short, we would get a DOI-like identifier for our work that is also a URL, one we can point various domain names at, but which always depends on a more permanent identifier.
- The other idea I was taken by was Cerf’s description of our current approach to web archiving as akin to creating a digital diorama: taking a two-dimensional snapshot, often by scraping sites. This is exactly what the Internet Archive has been doing for two decades, and more recently the Berkman Center’s tool/plugin Amber does this for individual WordPress and Drupal sites. These digital dioramas capture a moment in time on the web, often devoid of deeper context given how we have imagined domain registration, URLs, etc.
I understand there is no easy solution to these issues, and that might be why I love thinking about them. I’ve been approached a few times recently with questions about how someone could keep a site up on the web indefinitely after they leave this earthly realm. I have no good answer. Putting a copy on the Internet Archive would be a good first step, but guaranteeing any longevity beyond 5 or 10 years as a hosting company would be disingenuous, not to mention impossible. I know this can quickly become a strangely morbid topic, but what happens to my digital domain actually matters as much to me as what happens to my Smurf collection, comic books, laser discs, Twilight Zone dolls, etc.
*To be fair, there are some who see the temporality of content on the web as a feature, not a bug, a view that can coincide with concerns about privacy, surveillance, the right to be forgotten, etc.