Why Nothing On The Internet Lasts Forever (Despite What You Think)

Date: 2023-11-01 14:07

I'm sure you've been reminded, constantly, that anything you put on the Internet stays there forever. I've heard it myself, time and time again. "The Internet never forgets." "Be careful what you post." And so on, and so forth. However, I feel that a bit of debunking is in order. This notion isn't true; far from it. In fact, it can be disproved by looking at a simple concept that goes by many names, the most common being link rot.

What Is Link Rot?

Link rot is the process by which hyperlinks to web pages become invalid over time. It is a very common phenomenon, and it has a lot of causes:

This isn't an exhaustive list, and the causes vary widely.

How Common Is It?

Researchers have looked into link rot multiple times in the past. In 2003, a study [1] estimated that one in 200 links breaks every week. This gives a half-life of about 138 weeks, or a little over two and a half years. (This means that half of the studied links would be dead after 138 weeks, in case you didn't get that one.) A 2016-17 study of the Yahoo! Web Directory (which stopped being updated in 2014) corroborated this, finding a similar rate [2]. The point is, at that rate, more than 99% of links are dead after just 20 years. That is some terrible preservation right there.
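The arithmetic behind a half-life figure like this is easy to check. Here is a minimal sketch in Python, assuming the loss rate stays constant at one link in 200 per week (a simplification; real link rot is unlikely to be that uniform). The function names are illustrative only.

```python
import math


def half_life_weeks(weekly_loss: float) -> float:
    """Weeks until half the links are dead, assuming a constant weekly loss rate."""
    return math.log(2) / -math.log(1 - weekly_loss)


def fraction_dead(weekly_loss: float, weeks: float) -> float:
    """Share of links that are dead after the given number of weeks."""
    return 1 - (1 - weekly_loss) ** weeks


WEEKLY_LOSS = 1 / 200  # one in 200 links breaks every week

print(f"Half-life: {half_life_weeks(WEEKLY_LOSS):.0f} weeks")
for years in (5, 10, 20):
    dead = fraction_dead(WEEKLY_LOSS, years * 52)
    print(f"After {years} years: {dead:.1%} of links dead")
```

Changing WEEKLY_LOSS shows how sensitive the long-term figure is to the weekly rate.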

The 'Social Media' Argument

Some people argue that link rot does not matter once you factor in social media content. "Anything you put on social media stays forever, even if you delete it". While some social media services [3] do keep your content even after you delete it, the argument fails to take into account the sheer number of social networking sites that have failed so far, some of them run by the largest companies in the world:

This list isn't exhaustive, and only shows some notable examples, but it does make my point.

A More Concrete Example

Recently, I played a game called Uplink. It's brilliant, and you should play it too. The game has a popular fan site called Modlink which is over 17 years old at this point. It has a "Links" section, which collates a lot of useful links to other places in the Uplink community. The page has 15 links and only 4 of them still work; most are expired domains and one of them is just a blank website. In less than 20 years, 73% of the links on the website died, and a lot of these are lost to time.

Million Dollar Homepage

A legendary tale from the early Internet serves as another really good example of link rot.

The Million Dollar Homepage was an initiative set up by college student Alex Tew in 2005 to pay for his college fees. The website let you purchase part of a 1000 by 1000 pixel grid for $1 per pixel. The site eventually sold all one million pixels, and Tew made a million dollars.

I decided I wanted to collect some info about the site, and took a look at the source code. This is what each entry looks like:

<area onmouseover="d(this)" onmouseout="e(this)" shape="rect" coords="630,310,640,320" href="http://www.getpixel.net/" title="getpixel.net, stock photography">

Yeah, not amazingly maintainable, but serviceable. I decided to write a basic Python script to check the status code of all 3,306 links on the site.

The Script

The script is quite simple:
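A minimal sketch of such a checker, not the original script, might look like the following. It uses only the Python standard library. The names (AreaLinkParser, extract_links, fetch_status, count_alive) are illustrative, the User-Agent string is made up, and "alive" is assumed here to mean a final status code of 200 after redirects.

```python
from html.parser import HTMLParser
from http.client import HTTPException
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


class AreaLinkParser(HTMLParser):
    """Collects the href of every <area> tag in the page source."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "area":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Return the hrefs of all <area> tags, in document order."""
    parser = AreaLinkParser()
    parser.feed(html)
    return parser.links


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no response arrived."""
    # Hypothetical User-Agent; some servers reject requests without one.
    request = Request(url, headers={"User-Agent": "link-rot-check/0.1"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        # The server answered, but with an error code (404, 500, ...).
        return err.code
    except (URLError, HTTPException, OSError, ValueError):
        # DNS failure, refused connection, timeout, malformed URL, ...
        return None


def count_alive(links, fetch=fetch_status):
    """Count links whose status is 200 (assumed definition of 'alive')."""
    return sum(1 for url in links if fetch(url) == 200)
```

To use it, save the page source to a file, pass its contents to extract_links, and hand the resulting list to count_alive. The fetch parameter exists so the counting logic can be tried with a stand-in function instead of real network requests.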

The Results

Total Scanned: 3,306

Alive: 1,736*

Result: 52.5%

*This is just the number of URLs that return a valid web page. The real number is probably considerably lower, as some of them could be parked pages or otherwise be broken.

Things Lost on the Internet

If I'm going to make an argument, I need to at least back it up. Wordplay alone doesn't get you far.

Introducing the Lost Media Wiki, which collates information about lost media, including lost Internet media.

Here's a (non-exhaustive) list of some things I found on its Category page:

This is just a small list, and I think it does prove my point.

Conclusion

The claims "Everything put on the Internet stays there forever" and "The Internet never forgets" are demonstrably false. History has shown time and time again that unless active efforts are made to archive material while it is still available, we will lose more and more essential history, and eventually, restoration will become impossible. Hard drives will fail, links will die, websites will shut down, and content will be deleted for a plethora of reasons. Current efforts (like the Internet Archive and Archive Team) clearly aren't enough.

More cooperation is needed to ensure the safety of an essential part of humanity's culture and history. So, archivists, keep on archiving, and make redundant backups of said archives. And maybe *you* could also start doing similar things. Find things that need archiving and archive them. Before it is too late.