The White Album Problem

In Men in Black (1997), Kay is showing Jay some new alien technology and says, “This is gonna replace CD’s soon; guess I’ll have to buy the White Album again.”

At the time of this writing I’m 33 years old, and in my lifetime I’ve seen music progress from vinyl records, 8-tracks, and cassette tapes to CDs, MP3s, and streaming audio. In the computer world, the change has been even more dramatic. I grew up with hard-drive-less PCs that ran off of 5 1/4-inch floppies, then moved to 5 MB hard drives, 3 1/2-inch floppies, ATA and SATA drives, and finally flash drives.

I started writing on a DOS-based PC using PFS Write and when Windows 3.1 came out I had to convert my book to Windows by using PFS to “print to file”. This allowed me to open the unformatted files and save them in Microsoft Write. When Windows 95 came out I converted the book again to Microsoft Works, which came with the computer. After Windows 98 came out and I got Microsoft Word, I again had to convert the book to Word.

Then came the web.

It was one thing to have to worry about the files stored locally on your hard drive. With the introduction of web applications, we now had to manage digital archives in the sky. My first encounter with web storage was with email, which I accessed via Telnet to Pine. Soon after I started using Eudora as an email client, followed by Juno, which was my first personal email address. I didn’t use AOL mail, but that, along with Prodigy and CompuServe, was a popular way to get email at the time. My first experience with webmail was Hotmail, followed by Yahoo! Mail, and finally Gmail. It’s important to note that with the exception of Gmail, ALL OF MY EMAIL IS GONE. Because I didn’t back up or convert my old email from Pine, Eudora, or Juno, and because of Hotmail’s and Yahoo’s aggressive email deletion policies at the time, all of my past email has been lost. The only archives I have from that time exist in paper letters I saved, audio tape recordings, printed photographs, and VHS tapes.

I was still writing paper letters to my girlfriends, my mother, and my grandmothers up until 1999. Even though I had a Juno and Hotmail account at the time, the other parties didn’t. I can remember making lists of people’s email addresses in high school because so few people had them – and those who did usually shared one with their parents as part of their family’s ISP account. I can remember getting in trouble with one of my girlfriend’s dads for a joke I sent her via email in high school. One AOL product I did end up using a lot was AOL Instant Messenger. It’s how I kept in touch with friends sitting next to me and across state lines after high school. It’s also how I met my wife, but that’s another story. Like my old email, no conversations were ever recorded, kept, or archived. I don’t have them. They are gone. This loss of digital history is part of the reason people still want to know what happened to It had all of their emails, pictures, and chats stored there when it went offline. Imagine if Facebook went offline tomorrow – how would you feel?

Sometimes, digital archive management can even be a problem within a single site. Take YouTube for example. It started out as a site that had its own login. People didn’t really realize how they would eventually use it or what exactly it was for at first. They set up accounts, posted weird stuff, sometimes forgot about it, and then came back and set up another account later. Eventually Google bought YouTube and started forcing users to log in with their Google Account, which they also had multiple logins for. So now people had multiple YouTube accounts and multiple Google accounts, and Google was forcing them to reconcile the two. Videos uploaded to YouTube could not be transferred between accounts. If you wanted to delete the account but keep the videos, you had to manually download them and re-upload them. Then came Google+, and Google wanted you to stop caring so much about your channel name and start using your real name, which caused even more confusion. Despite Facebook’s massive growth and change over the years, their product has remained relatively consistent compared with Google’s products.

Digital preservation is a huge problem going forward. As more and more data is created, it first has to get stored, but it then has to stay readable over time. That means either preserving the devices and programs that can access the data or constantly converting the data over time. That essentially is the White Album problem. In MIB, it is implied that Kay first bought the White Album on a vinyl record, then bought it on an 8-track, followed by a cassette tape and finally a CD (in 1997 MP3s were not popular enough to mention in a movie). If Kay had not bought the White Album on the latest format, he would have had to maintain the older system that was capable of playing the music in the format he originally bought it on. If I had wanted to save my old email, either the service providers and I would have had to maintain the computers and servers capable of displaying it, or I would have had to download the data and convert it into a suitable format. Does it matter? Maybe not for me, but as a society we have to wonder what is going to get saved and what is going to get lost.

There is a lot of talk about the “reverse 15 minutes” rule where instead of everyone being popular for 15 minutes, everyone gets to be anonymous for 15 minutes. I don’t know. A LOT of the stuff I’ve created has been deleted and though it may exist somewhere, if you can’t find it on Google or your own hard drive, for all practical purposes it doesn’t exist. This also applies to those VHS tapes you had converted to DVD in the early 2000s. If you’re not converting them to YouTube or some other form of digital media, you risk losing (yet again) the information you previously sought to keep. With regard to social media, yes, the data is there, but the more data you add, the harder it is to find things. Have you ever tried to look up one of your old tweets? The server only shows you the last 50 tweets or so at a time and loading more old tweets takes a long time. If you know what you’re looking for, however, one search can bring back tweets from 2009. Facebook’s Timeline feature made it easier to go back in time, but unless you’re going all the way to the bottom or scrolling slowly, you can’t easily see everything and Timelines aren’t searchable (Update: Facebook Now Allows Users to Search Timelines). While the lack of a “delete” button on the Internet is not the topic of this post, I’d argue that it’s less of an issue than it might seem. While the government will always have access to whatever information they want, individual companies will go out of business, files will get deleted or not converted, and databases will not be indexed or searchable, making the data irrelevant over time.

What You Can Do to Preserve and Convert Your Data Over Time

If you’re looking for a place to store your digital files, consider Dropbox for general file storage online. 100 GB is currently $9.99 a month, which probably isn’t enough for all of your files, but for pictures there is Flickr, which can store up to 1 TB of images per year online. Google Drive is another option and one that can convert your Word, Excel, and PowerPoint files into editable documents. This is one way to keep files ‘always converted’ as long as Google Drive still exists (Google has been known to shut down services often, so beware). The bigger Google gets, the less likely I am to invest my data with their systems. While I still use them for email (via Google Apps), I use Dropbox for image and document storage and sharing. I also have a second backup hard drive in my computer, an external hard drive, and a network hard drive connected to my home PC. I still have the first digital picture I ever took of myself in high school, but to do that I had to copy that file from a floppy disk to a computer I had in the late 90s and to every computer I’ve had since. Just one misstep along the way would have meant that file would have been lost. And is there any value to keeping it? Maybe, maybe not.
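If you want to take some of the “one misstep” risk out of copying files from drive to drive, a small script can do the copying and then confirm each copy matches the original. What follows is only a minimal sketch in Python, not a full backup tool. The function names and the folder paths in the usage note are my own placeholders, and it assumes your backup drives simply show up as folders on your computer.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source: Path, backup_dirs: list[Path]) -> dict[Path, bool]:
    """Copy source into each backup folder and report whether each copy matches.

    The folder list is an assumption for illustration: an external drive,
    a network share, or a synced folder would all work the same way.
    """
    expected = sha256_of(source)
    results = {}
    for directory in backup_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        target = directory / source.name
        shutil.copy2(source, target)  # copy2 also tries to keep timestamps
        results[target] = sha256_of(target) == expected
    return results
```

Calling something like `copy_and_verify(Path("first_photo.jpg"), [Path("E:/backup"), Path("Z:/backup")])` (hypothetical paths) returns a True or False for each copy, so a bad copy gets noticed now instead of ten years from now.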

When I used to do computer repair, the #1 most heard request when fixing a computer was, “save my pictures”. These people were saving their digital images on their computers and nowhere else. This is still the case for the most part. The difference is that most people’s pictures are now on their phones, and when they drop them in the toilet, their pictures go down the drain. This again is where Dropbox comes in handy, as it can automatically upload pictures from your phone to Dropbox. However, there is an obvious and real cost to all of this data storage. Whether you continually buy new hard drives or you continue to pay month after month to Dropbox to store your data there, you are assigning value to the preservation of that data. And storage of the data does not equate with retrieval of the data. If enough time passed and you were no longer able to open a JPEG image with any available software, that data would have become useless. As each new software and hardware platform comes along, we will always have the White Album Problem, and those who do not keep up risk losing access to their data forever.
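One way to keep an eye on the “storage is not retrieval” problem is to know which file formats you actually depend on. The sketch below is only an illustration under my own assumptions: the function names are made up, and the JPEG check only looks at the first three bytes of a file (JPEG files begin with the bytes FF D8 FF), which says nothing about whether the rest of the image is intact.

```python
from collections import Counter
from pathlib import Path

JPEG_MAGIC = b"\xff\xd8\xff"  # the first three bytes of a JPEG file


def count_formats(folder: Path) -> Counter:
    """Count the files under folder by lowercase file extension."""
    return Counter(p.suffix.lower() for p in folder.rglob("*") if p.is_file())


def looks_like_jpeg(path: Path) -> bool:
    """Return True if the file starts with the JPEG signature.

    This is a quick sanity check, not a full validation of the image.
    """
    with path.open("rb") as handle:
        return handle.read(3) == JPEG_MAGIC
```

Running `count_formats` on a pictures folder every so often shows which formats are piling up, and that list is what you would need to convert if one of them ever stopped being readable.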

The Snapchat Generation, the Forgotten Generation

The EU recently blocked a bill titled the Right to Be Forgotten, which would have granted users the right to ask service providers to delete their personal information. But Snapchat users aren’t waiting for a law; they’re simply not storing anything online. While it’s technically possible to retrieve the data, for most purposes this means that for a generation of social media users, no data will be stored online. So what will their history be? How will they remember these times? Maybe there are still places online where they will store their pictures and maybe they’ll occasionally use email to communicate, but when your primary mode of communication is text, Snapchat, Skype, or some other form of ephemeral communication, what legacy are you leaving? Maybe they don’t care, but should we?

Update from Vint Cerf 2/13/2015

The following are excerpts from Google’s Vint Cerf warns of ‘digital Dark Age’:

“I worry a great deal about that,” Mr Cerf told me. “You and I are experiencing things like this. Old formats of documents that we’ve created or presentations may not be readable by the latest version of the software because backwards compatibility is not always guaranteed.

“And so what can happen over time is that even if we accumulate vast archives of digital content, we may not actually know what it is.”

‘Digital vellum’
Vint Cerf is promoting an idea to preserve every piece of software and hardware so that it never becomes obsolete – just like what happens in a museum – but in digital form, in servers in the cloud.

If his idea works, the memories we hold so dear could be accessible for generations to come.

“The solution is to take an X-ray snapshot of the content and the application and the operating system together, with a description of the machine that it runs on, and preserve that for long periods of time. And that digital snapshot will recreate the past in the future.”