Table of contents for Zen and the Art of Knowledge Maintenance
- Zen and the Art of Knowledge Maintenance, Part I: Ashes to Ashes, Zeroes to Zeroes
- Zen and the Art of Knowledge Maintenance, Part II: Ideas Worth Spreading
- Zen and the Art of Knowledge Maintenance, Part III: Our Digital DNA
- Zen and the Art of Knowledge Maintenance, Part IV: No Bit Left Behind
On my bookshelf there sits a book called Dark Ages II: When the Digital Data Die, by Bryan Bergeron. It was published in 2001 and is now out of print, so copies of this treatise on the dangers of entrusting human knowledge to digital formats are increasingly hard to find, left to remainder bins and second-hand bookstores. Its publisher hasn’t deemed the book popular enough to warrant a digital edition, either: no Kindle version, no ePub version, no PDF version. In a few short years this book will be forgotten, lost to the ages in a feat of sublime irony.
Bergeron’s book will likely survive in a good number of public libraries, of course, but without an index of its contents that can be searched electronically, its secrets will remain behind closed covers, revealed only to those few hardy souls with the curiosity, the physical access, and the knowledge of the book’s existence. Some will stumble across it by chance on a library shelf, others will discover it while poring over the references in the endnotes of scholarly papers, but the vast remainder will never learn of it, much less read it. Bergeron’s contribution to the sum of human knowledge is destined for oblivion, in other words, because it has no currency in the digital realm, making it a good candidate for the Warehouse of Unwanted Books.
It’s been pointed out, of course, that a Shakespeare manuscript from the 17th century remains largely readable today, whereas a WordStar document from 1985 on a 5.25” floppy disk might as well be random gibberish. These are both instructive examples in their own way.
That Shakespeare manuscript, for example, persists today only because of environmental control systems that carefully regulate air temperature, humidity, and oxygen levels, in order to slow the decay of the paper and ink. Without the continuous availability of electricity to power those environmental control systems, books and papers that old would perish quickly. The expense involved in maintaining such a controlled environment also forces the librarian to be choosy about what materials are deemed worthy of preservation. The works of a famous writer like Shakespeare clearly merit special treatment, but how many works by lesser-known authors over the millennia have been lost simply because the monks and scribes and library curators lacked the time, budget, or shelf space to preserve them?
The WordStar example hits closer to home for me. Back in the mid-80’s I wrote a number of short stories–even a novel, though like any fledgling effort by a teenager it was nothing to brag about. I had an Atari 400 computer in those days, and the word processor I used was called AtariWriter. Every night, for a few hours after dinner, I’d sit down and continue the exciting adventure saga of Rivenhelm, until to my amazement a few months later I’d finished a 200-page opus. I knew it wasn’t in any way ready for publication, but nevertheless it was an intensely personal work and I figured I’d eventually go back to it when I’d aged a little and become a better writer. I saved the novel on a set of four single-sided, single-density floppy disks, put them in a diskette case for safe keeping, and moved on to other projects.
It was about ten years later that I decided to revisit Rivenhelm’s adventures as part of a wave of nostalgia. By then I was using a Pentium-based Windows 95 computer, and while it still had an old-style floppy disk drive, it was expecting double-density, double-sided disks, and had no understanding of how to read an Atari-formatted disk. I still had my old Atari computer, though, so I brought it out of storage and booted it up, figuring that would solve my problem. I put the first disk in the drive, and it was unreadable. The second disk was apparently unformatted. The third and fourth disks were marginally readable, but with badly corrupted files that AtariWriter couldn’t parse. I was gobsmacked, to put it mildly. What had happened to my data in just ten years?
The harsh reality is that magnetic media holds its data only for as long as its magnetization stays strong enough for a drive to read it back, and it doesn’t stay that strong forever; that decay is one form of what’s called “bit rot”. A floppy disk stores bits as patterns of magnetization in a thin coating of iron-oxide particles, and the drive’s read head detects the transitions between regions magnetized in opposite directions. Over the years that magnetization weakens, stray magnetic fields can scramble it, and the binder that holds the coating to the disk can degrade, until the drive can no longer tell a recorded bit from background noise. The 1’s and 0’s fade into static: ashes to ashes, zeroes to zeroes. Spinning a disk in a drive every once in a while won’t help, because reading doesn’t rewrite the data; the only real remedy is to copy it onto fresh media periodically. If you keep those disks in a box for years, time will first corrupt, then ultimately erase your data.
But even if I’d somehow been able to read the data from those old disks, I’d have run into trouble trying to get a modern word processor like Microsoft Word to read the AtariWriter files. File formats undergo evolutionary and sometimes revolutionary changes as technology lets us do more interesting and complex things with our data. AtariWriter offered no choice of font, for example (the font was determined by the printer), no way to change the font size, and no way to incorporate images into a document. Modern word processors can embed a lot more information about a document’s layout, appearance, and content in their data formats, and while Word can import some older word processor formats, the list is relatively short. Eventually the developers decide that demand for an ancient format is so low as to be negligible, and support for it gets dropped altogether.
NASA ran into this problem with their Moon landing tapes, which were recorded using a type of slow-scan rendering that only a handful of purpose-built machines could read. Over the decades those machines were mothballed, dismantled, and cannibalized for parts until only one semi-functional tape reader remained. Today the Lunar Orbiter Image Recovery Project is refurbishing two of these old AMPEX FR-900 tape drives in order to recover the full-resolution video of the recently rediscovered tapes. The obscure data format required custom hardware and custom media on which to record it, all of which served to limit the accessibility of the data and fuelled Moon landing hoax conspiracy theories, since there was no way for third parties to analyze the data first-hand. Never mind the fact that the tapes themselves sat in warehouses for decades, and that NASA lost track of them for 20 years, or that there were no backups made. That’s just stupid.

Actually, Bergeron’s book has been scanned by Google and is available in “snippet format” on Google Books. Were Bergeron and/or his publisher to say the word (and sign the agreement), it would be available in ebook and PDF format in a heartbeat, at no cost to them. Odds are they’re either not paying attention, have a religious objection to what Google has done, or have plans of their own for digital versions.
As an aside, odds are you can load your old AtariWriter files into an ASCII editor and strip out the control codes; the result will almost certainly need cleaning up, but should be fairly readable. That’s assuming you can’t find an AtariWriter equivalent of the old DOS “unws.com” utility, which did the trick for WordStar files.
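Here is a minimal sketch of that ASCII-editor approach in Python. It assumes the prose is stored as plain bytes with formatting codes mixed in, which is an assumption about the AtariWriter layout rather than a documented fact. It leans on two properties of ATASCII, the Atari character set: letters, digits, and common punctuation share their ASCII byte values, and end-of-line is the byte 0x9B rather than ASCII’s newline. The function name salvage_text is hypothetical.

```python
import sys

# ATASCII (the Atari 8-bit character set) marks end-of-line with 0x9B,
# not ASCII's 0x0A newline.
ATASCII_EOL = 0x9B


def salvage_text(data: bytes) -> str:
    """Recover readable text from an old word processor file by brute force.

    Assumption: the prose is stored as plain bytes with formatting codes
    interleaved. Printable ASCII bytes are kept, ATASCII end-of-line bytes
    become newlines, and every other byte is discarded.
    """
    out = []
    for b in data:
        if b == ATASCII_EOL:
            out.append("\n")
        elif 0x20 <= b <= 0x7E:  # printable ASCII range
            out.append(chr(b))
        # anything else (control or formatting codes) is silently dropped
    return "".join(out)


if __name__ == "__main__":
    # Usage: python salvage.py FILE [FILE ...]
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(salvage_text(f.read()))
```

As the comment above predicts, the output will still need hand cleanup: any formatting code that happens to fall in the printable range survives as a stray character. WordStar files are a different case, since WordStar set the high bit on some characters; a filter like this one would drop those letters, so masking each byte with 0x7F first would be the fix there.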
Bit rot is bit rot, and if it goes far enough (or if your media gets too brittle) it’s gone, gone, gone, but file formats needn’t foil an adequately determined hacker.
Yes, while much of the publishing industry opposes Google Books I fear it may be the last real hope for out-of-print titles. I suspect the holdout publishers are simply waiting for the right time, the right venue, and the right pricing structure to release their catalogs in digital form. Unfortunately it means that in the interim an out-of-print title is in a kind of limbo, with an uncertain future.
Google Books is trying to fill that void legally, but there are plenty of amateur efforts out there, less concerned with copyright entanglements, scanning everything from magazines and comics to paperbacks and hardcovers. Some of them even operate under the noble mantle of “digital preservationists” to rationalize what copyright law calls theft and piracy.
As for hacking arcane, obsolete file formats, you’re quite right: a determined hacker willing and able to take a hex editor to a data file to decipher its structure and write code to parse it would not be deterred. Someone unskilled in those arts would be lost, though, and unless the data was vitally important or extremely valuable, the cost and effort of finding such an expert to reconstruct it would likely be a big deterrent.
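For anyone who has never opened a hex editor, here is roughly what one shows: a minimal hex-dump sketch in Python that prints each 16-byte row as an offset, the raw byte values, and any printable characters. It only illustrates the view a hacker works from; it isn’t a replacement for a real editor, and the function name hex_dump is hypothetical.

```python
import sys


def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes roughly the way a hex editor displays them."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        # Two hex digits per byte, separated by spaces
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Show printable ASCII as-is and everything else as '.'
        text_part = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in chunk)
        # Pad short final rows so the text column stays aligned
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Usage: python hexdump.py FILE [FILE ...]
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(hex_dump(f.read()))
```

Reading a dump like this, you’d look for recognizable text in the right-hand column and for patterns that repeat at fixed intervals, which often hint at record boundaries.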
A word processor document is probably one of the easier sorts of data files to reverse-engineer, since you know most of the data should be text (hence your good suggestion about ASCII editors). A word processor document can also tolerate more bit rot than most types of data files, because the odds are higher that an affected byte will be a text character: the rot would show up as a changed character more often than as a changed formatting code or structure datum. In the more general case, an application’s data may very well be numeric (e.g. a spreadsheet), or encoded in some sort of record structure that may or may not be fixed-length, dense or sparse (e.g. a database). A hex editor becomes the tool of choice in cases like that, but even then the task is non-trivial, and certainly not something an unskilled hacker would feel comfortable tackling.
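To make that asymmetry concrete, here is a small Python sketch that flips a single bit in two kinds of data: a word of text, and a hypothetical fixed-length record made of a 10-byte name and a 4-byte little-endian integer (a layout invented purely for illustration, not any real application’s format). In the text, the rot changes one character; in the record, flipping one bit of the number changes its value by a power of two.

```python
import struct

# Hypothetical fixed-length record: 10-byte ASCII name, then a
# 4-byte little-endian unsigned integer (e.g. a page count).
RECORD = struct.Struct("<10sI")


def flip_bit(data: bytes, byte_index: int, bit: int) -> bytes:
    """Return a copy of data with one bit inverted, simulating bit rot."""
    damaged = bytearray(data)
    damaged[byte_index] ^= 1 << bit
    return bytes(damaged)


# Text: flipping bit 5 of 'R' (0x52) yields 'r' (0x72), one changed letter.
print(flip_bit(b"Rivenhelm", 0, 5))  # b'rivenhelm'

# Record: byte 13 is the most significant byte of the integer, so
# flipping its top bit adds 2**31 to the stored value.
record = RECORD.pack(b"Rivenhelm", 200)
name, pages = RECORD.unpack(flip_bit(record, 13, 7))
print(name.rstrip(b"\x00"), pages)  # b'Rivenhelm' 2147483848
```

Which bit rots matters as much as how many: the same single-bit error a reader would shrug off in prose turns a page count of 200 into one of more than two billion.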
Computers look very different to people who aren’t programmers, techies, hackers, or engineers, and I think we overlook that at our peril. To folks who don’t have a good understanding of what’s going on under the hood, a computer is an appliance like any other, and when it misbehaves or balks with an error message, it’s like hitting a brick wall. Where a hacker might say, “oh crap, I’ll go look up the file structure online, fire up a hex editor and see if I can fix the errant bytes manually,” that’s not what most computer users are going to be thinking or doing next.