The Bite of Bytes: The Importance and Challenges of Software Preservation

Stephen Kennedy - Lead Web Developer

Software has advanced incredibly in the last few decades, from operating systems that manage and standardize the interactions between software and hardware to small applications that simply tell you the time. The way we interact with these electronics is becoming both more convenient and more complex.

The downside of all this advancement is that we hardly ever look back and think about ways to preserve the work of earlier years so that it can be examined and potentially learned from. This task has only recently (in the last decade) started to be recognized as important. The Smithsonian Institution has started placing software and hardware-based exhibits in its National Museum of American History (COBOL, American Enterprise, Innovations in Defense). This has resulted in a more concerted and curated academic effort, in addition to the broader and less discerning community efforts that have been around for a long time.

The first big question everyone asks is: Why? Why is it important? Why should time be spent on it when the hardware that runs it may no longer exist or be accessible? Why are we worried about it? The first two questions are pretty easy to answer: like practitioners of any discipline, software developers are "standing on the shoulders of Giants" (Isaac Newton, 1675). Our knowledge of this field is cumulative, and by losing access to evidence of our early knowledge, we may run into issues that have already been solved many times over, or risk toppling the tower entirely.

Bar none, the largest roadblocks for software preservation are the availability of source code and hardware specifications. A growing number of applications are closed source, which means that only the compiled final product is available. This decision is often made to protect a company's business interests, but it leaves anyone looking to learn how the software accomplished its tasks with only a partial picture, unable to see what the code is doing behind the scenes. That kind of knowledge becomes a necessity when designing emulation platforms to preserve the software in a functional state for the future. The problem will continue to grow as the complexity of the software we are looking to preserve increases.

So, why should that be a concern? Because by not releasing the source code behind antiquated applications, we may be artificially slowing down progress in the software industry as a whole. The information security sector currently faces several ongoing problems (the expansion of machine learning and the looming arrival of effective quantum processing) that might be easier to solve by looking back at how we solved problems in the past. Quantum-proofing existing security schemes could come down to designing a task with many "right" answers, only one of which leads to a valid result. Early software developers likely encountered something similar frequently, when unreliable CPU synchronization was more common.

The other big roadblock is the final question: How? The majority of software that is actively being preserved is currently stored on physical media. The best place to preserve data is on the internet, where at any given time you can count on at least a few hundred computers storing it simultaneously. This provides enough redundancy to ensure that it continues to be accessible. One problem is how to get the data from the physical media (where it can be scratched, demagnetized, and otherwise damaged) to the internet, where it can be safely preserved. The solution is often antiquated hardware that has been customized and configured to act as a sort of bridge. This kind of customization can take a great deal of knowledge, depending on how many computing eras one has to cross and what format the physical media takes.

For an easy example of this challenge, ask yourself: if you had a precious family photo saved on a punch card, how would you get it off? There is another question to consider in the same vein: will the software be recompiled (and potentially heavily modified) to run on modern hardware, making access as easy as possible, or will it require specialized software or hardware to run in a state far closer to the original?
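Redundancy only helps if the copies still match what was originally read off the physical media. As a rough illustration (a minimal sketch, not a description of how any particular archive works), the Python below records a SHA-256 checksum for a disk image and later reports which copies have drifted from it. The function names and the choice of SHA-256 are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large disk images do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_damaged_copies(reference_digest, copies):
    """Return the paths whose contents no longer match the reference digest."""
    return [Path(p) for p in copies if sha256_of(p) != reference_digest]
```

In practice, the reference digest would be recorded once, right after the image is read from the original media, and every mirror would be checked against it on a schedule. Any path returned by `find_damaged_copies` would then be replaced from a copy that still matches.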

If you're interested in learning a little more about this topic, I highly recommend looking through the following resources, and maybe doing a little research of your own:

These are questions software engineers and individuals will continue to answer in the coming years, but business owners have only one real question to answer: do you have a policy for open-sourcing your code after it is no longer in active production?

Let us know how you feel about digital preservation. Is it important to you or your business?

Wednesday, February 14th, 2018 #software #data