And if so, why would you ever want to? About a year ago the University of Iowa Libraries Special Collections announced a rather exciting project: to digitize the data tapes from the Explorer I satellite mission. My first thought: the data on these tapes is likely digital to begin with, so there’s not really something to digitize here. As they explain, the plan is to “digitize the data from the Explorer I tapes and make it freely accessible online in its original raw format, to allow researchers or any interested parties to download the full data set.” It might seem like a minor point for a stickler for vocabulary, but that sounds like transferring or migrating data from its original storage media to new media.
To clarify, I’m not trying to be a pedant here. What they are saying is clear and it makes sense. With that said, I think there are actually some meaningful issues to unpack here about the difference between digital preservation and digitization, and about reading, encoding, and registering digital information. Edit: See the comment from Greg Prickman below for further explanation of why the digitization of the Explorer I tapes, which are in fact analog reel-to-reel recordings, is indeed a case of digitization.
Digitization involves taking digital readings of physical artifacts
In digitization, one uses some mechanism to create a bitstream, a representation of some set of features of a physical object in a sequence of ones and zeros. In this respect, digitization is always about the creation of a new digital object. The new digital object registers some features of the physical object. For example, a digital camera registers a specific range of color values and a specific but limited number of dots per square inch. Digital audio and video recorders capture streams of discrete numerical readings of changes in air pressure (sound) and discrete numerical readings of chroma and luminance values over time. In short, digitization involves taking readings of some set of features of an artifact. (Edit/Note: See comment from Carl Fleischhauer below for a refinement of exactly how these digitization processes work.)
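To make the idea of "taking readings" concrete, here is a minimal sketch in Python of what an audio digitizer does in principle. Everything in it is my own illustration rather than any particular device's method: the `digitize` function, the 440 Hz `tone`, and the sample rates and bit depths are all assumptions chosen for the example.

```python
import math

def digitize(signal, duration_s, sample_rate_hz, bit_depth):
    """Take discrete, quantized readings of a continuous signal.

    `signal` is any function of time returning a value in [-1.0, 1.0].
    This is a hypothetical illustration, not a real recorder's pipeline.
    """
    levels = 2 ** bit_depth                      # how many distinct values a reading can take
    count = round(duration_s * sample_rate_hz)   # how many readings we take
    readings = []
    for i in range(count):
        t = i / sample_rate_hz                   # the moment this reading is taken
        value = signal(t)
        # Map the range [-1, 1] onto the integer codes 0 .. levels - 1.
        readings.append(round((value + 1.0) / 2.0 * (levels - 1)))
    return readings

def tone(t):
    """A stand-in for the physical phenomenon: a 440 Hz pressure wave."""
    return math.sin(2 * math.pi * 440 * t)

# Two different digitizations of the same ten milliseconds of the same sound.
coarse = digitize(tone, 0.01, 8000, 4)     # few readings, 16 possible values each
fine = digitize(tone, 0.01, 44100, 16)     # many readings, 65,536 possible values each
print(len(coarse), len(fine))
```

Both results are new digital objects, and neither is the sound itself. Each registers only the features that the chosen sample rate and bit depth allow it to register.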
Reading bits off old media is not digitization
Taking the description of the data tapes from the Explorer I mission, it sounds like this particular project is migrating data. That would mean reading the sequence of bits off their original media and then making them accessible. On one level it makes sense to call this digitization: the results are digital, and the general objective of digitization projects is to make materials more broadly accessible. Moving the bits off their original media and into an online networked environment feels the same, but it has some important differences. If we have access to the raw data from those tapes, we are not accessing some kind of digital surrogate or some representation of features of the data; we would actually be working with the original. The allographic nature of digital objects means that working with a bit-for-bit copy of the data is exactly the same as working with the bits encoded on their original media. With this noted, perhaps most interestingly, there are times when one does want to actually digitize a digital object.
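The claim that a bit-for-bit copy is the same thing as the original is one you can check mechanically, which is part of what separates migration from digitization. Below is a small sketch using Python's standard `hashlib` module. The byte values are made up for illustration and are not Explorer I data, and `is_faithful_copy` is a hypothetical helper name.

```python
import hashlib

def sha256_of(data: bytes) -> str:
    """Return the SHA-256 digest of a bitstream as a hex string."""
    return hashlib.sha256(data).hexdigest()

def is_faithful_copy(original: bytes, copy: bytes) -> bool:
    """True when the two bitstreams produce the same SHA-256 digest."""
    return sha256_of(original) == sha256_of(copy)

# A made-up stand-in for a bitstream read off old media.
original = bytes([0x45, 0x58, 0x50, 0x31, 0x00, 0xFF])

# Migration: the same sequence of bits, now living somewhere else.
migrated = bytes(original)

# A copy in which a single bit has been flipped in the last byte.
damaged = original[:-1] + bytes([original[-1] ^ 0x01])

print(is_faithful_copy(original, migrated))   # the migrated copy
print(is_faithful_copy(original, damaged))    # the damaged copy
```

There is no equivalent check for a digitized photograph or audio recording. Two scans of the same page will differ, because each is a fresh set of readings rather than a copy of an existing bitstream.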
When we do digitize digital objects
In most contexts of working with digital records and media for long-term preservation, one uses hardware and software to get access to and acquire the bitstream encoded on the storage media. With that said, there are particular cases where you don’t want to do that. In cases where parts of the storage media are illegible, or where there are issues with getting the software in a particular storage device to read the bits off the media, there are approaches that bypass a storage device’s interpretation of its own bits and instead resort to registering readings of the storage media itself. For example, a tool like Kryoflux can create a disk image of a floppy disk that is considerably larger in file size than the actual contents of the disk. In this case, the tool is actually digitizing the contents of a floppy disk. It stops treating the bits on the disk as digital information and shifts to recording readings of the magnetic flux transition timing on the media itself. The result is a new digital object, one from which you can then work to interpret or reconstruct the original bitstream from the recordings of the physical traces of those bits you have digitized.
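To give a feel for that last step, here is a deliberately simplified Python sketch of turning timing readings back into a raw pattern of ones and zeros. It is not Kryoflux's stream format or its decoding method. The `flux_to_cells` function, the 2 microsecond cell length, and the timing values are all assumptions for illustration, and a real decoder would also have to track drive speed variation and undo the disk's encoding scheme.

```python
def flux_to_cells(intervals_us, cell_us=2.0):
    """Turn measured gaps between flux transitions into a raw cell pattern.

    Assumes (for illustration only) a fixed cell length in microseconds.
    Each gap is rounded to a whole number of cells: a gap of n cells
    becomes n - 1 zeros followed by a one marking the transition.
    """
    cells = []
    for interval in intervals_us:
        n = max(1, round(interval / cell_us))   # nearest whole number of cells
        cells.extend([0] * (n - 1) + [1])
    return cells

# Made-up readings: the gaps are never exactly 4, 6, or 8 microseconds,
# because they are measurements of a physical object.
readings_us = [4.1, 5.9, 4.0, 8.2, 3.9]

cells = flux_to_cells(readings_us)
print(cells)
```

Notice the difference in size. The five readings here are stored as five floating-point measurements, while the pattern they decode to is only thirteen cells long. That is the same reason a flux-level image of a floppy is so much larger than the contents of the disk: it records measurements of the medium, not the bits.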
So when is and isn’t it digitization?
So, it’s digitization whenever you take digital readings of features of a physical artifact. If you have a bit-for-bit copy of something, you have migrated or transferred the bitstreams to new media, but you haven’t digitized them. With that said, there are indeed times when you want to take digital readings of features of the actual analog media on which a set of digital objects is encoded. That is a situation in which you would be digitizing a set of features of the analog media on which digital objects reside. What do you think? Is this a helpful clarification? Do you agree with how I’ve hashed this out?