Dan Hockstein and Mari Allison are 2022 Junior Fellows in the Digital Collections Management and Services Division (DCMS) working under the mentorship of Kate Murray.
Over the course of our Junior Fellowship this summer, we have focused on a variety of streams of work around the Library of Congress’ Sustainability of Digital Formats website. The site contains an extensive list of commonly used file formats, wrappers, and encodings. There are thousands of these created by legacy equipment and software that present challenges in identification, preservation, and use. Among these is the file format produced by the now-defunct word processing platform WordStar.
The WordStar format is the default proprietary plain-text format for a word processing platform of the same name. The initial version of WordStar was first published by MicroPro International for Digital Research, Inc’s CP/M operating system. Subsequent releases of WordStar’s early versions were ports for microcomputers and their operating systems – for example, Tandy’s LDOS-5, the Epson PX-8, the Osborne 1, and the Apple ][.
Microsoft’s MS-DOS operating system became a platform for WordStar’s wide adoption, beginning with version 3.0 of the program. At this point, MicroPro International also began to splinter into various companies through staffing changes, some of which created direct competition with WordStar. Through our research, it became clear that the program changed hands several times, intersected with and borrowed from other pieces of software, and followed a complicated path that produced several kinds of output files that could all be called “WordStar” files. As a result, the structure of WordStar files did not follow a streamlined, linear trajectory of updates and versioning either – changes were fairly drastic. In the first few versions of WordStar, the 8th bit of each byte, which other software often used to extend the 7-bit ASCII character set, was instead used to store print and formatting information. This limited cross-compatibility with other word processors and was later changed with the release of WordStar 5.0.
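To illustrate the 8th-bit behavior described above, here is a minimal Python sketch that separates the low seven bits of each byte (the ASCII character) from the high bit. The function name `split_high_bit` is hypothetical, and the sketch makes no claim about what a set high bit means in any particular WordStar version; it only shows where such bits occur.

```python
def split_high_bit(data: bytes) -> tuple[bytes, list[int]]:
    """Return the 7-bit text plus the offsets of bytes whose 8th bit was set.

    Assumption: each byte carries a 7-bit ASCII character in its low bits.
    How the high bit should be interpreted is NOT modeled here.
    """
    # Mask off the 8th bit to recover the plain ASCII character.
    text = bytes(b & 0x7F for b in data)
    # Record where the 8th bit was set so the information is not silently lost.
    flagged = [i for i, b in enumerate(data) if b & 0x80]
    return text, flagged


if __name__ == "__main__":
    sample = b"Wor\xe4Sta\xf2"  # 'd' and 'r' stored with the 8th bit set
    print(split_high_bit(sample))
```

Simply masking the bit makes the text readable in a modern editor, but it discards whatever the bit encoded, so a preservation workflow would keep the original bytes alongside any cleaned copy.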
The introduction of additional word processing programs, such as Microsoft Word and Apache OpenOffice, reduced WordStar’s market share. The software is now hosted and available for paid download, but is no longer developed or maintained by its original owners.
Quite a few WordStar files exist within the Library’s collections, and they have unique properties that, in comparison to similar or more modern text formats, are more complex. In order to document and assist preservation and access to these files, we have been tasked with creating a Format Description Document, or FDD, for the WordStar file format. This research is still in progress, but will be published at WordStar File Format Family.
The WordStar Community
One of the most fun aspects of doing research on WordStar was seeing the passionate community of writers who still swear by the software. For example, George R.R. Martin still exclusively uses WordStar 4.0 for DOS to write A Song of Ice and Fire, the book series that inspired the Game of Thrones TV series. Hobbyists keep discussion alive in online forums, post guides on how to set up a DOS machine or emulator to run WordStar, and modify Microsoft Word to include the same key command shortcuts as WordStar. Through reading these posts, we came to understand what people love about the WordStar application. It was the first word processor that was able to render the document on screen, formatted almost exactly as it would appear when printed. The efficient command keys, used to navigate menus and perform operations such as Print or Save, are another favorite application feature, vividly explained in this post about a man teaching his 9-year-old daughter how to use them. The command keys are distinct from dot commands, which aren’t just application features but are stored in the actual text file. Dot commands are visible on the screen during the editing process but become formatting information when the document is rendered into a printed page.
While it was wonderful to explore WordStar’s online community, the unofficial nature of the information, often taking the form of blog posts on someone’s personal website, somewhat complicated our research. When writing FDDs, the Library prefers to use primary sources, and at times, we were hesitant about linking out to personal sites. In order to verify information, we tried to cross reference several personal websites and news articles.
Identifying WordStar Files
Using a modern graphical user interface to access, organize, and name files is quite different from creating content on the early microcomputers that many WordStar files were created on. While these systems did have a directory structure, it was not always represented visually by folders, and operating systems did not necessarily associate file types with applications. Generally, much more was left to the end user.
Because of this, identifying the WordStar files in our collection was an unexpected challenge. By modern conventions, most people use file extensions, or the 2-4 characters following the “.” in a file name, as a high level but imprecise way to identify a format at a glance - “.docx” for Microsoft Word documents, “.mp3” for MP3 files, or “.csv” for Comma Separated Values. Some WordStar files may follow similar standards with .ws and optionally .ws2, .ws3, etc. depending on the version of WordStar, but other WordStar files may have very different extensions. The WordStar Reference Manual from 1983 p.1-12 states: “The most useful file name is one that helps you remember the file contents… For example, you might add .LET after each letter file name, .REP after each report or .912 to indicate that September 12 was the last editing session.” Because each creator was free to choose a naming convention, it is difficult to confirm at a glance that WordStar was the software used to create any single file. At an institution with many word processing documents dating back to the ’70s, this poses an issue!
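A small Python sketch shows why extension matching is only a first-pass filter. The function name `has_wordstar_extension` is hypothetical; the pattern covers only the `.ws`, `.ws2`, `.ws3` style extensions mentioned above and is not an exhaustive list.

```python
import re
from pathlib import PurePath

# Extensions mentioned in this post: ".ws", optionally followed by version digits.
# Matching is case-insensitive because older systems often used upper-case names.
_WS_EXTENSION = re.compile(r"\.ws\d*", re.IGNORECASE)


def has_wordstar_extension(filename: str) -> bool:
    """Return True if the file name ends in .ws, .ws2, .ws3, and so on."""
    return _WS_EXTENSION.fullmatch(PurePath(filename).suffix) is not None


if __name__ == "__main__":
    for name in ["CHAPTER1.WS", "draft.ws3", "SMITH.LET", "REPORT.912", "NOTES"]:
        print(name, has_wordstar_extension(name))
```

Names such as `SMITH.LET` or `REPORT.912` fail this check even though the 1983 manual suggests exactly that kind of naming, so a negative result says nothing about whether a file is WordStar.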
Examining a file for signature information is a more consistent way to identify formats. A signature is a piece of embedded metadata used to identify a filetype, often found in the header or footer of the file.
Many of WordStar’s different versions have their own unique file signature, which greatly increases the complexity of trying to identify any given file. Signature information for WordStar 5.0, 6.0, 7.0 and 2000 is currently available in the National Archives UK’s PRONOM registry, but signatures for other versions have not yet been identified.
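The general idea of signature matching can be sketched in a few lines of Python. To avoid guessing at WordStar byte sequences, this example uses two widely documented signatures (PDF and PNG) purely as stand-ins; the table `KNOWN_SIGNATURES` and the function `identify_by_signature` are hypothetical names, and real WordStar entries would need to be taken from PRONOM.

```python
from typing import Optional

# Format name -> (byte offset, expected bytes). PDF and PNG are illustrative
# stand-ins only; WordStar signatures are deliberately not reproduced here.
KNOWN_SIGNATURES = {
    "PDF": (0, b"%PDF-"),
    "PNG": (0, b"\x89PNG\r\n\x1a\n"),
}


def identify_by_signature(data: bytes) -> Optional[str]:
    """Return the first format whose signature matches, or None."""
    for name, (offset, magic) in KNOWN_SIGNATURES.items():
        # Compare the bytes at the expected offset with the known signature.
        if data[offset:offset + len(magic)] == magic:
            return name
    return None


if __name__ == "__main__":
    print(identify_by_signature(b"%PDF-1.7\n"))
    print(identify_by_signature(b"no signature here"))
```

Because each WordStar version may carry a different signature, a table like this would need one entry per version, and files from versions without an identified signature would still return `None`.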
We are also looking into unique features of the format that could serve as identifiers. One example is symmetrical sequences, which first appeared in WordStar 5.0. Symmetrical sequences serve as tags that enclose extra information like font color and footnotes. They follow a defined byte structure, and the opening and closing tags are each marked by the same control character, 1DH (hexadecimal 1D). Theoretically, this distinguishing feature can be a way to identify files from WordStar 5.0 and above, but it’s not as consistent as a file signature, nor is it as easy to automate checking.
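As a rough sketch of how such a check might be automated, the Python below scans for spans that open and close with the 1DH byte. The internal layout it assumes (1DH, a two-byte little-endian count, a payload, the same count repeated, then 1DH, with the count covering everything after the first count field) is an illustrative assumption and should be verified against the format specification before use. The function name `find_symmetrical_sequences` is hypothetical.

```python
SYM = 0x1D  # control character that opens and closes a symmetrical sequence


def find_symmetrical_sequences(data: bytes) -> list[tuple[int, int]]:
    """Return (start, end) offsets of spans matching the ASSUMED layout:

        1D | count (2 bytes, little-endian) | payload | count again | 1D

    where count = number of bytes following the first count field.
    This layout is an assumption for illustration, not a verified spec.
    """
    spans = []
    i = 0
    while i + 3 <= len(data):
        if data[i] != SYM:
            i += 1
            continue
        count = int.from_bytes(data[i + 1:i + 3], "little")
        end = i + 3 + count
        if (
            count >= 4                                  # type byte + trailer at minimum
            and end <= len(data)
            and data[end - 1] == SYM                    # closing 1D
            and data[end - 3:end - 1] == data[i + 1:i + 3]  # repeated count
        ):
            spans.append((i, end))
            i = end  # continue scanning after the matched sequence
        else:
            i += 1   # a stray 1D byte; keep scanning
    return spans


if __name__ == "__main__":
    count = (8).to_bytes(2, "little")
    seq = b"\x1d" + count + b"\x03note" + count + b"\x1d"
    print(find_symmetrical_sequences(b"Hello " + seq + b" world"))
```

A file with no matches might still be a WordStar 5.0 or later document that simply uses none of these features, which is one reason this check is weaker than a file signature.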
When researching WordStar, we had to maintain a balance between technical and contextual information. To create a holistic format description, it’s important to describe the history and adoption of WordStar’s many versions while also providing data about byte sequences and ASCII encodings. Combining our research skills and technical knowledge to uncover more about WordStar was an extremely rewarding process. We uncovered and better understood a history of early word processing documentation, unearthed some great graphic design, and created new resources to sustain digital formats into the future!
Mari: I’ve learned a lot about conducting file format research through this whole process. It’s been really fun to dive down some of the rabbit holes, from studying the full format specification and picking out useful identification information to reading blog posts and interviews from science fiction and fantasy authors. Sometimes it’s difficult to know if I’m going too deep on esoteric information, but I’ve learned through interactions with the greater file format and digital preservation community that every detail is valued. It feels incredible to contribute to a resource that will be used both within the Library, and also by outside researchers.
Dan: It’s been a great experience to discover the inner workings of what makes a file, all while contributing to an important body of knowledge for the digital preservation community. Knowing that our research may make identification and access of WordStar files easier for future researchers, and being able to contribute to Library resources, has allowed me to build new skills while also making a lasting impact as a Fellow. It also made me even more interested in early computing and technology.