File Formats

Every once in a while, I run across a topic and think, “surely I’ve covered this on my blog?” The most recent of these topics, which came up when I wrote December’s Exit Strategy post, concerns file formats.

File formats and exit strategies go hand in hand, as you don’t want to be stuck with inaccessible data in an unreadable file format. Proprietary, out-of-date, little-used formats lower the chance that you’ll be able to access your data when you need it. Ask anyone with 20-year-old files if they can still open them and you’ll likely see why file formats matter.

When it comes to choosing file formats that last, file types actually exist on a spectrum. The best formats are open, well-documented, and in wide use. Clear examples are .TXT instead of .DOCX, or .CSV instead of .XLSX or .SAS. In the middle of the spectrum we find something like .PDF, which is an Adobe file format but in such wide use that it will likely be usable for many years. Also note that the spectrum of preferred file formats shifts over time (hello Lotus Notes and WordPerfect!).

Since there is no one right answer for any data type, the key to picking a good file format is to ask yourself whether your content is currently in a format that is uncommon or can only be opened by a specific software program. If the answer is yes, now is the time to make a backup copy of that data in a more open format. Even if you lose some formatting in the process, it’s better to have some data in an open format than to have no data because it’s locked in an unreadable one. By making a copy, you also keep the performance of the original file format while gaining the sustainability of the new one; you can, of course, switch wholly to a better format if that is feasible. It’s also worth reviewing old data for formats that are no longer popular or supported.
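
If you want a quick way to start that review, here is a minimal Python sketch that counts the files in a folder by extension and lists the ones that aren’t on a short list of open formats. The function names and the `OPEN_EXTENSIONS` set are my own illustrative assumptions, not a standard; the set only contains the two open examples mentioned above, so adjust it to fit your own data.

```python
from collections import Counter
from pathlib import Path

# Assumption: a deliberately short, illustrative list of open formats.
# Extend it to match what counts as "open" for your own data.
OPEN_EXTENSIONS = {".txt", ".csv"}


def inventory_formats(root):
    """Count the files under `root` (recursively) by lowercase extension."""
    return Counter(
        path.suffix.lower()
        for path in Path(root).rglob("*")
        if path.is_file()
    )


def flag_for_review(counts, open_extensions=OPEN_EXTENSIONS):
    """Return (extension, count) pairs not in the open set, most common first.

    Files without an extension show up under the empty string "".
    """
    return [
        (ext, n)
        for ext, n in counts.most_common()
        if ext not in open_extensions
    ]
```

Calling `flag_for_review(inventory_formats("my_data_folder"))` (with a folder name of your choosing) gives you a list of extensions to look at. It can’t tell you whether a format is truly at risk, only which file types deserve a second look.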

Good data management is the sum of a number of small practices, and picking good file formats is a piece of this puzzle. The more you are aware of how closed your current file formats are, the better you can plan for keeping that data usable into the future.