Version Control for Evolving Files

How often do your files evolve? For example, many researchers develop their protocols, paper drafts, code, slide decks, and analyses over time, going through many different versions of a file before settling on a final one (or even never finalizing a document!). This is a totally normal part of doing research but does make managing those files a bit challenging.

Thankfully, there is a solution to dealing with such ever-changing documents: version control. Version control allows you to keep track of how your files change over time, either by taking a snapshot of the whole document or making note of the differences between two versions of the same document. This makes it easy to go back and see what you did or even revert to an earlier version of the file if you made changes that didn’t work out (just think how helpful this would be for files like protocols!).

The good news is that there are several ways to accomplish version control, depending on your need and skills.

The Simple Way

The easiest way to implement version control is to periodically save a new version of the file. Each successive file should either have an updated version number or be labelled with the current date. Here’s how a set of versioned files might look:

  • mydata_v1.csv
  • mydata_v2.csv
  • mydata_v3.csv
  • mydata_FINAL.csv

I like using the designation “FINAL” to denote the very last version – it makes for easy scanning when you’re looking through your files.

Dates are also useful, especially if you don’t anticipate ever having a final version of the document. Don’t forget to use ISO 8601 here to format your dates!

  • myprotocol-20151213.txt
  • myprotocol-20160107.txt
  • myprotocol-20160325.txt

The downside of this system is that it takes up lots of hard drive space, as you’re keeping multiple copies of your file on hand.

The Robust Way

For something more robust than the simple solution described above, I recommend a version control system such as Git. These systems come out of computer science and track the small changes between files, therefore taking up less disk space for file tracking. While originally designed only for code, Git is now being used by researchers to track many types of files.

The downside of version control systems that they can have a high learning curve (though Git has a GUI version that is less onerous to learn than the command line). But once you get over the curve, version control software is a really powerful tool to add to your research arsenal as it handles most of the legwork of versioning and offers other cool features besides, like collaboration support. Here’s a list of resources to help you get started with Git.

(A side note for those familiar with GitHub: be aware that Git and GitHub are two different things. Git is the system that handles versioning while GitHub is an online repository that houses files. They are ideally designed to be used together though it’s definitely possible to use Git without GitHub.)

An Alternate Way

As a last resort, you can leverage the fact that some storage platforms have built-in versioning. For example, both Box and SpiderOak keep track of each version of a file uploaded onto their systems. This is not the best option for versioning, as it takes control out of your hands, but it’s better than nothing and is useful in a pinch.

Final Thoughts

I hope you will consider using one of these methods of version control for your files. Version control is downright handy for when you accidentally delete a portion of a file, are sending documents back and forth with collaborators, need to revert to an earlier version of a file, or want to be transparent about evolving protocols. So pick the system that works best for you and go for it!