I mentioned README.txt files in my previous post and I want to expand on the concept because READMEs are one of my favorite data management tools. Many of us keep notes separate from our digital data files, so our digital data is not always well documented or understandable at a glance. README.txt files cover this gap by letting you add notes about the organization and content of your digital files and folders. This helps coworkers and your future self navigate through your data.
README.txt files originated with computer code, where the README is the first file someone should look at in order to understand the code (as implied by the name). Because it is a plain .txt file, the information is readable on almost any system. This simplicity and portability make READMEs a great tool to co-opt for data management.
I strongly recommend that you use a README.txt file at the top level of your project folder to explain the purpose of the project, the relevant summary and contact details, and general organization of your files. This is equivalent to using the first page of your laboratory notebook to give a general description of your project.
Here is an example of a top-level README.txt file for an imaginary chemistry project:
Project: Kristin’s important chemistry project
Date: June 2013-April 2014
Description: Description of my awesome project here
Funder: Department of Energy, grant no: XXXXXX
Contact: Kristin Briney, firstname.lastname@example.org
All files live in the ‘ImportantProject’ folder, with content organized into subfolders as follows:
- ‘RawData’: All raw data goes into this folder, with subfolders organized by date
- ‘AnalyzedData’: Data analysis files
- ‘PaperDrafts’: Draft of paper, including text, figures, outlines, reference library, etc.
- ‘Documentation’: Scanned copies of my written research notes and other research notes
- ‘Miscellaneous’: Other information that relates to this project
Raw data files will be named as follows:
All files will be stored on my computer and backed up daily to the shared department server. I will also keep a backup copy in the cloud using SpiderOak.
If I hand someone this project folder, the README.txt contains enough information to understand the project and do basic navigation through the subfolders. It also says where all of the copies of my data live, in case one is accidentally lost. While not extensive, this information is invaluable to someone unfamiliar with my work who is trying to find and use my files, such as a boss or coworker.
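If you start new projects often, a top-level README like the one above can be generated from a small template so that every project records the same fields. Here is a minimal Python sketch of that idea. The function names `render_readme` and `write_readme` are made up for illustration, and the fields and folder names are simply the ones from my example; adjust them to your own project.

```python
from pathlib import Path


def render_readme(fields, folders):
    """Build top-level README text from labeled fields and folder descriptions."""
    lines = [f"{label}: {value}" for label, value in fields.items()]
    lines.append("")
    lines.append("Content is organized into subfolders as follows:")
    lines.extend(f"- '{name}': {purpose}" for name, purpose in folders.items())
    return "\n".join(lines) + "\n"


def write_readme(project_dir, text):
    """Write README.txt in project_dir, refusing to overwrite an existing file."""
    path = Path(project_dir) / "README.txt"
    if path.exists():
        raise FileExistsError(f"{path} already exists; edit it by hand instead")
    path.write_text(text, encoding="utf-8")
    return path


if __name__ == "__main__":
    # Values copied from the example README above.
    text = render_readme(
        {
            "Project": "Kristin's important chemistry project",
            "Date": "June 2013-April 2014",
        },
        {
            "RawData": "All raw data goes into this folder",
            "AnalyzedData": "Data analysis files",
        },
    )
    print(text)
```

The sketch refuses to overwrite an existing README.txt on purpose, since a README you have already edited by hand is worth more than a regenerated template.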
Besides having one top-level README.txt file, I also recommend using these text files throughout your digital file structure whenever you need them. If you cannot tell, at a glance, what all of the files and subfolders contain, you should create a README.txt (and possibly rename your files and folders!).
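To find places where a README might help, you can list the folders in a project that do not have one yet. This is a minimal Python sketch; the function name `folders_missing_readme` is hypothetical. Treat the result as a list of folders to review, not a rule that every folder needs its own README.

```python
from pathlib import Path


def folders_missing_readme(root, readme_name="README.txt"):
    """Return every folder under root (root included) that has no README file."""
    root = Path(root)
    folders = [root] + [p for p in root.rglob("*") if p.is_dir()]
    return sorted(f for f in folders if not (f / readme_name).is_file())


if __name__ == "__main__":
    # 'ImportantProject' is the project folder name from the example above.
    for folder in folders_missing_readme("ImportantProject"):
        print(folder)
```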
Here is an example of a low-level README.txt, which documents the differences between several versions of an analyzed dataset:
Description of files in the “Analysis/ReactionTime/KMnO4” folder
- KMnO4rxn_v01: Organizing raw data into one spreadsheet
- KMnO4rxn_v02: Trying out first-order reaction rate
- KMnO4rxn_v03: Trying out second-order reaction rate
- KMnO4rxn_v04: Reverting to v02/first-order fitting and refining analysis
- KMnO4rxn_FINAL: Final fit and numbers for reaction rate
The graphs corresponding to each file version are in the ‘Graphs’ subfolder, with correspondence explained by the README.txt contained therein.
You can see that READMEs don’t have to be large files. Instead, they just need to contain enough information for a reader to know what they are looking at.
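A short listing like the KMnO4 example above can be started from a script that lists the files in a folder and leaves a blank for each description. This is a minimal Python sketch with a hypothetical function name, `readme_stub`. The descriptions still have to be written by hand, because only you know what changed between versions.

```python
from pathlib import Path


def readme_stub(folder, readme_name="README.txt"):
    """Return starter README text listing each file in folder with a placeholder."""
    folder = Path(folder)
    names = sorted(
        p.name for p in folder.iterdir() if p.is_file() and p.name != readme_name
    )
    lines = [f"Description of files in the '{folder.name}' folder"]
    lines.extend(f"- {name}: (describe this file)" for name in names)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Prints a stub for the current folder; paste it into README.txt and fill it in.
    print(readme_stub("."))
```

Subfolders are skipped in this sketch, since in my example they get their own README.txt.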
README.txt files are ostensibly for other people who might use your data, but they are also useful for you, the data creator, if and when you come back to an older set of data. We tend to forget small details over time, and a good README.txt serves as a reminder of those details and an easy way to reacquaint ourselves with our older data.
It takes only a small amount of time to create a README.txt file, but these files fill an important documentation gap and are incredibly useful for data given to others and data with long-term value. I encourage you to create a few README.txt files and improve your data management!