Thank you to Virginia Wilson at the University of Saskatchewan Library for allowing me to copy her excellent Research Data Management guide.
I have changed some of the information to reflect KPU's situation, but almost all credit goes to Virginia. (Any mistakes are likely mine, though!)
File Management
Large research projects can generate massive amounts of data, both in terms of size and number of files. Short, descriptive file names and a simple file hierarchy make these files easier to navigate and locate.
Once you create, collect, or start manipulating data and files, they can quickly become disorganized. To save time and prevent errors later on, you and your colleagues should decide how you will name and structure files and folders. Including a data dictionary or README file containing descriptive information about the data ('metadata') along with the data itself preserves context to ensure that you and others can understand the data in the short- and long-term. This documentation helps research teams and collaborators work more effectively and efficiently throughout the entire research life cycle, as well as greatly improving the data's future reusability.
"Documents" xkcd, CC-BY-NC 2.5
Consistent and thoughtful file naming will help you and your colleagues avoid frustration and work more efficiently. Establishing a naming convention provides consistency, which makes it easier to find and correctly identify your files and helps prevent version control problems when working on files collaboratively. It is wise to develop a logical structure in cooperation with your collaborators at the start of a project.
e.g. [element 1]_[element 2 WordPart-WordPart-WordPart]_[element 3].txt
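Recommended: around 30 characters. Definitely no more than 255 characters (the maximum length of a single file name on common Windows file systems; the full file path, including folders, is traditionally limited to 260 characters).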
If you use abbreviations, they must be explained in the data documentation (README).
Why? Shorter filenames are easier to read, don't cause problems with file systems, and reduce side-scrolling and column adjustment.
e.g. dates, file types, locations, people, version, procedures performed
Why? Helps users find the right file more easily.
Use _underscores or -hyphens as delimiters in filenames, or use CamelCase (words capitalized, no spaces)
Don't use any other special characters, e.g.: & , * % # ( ) ! @ $ ^ ~ ‘ { } [ ] ? < >
Why? Different computer programs handle special characters differently – filing order, etc.
Why? YYYYMMDD is an international standard (ISO 8601), ensuring interoperability. Computers sort YYYYMMDD in chronological order.
Either sequentially (e.g. v01, v02, ...) or with a unique date and time (e.g. 20140403_182206).
Why? Next year, will you remember what changed from one file to the next, and in what order?
Recommended: at most 3 - 4 levels deep
Why? Complex folder hierarchies are harder to navigate and offer more opportunities for filing errors. System back-ups may take longer.
(Adapted from: UBC data management planning documentation)
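The naming guidelines above can be checked automatically. The following is a minimal Python sketch, not an official tool: the function names build_filename and check_filename, the element order, and the 30-character limit are illustrative assumptions based on the recommendations in this guide. Adjust them to whatever convention your team agrees on.

```python
import re
from datetime import date

MAX_LENGTH = 30  # assumption: the "around 30 characters" recommendation
# Letters, digits, underscores and hyphens, plus one optional extension.
ALLOWED = re.compile(r"[A-Za-z0-9_-]+(\.[A-Za-z0-9]+)?")


def build_filename(project, description, when, version, ext="txt"):
    """Join the elements with underscores: date as YYYYMMDD, version as v01, v02, ..."""
    parts = [project, description, when.strftime("%Y%m%d"), f"v{version:02d}"]
    return "_".join(parts) + "." + ext


def check_filename(name):
    """Return a list of problems found in a file name (empty if none)."""
    problems = []
    if not ALLOWED.fullmatch(name):
        problems.append("contains spaces or special characters")
    if len(name) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters")
    return problems


# Hypothetical example values, for illustration only.
name = build_filename("soil", "pH-log", date(2014, 4, 3), 2, "csv")
print(name)
print(check_filename(name))
print(check_filename("final data (new)!.csv"))
```

A check like this could be run over a project folder before files are shared or deposited, so that naming problems are caught while they are still easy to fix.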
Version control is a way to track revisions to a data set or a process. If your research involves more than one person, it is essential. You will want to record every change to a file, no matter how small. Keep track of the changes to a file through your file naming convention and log files, or with version control software.
You can do it manually by including a version control indicator in the file name, such as v01, v02, v1.4. The standard convention is to use whole numbers for major revisions, and decimals for minor ones.
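If you track versions manually, a small script can work out the next file name for you. This is a minimal Python sketch under stated assumptions: the helper name next_version is hypothetical, and it only handles whole-number indicators of the form _v01 placed directly before the file extension (not decimal versions such as v1.4).

```python
import re

# Matches "_v" + digits at the end of the name, just before the extension.
VERSION = re.compile(r"_v(\d+)(?=\.[^.]+$|$)")


def next_version(filename):
    """Return the file name with its _vNN indicator increased by one."""
    match = VERSION.search(filename)
    if match is None:
        raise ValueError(f"no version indicator in {filename!r}")
    digits = match.group(1)
    # Keep the same zero-padding width, e.g. v09 -> v10.
    bumped = str(int(digits) + 1).zfill(len(digits))
    return filename[:match.start(1)] + bumped + filename[match.end(1):]


print(next_version("survey_20140403_v01.csv"))
```

In practice you would copy the current file to the new name before editing it, so that the earlier version stays untouched.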
There are several software programs designed for tracking versions, including Mercurial, TortoiseSVN, Apache Subversion, Git, and SmartSVN.
File sharing software can also be used to track versions. Google Docs records version changes as well.
As you think through how to manage this step, keep the following issues in mind:
File Formats
A computer file format is a particular way of encoding information within a computer file so that it can be recognized by an application. File formats are indicated by the file name extension, usually a full stop followed by three letters. Examples: .csv, .pdf, .txt
Open File Formats (.TIFF, .PDF, .XML, .MP3)
An open file format is one where the format specification is available to anyone, free of charge, so that the specification can be used in a variety of software without any intellectual property right limitations. Because the file specifications are publicly available, the open-source software community can ensure that data stored in these file formats remain accessible over the long term.
Open formats are recommended for file preservation purposes because they do not require specific software to access. Choose open file formats in order to:
Proprietary File Formats (.DOCX, .RAW, .DWG, .PSD)
Proprietary file formats typically work only with software provided by the vendor. File specifications are not freely available, so when the software is no longer supported, files in that format often become unreadable.
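To see which files in a project might need converting before long-term storage, you can sort them by extension. The sketch below is illustrative only: the two extension sets simply repeat the examples given in this guide, and the function names classify and audit are hypothetical. Extend the sets to suit your discipline.

```python
from pathlib import Path

# Extensions taken from the examples in this guide; not a complete list.
OPEN = {".tiff", ".pdf", ".xml", ".mp3"}
PROPRIETARY = {".docx", ".raw", ".dwg", ".psd"}


def classify(filename):
    """Label a file as 'open', 'proprietary' or 'unknown' by its extension."""
    ext = Path(filename).suffix.lower()
    if ext in OPEN:
        return "open"
    if ext in PROPRIETARY:
        return "proprietary"
    return "unknown"


def audit(folder):
    """Map every file under a folder (including subfolders) to its label."""
    return {str(p): classify(p.name) for p in Path(folder).rglob("*") if p.is_file()}


print(classify("Figure1.PSD"))
print(classify("report.pdf"))
```

Calling audit on a project folder returns a dictionary you can scan for "proprietary" entries when planning which files to export to an open format.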
Recommended File Formats
Note: Some research disciplines and industries treat a specific proprietary file format as a de facto standard which you may wish to follow.
Source: UBC Library.