Backing up…with Dissertation

I wonder what's behind Door Number One

The Scary Stuff

This tweet caught my attention this morning:

The “lost laptop” note from the student, complete with login password and instructions for how to find the specific folder containing the research, reeks of understandable desperation. As I read it, my heart sank at merely the thought of losing dissertation project data. There’s irreplaceable, and then there’s irreplaceable. I cannot imagine the feeling of soul-crushing loss that would accompany the discovery of a lost collection of that much research.

That moment of vicarious terror felt more familiar than I’d like. I recalled making a brush-with-death post on Facebook not too long ago. While coding interview data, the analysis software I was using decided to have trouble saving my work. The program suggested that I save my work to a new location, then compare the new with the old. Because apparently comparing a database full of text and coding is a trivial thing. (I make it trivial: when the “old” version has a last-saved time of hours before I started working, it’s a safe bet that’s a bad version to keep.)

Windows error dialogue: “A problem has occurred and you will be asked to specify a new destination for your project.”

These close calls made me double-check that the work I need to preserve would survive disaster. Posting about the trouble created an interesting conversation on Facebook, and responses to the Rutgers laptop loss brought numerous suggestions on Twitter as well. Most people recommend Dropbox, for excellent reasons: it’s free, it’s offsite, and it’s automatic. Add in the fact that it’s redundant, and it meets all three requirements of Merlin Mann’s Holy Trinity of Backups. However, a Dropbox backup, lovely and automatic as it is, sits on someone else’s server that you have no control over, and that leads to a very significant issue that can easily go overlooked:

Dropbox should not be used for personally identifiable data collected as part of a research project.

That means my survey responses (with student names and email addresses attached) or my portfolio assessment data (with student names and the equivalent of a grade) cannot be stored on a system that can potentially be accessed by an outside party. This complicates the default backup strategy for most grad students. Since most of our work is textual, and therefore small and compressible, it can be saved on even the smallest of Dropbox plans. Most students I know use the service and forget about it. That’s fine until the data you’re working with shouldn’t be on there in the first place.

The Critical Elements

Dissertations need special handling, and that means thinking about your backup strategy. I’d like to review the Holy Trinity of Backups and two additional elements: encryption (essential for research) and recoverability (essential for any backup). And because modern technology makes this five-part plan simple, why not adopt it for all your data?

Backups must be automated.

If you have to think about making a backup, it won’t work. The more important a project is, the more likely you are to get deeply involved in it. The more involved you are in working on the project, the less involved you are in backing it up. If your backup process doesn’t work for you while you’re working for yourself, it’s a broken process. Make sure whatever solution you use can run without your intervention. There’s nothing better than, after accidentally losing data, finding that it accidentally exists in a backup.

Backups must be redundant.

This is the real beauty of Dropbox and the most essential distinction of a backup vs. simple storage. A backup must be a duplicate of what you normally keep somewhere else. In other words, your backup must be a completely separate thing from the thing it’s backing up. Most particularly, it must be on a separate drive. That way, when the drive inside your computer dies (I only had two students lose hard drives this semester), the other drive containing your backup is okay. Or vice-versa.
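One rough way to sanity-check the separate-drive rule is to ask the operating system whether the folder you work in and the folder holding your backup live on the same volume. The Python sketch below is only an illustration, and the function name is hypothetical. It compares the device IDs (`st_dev`) that the system reports for two paths. Be aware of the limits: two partitions on one physical disk report different IDs, so a result of `True` is necessary but not sufficient. A result of `False`, though, means the "backup" is definitely sitting on the same volume as the original.

```python
import os


def on_separate_volumes(source: str, backup: str) -> bool:
    """Return True when the two paths report different device IDs.

    Assumption: differing st_dev values mean different volumes. This
    cannot tell two partitions of one physical disk apart, so treat
    True as "possibly redundant" and False as "definitely not".
    """
    return os.stat(source).st_dev != os.stat(backup).st_dev
```

You would call it with your working folder and the mount point of the backup drive, whatever those happen to be on your machine.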

Backups must be off-site.

If your house burns down, the backup drive that’s sitting on your desk won’t help. If you use Dropbox, check to see whether all your family photos are kept there along with your documents. My guess is that they aren’t. Buy Yet Another Hard Drive™ and make an extra copy of all your stuff, then leave it in your desk drawer at work. That way, in order to destroy all record of your wedding portraits, nefarious Acts of God would have to get both your home and your office. Obviously, the closer together those two places are, the less willing you should be to consider them off-site.

Backups should be encrypted, a must if they contain sensitive research data.

If your house is burglarized, and someone grabs the backup drive off your desk and plugs it into another computer, will they have instant access to everything you’ve ever done or saved? What about your passwords, account numbers, or other sensitive information? It’s easier than ever to encrypt entire disks, and it’s well worth it for the peace of mind it allows. Dropbox should not be used for backing up sensitive research data because you don’t know who has access to that data on their servers. Keep it close to home.

Backups must be recoverable.

Try it sometime: take the most important thing you’re working on and move it somewhere random. Then restore it from your backup system. See what’s involved in getting the data back in place, and decide whether it actually works. Be particularly aware of what programs you have to use to get to the data, because those would have to be in place before you can recover after a disaster. Make sure you can go through the process. If you can’t, your backup is useless.
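If you want more than an eyeball check after a test restore, you can compare the copy you moved aside against the copy that came back from the backup. The Python sketch below is one way to do that, assuming both copies are ordinary folders you can read. All of the function names are hypothetical. It computes a SHA-256 digest for every file in each folder and reports anything missing or changed in the restored copy.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of one file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def restore_problems(original: Path, restored: Path) -> list[str]:
    """List files that are missing from, or differ in, the restored copy."""
    want = tree_digests(original)
    got = tree_digests(restored)
    problems = []
    for name, digest in want.items():
        if name not in got:
            problems.append(f"missing: {name}")
        elif got[name] != digest:
            problems.append(f"changed: {name}")
    return problems
```

An empty list means every file in the moved-aside copy came back byte-for-byte. It does not report extra files that exist only in the restored copy, which is usually harmless for this kind of test.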

My Backup Process

Here’s what I do to address all five elements. It’s a process I’ve been using for years now, and it’s saved my butt more times than I’d care to admit.

Time Machine

Included in every Mac OS since October 2007, Time Machine is a backup system that’s designed to be disgustingly simple. You plug in an external drive, the computer asks if you want to use it for backup. You say yes, you go have dinner. Any time that drive is plugged in, your computer will back up. (If you have to remember to plug in, it’s not automatic, and you’re breaking rule #1.) I have a server on my network that hosts the backup disk used by all three of the computers in the house. That’s a little on the advanced side of things, but the same result can be achieved by plugging a hard drive into an AirPort Extreme base station—a wireless router that can host drives.

screenshot of Time Machine panel of System Preferences

Starting with Mountain Lion (released in July 2012), Time Machine supports multiple backup drives, cycling among them whenever they’re available. After 10 days of no access, Time Machine displays a message saying it’s been a while since it backed up to a particular drive. That little notification helps me keep my off-site backups updated.

Time Machine notification that a backup disk hasn't been available for 11 days

When I’m told it’s been too long since a backup was made, I put the hard drive from my desk in my bag to take to work. Once there, I put that drive in my locked file cabinet and take out the one that was sitting in my office, ready to bring home, plug in, and update. I cycle my offsite backups every 10 days or so, and my local backups happen every hour.
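If your backup tool doesn’t nag you the way Time Machine does, the same reminder logic is easy to approximate. The Python sketch below is not how Time Machine works internally; it’s just an illustration of the 10-day rule described above, with hypothetical function and parameter names. You supply the date of the last off-site backup yourself.

```python
from datetime import datetime, timedelta


def needs_rotation(last_backup: datetime, now: datetime,
                   max_age_days: int = 10) -> bool:
    """Return True when the off-site drive is older than the limit.

    Assumption: max_age_days=10 mirrors the 10-day reminder described
    in the text; pick whatever interval suits your own routine.
    """
    return now - last_backup > timedelta(days=max_age_days)
```

For example, a drive last updated on the 1st of the month would be flagged on the 12th but not on the 6th.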

Time Machine also handles encryption. When adding a new drive, Mountain Lion presents an option to encrypt the backup. I encrypt the entire drive so if anyone steals my backup, they need my password to get at the information on the disk.

FileVault

Since the release of Lion in July 2011, Mac users can encrypt their entire hard drive using a tool called FileVault 2. This system is completely transparent once it’s been turned on, but it adds the peace of mind that if your computer is ever stolen, the thief would have a very difficult time accessing the drive. Particularly for those with portable computers, turning on FileVault is a no-brainer, adding simple and thorough security. (It’s a good idea to require a password when your computer wakes from sleep or the screensaver, as well.)

Dropbox

For less-sensitive data, I do use Dropbox. That way, as soon as I revise a document, a copy is sent to a server in case of trouble. But for everything else on my computer, the backup on the network is only ever an hour old, and the backup at work is less than two weeks old.

I sincerely hope I never have my laptop stolen. But if I do, I know I have automatic, redundant, off-site, encrypted, recoverable backups. Do you?