tadhg.com

Backups

23:55 Tue 13 Mar 2007. Updated: 09:27 14 Mar 2007

For years I’ve had no coherent backup strategy. For someone who does so much on computers, that’s rather insane. It’s been a project of mine to have a comprehensive and regularly-executed backup plan for about a decade. I’m not quite there, but I think that it’s finally within my grasp…

The first question is: “what do I want to back up?”, and the second question is “what is the ideal post-data-meltdown scenario?”.

Ideally, of course, I want everything backed up, and I want to resume immediately after any loss of data/hardware. More realistically, however, I want to make completely sure that I can recover the following:

* personal files (writing, notes, contact information, non-email correspondence, etc.)
* email (yes, it deserves a category of its own)
* application configuration
* web development code and assets
* other code
* databases
* graphics
* photos
* work files

I’m not as concerned about music. Perhaps I should be, but it seems like it’s a) replaceable, b) awkward to back up due to size, c) already backed up by physical media.

Naturally, I don’t just want backups, I want history. Version Control, specifically Subversion, gives me this. It also gives me something else that makes backing up far easier: centralization. If I use Subversion for everything, it follows that there’s a central place for everything to go. My move to Subversion is a key piece of why a backup strategy seems close at hand.

One tool that’s proven incredibly useful on Windows in my move to using Subversion for everything is linkd.exe, which is a rough analog of the Unix ln command, although it works only on directories. It allows you to create a new “quasi-directory” that points to another directory, and acts like it. So, if your checked-out Subversion stuff is in C:\Documents and Settings\username\someotherdirectory\subversion, you can use
linkd c:\svn "C:\Documents and Settings\username\someotherdirectory\subversion"
and thereafter refer to c:\svn to get there. This also means that you can “trick” applications that are finicky about where their configuration files go into being fine with those files living in your Subversion sandbox.

So, after centralization, the following files (will) comfortably live in Subversion as part of my everyday practice:

* personal files (writing, notes, contact information, non-email correspondence, etc.)
* application configuration (after some linkd.exe setup)
* web development code and assets
* other code
* graphics
* work files

It doesn’t make sense to put email into Subversion, because I rarely go back to edit prior email, so it doesn’t really need version control per se. My mail system (using Maildir rather than mbox) can be effectively backed up using rsync, which essentially does “smart copy”—copying files to a remote location, but only doing an actual copy per file if that file is new or has been changed since the last write. So the first time I back up my thousands of emails, it’ll copy all of them, but the second time it’ll only copy the new files.
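To make the “smart copy” idea concrete, here is a minimal Python sketch of the skip-if-unchanged check. It is only an illustration of the concept, not how rsync is implemented: the names `needs_copy` and `smart_copy` are made up for this example, it works on local directories only, and it compares size and modification time (which mirrors rsync's default quick check) without any of rsync's network or delta-transfer machinery.

```python
import shutil
from pathlib import Path


def needs_copy(src: Path, dst: Path) -> bool:
    """Return True if dst is missing or differs from src in size or mtime."""
    if not dst.exists():
        return True
    s, d = src.stat(), dst.stat()
    # Whole-second comparison avoids false mismatches on filesystems
    # with coarse timestamp precision.
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)


def smart_copy(src_root: Path, dst_root: Path) -> list[Path]:
    """Copy new or changed files from src_root into dst_root.

    Returns the relative paths that were actually copied.
    """
    copied = []
    for src in sorted(src_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(src_root)
        dst = dst_root / rel
        if needs_copy(src, dst):
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 preserves the modification time
            copied.append(rel)
    return copied
```

Run against a Maildir, the first call would copy every message and later calls would copy only the files that arrived since. Maildir suits this well because each message is its own file, whereas an mbox file changes as a whole whenever mail arrives.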

My databases (primarily for this blog and sfmagic.org) aren’t really files that I work on, so they won’t get caught in the centralization net. Instead, I’ll have to write scripts to export them, and then add the results of these exports to Subversion. Version control does make sense for these, since it might be very important to be able to roll back to specific revisions. So these will go into Subversion via automated processes.
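A rough sketch of what such an export script might look like follows. Several things here are assumptions rather than anything settled above: that the databases are MySQL (so `mysqldump` is the export tool), that credentials come from the usual client configuration file, and that a Subversion working copy already exists at the path passed in. The function names and the `blog` database name in the usage line are hypothetical.

```python
import subprocess
from pathlib import Path


def export_commands(db_name: str, working_copy: Path) -> list[list[str]]:
    """Build the commands that dump one database and commit the result.

    Assumes MySQL and an existing Subversion working copy (both assumptions).
    """
    dump_file = working_copy / f"{db_name}.sql"
    return [
        # --skip-dump-date drops the timestamp comment, so an unchanged
        # database produces an identical file and no spurious revision.
        ["mysqldump", "--skip-dump-date", f"--result-file={dump_file}", db_name],
        # --force makes "svn add" tolerate a file that is already versioned.
        ["svn", "add", "--force", str(dump_file)],
        ["svn", "commit", "-m", f"Automated export of {db_name}", str(dump_file)],
    ]


def export_and_commit(db_name: str, working_copy: Path) -> None:
    """Run each command in order, stopping at the first failure."""
    for command in export_commands(db_name, working_copy):
        subprocess.run(command, check=True)
```

Calling something like `export_and_commit("blog", Path("/path/to/working-copy"))` from a scheduled job would then give each export its own revision, which is what makes rolling back to a specific point possible.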

While I put my graphics work in Subversion, photos are different. They’re rather large, and they don’t tend to change—like email, they’re better suited to additive storage, and rsync. If I work on photos, I move them into graphics, where they will be put into version control. But for the vast majority of them, that would be overkill.

So the areas split into:

* Subversion daily practice (personal, web development, other code, graphics, work, application configuration)
* Subversion via automated export/backup (databases)
* rsync (email, photos)

So what about the actual backup part? Where does this stuff get backed up to?

For Subversion, I’m going to keep a second copy of my repositories on a second server on my local network. In addition, every machine on my network will have the latest version checked out daily. So two copies of the total history, and multiple copies of the latest version.

For rsync, I’ll copy everything to a second server.

That takes care of local “soft” backups. Next is remote “soft” backups. These will go to unworkable, a server in Ireland run by my brother and some of his friends that I’ve paid for space on, and hopefully to Seth’s server (we’ve talked about a reciprocal backup deal for a while). The remote backups are trickier, because I want to both encrypt and compress most of the stuff before backing it up, making straight rsync less useful, and because disk space and bandwidth concerns make filesize more of an issue. The process I’d like to follow is this:

* Keep 8 weeks’ worth of weekly backups
* Every week, test the 7-week-old backup for data integrity
* If it fails, raise the alarm…
* If it passes, delete the 8-week-old backup
* Pass or fail, copy the newest backup over
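The weekly steps above can be sketched as a small Python function. This is one interpretation, not a finished tool: it assumes the backups are held in a list ordered oldest first, so that with eight on hand the first entry is the 8-week-old one and the second is the 7-week-old one. The integrity test is passed in as a function (`is_intact`, a hypothetical name) because how to perform it is still an open question.

```python
from typing import Callable


def weekly_rotation(
    backups: list[str],
    newest: str,
    is_intact: Callable[[str], bool],
    keep: int = 8,
) -> tuple[list[str], list[str], bool]:
    """Apply one week's rotation.

    backups is ordered oldest first. Returns (kept, deleted, alarm).
    """
    kept = list(backups)
    deleted: list[str] = []
    alarm = False
    if len(kept) >= keep:
        seven_week_old = kept[1]  # second-oldest backup
        if is_intact(seven_week_old):
            # Safe to drop the oldest: a verified backup still follows it.
            deleted.append(kept.pop(0))
        else:
            # Keep everything, including the oldest, and raise the alarm.
            alarm = True
    # Pass or fail, the newest backup is always added.
    kept.append(newest)
    return kept, deleted, alarm
```

One consequence of this reading is that a failed test leaves nine backups in place until someone investigates, since the oldest one is only deleted once its successor has been verified.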

The main problem here is that “testing for data integrity” might be difficult and/or time-consuming. I suspect that I’ll be able to automate it at least to some degree. If I use straight rsync for the photos, testing them will be simpler, since it’s easy to pull the files back to my local network before performing operations on them.
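One possible way to automate part of the integrity test is to record a checksum for every file when the backup is made, and compare against that record later. The sketch below assumes this approach; it is not something the plan has settled on. The function names and the JSON manifest format are invented for the example, and the manifest is assumed to live outside the backup directory. For encrypted archives it would only show that the archive is byte-for-byte unchanged, not that it decrypts and restores correctly.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large backups need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(backup_dir: Path, manifest: Path) -> None:
    """Record a checksum for every file under backup_dir.

    The manifest should be stored outside backup_dir.
    """
    sums = {
        p.relative_to(backup_dir).as_posix(): sha256_of(p)
        for p in sorted(backup_dir.rglob("*"))
        if p.is_file()
    }
    manifest.write_text(json.dumps(sums, indent=2))


def failed_files(backup_dir: Path, manifest: Path) -> list[str]:
    """Return the files that are missing or no longer match the manifest."""
    expected = json.loads(manifest.read_text())
    return [
        name
        for name, digest in expected.items()
        if not (backup_dir / name).is_file()
        or sha256_of(backup_dir / name) != digest
    ]
```

An empty list from `failed_files` would count as a pass for that week's test; anything else is the cue to raise the alarm.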

Lastly, I haven’t addressed backups written to storage media, like DVD. I’ll plan for that too, because there are data-corruption scenarios in which the local stuff gets tainted somehow and corrupts the off-site backups over time. But writing to DVD will be easier once I have all of the rest of this in place.

That’s the plan. I think I’ll have most of it in place by the end of March, and the remote aspects by the end of April. And that will be one less nagging worry to carry around.
