My home backup system

After spending the last five years feeling guilty, I now, finally, have my laptop backing up the data I care about to another machine on my network. Here’s how I did it. It’s a relatively long and complicated process, but it means that everything happens automatically and by magic, and I don’t ever have to interact with it, which is what I want.

The first component I needed was some backup space: a machine on my network that I could send the backups to. I did look at online backup space (Amazon’s S3 and similar) like all the cool kids, but I just can’t get on with it, and I resent paying because I’m a cheapskate. So, it was to be a box on my network. Now, there are useful NAS machines around, which just get plugged in and automatically export their disc space (normally as a Windows share, with Samba), and I looked at those too (there’s the Terastation, etc, etc). However, I needed an always-on server for another purpose anyway, so I decided to go with a real machine. A machine cobbled together out of the Big Box Of Machine Bits, of course.

Setting up the server

It’s got two disc drives in it, and I divided the first disc into two partitions, one with 1GB and the other with all the rest. Install Ubuntu Linux 6.10 Edgy, server edition, on the 1GB partition. (I actually installed dapper and then upgraded it to edgy, for that bleeding edge greatness; at this writing, edgy is only at RC stage.) After that, we want to take all the remaining space on the machine (one big partition on disc 1, and all of disc 2) and make it one big block of disc space; this is what LVM, the Logical Volume Manager, is for. Note that all this stuff can be done with proper GUI tools, but I don’t have a GUI on this machine because it’s a server and I’m trying to conserve disc space. This bit’s also from memory, so be very careful and don’t just slavishly follow it.

# First, make the partition available to LVM, by 
# making it a "physical volume". This is LVM-speak for 
# "a bit of a disk that I can use"
pvcreate /dev/hda3 # the big partition on the first disc
pvcreate /dev/hdb # and all of the second disc

# Now, create a "volume group". This is LVM-speak for 
# "a big block of disc space all managed together"
vgcreate volumegroup /dev/hda3 /dev/hdb

# Next, create a "logical volume". LVM-speak: "something
# that looks like a disc drive, so you can mount it"
# First, find out how big it can be
vgdisplay | grep "Total PE"
  Total PE              11833
# now create the logical volume at that size
lvcreate -l 11833 volumegroup -n logical1

# You now have a device /dev/volumegroup/logical1
# which you can treat as if it were a disc.
# Put a filesystem on it (ext3 here, but any filesystem will do)
mkfs.ext3 /dev/volumegroup/logical1
# Create a dir to put it in
mkdir /space
# and add it to /etc/fstab so it gets mounted. Add the line:
/dev/volumegroup/logical1 /space auto   defaults        0       0
# then mount it without waiting for a reboot
mount /space

After that complex little bit (again, if you aren’t tight like me, do it with the GUI, it’s easier), you will have a directory /space on the machine with loads of space in it. Install openssh-server and rsync, because we’ll need them later.
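On Ubuntu, installing both is one command (run on the server; these are the standard package names):

sudo apt-get install openssh-server rsync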

Rotating backups

The way I want my backups to work is as follows. Every night, each machine on my network should connect, and send everything that’s changed since yesterday. When I look on the backup server, there should be a folder for each machine, and there should be in there a folder per day. Each folder should look like a complete backup, but if a file hasn’t changed since yesterday it shouldn’t take up any more disc space. So, the folder structure should look, say, like this:

/space
  /stuart
    /2006-10-24
      /folder1
        /file1
        /file2
        /newfile1
      /folder2
        /file3
    /2006-10-23
      /folder1
        /file1
        /file2
      /folder2
        /file3

and the 2006-10-24 folder should have all the files in it but only take up as much space as newfile1. Complicated, but part of the reason I specified this is that I know it’s possible. (The main reason, of course, is that I’m tight and want to save disc space.) Making this happen involves two stages: making a hardlink tree, and using rsync.

The hardlink tree

If you can get over how much this sounds like something out of an Enid Blyton book, it’s a cool technique. I’m not going to explain hardlinks and inodes and things like that here, because there are many other descriptions elsewhere. Suffice to say that, if you have a folder, you can make a duplicate of that folder with cp -al folder newfolder, and that duplicate will look the same and be full of real files but not take up any disc space. My nightly backup therefore needs to do the following:

  1. Copy last night’s backup to a new folder, named for the current date
  2. Change the data in the new folder to look like my laptop, so it’s got all yesterday’s data but with any changes I’ve made today
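If you want to convince yourself that the cp -al trick really does cost nothing, here’s a quick throwaway experiment (the paths are just examples):

mkdir -p /tmp/linktest/folder
# make a 50MB file to copy
dd if=/dev/zero of=/tmp/linktest/folder/bigfile bs=1M count=50
cp -al /tmp/linktest/folder /tmp/linktest/newfolder
# same inode number for both, so it's one file on disc
ls -li /tmp/linktest/folder/bigfile /tmp/linktest/newfolder/bigfile
# still about 50M, not 100M, even though there are two "copies"
du -sh /tmp/linktest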

The issue here is: how do you know what last night’s backup is called? I’ve solved this by making sure there’s a symbolic link called current which always points to the most recent backup. So, the above process actually becomes:

  1. Copy the current folder to a new folder, named for the current date
  2. Change the data in the new folder to look like my laptop, so it’s got all yesterday’s data but with any changes I’ve made today
  3. Change the current link so it points to the newly created most recent backup

The script that does this is stored in /space/begin-backup, made executable with chmod +x /space/begin-backup, and looks like this:

#!/bin/bash

PERSON=$1
BROOT=/space

if [ -z "$PERSON" ]; then
  echo You must pass the name of a backup dir
  exit 1
fi

PDIR=$BROOT/$PERSON/

# If person dir doesn't exist, create it
if [ ! -d $PDIR ]; then mkdir $PDIR; fi

# If there's no current dir, create an empty one and link it
if [ ! -d $PDIR/current ]; then
  mkdir $PDIR/first
  ln -s first $PDIR/current
fi

DT=$(date -Iseconds)

# Hardlink-tree the existing recent dir
cp -al "$(readlink -f $PDIR/current)" $PDIR/$DT
# and link current to the new hardlink tree
rm $PDIR/current
ln -s $DT $PDIR/current
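You can give it a quick run by hand on the server to check that it behaves (stuart being the name of the backup directory, as in the folder structure above):

/space/begin-backup stuart
ls -l /space/stuart
# you should see a dated directory and a 'current' symlink pointing at it
# (on the very first run there'll also be an empty 'first' directory)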

We’ll come back to how you run this in a minute.

Rsync

The “change the data in the new folder to look like my laptop” bit is done with rsync, which is complex but brilliantly clever. In essence, rsync is like copy (or cp), except that it compares the source and the destination and only sends the changes over. On my laptop, I can do

rsync -avz --delete -e ssh \
    /some/folder/to/back/up \
    myserver:/space/stuart/current/

and that will copy /some/folder/to/back/up over to the server. Importantly, if that folder is already in the backup space, in the current folder (because we backed it up yesterday) then it’ll only copy the changes over. This is why we make sure that there’s a folder called current with the contents of last night’s backup! Exactly how we run this rsync command we’ll come on to in a minute. Patience, Iago.

Choosing what gets backed up

I don’t want to back up everything. I don’t have the space, and to be frank I have a lot of crap lying around on my machine. So I need a very easy way of tagging something for backups. This is a perfect use of emblems; I can “tag” a file or a folder in the file manager with a special “backup” emblem, and that should indicate to my backup process that that file or folder wants to be included in the backup. Ubuntu doesn’t have a backup emblem included by default, but adding one is easy, and explained in the docs. Pick yourself an image (I use this little tape) and add it as an emblem, and then go through your machine and add it to every file or folder that needs backing up. (This will, if applied to a folder, back up everything inside it. If you need it to back up only some of the stuff inside it then you’ll have to not apply it to the folder. Yes, this is awkward, but I don’t need to do that.) Applying emblems is also in the documentation; a quick way if you’re doing this a lot is to pop up the Edit > Backgrounds and Emblems window and just repeatedly drag your new backup emblem to everything.

SSH with no password

One final preparation step: in order that the backup can run without me being around, I need to be able to make an ssh connection from my laptop to the server without entering a password. I’m not going to describe how to do this because there are plenty of guides out there on the web.
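The short version, if you’ve not done it before, is roughly this (servername being whatever your backup box is called, as in the script further down):

ssh-keygen -t rsa      # accept the defaults and leave the passphrase empty
ssh-copy-id servername # copy your new public key over to the server
ssh servername         # should now log you straight in, no password asked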

Make it so

Now, finally, after lots of setup, it’s time to actually make it all happen. To summarise, then, to do a backup, we need to:

  1. Run, on the server, the copy-last-night’s-backup script
  2. Get the list of all the files with the backup emblem
  3. Use rsync to copy all those files into the new backup folder on the server

To get the list, we can use my findemblem.py script (and you thought I just wrote it for fun!). The final script, dobackup.sh, which actually does the work, just does the above steps, and looks like this:

#!/bin/bash

# Do backups to the rsync server
# You must have already set up a passphraseless ssh key to the ssh server
# so that "ssh servername" just logs you in.

BK=$(dirname $0)
BKNAME=stuart

# First, tell it to clock over the backup
ssh servername /space/begin-backup $BKNAME

# Now, do the backup
python $BK/findemblem.py backup | while read fn; do
  rsync -avzq --delete -e ssh "$fn" \
    servername:/space/$BKNAME/current
done
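Make the script executable and give it one run by hand before handing it over to cron, so you can see any ssh or rsync grumbles on screen:

chmod +x /full/path/to/dobackup.sh
/full/path/to/dobackup.sh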

All that remains now is to schedule this script to run every night, by editing your tasklist with crontab -e and adding the line

40  4  *   *   *     /full/path/to/dobackup.sh

And, lo and behold, you have overnight backups. All done and dusted. Phew.
