Most of us feel uneasy when we compare the mass of data we store on
our modern terabyte harddrives with a time when one to three tapes
would provide a suitable backup.
The most common backup strategy today, outside of companies that can
afford a tape robot, is external harddrives, usually USB or FireWire.
This is a very shallow form of backup.
Having one single backup is a problem, as it covers only a part of
the failure modes that can wreck your data. Also, if you are using
USB in particular, be warned that the USB connection can be very
flimsy. Both USB hardware and USB software (drivers) often suffer
from "mouse syndrome": they have been designed with simple devices in
mind, not with massive flows of data where not a single bit is
allowed to flip.
What do we want to guard against?
That's the first thing to do: make a catalog of the possible failure
modes. Here are the obvious ones:
- A harddrive dies.
- You accidentally delete or overwrite files ("fatfinger" mistakes).
- The backup medium itself fails.
At first sight this is pretty manageable. You use RAID-1, -5 or -6
against dead harddrives. You back up to a USB drive so that you can
"roll back" when you fatfingered your files. If the external drive
breaks, that's fine as long as you don't kill files on the primary
computer at the same time.
Now, here is the class of "silent corruption" events. This means
that you damage or otherwise lose files on your primary computer but
don't notice right away. By the time you notice, you have probably
overwritten that single external harddrive backup with the bad
version, leading to a permanent loss of the data.
Reasons for silent corruption include:
And some more things to keep in mind:
Here is what I ended up doing: a dedicated backup server running an
OS with ZFS, disks in a RAID, compression enabled, and the primary
machine pushing its data to it with rsync.
The backup then goes as follows: run rsync from the primary machine
to the backup server, then take a snapshot on the server.
Obviously, the snapshots then allow me to go back to the point in time
of any previously issued rsync command.
Snapshots are very small for small changes, since they work block by
block. So I can snapshot often, as long as I don't push junk.
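A minimal sketch of that push-then-snapshot cycle, assuming a
hostname of backupserver, a ZFS dataset tank/home and an exclude file
(all hypothetical names, not my actual setup):

```shell
#!/bin/sh
# Sketch only: "backupserver", "tank/home" and the exclude file are
# assumed names.

# Name each snapshot after the time of the rsync run.
snapname() {
    printf 'tank/home@%s' "$(date +%Y-%m-%d-%H%M)"
}

run_backup() {
    # Push from the primary machine; --delete mirrors deletions,
    # the exclude file keeps known junk out of the backup.
    rsync -aHx --delete --exclude-from="$HOME/.backup-excludes" \
        "$HOME"/ backupserver:/tank/home/ &&
    # Freeze the just-pushed state in a readonly snapshot.
    ssh backupserver zfs snapshot "$(snapname)"
}
```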
Junk management is required here. A snapshot becomes readonly once
taken. If you got some large piece of junk into a snapshot, you can
only get the space back by deleting that whole snapshot. I deal with
this in more detail below.
Evaluation against threats
Let's see how this solution does against the threats:
I mentioned previously that "junk management" is critical here.
Snapshots are readonly after they have been taken. If you ever pushed
large junk from the primary computer via the backup server into a
snapshot, you can only get the disk space back by deleting that
whole snapshot.
That sounds scary, but I found that there is a way to deal with
this more comfortably, and that is by comparing the sizes of the
snapshots. Let's say you have a couple of snapshots already. You push
data and make a new snapshot. You compare the sizes of the snapshots
and check whether the new one is much larger. If this is your
personal computer, you will usually have an idea whether you created
some legitimate large piece of data since the last snapshot. So you
are able to spot junk, kind of. If you spot junk, you can hunt it
down, exclude it from the backup script (the one that pushes from the
primary machine to the backup server) and take a new snapshot right
after the bad one. If the new snapshot meets expectations, you nuke
the bad one.
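One way to do that size comparison, as a sketch (the dataset name
tank/home is an assumption): have ZFS list the snapshots with the
space each one holds exclusively, then flag any that exceed what you
would expect:

```shell
#!/bin/sh
# Sketch: "tank/home" is an assumed dataset name. With -Hp, zfs list
# prints script-friendly output with exact byte counts, e.g.:
#
#   zfs list -t snapshot -o name,used -s creation -Hp tank/home
#
# Pipe that output into this filter to print every snapshot whose
# exclusive space exceeds a threshold in bytes.
flag_large() {      # usage: flag_large THRESHOLD_BYTES < zfs-list-output
    awk -v limit="$1" '$2 + 0 > limit + 0 { print $1, $2 }'
}
```

The `+ 0` forces a numeric rather than string comparison in awk.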
Typically people will use a scheme of varying frequency: the newest
snapshots you keep all of, of snapshots older than 3 weeks you keep
one per week, and of snapshots older than 3 months you keep one per
month. That means dropping snapshots, and you can selectively drop
snapshots that have odd sizes - after checking what's in that space,
as it might be valuable.
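The thinning rule above can be sketched as a small filter. It reads
one snapshot per line as "name<TAB>creation-epoch", oldest first; the
current time is passed in so the logic stays testable. The bucket
arithmetic (fixed 7-day weeks, 30-day months) is my simplification:

```shell
#!/bin/sh
# Sketch of the varying-frequency scheme: keep everything younger
# than 3 weeks, one snapshot per week up to 3 months, one per month
# beyond that. Input lines: "name<TAB>creation-epoch", oldest first.
thin() {            # usage: thin NOW_EPOCH < snapshot-list
    awk -v now="$1" '
    {
        age = now - $2
        if      (age < 21 * 86400) key = "all-" NR           # keep every one
        else if (age < 90 * 86400) key = "w" int($2 / (7 * 86400))
        else                       key = "m" int($2 / (30 * 86400))
        if (key in seen) print "drop", $1
        else           { seen[key] = 1; print "keep", $1 }
    }'
}
```

Feeding it the `name`/`creation` columns of `zfs list -Hp` would then
tell you which snapshots the policy says to destroy.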
My way of building this thing in practice:
There are userlevel programs that do this kind of snapshotting in
userspace. That way you wouldn't have to have an OS with snapshots on
the backup server. This won't be as fine-grained: the filesystem
works block by block, while these programs work file by file, AFAIK.
And of course there is no integrated RAID, and I don't think they do
compression.
I didn't try this yet, so I have no idea whether it works better or
worse than a full machine with ZFS.
Weaknesses and expectations
The primary thing I don't like about this is the lack of snapshots on
the primary machine.
Backup to the backup server takes hours, so I won't have, say, one
snapshot every hour - a thing that would be entirely practical with
snapshots on the primary machine. You would then drop these snapshots
after the big rsync to the backup server has run.
Unfortunately, my primary machine runs Linux, which for whatever
reason is years behind the other OSes when it comes to filesystems
and snapshots. LVM-level snapshots are a complete joke (sorry, but
they are raw-device snapshots and they get dropped on overflow; I'm
not making this up). Until Linux gets BTRFS, we are probably out of
luck.
I could experiment with NFS storage towards some fileserver that
has a modern filesystem, but that has obvious latency issues, GbE is
too slow even apart from latency/turnaround, and it requires one more
permanently running machine - with ECC RAM, the best power supply and
battery backup. Not gonna happen. A NetApp with 10GbE or
computer-to-computer SCSI would work, I guess.
iSCSI doesn't help, as it could only do raw-device snapshots.
I expect that some time in the future either Linux will have real
snapshots, or I will be able to run FreeBSD on my primary machine
again. Then I will certainly do a hierarchy of snapshots: a couple of
quick ones on the primary machine while the backup server is off or
doing something else; then the main backup runs, is saved to a
snapshot on the backup server, and on successful completion the
short-term snapshots on the primary machine are dropped. Or not
dropped, as long as there's space.