Ekstatische Lyriken Pinnwand

Backup Software

written by Pj on Wednesday May 1st, 2013 -- 5:30 a.m.

I do wish there were some good backup software for Linux.  What exists seems to mostly misunderstand what a backup is. 

For example, most backup software seems to think that "backup" means "make a copy of stuff, overwriting the previous copy." You know, because when I accidentally corrupt a file and fail to notice until two weeks later, all I care about is that I have a second copy of that corruption somewhere.  Most software also seems to assume that whatever media I'll be backing up to is larger than the data I'm backing up, such that all it needs to do is copy the files.  Indeed, many web sites list "tar" as a backup program.

I believe that ideal backup software would do the following:

Upon each invocation, calculate an SHA-1 sum of each file in the filesystem.  Add this information to a database which keeps the results of every previous scan, so that it is possible to see the filesystem not just as it was at the last backup, but as it was at any time a backup was performed.  Then, of the SHA-1 sums found in the filesystem, take the ones which have been least recently backed up, and pick just enough of them to fill whatever media I'm using for backup (CD, DVD, tape, whatever).  Next, record in that database which disk number the files were written to, such that for each SHA-1 sum, it lists every disk that data was ever written to, in case any disk should fail.  Also write this database to each backup.  Since the database maps file names to SHA-1 sums, the files on the disk can simply be named after the SHA-1 sum of their contents, which avoids problems with file name restrictions.  Finally, just toss the ISO somewhere so I can encrypt it if I want to, and I probably do, since off-site backups are a good idea and that's easier when the people I give the disks to don't have to worry about protecting my data -- they can just toss them in a box somewhere and if the disks go missing, it doesn't matter.
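A rough sketch of the steps above in Python, just to make the idea concrete.  Everything named here is made up for illustration: the SQLite table layout, the function names (`scan`, `pick_for_disk`, `record_disk`), and the simple greedy fill are all assumptions, not an existing tool.  Writing the ISO, naming files by hash on the disk, and encryption are left out.

```python
import hashlib
import os
import sqlite3
import time


def sha1_of(path, chunk_size=1 << 20):
    """Return the hex SHA-1 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def open_db(db_path):
    """Open (or create) the catalog. The schema is hypothetical."""
    db = sqlite3.connect(db_path)
    db.executescript(
        """
        CREATE TABLE IF NOT EXISTS scans  (scan_id INTEGER PRIMARY KEY, taken_at REAL);
        CREATE TABLE IF NOT EXISTS files  (scan_id INTEGER, path TEXT, sha1 TEXT, size INTEGER);
        CREATE TABLE IF NOT EXISTS copies (sha1 TEXT, disk_no INTEGER, scan_id INTEGER);
        """
    )
    return db


def scan(db, root):
    """Record path, SHA-1 and size of every regular file under root as a new scan."""
    scan_id = db.execute(
        "INSERT INTO scans (taken_at) VALUES (?)", (time.time(),)
    ).lastrowid
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks, sockets, device nodes, etc.
            db.execute(
                "INSERT INTO files VALUES (?, ?, ?, ?)",
                (scan_id, path, sha1_of(path), os.path.getsize(path)),
            )
    db.commit()
    return scan_id


def pick_for_disk(db, scan_id, capacity):
    """Choose least-recently-backed-up contents until the disk is full.

    Contents never written to any disk sort first (last_backed_up = 0).
    Identical files share one SHA-1 and are only stored once.
    Note: a file larger than `capacity` is never chosen by this sketch.
    """
    rows = db.execute(
        """
        SELECT f.sha1, MIN(f.path), MAX(f.size),
               COALESCE(MAX(c.scan_id), 0) AS last_backed_up
        FROM files f LEFT JOIN copies c ON c.sha1 = f.sha1
        WHERE f.scan_id = ?
        GROUP BY f.sha1
        ORDER BY last_backed_up ASC, f.sha1 ASC
        """,
        (scan_id,),
    ).fetchall()
    chosen, used = [], 0
    for sha1, path, size, _last in rows:
        if used + size <= capacity:
            chosen.append((sha1, path, size))
            used += size
    return chosen


def record_disk(db, scan_id, disk_no, chosen):
    """Remember which disk each chosen SHA-1 was written to."""
    db.executemany(
        "INSERT INTO copies VALUES (?, ?, ?)",
        [(sha1, disk_no, scan_id) for sha1, _path, _size in chosen],
    )
    db.commit()
```

Each run would call `scan`, then `pick_for_disk` with the capacity of the media in bytes, burn the chosen files, and call `record_disk` with the new disk number.  Because changed files have a SHA-1 that has never been written anywhere, they automatically sort to the front of the list, and whatever room is left goes to the oldest copies.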

The result is sort of a hybrid between a full backup and an incremental backup.  For example, if you have 100 GB of data, but every week when you run the software, you only have 100 MB of changes, then not only do your recent changes get backed up each time you run the software, but each set of 22 disks is also a full backup of all of your data.  So, if you back up every week, then it's basically like doing a full backup every five months, and also doing an incremental backup every week, except that you only need 22 disks instead of the 44 you'd need if all of that empty space on your incremental backups went to waste by being left blank.
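The numbers above work out if you assume 4.7 GB single-layer DVDs.  That disk size is an assumption (the post doesn't name one), and the function name below is made up, but here's the arithmetic as a quick sketch:

```python
import math

DISK_GB = 4.7            # assumption: single-layer DVD capacity
DATA_GB = 100.0          # total data, from the example above
WEEKLY_CHANGE_GB = 0.1   # 100 MB of changes per weekly run


def disks_per_full_cycle(data_gb, change_gb, disk_gb):
    """Disks needed before every byte has been rewritten once.

    Each weekly disk spends change_gb on new data and the remainder
    on re-copying the least recently backed up data.
    """
    return math.ceil(data_gb / (disk_gb - change_gb))


hybrid_disks = disks_per_full_cycle(DATA_GB, WEEKLY_CHANGE_GB, DISK_GB)
full_backup_disks = math.ceil(DATA_GB / DISK_GB)

# Separate scheme: one full backup set plus one mostly-empty
# incremental disk per week over the same period.
separate_disks = full_backup_disks + hybrid_disks

# One disk per week, about 52 / 12 weeks per month.
months_per_cycle = hybrid_disks / (52 / 12)

print(hybrid_disks, separate_disks, round(months_per_cycle, 1))
```

With those inputs the hybrid scheme needs 22 disks per cycle, the separate full-plus-incremental scheme needs 44, and a cycle takes a bit over five months.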

I don't understand why something like this doesn't already exist.  The closest thing I can think of would be Apple's "Time Machine" software, which backs up to an external hard drive, and lets you open folders into the past and copy files out, but even that puts your entire backup system on a single hard disk, and if that disk fails, you no longer have backups.  So then you have to back up your backup, which is just ridiculous.

So far, the best thing I've found is CrashPlan, which lets you back up your files on your friends' computers, and also keeps old versions of files.  It'll even make multiple backup sets, assuming you have multiple friends.  It won't, however, create DVD sets.

It looks interesting, but I can't help but wonder how it would work in practice.  Say my friends gave me 20 GB, which would easily hold my 16 GB of data and leave some space for older versions of some recently-changed files.  Then one day I decide to test some Linux distributions, and so it sees many GB of new ISOs to back up.  I assume it would then start deleting old versions of files from the backup in order to make room for this new data that I don't even care about.  If space runs sufficiently low, it might even assume that the newer files are more important than the older ones, and so delete the only remaining backup copy of the older files to make room for the ISOs.

That's what really interests me about disk-based backups.  Once I burn the disk, the data is backed up, and nothing that happens in the future is going to remove it from the backup, whereas all of these online backup solutions include the possibility that the backup software decided last week that it no longer needed to keep the file from last year that I just now decided I want back.  Hell, a lot of the data on my hard drive is just old shit that I'd like to keep in case I want it again someday, but which I likely won't ever actually use.  It'd be nice if I could just delete it all and rely upon the fact that it's in my backups somewhere if I ever decide I want it again.

