One of my backup servers used to be in the same datacenter as the primary server; I only recently moved it to a different host. It's still in the same city, though, so I'm considering other options. I'm not a big fan of the just-make-a-tarball-of-everything-and-upload-it-to-the-cloud backup methodology; I prefer something a bit more incremental. But with Backblaze B2 being so cheap, I might as well just upload tarballs to B2. As long as I have the data, the servers can be redeployed in a couple of hours at most.
The SBG fire illustrates the importance of geographical redundancy. Just because the datacenters have different numbers at the end doesn't mean that they won't fail at the same time. Apart from a large fire or power outage, there are lots of things that can take out several datacenters in close vicinity at the same time, such as hurricanes and earthquakes.
> I'm not a big fan of the just-make-a-tarball-of-everything-and-upload-it-to-the-cloud backup methodology; I prefer something a bit more incremental.
Pretty much a textbook use case for ZFS with some kind of snapshot-rolling utility: snap every hour, send backups once a day, prune your backups according to some timetable. Transfers go as incrementals against the previous stored snapshot. Plus you get great data integrity checking on top of that.
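Roughly, something like this (dataset names, backup host and snapshot naming are all made up, and in practice you'd use an existing snapshot-rolling tool like sanoid/syncoid or zrepl rather than hand-rolling it):

```python
#!/usr/bin/env python3
"""Sketch of an hourly-snapshot / daily-incremental-send loop for ZFS."""
import subprocess
from datetime import datetime, timezone

DATASET = "tank/data"          # local dataset to protect (placeholder)
REMOTE = "backup@backuphost"   # backup box (placeholder)
REMOTE_DS = "backup/data"      # receiving dataset on the backup box (placeholder)


def take_snapshot() -> str:
    """Create a timestamped snapshot, e.g. tank/data@2021-03-10-0400."""
    snap = f"{DATASET}@{datetime.now(timezone.utc):%Y-%m-%d-%H%M}"
    subprocess.run(["zfs", "snapshot", snap], check=True)
    return snap


def send_incremental(prev_snap: str, new_snap: str) -> None:
    """Ship only the delta between the last stored snapshot and the new one."""
    send = subprocess.Popen(
        ["zfs", "send", "-i", prev_snap, new_snap],
        stdout=subprocess.PIPE,
    )
    # -F lets the receiving side roll back to the common snapshot if needed.
    subprocess.run(
        ["ssh", REMOTE, "zfs", "recv", "-F", REMOTE_DS],
        stdin=send.stdout,
        check=True,
    )
    send.stdout.close()
    if send.wait() != 0:
        raise RuntimeError("zfs send failed")
```

Cron (or a systemd timer) calls take_snapshot() every hour and send_incremental() once a day against whatever snapshot was shipped last; pruning is just `zfs destroy` on snapshots older than your retention window.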
With all due respect here - I've never heard of it either, and that's not what you want in a filesystem.
The draw of ZFS is that it's the copy-on-write filesystem with 10 zillionty hours of production experience that says it works. And that's why BTRFS is not a direct substitute either. Or Hammer2. There are lots of things that could be cool; the question is whether you're willing to run them in production.
There is a first-mover advantage among filesystems that occupy a given design space and provide a given set of capabilities. At some point a winner sucks most of the oxygen out of the atmosphere here. There is maybe space for a second-place winner (btrfs); there isn't a spot for a fourth-place winner.
I use tarballs because they allow me to not trust the backup servers. ssh is set up so that the backup servers' keys are certified to run only one command: a backup script that just returns the encrypted data, and nothing else.
It's very easy to use spare storage in various places to do backups this way, as ssh, gpg and cron are everywhere, and you don't need to install any complicated backup solutions or trust the backup storage machines much.
All you have to manage centrally are the private keys for backup encryption and a CA for signing the ssh keys, plus some occasional monitoring/tests.
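As a simplified sketch of the shape of it: the forced command points at a script along these lines on the machine being backed up (paths and key id are placeholders):

```python
#!/usr/bin/env python3
"""Sketch of the forced-command backup script: stream tar through gpg to stdout.

The backup host's certified ssh key is pinned to this one command, e.g. via an
authorized_keys line like (illustrative):
    restrict,command="/usr/local/bin/backup-stream" ssh-ed25519 AAAA... backup
or a force-command constraint baked into the signed certificate, so the backup
host can only ever pull ciphertext and run nothing else.
"""
import subprocess
import sys

BACKUP_PATHS = ["/etc", "/srv"]     # what to back up (placeholder)
RECIPIENT = "backup@example.org"    # gpg key the archives are encrypted to (placeholder)


def main() -> int:
    # tar writes the archive to its stdout ...
    tar = subprocess.Popen(
        ["tar", "-czf", "-", *BACKUP_PATHS],
        stdout=subprocess.PIPE,
    )
    # ... gpg encrypts that stream and writes ciphertext to our stdout, which
    # ssh carries back to the backup host. --batch and --trust-model keep gpg
    # from prompting in a non-interactive session.
    gpg = subprocess.run(
        ["gpg", "--batch", "--trust-model", "always",
         "--encrypt", "--recipient", RECIPIENT, "--output", "-"],
        stdin=tar.stdout,
        stdout=sys.stdout.buffer,
    )
    tar.stdout.close()
    return tar.wait() or gpg.returncode


if __name__ == "__main__":
    sys.exit(main())
```

On each backup host, cron then just does something like `ssh primary > primary-$(date +%F).tar.gz.gpg`, and the private decryption key never has to exist anywhere but the machine you manage centrally.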
I thought so too for a long while. Until I tried to restore something (just to test things) and wasn't able to... it might have been specific to our GPG setup or an older version or something... but I decided to switch to restic and am much happier now.
Restic has a single binary that takes care of everything. It feels more modern and seems to work really well. Never had any issue restoring from it.
Just one data point. Stick to whatever works for you. But it's important to test not only your backups, but also your restores!
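To make the restore-testing point concrete, here's roughly the kind of scheduled check I mean, sketched around the restic CLI (the repository string, password handling and paths are placeholders; for a B2 repo you'd also need B2_ACCOUNT_ID/B2_ACCOUNT_KEY in the environment, and `restic init` run once beforehand):

```python
#!/usr/bin/env python3
"""Sketch of a restic backup plus a periodic restore test."""
import os
import subprocess
import tempfile

ENV = {
    **os.environ,
    "RESTIC_REPOSITORY": "b2:my-bucket:server1",  # placeholder repository
    "RESTIC_PASSWORD": "change-me",               # placeholder; use a secrets store in practice
}


def restic(*args: str) -> None:
    subprocess.run(["restic", *args], env=ENV, check=True)


def backup() -> None:
    restic("backup", "/etc", "/srv")              # paths are illustrative


def restore_test() -> None:
    # Exercise the restore path, not just the backup path.
    with tempfile.TemporaryDirectory() as target:
        restic("restore", "latest", "--target", target)
    restic("check")                               # verify repository integrity too


if __name__ == "__main__":
    backup()
    restore_test()
```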
I've been using Duplicati forever. The fact that it's C# is a bit of a pain (some distros don't have recent Mono), but running it in Docker is easy enough. Being able to check the status of backups and restore files from a web UI is a huge plus, as is the ability to run the same app on all platforms.
I've found duplicity to be a little simplistic and brittle. Purging old backups is also difficult: you basically have to make a full (i.e. non-incremental) backup before you can do that, which increases bandwidth and storage costs.
Restic looks great feature-wise, but still feels like the low-level component you'd use to build a backup system, not a backup system in itself. It's also pre-1.0.
Interesting, I will check Restic out; I've heard other good things about it. Duplicity is a bit of a pain to set up, and Restic's single-binary model is more straightforward (Go is a miracle). Thanks for the recommendation!
GPG is a bit quirky but I do regularly check my backups and restores (if once every few months counts as regular).
Ditto. I moved to rclone after a bunch of random small issues with Duplicity that weren't major on their own, but made me lose faith in something that's going to be operating largely unsupervised except for a monthly check-in.