snaprep

Revision: 35 ccx_at_webprojekty_dot_cz-20151215131121-e5zrxj2kpygb40nq
Generated on 2024-07-24 11:07:13 UTC by bzr2html

You can use bzr branch <this-url> to get this repository.

snapshot replicator

A collection of scripts for creating, managing & replicating filesystem snapshots, deduplicated using hardlinks. Written using rsync, POSIX sh & awk.

This project contains a loose collection of scripts which can be chained together to build an arbitrary workflow. Access can be restricted in both push and pull mode, so if one of the machines is compromised, no harm will come to the other, except for filling up the drive assigned to snapshots. Authentication and encryption may be provided by SSH, TLS or any other layer that can carry a bidirectional data stream.

filesystem layout

Each sequence of snapshots is stored in its own directory. Each snapshot is a subdirectory named after the Unix timestamp of the moment the snapshot was taken, written as a decimal integer; this uniquely identifies the snapshot within the sequence. For every snapshot that has been completed successfully, the sequence directory also contains an empty file with the same name prefixed by .snapshot. (for example .snapshot.1700000000 for snapshot 1700000000). Snapshots lacking this file are considered unfinished and are not replicated. The same empty file usually also resides inside the snapshot directory itself, where it serves the same purpose for modules defined in rsyncd.conf.
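The completion rule above can be expressed as a small shell sketch: a snapshot counts only if both the directory <ts> and the marker .snapshot.<ts> exist in the sequence directory. The function names list_completed and latest_completed are hypothetical and are not part of snaprep.

```sh
#!/bin/sh
# Hypothetical sketch: list completed snapshots of a sequence directory,
# oldest first. A snapshot is complete when both the directory <ts> and
# the marker file .snapshot.<ts> exist in the sequence directory.
list_completed() {
    seq=$1
    for marker in "$seq"/.snapshot.*; do
        [ -e "$marker" ] || continue          # glob matched nothing
        ts=${marker##*/.snapshot.}            # strip path and prefix
        case $ts in
            ''|*[!0-9]*) continue ;;          # not a decimal timestamp
        esac
        [ -d "$seq/$ts" ] && printf '%s\n' "$ts"
    done | sort -n
}

# Hypothetical sketch: print the newest completed snapshot (empty if none).
latest_completed() {
    list_completed "$1" | tail -n 1
}
```

A directory without its marker (an interrupted transfer, for instance) is silently skipped, which matches the rule that unfinished snapshots are not replicated.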

usage

Use snap.push to replicate all snapshots of a sequence that are not yet present in the destination directory, or snap.push.single to replicate a single snapshot.
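Deciding which snapshots still need replicating amounts to a set difference between completed snapshots on the source and on the destination. The sketch below assumes both sequences are local directories (the real snap.push reads the destination through the foo module instead); completed and missing_snapshots are hypothetical names.

```sh
#!/bin/sh
# Hypothetical helper: print completed snapshot timestamps of a local
# sequence directory (directory <ts> plus marker .snapshot.<ts>).
completed() {
    for m in "$1"/.snapshot.*; do
        ts=${m##*/.snapshot.}
        case $ts in ''|*[!0-9]*) continue ;; esac   # also skips an unmatched glob
        [ -d "$1/$ts" ] && printf '%s\n' "$ts"
    done
    return 0
}

# Hypothetical helper: snapshots completed in $1 but not completed in $2,
# i.e. the ones a snap.push-style run would still have to replicate.
missing_snapshots() {
    completed "$1" |
        have=$(completed "$2") awk '
            BEGIN { n = split(ENVIRON["have"], t, "\n")
                    for (i = 1; i <= n; i++) seen[t[i]] = 1 }
            !($0 in seen)' |
        sort -n
}
```

A snapshot that exists on the destination without its marker is still reported as missing, so an interrupted upload is retried.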

local destination

The snapshot is written directly into its timestamped directory. Files to hardlink against are looked up in the snapshot with the highest timestamp.
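One way to get this behaviour with plain rsync is its --link-dest option, which hardlinks unchanged files against a reference directory instead of copying them. The sketch below assumes that approach; it is not the actual snap.push.single, the name snapshot_local is hypothetical, and date +%s is widely available but not POSIX.

```sh
#!/bin/sh
# Hypothetical sketch: snapshot SRC into SEQ/<now>, hardlinking unchanged
# files against the newest completed snapshot via rsync --link-dest.
snapshot_local() {
    src=$1 seq=$2
    ts=$(date +%s)                       # not POSIX, but widely supported
    prev=
    for m in "$seq"/.snapshot.*; do
        t=${m##*/.snapshot.}
        case $t in ''|*[!0-9]*) continue ;; esac
        [ -d "$seq/$t" ] || continue     # marker without directory: skip
        if [ -z "$prev" ] || [ "$t" -gt "$prev" ]; then prev=$t; fi
    done
    set -- -a
    # A relative --link-dest is resolved against the destination
    # directory, so pass an absolute path.
    [ -n "$prev" ] && set -- "$@" --link-dest="$(cd "$seq/$prev" && pwd)"
    rsync "$@" "$src/" "$seq/$ts/" || return 1
    touch "$seq/$ts/.snapshot.$ts"       # marker inside the snapshot
    touch "$seq/.snapshot.$ts"           # sequence-level marker, last
    printf '%s\n' "$ts"
}
```

Writing the sequence-level marker last means a crash during the copy leaves an unfinished snapshot that later runs will neither replicate nor use as a link target.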

remote shell destination

Currently unimplemented; use an rsyncd destination instead. rsync can reach a daemon through a remote shell such as SSH; see the section USING RSYNC-DAEMON FEATURES VIA A REMOTE-SHELL CONNECTION in the rsync man page.

rsyncd destination

The server should expose two modules: a read-only module foo and a write-only module foo.push. foo points to the snapshot sequence directory; foo.push points to its new subdirectory and uses snap.rsync.pre as its pre-xfer script and snap.rsync.post as its post-xfer script.

snap.genconf can generate such a configuration snippet. If you push snapshots and want to prevent the pushing machine from reading them back, add exclude = /[0-9]* to the configuration so that only the .snapshot.timestamp files are readable.
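A minimal sketch of such a snippet, produced by a shell function in the spirit of snap.genconf but not its actual output. The function name genconf_sketch, the install directory /usr/local/bin and the subdirectory name new are assumptions; path, read only, write only, exclude, pre-xfer exec and post-xfer exec are standard rsyncd.conf module parameters.

```sh
#!/bin/sh
# Hypothetical sketch: print an rsyncd.conf snippet for sequence NAME
# stored in SEQDIR. BINDIR (assumed) holds snap.rsync.pre/post.
# The exclude line hides snapshot contents from the read-only module,
# leaving only the .snapshot.* markers visible to the pushing machine.
# "$seqdir/new" assumes the upload subdirectory is literally named "new".
genconf_sketch() {
    name=$1 seqdir=$2 bindir=${3:-/usr/local/bin}
    cat <<EOF
[$name]
    path = $seqdir
    read only = yes
    exclude = /[0-9]*

[$name.push]
    path = $seqdir/new
    read only = no
    write only = yes
    pre-xfer exec = $bindir/snap.rsync.pre
    post-xfer exec = $bindir/snap.rsync.post
EOF
}
```

Usage: genconf_sketch foo /srv/snap/foo >> /etc/rsyncd.conf (paths are illustrative).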

The foo module is used to determine which snapshots to push (unless specific ones are requested). Each snapshot is then written, one after another, to the foo.push module. The .snapshot.timestamp file inside the snapshot directory is written last to signal that the upload is complete. After that, the post-xfer script hardlinks all files into a timestamped directory, unless that directory already exists. It is important not to use the --inplace option, so that existing files which are hardlinked into older snapshots are not overwritten.
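The post-upload step can be sketched as follows. This is not the actual snap.rsync.post: promote_upload is a hypothetical name, and the sketch takes the sequence and upload directories as arguments. In a real post-xfer exec script they could be derived from RSYNC_MODULE_PATH, and RSYNC_EXIT_STATUS should be checked first; both are set by rsyncd for post-xfer scripts.

```sh
#!/bin/sh
# Hypothetical sketch: once NEW holds its marker .snapshot.<ts>, hardlink
# every file into SEQ/<ts> (unless it already exists) and create the
# sequence-level marker SEQ/.snapshot.<ts> last.
promote_upload() {
    seq=$1 new=$2 ts=
    for m in "$new"/.snapshot.*; do
        t=${m##*/.snapshot.}
        case $t in ''|*[!0-9]*) continue ;; esac
        if [ -z "$ts" ] || [ "$t" -gt "$ts" ]; then ts=$t; fi
    done
    [ -n "$ts" ] || return 1            # no marker yet: upload incomplete
    [ -e "$seq/$ts" ] && return 0       # already promoted earlier
    mkdir "$seq/$ts" || return 1
    dest=$(cd "$seq/$ts" && pwd) || return 1
    # Recreate the directory tree (modes and mtimes are not copied here)...
    (cd "$new" && find . -type d -exec sh -c \
        'for d do mkdir -p "$0/$d"; done' "$dest" {} +) || return 1
    # ...then hardlink every non-directory entry into it.
    (cd "$new" && find . ! -type d -exec sh -c \
        'for f do ln "$f" "$0/$f"; done' "$dest" {} +) || return 1
    touch "$seq/.snapshot.$ts"          # sequence-level marker goes last
}
```

Because new/ and the timestamped directory share inodes, anything that modifies a file in place (such as rsync --inplace, or the append in the test below) changes the stored snapshot as well. Without --inplace, rsync writes a temporary file and renames it, which breaks the link instead.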

sources

Any source accepted by rsync can be used. The source and the destination cannot both be remote, because rsync does not support remote-to-remote transfers.

LVM2

The snap.lvm2 script creates a snapshot of a logical volume, mounts it and runs the supplied program (most often snap.push.single). After the program exits, it unmounts and removes the snapshot. The location of the mountpoint is exported in the SNAP_SRC environment variable, which snap.push.single understands.
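The lifecycle described above (create, mount, run, unmount, remove) can be sketched with the standard LVM2 tools. This is not the real snap.lvm2: the function name, its argument order and the snapshot LV name <lv>_snaprep are hypothetical.

```sh
#!/bin/sh
# Hypothetical sketch of an LVM2 snapshot wrapper.
# Usage: snap_lvm2_sketch VG LV SIZE MOUNTPOINT PROGRAM [ARGS...]
snap_lvm2_sketch() {
    vg=$1 lv=$2 size=$3 mnt=$4
    shift 4
    snap="${lv}_snaprep"                 # hypothetical snapshot LV name

    # Create a copy-on-write snapshot of the origin volume.
    lvcreate --snapshot --size "$size" --name "$snap" "$vg/$lv" || return 1
    # Mount the frozen view read-only; drop the snapshot if that fails.
    if ! mount -o ro "/dev/$vg/$snap" "$mnt"; then
        lvremove -f "$vg/$snap"
        return 1
    fi

    # Run the program with the mountpoint exported as SNAP_SRC.
    SNAP_SRC=$mnt "$@"
    rc=$?

    # Clean up regardless of the program's exit status.
    umount "$mnt"
    lvremove -f "$vg/$snap"
    return "$rc"
}
```

SIZE only needs to hold the blocks that change on the origin volume while the program runs; if the snapshot fills up, LVM invalidates it.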

things to note

The rsync option --hard-links is not used, because it requires keeping the whole file tree in memory, which may be infeasible in large setups.

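Hardlinks are only created for files that are exactly the same as in the previous snapshot (same path, mtime and access rights). rsync's --fuzzy option can make it look for a matching file under a different name in the same directory, but it only uses such a file as a basis for the delta transfer.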
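To check how much deduplication a snapshot actually received, one can count regular files whose link count is above one. This is a rough, hypothetical helper (link_stats is not part of snaprep): it also counts files linked within the same snapshot, and file names containing newlines would be miscounted.

```sh
#!/bin/sh
# Hypothetical sketch: report how many regular files in a snapshot are
# shared via hardlinks (link count > 1) out of all regular files.
# Unchanged files whose mtime or permissions differ show up as copies.
link_stats() {
    shared=$(find "$1" -type f -links +1 | wc -l)
    total=$(find "$1" -type f | wc -l)
    # Arithmetic expansion drops the padding some wc versions emit.
    printf 'shared=%d total=%d\n' "$((shared))" "$((total))"
}
```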

It might be reasonable to implement additional deduplication (by hardlinking or otherwise), as rsync's algorithms are not perfect and the implementation does not look outside a single snapshot sequence.

Removing old snapshots is currently left to the user.
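Until pruning is implemented, a simple "keep the N newest" policy can be sketched in a few lines. This is a hypothetical helper, not part of snaprep. It removes the marker before the directory, so a deletion interrupted halfway leaves an unfinished snapshot that will not be replicated. Snapshots without a marker are never touched.

```sh
#!/bin/sh
# Hypothetical sketch: keep the KEEP newest completed snapshots in SEQ
# and delete older ones. Deleting a snapshot only drops one link to each
# shared file, so the remaining snapshots stay intact.
prune_snapshots() {
    seq=$1 keep=$2
    for m in "$seq"/.snapshot.*; do
        t=${m##*/.snapshot.}
        case $t in ''|*[!0-9]*) continue ;; esac
        [ -d "$seq/$t" ] && printf '%s\n' "$t"
    done | sort -rn | awk -v keep="$keep" 'NR > keep + 0' |
    while read -r t; do
        # Marker first: a half-deleted snapshot then counts as unfinished.
        rm -f "$seq/.snapshot.$t" && rm -rf "$seq/$t"
    done
}
```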

todo

  • when replicating locally, add an option to update the ./new directory
  • plain SSH destination (call snap.rsync.pre and snap.rsync.post on the remote side)
  • old snapshot pruning / rotation
  • document each script individually
  • document how access can be restricted using SSH
  • give specific examples
  • support snapshots & COW deduplication using btrfs, zfs, aufs, unionfs, unionfs-fuse, ...