Revision: | 35 ccx_at_webprojekty_dot_cz-20151215131121-e5zrxj2kpygb40nq |
---|---|
Generated: | on 2024-07-24 11:07:13 UTC by bzr2html |
You can use bzr branch <this-url> to get this repository.
Collection of scripts for creating, managing & replicating filesystem snapshots, deduplicated using hardlinks. Written using rsync, POSIX sh & awk.
This project contains a loose collection of scripts which can be chained into arbitrary workflows. Access can be restricted in both push and pull mode, so if one of the machines is compromised, no harm can come to the other, apart from filling up the drive assigned to snapshots. Authentication and encryption may be provided by SSH, TLS or any other layer that can carry a bidirectional data stream.
Each sequence of snapshots is stored in its own directory. Each snapshot lives in a subdirectory named after the Unix timestamp of the moment the snapshot was taken, written as a decimal integer; this name uniquely identifies the snapshot within its sequence. For every snapshot that has completed successfully, the sequence directory also contains an empty file with the same name prefixed by .snapshot. (so snapshot 1450185081 is marked by .snapshot.1450185081). Snapshots lacking this file are considered unfinished and are not replicated. The same empty file usually also resides inside the snapshot directory itself, where it serves the same purpose for shares specified in rsyncd.conf.
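The layout above can be illustrated with a small helper. The function list_completed below is hypothetical (it is not one of the project's scripts): it prints, oldest first, the timestamps of snapshots that have both a timestamped subdirectory and a .snapshot.timestamp marker, i.e. the snapshots a replicating script would treat as finished.

```sh
# Hypothetical helper, not part of the project: print the timestamps of the
# completed snapshots in sequence directory $1, oldest first. A snapshot is
# completed when both <dir>/<timestamp>/ and <dir>/.snapshot.<timestamp> exist.
list_completed() {
	for marker in "$1"/.snapshot.*; do
		[ -e "$marker" ] || continue    # the glob matched nothing
		name=${marker##*/}
		ts=${name#.snapshot.}
		case $ts in
			''|*[!0-9]*) continue ;;    # not a decimal timestamp
		esac
		if [ -d "$1/$ts" ]; then
			printf '%s\n' "$ts"
		fi
	done | sort -n
}
```

Called as list_completed /srv/snapshots/foo (an example path), it prints one timestamp per line; directories without a marker and markers without a directory are both skipped.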
Use snap.push to replicate all snapshots of a sequence that are not yet present in the destination directory, or snap.push.single to replicate a single snapshot.
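The decision snap.push has to make can be thought of as a set difference between the completed snapshots on both sides. The sketch below, with hypothetical function names, computes that difference for two local sequence directories using comm; it assumes only the marker files matter, whereas the real snap.push looks at the remote side over rsync rather than at a local directory.

```sh
# Hypothetical sketch: print the timestamps that carry a completion marker
# in sequence directory $1 but not in sequence directory $2, oldest first.
completed_markers() {
	for m in "$1"/.snapshot.*; do
		[ -e "$m" ] || continue
		name=${m##*/}
		ts=${name#.snapshot.}
		case $ts in ''|*[!0-9]*) continue ;; esac
		printf '%s\n' "$ts"
	done | sort     # comm(1) needs plain lexical order
}

missing_snapshots() {
	src_list=$(mktemp) || return
	dst_list=$(mktemp) || { rm -f "$src_list"; return 1; }
	completed_markers "$1" >"$src_list"
	completed_markers "$2" >"$dst_list"
	# comm -23 keeps only the lines that appear in the first file alone;
	# the result is then re-sorted numerically.
	comm -23 "$src_list" "$dst_list" | sort -n
	rm -f "$src_list" "$dst_list"
}
```

Sorting lexically for comm and numerically only for the final output keeps both tools happy: comm requires input sorted in the current collation order, while timestamps read best in numeric order.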
The snapshot is saved directly into the timestamped directory. Files to hardlink against are looked up in the snapshot with the highest timestamp.
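A minimal sketch of such a snapshot, assuming rsync's --link-dest option as the hardlinking mechanism (the project's scripts may differ in detail): snap_local copies a source tree into a new timestamped directory, hardlinks files that are unchanged since the newest existing snapshot, and writes the completion marker only after rsync succeeds. Both function names are hypothetical.

```sh
# Hypothetical sketch, assuming rsync --link-dest does the hardlinking.
# Print the newest timestamped snapshot directory in sequence directory $1.
latest_snapshot() {
	for d in "$1"/*; do
		name=${d##*/}
		case $name in ''|*[!0-9]*) continue ;; esac
		if [ -d "$d" ]; then
			printf '%s\n' "$name"
		fi
	done | sort -n | tail -n 1
}

# Snapshot directory $1 into a new directory $2/<timestamp>, hardlinking
# files that are unchanged since the newest existing snapshot.
snap_local() {
	src=$1 seq=$2
	ts=$(date +%s)
	if [ -e "$seq/$ts" ]; then
		echo "snapshot $ts already exists" >&2
		return 1
	fi
	prev=$(latest_snapshot "$seq")
	if [ -n "$prev" ]; then
		# An absolute --link-dest path avoids rsync resolving it
		# relative to the destination directory.
		rsync -a --link-dest="$(cd "$seq/$prev" && pwd)" "$src/" "$seq/$ts/" || return
	else
		rsync -a "$src/" "$seq/$ts/" || return
	fi
	: >"$seq/.snapshot.$ts"    # completion marker, written last
	printf '%s\n' "$ts"
}
```

Writing the marker last means an interrupted run leaves behind a snapshot that is simply treated as unfinished.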
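Remote-shell mode (plain rsync over SSH, without a daemon) is currently unimplemented; use daemon mode instead. To tunnel daemon mode over SSH, see the section USING RSYNC-DAEMON FEATURES VIA A REMOTE-SHELL CONNECTION in the rsync manpage.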
The server should expose two modules: a read-only module foo and a write-only module foo.push. foo points to the snapshot sequence directory, while foo.push points to a new subdirectory of it and uses snap.rsync.pre as its pre-xfer script and snap.rsync.post as its post-xfer script.
You can use snap.genconf to generate such a configuration snippet. If you push snapshots and want to prevent the pushing machine from reading them back, add exclude = /[0-9]* to the read-only module in the config file, so that only the .snapshot.timestamp files remain readable.
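For orientation, the function below prints the kind of rsyncd.conf snippet the two paragraphs above describe. It is a hypothetical stand-in, not snap.genconf itself: the install path /usr/local/bin, the staging subdirectory name incoming and the refuse options line are assumptions. Refusing inplace and append (which implies --inplace) on the server enforces the restriction explained in the next paragraph.

```sh
# Hypothetical stand-in for snap.genconf: print an rsyncd.conf snippet for
# sequence NAME ($1) stored in DIR ($2). Pass "yes" as $3 to hide snapshot
# contents from readers. Paths and option choices are assumptions.
genconf_sketch() {
	name=$1 dir=$2 hide=${3:-no}
	cat <<EOF
[$name]
	path = $dir
	read only = yes
EOF
	if [ "$hide" = yes ]; then
		# Only the .snapshot.<timestamp> markers stay visible.
		printf '\texclude = /[0-9]*\n'
	fi
	cat <<EOF

[$name.push]
	path = $dir/incoming
	read only = no
	write only = yes
	refuse options = inplace append
	pre-xfer exec = /usr/local/bin/snap.rsync.pre
	post-xfer exec = /usr/local/bin/snap.rsync.post
EOF
}
```

For example, genconf_sketch foo /srv/snapshots/foo yes >>/etc/rsyncd.conf would append both modules, with snapshot contents hidden from the read-only one.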
The foo module is used to determine which snapshots to push (unless specific ones are requested). The snapshots are then written, one after another, to the foo.push module. The .snapshot.timestamp file inside the snapshot directory is written last to signal that the upload is complete. After that, the post-xfer script hardlinks all files into the timestamped directory, unless that directory already exists. It is important to keep rsync's --inplace option disabled, so that existing files which are hardlinked elsewhere are not overwritten.
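The post-xfer step can be sketched as follows. publish_upload is hypothetical and assumes that the staging directory holds exactly one .snapshot.timestamp marker and that rsync's --link-dest, pointed at the staging directory itself, makes every copied file a hardlink of the uploaded one; the real snap.rsync.post may work differently. The staging tree and the new snapshot then share inodes, which is why the next upload must not use --inplace: rsync normally writes a temporary file and renames it, breaking the link, whereas --inplace would modify the shared file and with it the stored snapshot.

```sh
# Hypothetical sketch of the post-xfer step (not the real snap.rsync.post):
# $1 is the upload (staging) directory, $2 the sequence directory.
publish_upload() {
	staging=$1 seq=$2
	set -- "$staging"/.snapshot.*
	# Assumption: exactly one marker, i.e. the upload mirrors one snapshot.
	[ "$#" -eq 1 ] && [ -e "$1" ] || return 1
	name=${1##*/}
	ts=${name#.snapshot.}
	case $ts in ''|*[!0-9]*) return 1 ;; esac
	# An existing snapshot is never touched.
	if [ -e "$seq/$ts" ]; then
		return 0
	fi
	# --link-dest pointing at the source itself turns every copied file
	# into a hardlink of the uploaded one.
	rsync -a --link-dest="$(cd "$staging" && pwd)" "$staging/" "$seq/$ts/" || return
	: >"$seq/.snapshot.$ts"
}
```

Inside a post-xfer exec script rsync provides RSYNC_MODULE_PATH and RSYNC_EXIT_STATUS, so a call could look like [ "$RSYNC_EXIT_STATUS" = 0 ] && publish_upload "$RSYNC_MODULE_PATH" "$RSYNC_MODULE_PATH/..", assuming the staging directory sits directly inside the sequence directory.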
Any source accepted by rsync can be used. However, the source and the destination cannot both be remote, because rsync does not support transfers between two remote hosts.
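To make the restriction concrete, here is a hypothetical pre-flight check. It follows rsync's rule of thumb for telling local from remote locations (rsync:// URLs, and locations with a colon before the first slash, such as host:path or host::module, are remote) and does not try to cover every corner of rsync's argument parsing.

```sh
# Hypothetical pre-flight check: is an rsync location remote? Follows rsync's
# rule of thumb: rsync:// URLs and anything with a colon before the first
# slash (host:path, host::module) are remote; everything else is local.
is_remote() {
	case $1 in
		rsync://*) return 0 ;;
	esac
	case ${1%%/*} in
		*:*) return 0 ;;
	esac
	return 1
}

# Reject the one combination rsync cannot handle.
check_pair() {
	if is_remote "$1" && is_remote "$2"; then
		echo "error: $1 and $2 are both remote" >&2
		return 1
	fi
}
```

A local path that contains a colon can be protected by prefixing it with ./, which is also how rsync itself expects such paths to be written.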
The snap.lvm2 script creates a snapshot of a logical volume, mounts it and runs the supplied program (most often snap.push.single). After the program exits, it unmounts and removes the snapshot. The mountpoint location is exported in the SNAP_SRC environment variable, which snap.push.single understands.
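The lifecycle of snap.lvm2 can be sketched as below. lvm_snapshot_run and its snapshot naming scheme are hypothetical; lvcreate --snapshot and lvremove -f are the standard LVM2 tools. The snapshot is mounted read-only at a temporary mountpoint that the program receives in SNAP_SRC, cleanup runs whatever the program's outcome, and the program's exit status is passed on.

```sh
# Hypothetical sketch of the snap.lvm2 lifecycle. $1/$2 name the volume
# group and origin logical volume, $3 the copy-on-write space to reserve
# (e.g. 1G); the remaining arguments are the program to run.
lvm_snapshot_run() {
	vg=$1 lv=$2 size=$3
	shift 3
	snap="$lv-snap-$(date +%s)"           # hypothetical naming scheme
	mnt=$(mktemp -d) || return
	if ! lvcreate --snapshot --size "$size" --name "$snap" "$vg/$lv"; then
		rmdir "$mnt"
		return 1
	fi
	if mount -o ro "/dev/$vg/$snap" "$mnt"; then
		SNAP_SRC=$mnt "$@"                # program sees the mountpoint
		status=$?
		umount "$mnt"
	else
		status=1
	fi
	# Clean up whatever the program's outcome was.
	lvremove -f "$vg/$snap"
	rmdir "$mnt"
	return "$status"
}
```

A call such as lvm_snapshot_run vg0 home 2G snap.push.single would then correspond to the typical use described above; the arguments snap.push.single needs besides SNAP_SRC are not shown here.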
The rsync option --hard-links is avoided because it requires keeping the whole file tree in memory, which may be infeasible in large setups.
Hardlinks are only created for exactly the same file (same path, mtime and access rights). The --fuzzy option makes rsync look for a matching file under a different name, but rsync then uses that file only as a basis for the delta transfer, not as a hardlink target.
It might be worth implementing additional deduplication (by hardlinking or otherwise), since rsync's matching is limited and the implementation never looks outside a single snapshot sequence.
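One possible starting point for such deduplication is sketched below; dedup_candidates is hypothetical and only reports. It groups files by cksum (CRC and size), which works across any number of snapshot sequences, confirms each candidate with cmp because CRC collisions are possible, and skips files that are already hardlinked. File names containing tabs or newlines are not handled.

```sh
# Hypothetical sketch: report pairs of regular files under the given
# directories that have identical content but are not yet hardlinked.
# cksum (CRC + size) groups candidates, cmp confirms them. Reports only.
dedup_candidates() {
	tab=$(printf '\t')
	group_key= first=
	find "$@" -type f -exec cksum {} + |
	sort -k1,1n -k2,2n |
	awk '{ key = $1 " " $2; sub(/^[^ ]+ [^ ]+ /, ""); print key "\t" $0 }' |
	while IFS=$tab read -r key path; do
		if [ "$key" != "$group_key" ]; then
			group_key=$key first=$path     # first file of a new group
		elif [ ! "$path" -ef "$first" ] && cmp -s "$path" "$first"; then
			printf '%s\t%s\n' "$first" "$path"
		fi
	done
}
```

Turning a reported pair into a single inode (for example with ln -f) would also make both copies share one set of permissions and one mtime, so a real tool would have to compare those attributes too before linking.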
Removing old snapshots is currently left to the user.
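As a sketch of one possible policy, prune_snapshots below (hypothetical) keeps the N newest completed snapshots of a sequence and deletes the older ones. Deleting an old snapshot is safe for the newer ones, because a hardlinked file's data stays on disk until its last link is removed. Unfinished snapshots are left alone, since a transfer may still be writing them.

```sh
# Hypothetical pruning helper: keep the $2 newest completed snapshots in
# sequence directory $1 and delete the older completed ones. Unfinished
# snapshots (no marker) are left alone, as a transfer may still be running.
prune_snapshots() {
	seq=$1 keep=$2
	for m in "$seq"/.snapshot.*; do
		[ -e "$m" ] || continue
		name=${m##*/}
		ts=${name#.snapshot.}
		case $ts in ''|*[!0-9]*) continue ;; esac
		printf '%s\n' "$ts"
	done | sort -rn | tail -n "+$((keep + 1))" |
	while read -r ts; do
		# Remove the marker first: if deletion is interrupted, the rest
		# is merely an unfinished snapshot, not a damaged "complete" one.
		rm -f "$seq/.snapshot.$ts" && rm -rf "${seq:?}/$ts"
	done
}
```

For example, prune_snapshots /srv/snapshots/foo 30 would keep the 30 most recent completed snapshots of that (example) sequence.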