===================
snapshot replicator
===================

A collection of scripts for creating, managing & replicating filesystem
snapshots, deduplicated using hardlinks. Written using rsync, POSIX sh & awk.

This project contains a loose collection of scripts which can be chained to
build arbitrary workflows. It is possible to restrict access in both push and
pull mode, so if one of the machines is compromised no harm can come to the
others beyond filling up the drive assigned to snapshots. Authentication and
encryption may be provided by SSH, TLS or any other layer that can carry a
bidirectional data stream.

filesystem layout
-----------------

Each sequence of snapshots is stored inside a directory. Each snapshot lives
in a subdirectory whose name is the unix timestamp of the time the snapshot
was taken, written as a decimal integer; this uniquely identifies the
snapshot within its sequence. For every snapshot that has been completed
successfully, the sequence directory also contains an empty file named after
the snapshot with ``.snapshot.`` prepended. For example, a snapshot taken at
unix time 1400000000 lives in ``1400000000/`` and is marked complete by an
empty ``.snapshot.1400000000``. Snapshots lacking the corresponding marker
file are considered unfinished and are not replicated. This empty file also
usually resides in the snapshot directory itself, where it serves the same
purpose for shares specified in ``rsyncd.conf``.

usage
-----

Use ``snap.push`` to replicate all snapshots in a sequence that are not yet
present in the destination directory, or ``snap.push.single`` to replicate a
single snapshot only.

local destination
~~~~~~~~~~~~~~~~~

The snapshot is saved directly into a timestamped directory. Files to
hardlink against are looked up in the snapshot with the highest timestamp.

remote shell destination
~~~~~~~~~~~~~~~~~~~~~~~~

Currently unimplemented. Use daemon mode instead; see USING RSYNC-DAEMON
FEATURES VIA A REMOTE-SHELL CONNECTION in the rsync manpage.

rsyncd destination
~~~~~~~~~~~~~~~~~~

The server should expose two modules: a read-only ``foo`` and a write-only
``foo.push``, where ``foo`` points to a snapshot sequence directory and
``foo.push`` to its ``new`` subdirectory, with ``snap.rsync.pre`` as the
pre-xfer script and ``snap.rsync.post`` as the post-xfer script.

You can use ``snap.genconf`` to generate such a configuration snippet. If you
use snapshot pushing and want to make the pushing machine unable to read
snapshots back, you can use ``exclude = /[0-9]*`` in the config file to make
only the ``.snapshot.timestamp`` files readable.

The ``foo`` module is used to determine which snapshots to push (unless you
push specific ones). Each snapshot is then written in sequence to the
``foo.push`` module. The ``.snapshot.timestamp`` file in the snapshot
directory is written last, to signify that the upload is complete. After that
the post-xfer script hardlinks all files into a timestamped directory, if
that directory does not exist yet. It is important not to use the
``--in-place`` option, so that existing files which are hardlinked elsewhere
are not overwritten.
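For illustration only, a configuration along these lines might look roughly
like the following; the paths below are placeholders and the exact options
emitted by ``snap.genconf`` may differ::

    [foo]
        path = /srv/snapshots/foo
        read only = true
        # optional: hide snapshot contents from pushing machines, leaving
        # only the .snapshot.* marker files visible
        exclude = /[0-9]*

    [foo.push]
        path = /srv/snapshots/foo/new
        read only = false
        write only = true
        pre-xfer exec = /usr/local/lib/snap/snap.rsync.pre
        post-xfer exec = /usr/local/lib/snap/snap.rsync.post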
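Conceptually, the post-xfer hook then does something like the following. This
is only a minimal sketch of the behaviour described above, not the actual
``snap.rsync.post``, and it assumes the module path is the sequence's ``new``
subdirectory::

    #!/bin/sh
    # Sketch only: hardlink a completed upload from new/ into its
    # timestamped snapshot directory.
    set -eu
    [ "${RSYNC_EXIT_STATUS:-1}" -eq 0 ] || exit 0  # act only on successful uploads
    new=$RSYNC_MODULE_PATH                         # rsyncd exports this to xfer hooks
    seq=$(dirname "$new")                          # the snapshot sequence directory
    for marker in "$new"/.snapshot.*; do
        [ -e "$marker" ] || continue               # glob did not match anything
        ts=${marker##*/.snapshot.}                 # unix timestamp of the upload
        if [ ! -e "$seq/$ts" ]; then
            cp -al "$new" "$seq/$ts"               # hardlink, not copy (GNU/BSD cp -l)
            : > "$seq/.snapshot.$ts"               # mark the snapshot as completed
        fi
    done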
sources
~~~~~~~

All sources accepted by rsync are possible. It is not possible to have both a
remote source and a remote destination, as rsync does not implement this.

LVM2
~~~~

The ``snap.lvm2`` script creates a snapshot of a logical volume, mounts it
and runs the supplied program (most frequently ``snap.push.single``). After
the program exits, it unmounts and removes the snapshot. The location of the
mountpoint is exported in the ``SNAP_SRC`` environment variable, which
``snap.push.single`` understands.

things to note
--------------

The rsync option ``--hard-links`` is avoided as it requires keeping the whole
tree in memory, which may be infeasible in a large setup.

Hardlinks are only kept for the exact same file (same path, mtime and access
rights). It is possible to make rsync search for a similar file under a
different name using the ``--fuzzy`` option.

It might be reasonable to implement additional deduplication (by hardlinking
or otherwise), as rsync's algorithms are not perfect and the implementation
does not look outside a single snapshot series.

Removing old snapshots is currently left to the user.

todo
----

* when replicating locally, give an option to update the ``./new`` directory

* plain ssh destination (call snap.rsync.pre and snap.rsync.post on the
  remote side)

* old snapshot pruning / rotation

* document each script specifically

* document how access can be restricted using ssh

* give specific examples

* support snapshots & COW deduplication using: btrfs, zfs, aufs, unionfs,
  unionfs-fuse, ...