snaprep

git mirror of https://ccx.te2000.cz/bzr/snaprep
git clone https://ccx.te2000.cz/git/snaprep

===================
snapshot replicator
===================

A collection of scripts for creating, managing and replicating filesystem
snapshots, deduplicated using hardlinks. Written using rsync, POSIX sh and awk.

This project contains a loose collection of scripts which can be chained to
build arbitrary workflows. Access can be restricted in both push and pull
mode, so that if one of the machines is compromised, no harm comes to the
other, apart from filling up the drive assigned to snapshots. Authentication
and encryption may be provided by SSH, TLS or any other layer that can carry
a bidirectional data stream.
     14 
     15 filesystem layout
     16 -----------------
     17 
     18 Each sequence of snapshots is stored inside a directory. Each snapshot is
     19 in subdirectory whose name is unix timestamp of time the snapshot was taken,
     20 written as decimal integer. This uniquely identifies the specific snapshot in
     21 a sequence. Also in the snapshot sequence directory there is empty file named
     22 same as the snapshot with ``.snapshot.`` prepended for every snapshot that has
     23 been completed successfully. Snapshots lacking corresponding file are
     24 considered unfinished and are not replicated. This empty file also usually
     25 resides in the snapshot directory itself and serves same purpose for
     26 shares specified in ``rsyncd.conf``.
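
The completion rule described above can be checked with a few lines of POSIX
sh. This is a minimal sketch, not one of the snaprep scripts; the function name
``list_complete_snapshots`` is made up for illustration.

```shell
# Print the timestamps of completed snapshots in a sequence directory,
# lowest first. A snapshot counts as completed when both the directory
# <timestamp> and the marker file .snapshot.<timestamp> exist.
# (Hypothetical helper, not part of snaprep.)
list_complete_snapshots() {
    seq_dir=$1
    for marker in "$seq_dir"/.snapshot.*; do
        [ -f "$marker" ] || continue      # glob matched nothing
        ts=${marker##*/.snapshot.}
        case $ts in
            ''|*[!0-9]*) continue ;;      # not a decimal timestamp
        esac
        [ -d "$seq_dir/$ts" ] && printf '%s\n' "$ts"
    done | sort -n
}
```

The numeric sort matters because timestamps of different lengths would
otherwise be ordered lexicographically.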
     27 
     28 usage
     29 -----
     30 
     31 You can use ``snap.push`` to replicate all snapshots in a sequence not present
     32 in destination directory or ``snap.push.single`` to replicate single snapshot
     33 only.
     34 
     35 local destination
     36 ~~~~~~~~~~~~~~~~~
     37 
     38 Snapshot is saved directly in timestamped directory. Files for hardlinking are
     39 looked in snapshot with highest timestamp.
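
One way to get this behaviour from rsync is its ``--link-dest`` option. The
sketch below is an assumption about how such a step could look, not the code
of ``snap.push.single``; the function names are hypothetical, and ``date +%s``
is widely available but missing from older POSIX revisions.

```shell
# Print the highest all-digit directory name in a sequence directory.
# (Hypothetical helper, not part of snaprep.)
latest_snapshot() {
    seq_dir=$1
    for dir in "$seq_dir"/*/; do
        [ -d "$dir" ] || continue         # glob matched nothing
        name=${dir%/}
        name=${name##*/}
        case $name in
            ''|*[!0-9]*) continue ;;      # skip e.g. the "new" directory
        esac
        printf '%s\n' "$name"
    done | sort -n | tail -n 1
}

# Copy source tree $1 into a new timestamped directory under $2,
# hardlinking unchanged files against the latest existing snapshot.
snapshot_local() {
    src=$1 seq_dir=$2
    now=$(date +%s)
    prev=$(latest_snapshot "$seq_dir")
    if [ -n "$prev" ]; then
        # A relative --link-dest is resolved against the destination dir.
        rsync -a --link-dest="../$prev" "$src/" "$seq_dir/$now/"
    else
        rsync -a "$src/" "$seq_dir/$now/"
    fi && : > "$seq_dir/.snapshot.$now"   # mark as completed
}
```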

remote shell destination
~~~~~~~~~~~~~~~~~~~~~~~~

Currently unimplemented. Use daemon mode instead. See USING RSYNC-DAEMON
FEATURES VIA A REMOTE-SHELL CONNECTION in the rsync manpage.

rsyncd destination
~~~~~~~~~~~~~~~~~~

The server should expose two modules: a read-only ``foo`` and a write-only
``foo.push``. ``foo`` points to a snapshot sequence directory. ``foo.push``
points to its ``new`` subdirectory and has ``snap.rsync.pre`` as its pre-xfer
script and ``snap.rsync.post`` as its post-xfer script.

You can use ``snap.genconf`` to generate such a configuration snippet. If you
push snapshots and want to make the pushing machine unable to read them
back, you can use ``exclude = /[0-9]*`` in the config file so that only the
``.snapshot.timestamp`` files are readable.
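
For orientation, a pair of modules along these lines could be written out as
follows. This is a hand-written sketch and not the output of ``snap.genconf``,
which may differ; the function name ``gen_modules`` and the script directory
argument are assumptions. The ``exclude`` line is the optional restriction
mentioned above and can be dropped.

```shell
# Print an rsyncd.conf fragment for one snapshot sequence.
# $1 = module name, $2 = sequence directory,
# $3 = directory holding snap.rsync.pre and snap.rsync.post (assumed).
# Hypothetical stand-in; the real snap.genconf may print something else.
gen_modules() {
    name=$1 seq_dir=$2 script_dir=$3
    cat <<EOF
[$name]
    path = $seq_dir
    read only = yes
    exclude = /[0-9]*

[$name.push]
    path = $seq_dir/new
    read only = no
    write only = yes
    pre-xfer exec = $script_dir/snap.rsync.pre
    post-xfer exec = $script_dir/snap.rsync.post
EOF
}
```

For example, ``gen_modules foo /srv/snap/foo /usr/local/bin`` prints a ``[foo]``
and a ``[foo.push]`` section that can be appended to ``rsyncd.conf``.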

The ``foo`` module is used to determine which snapshots to push (unless you
push specific ones). Each snapshot is then written, in sequence, to the
``foo.push`` module. The ``.snapshot.timestamp`` file in the snapshot
directory is written last to signal that the upload is complete. After that,
the post-xfer script hardlinks all files into a timestamped directory if it
does not exist yet. It's important to keep rsync's ``--inplace`` option
disabled so that existing files which are hardlinked elsewhere are not
overwritten.
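
The finalizing step can be pictured roughly as below. This is a sketch under
stated assumptions, not the actual ``snap.rsync.post``: the function names are
hypothetical, file names are assumed to contain no newlines, and creating the
marker in the sequence directory at this point is a guess based on the layout
described earlier.

```shell
# Recreate the directory tree of $1 inside $2, hardlinking regular files.
# Symlinks and special files are skipped in this sketch.
link_tree() {
    src=$1 dst=$2
    mkdir -p "$dst" || return 1
    ( cd "$src" && find . -type d ) | (
        while IFS= read -r d; do
            mkdir -p "$dst/$d" || exit 1
        done
    ) || return 1
    ( cd "$src" && find . -type f ) | (
        while IFS= read -r f; do
            ln "$src/$f" "$dst/$f" || exit 1
        done
    )
}

# $1 = sequence directory; the pushed snapshot is expected in "$1/new"
# together with its .snapshot.<timestamp> marker.
finalize_push() {
    seq_dir=$1
    for marker in "$seq_dir"/new/.snapshot.*; do
        [ -f "$marker" ] || continue          # upload not complete
        ts=${marker##*/.snapshot.}
        [ -e "$seq_dir/$ts" ] && continue     # already finalized
        link_tree "$seq_dir/new" "$seq_dir/$ts" || return 1
        : > "$seq_dir/.snapshot.$ts"          # assumed: mark as completed
    done
}
```

A real post-xfer script would probably also check ``RSYNC_EXIT_STATUS``, which
rsync sets for post-xfer commands, before touching anything.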

sources
~~~~~~~

All sources accepted by rsync are possible. It's impossible to have both a
remote source and a remote destination, as rsync does not implement this.

LVM2
~~~~

The ``snap.lvm2`` script creates a snapshot of a logical volume, mounts it and
runs the supplied program (most frequently ``snap.push.single``). After the
program exits, it unmounts and removes the snapshot. The location of the
mountpoint is exported as the ``SNAP_SRC`` environment variable, which
``snap.push.single`` understands.
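
The sequence of steps can be sketched as follows. This is not the source of
``snap.lvm2``; the function names, the ``.snap`` suffix, the ``1G`` snapshot
size and the ``DRY_RUN`` switch are all assumptions made for illustration.

```shell
# Run a command, or only print it when DRY_RUN is set (illustrative switch).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = volume group, $2 = logical volume, $3 = mountpoint, rest = program.
# Hypothetical sketch; the real snap.lvm2 interface may differ.
with_lvm_snapshot() {
    vg=$1 lv=$2 mnt=$3
    shift 3
    snap="$lv.snap"                       # assumed snapshot name
    run lvcreate --snapshot --size 1G --name "$snap" "$vg/$lv" || return 1
    if run mount "/dev/$vg/$snap" "$mnt"; then
        # Export the mountpoint only for the supplied program.
        ( SNAP_SRC=$mnt; export SNAP_SRC; run "$@" )
        status=$?
        run umount "$mnt"
    else
        status=1
    fi
    run lvremove -f "$vg/$snap"
    return "$status"
}
```

With ``DRY_RUN=1`` set, calling ``with_lvm_snapshot vg0 data /mnt/snap
snap.push.single`` only prints the commands it would run, which is a cheap way
to review the order of operations before running them as root.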

things to note
--------------

The rsync option ``--hard-links`` is avoided as it requires keeping the whole
tree in memory, which may be infeasible in a large setup.

Hardlinks are only kept for the exact same file (path, mtime, access rights).
It is possible to make rsync search for the same file under a different path
using the ``--fuzzy`` option.

It might be reasonable to implement additional deduplication (by hardlinking
or otherwise), as rsync's algorithms are not perfect and the implementation
does not look outside a single snapshot series.
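
As a starting point for such extra deduplication, candidate duplicates can be
found with ``find``, ``cksum`` and awk. This is only a sketch of one possible
approach and not part of snaprep; the function name is hypothetical, and file
names are assumed to contain no newlines.

```shell
# Print pairs "first<TAB>other" of files under $1 that share the same
# cksum CRC and size. Matches are candidates only: compare them with cmp
# before replacing one with a hardlink. Files that are already hardlinked
# to each other are reported as well.
find_dup_candidates() {
    find "$1" -type f -exec cksum {} + |
    awk '{
        key = $1 " " $2                   # CRC and size
        name = $0
        sub(/^[0-9]+ [0-9]+ /, "", name)  # rest of the line is the path
        if (key in first)
            print first[key] "\t" name
        else
            first[key] = name
    }'
}
```

Because the search takes any directory, it can be pointed at a parent of
several sequence directories and so look across snapshot series.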

Removing old snapshots is currently left to the user.

todo
----

* when replicating locally, give an option to update the ``./new`` directory

* plain ssh destination (call ``snap.rsync.pre`` and ``snap.rsync.post`` on
  the remote side)

* old snapshot pruning / rotation

* document each script specifically

* document how access can be restricted using ssh

* give specific examples

* support snapshots & COW deduplication using: btrfs, zfs, aufs, unionfs,
  unionfs-fuse, ...