
===================
snapshot replicator
===================

Collection of scripts for creating, managing & replicating filesystem
snapshots, deduplicated using hardlinks. Written using rsync, POSIX sh & awk.

This project contains a loose collection of scripts which can be chained to
allow an arbitrary workflow. Access can be restricted in both push and pull
mode, so if one of the machines is compromised no harm will come to the
other, except for filling up the drive assigned to snapshots. Authentication
and encryption may be provided by SSH, TLS or any other layer that can handle
a bidirectional data stream.

filesystem layout
-----------------

Each sequence of snapshots is stored inside a directory. Each snapshot is
in a subdirectory whose name is the Unix timestamp of the time the snapshot
was taken, written as a decimal integer. This uniquely identifies the specific
snapshot in a sequence. For every snapshot that has been completed
successfully, the snapshot sequence directory also contains an empty file
named the same as the snapshot with ``.snapshot.`` prepended. Snapshots
lacking the corresponding file are considered unfinished and are not
replicated. This empty file usually also resides in the snapshot directory
itself and serves the same purpose for shares specified in ``rsyncd.conf``.
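
The layout can be read with a small helper like the one below. This is only a
sketch: ``list_complete`` is a hypothetical name and is not one of the scripts
in this project. It prints the timestamps of all snapshots that have both a
directory and a ``.snapshot.`` marker file, oldest first.

.. code-block:: sh

    # Hypothetical helper: list completed snapshots in sequence directory $1.
    list_complete() {
        for marker in "$1"/.snapshot.*; do
            # Skips the unexpanded pattern when no marker exists.
            [ -f "$marker" ] || continue
            ts=${marker##*/.snapshot.}
            # Accept only names made of decimal digits.
            case $ts in
                ''|*[!0-9]*) continue ;;
            esac
            # A marker without a snapshot directory is ignored.
            [ -d "$1/$ts" ] && printf '%s\n' "$ts"
        done | sort -n
    }

A snapshot directory without a marker, for example one that is still being
written, is not listed.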

usage
-----

You can use ``snap.push`` to replicate all snapshots in a sequence that are
not yet present in the destination directory, or ``snap.push.single`` to
replicate a single snapshot only.

local destination
~~~~~~~~~~~~~~~~~

The snapshot is saved directly in a timestamped directory. Files to hardlink
against are looked up in the snapshot with the highest timestamp.
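
The idea can be sketched with rsync's ``--link-dest`` option. This is an
illustration under assumptions, not the code of ``snap.push.single``: the
function names are hypothetical, the sequence directory must be given as an
absolute path (rsync resolves a relative ``--link-dest`` against the
destination), and ``date +%s`` is widely available but not required by POSIX.

.. code-block:: sh

    # Hypothetical helper: print the highest all-digit directory name in $1.
    latest_snapshot() {
        for d in "$1"/*/; do
            d=${d%/}
            name=${d##*/}
            case $name in
                ''|*[!0-9]*) continue ;;
            esac
            printf '%s\n' "$name"
        done | sort -n | tail -n 1
    }

    # Hypothetical helper: copy SRC ($1) into SEQ/<now>, where SEQ ($2) is an
    # absolute path. Unchanged files are hardlinked to the previous snapshot.
    local_snapshot() {
        src=$1
        seq=$2
        now=$(date +%s)
        prev=$(latest_snapshot "$seq")
        if [ -n "$prev" ]; then
            rsync -a --link-dest="$seq/$prev" "$src/" "$seq/$now/"
        else
            # First snapshot in the sequence: nothing to link against.
            rsync -a "$src/" "$seq/$now/"
        fi && : > "$seq/.snapshot.$now"
    }

The marker file is created only after rsync exits successfully, so an
interrupted copy stays unfinished.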

remote shell destination
~~~~~~~~~~~~~~~~~~~~~~~~

Currently unimplemented. Use daemon mode instead. See the section USING
RSYNC-DAEMON FEATURES VIA A REMOTE-SHELL CONNECTION in the rsync manpage.

rsyncd destination
~~~~~~~~~~~~~~~~~~

The server should expose two modules: a read-only ``foo`` and a write-only
``foo.push``. ``foo`` points to a snapshot sequence directory. ``foo.push``
points to its ``new`` subdirectory and has ``snap.rsync.pre`` as its pre-xfer
script and ``snap.rsync.post`` as its post-xfer script.

You can use ``snap.genconf`` to generate such a configuration snippet. If you
use snapshot pushing and want to make the pushing machine unable to read
snapshots back, you can use ``exclude = /[0-9]*`` in the config file so that
only the ``.snapshot.timestamp`` files are readable.
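
A hand-written ``rsyncd.conf`` fragment along these lines might look as
follows. It is a sketch, not the actual output of ``snap.genconf``: the
directory ``/srv/snapshots/foo`` and the install location of the scripts are
assumptions made for the example.

.. code-block:: ini

    [foo]
    # Snapshot sequence directory (assumed location), served read-only.
    path = /srv/snapshots/foo
    read only = yes
    # Optional: hide snapshot contents, leaving only the marker files visible.
    exclude = /[0-9]*

    [foo.push]
    # The "new" subdirectory of the same sequence, write-only.
    path = /srv/snapshots/foo/new
    read only = no
    write only = yes
    # Assumed install location of the scripts.
    pre-xfer exec = /usr/local/bin/snap.rsync.pre
    post-xfer exec = /usr/local/bin/snap.rsync.post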

The ``foo`` module is used to determine which snapshots to push (unless you
push specific ones). Each snapshot is then written, in sequence, to the
``foo.push`` module. The ``.snapshot.timestamp`` file in the snapshot
directory is written last to signify that the upload is complete. After that,
the post-xfer script hardlinks all files into a timestamped directory if it
does not exist yet. It's important not to enable rsync's ``--inplace`` option,
so that existing files which are hardlinked elsewhere are not overwritten.
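
The hardlinking step can be sketched as below. This is not the code of
``snap.rsync.post``; ``publish_new`` is a hypothetical name, and the sketch
assumes that exactly one ``.snapshot.`` marker is present in ``new`` and that
file names contain no newlines. Directory metadata is not preserved here.

.. code-block:: sh

    # Hypothetical helper: publish SEQ/new as SEQ/<timestamp> using hardlinks.
    # $1 is the snapshot sequence directory.
    publish_new() {
        seq=$(cd "$1" && pwd) || return 1
        # Exactly one marker is expected; its name carries the timestamp.
        set -- "$seq"/new/.snapshot.*
        [ "$#" -eq 1 ] && [ -f "$1" ] || return 1
        ts=${1##*/.snapshot.}
        case $ts in
            ''|*[!0-9]*) return 1 ;;
        esac
        dest=$seq/$ts
        # Do nothing if this snapshot was already published.
        [ -e "$dest" ] && return 0
        mkdir "$dest" || return 1
        (
            cd "$seq/new" || exit 1
            # Recreate the directory tree, then hardlink every other file.
            find . -type d | while IFS= read -r d; do
                mkdir -p "$dest/$d" || exit 1
            done || exit 1
            find . ! -type d | while IFS= read -r f; do
                ln "$f" "$dest/$f" || exit 1
            done || exit 1
        ) || return 1
        # Mark the snapshot as complete in the sequence directory.
        : > "$seq/.snapshot.$ts"
    }

rsync passes ``RSYNC_MODULE_PATH`` and ``RSYNC_EXIT_STATUS`` to a post-xfer
script, so such a script could call ``publish_new "$RSYNC_MODULE_PATH/.."``
only when ``RSYNC_EXIT_STATUS`` is ``0``.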

sources
~~~~~~~

Any source accepted by rsync can be used. Having both a remote source and a
remote destination is not possible, as rsync does not support this.

LVM2
~~~~

The ``snap.lvm2`` script creates a snapshot of a logical volume, mounts it and
runs the supplied program (most frequently ``snap.push.single``). After the
program exits, it unmounts and removes the snapshot. The location of the
mountpoint is exported as the ``SNAP_SRC`` environment variable, which
``snap.push.single`` understands.
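
The sequence of steps can be sketched as follows. This is not the code of
``snap.lvm2``: the function name, the snapshot naming scheme, the mount point
and the ``SNAP_SIZE`` and ``DRY_RUN`` variables are assumptions made for the
example. Only ``SNAP_SRC`` comes from the description above. Running it for
real requires root privileges.

.. code-block:: sh

    # Hypothetical helper: with_lvm_snapshot VG LV PROGRAM [ARGS...]
    # With DRY_RUN set, the commands are printed instead of executed.
    with_lvm_snapshot() {
        vg=$1
        lv=$2
        shift 2
        run=${DRY_RUN:+echo}
        snap=$lv.snap.$$
        mnt=${TMPDIR:-/tmp}/$snap

        # Create the LVM snapshot; give up if that fails.
        $run lvcreate -s -L "${SNAP_SIZE:-1G}" -n "$snap" "$vg/$lv" || return 1

        status=1
        if $run mkdir -p "$mnt" && $run mount -o ro "/dev/$vg/$snap" "$mnt"; then
            # Run the supplied program with SNAP_SRC pointing at the mount.
            SNAP_SRC=$mnt $run "$@"
            status=$?
            $run umount "$mnt"
        fi
        # Clean up the mount point and the snapshot in every case.
        $run rmdir "$mnt"
        $run lvremove -f "$vg/$snap"
        return "$status"
    }

The exit status of the supplied program is returned, so a failed push is
still visible to the caller after the snapshot has been removed.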

things to note
--------------

The rsync option ``--hard-links`` is avoided as it requires keeping the whole
tree in memory, which may be infeasible in a large setup.

Hardlinks are only kept for the exact same file (path, mtime, access rights).
It is possible to make rsync search for the same file under a different path
using the ``--fuzzy`` option.

It might be reasonable to implement additional deduplication (by hardlinking
or otherwise), as rsync's algorithms are not perfect and the implementation
does not look outside a single snapshot series.

Removing old snapshots is currently left to the user.
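
As a starting point, the helper below only prints which snapshots a simple
"keep the newest N" policy would remove; it deletes nothing. It is a sketch
and not part of this project: ``prune_candidates`` is a hypothetical name.
Because files are shared through hardlinks, removing an old snapshot directory
does not affect the same files in newer snapshots.

.. code-block:: sh

    # Hypothetical helper: prune_candidates DIR KEEP
    # Prints the timestamps of all completed snapshots in DIR except the
    # KEEP newest ones, oldest first.
    prune_candidates() {
        dir=$1
        keep=$2
        for m in "$dir"/.snapshot.*; do
            [ -f "$m" ] || continue
            ts=${m##*/.snapshot.}
            case $ts in
                ''|*[!0-9]*) continue ;;
            esac
            printf '%s\n' "$ts"
        done | sort -n | awk -v keep="$keep" '
            { ts[NR] = $0 }
            END { for (i = 1; i <= NR - keep; i++) print ts[i] }
        '
    }

For each printed timestamp, the snapshot directory and its ``.snapshot.``
marker in the sequence directory would be the things to remove.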

todo
----

* when replicating locally, give an option to update the ``./new`` directory

* plain ssh destination (call snap.rsync.pre and snap.rsync.post on remote
  side)

* old snapshot pruning / rotation

* document each script specifically

* document how access can be restricted using ssh

* give specific examples

* support snapshots & COW deduplication using: btrfs, zfs, aufs, unionfs,
  unionfs-fuse, ...