overview.html - s6 - Mirror/fork of https://skarnet.org/software/s6/

      1 <html>
      2   <head>
      3     <meta name="viewport" content="width=device-width, initial-scale=1.0" />
      4     <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
      5     <meta http-equiv="Content-Language" content="en" />
      6     <title>s6: an overview</title>
      7     <meta name="Description" content="s6: an overview" />
      8     <meta name="Keywords" content="s6 overview supervision init process unix" />
      9     <!-- <link rel="stylesheet" type="text/css" href="//skarnet.org/default.css" /> -->
     10   </head>
     11 <body>
     12 
     13 <p>
     14 <a href="index.html">s6</a><br />
     15 <a href="//skarnet.org/software/">Software</a><br />
     16 <a href="//skarnet.org/">skarnet.org</a>
     17 </p>
     18 
     19 <h1> An overview of s6 </h1>
     20 
     21 <p>
     22  s6 is a collection of utilities revolving around process supervision and
     23 management, logging, and system initialization. This page is a high-level
     24 description of the different parts of s6.
     25 </p>
     26 
     27 <h2> Process supervision </h2>
     28 
     29 <p>
     30  At its core, s6 is a <em>process supervision suite</em>, like its ancestor
     31 <a href="https://cr.yp.to/daemontools.html">daemontools</a> and its
     32 close cousin
     33 <a href="http://smarden.org/runit/">runit</a>.
     34 </p>
     35 
     36 <h3> Concept </h3>
     37 
     38 <p>
     39  The concept of process supervision comes from several observations:
     40 </p>
     41 
     42 <ul>
     43  <li> Unix systems, even minimalistic ones, need to run
     44 <em>long-lived processes</em>, aka <em>daemons</em>. That is one of the
     45 core design principles of Unix: one service &rarr; one daemon. </li>
     46  <li> Daemons can die unexpectedly. Maybe they are missing a vital
     47 resource and cannot handle a certain failure; maybe they tripped on a bug;
     48 maybe a misconfigured administration program killed them; maybe the
     49 kernel killed them. Processes are fragile, but daemons are vital to a
     50 Unix system: a fundamental discrepancy that needs to be solved. </li>
     51  <li> Automatically restarting daemons when they die is generally a good
     52 thing. In any case, sysadmin intervention is necessary, but at least the
     53 daemon is providing service, or trying to, until the sysadmin can log in
     54 and investigate the underlying problem. </li>
     55  <li> Ad-hoc shell scripts that restart daemons <strong>suck</strong>, for
     56 several reasons that would each justify their own page. The difficulty of
     57 keeping track of the PID, explained below, is one of those reasons. </li>
     58  <li> It is sometimes necessary to send signals to a daemon. To kill it,
     59 of course, but also to make it read its config file again, for instance;
     60 signalling a daemon is a natural and very common way of sending it
     61 simple commands. </li>
     62  <li> Generally, to send a signal to a daemon, you need to know its PID.
     63 Without a supervision suite, knowing the proper PID is hard. Most
     64 non-supervision systems use a hack known as <em>.pid files</em>, i.e.
     65 the script that starts the daemon stores its PID into a file, and other
     66 scripts read that file. This is a bad mechanism for several reasons, and
     67 the case against .pid files would also justify its own page; the most
     68 important drawback of .pid files is that they create race conditions:
     69 the recorded PID can go stale, and management scripts may then kill the wrong process. </li>
     70  <li> Non-supervision systems provide scripts to start and stop daemons,
     71 but those scripts may fail at boot time even though they work when run
     72 manually,
     73 and vice versa. If a sysadmin logs in and runs the script to restart a
     74 daemon that has died, the result might not be the same as if the whole
     75 system had been rebooted, and the daemon may exhibit strange behaviours!
     76 This is because the boot-time environment and the restart-time environment
     77 are not the same when the script is run; and a non-supervision system
     78 just cannot ensure reproducibility of the environment. This is a core
     79 problem of non-supervision systems: countless spurious bugs have been
     80 reported because of simple environment differences or configuration errors,
     81 and countless man-hours have been wasted trying to understand what was
     82 going on. </li>
     83 </ul>
     84 
     85 <p>
     86  A process supervision system organizes the process hierarchy in a
     87 radically different way.
     88 </p>
     89 
     90 <ul>
     91  <li> A process supervision system starts an independent hierarchy of
     92 processes at boot time, called a <em>supervision tree</em>. This
     93 supervision tree never dies: when one of its components dies, it is
     94 restarted automatically. To ensure availability of the supervision
     95 tree at all times, it should be rooted in process 1, which cannot die. </li>
     96  <li> A daemon is never started, either manually or in a script, as a
     97 scion of the script that starts it.
     98  Instead, to start a daemon, you configure a
     99 specific directory which contains all the information about your daemon;
    100 then you send a command to the supervision tree. The supervision tree
    101 will start the daemon as a leaf. <strong>In a process supervision
    102 system, daemons are always spawned by the supervision tree, and
    103 never by an admin's shell.</strong> </li>
    104  <li> The parent of your daemon is a <em>supervisor</em>. Since your
    105 daemon is its direct child, <strong>the supervisor always knows the
    106 correct PID of your daemon</strong>. </li>
    107  <li> The supervisor watches your daemon and can restart it when it
    108 dies, automatically. </li>
    109  <li> The supervision tree always has the same environment, so starting
    110 conditions are reproducible. Your daemon will always be started with the
    111 same environment, whether it is at boot time via init scripts or for the
    112 100th automatic - or manual - restart. </li>
    113  <li> To send signals to your daemon, you send a command to its
    114 supervisor, which will then send a signal to the daemon on your behalf.
    115 Your daemon is identified by the directory containing its information,
    116 which is stable, instead of by its PID, which is not stable; the supervisor
    117 maintains the correct association without a race condition or the other
    118 problems of .pid files. </li>
    119 </ul>
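
<p>
 As an illustration only, the core idea of a supervisor can be sketched in a
few lines of Python. This is not how s6 is implemented: the function name
<tt>supervise</tt>, the <tt>max_starts</tt> bound and the crashing test command
are assumptions made for the sketch. It shows two of the properties listed
above: the daemon is a direct child, so its PID is always known, and it is
restarted whenever it dies.
</p>

```python
import subprocess
import sys
import time

def supervise(argv, max_starts=3, delay=0.1):
    """Start argv as a direct child and restart it each time it dies.

    Returns a list of (pid, exit_status) pairs, one per run. A real
    supervisor loops forever; max_starts only keeps this demo finite.
    """
    history = []
    for _ in range(max_starts):
        child = subprocess.Popen(argv)  # the daemon is our direct child,
        pid = child.pid                 # so its PID is known without a .pid file
        status = child.wait()           # blocks until the daemon dies
        history.append((pid, status))
        time.sleep(delay)               # avoid a tight respawn loop
    return history

if __name__ == "__main__":
    # Hypothetical "daemon" that crashes immediately with exit code 3.
    crashing = [sys.executable, "-c", "import sys; sys.exit(3)"]
    for pid, status in supervise(crashing):
        print(f"child {pid} exited with status {status}")
```

<p>
 Because the same parent starts the child every time, each run inherits the
same environment, which is the reproducibility property described above.
</p>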
    120 
    121 <h3> Implementation </h3>
    122 
    123 <p>
    124  s6 is a straightforward implementation of those concepts.
    125 </p>
    126 
    127 <ul>
    128  <li> The <a href="s6-svscan.html">s6-svscan</a> and
    129 <a href="s6-supervise.html">s6-supervise</a> programs are the components
    130 of the <em>supervision tree</em>. They are long-lived programs.
    131  <ul>
    132   <li> <a href="s6-supervise.html">s6-supervise</a> is a daemon's
    133 <em>supervisor</em>, its direct parent. For every long-lived process on a
    134 system, there is a corresponding <a href="s6-supervise.html">s6-supervise</a>
    135 process watching it. This is okay, because every instance of
    136 <a href="s6-supervise.html">s6-supervise</a> uses very few resources. </li>
    137   <li> <a href="s6-svscan.html">s6-svscan</a> is, in a manner of speaking,
    138 a supervisor for the supervisors. It watches and maintains a collection of
    139 <a href="s6-supervise.html">s6-supervise</a> processes: it is the branch
    140 of the supervision tree that all supervisors stem from. It can be
    141 run and
    142 <a href="//skarnet.org/software/s6/s6-svscan-not-1.html">supervised
    143 by your regular init process</a>, or it can
    144 <a href="//skarnet.org/software/s6/s6-svscan-1.html">run as
    145 process 1 itself</a>. Running s6-svscan as process 1 requires
    146 some effort from the user, because of the inherent non-portability of
    147 init processes; the
    148 <a href="//skarnet.org/software/s6-linux-init/">s6-linux-init</a>
    149 package automates that effort and allows users to run s6 as an init
    150 replacement. </li>
    151   <li> The configuration of a daemon to be supervised by
    152 <a href="s6-supervise.html">s6-supervise</a> is done via a
    153 <a href="servicedir.html">service directory</a>. </li>
    154   <li> The place to gather all service directories to be watched by a
    155 <a href="s6-svscan.html">s6-svscan</a> instance is called a
    156 <a href="scandir.html">scan directory</a>. </li>
    157  </ul>
    158  <li> The command that controls a single supervisor, and allows you to
    159 send signals to a daemon, is
    160 <a href="s6-svc.html">s6-svc</a>. It is a short-lived program. </li>
    161  <li> The command that controls a set of supervisors, and allows you to
    162 start and stop supervision trees, is
    163 <a href="s6-svscanctl.html">s6-svscanctl</a>. It is a short-lived
    164 program. </li>
    165 </ul>
    166 
    167 <p>
    168  These four programs,
    169 <a href="s6-svscan.html">s6-svscan</a>,
    170 <a href="s6-supervise.html">s6-supervise</a>,
    171 <a href="s6-svscanctl.html">s6-svscanctl</a> and
    172 <a href="s6-svc.html">s6-svc</a>,
    173 are the very core of s6. Technically, once you have them, you have a
    174 functional s6 installation, and the other utilities are just a bonus.
    175 </p>
    176 
    177 <h3> Practical usage </h3>
    178 
    179 <p>
    180  To use s6's supervision features, you need to perform the following steps:
    181 </p>
    182 
    183 <ul>
    184  <li> For every daemon you potentially want supervised, write a
    185 <a href="servicedir.html">service directory</a>. Make sure that
    186 your daemon does not background itself when started in the
    187 <tt>./run</tt> script! Auto-backgrounding is a historical hack
    188 that was implemented when supervision suites did not exist; since
    189 you're using a supervision suite, auto-backgrounding is unnecessary
    190 and in this case detrimental. </li>
    191  <li> Write a single <a href="scandir.html">scan directory</a> for
    192 the set of daemons you want to actually run. This set can be modified
    193 at run time. </li>
    194  <li> At some point in your initialization scripts, run
    195 <a href="s6-svscan.html">s6-svscan</a> on the scan directory. This will
    196 start the supervision tree, including your set of daemons. The exact
    197 way of running s6-svscan depends on your system: it is not quite the same
    198 when you want to run it as process 1 on a real machine, or under another
    199 init on a real machine, or as process 1 in a
    200 <a href="https://www.docker.com/">Docker</a> container, or in another
    201 context entirely. </li>
    202  <li> Alternatively, you can start <a href="s6-svscan.html">s6-svscan</a>
    203 on an empty scan directory, then populate it step by step and send an
    204 update command to s6-svscan via
    205 <a href="s6-svscanctl.html">s6-svscanctl</a> whenever the supervision
    206 tree should pick up the differences and start the services you added. </li>
    207  <li> That's it, your services are running. To control them manually,
    208 you can use the <a href="s6-svc.html">s6-svc</a> command. </li>
    209  <li> At the end of the system's lifetime, you can use
    210 <a href="s6-svscanctl.html">s6-svscanctl</a> to bring down the supervision
    211 tree. </li>
    212 </ul>
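
<p>
 The first step above can be sketched as follows. This Python helper only
creates the minimal layout, a directory containing an executable <tt>run</tt>
script; the helper name <tt>make_service_dir</tt> and the command
<tt>mydaemon --foreground</tt> are made up for the example, and a real service
directory may contain more files than this.
</p>

```python
import os
import tempfile

def make_service_dir(scandir, name, command):
    """Create scandir/name/run, a script that execs into `command`."""
    service_dir = os.path.join(scandir, name)
    os.makedirs(service_dir, exist_ok=True)
    run_path = os.path.join(service_dir, "run")
    with open(run_path, "w") as f:
        f.write("#!/bin/sh\n")
        # exec replaces the shell, so the daemon keeps the PID that the
        # supervisor knows; the daemon itself must stay in the foreground.
        f.write(f"exec {command}\n")
    os.chmod(run_path, 0o755)  # the run script must be executable
    return service_dir

if __name__ == "__main__":
    scandir = tempfile.mkdtemp(prefix="scandir-")
    # "mydaemon --foreground" is a hypothetical command for illustration.
    print(make_service_dir(scandir, "mydaemon", "mydaemon --foreground"))
```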
    213 
    214 <h2> Service-specific logging </h2>
    215 
    216 <p>
    217 <a href="s6-svscan.html">s6-svscan</a> can monitor a supervision tree,
    218 but it can also do one more thing. It can ensure that a daemon's log,
    219 i.e. what the daemon outputs to its stdout (or stderr if you redirect it),
    220 gets processed by another, supervised, long-lived process, called a
    221 <em>logger</em>; and it can make sure that the logs are never lost
    222 between the daemon and the logger - even if the daemon dies, even if the
    223 logger dies.
    224 </p>
    225 
    226 <p>
    227  If your daemon is outputting messages, you have a decision to make
    228 about where to send them.
    229 </p>
    230 
    231 <ul>
    232  <li> You can do as non-supervision systems do, and send the messages
    233 to syslog. It's entirely possible with a supervision system too.
    234 However, like auto-backgrounding, syslog is a historical mechanism that
    235 predates supervision suites, and is technically inferior; it is
    236 recommended that you do not use it whenever you can avoid it. </li>
    237  <li> You can send them to the daemon's stdout/stderr and do nothing special
    238 about it. The logs will then be sent to s6-svscan's stdout/stderr;
    239 what mechanism will read them depends on how you started s6-svscan. </li>
    240  <li> You can use s6-svscan's service-specific logging mechanism and
    241 dedicate a logger process to your daemon's messages. </li>
    242 </ul>
    243 
    244 <p>
    245  s6 provides you with a long-lived process to use as a logger:
    246 <a href="s6-log.html">s6-log</a>. It will store your logs in one or
    247 more directories of your choice, and rotate them automatically.
    248 </p>
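
<p>
 To make "rotate them automatically" concrete, here is a deliberately
simplified Python sketch of size-based rotation. It does not reproduce
s6-log's actual behaviour or file naming: the class name, the
<tt>archive.NNNNNN</tt> names and the <tt>max_bytes</tt> and <tt>keep</tt>
parameters are all assumptions of the sketch.
</p>

```python
import os
import tempfile

class RotatingLog:
    """Append lines to logdir/current; archive it when it grows too large."""

    def __init__(self, logdir, max_bytes=1000, keep=3):
        self.logdir = logdir
        self.max_bytes = max_bytes
        self.keep = keep  # number of archived files to retain (at least 1)
        self.seq = 0      # counter used to name archives
        os.makedirs(logdir, exist_ok=True)

    def _path(self, name):
        return os.path.join(self.logdir, name)

    def write_line(self, line):
        data = line.encode() + b"\n"
        current = self._path("current")
        size = os.path.getsize(current) if os.path.exists(current) else 0
        # Rotate before writing if this line would exceed the size limit.
        if size and size + len(data) > self.max_bytes:
            self._rotate()
        with open(current, "ab") as f:
            f.write(data)

    def _rotate(self):
        self.seq += 1
        os.rename(self._path("current"), self._path(f"archive.{self.seq:06d}"))
        archives = sorted(n for n in os.listdir(self.logdir) if n.startswith("archive."))
        for old in archives[:-self.keep]:  # delete all but the newest `keep`
            os.remove(self._path(old))
```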
    249 
    250 <h2> Helpers for run scripts </h2>
    251 
    252 <p>
    253  Creating a working
    254 <a href="servicedir.html">service directory</a>, and especially a good
    255 <em>run script</em>, is the most important part of the work when
    256 adapting a daemon to a supervision framework.
    257 </p>
    258 
    259 <p>
    260  If you can find your daemon's invocation script on a non-supervision system,
    261 for instance a System V-style init script, you can see the exact
    262 options that the daemon is being run with: environment variables,
    263 uid and gid, open descriptors, etc. This is what you
    264 need to replicate in your run script.
    265 </p>
    266 
    267 <p>
    268  (Do not replicate the auto-backgrounding, or things like
    269 <a href="http://man.he.net/man8/start-stop-daemon">start-stop-daemon</a>
    270 invocation: start-stop-daemon and its friends are hideous and kludgy
    271 attempts to work around the lack of proper supervision mechanisms. Now
    272 that you have s6, you should remove them from your system, throw them
    273 into a bonfire, and dance and laugh while they burn. Generally speaking,
    274 as a system administrator you want daemons that have been designed
    275 following the principles described
    276 <a href="https://jdebp.uk/FGA/unix-daemon-design-mistakes-to-avoid.html">here</a>,
    277 or at least you want to use the command-line options that make them
    278 behave in such a way.) 
    279 </p>
    280 
    281 <p>
    282  The vast majority of the tools provided by s6 are meant to be used in
    283 run scripts: they help you control the process state and
    284 environment in your script before it executes into your daemon. Or,
    285 sometimes, they are daemons themselves, designed to be supervised.
    286 </p>
    287 
    288 <p>
    289  s6, like other <a href="//skarnet.org/software/">skarnet.org
    290 software</a>, makes heavy use of
    291 <a href="https://en.wikipedia.org/wiki/Chain_loading#Chain_loading_in_Unix">chain
    292 loading</a>, also known as "Bernstein chaining": a lot of s6 tools will
    293 perform some action that changes the process state, then execute into the
    294 rest of their command line. This allows the user to change the process state
    295 in a very flexible way, by combining the right components in the right
    296 order. Very often, a run script can be reduced to a single command line -
    297 likely a long one, but still a single one. (That is the main reason why
    298 using the
    299 <a href="//skarnet.org/software/execline/">execline</a> language
    300 to write run scripts is recommended: execline makes it natural to handle
    301 long command lines made of massive amounts of chain loading. This is by no
    302 means mandatory, though: a run script can be any executable file you want,
    303 provided that running it eventually results in a long-lived process with
    304 the same PID.)
    305 </p>
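
<p>
 Chain loading itself is easy to demonstrate outside of s6. The following
Python sketch defines a tiny, hypothetical tool (<tt>CHAIN_TOOL</tt>, not an
s6 program) that sets one environment variable and then executes into the
rest of its command line; two instances are chained in front of a final
program that prints the result.
</p>

```python
import subprocess
import sys

# Source of a tiny chain-loading tool: NAME VALUE prog...
CHAIN_TOOL = """
import os, sys
name, value, *rest = sys.argv[1:]
os.environ[name] = value   # change the process state...
os.execvp(rest[0], rest)   # ...then become the next program, same PID
"""

def run_chain():
    """Chain two state-changing steps in front of a final program."""
    setenv = [sys.executable, "-c", CHAIN_TOOL]
    show = [sys.executable, "-c",
            "import os; print(os.environ['A'], os.environ['B'])"]
    # One single command line: setenv A 1 setenv B 2 show
    command = setenv + ["A", "1"] + setenv + ["B", "2"] + show
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout

if __name__ == "__main__":
    print(run_chain(), end="")
```

<p>
 Each step replaces itself with the next one instead of spawning a child, so
the final program runs with the PID of the first step and with all the
accumulated state changes.
</p>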
    306 
    307 <p>
    308  Some examples of s6 programs meant to be used in run scripts:
    309 </p>
    310 
    311 <ul>
    312  <li> The <a href="s6-log.html">s6-log</a> program is a long-lived
    313 process. It is meant to be executed into by a <tt>./log/run</tt>
    314 script: it will be supervised, and will process what it reads on
    315 its stdin (i.e. the output of the <tt>./run</tt> daemon). </li>
    316  <li> The <a href="s6-envdir.html">s6-envdir</a> program is a
    317 short-lived process that will update its current environment according
    318 to what it reads in a given directory, then execute into the rest of its
    319 command line. It is meant to be used in a run script to adjust the
    320 environment in which the final daemon will be executed. </li>
    321  <li> Similarly, the <a href="s6-softlimit.html">s6-softlimit</a> program
    322 adjusts its resource limits, then executes into the rest of its command
    323 line: it is meant to set the resources the final daemon will have
    324 access to. </li>
    325  <li> The <a href="s6-applyuidgid.html">s6-applyuidgid</a> program,
    326 part of the <tt>s6-*uidgid</tt> family, drops root privileges before
    327 executing into the rest of its command line: it is meant to be used
    328 in run scripts that need root privileges when starting but do not
    329 need them for the execution of the long-lived process. </li>
    330  <li> <a href="s6-ipcserverd.html">s6-ipcserverd</a> is a daemon that
    331 listens to a Unix socket and spawns a program for every connection.
    332 It is meant to be supervised, so it should be used in a run script,
    333 and it's also meant to be a flexible super-server that you can use
    334 for different applications: so it is a building block that may appear in
    335 several of your run scripts defining
    336 <a href="localservice.html">local services</a>. </li>
    337 </ul>
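
<p>
 As a rough idea of what an environment-from-directory helper does, here is a
simplified Python sketch. It is written in the spirit of s6-envdir but makes
its own assumptions: every regular file in the directory defines one variable
named after the file, whose value is the file's first line. The real program
has additional rules that are not reproduced here, and the function name
<tt>env_from_dir</tt> is hypothetical.
</p>

```python
import os
import tempfile

def env_from_dir(path, base=None):
    """Return a copy of `base` (default: os.environ) updated from `path`."""
    env = dict(os.environ if base is None else base)
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        if not os.path.isfile(full):
            continue  # ignore subdirectories and special files
        with open(full) as f:
            env[name] = f.readline().rstrip("\n")  # first line only
    return env

if __name__ == "__main__":
    envdir = tempfile.mkdtemp(prefix="envdir-")
    with open(os.path.join(envdir, "LISTEN_PORT"), "w") as f:
        f.write("8080\n")
    print(env_from_dir(envdir, base={})["LISTEN_PORT"])
```

<p>
 A chain-loading tool would then pass the resulting mapping to
<tt>os.execvpe</tt> together with the rest of its command line.
</p>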
    338 
    339 <h2> Readiness notification and dependency management </h2>
    340 
    341 <p>
    342  Now that you have a supervision tree, and long-lived processes running
    343 supervised, you may want to introduce dependencies between them: do not
    344 perform an action, e.g. starting a Web server that connects to a database
    345 (with <a href="s6-svc.html">s6-svc -u</a>),
    346 before a given daemon, e.g. the database server, is up and running.
    347 s6 provides tools to do that:
    348 </p>
    349 
    350 <ul>
    351  <li> The <a href="s6-svwait.html">s6-svwait</a>,
    352 <a href="s6-svlisten1.html">s6-svlisten1</a> and
    353 <a href="s6-svlisten.html">s6-svlisten</a> programs will wait until a set of
    354 daemons is up, ready, down (as soon as the <tt>./run</tt> process dies) or
    355 really down (when the <tt>./finish</tt> process has also died). </li>
    356  <li> Unfortunately, a daemon being <em>up</em> does not mean that it is
    357 <em>ready</em>:
    358 <a href="notifywhenup.html">this page</a> goes into the details. s6
    359 supports a simple mechanism: when a daemon wants to signal that it is
    360 <em>ready</em>, it simply writes a newline to a file descriptor of its
    361 choice, and <a href="s6-supervise.html">s6-supervise</a> will pick that
    362 notification up and broadcast the information to processes waiting for
    363 it. </li>
    364  <li> s6 also has a legacy mechanism for daemons that do not
    365 notify their own readiness but provide a way for an external program
    366 to check whether they're ready or not:
    367 <a href="s6-notifyoncheck.html">s6-notifyoncheck</a>.
    368  This is polling, which is bad, but unfortunately necessary for
    369 many daemons as of 2019. </li>
    370 </ul>
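
<p>
 The newline-on-a-file-descriptor mechanism described above can be sketched in
Python as follows. The parent plays the role of the supervisor and the child
plays the daemon; the function name, the sleep durations and the inline
daemon code are assumptions of the sketch, not s6 code.
</p>

```python
import os
import subprocess
import sys

def start_and_wait_ready():
    """Start a fake daemon and block until it announces readiness."""
    read_end, write_end = os.pipe()
    daemon_code = (
        "import os, time\n"
        "time.sleep(0.2)\n"                 # pretend to initialise
        f"os.write({write_end}, b'\\n')\n"  # ready: write a newline
        "time.sleep(30)\n"                  # keep "serving"
    )
    daemon = subprocess.Popen([sys.executable, "-c", daemon_code],
                              pass_fds=[write_end])
    os.close(write_end)  # from now on only the daemon holds the write end
    try:
        # Blocks until the newline arrives, or returns b"" if the daemon
        # died before becoming ready (end of file on the pipe).
        return os.read(read_end, 1) == b"\n"
    finally:
        os.close(read_end)
        daemon.terminate()
        daemon.wait()

if __name__ == "__main__":
    print("ready" if start_and_wait_ready() else "died before readiness")
```

<p>
 The daemon is <em>up</em> as soon as it is spawned, but the caller only
proceeds once it is <em>ready</em>, which is the distinction made above.
</p>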
    371 
    372 <p>
    373  s6 does not provide a complete dependency management framework,
    374 i.e. a program to automatically start (or stop) a set of services in a
    375 specific order - that order being automatically computed from a graph of
    376 dependencies between services.
    377  That functionality belongs to a <em>service manager</em>, and is
    378 implemented for instance in the
    379 <a href="//skarnet.org/software/s6-rc/">s6-rc</a> package.
    380 </p>
    381 
    382 <h2> Fine-grained control over services </h2>
    383 
    384 <p>
    385  s6 provides you with a few more tools to control and monitor your
    386 services. For instance:
    387 </p>
    388 
    389 <ul>
    390  <li> <a href="s6-svstat.html">s6-svstat</a> gives you access to
    391 the detailed state of a service </li>
    392  <li> <a href="s6-svperms.html">s6-svperms</a> allows you to configure
    393 what users can read that state, what users can send control
    394 commands to your service, and what users can be notified of
    395 service start/stop events </li>
    396  <li> <a href="s6-svdt.html">s6-svdt</a>
    397 allows you to see what caused the latest deaths of a supervised
    398 process </li>
    399 </ul>
    400 
    401 <p>
    402  These tools make s6 the most powerful and flexible of the existing
    403 process supervision suites.
    404 </p>
    405 
    406 <h2> Additional utilities </h2>
    407 
    408 <p>
    409  The other programs in the s6 package are various utilities that may be
    410 useful in designing servers, and more generally multi-process software.
    411 They can be used with or without a supervision environment, although
    412 it is of course recommended to have one; but they are not part of the core s6
    413 functionality, and you may safely ignore them for now if you are just getting
    414 into the supervision world.
    415 </p>
    416 
    417 <h3> Generic inter-process notification </h3>
    418 
    419 <p>
    420  The <tt>s6-ftrig*</tt> family of programs allows notifications between
    421 unrelated processes: a set of processes can subscribe to a certain
    422 channel - identified by a directory in the filesystem - and ask to be
    423 notified of certain events on that channel; another set of processes can
    424 send events to the channel.
    425 </p>
    426 
    427 <p>
    428  The underlying mechanism is the same as the one used by the supervision
    429 tree for readiness notification, but the <tt>s6-ftrig*</tt> tools provide
    430 more generic access to that mechanism.
    431 </p>
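
<p>
 A channel identified by a directory can be approximated with named pipes, as
in the Python sketch below: each subscriber creates a FIFO in the directory,
and a notifier writes the event to every FIFO it finds there. This is a
simplification for illustration; the function names <tt>subscribe</tt> and
<tt>notify</tt> are hypothetical and the details of the real s6 mechanism
are not reproduced.
</p>

```python
import os
import tempfile

def subscribe(fifodir, name):
    """Create a FIFO for this listener and return a read descriptor."""
    path = os.path.join(fifodir, name)
    os.mkfifo(path)
    return os.open(path, os.O_RDONLY | os.O_NONBLOCK)

def notify(fifodir, event):
    """Write `event` (bytes) to every listener; return how many got it."""
    delivered = 0
    for name in os.listdir(fifodir):
        try:
            fd = os.open(os.path.join(fifodir, name),
                         os.O_WRONLY | os.O_NONBLOCK)
        except OSError:
            continue  # nobody is listening behind this FIFO any more
        try:
            os.write(fd, event)
            delivered += 1
        finally:
            os.close(fd)
    return delivered

if __name__ == "__main__":
    channel = tempfile.mkdtemp(prefix="fifodir-")
    listener = subscribe(channel, "listener-1")
    print(notify(channel, b"U"), os.read(listener, 1))
```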
    432 
    433 <h3> Helpers for designing local services </h3>
    434 
    435 <p>
    436  Local services, i.e. daemons listening to a Unix domain socket, are a
    437 powerful and flexible mechanism, especially with modern Unix systems
    438 that allow client authentication. s6 includes tools to take advantage
    439 of that mechanism.
    440 </p>
    441 
    442 <ul>
    443  <li> The <tt>s6-ipc*</tt> family of programs is about designing clients
    444 or servers that communicate over Unix domain sockets. </li>
    445  <li> The <tt>s6-*access*</tt> family of programs and
    446 <a href="s6-connlimit.html">s6-connlimit</a> are about client access control. </li>
    447  <li> The <tt>s6-sudo*</tt> family of programs is about using a local
    448 service in order to give selected
    449 clients the ability to run a command line with the privileges of the
    450 server, without using suid programs. </li>
    451 </ul>
    452 
    453 <h3> Keeping file descriptors open </h3>
    454 
    455 <p>
    456  Sometimes you want to keep a file descriptor open, even if the program
    457 normally using it dies - so the program can restart and use the same
    458 file descriptor without losing any data. To do that, you need to
    459 <em>hold</em> the descriptor in another process, i.e. that process
    460 should have it open but do nothing with it.
    461 </p>
    462 
    463 <p>
    464 <a href="s6-svscan.html">s6-svscan</a>, for instance, holds the pipe
    465 existing between a supervised daemon and its logger, so even if the
    466 daemon or the logger dies while there are logs in the pipe, the pipe
    467 remains open and the logs are not lost.
    468 </p>
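
<p>
 The effect of holding a pipe can be reproduced with a short Python sketch.
The parent process keeps both ends of the pipe open; a short-lived writer
and a later reader stand in for the daemon and the logger. All names here
are assumptions made for the example, not s6 code.
</p>

```python
import os
import subprocess
import sys

def run_with_fd(code, fd):
    """Run a short Python snippet in a child that inherits `fd`."""
    return subprocess.run([sys.executable, "-c", code],
                          pass_fds=[fd], capture_output=True, check=True)

def held_pipe_demo():
    read_end, write_end = os.pipe()  # the holder keeps both ends open
    try:
        # A short-lived "daemon" writes one log line, then dies.
        run_with_fd(
            f"import os; os.write({write_end}, b'hello from daemon\\n')",
            write_end,
        )
        # The "logger" starts only afterwards, yet the line is still there.
        logger = run_with_fd(
            f"import os, sys; sys.stdout.buffer.write(os.read({read_end}, 4096))",
            read_end,
        )
        return logger.stdout
    finally:
        os.close(read_end)
        os.close(write_end)

if __name__ == "__main__":
    print(held_pipe_demo())
```

<p>
 Without the holder, the pipe would be destroyed as soon as the last process
using it exited, and the pending data would be lost.
</p>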
    469 
    470 <p>
    471  s6 provides a mechanism to store and retrieve open file descriptors
    472 in a totally generic way: the <tt>s6-fdholder*</tt> family of programs.
    473 </p>
    474 
    475 <ul>
    476  <li> The <a href="s6-fdholder-daemon.html">s6-fdholder-daemon</a> program
    477 is a daemon (or, rather, executes into the
    478 <a href="s6-fdholderd.html">s6-fdholderd</a> daemon), meant to be
    479 supervised, that will hold file descriptors on its clients' behalf. </li>
    480  <li> Other programs in the family, such as
    481 <a href="s6-fdholder-store.html">s6-fdholder-store</a>, are client
    482 programs that interact with this daemon to store and retrieve file
    483 descriptors. </li>
    484 </ul>
    485 
    486 <p>
    487  Note that "socket activation", one of the main advertised benefits of the
    488 <a href="https://www.freedesktop.org/wiki/Software/systemd/">systemd</a>
    489 init system, sounds similar to fd-holding.
    490 The reality is that socket activation is a mixture of several different
    491 mechanisms, one of which is fd-holding; s6 allows you to implement the
    492 <a href="socket-activation.html">healthy parts</a> of socket activation.
    493 </p>
    494 
    495 <h3> Other miscellaneous utilities </h3>
    496 
    497 <p>
    498  This page does not list or classify every s6 tool. Please
    499 explore the "Reference" section of the
    500 <a href="index.html">main s6 page</a> for details on a specific program.
    501 </p>
    502 
    503 </body>
    504 </html>