<html>
  <head>
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <meta http-equiv="Content-Language" content="en" />
    <title>s6: the s6-supervise program</title>
    <meta name="Description" content="s6: the s6-supervise program" />
    <meta name="Keywords" content="s6 command s6-supervise servicedir supervision supervise" />
    <!-- <link rel="stylesheet" type="text/css" href="//skarnet.org/default.css" /> -->
  </head>
<body>
<h1> The s6-supervise program </h1>

<p>
s6-supervise monitors a long-lived process (or <em>service</em>), making sure it
stays alive, sending notifications to registered processes when it dies, and
providing an interface to control its state. s6-supervise is designed to be the
last non-leaf branch of a <em>supervision tree</em>, the supervised process
being a leaf.
</p>

<h2> Interface </h2>

<pre>
     s6-supervise <em>servicedir</em>
</pre>

<p>
 s6-supervise's behaviour is approximately the following:
</p>

<ul>
 <li> s6-supervise changes its current directory to <em>servicedir</em>. </li>
 <li> It exits 100 if another s6-supervise process is already monitoring this service. </li>
 <li> It forks and executes the <tt>./run</tt> file in the service directory. </li>
 <li> <tt>./run</tt> should be a long-lived process: it can chain load (i.e. exec into
other binaries), but should not die. It's the daemon that s6-supervise monitors
and manages. </li>
 <li> When <tt>./run</tt> dies, s6-supervise spawns <tt>./finish</tt>, if it exists.
This script should be short-lived: it's meant to clean up, if necessary, any
application state that <tt>./run</tt> did not clean up itself before dying. </li>
 <li> When <tt>./finish</tt> dies, s6-supervise spawns <tt>./run</tt> again. </li>
 <li> s6-supervise operation can be controlled by the <a href="s6-svc.html">s6-svc</a>
program. It can be sent commands like "restart the service", "bring the service down", etc. </li>
 <li> s6-supervise normally runs forever. If told to exit by <a href="s6-svc.html">s6-svc</a>,
it waits for the service to go down one last time, then exits 0. </li>
</ul>
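<p>
The cycle above can be illustrated with a toy Python loop. This is a sketch under stated assumptions, not s6-supervise's implementation: it runs a fixed number of cycles, pauses a fixed delay between spawns, passes no arguments to <tt>./finish</tt>, and detects a second instance with an exclusive <tt>flock()</tt> on a hypothetical <tt>supervise/lock</tt> file. This page only says that s6-supervise exits 100 in that case, not how it detects it.
</p>

```python
import fcntl
import os
import subprocess
import sys
import time


def acquire_lock(servicedir):
    """Take a non-blocking exclusive lock on the HYPOTHETICAL file
    servicedir/supervise/lock. Return the fd, or None if it is held."""
    superdir = os.path.join(servicedir, "supervise")
    os.makedirs(superdir, exist_ok=True)
    fd = os.open(os.path.join(superdir, "lock"), os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def supervise(servicedir, cycles, delay=1.0):
    """Toy loop: ./run, then ./finish if present, then ./run again.
    Stops after `cycles` rounds and returns ./run's exit statuses."""
    lock = acquire_lock(servicedir)
    if lock is None:
        sys.exit(100)  # another supervisor already owns this service
    statuses = []
    try:
        for _ in range(cycles):
            # ./run receives servicedir as its only argument, runs with
            # servicedir as its working directory, in a new session.
            run = subprocess.run(["./run", servicedir], cwd=servicedir,
                                 start_new_session=True)
            statuses.append(run.returncode)
            if os.path.exists(os.path.join(servicedir, "finish")):
                # The real ./finish arguments are described under
                # "Detailed operation"; this toy passes none.
                subprocess.run(["./finish"], cwd=servicedir,
                               start_new_session=True)
            time.sleep(delay)
    finally:
        os.close(lock)
    return statuses
```

A real supervisor would also react to commands and signals while <tt>./run</tt> is alive; this blocking loop cannot, which is precisely the limitation that the asynchronous design mentioned under "Implementation notes" avoids.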

<p>
 For a precise description of s6-supervise's behaviour, check the
<a href="#detailed">Detailed operation</a> section below, as well as
the <a href="servicedir.html">service directory</a> page:
s6-supervise operation can be extensively configured by the presence
of certain files in the service directory.
</p>

<h2> Options </h2>

<p>
 s6-supervise does not support options, because it is normally not run
manually via a command line; it is usually launched by its own
supervisor, <a href="s6-svscan.html">s6-svscan</a>. The way to
tune s6-supervise's behaviour is via files in the
<a href="servicedir.html">service directory</a>.
</p>

<h2> Readiness notification support </h2>

<p>
 If the <a href="servicedir.html">service directory</a> contains a valid
<tt>notification-fd</tt> file when the service is started, or restarted,
s6-supervise creates and listens to an additional pipe from the service
for <a href="notifywhenup.html">readiness notification</a>. When the
notification occurs, s6-supervise updates the <tt>./supervise/status</tt>
file accordingly, then sends
a <tt>'U'</tt> event to <tt>./event</tt>.
</p>
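<p>
The supervisor side of this mechanism can be sketched in Python. The sketch assumes that <tt>notification-fd</tt> holds a single decimal integer, and that the daemon signals readiness by writing a line ending in a newline to that descriptor, as described on the <a href="notifywhenup.html">readiness notification</a> page. The function names are illustrative only; updating <tt>./supervise/status</tt> and sending the <tt>'U'</tt> event are left out.
</p>

```python
import os


def read_notification_fd(servicedir):
    """Return the descriptor number stored in servicedir/notification-fd,
    or None if the file does not exist (assumed format: one integer)."""
    try:
        with open(os.path.join(servicedir, "notification-fd")) as f:
            return int(f.read().strip())
    except FileNotFoundError:
        return None


def wait_until_ready(read_end):
    """Read the notification pipe until a newline arrives (ready: True)
    or the writer closes the pipe without sending one (False)."""
    while True:
        chunk = os.read(read_end, 4096)
        if not chunk:
            return False
        if b"\n" in chunk:
            return True
```

In the child, the write end of the pipe would be moved to the descriptor number returned by <tt>read_notification_fd</tt> (for example with <tt>os.dup2()</tt>) before executing <tt>./run</tt>.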

<p>
 If the service is logged, i.e. if the service directory has a
<tt>log</tt> subdirectory that is also a service directory, and the
s6-supervise process has been launched by
<a href="s6-svscan.html">s6-svscan</a>, then by default
the service's stdout goes into the logging pipe. If you set
<tt>notification-fd</tt> to 1, the logging pipe will be overwritten
by the notification pipe, which is probably not what you want. Instead,
if your daemon writes a notification message to its stdout, you should
set <tt>notification-fd</tt> to (for instance) 3, and redirect outputs
in your run script. For example, to send stderr to the logger and
stdout to a <tt>notification-fd</tt> set to 3, you would start your
daemon as <tt>fdmove -c 2 1 fdmove 1 3 prog...</tt> (in execline), or
<tt>exec 2&gt;&amp;1 1&gt;&amp;3 3&lt;&amp;- prog...</tt> (in shell).
</p>
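<p>
For run programs written in a general-purpose language rather than execline or shell, the redirection above amounts to three descriptor operations applied in order. The Python sketch below mirrors <tt>exec 2&gt;&amp;1 1&gt;&amp;3 3&lt;&amp;-</tt>; the function name is hypothetical, and the descriptor numbers are parameters only so that the sketch can be exercised on ordinary pipes.
</p>

```python
import os


def route_stdout_to_notification(out_fd=1, err_fd=2, notif_fd=3):
    """Same effect as the shell redirection shown above."""
    os.dup2(out_fd, err_fd)    # stderr now points where stdout points (the logger)
    os.dup2(notif_fd, out_fd)  # stdout now writes to the notification pipe
    os.close(notif_fd)         # the original notification descriptor is dropped
```

In an actual run program, this call would come right before <tt>os.execvp()</tt> of the daemon, so that the daemon inherits the rearranged descriptors.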

<h2> Signals </h2>

<p>
 s6-supervise reacts to the following signals:
</p>

<ul>
 <li> SIGTERM: bring down the service and exit, as if an
<a href="s6-svc.html">s6-svc -xd</a> command had been received. </li>
 <li> SIGHUP: close its own stdin and stdout, and exit as soon as the
service stops, as if an <a href="s6-svc.html">s6-svc -x</a> command
had been received. </li>
 <li> SIGQUIT: exit immediately without touching the service in any
way. </li>
 <li> SIGINT: send a SIGINT to the process group of the service, then
exit immediately. (The point here is to correctly forward SIGINT
in the case where s6-supervise is running in a terminal and the user
sent ^C to interrupt it.) </li>
</ul>
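<p>
The signal semantics above can be summarized as a lookup table, plus the one signal that s6-supervise relays to its service. In the Python sketch below, <tt>SVC_EQUIVALENT</tt> and <tt>forward_sigint</tt> are illustrative names, not part of s6. Because the service is spawned as a session leader, as noted under "Usage notes", its process group id equals its pid, which is what the <tt>os.killpg()</tt> call relies on.
</p>

```python
import os
import signal

# s6-svc command each signal is documented to emulate;
# None marks signals with no s6-svc equivalent.
SVC_EQUIVALENT = {
    signal.SIGTERM: "-xd",  # bring the service down, then exit
    signal.SIGHUP: "-x",    # exit as soon as the service stops
    signal.SIGQUIT: None,   # exit at once, service untouched
    signal.SIGINT: None,    # relay to the service's process group, then exit
}


def forward_sigint(service_pid):
    """Send SIGINT to the whole process group led by the service."""
    os.killpg(service_pid, signal.SIGINT)
```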

<a name="detailed">
<h2> Detailed operation </h2>
</a>

<ul>
 <li> s6-supervise switches to the <em>servicedir</em>
<a href="servicedir.html">service directory</a>. </li>
 <li> It creates a <tt>supervise/</tt> subdirectory (if it doesn't exist yet) to
store its internal data. </li>
 <li> It exits 100 if another s6-supervise process is already monitoring this service. </li>
 <li> If the <tt>./event</tt> <a href="fifodir.html">fifodir</a> does not exist,
s6-supervise creates it and allows subscriptions to it from processes having the same
effective group id as the s6-supervise process.
If it already exists, it uses it as is, without modifying the subscription rights. </li>
 <li> It <a href="libs6/ftrigw.html">sends</a> an <tt>'s'</tt> event to <tt>./event</tt>. </li>
 <li> If the default service state is up (i.e. there is no <tt>./down</tt> file),
s6-supervise spawns <tt>./run</tt>. One argument is given to the <tt>./run</tt>
program: <em>servicedir</em>, the name of the directory s6-supervise is being
run on. It is given exactly as given to s6-supervise, without recanonicalization.
In particular, if s6-supervise is being managed by <a href="s6-svscan.html">s6-svscan</a>,
<em>servicedir</em> is always of the form <tt><em>foo</em></tt> or <tt><em>foo</em>/log</tt>,
and <em>foo</em> contains no slashes. </li>
 <li> s6-supervise sends a <tt>'u'</tt> event to <tt>./event</tt> whenever it
successfully spawns <tt>./run</tt>. </li>
 <li> If there is a <tt>./notification-fd</tt> file in the service directory and,
at some point after the service has been spawned, s6-supervise is told that the
service is ready, it sends a <tt>'U'</tt> event to <tt>./event</tt>. There are
several ways to tell s6-supervise that the service is ready:
  <ul>
   <li> the daemon may <a href="notifywhenup.html">do so itself</a>. </li>
   <li> the run script may have forked an
<a href="s6-notifyoncheck.html">s6-notifyoncheck</a> process that polls the
service for readiness. </li>
  </ul> </li>
 <li> When <tt>./run</tt> dies, s6-supervise sends a <tt>'d'</tt> event to <tt>./event</tt>.
It then spawns <tt>./finish</tt> if it exists.
<tt>./finish</tt> will have <tt>./run</tt>'s exit code as first argument, or 256 if
<tt>./run</tt> was signaled; it will have the number of the signal that killed <tt>./run</tt>
as second argument, or an undefined number if <tt>./run</tt> was not signaled;
and it will have <em>servicedir</em> as third argument. </li>
 <li> By default, <tt>./finish</tt> must exit in less than 5 seconds. If it takes more than that,
s6-supervise kills it with a SIGKILL. This can be configured via the
<tt>./timeout-finish</tt> file; see its description on the
<a href="servicedir.html">service directory page</a>. </li>
 <li> When <tt>./finish</tt> dies (or is killed),
s6-supervise sends a <tt>'D'</tt> event to <tt>./event</tt>. Then
it restarts <tt>./run</tt> unless it has been told not to. </li>
 <li> If <tt>./finish</tt> exits 125, then s6-supervise sends an <tt>'O'</tt> event
to <tt>./event</tt> <em>before</em> the <tt>'D'</tt>, and it
<strong>does not restart the service</strong>, as if <tt>s6-svc -O</tt> had
been called. This can be used to signify permanent failure to start the service. </li>
 <li> There is a minimum 1-second delay between two <tt>./run</tt> spawns, to avoid busy-looping
if <tt>./run</tt> exits too quickly. If the service has been <em>ready</em> for more
than one second, it will restart immediately, but if it is not <em>ready</em> when
it dies, s6-supervise will always pause for 1 second before spawning it again. </li>
 <li> When killed or asked to exit, it waits for the service to go down one last time, then
sends an <tt>'x'</tt> event to <tt>./event</tt> before exiting 0. </li>
</ul>
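<p>
The bookkeeping around <tt>./finish</tt> can be made concrete with a short Python sketch: building <tt>./finish</tt>'s arguments from a <tt>waitpid()</tt> status, and listing the events sent during one run/finish cycle. The function names are hypothetical. Because this page leaves the second argument undefined when <tt>./run</tt> was not signaled, the sketch passes 0 there as an arbitrary placeholder; it also assumes that <tt>./finish</tt> exists.
</p>

```python
import os


def finish_argv(status, servicedir):
    """./finish arguments from the raw waitpid() status of ./run."""
    if os.WIFSIGNALED(status):
        code, sig = 256, os.WTERMSIG(status)   # signaled: 256, then the signal
    else:
        code, sig = os.WEXITSTATUS(status), 0  # 0 is only a placeholder
    return ["./finish", str(code), str(sig), servicedir]


def cycle_events(finish_exit_code):
    """Events written to ./event from the death of ./run to the end of
    ./finish, and whether ./run may be respawned afterwards."""
    events = ["d"]                 # ./run died
    restart = True                 # unless s6-supervise was told otherwise
    if finish_exit_code == 125:
        events.append("O")         # permanent failure, as with s6-svc -O
        restart = False
    events.append("D")             # ./finish is gone
    return events, restart
```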
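<p>
The two timing rules above, the <tt>./finish</tt> deadline and the pause between <tt>./run</tt> spawns, can likewise be sketched in Python. This sketch uses the documented 5-second default instead of reading <tt>./timeout-finish</tt>, whose format is described on the service directory page. It also assumes that a service which was ready for one second or less is treated like one that never became ready; this page does not spell that case out.
</p>

```python
import subprocess


def run_finish(argv, cwd, timeout=5.0):
    """Run ./finish and SIGKILL it if it is still alive after `timeout`
    seconds. Returns subprocess's returncode (negative if signaled)."""
    proc = subprocess.Popen(argv, cwd=cwd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL on POSIX
        return proc.wait()


def restart_delay(ready_at, died_at):
    """Seconds to wait before respawning ./run. `ready_at` is when the
    service became ready (None if it never did); `died_at` is its death."""
    if ready_at is not None and died_at - ready_at > 1.0:
        return 0.0  # ready for over a second: restart immediately
    return 1.0      # otherwise pause one second to avoid busy-looping
```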

<p>
 Make sure to also check the <a href="servicedir.html">service directory</a>
documentation page for the full list of files that can be present in a service
directory and impact s6-supervise's behaviour in any way.
</p>

<h2> Usage notes </h2>

<ul>
 <li> s6-supervise is a long-lived process. It normally runs forever, from the system's
boot scripts, until shutdown time; it should not be killed or told to exit. If you have
no use for a service, just turn it off; the s6-supervise process does no harm. </li>
 <li> Even in boot scripts, s6-supervise should normally not be run directly. It's
better to have a collection of <a href="servicedir.html">service directories</a> in a
single <a href="scandir.html">scan directory</a>, and just run
<a href="s6-svscan.html">s6-svscan</a> on that scan directory. s6-svscan will spawn
the necessary s6-supervise processes, and will also take care of logged services. </li>
 <li> s6-supervise always spawns its child in a new session, as a session leader.
The goal is to protect the supervision tree from misbehaved services that would
send signals to their whole process group. Nevertheless, s6-supervise's handling of
SIGINT ensures that its service is killed if you happen to run it in a terminal and
send it a ^C. </li>
 <li> You can use <a href="s6-svc.html">s6-svc</a> to send commands to the s6-supervise
process, mostly to change the service state and to send signals to the monitored
process. </li>
 <li> You can use <a href="s6-svok.html">s6-svok</a> to check whether s6-supervise
is successfully running. </li>
 <li> You can use <a href="s6-svstat.html">s6-svstat</a> to check the status of a
service. </li>
 <li> s6-supervise maintains internal information inside the <tt>./supervise</tt>
subdirectory of <em>servicedir</em>. <em>servicedir</em> itself can be read-only,
but both <em>servicedir</em><tt>/supervise</tt> and <em>servicedir</em><tt>/event</tt>
need to be read-write. </li>
 <li> If <em>servicedir</em> isn't writable by s6-supervise, for any reason, then the
<a href="s6-svc.html">s6-svc</a> <tt>-D</tt> and <tt>-U</tt> commands will not work
properly, since s6-supervise will be unable to create or delete a
<em>servicedir</em><tt>/down</tt> file. In this case s6-supervise will print a warning
on stderr and perform the equivalent of <tt>-d</tt> or <tt>-u</tt> instead: it
will just be unable to change the permanent service configuration. </li>
</ul>
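<p>
The fallback described in the last item can be sketched in Python. <tt>set_default_state</tt> is a hypothetical helper that tries to create or remove <em>servicedir</em><tt>/down</tt> and reports which <a href="s6-svc.html">s6-svc</a> command letter it effectively honoured; actually bringing the service up or down is left to the caller.
</p>

```python
import os
import sys


def set_default_state(servicedir, up):
    """Apply -U (up=True) or -D (up=False) by removing or creating
    servicedir/down. On failure, warn on stderr and degrade to the
    transient -u or -d. Returns the letter that was effectively applied."""
    down = os.path.join(servicedir, "down")
    try:
        if up:
            try:
                os.unlink(down)
            except FileNotFoundError:
                pass  # no down file: the default state is already up
        else:
            with open(down, "a"):
                pass  # an empty down file is enough
    except OSError as e:
        print(f"warning: cannot update {down}: {e}", file=sys.stderr)
        return "u" if up else "d"
    return "U" if up else "D"
```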

<h2> Implementation notes </h2>

<ul>
 <li> s6-supervise tries its best to stay alive and running despite possible
system call failures. It will write to its standard error every time it encounters a
problem. However, unlike <a href="s6-svscan.html">s6-svscan</a>, it will not go out
of its way to stay alive; if it encounters an unsolvable situation, it will just
die. </li>
 <li> Unlike other "supervise" implementations, s6-supervise is a fully asynchronous
state machine. That means that it can read and process commands at any time, even
when the machine is in trouble (full process table, for instance). </li>
 <li> s6-supervise <em>does not use malloc()</em>. That means it will <em>never leak
memory</em>. <small>However, s6-supervise uses opendir(), and most opendir()
implementations internally use heap memory, so unfortunately it's impossible to
guarantee that s6-supervise does not use heap memory at all.</small> </li>
 <li> s6-supervise has been carefully designed so that every instance maintains as little
data as possible and uses a very small
amount of non-sharable memory. It is not a problem to have several
dozen s6-supervise processes, even on constrained systems: resource consumption
will be negligible. </li>
</ul>

</body>
</html>