<html>
  <head>
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <meta http-equiv="Content-Language" content="en" />
    <title>s6: How to run s6-svscan as process 1</title>
    <meta name="Description" content="s6: s6-svscan as init" />
    <meta name="Keywords" content="s6 supervision svscan s6-svscan init process boot 1" />
    <!-- <link rel="stylesheet" type="text/css" href="//skarnet.org/default.css" /> -->
  </head>
<body>

<p>
<a href="index.html">s6</a><br />
<a href="//skarnet.org/software/">Software</a><br />
<a href="//skarnet.org/">skarnet.org</a>
</p>

<h1> How to run s6-svscan as process 1 </h1>

<p>
 <em> Since 2015-06-17, if you're a Linux user, you can use the
<a href="//skarnet.org/software/s6-linux-init/">s6-linux-init</a>
package to help you do so! Please read this documentation page first,
though; it will help you understand what s6-linux-init does. </em>
</p>

<p>
 It is possible to run s6-svscan as process 1, i.e. the <tt>init</tt>
process. However, that does not mean you can directly <em>boot</em>
on s6-svscan; that little program cannot do everything
your stock init does. Replacing the <tt>init</tt> process requires a
bit of understanding of what is going on.
</p>

<a name="stages">
<h2> The three stages of init </h2>
</a>

<p>
<small> Okay, it's actually four, but the fourth stage is an implementation
detail that users don't care about, so we'll stick with three. </small>
</p>

<p>
 The life of a Unix machine has three stages.
 <small>Yes, three.</small>
</p>

<ol>
 <li> The <em>early initialization</em> phase. It starts when the
kernel launches the first userland process, traditionally called <tt>init</tt>.
During this phase, init is the only lasting process; its duty is to
prepare the machine for the start of <em>other</em> long-lived processes,
i.e. services. Work such as mounting filesystems, setting the system clock,
etc. can be done at this point. This phase ends when process 1 launches
its first services. </li>
 <li> The <em>cruising</em> phase. This is the "normal", stable state of an
up and running Unix machine. Early work is done, and init launches and
maintains <em>services</em>, i.e. long-lived processes such as gettys,
the ssh server, and so on. During this phase, init's duties are to reap
orphaned zombies and to supervise services - also allowing the administrator
to add or remove services. This phase ends when the administrator
requires a shutdown. </li>
 <li> The <em>shutdown</em> phase. Everything is cleaned up, services are
stopped, filesystems are unmounted, the machine is getting ready to be
halted. At the end of this phase, all processes are killed, first with
a SIGTERM, then with a SIGKILL (to catch processes that resist SIGTERM).
The only process that survives it is process 1; if this process is
<a href="s6-svscan.html">s6-svscan</a> and its <a href="scandir.html">scandir</a>
is not empty, then the supervision tree is restarted. </li>
 <li> The <em>hardware shutdown</em> phase. The system clock is stored,
filesystems are unmounted, and the system call that reboots the machine or
powers it off is called. </li>
</ol>

<p>
<small> Unless you're implementing a shutdown procedure over a supervision
tree, you can absolutely consider that the hardware shutdown is part of stage 3. </small>
</p>

<p>
 As you can see, process 1's duties are <em>radically different</em> from
one stage to the next, and init has the most work when the machine
is booting or shutting down, which means a normally negligible fraction
of the time it is up. The only common thing is that at no point is process
1 allowed to exit.
</p>

<p>
 Still, all common init systems insist that the same <tt>init</tt>
executable must handle these three stages. From System V init to launchd,
via busybox init, you name it - one init program from bootup to shutdown.
No wonder those programs, even basic ones, seem complex to write and
complex to understand!
</p>

<p>
Even the <a href="http://smarden.org/runit/runit.8.html">runit</a>
program, designed with supervision in mind, remains as process 1 all the
time; at least runit makes things simple by clearly separating the three
stages and delegating every stage's work to a different script that is
<em>not</em> run as process 1. (Since runit does not distinguish between
stage 3 and stage 4, it needs very careful handling of the
<tt>kill -9 -1</tt> part of stage 3: getting <tt>/etc/runit/3</tt> killed
before it unmounts the filesystems would be bad.)
</p>

<p>
 One init to rule them all?
<a href="https://en.wikipedia.org/wiki/Porgy_and_Bess">It ain't necessarily so!</a>
</p>

<a name="stage2">
<h2> The role of s6-svscan </h2>
</a>

<p>
 init does not have the right to die, but fortunately, <em>it has the right
to <a href="https://pubs.opengroup.org/onlinepubs/9699919799/functions/execve.html">execve()</a>!</em>
During stage 2, why use precious RAM, or at best, swap space, to store data
that are only relevant to stages 1 or 3-4? It only makes sense to have an
init process that handles stage 1, then executes into an init process that
handles stage 2, and when told to shut down, this "stage 2" init executes into
a "stage 3" init which just performs shutdown. Just as runit does with the
<tt>/etc/runit/[123]</tt> scripts, but exec'ing the scripts as process 1
instead of forking them.
</p>
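
<p>
 The property this design relies on is that <tt>execve()</tt> replaces the
program image while keeping the process ID. The following is a minimal
sketch of that property only, not of a real init: an ordinary child process
stands in for process 1, and the helper name <tt>pid_after_exec</tt> is made
up for this illustration.
</p>

<pre>
```python
import os
import sys

def pid_after_exec():
    """Return (pid before exec, pid reported by the exec'ed program)."""
    read_end, write_end = os.pipe()
    child = os.fork()
    if child == 0:
        # Child: stands in for a "stage 1" program that replaces itself
        # with the next stage instead of forking it.
        try:
            os.dup2(write_end, 1)  # the new image reports on stdout
            os.execv(sys.executable,
                     [sys.executable, "-c", "import os; print(os.getpid())"])
        finally:
            os._exit(127)  # only reached if the exec failed
    os.close(write_end)
    with os.fdopen(read_end) as report:
        reported = int(report.read())
    os.waitpid(child, 0)
    return child, reported

before, after = pid_after_exec()
print(before == after)
```
</pre>

<p>
 The new program sees the same pid as the one that called <tt>execv()</tt>,
which is why a chain of stage programs can all run as process 1, one after
the other.
</p>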
    129 
    130 <p>
    131 It becomes clear now that
    132 <a href="s6-svscan.html">s6-svscan</a> is perfectly suited to
    133 exactly fulfill process 1's role <strong>during stage 2</strong>.
    134 </p>
    135 
    136 <ul>
    137  <li> It does not die </li>
    138  <li> The reaper takes care of every zombie on the system </li>
    139  <li> The scanner maintains services alive </li>
    140  <li> It can be sent commands via the <a href="s6-svscanctl.html">s6-svscanctl</a>
    141 interface </li>
    142  <li> It execs into a given script when told to </li>
    143 </ul>
    144 
    145 <p>
    146  However, an init process for stage 1 and another one for stage 3 are still
    147 needed. Fortunately, those processes are very easy to design! The only
    148 difficulty here is that they're heavily system-dependent, so it's not possible
    149 to provide a stage 1 init and a stage 3 init that will work everywhere.
    150 s6 was designed to be as portable as possible, and it should run on virtually
    151 every Unix platform; but outside of stage 2 is where portability stops.
    152 </p>
    153 
    154 <p>
    155  The <a href="//skarnet.org/software/s6-linux-init/">s6-linux-init</a>
    156 package provides a tool, <tt>s6-linux-init-maker</tt>, to automatically
    157 create a suitable stage 1 init (so, the <tt>/sbin/init</tt> binary) for
    158 Linux.
    159 It is also possible to write similar tools for other operating systems,
    160 but the details are heavily system-dependent.
    161 </p>
    162 
    163 <p>
    164  For the adventurous and people who need to do this by hand, though, here are
    165 are some general design tips.
    166 </p>
    167 
<a name="stage1">
<h2> How to design a stage 1 init </h2>
</a>

<h3> What stage 1 init must do </h3>

<ul>
 <li> Prepare an initial <a href="scandir.html">scan directory</a>, say in
<tt>/run/service</tt>, with a few vital services, such as s6-svscan's own logger,
and an early getty (in case debugging is needed). That implies mounting a
read-write filesystem, creating it in RAM if needed, if the root filesystem
is read-only. </li>
 <li> Either perform all the one-time initialization, as stage 1
<a href="http://smarden.org/runit/">runit</a> does; </li>
 <li> or fork a process that will perform most of the one-time initialization
once s6-svscan is in charge. </li>
 <li> Be extremely simple and not fail, because recovery is almost impossible
here. </li>
</ul>

<p>
 Unlike the <tt>/etc/runit/1</tt> script, an init-stage1 script running as
process 1 has nothing to back it up, and if it fails and dies, the machine
crashes. Does that mean the runit approach is better? It's certainly safer,
but not necessarily better, because init-stage1 can be made <em>extremely
small</em>, to the point it is practically failproof, and if it fails, it
means something is so wrong that you
would have had to reboot the machine with <tt>init=/bin/sh</tt> anyway.
</p>

<p>
 To make init-stage1 as small as possible, only this realization is needed:
you do not need to perform all of the one-time initialization tasks before
launching s6-svscan. Actually, once init-stage1 has made it possible for
s6-svscan to run, it can fork a background "init-stage2" process and exec
into s6-svscan immediately! The "init-stage2" process can then pursue the
one-time initialization, with a big advantage over the "init-stage1"
process: s6-svscan is running, as well as a few vital services, and if
something bad happens, there's a getty for the administrator to log on.
No need to play fancy tricks with <tt>/dev/console</tt> anymore! Yes,
the theoretical separation in 3 stages is a bit more flexible in practice:
the "stage 2" process 1 can be already running when a part of the
"stage 1" one-time tasks are still being run.
</p>

<p>
 Of course, that means that the scan directory is still incomplete when
s6-svscan first starts, because most services can't yet be run, for
lack of mounted filesystems, network etc. The "init-stage2" one-time
initialization script must populate the scan directory when it has made
it possible for all wanted services to run, and trigger the scanner.
Once all the one-time tasks are done, the scan directory is fully
populated and the scanner has been triggered, the machine is fully
operational and in stage 2, and the "init-stage2" script can die.
</p>

<h3> Is it possible to write stage 1 init in a scripting language? </h3>

<p>
 It is very possible, and if you are attempting to write your own stage 1,
I definitely recommend it. If you are using
s6-svscan as stage 2 init, stage 1 init should be simple enough
that it can be written in any scripting language you want, just
as <tt>/etc/runit/1</tt> is if you're using runit. And since it
should be so small, the performance impact will be negligible,
while maintainability is enhanced. Definitely make your stage 1
init a script.
</p>

<p>
 Of course, most people will use the <em>shell</em> as scripting
language; however, I advocate the use of
<a href="//skarnet.org/software/execline/">execline</a>
for this, and not only for the obvious reasons. Piping s6-svscan's
stderr to a logging service before said service is even up requires
some <a href="#log">tricky fifo handling</a> that execline can do
and the shell cannot.
</p>

<a name="stage3">
<h2> How to design a stage 3-4 init </h2>
</a>

<p>
 If you're using s6-svscan as stage 2 init on <tt>/run/service</tt>, then
stage 3 init is naturally the <tt>/run/service/.s6-svscan/finish</tt> program.
Of course, <tt>/run/service/.s6-svscan/finish</tt> can be a symbolic link
to anything else; just make sure it points to something in the root
filesystem (unless your program is an execline script, in which case
it is not even necessary).
</p>

<h3> What stage 3-4 init must do </h3>

<ul>
 <li> Destroy the supervision tree and stop all services </li>
 <li> Kill all processes <em>save itself</em>, first gently, then harshly, and <em>reap all the zombies</em>. </li>
 <li> Up until that point we were in stage 3; now we're in stage 4. </li>
 <li> Unmount all the filesystems </li>
 <li> Halt or reboot the machine, depending on what root asked for </li>
</ul>
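
<p>
 The "first gently, then harshly, then reap" step can be sketched as
follows. This is an illustration under loud assumptions: it only signals
two child processes that it created itself, whereas a real stage 3 signals
every process on the machine, and the names <tt>spawn</tt> and
<tt>stop_all</tt> as well as the half-second grace period are made up for
the example.
</p>

<pre>
```python
import os
import signal
import time

def spawn(resist_sigterm):
    """Fork a sleeping child; optionally make it ignore SIGTERM."""
    ready_r, ready_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        disposition = signal.SIG_IGN if resist_sigterm else signal.SIG_DFL
        signal.signal(signal.SIGTERM, disposition)
        os.write(ready_w, b"x")  # tell the parent the disposition is set
        while True:
            time.sleep(1)
    os.read(ready_r, 1)  # wait until the child is ready to be signalled
    os.close(ready_r)
    os.close(ready_w)
    return pid

def stop_all(pids, polls=50, interval=0.01):
    """SIGTERM every pid, poll for exits, SIGKILL survivors, reap them all.

    Returns a dict mapping each pid to the signal number that ended it.
    """
    ended = {}
    remaining = set(pids)
    for pid in remaining:
        os.kill(pid, signal.SIGTERM)  # the gentle pass
    for _ in range(polls):
        for pid in list(remaining):
            done, status = os.waitpid(pid, os.WNOHANG)
            if done:
                ended[pid] = os.WTERMSIG(status)
                remaining.discard(pid)
        if not remaining:
            break
        time.sleep(interval)
    for pid in remaining:
        os.kill(pid, signal.SIGKILL)  # the harsh pass
    for pid in remaining:
        _, status = os.waitpid(pid, 0)  # reap, so that no zombie is left
        ended[pid] = os.WTERMSIG(status)
    return ended

polite = spawn(resist_sigterm=False)
stubborn = spawn(resist_sigterm=True)
ended = stop_all([polite, stubborn])
print(ended[polite] == signal.SIGTERM, ended[stubborn] == signal.SIGKILL)
```
</pre>

<p>
 The process doing the killing has to outlive both passes in order to reap
what it killed.
</p>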

<p>
 This is seemingly very simple, even simpler than stage 1, but experience
shows that it's trickier than it looks.
</p>

<p>
 One tricky part is the <tt>kill -9 -1</tt> operation at the end of
stage 3: you must make sure that <em>process 1</em> regains control and keeps
running after it, because it will be the only process left alive. If you
are running a stage 3 script as process 1, it is almost automatic: your
script survives the kill and continues running, up into stage 4. If you
are using another model, the behaviour becomes system-dependent: your
script may or may not survive the kill, so on systems where it does not,
you will have to design a way to regain control in order to accomplish
stage 4 tasks.
</p>

<p>
 Another tricky part, that is only apparent with practice, is solidity.
It is even more vital that <em>nothing fails</em> during stages 3 and 4
than it is in stage 1, because in stage 1, the worst that can happen is
that the machine does not boot, whereas in stages 3 and 4, the worst that
can happen is that the machine <em>does not shut down</em>, and that is
a much bigger issue.
</p>

<p>
 For these reasons, I now recommend <em>not</em> tearing down the
supervision tree for stages 3-4. It is easier to work in a stable
environment, as a regular process, than it is to manage a whole shutdown
sequence as pid 1: the presence of s6-svscan as pid 1, and of a working
supervision tree, is a pillar you can rely on, and with experience I find
it a good idea to keep the supervision infrastructure running until the end.
Of course, that requires the scandir, and the active supervision directories,
to be on a RAM filesystem such as <tt>tmpfs</tt>; that is good policy
anyway.
</p>

<h3> Is it possible to write stage 3 init in a scripting language? </h3>

<p>
 Yes, definitely, just like stage 1.
</p>

<p>
 However, you really should leave <tt>/run/service/.s6-svscan/finish</tt>
(and the other scripts in <tt>/run/service/.s6-svscan</tt>) alone, and
write your shutdown sequence without dismantling the supervision tree.
You will still have to stop most of the services, but s6-svscan should
stay. For a more in-depth study of what to do in stages 3-4 and how
to do it, you can look at the source of <tt>s6-linux-init-shutdownd</tt>
in the <a href="//skarnet.org/software/s6-linux-init/">s6-linux-init</a>
package.
</p>


<a name="log">
<h2> How to log the supervision tree's messages </h2>
</a>

<p>
 When the Unix kernel launches your (stage 1) init process, it does it
with descriptors 0, 1 and 2 open and reading from or writing to
<tt>/dev/console</tt>. This is okay for the early boot: you actually
want early error messages to be displayed to the system console. But
this is not okay for stage 2: the system console should only be used
to display extremely serious error messages such as kernel errors, or
errors from the logging system itself; everything else should be
handled by the logging system, following the
<a href="s6-log.html#loggingchain">logging chain</a> mechanism. The
supervision tree's messages should go to the catch-all logger instead
of the system console. (And the console should never be read, so no
program should run with <tt>/dev/console</tt> as stdin, but this is easy
enough to fix: s6-svscan will be started with stdin redirected from
<tt>/dev/null</tt>.)
</p>

<p>
 The catch-all logger is a service, and we want <em>every</em>
service to run under the supervision tree. Chicken and egg problem:
before starting s6-svscan, we must redirect s6-svscan's output to
the input of a program that will only be started once s6-svscan is
running and can start services.
</p>

<p>
 There are several solutions to this problem, but the simplest one is
to use a FIFO, a.k.a. named pipe. s6-svscan's stdout and stderr can
be redirected to a named pipe before s6-svscan is run, and the
catch-all logger service can be made to read from this named pipe.
Only two minor problems remain:
</p>

<ul>
 <li> If s6-svscan or s6-supervise writes to the FIFO before there is
a reader, i.e. before the catch-all logging service is started, the
write will fail (and a SIGPIPE will be emitted). This is not a real issue
for an s6 installation because s6-svscan and s6-supervise ignore SIGPIPE,
and they only write
to their stderr if an error occurs; and if an error occurs before they are
able to start the catch-all logger, this means that the system is seriously
damaged (as if an error occurs during stage 1) and the only solution is
to reboot with <tt>init=/bin/sh</tt> anyway. </li>
 <li> Normal Unix semantics <em>do not allow</em> a writer to open a
FIFO before there is a reader: if there is no reader when the FIFO is
opened for writing, the <tt>open()</tt> system call <em>blocks</em>
until a reader appears. This is obviously not what we want: we want
to be able to <em>actually start</em> s6-svscan with its stdout and
stderr pointing to the logging FIFO, even without a reader process,
and we want it to run normally so it can start the logging service
that will provide such a reader process. </li>
</ul>
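
<p>
 The second point can be observed without hanging a terminal by asking for
a non-blocking open. The sketch below assumes POSIX FIFO semantics, under
which a write-only, non-blocking <tt>open()</tt> of a FIFO with no reader
fails with <tt>ENXIO</tt> instead of blocking; the temporary path and the
helper name <tt>try_nonblocking_writer</tt> are made up for the example.
</p>

<pre>
```python
import errno
import os
import tempfile

def try_nonblocking_writer(path):
    """Open `path` write-only without blocking; return 'ok' or the errno name."""
    try:
        fd = os.open(path, os.O_WRONLY | os.O_NONBLOCK)
    except OSError as err:
        return errno.errorcode[err.errno]
    os.close(fd)
    return "ok"

with tempfile.TemporaryDirectory() as tmp:
    fifo = os.path.join(tmp, "fifo")
    os.mkfifo(fifo)

    # Nobody is reading yet: the writer is refused.
    no_reader = try_nonblocking_writer(fifo)

    # Once some process holds the FIFO open for reading, the same call works.
    reader = os.open(fifo, os.O_RDONLY | os.O_NONBLOCK)
    with_reader = try_nonblocking_writer(fifo)
    os.close(reader)

print(no_reader, with_reader)
```
</pre>

<p>
 So simply adding a non-blocking flag is not enough for a writer: without a
reader, the open either blocks or fails.
</p>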

<p>
 This second point cannot be solved in a shell script, and that is why
you are discouraged from writing your stage 1 init script in the shell
language: you cannot properly set up a FIFO output for s6-svscan without
resorting to horrible and unreliable hacks involving a temporary background
FIFO reader process.
</p>

<p>
 Instead, you are encouraged to use the
<a href="//skarnet.org/software/execline/">execline</a> language -
or, at least,
the <a href="//skarnet.org/software/execline/redirfd.html">redirfd</a>
command, which is part of the execline distribution. The
<a href="//skarnet.org/software/execline/redirfd.html">redirfd</a>
command does just the right amount of trickery with FIFOs for you to be
able to properly redirect process 1's stdout and stderr to the logging FIFO
without blocking: <tt>redirfd -w 1 /run/service/s6-svscan-log/fifo</tt> blocks
if there's no process reading on <tt>/run/service/s6-svscan-log/fifo</tt>, but
<tt>redirfd -wnb 1 /run/service/s6-svscan-log/fifo</tt> <em>does not</em>.
</p>
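
<p>
 One way to obtain a write descriptor on a FIFO that has no reader is to
hold the read side open for a moment inside the same process. The sketch
below illustrates that idea; it is not taken from the redirfd source and
should not be read as a description of its internals. The function name
<tt>open_fifo_writer_without_reader</tt> and the temporary path are made up
for the example.
</p>

<pre>
```python
import os
import tempfile

def open_fifo_writer_without_reader(path):
    """Return a blocking write descriptor on a FIFO, reader or no reader."""
    # A non-blocking read-only open of a FIFO returns at once.
    keeper = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    try:
        # A reader (ours) now exists, so this open neither blocks nor fails.
        writer = os.open(path, os.O_WRONLY | os.O_NONBLOCK)
    finally:
        os.close(keeper)
    os.set_blocking(writer, True)  # later writes behave like ordinary writes
    return writer

with tempfile.TemporaryDirectory() as tmp:
    fifo = os.path.join(tmp, "fifo")
    os.mkfifo(fifo)
    out = open_fifo_writer_without_reader(fifo)

    # No logger yet: the write fails. Python ignores SIGPIPE by default,
    # so the failure shows up as an EPIPE error instead of killing us.
    try:
        os.write(out, b"lost\n")
        early = "written"
    except BrokenPipeError:
        early = "EPIPE"

    # Stand-in for the catch-all logger starting up and opening the FIFO.
    logger = os.open(fifo, os.O_RDONLY)
    os.write(out, b"hello\n")
    received = os.read(logger, 100)
    os.close(logger)
    os.close(out)

print(early, received)
```
</pre>

<p>
 The early write is lost, which matches the first of the two problems
listed above; writes made after the reader appears go through.
</p>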

<p>
 This trick with FIFOs can even be used to avoid potential race conditions
in the one-time initialization script that runs in stage 2. If forked from
init-stage1 right before executing s6-svscan, depending on the scheduler's
mood, this script may actually run a long way before s6-svscan is actually
executed and running the initial services - and may do dangerous things,
such as writing messages to the logging FIFO before there's a reader, and
eating a SIGPIPE and dying without completing the initialization. To avoid
that and be sure that s6-svscan really runs and initial services are really
started before the stage 2 init script is allowed to continue, it is possible
to redirect the child script's output (stdout and/or stderr) <em>once again</em>
to the logging FIFO, but in the normal way without redirfd trickery, before
it execs into the init-stage2 script. So, the child process blocks on the
FIFO until a reader appears, while process 1 - which does not block - execs
into s6-svscan and starts the logging service, which then opens the logging
FIFO for reading and unblocks the child process, which then runs the
initialization tasks with the guarantee that s6-svscan is running.
</p>
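
<p>
 The synchronization described above can be sketched with two ordinary
processes. Everything here is a stand-in: the parent plays process 1 and
then the logger, the child plays the init-stage2 script, and the function
name <tt>stage2_waits_for_logger</tt>, the message and the 0.3 second delay
are made up for the example.
</p>

<pre>
```python
import os
import tempfile
import time

def stage2_waits_for_logger(fifo):
    """Return (was the child still blocked before a reader existed?, message)."""
    child = os.fork()
    if child == 0:
        try:
            # Ordinary blocking open for writing: the child sleeps inside
            # open() until some process opens the FIFO for reading.
            out = os.open(fifo, os.O_WRONLY)
            os.write(out, b"stage2 started\n")
        finally:
            os._exit(0)

    time.sleep(0.3)  # stand-in for process 1 exec'ing into s6-svscan
    finished, _ = os.waitpid(child, os.WNOHANG)
    blocked_before_logger = (finished == 0)

    # Stand-in for the catch-all logger opening the FIFO for reading;
    # this is what releases the child.
    logger = os.open(fifo, os.O_RDONLY)
    message = os.read(logger, 100)
    os.waitpid(child, 0)
    os.close(logger)
    return blocked_before_logger, message

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "fifo")
    os.mkfifo(path)
    blocked, message = stage2_waits_for_logger(path)

print(blocked, message)
```
</pre>

<p>
 The child cannot write anything until the reader exists, so its first
message cannot be lost to a missing logger.
</p>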

<p>
 It really is simpler than it sounds. :-)
</p>

<h2> A working example </h2>

<p>
 This whole page may sound very theoretical, dry, wordy, and hard to
grasp without a live example to try things on; unfortunately, s6 cannot provide
live examples without becoming system-specific.
</p>

<p>
 However, the
<a href="//skarnet.org/software/s6-linux-init/">s6-linux-init</a>
package provides you with the
<a href="//skarnet.org/software/s6-linux-init/s6-linux-init-maker.html">s6-linux-init-maker</a>
command, which produces a set of working scripts, including a script
that is suitable as <tt>/sbin/init</tt>, for you to study and edit.
You can <em>run</em> the <tt>s6-linux-init-maker</tt> command even
on non-Linux systems: it will produce scripts that do not work as
is for another OS, but can still be used for study and as a basis for
a working stage 1 script.
</p>

</body>
</html>