<html>
<head>
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta http-equiv="Content-Language" content="en" />
<title>s6: How to run s6-svscan as process 1</title>
<meta name="Description" content="s6: s6-svscan as init" />
<meta name="Keywords" content="s6 supervision svscan s6-svscan init process boot 1" />
<!-- <link rel="stylesheet" type="text/css" href="//skarnet.org/default.css" /> -->
</head>
<body>

<p>
<a href="index.html">s6</a><br />
<a href="//skarnet.org/software/">Software</a><br />
<a href="//skarnet.org/">skarnet.org</a>
</p>

<h1> How to run s6-svscan as process 1 </h1>

<p>
<em> Since 2015-06-17, if you're a Linux user, you can use the
<a href="//skarnet.org/software/s6-linux-init/">s6-linux-init</a>
package to help you do so! Please read this documentation page first,
though: it will help you understand what s6-linux-init does. </em>
</p>

<p>
It is possible to run s6-svscan as process 1, i.e. the <tt>init</tt>
process. However, that does not mean you can directly <em>boot</em>
on s6-svscan; that little program cannot do everything
your stock init does. Replacing the <tt>init</tt> process requires a
bit of understanding of what is going on.
</p>

<a name="stages">
<h2> The three stages of init </h2>
</a>

<p>
<small> Okay, it's actually four, but the fourth stage is an implementation
detail that users don't care about, so we'll stick with three. </small>
</p>

<p>
The life of a Unix machine has three stages.
<small>Yes, three.</small>
</p>

<ol>
<li> The <em>early initialization</em> phase. It starts when the
kernel launches the first userland process, traditionally called <tt>init</tt>.
During this phase, init is the only lasting process; its duty is to
prepare the machine for the start of <em>other</em> long-lived processes,
i.e. services. Work such as mounting filesystems, setting the system clock,
etc. can be done at this point. This phase ends when process 1 launches
its first services. </li>
<li> The <em>cruising</em> phase. This is the "normal", stable state of an
up-and-running Unix machine. Early work is done, and init launches and
maintains <em>services</em>, i.e. long-lived processes such as gettys,
the ssh server, and so on. During this phase, init's duties are to reap
orphaned zombies and to supervise services, while also allowing the administrator
to add or remove services. This phase ends when the administrator
requests a shutdown. </li>
<li> The <em>shutdown</em> phase. Everything is cleaned up, services are
stopped, filesystems are unmounted, and the machine gets ready to be
halted. At the end of this phase, all processes are killed, first with
a SIGTERM, then with a SIGKILL (to catch processes that resist SIGTERM).
The only process that survives is process 1; if this process is
<a href="s6-svscan.html">s6-svscan</a> and its <a href="scandir.html">scandir</a>
is not empty, then the supervision tree is restarted. </li>
<li> The <em>hardware shutdown</em> phase. The system clock is stored,
filesystems are unmounted, and the system call that reboots the machine or
powers it off is called. </li>
</ol>

<p>
<small> Unless you're implementing a shutdown procedure over a supervision
tree, you can absolutely consider that the hardware shutdown is part of stage 3. </small>
</p>

<p>
As you can see, process 1's duties are <em>radically different</em> from
one stage to the next, and init has the most work when the machine
is booting or shutting down, which normally represents a negligible fraction
of its uptime.
The only thing they have in common is that at no point is process
1 allowed to exit.
</p>

<p>
Still, all common init systems insist that the same <tt>init</tt>
executable must handle these three stages. From System V init to launchd,
via busybox init, you name it: one init program from bootup to shutdown.
No wonder those programs, even basic ones, seem complex to write and
complex to understand!
</p>

<p>
Even the <a href="http://smarden.org/runit/runit.8.html">runit</a>
program, designed with supervision in mind, remains as process 1 all the
time; at least runit makes things simple by clearly separating the three
stages and delegating every stage's work to a different script that is
<em>not</em> run as process 1. (Since runit does not distinguish between
stage 3 and stage 4, it needs very careful handling of the
<tt>kill -9 -1</tt> part of stage 3: getting <tt>/etc/runit/3</tt> killed
before it unmounts the filesystems would be bad.)
</p>

<p>
One init to rule them all?
<a href="https://en.wikipedia.org/wiki/Porgy_and_Bess">It ain't necessarily so!</a>
</p>

<a name="stage2">
<h2> The role of s6-svscan </h2>
</a>

<p>
init does not have the right to die, but fortunately, <em>it has the right
to <a href="https://pubs.opengroup.org/onlinepubs/9699919799/functions/execve.html">execve()</a>!</em>
During stage 2, why use precious RAM, or at best, swap space, to store data
that are only relevant to stages 1 or 3-4? It is only logical to have an
init process that handles stage 1, then executes into an init process that
handles stage 2; when told to shut down, this "stage 2" init executes into
a "stage 3" init which just performs the shutdown. This is what runit does with the
<tt>/etc/runit/[123]</tt> scripts, except that here the scripts are exec'd as process 1
instead of being forked.
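</p>

<p>
The property this chain relies on is that <tt>execve()</tt> replaces the
program image but keeps the process ID, so process 1 is still process 1
after it has exec'd into the next stage. The Python sketch below is only
an illustration, not part of s6, and <tt>pid_after_exec</tt> is a
hypothetical helper: it checks the property on an ordinary child process,
which execs a fresh interpreter that prints its own PID, and compares that
PID with the one <tt>fork()</tt> returned to the parent.
</p>

```python
import os
import sys


def pid_after_exec():
    """Fork a child that execve()s into a new program image.

    Returns (pid seen by the parent, pid printed by the new image).
    Hypothetical helper for illustration; not part of s6.
    """
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: send stdout into the pipe, then replace the program image.
        try:
            os.close(read_end)
            os.dup2(write_end, 1)
            os.close(write_end)
            os.execv(sys.executable,
                     [sys.executable, "-c", "import os; print(os.getpid())"])
        finally:
            # Only reached if execv() failed: never return into the caller.
            os._exit(127)
    # Parent: read what the new program image printed, then reap the child.
    os.close(write_end)
    with os.fdopen(read_end) as pipe:
        reported = int(pipe.read())
    os.waitpid(pid, 0)
    return pid, reported


if __name__ == "__main__":
    before, after = pid_after_exec()
    print("pid returned by fork():", before)
    print("pid printed after execve():", after)
```

<p>
The same mechanism is what lets a stage 1 init exec into s6-svscan, and
s6-svscan exec into its <tt>finish</tt> script, without process 1 ever exiting.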
</p>

<p>
It is now clear that
<a href="s6-svscan.html">s6-svscan</a> is perfectly suited to
fulfill process 1's role <strong>during stage 2</strong>:
</p>

<ul>
<li> It does not die </li>
<li> The reaper takes care of every zombie on the system </li>
<li> The scanner keeps services alive </li>
<li> It can be sent commands via the <a href="s6-svscanctl.html">s6-svscanctl</a>
interface </li>
<li> It execs into a given script when told to </li>
</ul>

<p>
However, an init process for stage 1 and another one for stage 3 are still
needed. Fortunately, those processes are very easy to design! The only
difficulty is that they're heavily system-dependent, so it's not possible
to provide a stage 1 init and a stage 3 init that will work everywhere.
s6 was designed to be as portable as possible, and it should run on virtually
every Unix platform; but outside of stage 2 is where portability stops.
</p>

<p>
The <a href="//skarnet.org/software/s6-linux-init/">s6-linux-init</a>
package provides a tool, <tt>s6-linux-init-maker</tt>, to automatically
create a suitable stage 1 init (i.e. the <tt>/sbin/init</tt> binary) for
Linux.
It is also possible to write similar tools for other operating systems,
but the details are heavily system-dependent.
</p>

<p>
For the adventurous, and for people who need to do this by hand, here
are some general design tips.
</p>

<a name="stage1">
<h2> How to design a stage 1 init </h2>
</a>

<h3> What stage 1 init must do </h3>

<ul>
<li> Prepare an initial <a href="scandir.html">scan directory</a>, say in
<tt>/run/service</tt>, with a few vital services, such as s6-svscan's own logger,
and an early getty (in case debugging is needed).
If the root filesystem is read-only, that implies mounting a
read-write filesystem, creating it in RAM if needed. </li>
<li> Either perform all the one-time initialization, as stage 1
<a href="http://smarden.org/runit/">runit</a> does; </li>
<li> or fork a process that will perform most of the one-time initialization
once s6-svscan is in charge. </li>
<li> Be extremely simple and not fail, because recovery is almost impossible
here. </li>
</ul>

<p>
Unlike the <tt>/etc/runit/1</tt> script, an init-stage1 script running as
process 1 has nothing to back it up, and if it fails and dies, the machine
crashes. Does that mean the runit approach is better? It's certainly safer,
but not necessarily better, because init-stage1 can be made <em>extremely
small</em>, to the point where it is practically failproof; and if it does fail, it
means something is so wrong that you
would have had to reboot the machine with <tt>init=/bin/sh</tt> anyway.
</p>

<p>
To make init-stage1 as small as possible, only one realization is needed:
you do not need to perform all of the one-time initialization tasks before
launching s6-svscan. In fact, once init-stage1 has made it possible for
s6-svscan to run, it can fork a background "init-stage2" process and exec
into s6-svscan immediately! The "init-stage2" process can then carry on with the
one-time initialization, with a big advantage over the "init-stage1"
process: s6-svscan is running, as well as a few vital services, and if
something bad happens, there's a getty the administrator can log in on.
No need to play fancy tricks with <tt>/dev/console</tt> anymore! Yes,
the theoretical separation into 3 stages is a bit more flexible in practice:
the "stage 2" process 1 can already be running while some of the
"stage 1" one-time tasks are still in progress.
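</p>

<p>
The ordering can be sketched in a few lines. This Python version only
illustrates the fork-then-exec structure (a real init-stage1 would more
likely be an execline script); <tt>stage1_handoff</tt> is a hypothetical
name, and its two argument vectors stand for whatever commands your system
uses to run s6-svscan and the init-stage2 script. The parent, which would be
process 1 on a real system, also points its stdin at <tt>/dev/null</tt>
before exec'ing, so that the supervision tree does not read the console.
</p>

```python
import os


def stage1_handoff(svscan_argv, stage2_argv):
    """Fork the one-time initialization, then exec into the supervisor.

    Hypothetical sketch: both arguments are argv lists, for instance
    ["s6-svscan", "/run/service"] for svscan_argv. Never returns on success.
    """
    child = os.fork()
    if child == 0:
        # Background child: becomes the "init-stage2" one-time initialization.
        try:
            os.execvp(stage2_argv[0], stage2_argv)
        finally:
            os._exit(127)  # only reached if the exec failed
    # Parent (process 1 on a real system): take stdin away from the console.
    devnull = os.open(os.devnull, os.O_RDONLY)
    if devnull != 0:
        os.dup2(devnull, 0)
        os.close(devnull)
    # Replace ourselves with the supervisor, keeping the same PID. The child
    # forked above stays our child, so the supervisor will reap it.
    os.execvp(svscan_argv[0], svscan_argv)
```

<p>
The fork has to come first: once the parent has exec'd into s6-svscan,
nothing is left to start init-stage2. As a side effect, the backgrounded
child remains a child of process 1, so when it finally exits, s6-svscan's
reaper collects it like any other zombie.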
</p>

<p>
Of course, that means that the scan directory is still incomplete when
s6-svscan first starts, because most services can't run yet, for
lack of mounted filesystems, network, etc. The "init-stage2" one-time
initialization script must populate the scan directory once it has made
it possible for all wanted services to run, and then trigger the scanner
(for instance with <tt>s6-svscanctl -a /run/service</tt>).
Once all the one-time tasks are done, the scan directory is fully
populated, and the scanner has been triggered, the machine is fully
operational and in stage 2, and the "init-stage2" script can die.
</p>

<h3> Is it possible to write stage 1 init in a scripting language? </h3>

<p>
It is entirely possible, and if you are attempting to write your own stage 1,
I definitely recommend it. If you are using
s6-svscan as stage 2 init, stage 1 init should be simple enough
that it can be written in any scripting language you want, just
as <tt>/etc/runit/1</tt> is if you're using runit. And since it
should be so small, the performance impact will be negligible,
while maintainability is enhanced. Definitely make your stage 1
init a script.
</p>

<p>
Of course, most people will use the <em>shell</em> as their scripting
language; however, I advocate the use of
<a href="//skarnet.org/software/execline/">execline</a>
for this, and not only for the obvious reasons. Piping s6-svscan's
stderr to a logging service before said service is even up requires
some <a href="#log">tricky FIFO handling</a> that execline can do
and the shell cannot.
</p>

<a name="stage3">
<h2> How to design a stage 3-4 init </h2>
</a>

<p>
If you're using s6-svscan as stage 2 init on <tt>/run/service</tt>, then
stage 3 init is naturally the <tt>/run/service/.s6-svscan/finish</tt> program.
Of course, <tt>/run/service/.s6-svscan/finish</tt> can be a symbolic link
to anything else; just make sure it points to something in the root
filesystem (unless your program is an execline script, in which case
it is not even necessary).
</p>

<h3> What stage 3-4 init must do </h3>

<ul>
<li> Destroy the supervision tree and stop all services </li>
<li> Kill all processes <em>save itself</em>, first gently, then harshly, and <em>reap all the zombies</em>. </li>
<li> Up until that point we were in stage 3; now we're in stage 4. </li>
<li> Unmount all the filesystems </li>
<li> Halt or reboot the machine, depending on what root asked for </li>
</ul>

<p>
This seems very simple, even simpler than stage 1, but experience
shows that it's trickier than it looks.
</p>

<p>
One tricky part is the <tt>kill -9 -1</tt> operation at the end of
stage 3: you must make sure that <em>process 1</em> regains control and keeps
running after it, because it will be the only process left alive. If you
are running a stage 3 script as process 1, it is almost automatic: your
script survives the kill and continues running, up into stage 4. If you
are using another model, the behaviour becomes system-dependent: your
script may or may not survive the kill, so on systems where it does not,
you will have to design a way to regain control in order to accomplish
stage 4 tasks.
</p>

<p>
Another tricky part, which only becomes apparent with practice, is solidity.
It is even more vital that <em>nothing fails</em> during stages 3 and 4
than it is in stage 1, because in stage 1, the worst that can happen is
that the machine does not boot, whereas in stages 3 and 4, the worst that
can happen is that the machine <em>does not shut down</em>, and that is
a much bigger issue.
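</p>

<p>
The "first gently, then harshly, then reap" sequence from the list above can
be sketched as follows. This Python version is deliberately restricted: in a
real stage 3, both signals go to every process at once with
<tt>kill(-1, sig)</tt>, which is what <tt>kill -9 -1</tt> does, whereas the
hypothetical <tt>stop_and_reap</tt> helper only targets an explicit list of
child PIDs, so it can be tried without taking the machine down. The one-second
grace period is an arbitrary assumption.
</p>

```python
import os
import signal
import time


def stop_and_reap(pids, grace=1.0):
    """SIGTERM the given children, SIGKILL whatever survives `grace` seconds,
    and reap every one of them. Returns {pid: wait status}.

    Hypothetical helper: a real stage 3 would signal with kill(-1, sig)
    instead of walking an explicit list of PIDs.
    """
    remaining = set(pids)
    statuses = {}
    # First, gently.
    for pid in remaining:
        os.kill(pid, signal.SIGTERM)
    deadline = time.monotonic() + grace
    while remaining and time.monotonic() < deadline:
        for pid in list(remaining):
            done, status = os.waitpid(pid, os.WNOHANG)
            if done:  # the child exited and has now been reaped
                statuses[pid] = status
                remaining.discard(pid)
        time.sleep(0.05)
    # Then, harshly: SIGKILL cannot be caught or ignored.
    for pid in remaining:
        os.kill(pid, signal.SIGKILL)
    # Finally, reap the remaining zombies so nothing lingers in the process table.
    for pid in remaining:
        _, status = os.waitpid(pid, 0)
        statuses[pid] = status
    return statuses
```

<p>
Reaping matters as much as the signals: every killed process remains a zombie
until its parent collects it, and orphaned processes are reparented to
process 1, so at this point of the shutdown it is process 1 that has to collect them.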
</p>

<p>
For these reasons, I now recommend <em>not</em> tearing down the
supervision tree for stages 3-4. It is easier to work in a stable
environment, as a regular process, than it is to manage a whole shutdown
sequence as pid 1: the presence of s6-svscan as pid 1, and of a working
supervision tree, is a pillar you can rely on, and with experience I find
it a good idea to keep the supervision infrastructure running until the end.
Of course, that requires the scandir, and the active supervision directories,
to be on a RAM filesystem such as <tt>tmpfs</tt>; that is good policy
anyway.
</p>

<h3> Is it possible to write stage 3 init in a scripting language? </h3>

<p>
Yes, definitely, just like stage 1.
</p>

<p>
However, you really should leave <tt>/run/service/.s6-svscan/finish</tt>
(and the other scripts in <tt>/run/service/.s6-svscan</tt>) alone, and
write your shutdown sequence without dismantling the supervision tree.
You will still have to stop most of the services, but s6-svscan should
stay. For a more in-depth study of what to do in stages 3-4 and how
to do it, you can look at the source of <tt>s6-linux-init-shutdownd</tt>
in the <a href="//skarnet.org/software/s6-linux-init/">s6-linux-init</a>
package.
</p>


<a name="log">
<h2> How to log the supervision tree's messages </h2>
</a>

<p>
When the Unix kernel launches your (stage 1) init process, it does so
with descriptors 0, 1 and 2 open on <tt>/dev/console</tt>, for reading
and writing. This is okay for the early boot: you actually
want early error messages to be displayed on the system console.
But
this is not okay for stage 2: the system console should only be used
to display extremely serious error messages such as kernel errors, or
errors from the logging system itself; everything else should be
handled by the logging system, following the
<a href="s6-log.html#loggingchain">logging chain</a> mechanism. The
supervision tree's messages should go to the catch-all logger instead
of the system console. (And the console should never be read, so no
program should run with <tt>/dev/console</tt> as stdin, but this is easy
enough to fix: s6-svscan will be started with stdin redirected from
<tt>/dev/null</tt>.)
</p>

<p>
The catch-all logger is a service, and we want <em>every</em>
service to run under the supervision tree. This is a chicken-and-egg problem:
before starting s6-svscan, we must redirect s6-svscan's output to
the input of a program that will only be started once s6-svscan is
running and can start services.
</p>

<p>
There are several solutions to this problem, but the simplest one is
to use a FIFO, a.k.a. named pipe. s6-svscan's stdout and stderr can
be redirected to a named pipe before s6-svscan is run, and the
catch-all logger service can be made to read from this named pipe.
Only two minor problems remain:
</p>

<ul>
<li> If s6-svscan or s6-supervise writes to the FIFO before there is
a reader, i.e. before the catch-all logging service is started, the
write will fail (and a SIGPIPE will be emitted). This is not a real issue
for an s6 installation, because s6-svscan and s6-supervise ignore SIGPIPE,
and they only write
to their stderr if an error occurs; and if an error occurs before they are
able to start the catch-all logger, this means that the system is seriously
damaged (just as if an error had occurred during stage 1) and the only solution is
to reboot with <tt>init=/bin/sh</tt> anyway.
</li>
<li> Normal Unix semantics <em>do not allow</em> a writer to open a
FIFO before there is a reader: if there is no reader when the FIFO is
opened for writing, the <tt>open()</tt> system call <em>blocks</em>
until a reader appears (and in non-blocking mode, it fails with
<tt>ENXIO</tt> instead, which does not help either). This is obviously not what we want: we want
to be able to <em>actually start</em> s6-svscan with its stdout and
stderr pointing to the logging FIFO, even without a reader process,
and we want it to run normally so it can start the logging service
that will provide such a reader process. </li>
</ul>

<p>
This second point cannot be solved in a shell script, and that is why
you are discouraged from writing your stage 1 init script in the shell
language: you cannot properly set up a FIFO output for s6-svscan without
resorting to horrible and unreliable hacks involving a temporary background
FIFO reader process.
</p>

<p>
Instead, you are encouraged to use the
<a href="//skarnet.org/software/execline/">execline</a> language,
or, at least,
the <a href="//skarnet.org/software/execline/redirfd.html">redirfd</a>
command, which is part of the execline distribution. The
<a href="//skarnet.org/software/execline/redirfd.html">redirfd</a>
command does just the right amount of trickery with FIFOs for you to be
able to properly redirect process 1's stdout and stderr to the logging FIFO
without blocking: <tt>redirfd -w 1 /run/service/s6-svscan-log/fifo</tt> blocks
if there's no process reading on <tt>/run/service/s6-svscan-log/fifo</tt>, but
<tt>redirfd -wnb 1 /run/service/s6-svscan-log/fifo</tt> <em>does not</em>.
</p>

<p>
This trick with FIFOs can even be used to avoid potential race conditions
in the one-time initialization script that runs in stage 2.
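</p>

<p>
At the system-call level, the trick can be illustrated with the Python sketch
below. It is not execline's source code, and the helper name
<tt>open_fifo_write_noblock</tt> is hypothetical; it assumes the approach works
like this: a non-blocking <tt>open()</tt> for writing fails with
<tt>ENXIO</tt> when the FIFO has no reader, so the process briefly opens the
FIFO for reading itself, opens the write end, closes its own read end again,
and finally switches the descriptor back to blocking mode (the <tt>b</tt> in
<tt>-wnb</tt>).
</p>

```python
import errno
import os


def open_fifo_write_noblock(path):
    """Open the FIFO at `path` for writing without waiting for a reader.

    Hypothetical helper illustrating the redirfd -wnb idea; returns a
    file descriptor in blocking mode.
    """
    try:
        fd = os.open(path, os.O_WRONLY | os.O_NONBLOCK)
    except OSError as e:
        if e.errno != errno.ENXIO:  # ENXIO means "no reader yet"
            raise
        # Become a reader ourselves for a moment, so the write-side open succeeds.
        reader = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
        try:
            fd = os.open(path, os.O_WRONLY | os.O_NONBLOCK)
        finally:
            os.close(reader)
    # The write descriptor stays valid after the temporary reader is gone;
    # put it back in blocking mode for normal writes.
    os.set_blocking(fd, True)
    return fd
```

<p>
Without a reader, a write on the resulting descriptor fails with
<tt>EPIPE</tt> (Python ignores <tt>SIGPIPE</tt> by default, much as
s6-svscan and s6-supervise do), which is the harmless failure mode described
in the first point above; as soon as a reader opens the FIFO, writes go
through. A plain blocking <tt>open()</tt> for writing, by contrast, waits for
that reader, and that wait is precisely what can protect the one-time
initialization script.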
If it is forked from
init-stage1 right before s6-svscan is executed, then depending on the scheduler's
mood, this script may run a long way before s6-svscan is actually
executed and running the initial services, and may do dangerous things,
such as writing messages to the logging FIFO before there's a reader, then
eating a SIGPIPE and dying without completing the initialization. To avoid
that, and to be sure that s6-svscan really runs and initial services are really
started before the stage 2 init script is allowed to continue, it is possible
to redirect the child script's output (stdout and/or stderr) <em>once again</em>
to the logging FIFO, but in the normal way without redirfd trickery, before
it execs into the init-stage2 script. The child process then blocks on the
FIFO until a reader appears, while process 1, which does not block, execs
into s6-svscan and starts the logging service. The logging service opens the
FIFO for reading, which unblocks the child process, and the child then runs the
initialization tasks with the guarantee that s6-svscan is running.
</p>

<p>
It really is simpler than it sounds. :-)
</p>

<h2> A working example </h2>

<p>
This whole page may sound very theoretical, dry, wordy, and hard to
grasp without a live example to try things on; unfortunately, s6 cannot provide
live examples without becoming system-specific.
</p>

<p>
However, the
<a href="//skarnet.org/software/s6-linux-init/">s6-linux-init</a>
package provides you with the
<a href="//skarnet.org/software/s6-linux-init/s6-linux-init-maker.html">s6-linux-init-maker</a>
command, which produces a set of working scripts, including a script
that is suitable as <tt>/sbin/init</tt>, for you to study and edit.
You can <em>run</em> the <tt>s6-linux-init-maker</tt> command even
on non-Linux systems: it will produce scripts that do not work as
is for another OS, but can still be used for study and as a basis for
a working stage 1 script.
</p>

</body>
</html>