commit 90989cf9c0381c28a0285320518e570da8a6bd00
parent 78068199a7680a8b2deedd01e1b20655c0740068
Author: Laurent Bercot <ska-skaware@skarnet.org>
Date: Mon, 10 Jan 2022 20:37:11 +0000
Update s6-svscan-1.html to reflect how s6-l-i works
Signed-off-by: Laurent Bercot <ska@appnovation.com>
Diffstat:
 doc/s6-svscan-1.html | 105 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------------------------------
1 file changed, 75 insertions(+), 30 deletions(-)
diff --git a/doc/s6-svscan-1.html b/doc/s6-svscan-1.html
@@ -38,7 +38,7 @@ bit of understanding of what is going on.
</a>
<p>
- The life of a Unix machine has three stages:
+ The life of a Unix machine has four stages:
</p>
<ol>
@@ -58,12 +58,14 @@ to add or remove services. This phase ends when the administrator
requires a shutdown. </li>
<li> The <em>shutdown</em> phase. Everything is cleaned up, services are
stopped, filesystems are unmounted, the machine is getting ready to be
-halted. During this phase, everything but the shutdown procedure gets
-killed - the only surefire way to kill everything is <tt>kill -9 -1</tt>,
-and only process 1 can survive it and keep working: it's only logical
-that the shutdown procedure, or at least the shutdown procedure from
-the <tt>kill -9 -1</tt> on and until the final poweroff or reboot
-command, is performed by process 1. </li>
+halted. At the end of this phase, all processes are killed, first with
+a SIGTERM, then with a SIGKILL (to catch processes that resist SIGTERM).
+The only process that survives it is process 1; if this process is
+<a href="s6-svscan.html">s6-svscan</a> and its <a href="scandir.html">scandir</a>
+is not empty, then the supervision tree is restarted. </li>
+ <li> The <em>hardware shutdown</em> phase. The system clock is stored,
+filesystems are unmounted, and the system call that reboots the machine or
+powers it off is called. </li>
</ol>
<p>
@@ -87,8 +89,10 @@ Even the <a href="http://smarden.org/runit/runit.8.html">runit</a>
program, designed with supervision in mind, remains as process 1 all the
time; at least runit makes things simple by clearly separating the three
stages and delegating every stage's work to a different script that is
-<em>not</em> run as process 1. (This requires very careful handling of the
-<tt>kill -9 -1</tt> part of stage 3, though.)
+<em>not</em> run as process 1. (Since runit does not distinguish between
+stage 3 and stage 4, it needs very careful handling of the
+<tt>kill -9 -1</tt> part of stage 3: getting <tt>/etc/runit/3</tt> killed
+before it unmounts the filesystems would be bad.)
</p>
<p>
@@ -104,7 +108,7 @@ stages and delegating every stage's work to a different script that is
init does not have the right to die, but fortunately, <em>it has the right
to <a href="https://pubs.opengroup.org/onlinepubs/9699919799/functions/execve.html">execve()</a>!</em>
During stage 2, why use precious RAM, or at best, swap space, to store data
-that are only relevant to stages 1 or 3? It only makes sense to have an
+that are only relevant to stages 1 or 3-4? It only makes sense to have an
init process that handles stage 1, then executes into an init process that
handles stage 2, and when told to shutdown, this "stage 2" init executes into
a "stage 3" init which just performs shutdown. Just as runit does with the
@@ -158,7 +162,7 @@ are some general design tips.
<ul>
<li> Prepare an initial <a href="scandir.html">scan directory</a>, say in
-<tt>/service</tt>, with a few vital services, such as s6-svscan's own logger,
+<tt>/run/service</tt>, with a few vital services, such as s6-svscan's own logger,
and an early getty (in case debugging is needed). That implies mounting a
read-write filesystem, creating it in RAM if needed, if the root filesystem
is read-only. </li>
@@ -209,7 +213,8 @@ operational and in stage 2, and the "init-stage2" script can die.
<h3> Is it possible to write stage 1 init in a scripting language? </h3>
<p>
- It is very possible, and I even recommend it. If you are using
+ It is very possible, and if you are attempting to write your own stage 1,
+I definitely recommend it. If you are using
s6-svscan as stage 2 init, stage 1 init should be simple enough
that it can be written in any scripting language you want, just
as <tt>/etc/runit/1</tt> is if you're using runit. And since it
@@ -229,44 +234,84 @@ and the shell cannot.
</p>
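+
+<p>
+ For illustration, here is a minimal sketch of what a stage 1 script could
+look like in execline. The paths and names used here
+(<tt>/etc/s6/scandir</tt>, <tt>/etc/s6/init-stage2</tt>) are only examples;
+a real stage 1 usually has a bit more preparatory work to do, but the
+structure stays the same.
+</p>
+
+<pre>
+#!/command/execlineb -P
+
+# pid 1 starts with a nearly empty environment: set a PATH.
+export PATH /command:/usr/sbin:/usr/bin:/sbin:/bin
+
+# Mount a read-write RAM filesystem to hold the scan directory.
+foreground { mount -t tmpfs -o mode=0755 tmpfs /run }
+
+# Install the initial scan directory: s6-svscan's own logger and an
+# early getty, copied from a read-only template.
+foreground { cp -a /etc/s6/scandir /run/service }
+
+# Redirect process 1's stdout and stderr to the logger's fifo without
+# blocking (see the fifo trick in the logging section below).
+redirfd -wnb 1 /run/service/s6-svscan-log/fifo
+fdmove -c 2 1
+
+# Fork the script that will bring the machine to its fully operational
+# state once the supervision tree is running.
+background { /etc/s6/init-stage2 }
+
+# Execute into s6-svscan: it runs as process 1 for all of stage 2.
+s6-svscan /run/service
+</pre>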
<a name="stage3">
-<h2> How to design a stage 3 init </h2>
+<h2> How to design a stage 3-4 init </h2>
</a>
<p>
- If you're using s6-svscan as stage 2 init on <tt>/service</tt>, then
-stage 3 init is naturally the <tt>/service/.s6-svscan/finish</tt> program.
-Of course, <tt>/service/.s6-svscan/finish</tt> can be a symbolic link
+ If you're using s6-svscan as stage 2 init on <tt>/run/service</tt>, then
+stage 3 init is naturally the <tt>/run/service/.s6-svscan/finish</tt> program.
+Of course, <tt>/run/service/.s6-svscan/finish</tt> can be a symbolic link
to anything else; just make sure it points to something in the root
filesystem (unless your program is an execline script, in which case
it is not even necessary).
</p>
-<h3> What stage 3 init must do </h3>
+<h3> What stage 3-4 init must do </h3>
<ul>
<li> Destroy the supervision tree and stop all services </li>
- <li> Kill all processes <em>save itself</em>, first gently, then harshly </li>
+ <li> Kill all processes <em>save itself</em>, first gently, then harshly, and <em>reap all the zombies</em>. </li>
+ <li> Up until that point we were in stage 3; now we're in stage 4. </li>
<li> Unmount all the filesystems </li>
<li> Halt or reboot the machine, depending on what root asked for </li>
</ul>
<p>
- This is also very simple; even simpler than stage 1.
- The only tricky part is the <tt>kill -9 -1</tt> phase: you must make sure
-that <em>process 1</em> regains control and keeps running after it, because
-it will be the only process left alive. But since we're running stage 3
-init directly, it's almost automatic! this is an advantage of running
-the shutdown procedure as process 1, as opposed to, for instance,
-<tt>/etc/runit/3</tt>.
+ This is seemingly very simple, even simpler than stage 1, but experience
+shows that it's trickier than it looks.
+</p>
+
+<p>
+ One tricky part is the <tt>kill -9 -1</tt> operation at the end of
+stage 3: you must make sure that <em>process 1</em> regains control and keeps
+running after it, because it will be the only process left alive. If you
+are running a stage 3 script as process 1, it is almost automatic: your
+script survives the kill and continues running, up into stage 4. If you
+are using another model, the behaviour becomes system-dependent: your
+script may or may not survive the kill, so on systems where it does not,
+you will have to design a way to regain control in order to accomplish
+stage 4 tasks.
+</p>
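+
+<p>
+ As a rough illustration of the first model, here is a sketch of a
+<tt>.s6-svscan/finish</tt> script performing the end of stage 3, then
+stage 4, as process 1. The exact commands and delays are only examples
+and depend on your system:
+</p>
+
+<pre>
+#!/command/execlineb -P
+
+# Running as process 1, this script survives the global kill.
+foreground { kill -s TERM -- -1 }
+# Give processes a chance to exit cleanly; the delay is arbitrary.
+foreground { sleep 2 }
+foreground { kill -s KILL -- -1 }
+# Reap all the zombies.
+wait { }
+
+# Stage 4: this script is now the only process left.
+# Store the system clock, if relevant on your system.
+foreground { hwclock --systohc }
+# Unmount everything that can be unmounted.
+foreground { umount -a -r }
+# Reboot; use poweroff -f or halt -f instead, depending on what root
+# asked for.
+reboot -f
+</pre>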
+
+<p>
+ Another tricky part, one that only becomes apparent with practice, is solidity.
+It is even more vital that <em>nothing fails</em> during stages 3 and 4
+than it is in stage 1, because in stage 1, the worst that can happen is
+that the machine does not boot, whereas in stages 3 and 4, the worst that
+can happen is that the machine <em>does not shut down</em>, and that is
+a much bigger issue.
+</p>
+
+<p>
+ For these reasons, I now recommend <em>not</em> tearing down the
+supervision tree for stages 3-4. It is easier to work in a stable
+environment, as a regular process, than it is to manage a whole shutdown
+sequence as pid 1: the presence of s6-svscan as pid 1, and of a working
+supervision tree, is a pillar you can rely on; with experience, I find it
+a good idea to keep the supervision infrastructure running until the end.
+Of course, that requires the scandir, and the active supervision directories,
+to be on a RAM filesystem such as <tt>tmpfs</tt>; that is good policy
+anyway.
</p>
<h3> Is it possible to write stage 3 init in a scripting language? </h3>
<p>
- You'd have to be a masochist, or have extremely specific needs, not to
-do so.
+ Yes, definitely, just like stage 1.
+</p>
+
+<p>
+ However, you really should leave <tt>/run/service/.s6-svscan/finish</tt>
+(and the other scripts in <tt>/run/service/.s6-svscan</tt>) alone, and
+write your shutdown sequence without dismantling the supervision tree.
+You will still have to stop most of the services, but s6-svscan should
+stay. For a more in-depth study of what to do in stages 3-4 and how
+to do it, you can look at the source of <tt>s6-linux-init-shutdownd</tt>
+in the <a href="//skarnet.org/software/s6-linux-init/">s6-linux-init</a>
+package.
</p>
+
<a name="log">
<h2> How to log the supervision tree's messages </h2>
</a>
@@ -341,9 +386,9 @@ command, which is part of the execline distribution. The
<a href="//skarnet.org/software/execline/redirfd.html">redirfd</a>
command does just the right amount of trickery with FIFOs for you to be
able to properly redirect process 1's stdout and stderr to the logging FIFO
-without blocking: <tt>redirfd -w 1 /service/s6-svscan-log/fifo</tt> blocks
-if there's no process reading on <tt>/service/s6-svscan-log/fifo</tt>, but
-<tt>redirfd -wnb 1 /service/s6-svscan-log/fifo</tt> <em>does not</em>.
+without blocking: <tt>redirfd -w 1 /run/service/s6-svscan-log/fifo</tt> blocks
+if there's no process reading on <tt>/run/service/s6-svscan-log/fifo</tt>, but
+<tt>redirfd -wnb 1 /run/service/s6-svscan-log/fifo</tt> <em>does not</em>.
</p>
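+
+<p>
+ To also catch stderr, a typical way is to follow that redirection with a
+<tt>fdmove</tt> call, so the complete idiom in a stage 1 script looks
+something like this:
+</p>
+
+<pre>
+# Redirect stdout to the logging fifo without blocking on it,
+# then make stderr a copy of stdout.
+redirfd -wnb 1 /run/service/s6-svscan-log/fifo
+fdmove -c 2 1
+</pre>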
<p>