s6-supervise.html (12456B)
<html>
<head>
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta http-equiv="Content-Language" content="en" />
<title>s6: the s6-supervise program</title>
<meta name="Description" content="s6: the s6-supervise program" />
<meta name="Keywords" content="s6 command s6-supervise servicedir supervision supervise" />
<!-- <link rel="stylesheet" type="text/css" href="//skarnet.org/default.css" /> -->
</head>
<body>

<p>
<a href="index.html">s6</a><br />
<a href="//skarnet.org/software/">Software</a><br />
<a href="//skarnet.org/">skarnet.org</a>
</p>

<h1> The s6-supervise program </h1>

<p>
s6-supervise monitors a long-lived process (or <em>service</em>), making sure it
stays alive, sending notifications to registered processes when it dies, and
providing an interface to control its state. s6-supervise is designed to be the
last non-leaf branch of a <em>supervision tree</em>, the supervised process
being a leaf.
</p>

<h2> Interface </h2>

<pre>
     s6-supervise <em>servicedir</em>
</pre>

<p>
s6-supervise's behaviour is approximately the following:
</p>

<ul>
 <li> s6-supervise changes its current directory to <em>servicedir</em>. </li>
 <li> It exits 100 if another s6-supervise process is already monitoring this service. </li>
 <li> It forks and executes the <tt>./run</tt> file in the service directory. </li>
 <li> <tt>./run</tt> should be a long-lived process: it can chain load (i.e. exec into
other binaries), but should not die. It's the daemon that s6-supervise monitors
and manages. </li>
 <li> When <tt>./run</tt> dies, s6-supervise spawns <tt>./finish</tt>, if it exists.
This script should be short-lived: it's meant to clean up application state, if
necessary, that has not been cleaned up by <tt>./run</tt> itself before dying.
</li>
 <li> When <tt>./finish</tt> dies, s6-supervise spawns <tt>./run</tt> again. </li>
 <li> s6-supervise operation can be controlled by the <a href="s6-svc.html">s6-svc</a>
program. It can be sent commands like "restart the service", "bring the service down", etc. </li>
 <li> s6-supervise normally runs forever. If told to exit by <a href="s6-svc.html">s6-svc</a>,
it waits for the service to go down one last time, then exits 0. </li>
</ul>

<p>
 For a precise description of s6-supervise's behaviour, check the
<a href="#detailed">Detailed operation</a> section below, as well as
the <a href="servicedir.html">service directory</a> page:
s6-supervise operation can be extensively configured by the presence
of certain files in the service directory.
</p>

<h2> Options </h2>

<p>
 s6-supervise does not support options, because it is normally not run
manually via a command line; it is usually launched by its own
supervisor, <a href="s6-svscan.html">s6-svscan</a>. The way to
tune s6-supervise's behaviour is via files in the
<a href="servicedir.html">service directory</a>.
</p>

<h2> Readiness notification support </h2>

<p>
 If the <a href="servicedir.html">service directory</a> contains a valid
<tt>notification-fd</tt> file when the service is started, or restarted,
s6-supervise creates and listens to an additional pipe from the service
for <a href="notifywhenup.html">readiness notification</a>. When the
notification occurs, s6-supervise updates the <tt>./supervise/status</tt>
file accordingly, then sends
a <tt>'U'</tt> event to <tt>./event</tt>.
</p>

<p>
 If the service is logged, i.e. if the service directory has a
<tt>log</tt> subdirectory that is also a service directory, and the
s6-supervise process has been launched by
<a href="s6-svscan.html">s6-svscan</a>, then by default
the service's stdout goes into the logging pipe.
If you set
<tt>notification-fd</tt> to 1, the logging pipe will be overwritten
by the notification pipe, which is probably not what you want. Instead,
if your daemon writes a notification message to its stdout, you should
set <tt>notification-fd</tt> to (for instance) 3, and redirect outputs
in your run script. For example, to redirect stderr to the logger and
stdout to a <tt>notification-fd</tt> set to 3, you would start your
daemon as <tt>fdmove -c 2 1 fdmove 1 3 prog...</tt> (in execline), or
<tt>exec 2&gt;&amp;1 1&gt;&amp;3 3&lt;&amp;- prog...</tt> (in shell).
</p>

<h2> Signals </h2>

<p>
 s6-supervise reacts to the following signals:
</p>

<ul>
 <li> SIGTERM: bring down the service and exit, as if a
<a href="s6-svc.html">s6-svc -xd</a> command had been received </li>
 <li> SIGHUP: close its own stdin and stdout, and exit as soon as the
service stops, as if an <a href="s6-svc.html">s6-svc -x</a> command
had been received </li>
 <li> SIGQUIT: exit immediately without touching the service in any
way. </li>
 <li> SIGINT: send a SIGINT to the process group of the service, then
exit immediately. (The point here is to correctly forward SIGINT
in the case where s6-supervise is running in a terminal and the user
sent ^C to interrupt it.) </li>
</ul>

<a name="#detailed">
<h2> Detailed operation </h2>
</a>

<ul>
 <li> s6-supervise switches to the <em>servicedir</em>
<a href="servicedir.html">service directory</a>. </li>
 <li> It creates a <tt>supervise/</tt> subdirectory (if it doesn't exist yet) to
store its internal data. </li>
 <li> It exits 100 if another s6-supervise process is already monitoring this service. </li>
 <li> If the <tt>./event</tt> <a href="fifodir.html">fifodir</a> does not exist,
s6-supervise creates it and allows subscriptions to it from processes having the same
effective group id as the s6-supervise process.
If it already exists, it uses it as is, without modifying the subscription rights. </li>
 <li> It <a href="libs6/ftrigw.html">sends</a> a <tt>'s'</tt> event to <tt>./event</tt>. </li>
 <li> If the default service state is up (i.e. there is no <tt>./down</tt> file),
s6-supervise spawns <tt>./run</tt>. One argument is given to the <tt>./run</tt>
program: <em>servicedir</em>, the name of the directory s6-supervise is being
run on. It is given exactly as given to s6-supervise, without recanonicalization.
In particular, if s6-supervise is being managed by <a href="s6-svscan.html">s6-svscan</a>,
<em>servicedir</em> is always of the form <tt><em>foo</em></tt> or <tt><em>foo</em>/log</tt>,
and <em>foo</em> contains no slashes. </li>
 <li> s6-supervise sends a <tt>'u'</tt> event to <tt>./event</tt> whenever it
successfully spawns <tt>./run</tt>. </li>
 <li> If there is a <tt>./notification-fd</tt> file in the service directory and,
at some point after the service has been spawned, s6-supervise is told that the
service is ready, it sends a <tt>'U'</tt> event to <tt>./event</tt>. There are
several ways to tell s6-supervise that the service is ready:
 <ul>
  <li> the daemon may <a href="notifywhenup.html">do so itself</a>. </li>
  <li> the run script may have forked a
<a href="s6-notifyoncheck.html">s6-notifyoncheck</a> process that polls the
service for readiness. </li>
 </ul> </li>
 <li> When <tt>./run</tt> dies, s6-supervise sends a <tt>'d'</tt> event to <tt>./event</tt>.
It then spawns <tt>./finish</tt> if it exists.
<tt>./finish</tt> will have <tt>./run</tt>'s exit code as first argument, or 256 if
<tt>./run</tt> was signaled; it will have the number of the signal that killed <tt>./run</tt>
as second argument, or an undefined number if <tt>./run</tt> was not signaled;
and it will have <em>servicedir</em> as third argument.
</li>
 <li> By default, <tt>./finish</tt> must exit in less than 5 seconds. If it takes more than that,
s6-supervise kills it with a SIGKILL. This can be configured via the
<tt>./timeout-finish</tt> file; see the description in the
<a href="servicedir.html">service directory page</a>. </li>
 <li> When <tt>./finish</tt> dies (or is killed),
s6-supervise sends a <tt>'D'</tt> event to <tt>./event</tt>. Then
it restarts <tt>./run</tt> unless it has been told not to. </li>
 <li> If <tt>./finish</tt> exits 125, then s6-supervise sends an <tt>'O'</tt> event
to <tt>./event</tt> <em>before</em> the <tt>'D'</tt>, and it
<strong>does not restart the service</strong>, as if <tt>s6-svc -O</tt> had
been called. This can be used to signify permanent failure to start the service. </li>
 <li> There is a minimum 1-second delay between two <tt>./run</tt> spawns, to avoid busy-looping
if <tt>./run</tt> exits too quickly. If the service has been <em>ready</em> for more
than one second, it will restart immediately, but if it is not <em>ready</em> when
it dies, s6-supervise will always pause for 1 second before spawning it again. </li>
 <li> When killed or asked to exit, it waits for the service to go down one last time, then
sends an <tt>'x'</tt> event to <tt>./event</tt> before exiting 0. </li>
</ul>

<p>
 Make sure to also check the <a href="servicedir.html">service directory</a>
documentation page, for the full list of files that can be present in a service
directory and impact s6-supervise's behaviour in any way.
</p>

<h2> Usage notes </h2>

<ul>
 <li> s6-supervise is a long-lived process. It normally runs forever, from the system's
boot scripts, until shutdown time; it should not be killed or told to exit. If you have
no use for a service, just turn it off; the s6-supervise process does not hurt.
</li>
 <li> Even in boot scripts, s6-supervise should normally not be run directly. It's
better to have a collection of <a href="servicedir.html">service directories</a> in a
single <a href="scandir.html">scan directory</a>, and just run
<a href="s6-svscan.html">s6-svscan</a> on that scan directory. s6-svscan will spawn
the necessary s6-supervise processes, and will also take care of logged services. </li>
 <li> s6-supervise always spawns its child in a new session, as a session leader.
The goal is to protect the supervision tree from misbehaving services that would
send signals to their whole process group. Nevertheless, s6-supervise's handling of
SIGINT ensures that its service is killed if you happen to run it in a terminal and
send it a ^C. </li>
 <li> You can use <a href="s6-svc.html">s6-svc</a> to send commands to the s6-supervise
process; mostly to change the service state and send signals to the monitored
process. </li>
 <li> You can use <a href="s6-svok.html">s6-svok</a> to check whether s6-supervise
is successfully running. </li>
 <li> You can use <a href="s6-svstat.html">s6-svstat</a> to check the status of a
service. </li>
 <li> s6-supervise maintains internal information inside the <tt>./supervise</tt>
subdirectory of <em>servicedir</em>. <em>servicedir</em> itself can be read-only,
but both <em>servicedir</em><tt>/supervise</tt> and <em>servicedir</em><tt>/event</tt>
need to be read-write.
</li>
 <li> If <em>servicedir</em> isn't writable by s6-supervise, for any reason, then the
<a href="s6-svc.html">s6-svc</a> <tt>-D</tt> and <tt>-U</tt> commands will not work
properly since s6-supervise will be unable to create or delete a
<em>servicedir</em><tt>/down</tt> file; in this case s6-supervise will print a warning
on stderr, and perform the equivalent of <tt>-d</tt> or <tt>-u</tt> instead; it
will just be unable to change the permanent service configuration. </li>
</ul>

<h2> Implementation notes </h2>

<ul>
 <li> s6-supervise tries its best to stay alive and running despite possible
system call failures. It will write to its standard error every time it encounters a
problem. However, unlike <a href="s6-svscan.html">s6-svscan</a>, it will not go out
of its way to stay alive; if it encounters an unsolvable situation, it will just
die. </li>
 <li> Unlike other "supervise" implementations, s6-supervise is a fully asynchronous
state machine. That means that it can read and process commands at any time, even
when the machine is in trouble (full process table, for instance). </li>
 <li> s6-supervise <em>does not use malloc()</em>. That means it will <em>never leak
memory</em>. <small>However, s6-supervise uses opendir(), and most opendir()
implementations internally use heap memory, so unfortunately it's impossible to
guarantee that s6-supervise does not use heap memory at all.</small> </li>
 <li> s6-supervise has been carefully designed so every instance maintains as little
data as possible, so it uses a very small
amount of non-sharable memory. It is not a problem to have several
dozen s6-supervise processes, even on constrained systems: resource consumption
will be negligible. </li>
</ul>

</body>
</html>