<html>
  <head>
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <meta http-equiv="Content-Language" content="en" />
    <title>s6: the s6-permafailon program</title>
    <meta name="Description" content="s6: the s6-permafailon program" />
    <meta name="Keywords" content="s6 supervision finish permanent failure service" />
    <!-- <link rel="stylesheet" type="text/css" href="//skarnet.org/default.css" /> -->
  </head>
<body>

<p>
<a href="index.html">s6</a><br />
<a href="//skarnet.org/software/">Software</a><br />
<a href="//skarnet.org/">skarnet.org</a>
</p>

<h1> The <tt>s6-permafailon</tt> program </h1>

<p>
<tt>s6-permafailon</tt> is a program that is meant to be used
in the <tt>./finish</tt> script of a
<a href="servicedir.html">service directory</a> supervised by
<a href="s6-supervise.html">s6-supervise</a>. When used, it
reads and analyses the death tally of a service (i.e. the recent
process death events that happened), and if the death tally
matches a given pattern, it causes <em>permanent failure</em>
of the service, i.e. it tells the supervisor not to try and
restart it.
</p>

<h2> Interface </h2>

<pre>
     s6-permafailon <em>secs</em> <em>deathcount</em> <em>events</em> <em>prog...</em>
</pre>

<ul>
 <li> <tt>s6-permafailon</tt> must have the service directory of the
tested service as its current directory. This is the default if it is
called from the <tt>finish</tt> script of the service. </li>
 <li> It reads the <em>death tally</em> of the service, which is
maintained by <a href="s6-supervise.html">s6-supervise</a>. </li>
 <li> If the supervised process has died at least <em>deathcount</em>
times in the last <em>secs</em> seconds with a cause listed in
<em>events</em>, then <tt>s6-permafailon</tt> exits 125.
</li>
 <li> Else <tt>s6-permafailon</tt> execs into <em>prog...</em>. </li>
</ul>

<p>
<em>events</em> is a comma-separated list of events. An event can be
one of the following:
</p>

<ul>
 <li> An exit code, which is an integer between 0 and 255. Example: <tt>1</tt> </li>
 <li> An exit code interval, which is two exit codes separated by a dash. Example: <tt>1-50</tt> </li>
 <li> A signal name, or a signal number preceded by "SIG". Examples: <tt>SIGTERM</tt>, <tt>sigabrt</tt>, <tt>sig11</tt> </li>
</ul>

<h2> Usage </h2>

<ul>
 <li> <a href="s6-supervise.html">s6-supervise</a> detects when the <tt>./finish</tt>
script of its service exits 125, and stops respawning the service. So, if the
<tt>./finish</tt> script is a chain-loading command line starting with an
<tt>s6-permafailon</tt> invocation (or containing such an invocation), when
<tt>s6-permafailon</tt> exits 125, then the <tt>./finish</tt> script also
exits 125 (because it is the same process), and the service is then marked as
failing permanently. </li>
 <li> The <tt>./finish</tt> script is <em>naturally</em> a chain-loading
command line if it is written in the
<a href="//skarnet.org/software/execline/">execline</a> language. It
can also be made into a chain-loading command line from a shell script by using
<tt>exec s6-permafailon secs deathcount events rest-of-chainloading-cmdline...</tt> </li>
 <li> Multiple invocations of <tt>s6-permafailon</tt> can be chained, in order
to test several death patterns. </li>
 <li> If a permanent failure is triggered and <em>secs</em> is high, it is
possible that when the administrator manually launches the service again,
the next death triggers a permanent failure again. If this is not wanted,
the administrator should clear the death tally with the
<a href="s6-svdt-clear.html">s6-svdt-clear</a> command.
</li>
 <li> The current death tally can be viewed via the <a href="s6-svdt.html">s6-svdt</a>
command. </li>
</ul>

<h2> Example </h2>

<p>
<tt>s6-permafailon 60 5 1,101-103,SIGSEGV,SIGBUS <em>prog...</em></tt>
will exit 125 if the service has died 5 times in the last 60 seconds with
an exit code of 1, 101, 102 or 103, a SIGSEGV or a SIGBUS. Else it will
chainload into the <em>prog...</em> command line.
</p>

</body>
</html>