  • Supervisor 3.0a1 Released

    Posted by Chris McDonough on August 16th, 2007

    Supervisor 3.0a1 is the latest release of supervisor.

    v3.0a1 is a major feature release. Most of the features were commissioned by Maintainable Software. As well as contributing development funding, members of Maintainable have contributed a good deal of code to the supervisor codebase.

    The major new features in 3.0a1 as compared to the prior (2.2b1) release:

    • An event notification system. Supervisor now sends event
      notifications during normal operations for things like subprocess
      start and stop, supervisor startup and shutdown, and “process
      communications”. Users can write “event listeners” (which run as
      subprocesses) and subscribe them to these events selectively.
      Event listeners can perform arbitrary actions when an event
      notification occurs, such as sending an email, posting to an HTTP
      server, etc.
    • Subprocesses can emit data on stdout or stderr between special
      tokens which cause supervisor to perform a “process
      communications” event notification.
    • Process groups. Two types of process groups can be defined via
      the config file: “homogeneous” and “heterogeneous”. “Homogeneous”
      process groups are process groups comprised of very similar
      processes (e.g. many instances of an application server).
      “Heterogeneous” process groups are process groups comprised of
      potentially differing process types (e.g. “processes related to
      customer X”). Eventually it will be possible to start and stop
      process groups from within supervisorctl and the web interface
      (this feature is currently yet-to-be-implemented).
    • The XML-RPC interface API can be extended in arbitrary ways by
      registering new top-level namespace factories.
    • Stdout and stderr of processes may now be logged independently.
    • Improved web interface styling.
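To make the event listener idea concrete, here is a minimal sketch of one in Python. It assumes the listener protocol documented for later Supervisor 3.x releases (the listener prints READY, reads a header line of key:value tokens that includes a len field, reads that many characters of payload, then answers with a RESULT line); the protocol in 3.0a1 itself may differ in detail. The function names and the sample event in the usage note are made up for illustration.

```python
import sys


def parse_header(line):
    """Turn 'key:value key:value ...' into a dict."""
    return dict(token.split(":", 1) for token in line.split())


def handle_one_event(stdin, stdout, on_event):
    """Run one READY -> event -> RESULT cycle. Returns False at end of input."""
    # Tell supervisord we are ready to receive an event.
    stdout.write("READY\n")
    stdout.flush()

    header_line = stdin.readline()
    if not header_line:
        return False

    headers = parse_header(header_line)
    # The 'len' header says how much payload follows the header line.
    payload = stdin.read(int(headers["len"]))

    # Arbitrary action: send an email, post to an HTTP server, etc.
    on_event(headers, payload)

    # Acknowledge the event ("OK" is 2 characters long).
    stdout.write("RESULT 2\nOK")
    stdout.flush()
    return True


def main():
    def log_event(headers, payload):
        # stdout belongs to the protocol, so report on stderr instead.
        sys.stderr.write("%s: %s\n" % (headers.get("eventname"), payload))
        sys.stderr.flush()

    while handle_one_event(sys.stdin, sys.stdout, log_event):
        pass
```

A real listener script would end by calling main(), and would be subscribed to events in the supervisord config file. Passing the streams in as arguments keeps the cycle easy to exercise with in-memory files.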

    As with any alpha release, the API and config file format are subject to change, so I wouldn’t recommend deploying this release to production.

    Supervisor 3.0a1 can be installed via “easy_install supervisor” if you’re using an internet-connected machine with setuptools installed into your Python.

    I’ve created a mailing list for users of supervisor and, as always, bug reports can be sent to the collector.

  • Supervisor 2.2b1 Released

    Posted by Chris McDonough on April 1st, 2007

    The new release contains these changes:

    • Individual program configuration sections can now specify an environment.
    • Added a version command to supervisorctl. This returns the version of the supervisor2 package which the remote supervisord process is using.
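As a rough illustration of the per-program environment option, the sketch below reads a program section with Python's configparser and splits the environment value into a dict. The comma-separated KEY=value syntax is an assumption based on later Supervisor documentation, and the program name, command, and variables are hypothetical.

```python
import configparser

# Hypothetical config fragment; the environment syntax is an assumption.
SAMPLE = """
[program:appserver]
command=/usr/local/bin/appserver --port 8080
environment=APP_ENV=staging,APP_WORKERS=4
"""


def parse_environment(value):
    """Split 'KEY=val,KEY2=val2' into a dict of strings."""
    env = {}
    for pair in value.split(","):
        pair = pair.strip()
        if not pair:
            continue
        key, _, val = pair.partition("=")
        env[key.strip()] = val.strip()
    return env


config = configparser.ConfigParser()
config.read_string(SAMPLE)
program_env = parse_environment(config["program:appserver"]["environment"])
print(program_env)
```

This simple splitter does not handle quoted values that contain commas; it is only meant to show the shape of the option.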
  • Supervisor 2.1 Released

    Posted by Chris McDonough on March 17th, 2007

    Between the band listening and pig pickin I was able to release a new version of the supervisor2 program. This 2.1 release is a bugfix release, and fixes several important problems. Here’s the changelist:

    • When supervisord was invoked more than once, and its configuration was set up to use a UNIX domain socket as the HTTP server, the socket file would be erased in error. The symptom of this was that a subsequent invocation of supervisorctl could not find the socket file, so the process could not be controlled (it and all of its subprocesses would need to be killed by hand).
    • Close subprocess file descriptors properly when a subprocess exits or otherwise dies. This should result in fewer “too many open files to spawn foo” messages when supervisor is left up for long periods of time.
    • When a process was not killable with a “normal” signal at shutdown time, too many “INFO: waiting for x to die” messages would be sent to the log until we ended up killing the process with a SIGKILL. Now a maximum of one every three seconds is sent up until SIGKILL time. Thanks to Ian Bicking.
    • Add an assertion: we never want to try to marshal None to XML-RPC callers. Issue 223 in the collector from vgatto indicates that somehow a supervisor XML-RPC method is returning None (which should never happen), but I cannot identify how. Maybe the assertion will give us more clues if it happens again.
    • Supervisor would crash when run under Python 2.5 because the xmlrpclib.Transport class in Python 2.5 changed in a backward-incompatible way. Thanks to Eric Westra for the bug report and a fix.
    • Tests now pass under Python 2.5.
    • Better supervisorctl reporting on stop requests that have a FAILED status.
    • Removed duplicated code (readLog/readMainLog), thanks to Mike Naberezny.
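The "maximum of one every three seconds" change for the “waiting for x to die” messages is a plain rate limit on a log call. The sketch below shows one way to get that behavior; it is not supervisor's actual code, and the class name and injectable clock are inventions for illustration.

```python
import time


class ThrottledLogger:
    """Emit a message at most once per `interval` seconds; drop the rest."""

    def __init__(self, emit, interval=3.0, clock=time.monotonic):
        self.emit = emit          # callable that actually writes the message
        self.interval = interval  # minimum seconds between emitted messages
        self.clock = clock        # injectable so the logic can be tested
        self._last = None         # time of the last emitted message

    def log(self, message):
        now = self.clock()
        if self._last is not None and now - self._last < self.interval:
            return False  # too soon since the last message: suppress it
        self._last = now
        self.emit(message)
        return True
```

In a shutdown loop that polls several times a second, calling log("INFO: waiting for x to die") on every pass would then write roughly one line every three seconds until the SIGKILL is sent.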

    I can’t thank Chris Calloway, the organizer of the sprint, enough for his generosity while we’ve been here at Chapel Hill. This was an amazing sprint.

  • Supervisor 2.1b1 Released

    Posted by Chris McDonough on August 30th, 2006

    Download at http://www.plope.com/software/supervisor2/supervisor-2.1b1.tar.gz/download

    Changes from 2.0:

    • “supervisord -h” and “supervisorctl -h” did not work: they produced a traceback instead of showing the help view (thanks to Damjan from Macedonia for the bug report).
    • Processes which started successfully after failing to start initially are no longer reported in BACKOFF state once they are started successfully (thanks to Damjan from Macedonia for the bug report).
    • Add new maintail command to supervisorctl shell, which allows you to tail the main supervisor log. This uses a new readMainLog xmlrpc API.
    • Various process-state-transition related changes, all internal. README.txt updated with new state transition map.
    • startProcess and startAllProcesses xmlrpc APIs changed: instead of accepting a timeout integer, these accept a wait boolean (timeout is implied by process’ “startsecs” configuration). If wait is False, do not wait for startsecs.
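To show what the new wait boolean looks like on the wire, this sketch builds the XML-RPC request for a startProcess call with Python's standard xmlrpc.client module, without contacting a server. The "supervisor." namespace prefix comes from later Supervisor documentation and may not match 2.1b1; the process name "sleep" and the URL in the comment are assumptions.

```python
import xmlrpc.client

# Build the request body for startProcess("sleep", wait=False).
# The "supervisor." prefix is an assumption taken from later releases.
request = xmlrpc.client.dumps(
    ("sleep", False),
    methodname="supervisor.startProcess",
)

# Round-trip the payload to confirm what a server would decode.
params, method = xmlrpc.client.loads(request)
print(request)

# Against a live server the same call would look roughly like this
# (the URL is a placeholder):
#   proxy = xmlrpc.client.ServerProxy("http://localhost:9001/RPC2")
#   proxy.supervisor.startProcess("sleep", False)
```

With wait set to False the call should return without waiting out the process’ “startsecs”; with True the implied timeout applies.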
  • Supervisor vs. Launchd

    Posted by Chris McDonough on August 8th, 2006

    I didn’t know that Apple’s launchd source code was available. But it is, and apparently it was recently placed under the Apache license.

    I haven’t really audited the code very closely (and likely won’t), but the first example given for launchd in its sparse narrative documentation is this:

      The basics:
    
      For the simplest of scenarios, launchd just keeps a process alive. A simple "hello world" example of that would be:
    
      <?xml version="1.0" encoding="UTF-8"?>
      <!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
      <plist version="1.0">
      <dict>
              <key>Label</key>
              <string>com.example.sleep</string>
              <key>ProgramArguments</key>
              <array>
                      <string>sleep</string>
                      <string>100</string>
              </array>
              <key>OnDemand</key>
              <false/>
      </dict>
      </plist>
    
      In the above example, we have three keys to our top level
      dictionary. The first is the Label which is what is used to uniquely
      identify jobs when interacting with launchd. The second is
      ProgramArguments which for its value, we have an array of strings
      which represent the tokenized arguments and the program to run. The
      third and final key is OnDemand which overrides the default value of
      true with false thereby instructing launchd to always try and keep
      this job running. That's it! A Label, some ProgramArguments and
      OnDemand set to false is all you need to keep a daemon alive with
      launchd!
    
      Now if you've ever written a daemon before, you've either called the
      daemon() function or written one yourself. With launchd, that is not
      only unnecessary, but unsupported. If you try and run a daemon you
      didn't write under launchd, you must, at the very least, find a
      configuration option to keep the daemon from daemonizing itself so
      that launchd can monitor it.
    

    Looks pretty familiar! Under supervisor, the same configuration would
    be:

      [program:sleep]
      command=sleep 100
    

    The examples go on to show other options that supervisor already has, including starting a process under a specific uid/gid, putting stdout and stderr into a named log file, and starting a process that should not be restarted automatically.
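For comparison, here is a sketch of how those three options might look in a supervisor program section, read back with Python's ConfigParser (the same kind of file format supervisor 2 uses). The option names user, logfile, and autorestart are recalled from supervisor 2's sample configuration and should be treated as assumptions, as should the values.

```python
import configparser

# Hypothetical program section; option names and values are assumptions.
SAMPLE = """
[program:sleep]
command=sleep 100
user=nobody
logfile=/tmp/sleep.log
autorestart=false
"""


def load_programs(text):
    """Return {program name: options} for every [program:NAME] section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    programs = {}
    for section in parser.sections():
        kind, _, name = section.partition(":")
        if kind == "program" and name:
            programs[name] = dict(parser[section])
    return programs


programs = load_programs(SAMPLE)
print(programs["sleep"])
```

The point is only that each launchd plist key has a one-line counterpart in an INI-style section.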

    While investigating further, I read the launchd manpage, which goes on to show things which launchd has that supervisord doesn’t, like:

    • environment variables on a per-process basis
    • working directory on a per-process basis
    • umask on a per-process basis
    • nice value on a per-process basis
    • setting resource consumption limits for processes
    • whether to call or not call initgroups(2) before process launch
    • scheduled process kickoff
    • emulating inetd services
    • causing processes to be kicked off based on modification of a file or the contents of a directory as a queue
    • special “service ipc” behavior which programs run under launchd can implement to talk to the launchd service they’re running under.
    • separate log files for stderr and stdout of a process
    • some other stuff that I can’t understand because the launchd.conf manpage seems to have been written under the influence of a substance which has negatively affected the author’s grammar and sentence structure

    Things that supervisord has that launchd doesn’t seem to have:

    • a shell-like or web interactive user interface
    • remote control of the master process via xml-rpc
    • process logfile backup and rotation
    • control over how many times and at what frequency a failing program will restart
    • a configurable stop signal for subprocesses

    I guess the main question, given this comparison, is why would I not use/develop launchd and ditch supervisor? Good question; they do about the same thing. But launchd could use a little loving as a general-purpose process manager. Supervisor has a far more “retail” focus: it doesn’t use generic object serialization as a configuration language, it has two interactive user interfaces, and it can be scripted via XML-RPC. launchd seems quite nice, but for my taste it has too much focus on being “pid 1” as opposed to being a retail process manager, in much the same way that djb’s daemontools has too much focus on security as opposed to being a process manager.

    All that said, launchd seems very nice. But I guess the real reason to continue development of supervisor is that I’m far more competent in Python than I am in C, and as a result there’s likely not much value I could add to launchd development.

    So. Given that. The first five things that launchd has but supervisord doesn’t would be very easy to add to supervisord if it were desired, and I might go ahead and add that stuff to the next release, why not? We always call initgroups, iirc; I’ll need to check out why folks might want to control this. Scheduled processing would likely be a nice-to-have, and I’ve actually run into a case where it would be nice to just not use cron, so maybe I’ll work on that too. I suspect it’s not useful for supervisord to do the job of inetd, so it likely never will. The filesystem polling I suspect is outside the scope of supervisor entirely, given that it would likely be difficult to do with any competence across many platforms. It’s likely useful to be able to log stderr and stdout into independent files, so this might go into supervisord.

    As an aside, here’s an interesting chunk from the launchd manpage which is likely useful information to give to people running processes under supervisord:

      EXPECTATIONS
    
         Daemons or agents managed by launchd are expected to behave certain ways.
    
         A daemon or agent launched by launchd MUST NOT do the following in the
         process directly launched by launchd:
    
               o   fork(2) and have the parent process exit(3) or _exit(2).
               o   Call daemon(3)
    
         A daemon or agent launched by launchd SHOULD NOT do the following as a
         part of their startup initialization:
    
               o   Setup the user ID or group ID.
               o   Setup the working directory.
               o   chroot(2)
               o   setsid(2)
               o   Close "stray" file descriptors.
               o   Change stdio(3) to /dev/null.
               o   Setup resource limits with setrusage(2).
               o   Setup priority with setpriority(2).
               o   Ignore the SIGTERM signal.
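In Python terms, a program that meets these expectations stays in the foreground (no fork, no daemon call) and stops when it receives SIGTERM instead of ignoring it. The sketch below is one minimal way to write such a program so that it can run under supervisord or launchd; all names in it are illustrative.

```python
import signal
import threading

# Set when the process has been asked to stop.
stop_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Honor SIGTERM by asking the main loop to finish."""
    stop_requested.set()


def install_handlers():
    """Register the SIGTERM handler (must be called from the main thread)."""
    signal.signal(signal.SIGTERM, handle_sigterm)


def run_foreground(do_work, poll_seconds=1.0):
    """Call do_work repeatedly in the foreground until a stop is requested.

    Returns the number of completed iterations.
    """
    iterations = 0
    while not stop_requested.is_set():
        do_work()
        iterations += 1
        # Sleep, but wake up immediately if SIGTERM arrives meanwhile.
        stop_requested.wait(poll_seconds)
    return iterations
```

A script would call install_handlers() and then run_foreground(some_task). Things like the user ID, working directory, and log redirection are left to the process manager's configuration.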
    
  • Supervisor 2.0 Final Released

    Posted by Chris McDonough on August 3rd, 2006

    Supervisor2 2.0 is out. Changes from 2.0b1 to 2.0 include:

    • pidfile written in daemon mode had incorrect pid.
    • supervisorctl: tail (non -f) did not pass through proper error messages when supplied by the server.
    • Log signal name used to kill processes at debug level.
    • supervisorctl “tail -f” didn’t work with supervisorctl sections configured with an absolute unix:// URL
    • New “environment” config file option allows you to add environment variable values to supervisord environment from config file.
  • Supervisor 2.0b1 Released

    Posted by Chris McDonough on July 12th, 2006

    The “old” supervisor “1” was a multiprocess controller a lot like DJB’s daemontools but written in Python, and a lot less secure and paranoid. So is this one, except:

    • It now uses XML-RPC for command-line-client-to-server communications. This also means the server can be controlled by processes other than its vanilla client via XML-RPC (think “up-down check tools”).
    • It has a web interface that can do approximately the same thing as the command-line client.
    • It uses ConfigParser rather than ZConfig for its configuration files.
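An “up-down check tool” of the kind mentioned above could be as small as the sketch below. It assumes the getAllProcessInfo method and the name/statename fields documented for later Supervisor releases; the 2.0b1 API may use different names. The URL is a placeholder.

```python
import xmlrpc.client


def down_processes(server):
    """Return the names of processes that are not in the RUNNING state.

    `server` is anything exposing server.supervisor.getAllProcessInfo(),
    such as an xmlrpc.client.ServerProxy (method name is an assumption).
    """
    return [
        info["name"]
        for info in server.supervisor.getAllProcessInfo()
        if info["statename"] != "RUNNING"
    ]


def connect(url="http://localhost:9001/RPC2"):
    """Create a proxy for a supervisord HTTP server (placeholder URL)."""
    return xmlrpc.client.ServerProxy(url)
```

A monitoring script could call down_processes(connect()) and raise an alert when the list is non-empty.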

    The current release is a beta release (2.0b1). It can be obtained from http://www.plope.com/software/supervisor2/ .

    Many thanks to Ian Bicking and Mike Naberezny for bug reports and support for supervisor 1. Thanks to Titus Brown for writing a supervisor 1 HOWTO, which prodded me to create better documentation for supervisor 2.

  • Supervisor 1.0.7 Released

    Posted by Chris McDonough on July 11th, 2006

    Get supervisor 1.0.7 at http://www.plope.com/software/supervisor/supervisor-1.0.7.tgz/download. Supervisor is a program that allows you to control process state under UNIX. This is a minor bugfix release that fixes the following symptom:

    On Mac OS X and FreeBSD, “waitpid error” could still show up in log files when no processes were running.

  • Supervisor 2

    Posted by Chris McDonough on June 27th, 2006

    Supervisor is a program that allows you to manage the state of other programs. I use it a lot for customer engagements, where it makes it easier to manage all of the UNIXisms of process startup for both development and production. It grew out of Guido’s zdaemon code, which was capable of managing one process only.

    The current version of supervisor uses a homegrown wire protocol, usable over either a TCP connection or a UNIX domain socket, for communication between the process supervisor and the “controller” (supervisorctl). supervisorctl is the user interface portion. It’s currently a very limited command-line program. The process supervisor is configured via ZConfig.

    What I’d like to see for supervisor 2 (broad goals):

    • Replace the homegrown wire protocol with XML-RPC to allow for better extensibility.
    • Replace ZConfig with ConfigParser to prevent us from needing to install it too.
    • Web interface for observing and controlling process status and reading log output.
    • Better status reporting.

    I’ve done a bit of work towards all of these things, still working out some pieces.

  • Supervisor 1.0.6 Released

    Posted by Chris McDonough on November 20th, 2005

    Download it here

    From the CHANGES.txt file:

    
        - Various tweaks to make run more effectively on Mac OS X
          (including fixing tests to run there, no more "error reading
          from fd XXX" in logtail output, reduced disk/CPU usage as a
          result of not writing to log file unnecessarily on Mac OS).