systemd: Type=simple and avoiding forking considered harmful?

(This came up in a discussion on debian-user-french@l.d.o)

When converting sysvinit scripts to systemd unit files, the default practice seems to be to start services without forking, and to use Type=simple in the service description.

What Type=simple does is, well, simple. From systemd.service(5):

If set to simple (the default value if neither Type= nor BusName= are specified), it is expected that the process configured with ExecStart= is the main process of the service. In this mode, if the process offers functionality to other processes on the system, its communication channels should be installed before the daemon is started up (e.g. sockets set up by systemd, via socket activation), as systemd will immediately proceed starting follow-up units.

In other words, systemd just runs the command described in ExecStart=, and it’s done: it considers the service started.

Unfortunately, this causes a regression compared to the sysvinit behaviour, as described in #778913: if there’s a configuration error, the process will start and exit almost immediately. But from systemd’s point of view, the service will have been started successfully, and the error only shows up in the logs:

root@debian:~# systemctl start ssh
root@debian:~# echo $?
0
root@debian:~# systemctl status ssh
● ssh.service - OpenBSD Secure Shell server
 Loaded: loaded (/lib/systemd/system/ssh.service; enabled)
 Active: failed (Result: start-limit) since mer. 2015-05-13 09:32:16 CEST; 7s ago
 Process: 2522 ExecStart=/usr/sbin/sshd -D $SSHD_OPTS (code=exited, status=255)
 Main PID: 2522 (code=exited, status=255)
mai 13 09:32:16 debian systemd[1]: ssh.service: main process exited, code=exited, status=255/n/a
mai 13 09:32:16 debian systemd[1]: Unit ssh.service entered failed state.
mai 13 09:32:16 debian systemd[1]: ssh.service start request repeated too quickly, refusing to start.
mai 13 09:32:16 debian systemd[1]: Failed to start OpenBSD Secure Shell server.
mai 13 09:32:16 debian systemd[1]: Unit ssh.service entered failed state.

With sysvinit, this error is detected before the fork(), so it shows during startup:

root@debian:~# service ssh start
 [....] Starting OpenBSD Secure Shell server: sshd/etc/ssh/sshd_config: line 4: Bad configuration option: blah
 /etc/ssh/sshd_config: terminating, 1 bad configuration options

It’s not trivial to fix that. The implicit behaviour of sysvinit is that fork() sort-of signals the end of service initialization. The systemd way to do that would be to use Type=notify, and have the service signal that it’s ready using systemd-notify(1) or sd_notify(3) (or to use socket activation, but that’s another story). However, that requires changes to the service. Returning to the sysvinit behaviour by using Type=forking would help, but is not really a solution: what if some of the initialization happens *after* the fork? This is actually the case for sshd, where the socket is bound after the fork (see strace -f -e trace=process,network /usr/sbin/sshd), so if another process is listening on port 22 and preventing sshd from starting successfully, it would not be detected.
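
To give an idea of the change a service needs for Type=notify, here is a minimal sketch in Python of the readiness message that sd_notify(3) sends: a datagram containing READY=1, written to the Unix socket named in the NOTIFY_SOCKET environment variable. The function name notify_ready is made up for this illustration; a real daemon would more likely call sd_notify() from libsystemd, and would do so only once its sockets are bound and its configuration is parsed.

```python
import os
import socket


def notify_ready(state: bytes = b"READY=1") -> bool:
    """Send a state datagram to the socket named in $NOTIFY_SOCKET.

    Returns False when the variable is unset, which is the case when the
    process was not started by systemd as a Type=notify service.
    notify_ready is a hypothetical helper name, not a systemd API.
    """
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False
    # A leading "@" denotes a Linux abstract socket; the kernel expects
    # a NUL byte in its place.
    if path.startswith("@"):
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state, path)
    return True
```

With Type=notify in the unit file, systemctl start would then return only after this message has been received, or report a failure if the process exits first.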

I wonder if systemd shouldn’t do more to detect problems during service initialization, as the transition to proper notification using sd_notify will likely take some time. A possibility would be to wait 100 or 200ms after the start to ensure that the service doesn’t exit almost immediately. But that’s not really a solution for several obvious reasons. A more hackish, but still less dirty solution could be to poll the state of processes inside the cgroup, and assume that the service is started only when all processes are sleeping. Still, that wouldn’t be entirely satisfying…
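
The “wait 100 or 200ms” idea can be sketched outside of systemd, as a small Python wrapper. This is only an illustration of the heuristic, not something systemd does; the function name start_with_grace and the 0.2 second default are assumptions made for the example.

```python
import subprocess
import sys


def start_with_grace(cmd, grace=0.2):
    """Start cmd and watch it for `grace` seconds.

    Returns (process, None) if the process is still running after the
    grace period, or (process, exit_code) if it exited within it.
    start_with_grace is a hypothetical helper, not part of systemd.
    """
    proc = subprocess.Popen(cmd)
    try:
        code = proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        # Still alive after the grace period: assume startup succeeded.
        return proc, None
    # Exited early: for a long-running daemon even exit code 0 is suspicious.
    return proc, code


if __name__ == "__main__":
    # Stand-in for a daemon that rejects its configuration and exits at once.
    _, code = start_with_grace([sys.executable, "-c", "raise SystemExit(255)"], grace=5)
    print("early exit code:", code)
```

The weaknesses are visible right away: the delay is paid on every start, and a service that fails just after the window (or forks before failing) is still reported as started.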

10 thoughts on “systemd: Type=simple and avoiding forking considered harmful?”

  1. Sure, but it might still hide problems, as in the port-22-already-in-use example.

    Also note that, quite interestingly, the systemd-sysv-generator service files (which use Type=forking) do not detect failures during initialization, because the scripts usually exit with 0 even when the services fail to start. So this causes a regression even for services without service files.

  2. The sysv-generator is probably not at fault there but rather the general failure of sysv-init scripts to behave properly. However a properly constructed service file for openssh ought to work nicely. I think it’s sad that jessie’s ssh.service appears to use Type=simple implicitly rather than using Type=forking, but that’s hardly systemd’s fault.

    I agree that we should perhaps mark Type=simple as ‘potentially harmful, certainly risky, caveat-packager’ for the next little while.

  3. Wait, doesn’t the return “Active: failed” mean that systemd knows it went wrong?

    Then, you say systemd doesn’t detect something that sysv doesn’t detect either, namely the port-in-use case… But that’s a problem of the daemon. sshd should rather work with xinetd, and use socket activation. Let’s not make systemd into a nanny, it has enough complexity.

  4. Type=forking should work here, assuming one thing: the parent process should not exit until it has been notified by the child process that all is OK.

    So if port 22 is in use, the child (actually, grandchild) process should tell the parent process that it couldn’t bind to the desired address, and the parent process should print out a useful message and exit indicating failure.

    See the “SysV daemons” section of daemon(7) for details, in particular:

    “The process that invoked the daemon must be able to rely on [the fact] that [the parent’s] exit() happens after initialization is complete and all external communication channels are established and accessible.”

  5. @Derek:
    > Wait, doesn’t the return “Active: failed” mean that systemd knows it went wrong?

    It does, sure. But one could easily argue that “failing to parse config file” is an example of “failing to start”. Which is undetected at start time, because with Type=simple, systemd will just start the process and then consider it fully started (see exit code of 0 for systemctl start). So it gets detected, but not at start time. It’s not so obvious that, with Type=simple, it’s a good idea to check (with systemctl status) that the service is still running a few (milli)seconds after systemctl start.

    @Sam: interesting, I did not know about daemon(7). It seems that sshd doesn’t respect that convention, though.

  6. I agree that Type=simple is potentially dangerous. Along the same lines, it’s also dangerous for any service that listens to a socket to use Type=simple, since the socket won’t actually be open when the service “starts”.

    Type=simple makes sense for services that other services don’t depend on, and for which any failure can be treated as a runtime failure (just like a service crash). SSH is not one of those services.

  7. It would be helpful to have a list of daemons that do/do not work properly with Type=forking.

    On my system I disabled & stopped ssh.service, and enabled & started ssh.socket. That way systemd launches an sshd on demand when a client connects. With this setup you always have to test that you can connect after you edit sshd_config, however; an error in the config file will result in new connections being rejected as sshd shuts down immediately after being launched!

  8. I think the ExecStartPre option is there to prevent those issues. If you check the config before the process starts, you’ll get an error when starting the service. Something like:

    ExecStartPre=/usr/sbin/sshd -t
    ExecReload=/usr/sbin/sshd -t ; /bin/kill -HUP $MAINPID
