systemd: Type=simple and avoiding forking considered harmful?

(This came up in a discussion on debian-user-french@l.d.o)

When converting sysvinit scripts to systemd service files, the default practice seems to be to start services without forking, and to use Type=simple in the service description.

What Type=simple does is, well, simple. From systemd.service(5):

If set to simple (the default value if neither Type= nor BusName= are specified), it is expected that the process configured with ExecStart= is the main process of the service. In this mode, if the process offers functionality to other processes on the system, its communication channels should be installed before the daemon is started up (e.g. sockets set up by systemd, via socket activation), as systemd will immediately proceed starting follow-up units.

In other words, systemd just runs the command described in ExecStart=, and it’s done: it considers that the service has started.
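
For illustration, a minimal Type=simple service file could look like this (a hypothetical example, not taken from a real package; mydaemon and its -D “stay in the foreground” option are made up):

[Unit]
Description=Example simple service

[Service]
Type=simple
ExecStart=/usr/sbin/mydaemon -D

[Install]
WantedBy=multi-user.target

The important point is that with Type=simple, the process started by ExecStart= must stay in the foreground and not fork away.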

Unfortunately, this causes a regression compared to the sysvinit behaviour, as described in #778913: if there’s a configuration error, the process will start and exit almost immediately. But from systemd’s point of view, the service will have been started successfully, and the error only shows up in the logs:

root@debian:~# systemctl start ssh
root@debian:~# echo $?
0
root@debian:~# systemctl status ssh
● ssh.service - OpenBSD Secure Shell server
 Loaded: loaded (/lib/systemd/system/ssh.service; enabled)
 Active: failed (Result: start-limit) since mer. 2015-05-13 09:32:16 CEST; 7s ago
 Process: 2522 ExecStart=/usr/sbin/sshd -D $SSHD_OPTS (code=exited, status=255)
 Main PID: 2522 (code=exited, status=255)
mai 13 09:32:16 debian systemd[1]: ssh.service: main process exited, code=exited, status=255/n/a
mai 13 09:32:16 debian systemd[1]: Unit ssh.service entered failed state.
mai 13 09:32:16 debian systemd[1]: ssh.service start request repeated too quickly, refusing to start.
mai 13 09:32:16 debian systemd[1]: Failed to start OpenBSD Secure Shell server.
mai 13 09:32:16 debian systemd[1]: Unit ssh.service entered failed state.

With sysvinit, this error is detected before the fork(), so it shows during startup:

root@debian:~# service ssh start
 [....] Starting OpenBSD Secure Shell server: sshd/etc/ssh/sshd_config: line 4: Bad configuration option: blah
 /etc/ssh/sshd_config: terminating, 1 bad configuration options
 failed!
root@debian:~#

It’s not trivial to fix that. The implicit behaviour of sysvinit is that the fork() sort of signals the end of service initialization. The systemd way to do that would be to use Type=notify, and have the service signal that it’s ready using systemd-notify(1) or sd_notify(3) (or to use socket activation, but that’s another story). However, that requires changes to the service. Returning to the sysvinit behaviour by using Type=forking would help, but it is not really a solution either: what if some of the initialization happens *after* the fork? This is actually the case for sshd, where the socket is bound after the fork (see strace -f -e trace=process,network /usr/sbin/sshd), so if another process were listening on port 22 and preventing sshd from starting successfully, that would not be detected.
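
For services that can be patched, the required change can be small. Here is a rough sketch using systemd-notify(1) from a shell-based service (all names and paths are hypothetical; NotifyAccess=all is needed because systemd-notify runs as a child of the main process):

[Service]
Type=notify
NotifyAccess=all
ExecStart=/usr/local/bin/mydaemon.sh

where mydaemon.sh performs all of its initialization before signalling readiness:

#!/bin/sh
# do all the initialization first: parse configuration, bind sockets, ...
setup_daemon || exit 1           # hypothetical setup function
# only now tell systemd that the service is ready
systemd-notify --ready
# ... main service loop ...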

I wonder if systemd shouldn’t do more to detect problems during service initialization, as the transition to proper notification using sd_notify will likely take some time. One possibility would be to wait 100 or 200 ms after the start to ensure that the service doesn’t exit almost immediately. But that’s not really a solution: it would delay every service start, and a service failing slightly later would still go undetected. A more hackish, but arguably less dirty, solution would be to poll the state of processes inside the cgroup, and assume that the service has started only when all processes are sleeping. Still, that wouldn’t be entirely satisfying…
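
To illustrate that last idea, here is a hypothetical sketch of such a poll for ssh.service (the cgroup path is an assumption, and the /proc parsing is approximate):

#!/bin/sh
# check whether all processes in the unit's cgroup are sleeping
cg=/sys/fs/cgroup/systemd/system.slice/ssh.service
for pid in $(cat $cg/cgroup.procs); do
  # the State: line reads "S (sleeping)" once the process waits for events
  state=$(awk '/^State:/ {print $2}' /proc/$pid/status)
  if [ "$state" != "S" ]; then
    echo "PID $pid is still in state $state"
    exit 1
  fi
done
echo "all processes sleeping, assuming the service is up"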

Debian Package of the Day revival (sort of)

TL;DR: a static version of http://debaday.debian.net/, as it was when it was shut down in 2009, is available again!

A long time ago, between 2006 and 2009, there was a blog called Debian Package of the Day. About once per week, it featured an article about one of the gems available in the Debian archive: one of those many great packages that you had never heard about.

At some point in November 2009, after 181 articles, the blog was hacked and never brought up again. Last week I retrieved the old database, generated a static version, and put it online with the help of DSA. It is now available again at http://debaday.debian.net/. Some of the articles are clearly outdated, but many of them are about packages that are still available in Debian, and still very relevant today.

Will the packages you rely on be part of Debian Jessie?

The start of the jessie freeze is quickly approaching, so now is a good time to ensure that the packages you rely on will be part of the upcoming release. Thanks to automated removals, the number of release-critical bugs has been kept low, but this was achieved by removing many packages from jessie: 841 packages from unstable are not part of jessie, and won’t be part of the release if things don’t change.

It is actually simple to check if you have packages installed locally that are part of those 841 packages:

  1. apt-get install how-can-i-help (available in backports if you don’t use testing or unstable)
  2. how-can-i-help --old
  3. Look at the packages listed under “Packages removed from Debian ‘testing’” and “Packages going to be removed from Debian ‘testing’”

Then, please fix all the bugs :-) Seriously, not all RC bugs are hard to fix. A good starting point to understand why a package is not part of jessie is tracker.d.o.

On my laptop, the two packages that are not part of jessie are the geeqie image viewer (which looks likely to be fixed in time), and josm, the OpenStreetMap editor, which, with three RC bugs, seems much harder to fix… If you fix it in time for jessie, I’ll offer you a $drink!

On the Dark Ages of Free Software: a “Free Service Definition”?

Stefano Zacchiroli opened DebConf’14 with an insightful talk titled Debian in the Dark Ages of Free Software (slides available, video available soon).

He makes the point (quoting slide 16) that the Free Software community is winning a war that is becoming increasingly pointless: yes, users have 100% Free Software thin client at their fingertips [or are really a few steps from there]. But all their relevant computations happen elsewhere, on remote systems they do not control, in the Cloud.

That surrender of control over our computing is a huge and important problem, and probably the largest challenge facing everybody who cares about freedom, free speech, or privacy today. Stefano rightfully points out that we must do something about it. The big question is: how can we, as a community, address it?

Towards a Free Service Definition?

I believe that we all feel a bit lost with this issue because we are trying to attack it with our current tools and weapons. However, they are largely irrelevant here: the Free Software Definition is about software, and within it, software is to be understood strictly, as software programs. Applying it to services, or to computing in general, doesn’t lead anywhere. In order to increase general awareness of this issue, we should define more precisely what levels of control can be provided, so as to understand what services are not providing to their users, and to make an informed decision about waiving a particular level of control when choosing to use a particular service.

Benjamin Mako Hill pointed out yesterday, during the post-talk chat, that services are not black or white: there aren’t pure and impure services. Instead, there’s a gradation of possible levels of control over the computing we do. The Free Software Definition lists four freedoms. How many freedoms, or types of control, should there be in a Free Service Definition, or in a Controlled-Computing Definition? Again, this is not only about software: the platform on which a particular piece of software is executed has a huge impact on the available level of control. Running your own instance of WordPress, or using an instance on wordpress.com, provides very different levels of control (even if, as Asheesh Laroia pointed out yesterday, WordPress does a pretty good job at providing export and import features to limit data lock-in).

The creation of such a definition is an iterative process. I actually just realized today that (according to Wikipedia) the very first attempt at a Free Software Definition was published in 1986 (GNU’s Bulletin Vol. 1 No. 1, page 8); I had thought it happened a couple of years earlier. Are there existing attempts at defining such freedoms or levels of control, and at benchmarking existing services against such criteria? Such criteria would not only cover control over software modifications and (re)distribution, but would also likely include interoperability and open standards, both to enable the user to move to a compatible service, and to avoid forcing the user into a particular implementation of a service. A better understanding of network effects is also needed: how much, and what type of, service lock-in is acceptable on social networks in exchange for functionality?

I think that we should draw inspiration from what was achieved over the last 30 years on Free Software. The tools that were produced are probably irrelevant to this issue, but there’s a lot to learn from the way they were designed. I really look forward to the day when we will have:

  • a Free Software Definition equivalent for services
  • Debian Free Software Guidelines-like tests/checklist to evaluate services
  • an equivalent of The Cathedral and the Bazaar, explaining how one can build successful business models on top of open services

Exciting times!

Introducing the Debian Maintainer Dashboard — help needed!

The brand new machine for Ultimate Debian Database motivated me to do UDD-related work again, so I implemented an old idea: build a maintainer/team-centric dashboard relying on UDD, the Debian Maintainer Dashboard.

The idea is to expose as much useful information as possible about a maintainer’s packages, both in the traditional (Developers Packages Overview-like) “big table with all the info” form, and in a task-oriented listing (to answer the “OK, I have two hours for Debian, what should I do now?” question).

At this point, the Debian Maintainer Dashboard already brings several cool features (try it with my packages or pkg-ruby’s packages, or your own!):

  • Reports buildd issues (failed builds)
  • Upstream status monitor that works (UDD has its own)
  • Knows about teams’ VCS (if your team uses PET) and displays the current version in the VCS
  • Lists packages you maintain or co-maintain, and also the packages you have shown interest in by subscribing to them on the PTS

Of course, as of now, it’s pretty ugly: that’s the “help needed” part of this post. I really suck at web design, and would really appreciate it if someone could help me turn it into something nice to use. Besides adding links and colors everywhere, it would probably be nice to add sortable tables, and to create tabs for “TODO items”, “Versions” and “Bugs” (and possibly “Lintian”, for example). Contact me if you are interested.

Debian archive rebuilds on Amazon Web Services

I like to think that archive rebuilds play an important role in Debian Quality Assurance and Release Management efforts. By trying to rebuild every Debian package from source, one can identify packages that do not build anymore due to changes in other packages (compilers, interpreters, libraries, …). It is also a good way to stress-test all packages that are involved in building other packages.

Since 2007, I have been running Debian archive rebuilds on the Grid’5000 testbed, a research infrastructure for performing experiments on distributed systems (HPC/Grid/Cloud/P2P). I filed more than 6000 release-critical bugs in the process.

Late last year, Amazon kindly offered us a grant to run such QA tests on Amazon Web Services. Sébastien Badia and I ported the rebuild infrastructure to AWS (scripts), and several rebuilds have already been carried out there.

On the technical level, 50 to 100 EC2 spot instances are started, and then controlled from a master instance over SSH. The build instances use a classic sbuild setup. Logs are retrieved to the master node after each rebuild, and build instances are simply shut down when there are no more tasks to process. Several tasks are processed simultaneously on each instance; when a task fails, it is retried with no other concurrent build on the same instance, to eliminate random failures caused by load or timing issues. All the scripts are designed to support other kinds of QA tests, not just rebuilds.
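
The dispatch logic on the master node boils down to something like the following sketch (pick_idle_instance is a hypothetical helper and the file layout is made up; the real logic lives in the scripts linked above):

#!/bin/sh
# dispatch rebuilds of all listed source packages to build instances over SSH
for pkg in $(cat packages.txt); do
  instance=$(pick_idle_instance)   # hypothetical: returns a build instance address
  ssh "$instance" sbuild -d unstable "$pkg" > "logs/$pkg.log" 2>&1 &
done
wait   # collect all rebuilds before post-processing the logs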

Moving to Amazon Web Services will facilitate sharing the human workload of doing those tests. It is now possible for developers interested in custom tests to do them themselves (hint hint).

Update of Debian Packaging Tutorial

I’ve just updated the Debian Packaging Tutorial. This new version addresses a few comments and questions I received over the past months.

Note that there are also French and Spanish versions of the tutorial, and I’m of course open to adding other translations.

The tutorial can be found in the packaging-tutorial package (PDF files are in /usr/share/doc/packaging-tutorial/), or on www.debian.org (see links above).

Re: Ubuntu vs Ruby

(original post)

> If Ubuntu 12.04 is an LTS release, and Ruby 1.8.7 goes out of support in June of
> 2013, then why is the default still 1.8.7?
>
> Ruby 1.9.2 was released in 2010. Ruby 1.9.3 was released in October of this year.

First, there’s almost nobody in the Ubuntu development community doing any Ruby work. Packages are just imported from Debian, and Ubuntu follows what is done on the Debian side.

In the Debian/Ruby team, we are currently transitioning to a new packaging helper (gem2deb) that makes it much easier to support several Ruby versions. Once this is done, switching to 1.9.3 by default will be very easy. We already provide a way for the sysadmin to change the default on a system.
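
For example, if the interpreter packages register with the alternatives system (an assumption in this sketch; the exact mechanism may differ), switching the default is a one-liner:

# interactively choose which version /usr/bin/ruby points to
update-alternatives --config ruby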

Now, doing that transition takes time, and we could have used *your* help (and could still use it). We are still on track to do it in time for Debian wheezy, but it sounds very hard to do for Ubuntu 12.04 unless someone from Ubuntu steps up to help.

dash as /bin/sh, and now ld --as-needed. Pattern?

I must admit that I’ve never been a big fan of the dash as /bin/sh change. I have three main problems with the switch:

POSIX compliance as an argument

Complying with standards is a really good thing. But when everybody ignores the standard because they want the comfort of newer features, maybe it’s a sign that the standard should be updated to include those newer features. Most of the bashisms used everywhere in scripts were significant improvements, like the ability to write:
cp {short1/path1,short2/path2}/very/long/common/path/to/a/file
instead of:
cp short1/path1/very/long/common/path/to/a/file short2/path2/very/long/common/path/to/a/file

The option to improve bash was not fully explored

We started with the premise that bash is bloated, slow, and cannot be improved. Maybe you can help me with that, but I could only find a few simplistic benchmarks comparing dash and bash; I could not find any analysis of why bash is slow, or of why it cannot be improved.
One of the obvious problems is that bash is also an interactive shell, and is linked to ncurses and terminfo, which increases its startup time. But have we investigated the possibility of having a /bin/bash-non-interactive binary that would not be linked to ncurses?
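
The linking part of that claim is easy to check; something like the following shows whether bash pulls in the curses libraries (library names vary between releases):

ldd /bin/bash | grep -E 'ncurses|tinfo'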

The change was brought to users

While it is OK for Debian (or Ubuntu, in this case, since the change was made in Ubuntu first) to force its developers to use POSIX-compliant scripts, the switch could have been applied only to Debian-provided scripts (by switching them from a /bin/sh shebang to a /bin/dash shebang, for example). I have trouble justifying that this change was forced on users as well.

Next: linker changes

… and we are doing it again. A set of linker changes (see also the Ubuntu page) has already been made in Ubuntu, and is very likely to be made in Debian as well. This switch requires deep changes in some build systems (it requires correct ordering of libraries, as illustrated below, and forbids relying on indirect dependencies), and is rather painful: it was reverted before the Ubuntu 11.04 release because it was not possible to fix all the packages during the natty release cycle, but it is in place in the 11.10 release. Of course, there are justifications for this change. But I’m not sure it’s worth all the trouble created for users.
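
The ordering problem comes from the fact that with --as-needed, a library is kept only if it satisfies symbols that are still undefined when the linker reaches it on the command line. A contrived sketch (file and library names are made up):

# works: main.o is scanned first, so its undefined symbols are known
# by the time libfoo is examined
gcc -Wl,--as-needed -o prog main.o -lfoo

# may fail with "undefined reference": libfoo is scanned before anything
# needs it, so it is dropped from the link
gcc -Wl,--as-needed -lfoo -o prog main.o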