systemd services, and queue management?

I’ve been increasingly using systemd timers as a replacement for cron jobs. The fact that you get free logging is great, and also the fact that you don’t have to care about multiple instances running simultaneously.

However, sometimes I would be interested in more complex scenarios, such as:

  • I’d like to trigger a full run of the service unit: if the service is not running, it should be started immediately. If it’s currently running, it should be started again when it terminates.
  • Same as the above, but with queue coalescing: If I do the above multiple times in a row, I only want the guarantee that there’s one full run of the service after the last time I triggered it (typical scenario: each run processes all pending events, so there’s no point in running multiple times).

Is this doable with systemd? If not, how do people do that outside of systemd?

Implementing “right to disconnect” by delaying outgoing email?

France passed a law about “right to disconnect” (more info here or here). The idea of not sending professional emails when people are not supposed to read them in order to protect their private lifes, is a pretty good one, especially when hierarchy is involved. However, I tend to do email at random times, and I would rather continue doing that, but just delay the actual sending of the email to the appropriate time (e.g., when I do email in the evening, it would actually be sent the following morning at 9am).

I wonder how I could make this fit into my email workflow. I write email using mutt on my laptop, then push it locally to nullmailer, that then relays it,  over an SSH tunnel, to a remote server (running Exim4).

Of course the fallback solution would be to use mutt’s postponing feature. Or to draft the email in a text editor. But that’s not really nice, because it requires going back to the email at the appropriate time. I would like a solution where I would write the email, add a header (or maybe manually add a Date: header — in all cases that header should reflect the time the mail was sent, not the time it was written), send the email, and have nullmailer or the remote server queue it until the appropriate time is reached (e.g., delaying while “current_time < Date header in email”). I don’t want to do that for all emails: e.g. personal emails can go out immediately.

Any ideas on how to implement that? I’m attached to mutt and relaying using SSH, but not attached to nullmailer or exim4. Ideally the delaying would happen on my remote server, so that my laptop doesn’t need to be online at the appropriate time.

Update: mutt does not allow to set the Date: field manually (if you enable the edit_headers option and edit it manually, its value gets overwritten). I did not find the relevant code yet, but that behaviour is mentioned in that bug.

Update 2: ah, it’s this code in sendlib.c (and there’s no way to configure that behaviour):

 /* mutt_write_rfc822_header() only writes out a Date: header with
 * mode == 0, i.e. _not_ postponment; so write out one ourself */
 if (post)
   fprintf (msg->fp, "%s", mutt_make_date (buf, sizeof (buf)));

The Linux 2.5, Ruby 1.9 and Python 3 release management anti-pattern

There’s a pattern that comes up from time to time in the release management of free software projects.

To allow for big, disruptive changes, a new development branch is created. Most of the developers’ focus moves to the development branch. However at the same time, the users’ focus stays on the stable branch.

As a result:

  • The development branch lacks user testing, and tends to make slower progress towards stabilization.
  •  Since users continue to use the stable branch, it is tempting for developers to spend time backporting new features to the stable branch instead of improving the development branch to get it stable.

This situation can grow up to a quasi-deadlock, with people questioning whether it was a good idea to do such a massive fork in the first place, and if it is a good idea to even spend time switching to the new branch.

To make things more unclear, the development branch is often declared “stable” by its developers, before most of the libraries or applications have been ported to it.

This has happened at least three times.

First, in the Linux 2.4 / 2.5 era. Wikipedia describes the situation like this:

Before the 2.6 series, there was a stable branch (2.4) where only relatively minor and safe changes were merged, and an unstable branch (2.5), where bigger changes and cleanups were allowed. Both of these branches had been maintained by the same set of people, led by Torvalds. This meant that users would always have a well-tested 2.4 version with the latest security and bug fixes to use, though they would have to wait for the features which went into the 2.5 branch. The downside of this was that the “stable” kernel ended up so far behind that it no longer supported recent hardware and lacked needed features. In the late 2.5 kernel series, some maintainers elected to try backporting of their changes to the stable kernel series, which resulted in bugs being introduced into the 2.4 kernel series. The 2.5 branch was then eventually declared stable and renamed to 2.6. But instead of opening an unstable 2.7 branch, the kernel developers decided to continue putting major changes into the 2.6 branch, which would then be released at a pace faster than 2.4.x but slower than 2.5.x. This had the desirable effect of making new features more quickly available and getting more testing of the new code, which was added in smaller batches and easier to test.

Then, in the Ruby community. In 2007, Ruby 1.8.6 was the stable version of Ruby. Ruby 1.9.0 was released on 2007-12-26, without being declared stable, as a snapshot from Ruby’s trunk branch, and most of the development’s attention moved to 1.9.x. On 2009-01-31, Ruby 1.9.1 was the first release of the 1.9 branch to be declared stable. But at the same time, the disruptive changes introduced in Ruby 1.9 made users stay with Ruby 1.8, as many libraries (gems) remained incompatible with Ruby 1.9.x. Debian provided packages for both branches of Ruby in Squeeze (2011) but only changed the default to 1.9 in 2012 (in a stable release with Wheezy – 2013).

Finally, in the Python community. Similarly to what happened with Ruby 1.9, Python 3.0 was released in December 2008. Releases from the 3.x branch have been shipped in Debian Squeeze (3.1), Wheezy (3.2), Jessie (3.4). But the ‘python’ command still points to 2.7 (I don’t think that there are plans to make it point to 3.x, making python 3.x essentially a different language), and there are talks about really getting rid of Python 2.7 in Buster (Stretch+1, Jessie+2).

In retrospect, and looking at what those projects have been doing in recent years, it is probably a better idea to break early, break often, and fix a constant stream of breakages, on a regular basis, even if that means temporarily exposing breakage to users, and spending more time seeking strategies to limit the damage caused by introducing breakage. What also changed since the time those branches were introduced is the increased popularity of automated testing and continuous integration, which makes it easier to measure breakage caused by disruptive changes. Distributions are in a good position to help here, by being able to provide early feedback to upstream projects about potentially disruptive changes. And distributions also have good motivations to help here, because it is usually not a great solution to ship two incompatible branches of the same project.

(I wonder if there are other occurrences of the same pattern?)

Update: There’s a discussion about this post on HN

Re: Sysadmin Skills and University Degrees

Russell Coker wrote about Sysadmin Skills and University Degrees. I couldn’t agree more that a major deficiency in Computer Science degrees is the lack of sysadmin training. It seems like most sysadmins learned most of what they know from experience. It’s very hard to recruit young engineers (freshly out of university) for sysadmin jobs, and the job interviews are often a bit depressing. Sysadmins jobs are also not very popular with this public, probably because university curriculums fail to emphasize what’s exciting about those jobs.

However, I think I disagree rather deeply with Russell’s detailed analysis.

First, Version Control. Well, I think that it’s pretty well covered in university curriculums nowadays. From my point of view, teaching CS in Université de Lorraine (France), mostly in Licence Professionnelle Administration de Systèmes, Réseaux et Applications à base de Logiciels Libres (warning: french), a BSc degree focusing on Linux systems administration, it’s not usual to see student projects with a mandatory use of Git. And it doesn’t seem to be a major problem for students (which always surprises me). However, I wouldn’t rate Version Control as the most important thing that is required for a sysadmin. Similarly Dependencies and Backups are things that should be covered, but probably not as first class citizens.

I think that there are several pillars in the typical sysadmin knowledge.

First and foremost, sysadmins need a good understanding of the inner workings of an operating system. I sometimes feel that many Operating Systems Design courses are a bit too much focused on the “Design” side of things. Yes, it’s useful to understand the low-level mechanisms, and be able to (mentally) recreate an OS from scratch. But it’s also interesting to know how real systems are actually built, and what are the trade-off involved. I very much enjoyed reading Branden Gregg’s Systems Performance: Enterprise and the Cloud because each chapter starts with a great overview of how things are in the real world, with a very good level of detail. Also, addressing OS design from the point of view of performance could be a way to turn those courses into something more attractive for students: many people like to measure, benchmark, optimize things, and it’s quite easy to demonstrate how different designs, or different configurations, make a big difference in terms of performance in the context of OS design. It’s possible to be a sysadmin and ignore, say, the existence of the VFS, but there’s a large class of problems that you will never be able to solve. It can be a good trade-off for a curriculum (e.g. at the BSc level) to decide to ignore most of the low-level stuff, but it’s important to be aware of it.

Students also need to learn how to design a proper infrastructure (that meets requirements in terms of scalability, availability, security, and maybe elasticity). Yes, backups are important. But monitoring is, too. As well as high availability. In order to scale, it’s important to be able to automatize stuff. Russell writes that Sysadmins need some programming skills, but that’s mostly scripting and basic debugging. Well, when you design an infrastructure, or when you use configuration management tools such as Puppet, in some sense, you are programming, and in terms of needs to abstract things, it’s actually similar to doing object-oriented programming, with similar choices (should I use that off-the-shelf puppet module, or re-develop my own? How should everything fit together?). Also, when debugging, it’s often useful to be able to dig into code, understand what the developer was trying to do, and if the expected behavior actually matches what you are seeing. It often results in spending a lot of time to create a one-line fix, and it requires very advanced programming skills. Again, it’s possible to be a sysadmin with only limited software development knowledge, but there’s a large class of things that you are unlikely to address properly.

I think that what makes sysadmins jobs both very interesting and very challenging is that they require a very wide range of knowledge. There’s often the ability to learn about new stuff (much more than in software development jobs). Of course, the difficult question is where to draw the line. What is the sysadmin knowledge that every CS graduate should have, even in curriculums not targeting sysadmin jobs? What is the sysadmin knowledge for a sysadmin BSc degree? for a sysadmin MSc degree?

Debian packages with /outdated/ packaging style

(This is just a copy of this debian-devel@ email)

Following my blog post yesterday with graphs about Debian packaging evolution, I prepared lists of packages for each kind of outdatedness. Of course not all practices highlighted below are deprecated, and there are good reasons to continue to do some of them. But still, given that they all represent a clear minority of packages, I thought that it would be useful to list the related packages. (I honestly didn’t know if some of my packages would show up in the lists!)

The lists are available at https://people.debian.org/~lucas/qa-20151226/

I also pushed them to alioth, so you can either do:
ssh people.debian.org 'grep -A 10 YOURNAME ~lucas/public_html/qa-20151226/*ddlist'
or:
ssh alioth.debian.org 'grep -A 10 YOURNAME ~lucas/qa-20151226/*ddlist'

the meaning of the lists is:

  • qa-comaint_but_no_vcs.txt (275 packages): Based on the content of Maintainer/Uploaders, the package is co-maintained, but there are no Vcs-* fields.
  • qa-format_10.txt (3153 packages): The package is still using format 1.0.
  • qa-helper_classic_debhelper.txt (3647 packages): The package is still using “classic” debhelper (no dh, no CDBS).
  • qa-helper_not_debhelper.txt (144 packages): The package is not using debhelper (nor dh, nor CDBS).
  • qa-patch_dpatch.txt (170 packages): The package is using dpatch.
  • qa-patch_modified-files-outside-debian.txt (1156 packages): The package has modified files outside the debian/ directory (not tracked using patches).
  • qa-patch_more_than_one.txt (201 packages): The package uses more than one “patch system”. In most cases, it means that the package uses a patch system, but also has files modified directly outside of debian/.
  • qa-patch_other.txt (51 packages): The package has patches, but using an unidentified/unknown patch system.
  • qa-patch_quilt.txt (445 packages): The package uses quilt (with 1.0 format, not 3.0 format).
  • qa-patch_simple-patchsys.txt (129 packages): The package uses simple-patchsys.
  • qa-vcs_but_not_git_or_svn.txt (290 packages): The package is maintained using a VCS, which is not either Git or SVN.
  • qa-vcs_more_than_one_declared_vcs.txt (1 package): The package declares more than one VCS.

If you don’t understand why your package is listed, you can have a look at allpackages-20151226.yaml that provides more details. If you still don’t understand, just ask me.

Excluding duplicates, a total of 5469 packages are listed. The dd-list output for the merged list is also available (which isn’t very useful, except to know if you are listed).

Debian is still changing

Here is an update to the usual graphs generated from snapshot.d.o. See my previous blog post for the background info.

In all graphs, it’s easy to see the effect of the Jessie freeze (and the previous freezes since 2005, too).

Team maintenance

comaint-2015

 

It’s interesting to see that, while the number of team-maintained packages increases, the number of packages that aren’t co-maintained is very stable.

Maintenance using a VCS

vcs-2015

Git is the clear winner now, with the migration rate increasing recently.

Packaging helpers

helpers-2015

As usual, the number of packages using CDBS is quite stable. The number of packages using traditional debhelper might soon be lower than those using CDBS.

Patch systems and packaging formats

formats-patches-2015

 

3.0 is the clear winner, even if we still have 3000+ packages using 1.0, and ~1000 of those modifying files directly. The other patch systems have basically disappeared.

So, all those graphs are kind-of boring now. Any good ideas of additional things to track, that be can identified reliably by looking at source packages?

For those interested, below are links to the graphs with percentages of packages.

comaint-percent-2015    formats-patches-percent-2015    vcs-percent-2015    helpers-percent-2015

DebConf’15

I attended DebConf’15 last week. After being on semi-vacation from Debian for the last few months, recovering after the end of my second DPL term, it was great to be active again, talk to many people, and go back to doing technical work. Unfortunately, I caught the debbug quite early in the week, so I was not able to make it as intense as I wanted, but it was great nevertheless.

I still managed to do quite a lot:

  • I rewrote a core part of UDD, which will make it easier to monitor data importer scripts and reduce the cron-spam
  • with DSA members, I worked on finding a suitable workaround for the storage performance issues that have been plaguing UDD for the last few months. fsyncs() will now longer hang for 15 minutes, yay!
  • I added a DUCK importer to UDD, and added that information to the Debian Maintainer Dashboard
  • I worked a bit on cleaning up the status of my packages, including digging into a strange texlive issue (that showed up in developers-reference), that is now fixed in unstable
  • I worked a bit on improving git-buildpackage documentation (more to come in that area)
  • Last but not least, I played Mao for the first time in years, and it was a lot of fun. (even if my brain is still slowly recovering)

DC15 was a great DebConf, probably one of the two bests I’ve attended so far. I’m now looking forward to DC16 in Cape Town!

systemd: Type=simple and avoiding forking considered harmful?

(This came up in a discussion on debian-user-french@l.d.o)

When converting from sysvinit scripts to systemd init files, the default practice seems to be to start services without forking, and to use Type=simple in the service description.

What Type=simple does is, well, simple. from systemd.service(5):

If set to simple (the default value if neither Type= nor BusName= are specified), it is expected that the process configured with ExecStart= is the main process of the service. In this mode, if the process offers functionality to other processes on the system, its communication channels should be installed before the daemon is started up (e.g. sockets set up by systemd, via socket activation), as systemd will immediately proceed starting follow-up units.

In other words, systemd just runs the command described in ExecStart=, and it’s done: it considers the service is started.

Unfortunately, this causes a regression compared to the sysvinit behaviour, as described in #778913: if there’s a configuration error, the process will start and exit almost immediately. But from systemd’s point-of-view, the service will have been started successfully, and the error only shows in the logs:

root@debian:~# systemctl start ssh
root@debian:~# echo $?
0
root@debian:~# systemctl status ssh
● ssh.service - OpenBSD Secure Shell server
 Loaded: loaded (/lib/systemd/system/ssh.service; enabled)
 Active: failed (Result: start-limit) since mer. 2015-05-13 09:32:16 CEST; 7s ago
 Process: 2522 ExecStart=/usr/sbin/sshd -D $SSHD_OPTS (code=exited, status=255)
 Main PID: 2522 (code=exited, status=255)
mai 13 09:32:16 debian systemd[1]: ssh.service: main process exited, code=exited, status=255/n/a
mai 13 09:32:16 debian systemd[1]: Unit ssh.service entered failed state.
mai 13 09:32:16 debian systemd[1]: ssh.service start request repeated too quickly, refusing to start.
mai 13 09:32:16 debian systemd[1]: Failed to start OpenBSD Secure Shell server.
mai 13 09:32:16 debian systemd[1]: Unit ssh.service entered failed state.

With sysvinit, this error is detected before the fork(), so it shows during startup:

root@debian:~# service ssh start
 [....] Starting OpenBSD Secure Shell server: sshd/etc/ssh/sshd_config: line 4: Bad configuration option: blah
 /etc/ssh/sshd_config: terminating, 1 bad configuration options
 failed!
 root@debian:~#

It’s not trivial to fix that. The implicit behaviour of sysvinit is that fork() sort-of signals the end of service initialization. The systemd way to do that would be to use Type=notify, and have the service signals that it’s ready using systemd-notify(1) or sd_notify(3) (or to use socket activation, but that’s another story). However that requires changes to the service. Returning to the sysvinit behaviour by using Type=forking would help, but is not really a solution: but what if some of the initialization happens *after* the fork? This is actually the case for sshd, where the socket is bound after the fork (see strace -f -e trace=process,network /usr/sbin/sshd), so if another process is listening on port 22 and preventing sshd to successfully start, it would not be detected.

I wonder if systemd shouldn’t do more to detect problems during services initialization, as the transition to proper notification using sd_notify will likely take some time. A possibility would be to wait 100 or 200ms after the start to ensure that the service doesn’t exit almost immediately. But that’s not really a solution for several obvious reasons. A more hackish, but still less dirty solution could be to poll the state of processes inside the cgroup, and assume that the service is started only when all processes are sleeping. Still, that wouldn’t be entirely satisfying…

Several improvements to UDD’s Bug Search and Maintainer Dashboard

Several improvements have been made to UDD’s Bug Search and Maintainer Dashboard recently.

On the Maintainer Dashboard side, the main new feature is a QA checks table that provides an overview of results from lintian, reproducible builds, piuparts, and ci.debian.net. Check the dashboard for the Ruby team for an example. Also, thanks to Daniel Pocock, the TODO items can now be exported as iCalendar tasks.

Bugs Search now has much better JSON and YAML outputs. It’s probably a good start if you want to do some data-mining on bugs. Packages can now be selected using the same form as the Maintainer Dashboard’s one, which makes it easy to build your own personal bug list, and will suppress the need for some of the team-specific listings.

Many bugs have been fixed too. More generally, thanks to the work of Christophe Siraut, the code is much better now, with a clean separation of the data analysis logic and the rendering sides that will make future improvements easier.

As the reminder, it’s quite easy to hack on UDD (even if you are not a DD). Please report bugs, including about additional features you would like to see!