The Linux 2.5, Ruby 1.9 and Python 3 release management anti-pattern

There’s a pattern that comes up from time to time in the release management of free software projects.

To allow for big, disruptive changes, a new development branch is created. Most of the developers’ focus moves to the development branch. However, at the same time, the users’ focus stays on the stable branch.

As a result:

  • The development branch lacks user testing, and tends to make slower progress towards stabilization.
  • Since users continue to use the stable branch, it is tempting for developers to spend time backporting new features to it, instead of working on stabilizing the development branch.

This situation can escalate into a quasi-deadlock, with people questioning whether it was a good idea to do such a massive fork in the first place, and whether it is even a good idea to spend time switching to the new branch.

To make things even less clear, the development branch is often declared “stable” by its developers before most libraries or applications have been ported to it.

This has happened at least three times.

First, in the Linux 2.4 / 2.5 era. Wikipedia describes the situation like this:

Before the 2.6 series, there was a stable branch (2.4) where only relatively minor and safe changes were merged, and an unstable branch (2.5), where bigger changes and cleanups were allowed. Both of these branches had been maintained by the same set of people, led by Torvalds. This meant that users would always have a well-tested 2.4 version with the latest security and bug fixes to use, though they would have to wait for the features which went into the 2.5 branch. The downside of this was that the “stable” kernel ended up so far behind that it no longer supported recent hardware and lacked needed features. In the late 2.5 kernel series, some maintainers elected to try backporting of their changes to the stable kernel series, which resulted in bugs being introduced into the 2.4 kernel series. The 2.5 branch was then eventually declared stable and renamed to 2.6. But instead of opening an unstable 2.7 branch, the kernel developers decided to continue putting major changes into the 2.6 branch, which would then be released at a pace faster than 2.4.x but slower than 2.5.x. This had the desirable effect of making new features more quickly available and getting more testing of the new code, which was added in smaller batches and easier to test.

Then, in the Ruby community. In 2007, Ruby 1.8.6 was the stable version of Ruby. Ruby 1.9.0 was released on 2007-12-26 as a snapshot from Ruby’s trunk branch, without being declared stable, and most of the development attention moved to 1.9.x. On 2009-01-31, Ruby 1.9.1 became the first release of the 1.9 branch to be declared stable. But at the same time, the disruptive changes introduced in Ruby 1.9 made users stay with Ruby 1.8, as many libraries (gems) remained incompatible with Ruby 1.9.x. Debian provided packages for both branches of Ruby in Squeeze (2011), but only changed the default to 1.9 in 2012 (reaching a stable release with Wheezy, in 2013).

Finally, in the Python community. Similarly to what happened with Ruby 1.9, Python 3.0 was released in December 2008. Releases from the 3.x branch have been shipped in Debian Squeeze (3.1), Wheezy (3.2), and Jessie (3.4). But the ‘python’ command still points to 2.7 (and I don’t think there are plans to make it point to 3.x, which effectively makes Python 3.x a different language), and there is talk about really getting rid of Python 2.7 in Buster (Stretch+1, Jessie+2).

In retrospect, and looking at what those projects have been doing in recent years, it is probably a better idea to break early and break often, fixing a constant stream of small breakages on a regular basis, even if that means temporarily exposing breakage to users, and to spend more time on strategies that limit the damage caused by each breaking change. What has also changed since those branches were created is the increased popularity of automated testing and continuous integration, which makes it easier to measure the breakage caused by disruptive changes. Distributions are in a good position to help here, as they can provide early feedback to upstream projects about potentially disruptive changes. They also have a good incentive to help, because shipping two incompatible branches of the same project is usually not a great solution.

(I wonder if there are other occurrences of the same pattern?)

Update: There’s a discussion about this post on HN

systemd: Type=simple and avoiding forking considered harmful?

(This came up in a discussion on debian-user-french@l.d.o)

When converting sysvinit scripts to systemd service files, the default practice seems to be to start services without forking, and to use Type=simple in the service description.

What Type=simple does is, well, simple. From systemd.service(5):

If set to simple (the default value if neither Type= nor BusName= are specified), it is expected that the process configured with ExecStart= is the main process of the service. In this mode, if the process offers functionality to other processes on the system, its communication channels should be installed before the daemon is started up (e.g. sockets set up by systemd, via socket activation), as systemd will immediately proceed starting follow-up units.

In other words, systemd just runs the command described in ExecStart=, and it’s done: it considers the service started.
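
As an illustration, a minimal simple-type unit could look like this (the service name and path are made up for the example):

# Hypothetical /etc/systemd/system/mydaemon.service, for illustration only
[Unit]
Description=Example daemon started in the foreground

[Service]
Type=simple
ExecStart=/usr/local/sbin/mydaemon --foreground
# systemd considers the service started as soon as this process
# has been spawned, whether or not it survives its initialization

[Install]
WantedBy=multi-user.target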

Unfortunately, this causes a regression compared to the sysvinit behaviour, as described in #778913: if there’s a configuration error, the process starts and exits almost immediately. But from systemd’s point of view, the service was started successfully, and the error only shows up in the logs:

root@debian:~# systemctl start ssh
root@debian:~# echo $?
0
root@debian:~# systemctl status ssh
● ssh.service - OpenBSD Secure Shell server
 Loaded: loaded (/lib/systemd/system/ssh.service; enabled)
 Active: failed (Result: start-limit) since mer. 2015-05-13 09:32:16 CEST; 7s ago
 Process: 2522 ExecStart=/usr/sbin/sshd -D $SSHD_OPTS (code=exited, status=255)
 Main PID: 2522 (code=exited, status=255)
mai 13 09:32:16 debian systemd[1]: ssh.service: main process exited, code=exited, status=255/n/a
mai 13 09:32:16 debian systemd[1]: Unit ssh.service entered failed state.
mai 13 09:32:16 debian systemd[1]: ssh.service start request repeated too quickly, refusing to start.
mai 13 09:32:16 debian systemd[1]: Failed to start OpenBSD Secure Shell server.
mai 13 09:32:16 debian systemd[1]: Unit ssh.service entered failed state.

With sysvinit, this error is detected before the fork(), so it shows during startup:

root@debian:~# service ssh start
 [....] Starting OpenBSD Secure Shell server: sshd/etc/ssh/sshd_config: line 4: Bad configuration option: blah
 /etc/ssh/sshd_config: terminating, 1 bad configuration options
 failed!
 root@debian:~#

It’s not trivial to fix that. The implicit behaviour of sysvinit is that the fork() sort-of signals the end of service initialization. The systemd way to do that would be to use Type=notify, and have the service signal that it’s ready using systemd-notify(1) or sd_notify(3) (or to use socket activation, but that’s another story). However, that requires changes to the service. Returning to the sysvinit behaviour by using Type=forking would help, but it is not really a solution either: what if some of the initialization happens *after* the fork? This is actually the case for sshd, where the socket is bound after the fork (see strace -f -e trace=process,network /usr/sbin/sshd), so if another process were listening on port 22 and preventing sshd from starting successfully, that would not be detected.
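
To make this concrete, here is a sketch of what a notify-style unit could look like; the service name and path are made up, and the daemon itself would have to be patched to report readiness:

# Hypothetical unit file, for illustration only
[Unit]
Description=Example daemon with readiness notification

[Service]
Type=notify
ExecStart=/usr/local/sbin/mydaemon --foreground
# mydaemon is assumed to call sd_notify(0, "READY=1") (from
# <systemd/sd-daemon.h>) only once its configuration has been
# parsed and its sockets are bound; systemd waits for that
# message before considering the service started

[Install]
WantedBy=multi-user.target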

I wonder if systemd shouldn’t do more to detect problems during service initialization, as the transition to proper notification using sd_notify will likely take some time. A possibility would be to wait 100 or 200 ms after the start to ensure that the service doesn’t exit almost immediately. But that’s not really a solution, for several obvious reasons. A more hackish, but arguably less dirty, solution would be to poll the state of the processes inside the service’s cgroup, and assume that the service has started only once all of them are sleeping. Still, that wouldn’t be entirely satisfying…
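
In the meantime, an administrator who wants an error on a broken configuration can approximate the wait-and-check idea by hand; a rough sketch, with a delay just as arbitrary as above:

#!/bin/sh
# Start the service, give it a moment, then verify it is still up
systemctl start ssh
sleep 0.2
systemctl is-active --quiet ssh || echo "ssh failed to start" >&2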

Debian Package of the Day revival (sort of)

TL;DR: a static version of http://debaday.debian.net/, as it was when it was shut down in 2009, is available again!

A long time ago, between 2006 and 2009, there was a blog called Debian Package of the Day. About once per week, it featured an article about one of the gems available in the Debian archive: one of those many great packages that you had never heard about.

At some point in November 2009, after 181 articles, the blog was hacked, and it was never brought back up. Last week, I retrieved the old database, generated a static version, and put it online with the help of DSA. It is now available again at http://debaday.debian.net/. Some of the articles are clearly outdated, but many of them cover packages that are still available in Debian and still very relevant today.

Will the packages you rely on be part of Debian Jessie?

The start of the jessie freeze is quickly approaching, so now is a good time to ensure that the packages you rely on will be part of the upcoming release. Thanks to automated removals, the number of release-critical bugs has been kept low, but this was achieved by removing many packages from jessie: 841 packages from unstable are not part of jessie, and won’t be part of the release if things don’t change.

It is actually simple to check whether any of those 841 packages are installed locally:

  1. apt-get install how-can-i-help (available in backports if you don’t use testing or unstable)
  2. how-can-i-help --old
  3. Look at the packages listed under “Packages removed from Debian ‘testing’” and “Packages going to be removed from Debian ‘testing’”

Then, please fix all the bugs :-) Seriously, not all RC bugs are hard to fix. A good starting point to understand why a package is not part of jessie is tracker.d.o.

On my laptop, the two packages that are not part of jessie are the geeqie image viewer (which looks likely to be fixed in time), and josm, the OpenStreetMap editor, which has three RC bugs and seems much harder to fix… If you fix it in time for jessie, I’ll offer you a $drink!

on the Dark Ages of Free Software: a “Free Service Definition”?

Stefano Zacchiroli opened DebConf’14 with an insightful talk titled Debian in the Dark Ages of Free Software (slides available, video available soon).

He makes the point (quoting slide 16) that the Free Software community is winning a war that is becoming increasingly pointless: yes, users have 100% Free Software thin client at their fingertips [or are really a few steps from there]. But all their relevant computations happen elsewhere, on remote systems they do not control, in the Cloud.

That giving up of control over our computing is a huge and important problem, and probably the largest challenge today for everybody who cares about freedom, free speech, or privacy. Stefano rightfully points out that we must do something about it. The big question is: how can we, as a community, address it?

Towards a Free Service Definition?

I believe that we all feel a bit lost with this issue because we are trying to attack it with our current tools and weapons. However, they are largely irrelevant here: the Free Software Definition is about software, and software there is to be understood strictly, as software programs. Applying it to services, or to computing in general, doesn’t lead anywhere. In order to raise general awareness of this issue, we should define more precisely what levels of control a service can provide, so that users can understand what a given service does not provide, and make an informed decision when waiving a particular level of control by choosing to use it.

Benjamin Mako Hill pointed out yesterday, during the post-talk chat, that services are not black or white: there are no pure and impure services. Instead, there is a gradation of possible levels of control over the computing we do. The Free Software Definition lists four freedoms; how many freedoms, or types of control, should there be in a Free Service Definition, or a Controlled-Computing Definition? Again, this is not only about software: the platform on which a particular piece of software is executed has a huge impact on the available level of control. Running your own instance of WordPress, or using an instance on wordpress.com, provides very different levels of control (even if, as Asheesh Laroia pointed out yesterday, WordPress does a pretty good job at providing export and import features to limit data lock-in).

The creation of such a definition is an iterative process. I actually just realized today that (according to Wikipedia) the very first attempt at a Free Software Definition was published in 1986 (GNU’s Bulletin Vol. 1 No. 1, page 8); I thought it had happened a couple of years earlier. Are there existing attempts at defining such freedoms or levels of control, and at benchmarking existing services against such criteria? Such criteria would not only cover control over software modification and (re)distribution, but would likely also mention interoperability and open standards, both to enable users to move to a compatible service and to avoid forcing them into a particular implementation of a service. A better understanding of network effects is also needed: how much, and what kind of, service lock-in is acceptable on social networks in exchange for functionality?

I think that we should draw inspiration from what was achieved during the last 30 years of Free Software. The tools that were produced are probably irrelevant to this issue, but there is a lot to learn from the way they were designed. I really look forward to the day when we will have:

  • a Free Software Definition equivalent for services
  • Debian Free Software Guidelines-like tests/checklist to evaluate services
  • an equivalent of The Cathedral and the Bazaar, explaining how one can build successful business models on top of open services

Exciting times!

Introducing the Debian Maintainer Dashboard — help needed!

The brand new machine for the Ultimate Debian Database motivated me to do UDD-related work again, so I implemented an old idea: the Debian Maintainer Dashboard, a maintainer- and team-centric dashboard built on top of UDD.

The idea is to expose as much useful information as possible about a maintainer’s packages, both in the traditional (Developer’s Packages Overview-like) “big table with all the info” form, and in a task-oriented listing (to answer the “OK, I have two hours for Debian, what should I do now?” question).
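
Since the dashboard is essentially a set of SQL queries over UDD, anyone can prototype similar views directly. As a rough sketch, something like the following could list a maintainer’s release-critical bugs using the public read-only UDD mirror (the host, the credentials, and especially the table and column names are from memory, and may need adjusting against the actual UDD schema):

# Hypothetical query against the public UDD mirror; schema
# details (bugs/sources tables and their columns) are guesses
PGPASSWORD=udd-mirror psql -h udd-mirror.debian.net -U udd-mirror udd -c "
  SELECT b.id, b.source, b.severity, b.title
    FROM bugs b
    JOIN sources s ON s.source = b.source
   WHERE s.maintainer_email = 'someone@debian.org'
     AND s.release = 'sid'
     AND b.severity IN ('serious', 'grave', 'critical');"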

At this point, the Debian Maintainer Dashboard already brings several cool features (try it with my packages or pkg-ruby’s packages, or your own!):

  • Reports buildd issues (failed builds)
  • Provides an upstream status monitor that works (UDD has its own)
  • Knows about teams’ VCS repositories (if your team uses PET), and displays the current version in the VCS
  • Lists the packages you maintain or co-maintain, as well as the packages you have shown interest in by subscribing to them on the PTS

Of course, as of now, it’s pretty ugly: that’s the “help needed” part of this post. I really suck at web design, and would really appreciate it if someone helped me turn it into something nice to use. Besides adding links and colors everywhere, it would probably be nice to add sortable tables, and to create tabs for “TODO items”, “Versions” and “Bugs” (and possibly Lintian, for example). Contact me if you are interested.

Debian archive rebuilds on Amazon Web Services

I like to think that archive rebuilds play an important role in Debian Quality Assurance and Release Management efforts. By trying to rebuild every Debian package from source, one can identify packages that do not build anymore due to changes in other packages (compilers, interpreters, libraries, …). It is also a good way to stress-test all packages that are involved in building other packages.

Since 2007, I had been running Debian archive rebuilds on the Grid’5000 testbed, a research infrastructure for experiments on distributed systems (HPC/Grid/Cloud/P2P). I filed more than 6000 release-critical bugs in the process.

Late last year, Amazon kindly offered us a grant allowing us to run such QA tests on Amazon Web Services. Sébastien Badia and I ported the rebuild infrastructure to AWS (scripts), and several rebuilds have already been carried out there.

On the technical level, 50 to 100 EC2 spot instances are started, then controlled from a master instance over SSH. The build instances use a classic sbuild setup. Logs are retrieved to the master node after each rebuild, and build instances are simply shut down when there are no more tasks to process. Several tasks are processed simultaneously on each instance; when a build fails, it is retried with no other concurrent build on the same instance, to eliminate random failures caused by load or timing issues. All the scripts are designed to support other kinds of QA tests, not just rebuilds.
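
To give an idea of the mechanics, a very simplified version of the dispatch step might look like the sketch below; the host name, file layout and options are made up, and the real scripts linked above additionally handle spot-instance management, concurrency and retries:

#!/bin/sh
# Simplified dispatch loop: build each source package listed in
# packages.txt on a remote build instance, then fetch the log
INSTANCE=ec2-build-01.example.com
mkdir -p logs
while read SRCPKG; do
    ssh "$INSTANCE" "sbuild --dist=unstable --apt-update $SRCPKG"
    scp "$INSTANCE:${SRCPKG}_*.build" logs/
done < packages.txt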

Moving to Amazon Web Services also makes it easier to share the human workload of running those tests. It is now possible for developers interested in custom tests to run them themselves (hint hint).

Update of Debian Packaging Tutorial

I’ve just updated the Debian Packaging Tutorial. This new version addresses a few comments and questions I received over the past months.

Note that there are also French and Spanish versions of the tutorial, and I’m of course open to adding other translations.

The tutorial can be found in the packaging-tutorial package (PDF files are in /usr/share/doc/packaging-tutorial/), or on www.debian.org (see links above).

Re: Ubuntu vs Ruby

(original post)

> If Ubuntu 12.04 is an LTS release, and Ruby 1.8.7 goes out of support in June of
> 2013, then why is the default still 1.8.7?
>
> Ruby 1.9.2 was released in 2010. Ruby 1.9.3 was released in October of this year.

First, there’s almost nobody in the Ubuntu development community doing any Ruby work. Packages are just imported from Debian, and Ubuntu follows what is done on the Debian side.

In the Debian/Ruby team, we are currently transitioning to a new packaging helper (gem2deb) that makes it much easier to support several Ruby versions. Once this is done, switching to 1.9.3 by default will be very easy. We already provide a way for the sysadmin to change the default on a system.
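
(For reference, the mechanism is, if I remember correctly, the alternatives system; assuming both interpreters are installed, something like:)

# Interactively select the system-wide default /usr/bin/ruby
update-alternatives --config ruby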

Now, doing that transition takes time, and we could have used *your* help (and could still use it). We are still more or less on time to do it for Debian wheezy, but it sounds very hard to do for Ubuntu 12.04, unless someone from Ubuntu steps up to help.