Better mentors.debian.net/REVU to improve our sponsorship workflow?

I recently sponsored several uploads, and was asked to sponsor even more uploads, and that got me thinking about our sponsorship workflow. It’s a clear bottleneck in Debian, and discourages many new contributors, which obviously sucks.

It’s important to note that the same problems exist in Ubuntu (their equivalent to mentors.debian.net is named REVU).

The best way to improve the process would be to have packages of better quality when a DD first looks at them: they would be more likely to be uploaded right away, freeing time for other packages. I think that there’s a lot of room for improvement in the current mentors.debian.net implementation. Here is a small list of features I would like to see.

  • Integration of some QA tests in mentors, as soon as the package is uploaded:
    • does the package build cleanly?
    • piuparts test?
    • lintian/linda checks?
  • Better list of packages awaiting sponsors, with info including:
    • does the package fix bugs (number of bugs fixed per severity)?
    • is that package already in Debian?
    • is that package a new upstream version?
    • popcon score
    • how long has the package been waiting?

    This would allow potential sponsors to prioritize requests.

  • A commenting system for each package, so comments on rejected packages are not lost, and the next potential sponsor can double-check them
  • A way for sponsors to mark some sponsorees as “friends”, so it’s easy to find all the requests from people I “trust” (for some definitions of “trust” ;)
  • Maybe a scoring system, where providing good comments on others’ packages would earn you “karma points” and improve your ranking, which sponsors could later use to choose what they are going to sponsor next.
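The QA tests in the first bullet could be wired together with a tiny wrapper on the mentors host. This is only a sketch under assumptions: the exact sbuild/lintian/piuparts invocations, the file passed in, and the log layout are all made up for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of QA checks run at upload time on mentors.
# The exact sbuild/lintian/piuparts invocations are assumptions.

run_check() {
  # Run one named check, recording PASS/FAIL instead of aborting the run.
  name="$1"; shift
  if "$@" >"qa-$name.log" 2>&1; then
    echo "$name: PASS"
  else
    echo "$name: FAIL (see qa-$name.log)"
  fi
}

# Given an uploaded package as argument, run the three checks.
if [ -n "$1" ]; then
  run_check build    sbuild -d unstable "$1"   # does the package build cleanly?
  run_check lintian  lintian -i "$1"           # lintian checks
  run_check piuparts piuparts "$1"             # piuparts test
fi
```

The point of recording PASS/FAIL per check instead of aborting is that the sponsor sees the full picture in the package listing, not just the first failure.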

The good thing about this whole list of features is that everybody can help. So, if you are looking for a sponsor and want to help solve this problem, start coding now ;) And if you need me to create nicenameforyourservice.debian.net, just ping me. There’s probably some code to steal from svnbuildstat.debian.net, so contacting its developers would be a good idea.

Re: Giving credit where due?

Christian Perrier is wondering why the official announcement about the Gutsy release does not even contain the word “Debian”.

It’s not new: Debian is virtually nonexistent in Ubuntu’s communication. It seems that the last Ubuntu release to acknowledge its Debian origins was Dapper (June 2006), maybe because that was the “Long Term Support” release.

The fact that there’s no “Ubuntu is based on Debian” paragraph on www.ubuntu.com was raised during Debconf, and it was supposed to get fixed, but it seems that it didn’t happen for some reason (there was such a paragraph before the website redesign).

In other news, I’ve been trying to install Ubuntu Gutsy inside qemu, but it fails miserably while booting the installer. I removed the “quiet” and “splash” options from the kernel cmdline, and discovered that after trying to “mount the root filesystem”, I get dropped into busybox with no error message to google for. Feisty fails as well, but Dapper boots fine. So much for the “Ubuntu is an ancient African word meaning ‘I can’t install Debian’” joke!

ZFS as LVM killer … really?

From the ZFS FAQ:

Can devices be removed from a ZFS pool?

You can remove a device from a mirrored ZFS configuration by using the zpool detach command. Removal of a top-level vdev, such as an entire RAID-Z group or a disk in an unmirrored configuration, is not currently supported. This feature is planned for a future release.

buzz, buzz, buzz…

Compiz interest

From time to time, I try Compiz to see how it has evolved. The last time was yesterday (I also switched to the xserver-xorg-driver-ati from experimental).

But as usual, after using it for a few minutes, I can’t help switching back to metacity. I don’t think that Compiz’s visual effects bring anything from a usability point of view, and I just find them annoying after the initial “WOW”. Of course, it’s nice for showing off, but for doing actual work? Are there really people using it all the time?

creating a “Distributions Developers Forum”: follow-up

After that blog post, I decided to write a mail asking a first set of questions. I sent it to the developers’ mailing lists of Fedora, Gentoo, Mandriva, openSUSE, and of course Debian and Ubuntu. I got really interesting answers from everyone, except… Debian and Ubuntu.

  • I want to wait some more before publishing the answers. If you are a Debian or Ubuntu developer and were interested in this initiative, please answer my mails sent to debian-devel@ and ubuntu-devel-discuss@ (respectively) ASAP. Not having answers from Debian and Ubuntu people would really be a shame, since everyone else I contacted was really helpful and interested.
  • I plan to use mailing list archiving software to publish the mails. Can you recommend a good mbox->html converter that works well in a “run once” use case, and doesn’t take ages to set up?
  • Can you think of another distro I should have contacted? For now, I don’t want to include simple derivatives of the “big distros”. I also chose to limit myself to Linux distros, so I didn’t contact the BSD or Nexenta folks. Both of these could change: my current plan for the future is to try to set up a mailing list + wiki, so everybody can join.

Idea: creating a “Distributions Developers Forum”?

Scientific papers always have a “related work” section, where the authors describe how the work they are presenting compares with what others have done. In the Free Software world, this is nearly non-existent: in a way, it seems that many of us think of our projects as competing products, fighting for market share. On a project web page, I would love to read something like:

This project is particularly well suited if you want XX. But if YY is more important to you, you might want to have a look at ZZ.

Or simply links to similar projects for other environments, etc. All in all, I think that the goal is to improve overall satisfaction, not to win a few users who won’t be totally happy because the project doesn’t really suit their needs.

While some projects cooperate and share ideas, like I think desktop environments do inside freedesktop.org, most just ignore each other. I am both a Debian and an Ubuntu developer, and I’m sometimes amazed that Ubuntu discusses technical choices that were discussed (and solved) a few weeks earlier in Debian. And it’s even worse with the other big distros out there.

Couldn’t we try to improve this? We could just create a mailing list, where developers from various distributions could present the way they do things. This would allow us to discuss future developments (“We are planning to improve this, what are you doing about that?”) or simply to improve people’s knowledge of the various distributions.

Of course, this could easily turn into flamefests, but there are technical ways to avoid that, like moderating posts from trolls…

Does something like that already exist? Do you think it would be interesting? Would you like to contribute to such a forum?

Some examples of things that could be discussed:

  • How many packages do you have, and how do you support them? Do you have several “classes” of packages?
  • How do you manage your releases? Goal-based? Time-based? Bug-count-based?
  • What kind of quality assurance do you do?
  • How many contributors do you have? Are they split into different “classes”? Who has “commit rights”? Can you give out “commit rights” restricted to subsets of your packages? Do you have an organized sponsorship system for people who don’t have commit rights?
  • etc, etc, etc.

Which distribution on Thinkpads: really the right question to ask?

After Dell, Lenovo decided to ask users which Linux distribution they should put on Thinkpads. Seriously, who cares? If I buy a laptop that comes with Linux pre-installed, my first step would be to reinstall it from scratch, exactly like with a laptop with Windows pre-installed. Because the choices that were made wouldn’t match mine (think of partitioning, etc). Or simply because I wouldn’t totally trust the hardware manufacturer.

So, what would make me happier about a laptop?

  • That installing any recent enough mainstream Linux distribution works without requiring tricks
  • That it’s possible to buy it without an operating system, with no additional charge (and no, I don’t buy the “we need the OS installed to do some quality tests before we ship” argument. USB keys and CDROMs have been bootable for years.)

I couldn’t care less about which distribution comes preinstalled. If Lenovo wants to make me happy, there are two ways:

  • Talk to free software developers: kernel developers, etc. Not distribution developers. And get the required changes merged in, so they will land in my favorite distribution after some time.
  • If they prefer to play on their own, they could create an open “Linux on Lenovo laptops” task force, where they would provide the needed drivers in a way that makes it dead easy to integrate them in Linux distros and to report problems.

It’s not _that_ hard: some manufacturers got it right, at least for some of their products. There are many manufacturers contributing code directly to the Linux kernel, for network drivers for example.

But maybe this is just about marketing and communication, not about results? After all, Dell and Lenovo will look nice to the average user, while manufacturers who play by the rules are hidden deep in the Linux changelog.

Easy migration of a service to another, totally different host with iptables

I’m tired of googling for this every time I need it, so I’m blogging about it.

Q: How can one redirect all connections to hostA:portA to hostB:portB, when hostA and hostB are in totally different parts of the Internet?

A: (run this on hostA; hostA, hostB, portA and portB are placeholders, and IPT points to your iptables binary)

IPT=iptables
echo 1 > /proc/sys/net/ipv4/ip_forward
$IPT -t nat -A PREROUTING -p tcp --dport portA -j DNAT --to-destination hostB:portB
$IPT -A FORWARD -i eth0 -o eth0 -d hostB -p tcp --dport portB -j ACCEPT
$IPT -A FORWARD -i eth0 -o eth0 -s hostB -p tcp --sport portB -j ACCEPT
$IPT -t nat -A POSTROUTING -p tcp -d hostB --dport portB -j SNAT --to-source hostA

Connections are source-NATed: from hostB’s point of view, all connections come from hostA. So be careful.

How do archive rebuilds and piuparts tests on Grid’5000 work?

With the development of rebuildd and the fact that several people are interested in re-using my scripts, I feel the need to explain how this stuff works.

Grid’5000

First, Grid’5000 is a research platform used to study computer grids. It’s not really a grid (it doesn’t use all the classic grid middleware such as Globus). Grid’5000 is composed of 9 sites, each hosting from 1 to 3 clusters. Inside clusters, nodes are usually connected using gigabit ethernet, and sometimes another high-speed network (Myrinet, Infiniband, etc). Clusters are connected using a dedicated 10G ethernet network. Grid’5000 is in a big private network (you access it through special gateways), and one can access each node from any other node directly (no need for complex tunnelling).

Using Grid’5000 nodes

When you want to use some nodes on Grid’5000, you have to use a resource manager to say “I’d like to use 50 nodes for 10 hours”. Then your job starts. At that point, you can use a tool called KaDeploy to install your own system on all the nodes (think of it as “FAI for large clusters”). When KaDeploy finishes, the nodes are rebooted in your environment, and you can connect as root. At that point, you can basically break the nodes the way you want, since they will be restored at the end of your job.
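For the record, a session looks roughly like this. Consider it a sketch only: the oarsub/kadeploy invocations below are from memory, and their exact flags are assumptions, not something to copy blindly.

```shell
#!/bin/sh
# Rough sketch of a Grid'5000 session: reserve nodes with the resource
# manager, then deploy our own environment on them with KaDeploy.
# The exact oarsub/kadeploy flags are assumptions; check the real docs.

reserve_and_deploy() {
  nodes="$1"; hours="$2"; env="$3"
  if ! command -v oarsub >/dev/null 2>&1; then
    echo "oarsub not found: not on a Grid'5000 frontend"
    return 1
  fi
  # "I'd like to use $nodes nodes for $hours hours"
  oarsub -I -l nodes="$nodes",walltime="$hours:00:00" || return 1
  # Install our environment on all reserved nodes; they reboot into it,
  # and we can then log in as root and break them freely.
  kadeploy -e "$env" -f "$OAR_NODEFILE"
}
```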

Running Debian QA tasks on Grid’5000

None of that was Debian-specific. I will now try to explain how QA tasks are run on Grid’5000. The scripts mentioned below are in the debcluster directory of the collab-qa SVN repository.

When the nodes are ready, the first node is chosen to play a special role (it’s called the master node from now on). A script is run on the master node to prepare it. This consists of mounting a shared NFS directory, and running another script located on this shared NFS directory to install a few packages, configure some stuff, and start a script (masternode.rb) that will schedule the tasks on all the other nodes.

masternode.rb is also responsible for preparing the other nodes. This consists of mounting the same shared NFS directory, and executing a script (preparenode.rb) that installs a few packages and configures some stuff. After the nodes have been prepared, they are ready to execute tasks.

To execute a task, masternode.rb connects to the node using ssh and executes a script in the shared directory. Those scripts are basically wrappers around lower-level tools. Examples are buildpackage.rb, and piuparts.rb.
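To make the dispatch step concrete, here is a minimal sketch of what “connect to the node using ssh and execute a script in the shared directory” amounts to. The wrapper names are the real ones; the mount point and the root login are assumptions for illustration.

```shell
#!/bin/sh
# Minimal sketch of the dispatch step: run a wrapper script from the
# shared NFS directory on a given node, over ssh.
# /nfs/debcluster is a hypothetical mount point for the shared directory.

SHARED=/nfs/debcluster

run_task() {
  # usage: run_task <node> <wrapper> [args...]
  # e.g.:  run_task node-12 buildpackage.rb openoffice.org
  node="$1"; shift
  ssh -o BatchMode=yes "root@$node" "$SHARED/$*"
}
```

Since every node mounts the same NFS directory, the master never copies scripts around: it only needs the node name and the wrapper’s command line.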

Now, some specific details:

  • masternode.rb schedules tasks, not builds. Tasks are commands. So it is possible, in a single Grid’5000 job, to mix a piuparts test on Ubuntu and an archive rebuild on Debian. When another QA task is created, I just have to write another wrapper.
  • Tasks are scheduled using “longest job first”. This doesn’t matter with piuparts tests (which are usually quite short) but is important for archive rebuilds: some packages take a very long time to build. If I want to rebuild all packages in about 10 hours, openoffice.org has to be the first build to start, since building openoffice.org takes about 10 hours itself… So one node will only build openoffice.org, and the other nodes will build the other packages.
  • I use sbuild to build packages, not pbuilder. pbuilder’s algorithm to resolve build-dependencies is a bit broken (#141888, #215065). sbuild’s is broken as well (#395271, #422879, #272955, #403246), but at least it’s broken in the same way as the buildds’, so something that doesn’t build on sbuild won’t build on the buildds, and you can file bugs.
  • I use schroot with “file” chroots. The tarballs are stored on the NFS directory. This looks inefficient, but actually works very well and is very flexible. A tarball of a build environment is not that big, and this guarantees that my build environment is always clean. If I want to build with a different dpkg-dev, I just have to:
    • cp sid32.tgz sid32-new.tgz
    • add the chroot to schroot.conf
    • tell buildpackage.rb to use sid32-new instead of sid32
  • Logs and (if needed) resulting packages are written to the NFS directory.
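The “longest job first” ordering is simple to express: sort the task list by estimated duration (for instance, the build time observed in a previous rebuild), descending. A sketch, with made-up durations:

```shell
#!/bin/sh
# "Longest job first": given "<estimated-minutes> <task>" lines on stdin,
# print the task names longest-first, so that e.g. openoffice.org starts
# building before the short piuparts tests. Durations here are made up.

longest_first() {
  sort -rn | awk '{print $2}'
}
```

Feeding it the durations of a previous run puts openoffice.org at the top of the schedule, so its roughly 10-hour build starts immediately on one node while the others work through the rest.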

Comments and questions welcomed :)

How old are our packages?

Debian’s binary packages are only built when they are uploaded (or when a binNMU is requested, but that doesn’t happen frequently). They are never rebuilt later. This means that some binary packages in etch weren’t built in an up-to-date etch environment, but possibly in a much older one.

Is this a problem? It depends. Old packages don’t benefit from the improvements introduced in Debian after they were built. Like new compiler optimizations, or other changes in the toolchain that could induce changes in binary packages.

For example, some files used to be installed in one place, but are now put somewhere else. Also, some parts of maintainer scripts are automatically generated, and would be different if the package were rebuilt today.

But is it really a problem? Are our packages _that_ old? Also, when a big change is made in the way we generate our binary packages (like Raphael Hertzog’s new dpkg-shlibdeps), when can we expect that the change will be effective in all packages?

I went through all binary packages in unstable (as of 24/06/2007; main, contrib and non-free) on i386. Using dpkg-deb --contents, I extracted the most recent file date in each package (which can reasonably be taken as the date of the package’s creation). And here are the results.
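For reference, the extraction itself is a one-liner: dpkg-deb --contents prints a tar-style listing with an ISO date in the fourth column, and ISO dates sort correctly as plain strings.

```shell
#!/bin/sh
# Take the most recent file date inside a .deb as its build date.
# dpkg-deb --contents prints tar-style lines such as:
#   -rw-r--r-- root/root 1234 2007-06-24 10:00 ./usr/bin/foo

newest_date() {
  # usage: dpkg-deb --contents pkg.deb | newest_date
  awk '{print $4}' | sort | tail -n 1
}
```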

Most packages are actually quite recent. 9008 packages (43%) were built after the release of etch. And 19857 packages (94%) were built after the release of sarge. But that still leaves us with 1265 packages that were built before sarge was released, and even one package (exim-doc-html) that was built before the release of woody! (the removal of this package has been requested, so we will soon be woody-clean :-)

Now, what could we do with this data? We could:

  • Review the older packages, to determine whether they would benefit from a rebuild (by comparing them with the result of a fresh build) <= I'm planning to work on that
  • Integrate that data with other sources of data (popcon, for example), to see whether a package should be removed. Such old packages are probably good candidates for removal.

Here is the full sorted list of packages.