Looking for cliques in the GPG signatures graph

The strongly connected set of the GPG keys graph contains a bit more than 40000 keys now (yes, that’s a lot of geeks!). I wondered what the biggest clique (complete subgraph) in that graph was, and also of course what the biggest clique I was in was.

It’s easy to grab the whole web of trust there. Finding the maximum clique in a graph is NP-hard, but there are algorithms that work quite well for small instances. And you don’t need to consider all 40000 keys: to be in a clique of n keys, a key must have at least n-1 signatures, so it’s easy to simplify the graph. If you find a clique with 20 keys, you can remove all keys that have fewer than 19 signatures.
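The simplification step can be sketched in a few lines of Python. This is only an illustration, not the code I actually ran: the function name is made up, and it assumes the graph is given as undirected pairs of keys that have signed each other. It also applies the rule repeatedly, since removing a key lowers the signature count of its neighbours.

```python
from collections import defaultdict


def prune_for_clique(edges, k):
    """Keep only the keys that could still belong to a clique of k keys.

    `edges` is an iterable of (key_a, key_b) pairs, assumed to be mutual
    signatures. A key needs at least k-1 neighbours to be in a k-clique.
    """
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)

    # Start with every key that is already below the threshold.
    queue = [v for v in adj if len(adj[v]) < k - 1]
    while queue:
        v = queue.pop()
        if v not in adj:
            continue  # already removed through another path
        for n in adj.pop(v):
            adj[n].discard(v)
            if len(adj[n]) < k - 1:
                queue.append(n)
    return dict(adj)


# Toy graph: a triangle (a, b, c) plus one extra key d that only a signed.
toy = [("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")]
print(sorted(prune_for_clique(toy, 3)))
```

The pruned graph is then what gets handed to the clique solver, with k set to one more than the size of the best clique found so far.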

My first googling result pointed to Ashay Dharwadker’s solver implementation (which also proves P=NP ;). Googling further allowed me to find the solver provided with the DIMACS benchmarks. It’s clearly not the state of the art, but it was enough in my case (it found the result almost immediately).

The biggest clique contains 47 keys. However, it looks like someone had fun, and injected a lot of bogus keys into the keyring. See the clique. So I ignored those keys, and re-ran the solver. And guess what the size of the biggest “real” clique is? Yes. 42. Here are the winners:

CF3401A9 Elmar Hoffmann 
AF260AB1 Florian Zumbiehl 
454C864C Moritz Lapp 
E6AB2957 Tilman Koschnick 
A0ED982D Christian Brueffer 
5A35FD42 Christoph Ulrich Scholler 
514B3E7C Florian Ernst 
AB0CB8C0 Frank Mohr 
797EBFAB Enrico Zini 
A521F8B5 Manuel Zeise 
57E19B02 Thomas Glanzmann 
3096372C Michael Fladerer 
E63CD6D6 Daniel Hess 
A244C858 Torsten Marek 
82FB4EAD Timo Weingärtner
1EEF26F4 Christoph Ulrich Scholler 
AAE6022E Karlheinz Geyer 
EA2D2C41 Mattia Dongili 
FCC5040F Stephan Beyer 
6B79D401 Giunchedi Filippo 
74B11360 Frank Mohr 
94C09C7F Peter Palfrader
2274C4DA Andreas Priesz 
3B443922 Mathias Rachor 
C54BD798 Helmut Grohne 
9DE1EEB1 Marc Brockschmidt 
41CF0322 Christoph Reeg 
218D18D7 Robert Schiele 
0DCB0431 Daniel Hess 
B84EF12A Mathias Rachor 
FD6A8D9D Andreas Madsack 
67007C30 Bernd Paysan 
9978AF86 Christoph Probst 
BD8B050D Roland Rosenfeld 
E3DB4EA7 Christian Barth 
E263FCD4 Kurt Gramlich 
0E6D09CE Mathias Rachor 
2A623F72 Christoph Probst 
E05C21AF Sebastian Inacker 
5D64F870 Martin Zobel-Helas 
248AEB73 Rene Engelhard 
9C67CD96 Torsten Veller

It’s likely that this happened thanks to a very successful key signing party somewhere in Germany (looking at the email addresses). [Update: It was the LinuxTag 2005 KSP.] It might be a nice challenge to beat that clique during the next DebConf ;)

And the biggest clique I’m in contains 23 keys. Not too bad.

tool to mirror a website locally?

Dear lazyweb,

I need a tool to mirror a website locally (so I can browse it offline). Requirements:
– not GUI-based (I want to run it in a script)
– support recursive retrieval and include/exclude lists (like wget)
– no output when everything is fine, but still output errors (not possible with wget, which still outputs “basic information” when running with --no-verbose, and doesn’t output errors when running with --quiet)
– understands timestamps, and retransfers files if timestamps or sizes don’t match
– not too over-engineered, not too badly maintained, etc…
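One workaround for the “quiet unless something fails” requirement is to wrap the tool: capture everything it prints, and only show it when the exit status is non-zero. A minimal sketch in Python, assuming the mirroring tool reports failures through its exit status; run_quiet is a name I made up:

```python
import subprocess
import sys


def run_quiet(cmd):
    """Run cmd, staying silent on success and replaying its output on failure."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so messages keep their order
        text=True,
    )
    if proc.returncode != 0:
        sys.stderr.write(proc.stdout)
    return proc.returncode
```

It could be called as run_quiet(["wget", "--mirror", "--no-verbose", "https://example.org/"]), where the URL is a placeholder. That is still a hack around the tool, not the tool I’m looking for.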

Thank you.

New Debian Developers!

We got a lot of (>= 10) new Debian developers recently. I’m really happy to see that the bottlenecks in the New Maintainer process were (at least partially) solved. My first NM (actually my second, my first one is on hold) also became a DD today.

So, how long does it take to become a DD? Let’s take 2 examples. Both are very active and skilled new contributors, who were probably quite close to being as fast as you can be through NM:

Name         Applied     AM assigned  Approved by AM  Account created
Chris Lamb   2008-05-01  2008-06-12   2008-07-22      2008-09-16
Sandro Tosi  2008-03-24  2008-05-06   2008-06-22      2008-09-16

We have the proof: provided you have all the required skills, you can become a DD in less than 6 months!
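For the record, the arithmetic behind that claim, using the “Applied” and “Account created” dates from the table. The function name is just for illustration:

```python
from datetime import date

# (applied, account created), copied from the table above
timeline = {
    "Chris Lamb": (date(2008, 5, 1), date(2008, 9, 16)),
    "Sandro Tosi": (date(2008, 3, 24), date(2008, 9, 16)),
}


def days_in_nm(applied, account_created):
    """Number of days between applying and getting the account."""
    return (account_created - applied).days


for name, (applied, created) in timeline.items():
    print(name, days_in_nm(applied, created), "days")
```

That gives 138 days for Chris and 176 days for Sandro, both under the roughly 183 days that make up 6 months.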

Of course, some things are not perfect yet:

  • A lot of very good contributors are waiting for an AM, because not enough DDs volunteer to be AMs.
  • Some NMs still take too long to answer questions, tying up AMs who could probably be mentoring faster NMs. If your AM is waiting for you, feel guilty now!
  • Front Desk and DAM are still managed by a small set of very active (and very busy elsewhere) DDs. Many of the new DDs were FD-approved and DAM-approved by the same person, which is not so great if we want to keep this two-step check.

Is Mozilla the new XFree86? Could Ubuntu actually help?

All the recent moves of Mozilla make me feel that they are really taking the XFree86 path. Reading the Launchpad bug log about the EULA shows that most of the posters agree on who is on the wrong side, and favor switching to IceWeasel or Epiphany+Webkit.

Even if Mozilla is apparently going to back off on the EULA story, it looks like the harm was done. If they want to fix that, they will have to start listening to other players in the Free Software community. Or just watch Webkit eat their market share.

Since Ubuntu leaders are apparently talking to Mozilla about that, I really hope that they are aiming for a solution that will help the Free Software community as a whole, and are not looking for a work-around that will “fix” the problem for Ubuntu.

There has been a lot of noise about the lack of “giving back” to the community by Ubuntu. Using Ubuntu’s user base to weigh in and solve such issues in a way that benefits the whole community would probably be seen as a much more valuable contribution than another bunch of patches.

UDD and Buildstat

Ultimate Debian Database (UDD) is a GSoC project (now finished) which aimed at importing all the different data sources that we have in Debian into a single SQL database, to make it easy to combine that data. Currently, we import information about source and binary packages, all bugs (both archived and unarchived), lintian, carnivore, popcon, history of uploads, history of migrations to testing, and orphaned packages. The goal is really to have that data, without really thinking of a specific use case: there will be lots of use cases.

Buildstat is a project by Gonéri Le Bouder that provides a framework for running QA tests (rebuilds and lintian currently, but buildstat is built in a very extensible way) on packages, using both packages in the archive and packages in the VCS repositories of teams. This is pretty cool: it allows teams to get an overview of the status of their packages, not using the archive as reference, but using their VCS. Buildstat schedules and runs the tests, stores the data in an SQL database, and lets you browse the data using a web interface. Buildstat also imports some data from other sources (only the BTS currently, using the LDAP dump) to display it on the web interface.

Since both projects are using an SQL DB, people have been asking why we don’t simply merge them. The big advantage would be that the data is synchronized: buildstat would display more up-to-date info about the data sources it doesn’t generate locally (like bugs), and UDD would get fresh buildstat data. We have been talking a lot with Gonéri, considering the different possibilities. But I don’t think it’s a good idea.

I think that both projects should try to do one thing, but do it very well, instead of trying to fix the world. UDD focuses on importing data that exists elsewhere. Sometimes it means doing some complex processing. But data should not be generated by UDD. Merging the projects would mean having a very big piece of software that does everything (or tries to do everything).

Both databases were designed differently, with different goals. UDD tries to stay close to the data it imports. There might be some inconsistencies in the data sources, but that’s fine: one of the goals of UDD is to make it easy to find them (and fix them), so we need them in the DB. In buildstat, since the goal of importing data is to display it on the web interface, with a strict use case, you can freely “simplify” data if it helps. Another big difference is that UDD is designed to be easy to use (i.e. to write and run queries) by a human user: UDD uses multi-column primary keys, while buildstat uses surrogate keys (integer “id” keys), which ORM tools usually require.

There are also more technical concerns: currently UDD makes a compromise, for each data source, on what it imports: it tries to import the data that is useful, not all the data available. Merging buildstat and UDD would mean increasing the DB size significantly, by adding all the “private” data that buildstat needs. Another problem is the stable API problem: if buildstat and UDD are merged, it means that buildstat cannot change its DB schema without making sure that it wouldn’t break what UDD users are doing.

So, what should we do, from my POV, instead of merging?

– Continue to talk, and get Gonéri into the UDD “team”. He gathered a lot of experience working on buildstat, and he probably would be able to help a lot.

– Data that is not generated by buildstat (bugs data) should be imported from UDD. Doing SQL->SQL will probably make things easier there.

– A summary of buildstat’s information should be imported into UDD.

There’s also the issue of providing the data through a web interface, to the DDs, which buildstat tries to address partially. The Debian Developer Packages Overview (DDPO)’s main limitations are:

– lack of knowledge about VCS (what buildstat solves)

– lack of knowledge about complex organizations (you can’t get an arbitrary list of packages, or the list of packages maintained by teams not using a consistent Maintainer/Uploaders scheme, or the list of packages from tasks, etc.)

– poor handling of large numbers of packages. In teams with lots of packages, it’s useful to have restricted views, such as “packages outdated compared to upstream”, “packages which have bugs/RC bugs”, “packages which are newer in the VCS than in the archive”. The perl team’s work on PET clearly shows the kind of things that are needed.

At this point, I think that DDPO would benefit from a full rewrite (using the existing code as a source of inspiration, and making sure that there are no regressions, of course).

Using UDD as the data source, it should be easy to get something done (even if UDD still lacks some of the data DDPO has currently). But it still requires web developer skills, which I don’t have. If you are interested, contact me!

tiling terminals manager

I tried terminator (thanks go to Nicolas Valcarcel for asking me to sponsor a Debian upload, thus forcing me to try it, and Asheesh Laroia for doing a lightning talk at debconf about it), but I’m not convinced.
– More keybindings are clearly missing. You can only switch terminals using Previous/Next keybindings.
– More features would be great, like the ability to switch the position of two terminals (so you could reorganize them).
– It has some small usability problems: the config is text-based instead of using gconf, it’s not possible to change the config without restarting it, the title bar doesn’t display anything useful most of the time since it prefixes the current terminal’s title with “Terminator: ”, etc.

So, is there any other tiling terminals manager I should try, before filing tons of feature requests on terminator? My other requirement is that it mustn’t reinvent the wheel, but use the gnome-terminal widget.

Thank you.

Debian’s Freeze

Debian’s freeze sounds like a technical hack to address a social problem, and that disturbs me a bit.

The social problem is: At some point, we need everybody in Debian to make only non-disruptive changes, so everything can converge very fast into a releasable state.

The “solution” we are using is that we are blocking all packages from migrating to testing, and requiring manual review from someone on the release team. Consequences are:
– many people feel that you need to be very convincing to fix a small, not RC bug, even if fixing that bug definitely increases your package’s quality.
– the release team is completely overwhelmed by unblock requests during the freeze
– many people just stop trying to fix things during the freeze (which definitely doesn’t improve Debian’s quality), both because they think it’s hard to get a fix in, and because they don’t want to bother the release team

I wonder if we really need such a strict policy. Are there other Free Software projects that use such a technical measure to prevent software from disrupting stable releases? I am under the impression that most other projects rely on social pressure instead of technical measures for that, except maybe during the last few hours before the release.

Couldn’t we act on the social level? We could default to allowing everyone’s packages to migrate to testing, and, when someone fucks up and uploads something that should not have been uploaded, block all his packages (switching to manual review mode) until the release. Of course, that requires the release team to make decisions about _people_, which is harder than making decisions about _packages_. But if the rules are clearly stated, couldn’t this work?

Code for Debian versions comparison?

Do you know of code that compares versions of Debian packages (for example, that knows whether 2:23.2.3~rc1-1 is lower than 2:23.2.3-2), besides dpkg --compare-versions? If yes, please write a comment to this blog post, preferably with a link to the code.

Also, did someone already write a test suite for that? Who would be interested in such a test suite?

I’m considering writing a function in PL/SQL to compare debian versions (for the Ultimate Debian Database project). If someone already wrote that, I’m interested as well.
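To make clear what such code has to do, here is a minimal Python sketch of the ordering as I understand it from the Debian Policy Manual: compare the epoch numerically, then the upstream version, then the Debian revision, where each of the last two is compared as alternating non-digit and digit chunks, and ‘~’ sorts before everything, even the end of the string. The function names are mine, the sketch does no validation of its input, and dpkg --compare-versions remains the reference.

```python
import re
from itertools import zip_longest


def _order(c):
    """Sort weight of one non-digit character ('' means end of chunk)."""
    if c == "~":
        return -1
    if c == "":
        return 0
    if c.isascii() and c.isalpha():
        return ord(c)
    return ord(c) + 256  # other characters sort after all letters


def _cmp_fragment(a, b):
    """Compare two upstream versions or two revisions; returns -1, 0 or 1."""
    chunks_a = re.findall(r"(\D*)(\d*)", a)
    chunks_b = re.findall(r"(\D*)(\d*)", b)
    for (na, da), (nb, db) in zip_longest(chunks_a, chunks_b, fillvalue=("", "")):
        # Non-digit part: character by character, using the weights above.
        for ca, cb in zip_longest(na, nb, fillvalue=""):
            if _order(ca) != _order(cb):
                return -1 if _order(ca) < _order(cb) else 1
        # Digit part: numeric comparison, an empty part counts as 0.
        ia, ib = int(da or 0), int(db or 0)
        if ia != ib:
            return -1 if ia < ib else 1
    return 0


def _parse(version):
    """Split into (epoch, upstream, revision); missing parts default to 0."""
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, "0"
    return int(epoch), upstream, revision


def compare_versions(a, b):
    """Return -1 if a is lower than b, 0 if equal, 1 if a is greater."""
    ea, ua, ra = _parse(a)
    eb, ub, rb = _parse(b)
    if ea != eb:
        return -1 if ea < eb else 1
    return _cmp_fragment(ua, ub) or _cmp_fragment(ra, rb)


print(compare_versions("2:23.2.3~rc1-1", "2:23.2.3-2"))
```

For the example above this prints -1, because the ‘~rc1’ suffix makes the first upstream version sort before 23.2.3. A real test suite would need many more cases than this one.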

Of popular packages removed from testing, and the Ultimate Debian Database GSOC project

Some time ago, there was some flamewars^H^Hdebate about the Release Team’s removals of RC-buggy packages from testing. Basically, some people claimed that popular packages shouldn’t be removed, even if RC-buggy.

But, do we really miss popular packages in testing?

It’s difficult to know. You could get the popcon data, and compare it with the Packages files for testing and unstable. Or work with source packages (which removes a lot of noise), but then you have to convert the popcon data (which uses binary package names) to source packages. Not completely trivial.

That’s where the Ultimate Debian Database GSOC project comes to the rescue. The goal of Christian von Essen’s project is to gather data from various sources in Debian into a single SQL DB, so queries that combine all those data sources can easily be written.

For example, here is the query that lists the source packages that are in unstable, but not in testing, sorted by their popcon (using the number of insts of the most popular binary package of the source package as value for the source package):

SELECT DISTINCT unstable.package, insts
FROM (SELECT DISTINCT package FROM sources
      WHERE distribution = 'debian' AND release = 'sid') AS unstable,
     popcon_src
WHERE unstable.package NOT IN (
    SELECT package FROM sources
    WHERE distribution = 'debian' AND release = 'lenny')
  AND popcon_src.source = unstable.package
ORDER BY insts DESC;
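To show what the query computes without a database at hand, here is the same logic as a small Python sketch. It is not how UDD does it: the function names and the toy data are made up, and it assumes you already have the binary-to-source mapping and the per-binary popcon counts as dictionaries.

```python
def popcon_by_source(binary_insts, binary_to_source):
    """Value each source package by its most-installed binary package."""
    by_source = {}
    for binary, insts in binary_insts.items():
        source = binary_to_source.get(binary)
        if source is None:
            continue  # binary package with no known source, skip it
        by_source[source] = max(by_source.get(source, 0), insts)
    return by_source


def missing_from_testing(unstable, testing, by_source):
    """Sources in unstable but not in testing, most popular first."""
    missing = [(src, by_source[src]) for src in unstable - testing if src in by_source]
    return sorted(missing, key=lambda item: (-item[1], item[0]))


# Made-up toy data, only to show the shape of the inputs.
binary_insts = {"libfoo1": 500, "libfoo-dev": 120, "bar": 40}
binary_to_source = {"libfoo1": "foo", "libfoo-dev": "foo", "bar": "bar"}
by_source = popcon_by_source(binary_insts, binary_to_source)
print(missing_from_testing({"foo", "bar", "baz"}, {"bar"}, by_source))
```

As in the SQL version, a source package without any popcon data (baz here) simply does not show up in the result.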

And the results are available on the web!

Top packages (> 1000 insts):

lzo	64962
gnome-cups-manager	32346
db4.6	20708
ffmpeg-debian	12908
freetype1	10569
flashplugin-nonfree	7116
perlftlib	6769
nvidia-graphics-drivers	3864
wxwindows2.4	3640
dvi2tty	2239
kdebase-runtime	1725
easytag	1717
g-wrap	1582
yaird	1507
slocate	1499
youtube-dl	1390
hugin	1275
w3c-libwww	1058

Interested in UDD? Join #debian-qa or debian-qa@lists.d.o (or talk to me @DebConf!)

Exporting logs from Suunto X6HR watches on Linux

I’m the happy owner of a nice geeky toy: a Suunto X6HR watch, that includes an altimeter and a heart rate monitor, which I use mainly for mountain biking and hiking.

During outings, the watch can log the altitude and heart rate every 2, 10 or 60 seconds, and the data can be transferred to a PC using a serial interface. The problem is that Suunto only provides software for Windows. I got tired of using virtualbox to connect to the watch (qemu doesn’t work, Suunto Activity Manager apparently does strange things with the serial port), so I reverse-engineered the protocol (using skimanager and Jérome Kieffer’s work as a basis) and implemented a script to fetch the logs, and export them in a format suitable for gnuplot.
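The export side is the easy part. As an illustration only (this is not the actual script, and it says nothing about the serial protocol), here is a Python sketch that turns already-decoded samples into whitespace-separated columns that gnuplot can read. The function name, the column layout and the sample values are all made up.

```python
LOG_INTERVALS_S = (2, 10, 60)  # logging intervals offered by the watch


def to_gnuplot(samples, interval_s):
    """Format (altitude_m, heart_rate_bpm) samples as gnuplot data lines.

    Samples are assumed to be evenly spaced by interval_s seconds, so the
    first column is simply the elapsed time since the start of the log.
    """
    if interval_s not in LOG_INTERVALS_S:
        raise ValueError("unexpected logging interval: %r" % (interval_s,))
    lines = ["# elapsed_s altitude_m heart_rate_bpm"]
    for i, (altitude, heart_rate) in enumerate(samples):
        lines.append("%d %d %d" % (i * interval_s, altitude, heart_rate))
    return "\n".join(lines) + "\n"


# Made-up samples: three readings, 10 seconds apart.
print(to_gnuplot([(312, 95), (315, 110), (321, 124)], 10), end="")
```

If the result is saved as ride.dat (a hypothetical file name), something like plot "ride.dat" using 1:2 with lines in gnuplot draws the altitude profile, and using 1:3 draws the heart rate.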

Of course, Suuntux is publicly available. I’d be happy to hear from you if it works for you too. Also, if you own a Suunto X6 (similar watch, without HRM), I’d be interested in supporting it too (if it’s not supported already).

Below is an example graph, from a short mountain bike ride just before leaving for DebConf.

example suuntux output