Ubuntu information on the Debian Package Tracking System and the Developer Packages Overview

Users of Debian derivatives sometimes report bugs that also affect Debian but
have never been reported to the Debian BTS. A few times already, looking at the
Ubuntu bugs for my packages has allowed me to fix a previously unreported bug
in my Debian packages.

But it’s difficult to keep track of the status of our packages in Ubuntu, since
Launchpad doesn’t provide a per-Debian-maintainer summary. Since it’s always fun to abuse proprietary software, I fetched all the bug data from Launchpad and inserted it into an SQLite DB (this takes about 30 minutes at 1200 HTTP requests per minute; it would be so much easier if the Launchpad devs added a text export of all bugs).

The result is that there’s now an “Ubuntu” box on the Package Tracking System, showing the current version in Ubuntu, a link to the Ubuntu patch (if any), and the number of open bugs. Christoph Berg has also added an Ubuntu column to the Debian Developer Packages Overview, with the current version in Ubuntu and the number of open bugs. It’s hidden by default: click on Display Configuration to enable it (the setting is then stored in a cookie).

I hope that this will help Debian maintainers to track what has been reported/fixed in Ubuntu. Also, if other Debian derivatives want to export the same kind of information, don’t hesitate to contact us.


PS: the data might be slightly outdated, as it is processed on merkel.d.o, which was offline until recently. Expect it to be up to date within the next 24 hours.

Text normalizer, anyone?

When I write text documents (using LaTeX or DocBook), I like to wrap lines, as it makes them easier to edit (fewer things move around on the screen) and gives easy-to-read diffs.

However, I always hesitate before rewrapping paragraphs (using vim’s gqap): it means adding noise to my git history. So I only do it from time to time, in “rewrapping-only” commits. But that sucks, since in the meantime I sometimes make a lot of changes, and my lines grow long again. Of course, I could rewrap my paragraphs before each commit, but simply adding a word to a paragraph might cause all of its lines to be rewrapped.

So what I would need is some kind of “text normalizer” that will:

  • split lines at The Right Place: after ‘.’, ‘,’, ‘:’, ‘;’, etc., so that rewrapping won’t propagate changes too far.
  • understand the basics of LaTeX, so it won’t rewrap
    \begin{tabular}{|l|l|}\hline
    x & y \\
    1 & 2 \\\hline
    \end{tabular}

    or

    \begin{figure}
    \centerline{\includegraphics{fig}}
    \caption{Cool stuff}
    \label{coolstuff}
    \end{figure}

    (vim does rewrap those examples.)

  • be editor-agnostic, so that other committers can use it as well.
  • ideally, support other document formats (DocBook XML) too.
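To make the first two requirements concrete, here is a rough Python sketch of such a normalizer. It is only an illustration under stated assumptions, not an existing tool: the set of protected environments (`VERBATIM_ENVS`), the break characters, and all function names are hypothetical choices. Each paragraph is joined into one string and then broken after ‘.’, ‘,’, ‘:’, ‘;’, ‘!’ or ‘?’ followed by whitespace; lines inside (or opening/closing) a protected environment, and lines with a LaTeX comment, are passed through untouched. It knows nothing about DocBook and will also break after abbreviations such as “e.g.”.

```python
import re

# Hypothetical list of LaTeX environments whose lines must never be reflowed.
VERBATIM_ENVS = {"tabular", "figure", "table", "verbatim", "equation"}

# \begin{name} or \end{name}; a trailing * (figure*) is ignored.
ENV_RE = re.compile(r"\\(begin|end)\{(\w+)\*?\}")
# An unescaped % starts a LaTeX comment; joining such a line with the next
# one would comment out text, so those lines are kept as-is.
COMMENT_RE = re.compile(r"(?<!\\)%")
# Break after clause/sentence punctuation that is followed by whitespace.
BREAK_RE = re.compile(r"([.,:;!?])\s+")


def split_paragraph(lines):
    """Join a paragraph into one string, then put each clause on its own line."""
    joined = " ".join(line.strip() for line in lines)
    return BREAK_RE.sub(r"\1\n", joined).split("\n")


def normalize(text):
    """Re-split ordinary paragraphs; keep protected lines and blank lines."""
    out, par, depth = [], [], 0

    def flush():
        if par:
            out.extend(split_paragraph(par))
            par.clear()

    for line in text.split("\n"):
        envs = [(kind, name) for kind, name in ENV_RE.findall(line)
                if name in VERBATIM_ENVS]
        if depth > 0 or envs or COMMENT_RE.search(line):
            # Inside or at the edge of a protected environment, or a comment.
            flush()
            out.append(line)
            for kind, _ in envs:
                depth += 1 if kind == "begin" else -1
        elif not line.strip():
            # A blank line ends the paragraph and is preserved.
            flush()
            out.append(line)
        else:
            par.append(line)
    flush()
    return "\n".join(out)
```

The property that matters for small diffs is that the output depends only on the words and punctuation, not on the previous wrapping: running `normalize` on already-normalized text returns it unchanged, and adding a word only changes the line holding that clause.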

I’ve looked at plasTeX: I could use it to parse a LaTeX document, and export it as LaTeX. But then it would be a LaTeX-only solution. Does anyone have a better solution?

Datamining Launchpad bugs

One thing that is really annoying about Launchpad is its lack of interfaces with the outside world. There is no SOAP interface (well, I think work is being done on this) and no easy way to export all bugs. The only way to get all the bug data in a machine-parseable format is to first fetch this URL, and then, for each bug number listed there, to make another request for https://launchpad.net/bugs/$bug/+text. I filed a bug a few weeks ago, asking for a simpler way to get all the data.

A Launchpad dev suggested doing exactly what I just described (fetch all the bug numbers, then fetch the data for each bug). I had originally dismissed the idea because it sounded too dirty/aggressive/whatever, but since I needed to practice Python, I gave it a try. And it actually works: I was able to get all the data in less than an hour (though that probably put some load on Launchpad ;-)).
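The two-step approach can be sketched in Python roughly as follows. This is not the actual fetching script: it assumes the listing page links every bug as `/bugs/NNN`, stores the raw `+text` output without parsing it (its exact format is not described here), and throttles to the 1200 requests per minute quoted earlier on this page. The table name `raw_bugs` and the function names are hypothetical; the `fetch` parameter only exists so the loop can be exercised without hitting Launchpad.

```python
import re
import sqlite3
import time
import urllib.request

# Per-bug plain-text export, as described above.
BUG_TEXT_URL = "https://launchpad.net/bugs/{bug}/+text"
# Throttle to at most 1200 requests per minute (one every 0.05 s).
MIN_INTERVAL = 60.0 / 1200


def http_get(url):
    """Fetch a URL and return its body decoded as UTF-8."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def bug_numbers(listing_html):
    """Unique bug numbers, in order of appearance, from links like /bugs/NNN.

    Assumption: the listing page links every bug this way."""
    found = (int(n) for n in re.findall(r"/bugs/(\d+)", listing_html))
    return list(dict.fromkeys(found))


def fetch_all(listing_url, db, fetch=http_get, interval=MIN_INTERVAL):
    """Fetch the listing, then each bug's +text page, storing the raw text."""
    db.execute("create table if not exists raw_bugs "
               "(bug integer primary key, body text)")
    for bug in bug_numbers(fetch(listing_url)):
        started = time.monotonic()
        body = fetch(BUG_TEXT_URL.format(bug=bug))
        db.execute("insert or replace into raw_bugs (bug, body) values (?, ?)",
                   (bug, body))
        # Sleep for whatever remains of this request's time slot.
        remaining = interval - (time.monotonic() - started)
        if remaining > 0:
            time.sleep(remaining)
    db.commit()
```

In real use this would be called as `fetch_all(listing_url, sqlite3.connect("bugs.db"))`, with `listing_url` standing for the listing URL mentioned above; parsing the stored text into `bugs`, `subscribers` and `tasks` tables is a separate step.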

This makes it possible to write cool SQL queries.

Bugs with the most subscribers:

select bugs.bug, title, count(*) as subscribers
from bugs, subscribers
where bugs.bug = subscribers.bug
group by bugs.bug, title
order by subscribers desc
limit 10;
bug | title | subscribers
188540 | firefox-3.0 crashed with SIGSEGV in g_slice_alloc() | 291
154697 | package update-manager 1:0.81 failed to install/upgrade: ErrorMessage: SystemError in cache.commit(): E:Sub-process /tmp/tmpjP6Bsx/backports/usr/bin/dpkg returned an error code (1), E:Sub-process /tmp/tmpjP6Bsx/backports/usr/bin/dpkg returned an error code (1), E:Sub-process /tmp/tmpjP6Bsx/backports/usr/bin/dpkg returned an error code (1), E:Sub-process /tmp/tmpjP6Bsx/backports/usr/bin/dpkg returned an error code (1) | 278
141613 | npviewer.bin crashed with SIGSEGV | 262
59695 | High frequency of load/unload cycles on some hard disks may shorten lifetime | 182
215005 | jockey-gtk crashed with AttributeError in enables_composite() | 171
216043 | compiz.real crashed with SIGSEGV | 168
121653 | [gutsy] fglrx breaks over suspend/resume | 144
1 | Microsoft has a majority market share | 142
145360 | compiz.real crashed with SIGSEGV | 134
23369 | firefox(-gnome-support) should get proxy from gconf | 126

Bugs where someone is subscribed twice:

select bug, subscriber_login as subscriber
from subscribers
group by bug, subscriber_login
having count(*) > 1;
bug | subscriber
33065 | mvo
48262 | mvo
144628 | skyguy
158126 | benekal
213741 | sandro-grundmann
216043 | jotacayul
221630 | kami911

(Yes, that forced me to change a primary key.)
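Why the duplicates break a key can be shown with a small, hypothetical schema; the real DB creation script may look different. If `subscribers` is keyed on `(bug, subscriber_login)`, the second subscription of the same person is rejected with an `IntegrityError`. One possible fix, assumed here rather than taken from the actual script, is a surrogate integer key, after which the “subscribed twice” query above finds the pair.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Hypothetical original schema: one row per (bug, subscriber) pair.
db.execute("""create table subscribers_v1 (
    bug integer, subscriber_login text,
    primary key (bug, subscriber_login))""")

# Hypothetical fixed schema: a surrogate key, so duplicate pairs are allowed.
db.execute("""create table subscribers (
    id integer primary key, bug integer, subscriber_login text)""")

rows = [(33065, "mvo"), (33065, "mvo")]  # same person subscribed twice

rejected = False
try:
    db.executemany("insert into subscribers_v1 values (?, ?)", rows)
except sqlite3.IntegrityError:
    rejected = True  # the (bug, subscriber_login) key refuses the second row

db.executemany(
    "insert into subscribers (bug, subscriber_login) values (?, ?)", rows)

# The "subscribed twice" query from above, run on the new table.
dups = db.execute("""select bug, subscriber_login as subscriber
    from subscribers
    group by bug, subscriber_login
    having count(*) > 1""").fetchall()
```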

Packages with the most bugs:

select package, count(distinct bug) as cnt
from tasks
group by package
order by cnt desc
limit 10;
package | bugs
ubuntu | 5392
linux | 1464
linux-source-2.6.20 | 1034
update-manager | 826
linux-source-2.6.22 | 724
firefox | 684
kdebase | 673
firefox-3.0 | 668
ubiquity | 590
openoffice.org | 566

Bugs with the shortest titles:

select bug, title, length(title) as len
from bugs
order by len asc
limit 5;
bug | title | length
190560 | | 1
160381 | uh | 2
224350 | css | 3
133621 | gnus | 4
138052 | pbe5 | 4

If you want to play too, you can fetch the SQLite3 DB (5.8M, lzma-compressed), the DB creation script, and the script that fetches the bugs and imports them into the DB. Comments on my code would be much appreciated (stuff like “oh, there’s a better way to do that in Python!”), as I’m not very confident about my pythonic skills. :-)
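For anyone who wants to query the downloaded dump from Python, a minimal sketch follows. It assumes only what is stated above (an lzma-compressed SQLite3 file with a `tasks` table having `bug` and `package` columns); the file names are placeholders. Python’s `lzma.open` auto-detects both the legacy `.lzma` format and `.xz` when reading.

```python
import lzma
import shutil
import sqlite3


def open_compressed_db(compressed_path, db_path):
    """Decompress an lzma-compressed SQLite file to db_path and open it."""
    with lzma.open(compressed_path, "rb") as src, open(db_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return sqlite3.connect(db_path)


def top_packages(db, limit=10):
    """The "Packages with the most bugs" query from above."""
    return db.execute(
        """select package, count(distinct bug) as cnt
           from tasks
           group by package
           order by cnt desc
           limit ?""", (limit,)).fetchall()
```

Usage would look like `top_packages(open_compressed_db("bugs.db.lzma", "bugs.db"))`, where both file names are hypothetical.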

Update: apparently, I’m not actually fetching all the bugs. I get the same results as when you just press “Search” on https://launchpad.net/ubuntu/+bugs. But if you click on “Advanced search”, select all the bug statuses, and click search, you get a lot more bugs (154066 vs 49031). If someone knows which bugs are excluded by the default search, I’m interested!

Update 2: Got it. Apparently the default search doesn’t list bugs that have all their “tasks” marked “Won’t fix”, “Fix Released”, or “Invalid”.
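With a complete dump, the rule from Update 2 can be expressed as a query: a bug is hidden from the default search when none of its tasks is in an “open” state. The sketch below assumes a hypothetical `status` column in the `tasks` table and assumes the status strings are stored exactly as written in `CLOSED`; neither is confirmed by the DB creation script.

```python
import sqlite3

# Statuses that hide a bug from the default search when *all* of its tasks
# have one of them. The exact stored strings are an assumption.
CLOSED = ("Won't Fix", "Fix Released", "Invalid")


def hidden_from_default_search(db):
    """Bugs whose every task is closed (assumes a tasks.status column)."""
    # "status not in (...)" is 1 for an open task and 0 for a closed one,
    # so a sum of 0 means every task of the bug is closed.
    query = """select bug from tasks
               group by bug
               having sum(status not in (?, ?, ?)) = 0
               order by bug"""
    return [row[0] for row in db.execute(query, CLOSED)]
```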