One think that really is annoying with Launchpad is its lack of interfaces with the outside world. No SOAP interface (well, I think that work is being done on this), no easy way to export all bugs. The only way to get all the bug data in a machine-parseable is to first fetch this URL, and then, for each bug number listed there, to make another request for https://launchpad.net/bugs/$bug/+text. I filed a bug a few weeks ago, asking for a simpler way to get all the data.
A Launchpad dev suggested to do what I just described (fetch all the number, then fetch the data for each bug). I originally dismissed the idea because it just sounded too dirty/aggressive/whatever, but since I needed to practice python, I gave it a try. And actually, it works: I was able to get all the data in less than an hour (but that probably put some load on Launchpad ;-)).
That allows to write cool SQL queries.
Bugs with the most subscribers:
select bugs.bug, title, count(*) as subscribers from bugs, subscribers where bugs.bug = subscribers.bug group by bugs.bug, title order by subscribers desc limit 10;
|188540||firefox-3.0 crashed with SIGSEGV in g_slice_alloc()||291|
|154697||package update-manager 1:0.81 failed to install/upgrade: ErrorMessage: SystemError in cache.commit(): E:Sub-process /tmp/tmpjP6Bsx/backports/usr/bin/dpkg returned an error code (1), E:Sub-process /tmp/tmpjP6Bsx/backports/usr/bin/dpkg returned an error code (1), E:Sub-process /tmp/tmpjP6Bsx/backports/usr/bin/dpkg returned an error code (1), E:Sub-process /tmp/tmpjP6Bsx/backports/usr/bin/dpkg returned an error code (1)||278|
|141613||npviewer.bin crashed with SIGSEGV||262|
|59695||High frequency of load/unload cycles on some hard disks may shorten lifetime||182|
|215005||jockey-gtk crashed with AttributeError in enables_composite()||171|
|216043||compiz.real crashed with SIGSEGV||168|
|121653||[gutsy] fglrx breaks over suspend/resume||144|
|1||Microsoft has a majority market share||142|
|145360||compiz.real crashed with SIGSEGV||134|
|23369||firefox(-gnome-support) should get proxy from gconf||126|
Bugs where someone is subscribed twice:
select bug, subscriber_login as cnt from subscribers group by bug, subscriber_login having count(*) > 1;
(Yes, that forced me to change a primary key)
Packages with the most bugs:
select package, count(distinct bug) as cnt from tasks group by package order by cnt desc limit 10;
Bugs with the shortest titles:
select bug, title, length(title) as len from bugs order by len asc limit 5;
If you want to play too, you can fetch the SQLite3 DB (5.8M, lzma-compressed), the DB creation script, and the script that fetches the bugs and import them into the DB. Comments about my code would be very appreciated (stuff like “oh, there’s a better way to do that in python!”), as I’m not very confident about my pythonic skills. :-)
Update: apparently, I’m not really fetching all the bugs. I’m getting the same results as when you just press “Search” on https://launchpad.net/ubuntu/+bugs. But if you click on “Advanced search”, then select all the bug statuses, and click search, you get a lot more bugs (154066 vs 49031). If someone know which bugs are excluded with the default search, I’m interested!
Update 2: Got it. Apparently the default search doesn’t list bugs that have all their “tasks” marked “Won’t fix”, “Fix Released”, or “Invalid”.