One think that really is annoying with Launchpad is its lack of interfaces with the outside world. No SOAP interface (well, I think that work is being done on this), no easy way to export all bugs. The only way to get all the bug data in a machine-parseable is to first fetch this URL, and then, for each bug number listed there, to make another request for https://launchpad.net/bugs/$bug/+text. I filed a bug a few weeks ago, asking for a simpler way to get all the data.
A Launchpad dev suggested to do what I just described (fetch all the number, then fetch the data for each bug). I originally dismissed the idea because it just sounded too dirty/aggressive/whatever, but since I needed to practice python, I gave it a try. And actually, it works: I was able to get all the data in less than an hour (but that probably put some load on Launchpad ;-)).
That allows to write cool SQL queries.
Bugs with the most subscribers:
select bugs.bug, title, count(*) as subscribers
from bugs, subscribers
where bugs.bug = subscribers.bug
group by bugs.bug, title
order by subscribers desc
limit 10;
| bug |
firefox |
subscribers |
| 188540 |
firefox-3.0 crashed with SIGSEGV in g_slice_alloc() |
291 |
| 154697 |
package update-manager 1:0.81 failed to install/upgrade: ErrorMessage: SystemError in cache.commit(): E:Sub-process /tmp/tmpjP6Bsx/backports/usr/bin/dpkg returned an error code (1), E:Sub-process /tmp/tmpjP6Bsx/backports/usr/bin/dpkg returned an error code (1), E:Sub-process /tmp/tmpjP6Bsx/backports/usr/bin/dpkg returned an error code (1), E:Sub-process /tmp/tmpjP6Bsx/backports/usr/bin/dpkg returned an error code (1) |
278 |
| 141613 |
npviewer.bin crashed with SIGSEGV |
262 |
| 59695 |
High frequency of load/unload cycles on some hard disks may shorten lifetime |
182 |
| 215005 |
jockey-gtk crashed with AttributeError in enables_composite() |
171 |
| 216043 |
compiz.real crashed with SIGSEGV |
168 |
| 121653 |
[gutsy] fglrx breaks over suspend/resume |
144 |
| 1 |
Microsoft has a majority market share |
142 |
| 145360 |
compiz.real crashed with SIGSEGV |
134 |
| 23369 |
firefox(-gnome-support) should get proxy from gconf |
126 |
Bugs where someone is subscribed twice:
select bug, subscriber_login as cnt
from subscribers
group by bug, subscriber_login
having count(*) > 1;
| bug |
subscriber |
| 33065 |
mvo |
| 48262 |
mvo |
| 144628 |
skyguy |
| 158126 |
benekal |
| 213741 |
sandro-grundmann |
| 216043 |
jotacayul |
| 221630 |
kami911 |
(Yes, that forced me to change a primary key)
Packages with the most bugs:
select package, count(distinct bug) as cnt
from tasks
group by package
order by cnt desc
limit 10;
| package |
number |
| ubuntu |
5392 |
| linux |
1464 |
| linux-source-2.6.20 |
1034 |
| update-manager |
826 |
| linux-source-2.6.22 |
724 |
| firefox |
684 |
| kdebase |
673 |
| firefox-3.0 |
668 |
| ubiquity |
590 |
| openoffice.org |
566 |
Bugs with the shortest titles:
select bug, title, length(title) as len
from bugs
order by len asc
limit 5;
| bug |
title |
length |
| 190560 |
– |
1 |
| 160381 |
uh |
2 |
| 224350 |
css |
3 |
| 133621 |
gnus |
4 |
| 138052 |
pbe5 |
4 |
If you want to play too, you can fetch the SQLite3 DB (5.8M, lzma-compressed), the DB creation script, and the script that fetches the bugs and import them into the DB. Comments about my code would be very appreciated (stuff like “oh, there’s a better way to do that in python!”), as I’m not very confident about my pythonic skills. :-)
Update: apparently, I’m not really fetching all the bugs. I’m getting the same results as when you just press “Search” on https://launchpad.net/ubuntu/+bugs. But if you click on “Advanced search”, then select all the bug statuses, and click search, you get a lot more bugs (154066 vs 49031). If someone know which bugs are excluded with the default search, I’m interested!
Update 2: Got it. Apparently the default search doesn’t list bugs that have all their “tasks” marked “Won’t fix”, “Fix Released”, or “Invalid”.