One think that really is annoying with Launchpad is its lack of interfaces with the outside world. No SOAP interface (well, I think that work is being done on this), no easy way to export all bugs. The only way to get all the bug data in a machine-parseable is to first fetch this URL, and then, for each bug number listed there, to make another request for https://launchpad.net/bugs/$bug/+text. I filed a bug a few weeks ago, asking for a simpler way to get all the data.
A Launchpad dev suggested to do what I just described (fetch all the number, then fetch the data for each bug). I originally dismissed the idea because it just sounded too dirty/aggressive/whatever, but since I needed to practice python, I gave it a try. And actually, it works: I was able to get all the data in less than an hour (but that probably put some load on Launchpad ;-)).
That allows to write cool SQL queries.
Bugs with the most subscribers:
select bugs.bug, title, count(*) as subscribers
from bugs, subscribers
where bugs.bug = subscribers.bug
group by bugs.bug, title
order by subscribers desc
limit 10;
bug |
firefox |
subscribers |
188540 |
firefox-3.0 crashed with SIGSEGV in g_slice_alloc() |
291 |
154697 |
package update-manager 1:0.81 failed to install/upgrade: ErrorMessage: SystemError in cache.commit(): E:Sub-process /tmp/tmpjP6Bsx/backports/usr/bin/dpkg returned an error code (1), E:Sub-process /tmp/tmpjP6Bsx/backports/usr/bin/dpkg returned an error code (1), E:Sub-process /tmp/tmpjP6Bsx/backports/usr/bin/dpkg returned an error code (1), E:Sub-process /tmp/tmpjP6Bsx/backports/usr/bin/dpkg returned an error code (1) |
278 |
141613 |
npviewer.bin crashed with SIGSEGV |
262 |
59695 |
High frequency of load/unload cycles on some hard disks may shorten lifetime |
182 |
215005 |
jockey-gtk crashed with AttributeError in enables_composite() |
171 |
216043 |
compiz.real crashed with SIGSEGV |
168 |
121653 |
[gutsy] fglrx breaks over suspend/resume |
144 |
1 |
Microsoft has a majority market share |
142 |
145360 |
compiz.real crashed with SIGSEGV |
134 |
23369 |
firefox(-gnome-support) should get proxy from gconf |
126 |
Bugs where someone is subscribed twice:
select bug, subscriber_login as cnt
from subscribers
group by bug, subscriber_login
having count(*) > 1;
bug |
subscriber |
33065 |
mvo |
48262 |
mvo |
144628 |
skyguy |
158126 |
benekal |
213741 |
sandro-grundmann |
216043 |
jotacayul |
221630 |
kami911 |
(Yes, that forced me to change a primary key)
Packages with the most bugs:
select package, count(distinct bug) as cnt
from tasks
group by package
order by cnt desc
limit 10;
package |
number |
ubuntu |
5392 |
linux |
1464 |
linux-source-2.6.20 |
1034 |
update-manager |
826 |
linux-source-2.6.22 |
724 |
firefox |
684 |
kdebase |
673 |
firefox-3.0 |
668 |
ubiquity |
590 |
openoffice.org |
566 |
Bugs with the shortest titles:
select bug, title, length(title) as len
from bugs
order by len asc
limit 5;
bug |
title |
length |
190560 |
– |
1 |
160381 |
uh |
2 |
224350 |
css |
3 |
133621 |
gnus |
4 |
138052 |
pbe5 |
4 |
If you want to play too, you can fetch the SQLite3 DB (5.8M, lzma-compressed), the DB creation script, and the script that fetches the bugs and import them into the DB. Comments about my code would be very appreciated (stuff like “oh, there’s a better way to do that in python!”), as I’m not very confident about my pythonic skills. :-)
Update: apparently, I’m not really fetching all the bugs. I’m getting the same results as when you just press “Search” on https://launchpad.net/ubuntu/+bugs. But if you click on “Advanced search”, then select all the bug statuses, and click search, you get a lot more bugs (154066 vs 49031). If someone know which bugs are excluded with the default search, I’m interested!
Update 2: Got it. Apparently the default search doesn’t list bugs that have all their “tasks” marked “Won’t fix”, “Fix Released”, or “Invalid”.