{"id":293,"date":"2008-06-07T23:43:55","date_gmt":"2008-06-07T21:43:55","guid":{"rendered":"http:\/\/www.lucas-nussbaum.net\/blog\/?p=293"},"modified":"2008-06-08T00:18:43","modified_gmt":"2008-06-07T22:18:43","slug":"dumping-launchpad-bugs-as-text","status":"publish","type":"post","link":"https:\/\/www.lucas-nussbaum.net\/blog\/?p=293","title":{"rendered":"Datamining Launchpad bugs"},"content":{"rendered":"<p>One think that really is annoying with Launchpad is its lack of interfaces with the outside world. No SOAP interface (well, I think that work is being done on this), no easy way to export all bugs. The only way to get all the bug data in a machine-parseable is to first fetch <a href=\"https:\/\/launchpad.net\/ubuntu\/+bugs-text\">this URL<\/a>, and then, for each bug number listed there, to make another request for <a href=\"https:\/\/launchpad.net\/bugs\/33065\/+text\">https:\/\/launchpad.net\/bugs\/$bug\/+text<\/a>. I filed <a href=\"https:\/\/bugs.launchpad.net\/malone\/+bug\/231402\">a bug<\/a> a few weeks ago, asking for a simpler way to get all the data.<\/p>\n<p>A Launchpad dev suggested to do what I just described (fetch all the number, then fetch the data for each bug). I originally dismissed the idea because it just sounded too dirty\/aggressive\/whatever, but since I needed to practice python, I gave it a try. And actually, it works: I was able to get all the data in less than an hour (but that probably put some load on Launchpad ;-)).<\/p>\n<p>That allows to write cool SQL queries.<\/p>\n<p><b>Bugs with the most subscribers:<\/b><\/p>\n<pre>select bugs.bug, title, count(*) as subscribers\r\nfrom bugs, subscribers\r\nwhere bugs.bug = subscribers.bug\r\ngroup by bugs.bug, title\r\norder by subscribers desc\r\nlimit 10;<\/pre>\n<table border=\"1\">\n<tr>\n<th>bug<\/th>\n<th>firefox<\/th>\n<th>subscribers<\/th>\n<\/tr>\n<tr>\n<td>188540<\/td>\n<td>firefox-3.0 crashed with SIGSEGV in g_slice_alloc()<\/td>\n<td>291<\/td>\n<\/tr>\n<tr>\n<td>154697<\/td>\n<td>package update-manager 1:0.81 failed to install\/upgrade: ErrorMessage: SystemError in cache.commit(): E:Sub-process \/tmp\/tmpjP6Bsx\/backports\/usr\/bin\/dpkg returned an error code (1), E:Sub-process \/tmp\/tmpjP6Bsx\/backports\/usr\/bin\/dpkg returned an error code (1), E:Sub-process \/tmp\/tmpjP6Bsx\/backports\/usr\/bin\/dpkg returned an error code (1), E:Sub-process \/tmp\/tmpjP6Bsx\/backports\/usr\/bin\/dpkg returned an error code (1)<\/td>\n<td>278<\/td>\n<\/tr>\n<tr>\n<td>141613<\/td>\n<td>npviewer.bin crashed with SIGSEGV<\/td>\n<td>262<\/td>\n<\/tr>\n<tr>\n<td>59695<\/td>\n<td>High frequency of load\/unload cycles on some hard disks may shorten lifetime<\/td>\n<td>182<\/td>\n<\/tr>\n<tr>\n<td>215005<\/td>\n<td>jockey-gtk crashed with AttributeError in enables_composite()<\/td>\n<td>171<\/td>\n<\/tr>\n<tr>\n<td>216043<\/td>\n<td>compiz.real crashed with SIGSEGV<\/td>\n<td>168<\/td>\n<\/tr>\n<tr>\n<td>121653<\/td>\n<td>[gutsy] fglrx breaks over suspend\/resume<\/td>\n<td>144<\/td>\n<\/tr>\n<tr>\n<td>1<\/td>\n<td>Microsoft has a majority market share<\/td>\n<td>142<\/td>\n<\/tr>\n<tr>\n<td>145360<\/td>\n<td>compiz.real crashed with SIGSEGV<\/td>\n<td>134<\/td>\n<\/tr>\n<tr>\n<td>23369<\/td>\n<td>firefox(-gnome-support) should get proxy from gconf<\/td>\n<td>126<\/td>\n<\/tr>\n<\/table>\n<p><b>Bugs where someone is subscribed twice:<\/b><\/p>\n<pre>select bug, subscriber_login as cnt\r\nfrom subscribers\r\ngroup by bug, subscriber_login\r\nhaving count(*) > 1;<\/pre>\n<table border=\"1\">\n<tr>\n<th>bug<\/th>\n<th>subscriber<\/th>\n<\/tr>\n<tr>\n<td>33065<\/td>\n<td>mvo<\/td>\n<\/tr>\n<tr>\n<td>48262<\/td>\n<td>mvo<\/td>\n<\/tr>\n<tr>\n<td>144628<\/td>\n<td>skyguy<\/td>\n<\/tr>\n<tr>\n<td>158126<\/td>\n<td>benekal<\/td>\n<\/tr>\n<tr>\n<td>213741<\/td>\n<td>sandro-grundmann<\/td>\n<\/tr>\n<tr>\n<td>216043<\/td>\n<td>jotacayul<\/td>\n<\/tr>\n<tr>\n<td>221630<\/td>\n<td>kami911<\/td>\n<\/tr>\n<\/table>\n<p>(Yes, that forced me to change a primary key)<\/p>\n<p><b>Packages with the most bugs:<\/b><\/p>\n<pre>select package, count(distinct bug) as cnt\r\nfrom tasks\r\ngroup by package\r\norder by cnt desc\r\nlimit 10;<\/pre>\n<table border=\"1\">\n<tr>\n<th>package<\/th>\n<th>number<\/th>\n<\/tr>\n<tr>\n<td>ubuntu<\/td>\n<td>5392<\/td>\n<\/tr>\n<tr>\n<td>linux<\/td>\n<td>1464<\/td>\n<\/tr>\n<tr>\n<td>linux-source-2.6.20<\/td>\n<td>1034<\/td>\n<\/tr>\n<tr>\n<td>update-manager<\/td>\n<td>826<\/td>\n<\/tr>\n<tr>\n<td>linux-source-2.6.22<\/td>\n<td>724<\/td>\n<\/tr>\n<tr>\n<td>firefox<\/td>\n<td>684<\/td>\n<\/tr>\n<tr>\n<td>kdebase<\/td>\n<td>673<\/td>\n<\/tr>\n<tr>\n<td>firefox-3.0<\/td>\n<td>668<\/td>\n<\/tr>\n<tr>\n<td>ubiquity<\/td>\n<td>590<\/td>\n<\/tr>\n<tr>\n<td>openoffice.org<\/td>\n<td>566<\/td>\n<\/tr>\n<\/table>\n<p><b>Bugs with the shortest titles:<\/b><\/p>\n<pre>select bug, title, length(title) as len\r\nfrom bugs\r\norder by len asc\r\nlimit 5;<\/pre>\n<table border=\"1\">\n<tr>\n<th>bug<\/th>\n<th>title<\/th>\n<th>length<\/th>\n<\/tr>\n<tr>\n<td>190560<\/td>\n<td>&#8211;<\/td>\n<td>1<\/td>\n<\/tr>\n<tr>\n<td>160381<\/td>\n<td>uh<\/td>\n<td>2<\/td>\n<\/tr>\n<tr>\n<td>224350<\/td>\n<td>css<\/td>\n<td>3<\/td>\n<\/tr>\n<tr>\n<td>133621<\/td>\n<td>gnus<\/td>\n<td>4<\/td>\n<\/tr>\n<tr>\n<td>138052<\/td>\n<td>pbe5<\/td>\n<td>4<\/td>\n<\/tr>\n<\/table>\n<p>If you want to play too, you can fetch the <a href=\"http:\/\/blop.info\/pub\/lpbugs\/ubuntubugs.db.lzma\">SQLite3 DB<\/a> (5.8M, lzma-compressed), the <a href=\"http:\/\/blop.info\/pub\/lpbugs\/createdb.py\">DB creation script<\/a>, and the <a href=\"http:\/\/blop.info\/pub\/lpbugs\/launchpad-sucker.py\">script that fetches the bugs and import them into the DB<\/a>. Comments about my code would be very appreciated (stuff like &#8220;oh, there&#8217;s a better way to do that in python!&#8221;), as I&#8217;m not very confident about my pythonic skills. :-)<\/p>\n<p><b>Update:<\/b> apparently, I&#8217;m not really fetching all the bugs. I&#8217;m getting the same results as when you just press &#8220;Search&#8221; on https:\/\/launchpad.net\/ubuntu\/+bugs. But if you click on &#8220;Advanced search&#8221;, then select all the bug statuses, and click search, you get a lot more bugs (154066 vs 49031). If someone know which bugs are excluded with the default search, I&#8217;m interested!<\/p>\n<p><b>Update 2:<\/b> Got it. Apparently the default search doesn&#8217;t list bugs that have all their &#8220;tasks&#8221; marked &#8220;Won&#8217;t fix&#8221;, &#8220;Fix Released&#8221;, or &#8220;Invalid&#8221;.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>One think that really is annoying with Launchpad is its lack of interfaces with the outside world. No SOAP interface (well, I think that work is being done on this), no easy way to export all bugs. The only way to get all the bug data in a machine-parseable is to first fetch this URL, [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"0","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[13,12],"tags":[],"class_list":["post-293","post","type-post","status-publish","format-standard","hentry","category-planetdebian","category-planetubuntu"],"_links":{"self":[{"href":"https:\/\/www.lucas-nussbaum.net\/blog\/index.php?rest_route=\/wp\/v2\/posts\/293","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.lucas-nussbaum.net\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.lucas-nussbaum.net\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.lucas-nussbaum.net\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.lucas-nussbaum.net\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=293"}],"version-history":[{"count":0,"href":"https:\/\/www.lucas-nussbaum.net\/blog\/index.php?rest_route=\/wp\/v2\/posts\/293\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.lucas-nussbaum.net\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=293"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.lucas-nussbaum.net\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=293"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.lucas-nussbaum.net\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=293"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}