UDD and packages metrics

Peter Eisentraut played with Ultimate Debian Database, and wanted to create a “maintenance effort” metric by multiplying each package’s installed size by its popcon. His query is:
SELECT rank() OVER (ORDER BY score DESC), source,
sum(installed_size::numeric * insts) AS score
FROM packages JOIN popcon USING (package)
WHERE distribution = 'debian' AND release = 'sid'
AND component = 'main' AND architecture IN ('all', 'i386')
GROUP BY source, version ORDER BY score DESC LIMIT 30;

Besides all the interesting things that I learnt by looking at his query (rank(), and a bug in UDD because installed_size should really be numeric to avoid the conversion), Peter had a problem with his query: linux-2.6 is missing from the results, while it should obviously have a large popcon and a large install size.

The problem is that the binary packages for linux-2.6 often change, so they don’t get very high in popcon. The unstable kernel package gets a ridiculous popcon score:

select package, insts from popcon
where package in (select package from packages where source ='linux-2.6' and release='sid')
order by insts desc limit 30;

package                           | insts
----------------------------------+-------
linux-libc-dev                    | 38703
linux-source-2.6.29               |   614
linux-headers-2.6.29-2-common     |   256
linux-image-2.6.29-2-amd64        |   239

A solution could be to change the metric to be: MAX(insts over all binary packages from this source package) * SUM(installed_size)
The good thing is that UDD already offers a popcon_src view, that gives the popcon score for a source package. So the query becomes:
SELECT rank() OVER (ORDER BY score DESC), source,
sum(installed_size::numeric * insts) AS score
FROM packages JOIN popcon_src USING (source)
WHERE distribution = 'debian' AND release = 'sid'
AND component = 'main' AND architecture IN ('all', 'i386')
GROUP BY source ORDER BY score DESC LIMIT 30;

 rank |     source    |    score
------+---------------+-------------
1 | openoffice.org    | 92177633504
2 | qt4-x11           | 18503941620
3 | linux-2.6         | 16036201020
4 | gcc-4.3           | 14369300376
5 | mesa              | 12962475968
6 | eglibc            | 12581290240
7 | gcc-4.4           | 11411296672
8 | samba             | 10021083072
9 | xulrunner         |  9037295424
10 | mysql-dfsg-5.0   |  8348333532

This time, linux-2.6 shows up near the top of the list.

One thought on “UDD and packages metrics

  1. The number of lines of debian patches is also an indicator of Maintainer Effort.

Comments are closed.