qemu 0.8.1 and tun/tap networking

qemu’s documentation isn’t really crystal clear, and I lost a lot of time with this today.

The way you can link your guest system and your host system has changed between qemu 0.7 and 0.8, with the introduction of Virtual LANs. They are mandatory: you always have to use one, even if you are only running a single qemu instance.

When running qemu, you have to specify both endpoints:

  • the one to the host system (using -net tap)
  • the one to the guest system (using -net nic)

A working command line to start qemu looks like qemu -net nic -net tap -hda ….
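Concretely, the -net tap endpoint needs a host-side script: by default, qemu 0.8 runs /etc/qemu-ifup with the freshly created tap interface name as its argument. A minimal sketch (the IP address and the disk image name are made-up examples):

```shell
#!/bin/sh
# /etc/qemu-ifup -- qemu calls this with the tap interface name in $1.
# Give the host an address on the virtual link (example address):
ifconfig "$1" 172.20.0.1 netmask 255.255.255.0 up
```

With that in place, qemu -net nic -net tap -hda disk.img boots the guest with a NIC on the default VLAN, reachable from the host through the tap interface (which you can then NAT or bridge as usual).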

IP over DNS

Sometimes, you are at an airport, hotel, train station, … Wireless Internet access is available, but very expensive, and you start feeling unwell because you haven’t checked your mail in the last 6 hours.

Often, DNS works: you can resolve everything, which is quite frustrating… But wait: why couldn’t we tunnel IP packets inside DNS packets?

There are two existing solutions (to my knowledge) to do that. Both work on the same principle: you delegate a subdomain to a special DNS server (so you need to be able to run a DNS server somewhere, which means having UDP port 53 available). Then, the client software issues DNS requests for TXT records in this subdomain, and data is encapsulated into those DNS requests.
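To make the encapsulation concrete, here is a rough sketch of the upstream direction, assuming a base32 encoding (DNS names are case-insensitive, so base64 wouldn’t survive) and the tunnel.example.com delegation used below; real clients differ in the details, and the base32 tool is the one from recent GNU coreutils:

```shell
# Turn a payload into a DNS query name (sketch)
data="some upstream bytes to send"
# base32, lowercased, padding stripped: survives case-folding resolvers
encoded=$(printf '%s' "$data" | base32 | tr -d '=\n' | tr 'A-Z' 'a-z')
# DNS labels are limited to 63 bytes: chop the data into labels, join
# them with dots, and append the delegated zone. The server decodes the
# labels and sends downstream data back in the TXT answer.
labels=$(printf '%s' "$encoded" | fold -w 60 | paste -sd. -)
query="$labels.tunnel.example.com"
echo "$query"
```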

NSTX (latest version, project page, Debian package) is written in C and tries to build a full VPN between you and the server. It works, but not always: some caching DNS servers apparently don’t like its crafted packets. Also, it requires setting up tun interfaces at both ends. So I gave up on it and switched to OzymanDNS.

OzymanDNS (latest release, no real documentation available) is a simpler alternative, written in Perl. It only allows you to tunnel a TCP connection, which is enough, because if you can establish an SSH connection, you can use SSH tunnelling, Socks proxying, or even build a full VPN with OpenSSH 4.3.

To run OzymanDNS, you need to install the necessary Perl modules (Debian packages libnet-dns-perl and libmime-base32-perl, amongst others) on both ends. You also need to delegate a subdomain to your crafted server (we will use tunnel.example.com). Then:

  • On the server, start: ./nomde.pl -i 127.0.0.1 tunnel.example.com (the -i 127.0.0.1 is not important, it’s just the IP address the server returns in case it receives an A query).
  • On the client, run ssh with ProxyCommand: ssh -o ProxyCommand="./droute.pl sshdns.tunnel.example.com" localhost. The sshdns prefix asks the server for a connection to localhost:22.
  • You now have an SSH connection open to the other end.

While it’s not very fast, it’s still quite usable for things like reading mail with mutt. Also, I experienced some crashes while trying to transfer a rather large file with scp over SOCKS over ssh over IP over DNS, so there might be some bugs. If somebody wants to hack on this … :-)
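For the record, combining the tunnel with SSH’s built-in forwarding looks something like this (a sketch: port 1080 is an arbitrary choice, and the -w option needs OpenSSH 4.3 on both ends):

```shell
# SOCKS proxy through the DNS tunnel: point applications at localhost:1080
ssh -o ProxyCommand="./droute.pl sshdns.tunnel.example.com" -N -D 1080 localhost

# Or a full VPN over tun devices (OpenSSH >= 4.3):
# ssh -o ProxyCommand="./droute.pl sshdns.tunnel.example.com" -w 0:0 localhost
```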

Update: Nice presentation about all this stuff here.

Update 2: As mentioned in the comments, there’s also iodine, which is similar to NSTX (tunnelling IP over DNS using tun devices on both ends). I also found DNScat, which is written in Java and can do tunnelling using ppp.

pipes and progress bars

dd is a very frustrating application. You often run it through complex pipes (well, I do), and you never know whether it’s performing well (or whether you forgot to tune the disk performance with hdparm first). The only thing you can do is wait.

Well, not quite. If you had read the doc, you would have found this:

       Sending a USR1 signal to a running ‘dd’ process makes it print I/O
       statistics to standard error and then resume copying.

           $ dd if=/dev/zero of=/dev/null& pid=$!
           $ kill -USR1 $pid; sleep 1; kill $pid
           18335302+0 records in
           18335302+0 records out
           9387674624 bytes (9.4 GB) copied, 34.6279 seconds, 271 MB/s
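Wrapped in a small loop, this becomes a crude progress meter (a sketch for GNU dd on Linux; on systems where dd doesn’t handle USR1, the signal would kill it instead):

```shell
# Start a dd in the background and poll it for I/O statistics.
dd if=/dev/zero of=/dev/null bs=1M count=2048 & pid=$!
sleep 1                                  # let dd install its USR1 handler
while kill -USR1 "$pid" 2>/dev/null; do  # fails once dd has exited
    sleep 1                              # dd prints stats on stderr each time
done
wait "$pid"                              # reap the background dd
```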

Interesting, isn’t it? But I’ve found something even more interesting: pv (packaged in Debian/Ubuntu).

Description: Shell pipeline element to meter data passing through
 pv (Pipe Viewer) can be inserted into any normal pipeline between two
 processes to give a visual indication of how quickly data is passing
 through, how long it has taken, how near to completion it is, and an
 estimate of how long it will be until completion.
 .
 To use it, insert it in a pipeline between two processes, with the
 appropriate options. Its standard input will be passed through to its
 standard output and progress will be shown on standard error.

It’s very easy to use:

# dd if=/dev/hda12 | pv | dd of=/dev/hda13
88.2MB 0:00:04 [23.3MB/s] [                                               ]

And it can even give an ETA if you use it like cat (e.g. pv file | nc -w 1 somewhere.com 3000).

Played with DTrace

Today, I installed NexentaOS on a Grid5000 cluster node. NexentaOS is basically Debian GNU/kOpenSolaris (Debian userland, with an OpenSolaris kernel. APT repository here). It works very well (good hardware support & detection, nice GNOME desktop). And I got the chance to play with DTrace.

Before writing this blog entry, I was considering writing a LinuxFR article about DTrace, when I came across this LinuxFR “journal” about Solaris 10, which gave me a good laugh. The best part is:

– DTrace : Sorte de surcouche de strace/ltrace. C’est peu intéressant. En démonstration, un type utilise DTrace pour “découvrir” que lancer une xterm écrit dans ~/.bash_history. C’est presque comique. (In rough English: DTrace: a sort of layer on top of strace/ltrace. Not very interesting. As a demo, some guy uses DTrace to “discover” that launching an xterm writes to ~/.bash_history. It’s almost comical.)

It’s funny how people continue to compare DTrace to strace. It’s like comparing the GNOME project with fvwm. Yeah, both of them can display windows, and there are still people thinking that fvwm is enough for everybody.

OK, back to DTrace. DTrace is a tracing framework which allows a consumer application (generally a D script) to register with some DTrace providers (probes) and collect data. This nice graph from the DTrace howto explains it much better than I can:

Most system monitoring tools on Linux use polling: they retrieve some data from the system at regular intervals (think of top, “vmstat 1”, …). DTrace uses push instead, which lets you catch events you would never notice on Linux. It also lets you monitor much more than current Linux tools do, in a very easy and clean way.

With DTrace, you can monitor a lot of stuff and find the answer to a lot of questions, like:

  • Monitor process creation, even short-lived processes. The execsnoop script (which would be a one-liner if you removed the output formatting, and is available in the DTrace Toolkit) shows that logging in by ssh and running for i in $(seq 1 3); do /bin/echo $i; done runs the following processes:
         0   1172   1171 sh -c /usr/bin/locale -a
         0   1172   1171 /usr/bin/locale -a
         0   1175   1173 -bash
         0   1177   1176 id -u
         0   1179   1178 dircolors -b
         0   1174   1173 pt_chmod 9
         0   1180   1175 mesg n
         0   1181   1175 seq 1 3
         0   1182   1175 /bin/echo 1
         0   1183   1175 /bin/echo 2
         0   1184   1175 /bin/echo 3
  • Monitor user and library function calls, and profile them like gprof (yeah, DTrace can replace gprof)
  • Monitor system calls for the whole system or a specific app (yeah, DTrace can replace strace, but you already knew that ;). And you don’t need to restart the app before monitoring it.
  • Replace vmstat. Of course, you can get the usual vmstat results, but you can also restrict them to the events caused by a specific process.
  • Measure the average latency between GET requests and their results when you browse the web using Mozilla. DTrace does this by monitoring the write syscalls issued by your browser that contain a GET, and measuring the delay before the subsequent read returns.
  • Monitor all open syscalls issued on the whole system
  • Monitor all TCP connections received by the system
  • Analyze disk I/O : how much data was written/read to/from the disk, by which process. Nice way to understand buffering and I/O scheduling.
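For flavour, the “all open syscalls” bullet really is a one-liner in D (this must run as root on the Solaris side, so take it as a sketch):

```shell
# Print the process name and the path for every open(2)/open64(2) call
dtrace -n 'syscall::open*:entry { printf("%s %s", execname, copyinstr(arg0)); }'
```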

Another example combining rwsnoop and iosnoop:

  1. I start rwsnoop -n bash (read/write monitor, only on processes named bash) and iosnoop -m / (I/O monitor, only on the root partition)
  2. I run: echo blop > t
  3. rwsnoop shows all the read/write calls issued to write to my pseudo-terminal, and the write call to /root/t:
       UID    PID CMD          D   BYTES FILE
       [...]
         0   1175 bash         R       1 /devices/pseudo/pts@0:1
         0   1175 bash         W       1 /devices/pseudo/pts@0:1
         0   1175 bash         W       5 /root/t
         0   1175 bash         W      32 /devices/pseudo/pts@0:1
         0   1175 bash         W      17 /devices/pseudo/pts@0:1
  4. iosnoop doesn’t display anything. But when I run sync:
       UID   PID D    BLOCK   SIZE       COMM PATHNAME
         0  1264 W    96640   1024       sync /root/t
  5. We can see that the write is buffered in the kernel, and that the 5 chars written to my file were transformed into 1024 bytes written to the disk. (Question for the reader: why 5? Yeah, it’s easy.)

Short conclusion: DTrace looks fantastic. As a toy, it lets you demonstrate and understand the inner workings of Solaris. As a tool, it can probably provide A LOT of useful info, especially since writing DTrace providers seems quite easy (Ruby provider, PHP provider).

Second short conclusion, for those who really haven’t understood anything (some LinuxFR readers ;-)): as you can see, when you run echo blop > t, “blop” is actually written to disk in /root/t. Fabulous, isn’t it?

How to make your hosting provider hate you in a few minutes

Disclaimer: any resemblance to the code of an existing house-swapping website is purely coincidental.

It is very easy to abuse MySQL and turn it into a weapon of mass destruction against your hosting provider’s performance. There are two very simple ways to get there:

  • Transfer huge amounts of data from the SQL server to the web server, as often as possible. Of course, your script doesn’t need all that data, but using a restricted SELECT instead of SELECT *, or worse, using LIMIT, is so complicated… whereas this is so easy to do in PHP. For example, in old versions of SPIP, the function in charge of cleaning the cache (stored in MySQL) transferred the whole cache before deciding whether each entry should be kept. (This has been fixed since February 2005, according to SPIP’s CVS.)
  • Another, more pernicious way is to dynamically generate SQL queries without thinking about the extreme cases of the generated queries. Let’s look at a small example.

Let’s create a small table:

CREATE TABLE `torture` ( `id` INT UNSIGNED NOT NULL AUTO_INCREMENT , `value` INT, PRIMARY KEY ( `id` ) ) TYPE = MYISAM ;

Using a small PHP script, let’s fill it with 10000 values, which isn’t that huge:

for ($i = 0; $i < 10000; $i++) { mysql_query("INSERT INTO torture (value) VALUES (" . rand() . ")"); }

Now, the interesting part: let’s dynamically generate a query.

$sql = "SELECT id FROM torture WHERE TRUE"; for ($i = 0; $i < $n; $i++) { $sql .= " AND value <> " . rand(0, 10000); }

As you will have understood, for n = 10 this gives something like SELECT id FROM torture WHERE TRUE AND value <> 14647 AND value <> 9936 AND value <> 10106 AND value <> 8136 AND value <> 5952 AND value <> 6908 AND value <> 14290 AND value <> 15359 AND value <> 2179 AND value <> 8005.

By arranging to pass the parameter n to the page, it’s easy to test different values of n. I tested on Apinc’s new MySQL server, which is lightly loaded and very fast. While with n = 10 the query only takes 0.03s to execute, it takes 2.8s with n = 1000, and 22s (ouch) with n = 5000. With n = 10000, we reach 38s.

For completeness, note that adding an index on the value column changes nothing: you still can’t avoid evaluating the condition for every row of the table. However, by expressing the same query as SELECT id FROM torture WHERE value NOT IN (2791, 962, 49, 5845, 4425, 4129, 9905, 6468, 9681, 5776), you turn it into a query that executes immediately (0.05s with n = 10000).
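The rewrite is mechanical: instead of appending one AND value <> x predicate per value, collect the values into a single list. A small shell sketch of the string building, using the example values above:

```shell
# Build the fast NOT IN form of the query from a list of excluded values
excluded="2791 962 49 5845 4425 4129 9905 6468 9681 5776"
vals=$(echo "$excluded" | tr ' ' ',' | sed 's/,/, /g')
sql="SELECT id FROM torture WHERE value NOT IN ($vals)"
echo "$sql"
```

This prints exactly the NOT IN query shown above.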

Conclusions:

  • IN and NOT IN are good(tm).
  • Dynamically generated queries whose size you don’t control are bad(tm).