Played with DTrace

Today, I installed NexentaOS on a Grid5000 cluster node. NexentaOS is basically Debian GNU/kOpenSolaris: a Debian userland on top of an OpenSolaris kernel, with its own APT repository. It works very well (good hardware support and detection, nice GNOME desktop). And I got the chance to play with DTrace.

Before writing this blog entry, I was considering writing a LinuxFR article about DTrace, when I came across this LinuxFR “journal” about Solaris 10, which gave me a good laugh. The best part is:

– DTrace: a sort of layer on top of strace/ltrace. Not very interesting. In the demo, a guy uses DTrace to “discover” that launching an xterm writes to ~/.bash_history. It’s almost comical. (Approximate translation from the French.)

It’s funny how people continue to compare DTrace to strace. It’s like comparing the GNOME project with fvwm. Yeah, both of them can display windows, and there are still people thinking that fvwm is enough for everybody.

OK, back to DTrace. DTrace is a tracing framework: a consumer application (generally a D script) enables probes published by DTrace providers and receives data whenever those probes fire. A nice graph in the DTrace howto explains it much better than I do.

Most system monitoring tools on Linux use polling: they retrieve data from the system at regular intervals (think of top, “vmstat 1”, …). DTrace uses push instead: a probe fires when the event happens, which lets you monitor events that you wouldn’t notice on Linux. It also lets you monitor much more than current Linux tools do, in a very easy and clean way.
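To make the polling-versus-push difference concrete, here is a toy Python sketch. It is not DTrace, only a simulation: the process names and lifetimes are made-up values, and the "event" side simply assumes that the tool is notified of every process start.

```python
# Toy model: each process is a (start_ms, end_ms) lifetime.
# Hypothetical data, not measured on a real system.
processes = {
    "sshd": (0, 5000),
    "locale": (120, 128),
    "bash": (400, 5000),
    "dircolors": (410, 413),
}

def seen_by_polling(procs, interval_ms, duration_ms):
    """Names seen by a tool that samples the process table periodically."""
    seen = set()
    for t in range(0, duration_ms + 1, interval_ms):
        for name, (start, end) in procs.items():
            if start <= t < end:  # process is alive at sample time t
                seen.add(name)
    return seen

def seen_by_events(procs):
    """Names seen by a tool that is notified on every process start."""
    return set(procs)

polled = seen_by_polling(processes, interval_ms=1000, duration_ms=5000)
pushed = seen_by_events(processes)

print("polling saw:   ", sorted(polled))
print("polling missed:", sorted(pushed - polled))
```

With a one-second sampling interval, the two processes that live for only a few milliseconds never show up in a sample, while the event-driven view sees all four.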

With DTrace, you can monitor a lot of stuff and answer a lot of questions. For example:

  • Monitor process creation, even for short-lived processes. The execsnoop script (available in the DTrace Toolkit; it would be a one-liner without the output formatting) shows that logging in over ssh and running for i in $(seq 1 3); do /bin/echo $i; done starts the following processes:
         0   1172   1171 sh -c /usr/bin/locale -a
         0   1172   1171 /usr/bin/locale -a
         0   1175   1173 -bash
         0   1177   1176 id -u
         0   1179   1178 dircolors -b
         0   1174   1173 pt_chmod 9
         0   1180   1175 mesg n
         0   1181   1175 seq 1 3
         0   1182   1175 /bin/echo 1
         0   1183   1175 /bin/echo 2
         0   1184   1175 /bin/echo 3
  • Monitor user and library function calls, and profile them like gprof (yeah, DTrace can replace gprof)
  • Monitor system calls for the whole system or a specific app (yeah, DTrace can replace strace, but you already knew that ;). And you don’t need to restart the app before monitoring it.
  • Replace vmstat. You can get the usual vmstat results, but you can also restrict them to the events caused by a specific process.
  • Measure the average latency between GET requests and their responses when you browse the web using mozilla. DTrace does this by monitoring the write syscalls issued by your browser that contain a GET, and measuring the delay before the subsequent read returns.
  • Monitor all open syscalls issued on the whole system
  • Monitor all TCP connections received by the system
  • Analyze disk I/O: how much data was written to or read from the disk, and by which process. A nice way to understand buffering and I/O scheduling.
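The GET latency measurement from the list above boils down to pairing each write that starts with GET with the next read on the same file descriptor. Here is a rough Python sketch of that pairing logic. The trace records, their field layout and the timestamps are all hypothetical; a real D script would collect them from the syscall probes instead.

```python
from statistics import mean

# Hypothetical trace records: (timestamp_ms, syscall, fd, payload)
events = [
    (100.0, "write", 7, b"GET /index.html HTTP/1.1"),
    (142.5, "read", 7, b"HTTP/1.1 200 OK"),
    (150.0, "write", 9, b"POST /form HTTP/1.1"),
    (160.0, "read", 9, b"HTTP/1.1 200 OK"),
    (200.0, "write", 7, b"GET /style.css HTTP/1.1"),
    (217.5, "read", 7, b"HTTP/1.1 200 OK"),
]

def get_latencies(trace):
    """Delay between each GET write and the next read on the same fd."""
    pending = {}  # fd -> timestamp of the outstanding GET
    latencies = []
    for ts, syscall, fd, payload in trace:
        if syscall == "write" and payload.startswith(b"GET "):
            pending[fd] = ts
        elif syscall == "read" and fd in pending:
            latencies.append(ts - pending.pop(fd))
    return latencies

latencies = get_latencies(events)
print("latencies (ms):", latencies)
print("average (ms):  ", mean(latencies))
```

The POST request is ignored because only writes beginning with GET start a measurement.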

Another example combining rwsnoop and iosnoop:

  1. I start rwsnoop -n bash (read/write monitor, only on processes named bash) and iosnoop -m / (I/O monitor, only on the root partition)
  2. I run: echo blop > t
  3. rwsnoop shows all the read/write calls issued on my pseudo-terminal, and the write call to /root/t:
       UID    PID CMD          D   BYTES FILE
       [...]
         0   1175 bash         R       1 /devices/pseudo/pts@0:1
         0   1175 bash         W       1 /devices/pseudo/pts@0:1
         0   1175 bash         W       5 /root/t
         0   1175 bash         W      32 /devices/pseudo/pts@0:1
         0   1175 bash         W      17 /devices/pseudo/pts@0:1
  4. iosnoop doesn’t display anything. But when I run sync:
       UID   PID D    BLOCK   SIZE       COMM PATHNAME
         0  1264 W    96640   1024       sync /root/t
  5. We can see that the write is buffered in the kernel, and that the 5 characters I wrote to my file turned into 1024 bytes written to the disk. (Question for the reader: why 5? Yeah, it’s easy.)
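The jump from 5 bytes to 1024 bytes is what you would expect if the disk write is rounded up to a whole allocation unit. A minimal sketch of that arithmetic, assuming a 1024-byte unit (a guess taken from the SIZE column of the iosnoop output above, not something I checked on the filesystem; the function name is mine):

```python
def bytes_written_to_disk(payload_len, block_size=1024):
    """Round a payload up to a whole number of blocks.

    block_size=1024 is an assumption based on the iosnoop SIZE column.
    """
    blocks = -(-payload_len // block_size)  # ceiling division
    return blocks * block_size

print(bytes_written_to_disk(5))     # the write to /root/t seen by rwsnoop
print(bytes_written_to_disk(1025))  # just over one block
```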

Short conclusion: DTrace looks fantastic. As a toy, it lets you demonstrate and understand the inner workings of Solaris. As a tool, it can probably provide A LOT of useful info, especially since writing DTrace providers seems quite easy (there are already a Ruby provider and a PHP provider).

Second short conclusion for those who really haven’t understood anything (some LinuxFR readers ;-): as you can see, when you run echo blop > t, “blop” is actually written to disk in /root/t. Fabulous, isn’t it?

5 thoughts on “Played with DTrace”

  1. Are you sure that the processes sh -c /usr/bin/locale -a and /usr/bin/locale -a had the same PID? Wasn’t the second one PID 1173 instead?

    By the way, let me remind you (well, it’s mostly for your readers ;-) ) that you don’t need to restart an application to trace it on Linux: strace -p PID.

    As for the rest, DTrace does indeed look interesting. Tell me, does NexentaOS run on SPARC, x86, or both?

  2. <i>Are you sure that the processes sh -c /usr/bin/locale -a and /usr/bin/locale -a had the same PID? Wasn’t the second one PID 1173 instead?</i>

    I haven’t re-checked, but I imagine sh does an exec() in that case, so there is no reason for the PID to change (and dtrace traces the calls to exec()).

    <i>By the way, let me remind you (well, it’s mostly for your readers ;-) ) that you don’t need to restart an application to trace it on Linux: strace -p PID.</i>

    Yes, well, as I said in the post, ptrace is to dtrace what fvwm is to gnome …

    <i>As for the rest, DTrace does indeed look interesting. Tell me, does NexentaOS run on SPARC, x86, or both?</i>

    NexentaOS only runs on x86 (and maybe amd64) for now.

  3. Indeed, as far as the PIDs go, that makes sense.

    A few follow-up questions:

    – I presume you need privileges to observe all the open() calls on the system… or other calls. Am I wrong?

    – What is the status of DTrace, the userspace tool? A standard Debian package (that would surprise me…), a free Solaris tool, not free?

  4. Privilege-wise, I think you need to be root for everything. I don’t know whether it is possible, for example, to monitor your own application as a regular user (as you can with strace).

    As for the status of dtrace, it is free software, and it is packaged in NexentaOS as a Debian package. But you need OpenSolaris underneath anyway.

  5. <i>Privilege-wise, I think you need to be root for everything. I don’t know whether it is possible, for example, to monitor your own application as a regular user (as you can with strace).</i>

    Uh, that is still pretty annoying! Can you imagine announcing "Fvwm is antediluvian compared to Gnome; the only flaw is that Gnome only runs as root"?!
