List operators on files in shell ? (updated)

November 11th, 2006 by lucas

I often need to do list-like operations on files in shell, for example:

  • subtract the lines in one file from the lines in another file
  • add lines from two files, suppressing duplicates
  • keep only lines which are not in both files
  • keep only lines which are in both files
  • etc

Such operations are easy to do with a combination of sort, uniq, cut, diff, etc. But they are such basic operations that it is a bit annoying to write a small shell script each time I need one of them.
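For reference, here is a minimal sketch of the four operations listed above, using only sort and uniq. The function names (list_minus, list_union, list_xor, list_both) are made up for illustration and are not part of any existing tool. The sketch assumes the files can be treated as sets: output comes back sorted and with duplicates removed.

```shell
#!/bin/sh
# Hypothetical helpers; each takes two file names and treats the files as sets.
# "sort -u" first removes duplicates inside each file, so that counting
# occurrences with uniq afterwards tells us which file(s) a line came from.

# Lines of $1 that are not in $2. Feeding $2 twice guarantees that every line
# of $2 appears at least twice, so "uniq -u" keeps only lines unique to $1.
list_minus() {
    { sort -u "$1"; sort -u "$2"; sort -u "$2"; } | sort | uniq -u
}

# Lines of both files, duplicates suppressed.
list_union() {
    sort -u "$1" "$2"
}

# Lines that are in exactly one of the two files.
list_xor() {
    { sort -u "$1"; sort -u "$2"; } | sort | uniq -u
}

# Lines that are in both files.
list_both() {
    { sort -u "$1"; sort -u "$2"; } | sort | uniq -d
}
```

With these definitions, list_minus file1 file2 prints the lines of file1 that do not appear in file2. It works, but it is exactly the kind of snippet I would rather not retype every time.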

Isn’t there a tool out there already providing all of them ?

Also, it would be great if such operations could be restricted to the first n characters or words (a bit like uniq -w, or the removed uniq -W option). That would make it easy to do:

i1 c1
i2 c2
i3 c3
i4 c4

minus

i2
i4
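
A rough sketch of this "minus on the first word" case can be written with awk. The function name minus_by_key is hypothetical, and the sketch assumes that the key is the first whitespace-separated word of each line and that the expected result here is the i1 and i3 lines.

```shell
#!/bin/sh
# Hypothetical helper: print the lines of $1 whose first word does not
# appear as the first word of any line in $2. Input order is preserved,
# so no sorting is needed.
minus_by_key() {
    awk '
        # While reading the first argument (the keys file), remember each key.
        FILENAME == ARGV[1] { if (NF) skip[$1] = 1; next }
        # For the data file, print only lines whose first word was not seen.
        !($1 in skip)
    ' "$2" "$1"
}
```

Called as minus_by_key data keys, with the four-line list as data and the two-line list as keys, this should leave only the lines starting with i1 and i3.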

Comments are open.

Update: many people pointed me to moreutils’ combine. It looks good, but it is not exactly what I need, so I filed wishlist bugs #398187 (combine: provide aliases for set theory operators) and #398193 (combine: allow to compare only on a subset of the lines). I won’t have time to provide patches, so if somebody wants to work on them… :-)

7 Responses to “List operators on files in shell ? (updated)”

  1. Guillaume wrote on 11/11/06 at 4:23 pm :

    The comm command should be a step forward.

  2. lucas wrote on 11/11/06 at 4:42 pm :

    It is, thank you, but it still doesn’t allow limiting the comparison to the first n chars/words…

  3. Raphael Hertzog wrote on 11/11/06 at 5:49 pm :

    You should check out the package “moreutils” by Joey Hess; it contains the “combine” command: “combine file1 and file2”, “combine file1 not file2”, etc.

  4. Damien wrote on 11/11/06 at 6:46 pm :

    Why not make a simple script ? In Ruby it shouldn’t be so hard…

  5. Colin Watson wrote on 11/11/06 at 6:47 pm :

    The join(1) command can do some of this, and can be told to act on only certain fields.

  6. Anonymous wrote on 11/11/06 at 8:21 pm :

    I second the recommendation for “moreutils”.

  7. Rocco Stanzione wrote on 11/12/06 at 9:38 am :

    Try this. Seemed like an interesting problem, and something I deal with a lot too, so I wrote this up. Testing was pretty tedious, but I tested scenarios I could think of. Let me know if something’s broken or if you think it should do more or behave differently.