List operators on files in shell ? (updated)

I often need to do list-like operations on files in shell, for example:

  • subtract the lines in one file from the lines in another file
  • add the lines from two files, suppressing duplicates
  • keep only lines which are not in both files
  • keep only lines which are in both files
  • etc

Such operations are easy to do with a combination of sort, uniq, cut, diff, etc. But they are so basic that it is a bit annoying to write a small shell script each time I need one of them.
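For illustration, here is a minimal sketch of the four operations listed above, using only sort and uniq. The function names (lines_union, lines_inter, lines_xor, lines_minus) are made up for this example and do not come from any existing tool. The sketch assumes that each file is treated as a set, so duplicate lines within one file are collapsed and the output comes back sorted.

```shell
#!/bin/sh
# Hypothetical helpers: set-like operations on the lines of two files.
# Each function body runs in a subshell so that LC_ALL=C (plain byte-wise
# comparison in sort and uniq) does not leak into the caller's environment.

# Lines that are in either file, without duplicates.
lines_union() (
    export LC_ALL=C
    sort -u "$1" "$2"
)

# Lines that are in both files: after de-duplicating each file,
# a line seen twice in the merged stream must come from both.
lines_inter() (
    export LC_ALL=C
    { sort -u "$1"; sort -u "$2"; } | sort | uniq -d
)

# Lines that are in exactly one of the two files.
lines_xor() (
    export LC_ALL=C
    { sort -u "$1"; sort -u "$2"; } | sort | uniq -u
)

# Lines of $1 that are not in $2: feeding $2 twice guarantees that
# none of its lines can survive uniq -u.
lines_minus() (
    export LC_ALL=C
    { sort -u "$1"; sort -u "$2"; sort -u "$2"; } | sort | uniq -u
)
```

Usage would look like `lines_minus file1 file2 > result`. When the original line order matters, a different approach is needed, since everything here goes through sort.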

Isn’t there a tool out there that already provides all of them?

Also, it would be great if such operations could be restricted to the first n characters or words of each line (a bit like uniq -w, or the removed uniq -W option). That would make it easy to compute, for example:

i1 c1
i2 c2
i3 c3
i4 c4

minus

i2
i4
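A subtraction like this one, keyed on the first word only, can be sketched with awk. The function name minus_by_key is hypothetical, and the sketch assumes whitespace-separated words with the key in the first field of both files.

```shell
#!/bin/sh
# Hypothetical helper: minus_by_key DATA KEYS
# Prints the lines of DATA whose first word does not appear as the
# first word of any line in KEYS. The order of DATA is preserved.
minus_by_key() {
    # KEYS is read first to build the set of words to drop. FILENAME is
    # compared with ARGV[1] (instead of the common NR == FNR idiom) so
    # that an empty KEYS file is handled correctly.
    awk 'FILENAME == ARGV[1] { skip[$1]; next }
         !($1 in skip)' "$2" "$1"
}
```

With the two lists above saved as data.txt and keys.txt (file names chosen for the example), `minus_by_key data.txt keys.txt` prints the lines "i1 c1" and "i3 c3".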

Comments are open.

Update: many people pointed me to moreutils’ combine. It looks good, but it is not exactly what I need, so I filed wishlist bugs #398187 (combine: provide aliases for set theory operators) and #398193 (combine: allow to compare only on a subset of the lines). I won’t have time to provide patches, so if somebody wants to work on them … :-)

7 thoughts on “List operators on files in shell ? (updated)”

  1. You should check out the “moreutils” package by Joey Hess. It contains the “combine” command: “combine file1 and file2”, “combine file1 not file2”, etc.

  2. Try this. Seemed like an interesting problem, and something I deal with a lot too, so I wrote this up. Testing was pretty tedious, but I tested scenarios I could think of. Let me know if something’s broken or if you think it should do more or behave differently.
