List operators on files in shell ? (updated)

Posted on November 11, 2006 (updated November 12, 2006) by lucas

I often need to do list-like operations on files in shell, for example:

subtract the lines in one file from the lines in another file
add the lines from two files, suppressing duplicates
keep only lines which are not in both files
keep only lines which are in both files
etc

Such operations are easy to do with a combination of sort, uniq, cut, diff, etc. But they are such basic operations that it is a bit annoying to write a small shell script each time I need one of them.
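For reference, here is a minimal sketch of the kind of small script meant here, using only sort and uniq. The function names are made up for illustration, and each file is treated as a set of whole lines (duplicates inside a file are collapsed first).

```shell
#!/bin/sh
# Hypothetical helpers: each takes two file names and prints the result
# on stdout, sorted. Whole lines are compared.

# Lines from both files, duplicates suppressed.
set_union() {
    sort -u "$1" "$2"
}

# Lines which are in both files: after de-duplicating each file,
# a line seen twice must come from both.
set_intersection() {
    { sort -u "$1"; sort -u "$2"; } | sort | uniq -d
}

# Lines of the first file which are not in the second: the second file
# is fed twice, so none of its lines can ever be unique.
set_minus() {
    { sort -u "$1"; sort -u "$2"; sort -u "$2"; } | sort | uniq -u
}

# Lines which are not in both files (in exactly one of them).
set_symdiff() {
    { sort -u "$1"; sort -u "$2"; } | sort | uniq -u
}
```

For example, set_minus a.txt b.txt prints the lines of a.txt that do not appear in b.txt.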

Isn’t there a tool out there that already provides all of them?

Also, it would be great if such operations could be restricted to the first n characters or words of each line (a bit like uniq -w, or the removed uniq -W option). That would be an easy way to compute:

i1 c1
i2 c2
i3 c3
i4 c4

minus

i2
i4

Comments are open.

Update: many people pointed me to moreutils’ combine. It looks good, but it is not exactly what I need, so I filed wishlist bugs #398187 (combine: provide aliases for set theory operators) and #398193 (combine: allow to compare only on a subset of the lines). I won’t have time to provide patches, so if somebody wants to work on them… :-)

7 thoughts on “List operators on files in shell ? (updated)”

Guillaume says:

November 11, 2006 at 4:23 pm

The comm command should be a step forward.
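A minimal sketch of how comm covers the basic cases, assuming whole-line comparison. comm needs sorted input, so the hypothetical wrapper below (comm_sets is not an existing command) sorts both files into temporary copies first.

```shell
#!/bin/sh
# comm_sets FLAGS FILE1 FILE2
# Sort both inputs, then run comm with FLAGS:
#   -23  lines only in FILE1 (FILE1 minus FILE2)
#   -13  lines only in FILE2
#   -12  lines in both files
comm_sets() {
    left=$(mktemp) && right=$(mktemp) || return 1
    sort -u "$2" > "$left"
    sort -u "$3" > "$right"
    comm "$1" "$left" "$right"
    status=$?
    rm -f "$left" "$right"
    return "$status"
}
```

comm -3 gives the lines which are not in both files, but it indents the lines coming from the second file with a tab, which has to be stripped afterwards.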
lucas says:

November 11, 2006 at 4:42 pm

It is, thank you, but it still doesn’t allow limiting the comparison to the first n chars/words…
Raphael Hertzog says:

November 11, 2006 at 5:49 pm

You should check out Joey Hess’s “moreutils” package; it contains the “combine” command: “combine file1 and file2”, “combine file1 not file2”, etc.
Damien says:

November 11, 2006 at 6:46 pm

Why not make a simple script? In Ruby it shouldn’t be so hard…
Colin Watson says:

November 11, 2006 at 6:47 pm

The join(1) command can do some of this, and can be told to act on only certain fields.
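A minimal sketch of the "minus" example from the post done with join, assuming the key is the first whitespace-separated field. join_minus is a made-up name. join needs both inputs sorted on the join field, and it rewrites the field separators in its output as single spaces.

```shell
#!/bin/sh
# join_minus DATA KEYS
# Print the lines of DATA whose first field matches no first field
# in KEYS. Output is sorted by that field.
join_minus() {
    data=$(mktemp) && keys=$(mktemp) || return 1
    # Sort on the first field only, ignoring leading blanks, which is
    # the order join expects for its default join field.
    sort -k 1b,1 "$1" > "$data"
    sort -k 1b,1 "$2" > "$keys"
    # -v 1: print only the unpairable lines from the first file.
    join -v 1 "$data" "$keys"
    status=$?
    rm -f "$data" "$keys"
    return "$status"
}
```

Options -1 and -2 select a different join field in each file, and -t sets the field separator.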
Anonymous says:

November 11, 2006 at 8:21 pm

I second the recommendation for “moreutils”.
Rocco Stanzione says:

November 12, 2006 at 9:38 am

Try this. Seemed like an interesting problem, and something I deal with a lot too, so I wrote this up. Testing was pretty tedious, but I tested scenarios I could think of. Let me know if something’s broken or if you think it should do more or behave differently.
