I often need to do list-like operations on files in the shell, for example:
- subtract the lines of one file from the lines of another file
- add the lines of two files, suppressing duplicates
- keep only lines which are not in both files
- keep only lines which are in both files
- etc
Such operations are easy to do with a combination of sort, uniq, cut, diff, etc. But they are so basic that it is a bit annoying to write a small shell script each time I need one of them.
Isn’t there a tool out there that already provides all of them?
Also, it would be great if such operations could consider only the first n characters or words (a bit like uniq -w, or the removed uniq -W option). It would be an easy way to do:
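For reference, here is a rough sketch of the kind of small script I mean, using only sort and uniq. The function names are made up for this example, and it assumes that the order of the output lines does not matter and that duplicates inside a single file can be dropped.

```shell
#!/bin/sh
# Hypothetical helpers: each takes two files and prints sorted, unique lines.

# Lines of $1 that are not in $2. $2 is fed in twice so that none of its
# lines can ever appear exactly once in the merged stream.
lines_minus() {
    { sort -u "$1"; sort -u "$2"; sort -u "$2"; } | sort | uniq -u
}

# Lines of both files, duplicates suppressed.
lines_union() {
    sort -u "$1" "$2"
}

# Lines that are in exactly one of the two files.
lines_symdiff() {
    { sort -u "$1"; sort -u "$2"; } | sort | uniq -u
}

# Lines that are in both files.
lines_both() {
    { sort -u "$1"; sort -u "$2"; } | sort | uniq -d
}
```

After sourcing this, something like lines_minus file1 file2 prints the lines that are only in file1.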
i1 c1
i2 c2
i3 c3
i4 c4
minus
i2
i4
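To make the wish concrete, here is a minimal awk-based sketch of a "minus" that compares only the first word of each line. The name minus_by_key is hypothetical, and it assumes that words are separated by whitespace and that the second file is small enough to be held in memory.

```shell
#!/bin/sh
# Hypothetical helper: print the lines of file $1 whose first word is not
# the first word of any line in file $2. Input order is preserved.
minus_by_key() {
    awk -v keyfile="$2" '
        BEGIN {
            # Remember the first word of every non-blank line of the key file.
            while ((getline line < keyfile) > 0) {
                if (split(line, words, " ") > 0) {
                    drop[words[1]] = 1
                }
            }
        }
        # Keep only the lines whose first word was not remembered.
        !($1 in drop)
    ' "$1"
}
```

With the two files above, minus_by_key file1 file2 would keep the "i1 c1" and "i3 c3" lines.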
Comments are open.
Update: many people pointed me to moreutils’ combine. It looks good, but it is not exactly what I need, so I filed wishlist bugs #398187 (combine: provide aliases for set theory operators) and #398193 (combine: allow to compare only on a subset of the lines). I won’t have time to provide patches, so if somebody wants to work on them… :-)
The comm command should be a step forward.
It is, thank you, but it still doesn’t allow limiting the comparison to the first n chars/words…
You should check out the “moreutils” package by Joey Hess; it contains the “combine” command: “combine file1 and file2”, “combine file1 not file2”, etc.
Why not write a simple script? In Ruby it shouldn’t be so hard…
The join(1) command can do some of this, and can be told to act on only certain fields.
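As a rough illustration of the join approach, the sketch below subtracts on the first field only. The function name minus_join is made up, and it assumes that mktemp -d is available and that it is acceptable to get the output sorted by the first field rather than in the original order.

```shell
#!/bin/sh
# Hypothetical helper: print the lines of file $1 whose first field does not
# appear as the first field of any line in file $2.
minus_join() {
    mj_tmp=$(mktemp -d) || return 1
    # join needs both inputs sorted on the join field, with the same collation.
    LC_ALL=C sort -k 1b,1 "$1" > "$mj_tmp/a"
    LC_ALL=C sort -k 1b,1 "$2" > "$mj_tmp/b"
    # -v 1 prints only the lines of the first input that have no match.
    LC_ALL=C join -v 1 "$mj_tmp/a" "$mj_tmp/b"
    rm -rf "$mj_tmp"
}
```

Other fields can be selected with join's -1 and -2 options, and swapping -v 1 for the default behaviour gives the lines whose key is present in both files.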
I second the recommendation for “moreutils”.
Try this. Seemed like an interesting problem, and something I deal with a lot too, so I wrote this up. Testing was pretty tedious, but I tested scenarios I could think of. Let me know if something’s broken or if you think it should do more or behave differently.