File comparisons using awk: Match columns

file4:
a  1 
b  2 
c  3 
d  4 
e  5

The one-liner for comparing the first field of file4 with the lines of file3 is:

awk 'FNR==NR{a[$0];next}($1 in a)' file3 file4
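A minimal, self-contained way to try this is sketched below. The contents of file3 are not shown here, so the lines a, c and e used for it are an assumption chosen to reproduce the output above. The files are created in a scratch directory so nothing existing is overwritten.

```shell
# Work in a scratch directory so existing files are not overwritten
cd "$(mktemp -d)" || exit 1

# ASSUMED contents of file3 (not shown in the original): one key per line
printf 'a\nc\ne\n' > file3
# file4 as listed above (two spaces between the columns)
printf 'a  1\nb  2\nc  3\nd  4\ne  5\n' > file4

# Print the lines of file4 whose first field appears as a line of file3
result=$(awk 'FNR==NR{a[$0];next}($1 in a)' file3 file4)
printf '%s\n' "$result"
```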

The output is:

a  1 
c  3 
e  5

To remove the matching lines instead (that is, print only the lines of file4 whose first field is not in file3), negate the condition by adding a !:

awk 'FNR==NR{a[$0];next}!($1 in a)' file3 file4
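The negated version can be checked the same way. Again, file3 holding the lines a, c and e is an assumption, since its contents are not given; with that data only the b and d lines of file4 should remain.

```shell
# Work in a scratch directory so existing files are not overwritten
cd "$(mktemp -d)" || exit 1

printf 'a\nc\ne\n' > file3                        # ASSUMED contents of file3
printf 'a  1\nb  2\nc  3\nd  4\ne  5\n' > file4   # file4 as listed above

# ! inverts the test: print the file4 lines whose first field is NOT a key in a
result=$(awk 'FNR==NR{a[$0];next}!($1 in a)' file3 file4)
printf '%s\n' "$result"
```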


  1. Can you please explain how it works?

  2. awk 'FNR==NR{a[$0];next}($1 in a)' file3 file4

    FNR -> the line number within the current input file.
    NR -> the total number of lines read so far, across all input files.
    So the first thing is:
    FNR==NR -> this condition is true only while awk is reading the first file (file3). As soon as awk starts on file4, FNR is reset to 1 while NR keeps counting, so the two values no longer match. (This relies on file3 being non-empty: if it is empty, FNR==NR holds for every line of file4 instead.)
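    The difference between FNR and NR can be seen by printing both for every record. This sketch assumes file3 holds the lines a, c and e (its real contents are not shown) and uses the file4 listed at the top; the second command uses a hypothetical empty file, named empty, to show the empty-first-file case.

    ```shell
    cd "$(mktemp -d)" || exit 1
    printf 'a\nc\ne\n' > file3                        # ASSUMED contents of file3
    printf 'a  1\nb  2\nc  3\nd  4\ne  5\n' > file4   # file4 as listed above

    # One line per record: file name, FNR, NR, and whether FNR==NR holds
    trace=$(awk '{ print FILENAME, FNR, NR, (FNR==NR ? "yes" : "no") }' file3 file4)
    printf '%s\n' "$trace"

    # Hypothetical empty first file: FNR==NR is then true for every file4 line
    : > empty
    empty_trace=$(awk '{ print FILENAME, FNR, NR, (FNR==NR ? "yes" : "no") }' empty file4)
    printf '%s\n' "$empty_trace"
    ```

    In the first trace the three file3 lines report yes; for the five file4 lines FNR restarts at 1 while NR continues from 4 to 8, so they all report no.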

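    So while this condition holds, the array a keeps growing, with $0 (the complete line of file3) as each new key. At the end of file3, the array holds every line of file3 as a key. (For a single-column file3 this is the same as keying on the first field; to key on only the first field of a multi-column file3, use a[$1] instead.)
    next is like continue in C: it tells awk to skip the remaining rules and start processing the next input line, so the ($1 in a) test is never applied to the lines of file3.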
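    One way to see why next matters is to run the same program with and without it. This is a sketch on assumed data (file3 holding a, c and e, since its contents are not shown); without next, every file3 line also reaches ($1 in a) right after being stored in a, so it gets printed as well.

    ```shell
    cd "$(mktemp -d)" || exit 1
    printf 'a\nc\ne\n' > file3                        # ASSUMED contents of file3
    printf 'a  1\nb  2\nc  3\nd  4\ne  5\n' > file4   # file4 as listed above

    # Original one-liner: next stops file3 lines from reaching ($1 in a)
    with_next=$(awk 'FNR==NR{a[$0];next}($1 in a)' file3 file4)

    # Without next, each file3 line is also tested by ($1 in a);
    # it was just added to a, so it is printed too
    without_next=$(awk 'FNR==NR{a[$0]}($1 in a)' file3 file4)

    printf 'with next:\n%s\n\nwithout next:\n%s\n' "$with_next" "$without_next"
    ```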

    The rest of the code, ($1 in a), is therefore applied only after all the lines of file3 have been processed, that is, from the first line of file4 onward. Here $1 is the first field of the current file4 line.
    ($1 in a) checks whether $1 exists as a key in the array a. If it does, the pattern is true, and because no action is given, awk performs its default action and prints the whole line.
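    The default action can be made explicit to confirm that a bare pattern behaves like pattern { print $0 }. This sketch again assumes file3 holds the lines a, c and e; both programs should produce identical output.

    ```shell
    cd "$(mktemp -d)" || exit 1
    printf 'a\nc\ne\n' > file3                        # ASSUMED contents of file3
    printf 'a  1\nb  2\nc  3\nd  4\ne  5\n' > file4   # file4 as listed above

    # Pattern with no action: awk's default action prints the record
    implicit=$(awk 'FNR==NR{a[$0];next}($1 in a)' file3 file4)
    # The same rule with the default action spelled out
    explicit=$(awk 'FNR==NR{a[$0];next}($1 in a){print $0}' file3 file4)

    if [ "$implicit" = "$explicit" ]; then
        echo "identical output:"
        printf '%s\n' "$explicit"
    fi
    ```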