File comparisons using awk: Match columns

Saturday, December 29, 2012, 3 Comments


file3:
a
c
e

file4:
a  1
b  2
c  3
d  4
e  5

The one-liner that prints the lines of file4 whose first field matches a line of file3 is:

awk 'FNR==NR{a[$0];next}($1 in a)' file3 file4

and the output is:

a  1
c  3
e  5
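A minimal way to reproduce this, assuming a POSIX shell and any POSIX awk: the script below recreates the two sample files in a scratch directory (the mktemp step is only there so the example does not overwrite anything), prints the FNR and NR counters so you can see where they stop being equal, and then runs the same program spread over several lines with comments.

```shell
#!/bin/sh
# Work in a throwaway directory so the sample files don't overwrite anything.
cd "$(mktemp -d)" || exit 1

# Recreate the sample data from the post.
printf 'a\nc\ne\n' > file3
printf 'a  1\nb  2\nc  3\nd  4\ne  5\n' > file4

# Show the counters: FNR restarts at 1 for file4, NR keeps counting.
awk '{ print FILENAME, FNR, NR }' file3 file4

# The same one-liner, expanded with comments.
awk '
  FNR == NR {   # true only while reading the first file (file3)
    a[$0]       # store the whole line of file3 as an array key
    next        # skip the remaining rules and read the next line
  }
  ($1 in a)     # file4 only: no action given, so a true test prints the line
' file3 file4
```

The counter trace runs from "file3 1 1" to "file3 3 3" and then from "file4 1 4" to "file4 5 8", which is why FNR==NR is true only for file3. One caveat: if file3 is empty, FNR==NR stays true for every line of file4, so all of file4 is stored in the array and nothing is printed.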

If you instead want to drop the matching lines and keep only the ones that don't match, negate the test by putting a ! in front of it:

awk 'FNR==NR{a[$0];next}!($1 in a)' file3 file4
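A quick sketch of the negated version on the same sample data; it should leave only the two file4 lines whose first field does not appear in file3. The scratch-directory setup is repeated so the snippet runs on its own.

```shell
#!/bin/sh
# Scratch directory and sample data, as in the post.
cd "$(mktemp -d)" || exit 1
printf 'a\nc\ne\n' > file3
printf 'a  1\nb  2\nc  3\nd  4\ne  5\n' > file4

# "!" negates the membership test, so only non-matching file4 lines print.
awk 'FNR==NR{a[$0];next}!($1 in a)' file3 file4
# prints:
# b  2
# d  4
```

Because the keys are whole lines of file3 ($0), a stray trailing blank on a file3 line would stop it from matching. Keying on the first field instead, a[$1], is a common variation if file3 might contain extra whitespace or columns.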

3 comments:

  1. Can you please explain how it works?

  2. awk 'FNR==NR{a[$0];next}($1 in a)' file3 file4

    FNR -> the line number within the current file.
    NR -> the line number counted across all the input read so far (it does not restart for each file).
    So the first part is:
    FNR==NR -> this condition is true only while awk is reading the first file. As soon as all the lines of file3 have been processed, FNR is reset to 1 for file4 while NR keeps counting, so the two values are never equal again.

    So while this condition holds, the array a is filled with $0 (the complete line of file3). By the end of file3 the array holds every line of file3 as a key.
    next is like continue in C: it tells awk to stop processing the current line and move on to the next input line, so the rest of the program is skipped for file3.

    The rest of the code, ($1 in a), is therefore applied only to the lines of file4. Here $1 is the first field of the current file4 line.
    ($1 in a) checks whether $1 exists as a key in the array a. If it does, the pattern is true, and because no action is given, awk's default action prints the whole line.

  3. I want to compare two files column-wise in Unix using a shell script.
    file1
    datasrid BMStrid Mersionid country curr
    Met_CCD V14121011081 Recent US USD
    Met_CCD V14121011082 Recent US USD
    Met_CCD V14121011083 Recent GB GDB
    Met_CCD V14121011084 Recent IE GDB
    Met_CCD V14121011085 Recent GB GDB
    Met_CCD V14121011086 Recent AU AUD
    Met_CCD V14121011086 Recent HK HKD
    Met_CCD V14121011087 Recent IE GDB


    file2
    datasrid BMStrid Mersionid country curr
    Met_CCD V14121011081 Recent US USD
    Met_CCD V14121011082 Recent US USD
    Met_CCD V14121011083 Recent GB GDB
    Met_CCD V14121011088 Recent IE GDB
    Met_CCD V14121011085 Recent HK GDB
    Met_CCD V14121011086 Recent AU AUD
    Met_CCD V14121011086 Recent HK HKD
    Met_CCD V14121011087 Recent IE GDB

    I need to compare file2 with respect to file1. Any cell that has changed should be highlighted in the output file; for the data above, the output file should contain:
    Met_CCD 'V14121011088' Recent IE GDB
    Met_CCD V14121011085 Recent 'HK' GDB

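One possible answer to the last comment, sketched under an assumption the comment doesn't state: that the files are meant to be compared row by row, so line N of file2 is checked against line N of file1. (Joining on a key column wouldn't work for this data, because one of the changed cells, V14121011088, is in the ID column itself.) Every field that differs is wrapped in single quotes, and only rows with at least one change are written to outputfile. The variable names q, row and old are illustrative choices.

```shell
#!/bin/sh
cd "$(mktemp -d)" || exit 1

# Sample data from the comment.
cat > file1 <<'EOF'
datasrid BMStrid Mersionid country curr
Met_CCD V14121011081 Recent US USD
Met_CCD V14121011082 Recent US USD
Met_CCD V14121011083 Recent GB GDB
Met_CCD V14121011084 Recent IE GDB
Met_CCD V14121011085 Recent GB GDB
Met_CCD V14121011086 Recent AU AUD
Met_CCD V14121011086 Recent HK HKD
Met_CCD V14121011087 Recent IE GDB
EOF
cat > file2 <<'EOF'
datasrid BMStrid Mersionid country curr
Met_CCD V14121011081 Recent US USD
Met_CCD V14121011082 Recent US USD
Met_CCD V14121011083 Recent GB GDB
Met_CCD V14121011088 Recent IE GDB
Met_CCD V14121011085 Recent HK GDB
Met_CCD V14121011086 Recent AU AUD
Met_CCD V14121011086 Recent HK HKD
Met_CCD V14121011087 Recent IE GDB
EOF

# Compare file2 against file1 line by line and quote every changed cell.
awk -v q="'" '
  FNR == NR { row[FNR] = $0; next }   # file1: remember each line by number
  {
    split(row[FNR], old)              # fields of the same line in file1
    changed = 0
    for (i = 1; i <= NF; i++)
      if (($i "") != (old[i] "")) {   # force a string comparison
        $i = q $i q                   # highlight the changed cell
        changed = 1
      }
    if (changed) print                # fields rejoined with single spaces
  }
' file1 file2 > outputfile

cat outputfile
```

With this data, outputfile should hold exactly the two lines the comment asks for. The sketch relies on both files keeping their rows in the same order, and a field that exists only in file1 (a shorter row in file2) is not reported. If rows can be added, removed or reordered, the lines would need to be aligned first, for example with a stable key column or with diff.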