Column wise comaprision using awk

Friday, March 29, 2013 , , , 3 Comments

There was a requirement once where i need to compare two files not with thier rows instead  i needed to do the comparision with columns.
I wanted only those rows where any of the columns in the lines differ.

for eg:
File1
1 A B C D
2 E F G H
File2
1 A Z C D
2 E F Y H
3 M N O P
Below is the Output I need:
file1 1 col2 B
file2 1 col2 Z
file1 2 col3 G
file2 2 col3 Y

Below is the solution in awk that i have written.
awk 'FNR==NR{a[FNR]=$0;next} {
if(a[FNR])
{split(a[FNR],b);
for(i=1;i<=NF;i++)
{
if($i!=b[i])
{
printf "file1 "b[1]" col"b[i-1]" "b[i]"\n";
printf "file2 "$1" col"b[i-1]" "$i"\n";
}
}
}
}'
Below is the test i made on my solaris server:


> nawk 'FNR==NR{a[FNR]=$0;next}{if(a[FNR]){split(a[FNR],b);for(i=1;i<=NF;i++){if($i!=b[i]){printf "file1 "b[1]" col"i-1" "b[i]"\n";printf "file2 "$1" col"i-1" "$i"\n";}}}}' file1 file2
file1 1 col2 B
file2 1 col2 Z
file1 2 col3 G
file2 2 col3 Y
>

3 comments:

  1. I'm using awk version GNU 3.1.5 , above script is not working. Do it works only in above version. If yes can you help me in providing the script which works for awk version GNU 3.1.5

    ReplyDelete
  2. I want to cmpare two .csv files columnwise in unix using shell script

    file1
    datasrid BMStrid Mersionid country curr
    Met_CCD V14121011081 Recent US USD
    Met_CCD V14121011082 Recent US USD
    Met_CCD V14121011083 Recent GB GDB
    Met_CCD V14121011084 Recent IE GDB
    Met_CCD V14121011085 Recent GB GDB
    Met_CCD V14121011086 Recent AU AUD
    Met_CCD V14121011086 Recent HK HKD
    Met_CCD V14121011087 Recent IE GDB

    file2
    datasrid BMStrid Mersionid country curr
    Met_CCD V14121011081 Recent US USD
    Met_CCD V14121011082 Recent US USD
    Met_CCD V14121011083 Recent GB GDB
    Met_CCD V14121011088 Recent IE GDB
    Met_CCD V14121011085 Recent HK GDB
    Met_CCD V14121011086 Recent AU AUD
    Met_CCD V14121011086 Recent HK HKD
    Met_CCD V14121011087 Recent IE GDB

    Outputfile

    need to compare file2 wrt file1.

    change in any cell should get highlighted in output file.

    like
    o/p file should contain

    Met_CCD 'V14121011088' Recent IE GDB
    Met_CCD V14121011085 Recent 'HK' GDB


    original files may contain hundreds of columns..i need a general solution

    ReplyDelete
    Replies
    1. If I am not wrong, basically you need to ignore the duplicate rows in both the files.
      So load all rows in a associative array in file1. Then scan around the file2 and if the row exists in associative array,ignore it, else print it.
      roughly below solution should work.
      awk 'NR==FNR{a[$0];next}{if($0 in a){}else{print}}' file1 file2

      Delete