Column wise comaprision using awk
There was a requirement once where i need to compare two files not with thier rows instead i needed to do the comparision with columns.I wanted only those rows where any of the columns in the lines differ.
for eg:
File1
1 A B C D
2 E F G H
File22 E F G H
1 A Z C D
2 E F Y H
3 M N O P
Below is the Output I need:
2 E F Y H
3 M N O P
Below is the Output I need:
file1 1 col2 B
file2 1 col2 Z
file1 2 col3 G
file2 2 col3 Y
file2 1 col2 Z
file1 2 col3 G
file2 2 col3 Y
Below is the solution in awk that i have written.
awk 'FNR==NR{a[FNR]=$0;next} {
if(a[FNR])
{split(a[FNR],b);
for(i=1;i<=NF;i++)
{
if($i!=b[i])
{
printf "file1 "b[1]" col"b[i-1]" "b[i]"\n";
printf "file2 "$1" col"b[i-1]" "$i"\n";
}
}
}
}'
Below is the test i made on my solaris server:if(a[FNR])
{split(a[FNR],b);
for(i=1;i<=NF;i++)
{
if($i!=b[i])
{
printf "file1 "b[1]" col"b[i-1]" "b[i]"\n";
printf "file2 "$1" col"b[i-1]" "$i"\n";
}
}
}
}'
> nawk 'FNR==NR{a[FNR]=$0;next}{if(a[FNR]){split(a[FNR],b);for(i=1;i<=NF;i++){if($i!=b[i]){printf "file1 "b[1]" col"i-1" "b[i]"\n";printf "file2 "$1" col"i-1" "$i"\n";}}}}' file1 file2
file1 1 col2 B
file2 1 col2 Z
file1 2 col3 G
file2 2 col3 Y
>
file1 1 col2 B
file2 1 col2 Z
file1 2 col3 G
file2 2 col3 Y
>
I'm using awk version GNU 3.1.5 , above script is not working. Do it works only in above version. If yes can you help me in providing the script which works for awk version GNU 3.1.5
ReplyDeleteI want to cmpare two .csv files columnwise in unix using shell script
ReplyDeletefile1
datasrid BMStrid Mersionid country curr
Met_CCD V14121011081 Recent US USD
Met_CCD V14121011082 Recent US USD
Met_CCD V14121011083 Recent GB GDB
Met_CCD V14121011084 Recent IE GDB
Met_CCD V14121011085 Recent GB GDB
Met_CCD V14121011086 Recent AU AUD
Met_CCD V14121011086 Recent HK HKD
Met_CCD V14121011087 Recent IE GDB
file2
datasrid BMStrid Mersionid country curr
Met_CCD V14121011081 Recent US USD
Met_CCD V14121011082 Recent US USD
Met_CCD V14121011083 Recent GB GDB
Met_CCD V14121011088 Recent IE GDB
Met_CCD V14121011085 Recent HK GDB
Met_CCD V14121011086 Recent AU AUD
Met_CCD V14121011086 Recent HK HKD
Met_CCD V14121011087 Recent IE GDB
Outputfile
need to compare file2 wrt file1.
change in any cell should get highlighted in output file.
like
o/p file should contain
Met_CCD 'V14121011088' Recent IE GDB
Met_CCD V14121011085 Recent 'HK' GDB
original files may contain hundreds of columns..i need a general solution
If I am not wrong, basically you need to ignore the duplicate rows in both the files.
DeleteSo load all rows in a associative array in file1. Then scan around the file2 and if the row exists in associative array,ignore it, else print it.
roughly below solution should work.
awk 'NR==FNR{a[$0];next}{if($0 in a){}else{print}}' file1 file2