AWK one-liner for multi-column comparision two unsorted files
This awk one-liner works for multi-column on unsorted files. Comparision is done based on 1st,2nd,3rd,4th of the first file and 1st,3rd,6th,7th of the second file.File1
7000,2,1,6 7001,2,1,7 7002,2,1,6 7003,1,2,1File2
7000,john,2,0,0,1,6 7000,john,2,0,0,1,7 7000,john,2,0,0,1,8 7000,john,2,0,0,1,9 7001,elen,2,0,0,1,7 7002,sami,2,0,0,1,6 7003,mike,1,0,0,2,1 7003,mike,1,0,0,2,2 7003,mike,1,0,0,2,3 7003,mike,1,0,0,2,4 8001,nike,1,2,4,1,8 8002,paul,2,0,0,2,7Output
awk -F, 'NR==FNR{a[$1,$2,$3,$4]++;next} (a[$1,$3,$6,$7])' File1 File2
7000,john,2,0,0,1,6 7001,elen,2,0,0,1,7 7002,sami,2,0,0,1,6 7003,mike,1,0,0,2,1
Hi Vijay, Thanks for the post.
ReplyDeleteCan you help me out in this. Actually, I have rows like
50121 abc.com 28/1/2014-12:00:00
52111 xyz.com 27/1/2014-12:00:00
deusr abc.com 26/1/2014-12:00:00
50121 abc.com 26/1/2014-12:00:00
52111 abc.com 25/1/2014-12:00:00
I removed duplicates based on first column and got the output as
50121 abc.com 28/1/2014-12:00:00
52111 xyz.com 27/1/2014-12:00:00
deusr abc.com 26/1/2014-12:00:00
but the issue here is I am willing to remove duplicates based on 2 columns comparison. i.e., 1st and 2nd one. I am trying using 'awk' command. But I am not getting it. Can you help me out in this please..
@Bharghav, try the below:
ReplyDeleteawk '{a[$1,$2]=$0}END{for(i in a)print a[i]}' your_file
Hi Vijay - in your original post the comparison of the 4 columns are based on AND condition.
ReplyDeleteIf we need to use OR condition for comparing those columns, then what would be the AWK command for that?
Thanks
Hi Vijay,
ReplyDeleteThank you for such a great posts.
Can you please help with the below requirement:
File1:
23455|abc|xyz
23455|abc|xsd
34433|wer|sad
45655|fdf|fcd
File2:
v343v|23455
z565z|23455
c9898|34433
b2323|45655
Output should be:
23455|abc|xyz|v343v
23455|abc|xyz|z565z
23455|abc|xsd|v343v
23455|abc|xsd|z565z
34433|wer|sad|c9898
45655|fdf|fcd|b2323
I am using the below command :
awk -F'|' 'NR==FNR { a[$1]=$1"|"$2"|"$3; next} $2 > 0 {print a[$2] "|" $1}' file1 file2 > result.txt
But its only showing the result :
23455|abc|xsd|v343v
23455|abc|xsd|z565z
34433|wer|sad|c9898
45655|fdf|fcd|b2323
I Have two files as below,
ReplyDeletefirst file:
Start State Next State Session Count Transition% LA_product_view home 694 28.660%
LA_product_view searchresults 54 2.230%
home 1101260 2 0.050%
second file:
Start State Next State Session Count Transition%
LA_product_view home 618 27.560%
LA_product_view searchresults 59 2.630%
home price 25 0.360%
i need to compare first two coloumns of first file and first two coloumns of second file and if they both match it should print 3,4 coloums of both file.if both the files has unique they should be listed with NA from the both files. my output should be like:
LA_product_view home 694 28.660% 618 27.560%
LA_product_view searchresults 54 2.230% 59 2.630%
home 1101260 2 0.050% N/A N/A
home price N/A N/A 25 0.360%
can someone please help me in this..!