Comparing two files using awk
Friday, December 28, 2012 awk , compare , nawk 28 Comments

I have two files
File 1 contains 3 fields
File 2 contains 4 fields

The number of rows of File 1 is much smaller than that of File 2

I would like to compare between two files based on 1st field with the following operation
If the first field in any row of file 1 appears in the first field of a row in file 2,
don't print that row for file 2.

Input File 1

S13109 3739 31082
S45002 3800 31873
S43722 3313 26638

Input File 2

S13109 3738 31081 0
S13109 3737 31080 0
S00033 3008 29985 0
S00033 3007 29984 0
S00022 4130 31838 0
S00022 4129 31837 0
S00188 3317 27372 0
S45002 3759 31832 0
S45002 3758 31831 0
S45002 3757 31830 0
S43722 3020 26345 0
S43722 3019 26344 0
S00371 3737 33636 0
S00371 3736 33635 0

Desired Output

S00033 3008 29985 0
S00033 3007 29984 0
S00022 4130 31838 0
S00022 4129 31837 0
S00188 3317 27372 0
S00371 3737 33636 0
S00371 3736 33635 0

solution
---------

awk 'FNR==NR{a[$1]++;next}!a[$1]' file1 file2

How it works:

FNR==NR

When you have two (or more) input files to awk, FNR will reset back to 1
on the first line of the next file whereas NR will continuing incrementing
from where it left off. By checking FNR==NR we are essentially checking
to see if we are currently parsing the first file.

a[$1]++

If we are parsing the first file (see above) then create an associative
array with the first field $1 as the key and post increment the value by 1.
This essentially lets us create a 'seen' list.

next

This command tells awk not to process any further commands and to read in
the next record and start over.
We do this because file1 is only meant to set the associative array

!a[$1]

This line only executes when FNR==NR is false, i.e. we are not parsing
file1 and thus must be parsing file2. We then use the first field $1 of
file2 as the key to index into our 'seen' list created earlier. If the
value returned is 0 it means we didn't see it in file1 and therefore we
should print this line. Conversely, if the value is non-zero then we
did see it in file1 and thus we should not print its value.
Note that !a[$1] is equivalent to !a[$1]{print} because the default action
when one is not given is to print the entire line.

28 comments:

haiMay 22, 2013 at 8:53 AM
In the same way, what could be the change in the above command to print lines which are present in both?
ReplyDelete
Replies
UnknownSeptember 15, 2013 at 6:53 AM
nice explanation..
ReplyDelete
Replies
UnknownSeptember 24, 2013 at 6:28 AM
For hai,
You simply change !a[$1] to a[$1] so that when the value returned is 0,i.e., when the values do not match, it will not print and when it is otherwise, it will print. This will print the similar values in both and exclude the ones which do not match in the first column.
ReplyDelete
Replies
infoNovember 2, 2013 at 8:07 PM
what if all of the line in one line. how to separate each field in multiline back.
could you explain it in code?
ReplyDelete
Replies
UnknownNovember 5, 2013 at 6:30 AM
Good post
How about, for instance, if you wanted to match column 1 in file 1 to column 2 and then 3 and then 4 in file 2, and only output any matching columns from file 2?
ReplyDelete
Replies
UnknownJanuary 16, 2014 at 7:23 AM
Nice example. My requirement is identify the line by line difference between the two files and create a third file containing the identified delta. If any line is new in file 2, then 'I' should be appended to the text of that line and if any line is updated in file2 (i.e. value of any field is changed), then line text should be appended with 'U' in the end.
ReplyDelete
Replies
VijayJanuary 20, 2014 at 12:52 AM
@Sachin..It quite complex and if you provide me an input. I can try for something
ReplyDelete
Replies
RitzFebruary 17, 2014 at 6:38 AM
hi,

Do we need to have files sorted or will it still work if the fields are not sorted?

Regards.
Ritesh
ReplyDelete
Replies
VijayFebruary 20, 2014 at 11:10 PM
sorting is not required.
ReplyDelete
Replies
VelavaMarch 25, 2014 at 8:31 PM
My requirement some like this
File1
123ABCDEFGH
456IJKLMNOPQ
789RSTUVWXY

File2
ABC123DEF
GHI456JKL
MNO789PQR

I need take the first three charcters of the file1 and serch it in the second file (file2) then prints the corrresponding line from the file2 that is matching this in separate file

Hope I have given my requirement clear enough.

Thanks for your help!
ReplyDelete
Replies
VijayApril 8, 2014 at 12:18 AM
Use a perl script for this:

open(F1,"File1");
while(){
push @a, $1 if(/^(...)/)
}
open(F2,"File2");
while(){
my $line=$_;
for(@a){
if($line=~/$_/){
print $line;
break;
}
}
}
ReplyDelete
Replies
AnonymousJune 24, 2014 at 4:40 AM
Hey Vijay Sarathi, the fact you said about NR & FNR is wrong...!!! It is actually the opposite...

NR - This is the number of input records awk has processed since the beginning of the program's execution. NR is set each time a new record is read.

FNR - FNR is the current record number in the current file. FNR is incremented each time a new record is read. It is reinitialized to zero each time a new input file is started
ReplyDelete
Replies
AnonymousNovember 6, 2014 at 9:28 AM
This Command doesn't work.
ReplyDelete
Replies
AnonymousNovember 26, 2014 at 12:59 AM
Thanks for your useful explanations.

I didn't understand the ++ in a[$1]++ so I tried your script with just a[$1] instead. It yielded the same result.

In awk, an index can be an integer as well as a string so perhaps the increment isn't needed since the index and value of each item in the array are automatically updated with every new record.

What do you think?
ReplyDelete
Replies
AnonymousDecember 23, 2014 at 4:56 AM
I want to cmpare two files columnwise in unix using shell script

file1
datasrid BMStrid Mersionid country curr
Met_CCD V14121011081 Recent US USD
Met_CCD V14121011082 Recent US USD
Met_CCD V14121011083 Recent GB GDB
Met_CCD V14121011084 Recent IE GDB
Met_CCD V14121011085 Recent GB GDB
Met_CCD V14121011086 Recent AU AUD
Met_CCD V14121011086 Recent HK HKD
Met_CCD V14121011087 Recent IE GDB

file2
datasrid BMStrid Mersionid country curr
Met_CCD V14121011081 Recent US USD
Met_CCD V14121011082 Recent US USD
Met_CCD V14121011083 Recent GB GDB
Met_CCD V14121011088 Recent IE GDB
Met_CCD V14121011085 Recent HK GDB
Met_CCD V14121011086 Recent AU AUD
Met_CCD V14121011086 Recent HK HKD
Met_CCD V14121011087 Recent IE GDB

Outputfile
need to compare file2 wrt file1.
change in any cell should get highlighted in output file.
like
o/p file should contain

Met_CCD V14121011088 Recent IE GDB
Met_CCD V14121011085 Recent HK GDB

original files may contain hundreds of columns..i need a general solution
ReplyDelete
Replies
AnonymousFebruary 19, 2015 at 7:55 AM
Ho do you exclude items that are not present in file 1.
ReplyDelete
Replies
AnonymousMarch 27, 2015 at 11:08 AM
Hi..I have two files file1 and file2 with content as follows
File1:
Hai
Welcome
Server1 10.11.12.13
Server2 10.11.12.14

File2:
Goodbye
Server1 18.19.23.66
Server2 10.11.12.14

Now I want to extract lines from file1 and file2 which has Server* and if the server names are same then check if the ip's are same or not.if they are same no action is required.
If they are not same then update file2 IP with the corresponding IP from file1..can someone answer pls
ReplyDelete
Replies
Saravana Kumar Vs Cinderella....(SS)January 14, 2016 at 1:53 AM
I need to compare two excel files, using unix script.. could anyone sort out?
ReplyDelete
Replies
UnknownJanuary 26, 2016 at 12:42 PM
Thanks, vijay sarathi for so usefull post, it helped me a lot. I was trying to implement this code using a file .awk
I have the files input1.txt and input2.txt with the content:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20

and input2.txt

2
4
6
8
10
12
14
16
18
20
22
24
26
28
30

I've tried this in the file:

BEGIN {
FS = "\t";
LINEA = ""

#sw = 1
}

FNR == NR # verdadero sólo en el primer archivo
{
a[$1]++; # construye un array asociativo de la primera columna del archivo
next # salta todos los bloques de procedimiento y procesa la siguiente línea
}
($1 in a) # comprueba si el valor en la columna uno del segundo archivo está en el array
{
print $0 # si es que sí, imprime la línea
} # estas llaves y su contenido se pueden omitir (es la acción por defecto)

I call it by: awk -f compara.awk input1.txt input2.txt > output.txt
and I got this output:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20

but when I execute this: awk 'FNR==NR{a[$1]++;next}($1 in a){print $0}' input1.txt input2.txt > output.txt
in shell I get this:

2
4
6
8
10
12
14
16
18
20

I would be so thankful if somebody could help me or give me a tip.
Cheers.
ReplyDelete
Replies
UnknownMay 3, 2016 at 3:49 PM
Hi, I have two files
cat a1.csv
one
two
three
four
one
two
five
six
seven

and cat a2.csv
eight
nine
ten
one
two
ten

#awk 'FNR==NR{a[$1]++;next}!a[$1]' a1.csv a2.csv
eight
nine
ten
ten

#awk 'FNR==NR{a[$1]++;next}a[$1]' a1.csv a2.csv
one
two

As per your commands I am getting above output.
But I want when output like
1)(Without Sort) in one command repeated entries from A1.csv itself and A2.csv itself and then repeated entries from Both A1.csv and A2.csv
2) I want to print repeated entries to repeated.csv
3) and non repeated entries to non_repeated.csv
Please help to resolve.
ReplyDelete
Replies
deepak singhOctober 26, 2016 at 10:10 PM
Thanks for providing this informative information you may also refer.
http://www.s4techno.com/blog/2016/07/11/aix-paging-space-commands/
ReplyDelete
Replies
Sachin RDecember 6, 2016 at 11:14 PM
we have to print the old entries only Delta(Difference) in File-1 and File-2. newly added enties in files-1 no need to print.

I have NR=FNR logic to do so, but its not working for below case

File-1

7! J9AA-50! LHR! 35!
8! J9BB-50! LHW! 22!
7! test3! test3! 8
7! test3! test3! 8
7! JWZZ-50! LHN! 15!
7! J9AA-50! LHR! 34!
8! J9BB-50! LHW! 21!

File-2

7! J9AA-50! LHR! 34!
8! J9BB-50! LHW! 22!
7! JWZZ-50! LHN! 14!
7! J9AA-50! LHR! 34!
8! J9BB-50! LHW! 21!

command to compare and print,

/usr/xpg4/bin/awk -F'!' 'NR==FNR{++a[$2,$4,FNR];next} !a[$2,$4]++ ' File-1 File-2

Current Result:

7! J9AA-50! LHR! 34!
8! J9BB-50! LHW! 22!
7! JWZZ-50! LHN! 14!
8! J9BB-50! LHW! 21!

Expected result:

7! J9AA-50! LHR! 34!
7! JWZZ-50! LHN! 14!

can anyone please suggest the changes/new logic to get Expected output. if its not possible by using NR=FNR then how can we get it by using shell/perl script?.
ReplyDelete
Replies
UnknownMarch 15, 2018 at 9:58 PM
Hi,

I would like to compare two files and print down the value which is not common in both the files. How can I achieve that?
ReplyDelete
Replies
AnonymousJuly 17, 2018 at 4:23 AM
I'm trying my best to learn linux shell and hack things out myself. But this part has me stuck.

I have two CSV files with hundreds of rows each.

CSV 1 looks like this:

name, email, interest

CSV 2 looks like this:

email, name only

I'm trying to write a script to compare the files with email looking for duplicates and remove it. But as you can see, CSV 2 only contains an email, name. If an email in CSV 1 DOES NOT EXIST in CSV 2, then the row containing that email in CSV 1 should be deleted.

The end result can either give output in another location with 2 new file

I would be grateful for the help.

I tried something along these lines with no luck
ReplyDelete
Replies
AnonymousJuly 20, 2018 at 10:18 PM
This comment has been removed by the author.
ReplyDelete
Replies
UnknownJuly 20, 2018 at 10:22 PM
Hi,

I would like to compare two csv files and get output as two file one as same in csv1 and second file csv2 which is not common in both the files. How can I achieve that using shell script?
ReplyDelete
Replies
AnonymousJuly 20, 2018 at 10:27 PM
rks
Hi,
I would like to compare two csv files and get output as two file one as same in csv1 and second file csv2 which is not common in both the files. How can I achieve that using shell script?
ReplyDelete
Replies

Add comment

>The Unix Shell

Comparing two files using awk
Friday, December 28, 2012 awk , compare , nawk 28 Comments

Vijay

28 comments:

Popular Posts

Facebook

About The Author

Labels

Search This Blog

Blog Archive

Pages

Gallery

Contact Us

Popular Posts

About us

Labels

Comparing two files using awk Friday, December 28, 2012 awk , compare , nawk 28 Comments

Vijay

28 comments:

Popular Posts

Facebook

About The Author

Labels

Search This Blog

Blog Archive

Pages

Gallery

Contact Us

Popular Posts

Comparing two files using awk
Friday, December 28, 2012 awk , compare , nawk 28 Comments