Column-wise comparison using awk

Friday, March 29, 2013

There was once a requirement where I needed to compare two files not row by row, but column by column.
I wanted to report only those columns that differ between corresponding rows of the two files.

For example:
File1
1 A B C D
2 E F G H
File2
1 A Z C D
2 E F Y H
3 M N O P
Below is the output I need:
file1 1 col2 B
file2 1 col2 Z
file1 2 col3 G
file2 2 col3 Y

Below is the awk solution I wrote.
awk 'FNR==NR { a[FNR] = $0; next }        # first pass: store every line of file1, keyed by line number
{
    if (a[FNR]) {                         # compare only lines that exist in both files
        split(a[FNR], b)                  # split the stored file1 line into fields
        for (i = 1; i <= NF; i++) {
            if ($i != b[i]) {             # the field differs: report it from both files
                printf "file1 "b[1]" col"i-1" "b[i]"\n"
                printf "file2 "$1" col"i-1" "$i"\n"
            }
        }
    }
}' file1 file2
Note that the comparison is purely positional (keyed on FNR), so the third row of file2 above, which has no counterpart in file1, is simply skipped. Below is the test I ran on my Solaris server:


> nawk 'FNR==NR{a[FNR]=$0;next}{if(a[FNR]){split(a[FNR],b);for(i=1;i<=NF;i++){if($i!=b[i]){printf "file1 "b[1]" col"i-1" "b[i]"\n";printf "file2 "$1" col"i-1" "$i"\n";}}}}' file1 file2
file1 1 col2 B
file2 1 col2 Z
file1 2 col3 G
file2 2 col3 Y
>
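If you also wanted to flag such extra rows (this goes beyond the original requirement, so treat it as a rough sketch), something along these lines should work:

nawk 'FNR==NR{a[FNR]=$0;next}
      !(FNR in a){print "only in file2: "$0;next}
      {split(a[FNR],b);
       for(i=1;i<=NF;i++)
         if($i!=b[i]){print "file1 "b[1]" col"i-1" "b[i];
                      print "file2 "$1" col"i-1" "$i}}' file1 file2

For the sample files above this would additionally print "only in file2: 3 M N O P".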


Counting duplicates in a file

Tuesday, March 26, 2013

I have a file where column one has a list of family identifiers:
AB
AB
AB
AB
SAR
SAR
EAR
Is there a way to create a new column in which each repeat is numbered, giving a new label for each occurrence, i.e.
AB_1
AB_2
AB_3
AB_4
SAR_1
SAR_2
EAR_1
Below is a pretty simple solution for this:
awk '{print $1"_"++a[$1]}' file
The hash map a accumulates a counter for every identifier, so you can use it in an END block if you wish to see just the counts, as shown below.
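For instance, to print just the final count per identifier (a rough sketch of that idea; the order of awk's for-in loop is not guaranteed):
awk '{a[$1]++} END{for(k in a) print k, a[k]}' file
For the sample data above this would print AB 4, SAR 2 and EAR 1, in some order.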


Converting single column to multiple columns

Tuesday, March 26, 2013

I have a file which contains all the entries in a single column like:
0
SYSCATSPACE
16384
13432
2948
1
1
TEMPSPACE1
1
1
applicable
1
2
USERSPACE1
4096
1888
2176
1
I want to convert this into a tabular form of 3 rows by 6 columns:
0 SYSCATSPACE 16384 13432 2948       1
1 TEMPSPACE1  1     1     applicable 1
2 USERSPACE1  4096  1888  2176       1
Below is the command that I will use:
perl -lne '$a.="$_ ";
           if($.%6==0){push(@x,$a);$a=""}
           END{for(@x){print $_}}' your_file
The output would be:
> perl -lne '$a.="$_ ";if($.%6==0){push(@x,$a);$a=""}END{for(@x){print $_}}' temp
0 SYSCATSPACE 16384 13432 2948 1 
1 TEMPSPACE1 1 1 applicable 1 
2 USERSPACE1 4096 1888 2176 1
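The same reshaping can also be done in awk, if you prefer; a small sketch that, like the perl version, assumes the number of lines is an exact multiple of 6:
awk '{printf "%s%s", $0, (NR%6 ? " " : "\n")}' your_file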


Capture all the letters which are repeated more than once

Tuesday, March 12, 2013

Recently I came across a need to fetch all the letters in a line that occur more than once in a row, i.e. repeated contiguously.

For example:

Let's say there is a word "foo bar": I want the letter 'o' from this.
Let's say there is a word "foo baaar": I want the letters 'o' and 'a' from this.
Let's say there is a word "foo baaar foo": I want the letters 'o' and 'a' again.

Below is the code which worked for me:

perl -lne 'push @a,/(\w)\1+/g;END{print @a}' your_file

The above command scans the complete file and prints just the letters that are repeated contiguously.
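Note that it prints every match, so a letter that repeats in several places (the two 'oo's in "foo baaar foo") shows up more than once in the result. If you want each letter reported only once, a hash can de-duplicate; a rough sketch:

perl -lne '$s{$_}++ for /(\w)\1+/g;END{print join ",",sort keys %s}' your_file

For a file containing "foo baaar foo" this prints a,o.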
