Join lines based upon the first field

Friday, December 28, 2012

Let's say I have a file with data like this:
ENST000001.1 + 67208778 67210057
ENST000001.1 + 67208778 67210768
ENST000001.1 + 67208778 67208882
ENST000002.5 + 67208778 67213982
ENST000003.1 - 57463571 57463801
ENST000003.1 - 57476352 57476463
ENST000003.1 - 57476817 57476945
I want to join the lines that share the same first field (and the same strand in the second field), writing each start and end position as a pair joined by an underscore. The expected output is:
ENST000001.1 + 67208778_67210057 67208778_67210768 67208778_67208882
ENST000002.5 + 67208778_67213982
ENST000003.1 - 57463571_57463801 57476352_57476463 57476817_57476945
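I have two solutions for this, one in Awk and one in Perl. Both collect the start_end pairs in an associative array keyed on the first two fields and print the array once the whole file has been read. In Awk: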
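awk '{
         k = $1 " " $2                       # group key: ID plus strand
         if (!(k in a)) order[++n] = k       # remember first-seen order
         a[k] = a[k] " " $3 "_" $4           # append the start_end pair
     }
     END{
         for (i = 1; i <= n; i++) print order[i] a[order[i]]
        }' your_file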
In Perl:
perl -lane '$H{$F[0]." ".$F[1]} .= " ".$F[2]."_".$F[3];   # append the start_end pair
            END{
                # hash order is random, so sort the keys for stable output
                foreach(sort keys %H){print $_,$H{$_}}
               }' your_file
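If the lines for each ID are already next to each other, as they are in the sample above, the array is not strictly needed. The following is a minimal sketch of a streaming variant that prints each group as soon as the key changes. The function name join_adjacent is my own, and the adjacency of equal keys is an assumption: on unsorted input the same ID would show up on more than one output line.

```shell
# Streaming variant: assumes lines with the same ID and strand are adjacent.
# join_adjacent is a hypothetical helper name, not part of any tool.
join_adjacent() {
    awk '{
        k = $1 " " $2                     # group key: ID plus strand
        if (NR == 1 || k != prev) {       # first line, or the key changed
            if (NR > 1) print line        # flush the finished group
            line = k                      # start a new output line with the key
            prev = k
        }
        line = line " " $3 "_" $4         # append the start_end pair
    }
    END { if (NR > 0) print line }' "$@"
}
```

Calling it as join_adjacent your_file should give the same output as the two solutions above for grouped input, while only ever holding one output line in memory.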
