Counting duplicates in a file
Tuesday, March 26, 2013 awk

I have a file whose first column contains a list of family identifiers:

AB
AB
AB
AB
SAR
SAR
EAR

Is there a way to create a new column in which each repeat is numbered, so that every occurrence gets its own label? For example:

AB_1
AB_2
AB_3
AB_4
SAR_1
SAR_2
EAR_1

Below is a pretty simple solution for this:

awk '{print $1"_"++a[$1]}' file
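The one-liner prints only the new label. Since the question asks for a new column, here is a slightly expanded sketch that keeps the original identifier and appends the label after it. The tab separator and the printf line that recreates the sample input are assumptions made for illustration, not part of the original solution.

```shell
# Recreate the sample input from the post in a file named "file".
printf '%s\n' AB AB AB AB SAR SAR EAR > file

awk '{
    a[$1]++                      # bump the counter for this identifier (unset entries count as 0)
    print $1 "\t" $1 "_" a[$1]   # original value, a tab, then the numbered label
}' file
```

On the sample data, the first line comes out as AB, a tab, then AB_1, and the last as EAR, a tab, then EAR_1.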

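The hash map a holds one counter per identifier: ++a[$1] increments the counter for the value in the first column and returns the new count, which is then appended to the label. If you only want the totals, you can print the contents of a in an END block.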
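A minimal sketch of the END-block variant, assuming the same sample input as above. The loop variable id is a name chosen here for illustration. The pipe through sort is an addition: awk does not define the order in which for-in visits array keys, so sorting keeps the output stable.

```shell
# Recreate the sample input from the post in a file named "file".
printf '%s\n' AB AB AB AB SAR SAR EAR > file

# Count each identifier while reading, then print only the totals at the end.
awk '{ a[$1]++ } END { for (id in a) print id, a[id] }' file | sort
```

For the sample data this prints one line per identifier with its total: AB 4, EAR 1 and SAR 2.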
