Removing duplicates from a line
Friday, December 28, 2012 awk

This post shows how to remove duplicate fields that appear on the same line.
Let's say I have a file in this format:

0000000540|Q1.1|margi|Q1.1|margi|Q1.1|margi
0099940598|Q1.2|8888|Q1.3|5454|Q1.2|8888
0000234223|Q2.10|saigon|Q3.9|tango|Q1.1|money

I want to remove the duplicate fields that appear on the same line, keeping the first occurrence of each.
So, the line:

0000000540|Q1.1|margi|Q1.1|margi|Q1.1|margi

becomes

0000000540|Q1.1|margi

Likewise, the line:

0099940598|Q1.2|8888|Q1.3|5454|Q1.2|8888

becomes

0099940598|Q1.2|8888|Q1.3|5454

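The solution is written in awk. It splits each line on "|", always prints the first field, and prints each later field only the first time its value is seen on that line. Although it is shown here as a command-line one-liner, you can also put the code between the single quotes in a .awk file and run it with awk's -f option.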

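awk '
{
    delete p                        # forget the values seen on the previous line
    n = split($0, a, "|")           # split the line into fields on "|"
    printf("%s", a[1])              # always keep the first field (the ID)
    for (i = 2; i <= n; i++) {
        if (!(a[i] in p)) {         # first time this value appears on the line
            printf("|%s", a[i])
            p[a[i]] = ""            # remember it
        }
    }
    printf "\n"
}' YourFileName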
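
The -f form mentioned above could look roughly like the sketch below. This is a variant rather than the exact script from this post: it lets awk do the field splitting itself by setting FS, and it clears the array with split("", seen), which should also work in awk versions that do not accept a bare "delete" on a whole array. The file names dedup.awk and sample.txt and the array name seen are made up for this illustration.

```shell
# Hypothetical file names, used only for this sketch.
cat > dedup.awk <<'EOF'
BEGIN { FS = "|" }                  # a single-character FS is taken literally
{
    split("", seen)                 # empty the per-line set of values already printed
    out = $1                        # always keep the first field
    for (i = 2; i <= NF; i++) {
        if (!($i in seen)) {        # first time this value appears on the line
            seen[$i] = 1
            out = out "|" $i
        }
    }
    print out
}
EOF

# The three sample lines from the top of the post.
cat > sample.txt <<'EOF'
0000000540|Q1.1|margi|Q1.1|margi|Q1.1|margi
0099940598|Q1.2|8888|Q1.3|5454|Q1.2|8888
0000234223|Q2.10|saigon|Q3.9|tango|Q1.1|money
EOF

awk -f dedup.awk sample.txt
```

With the sample input this should print the first two lines shortened to 0000000540|Q1.1|margi and 0099940598|Q1.2|8888|Q1.3|5454, and leave the third line unchanged because it has no repeated fields.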

Vijay
