Removing duplicates from a line
Removing duplicate fields from the same line.
Let's say I have a file in the following format:
0000000540|Q1.1|margi|Q1.1|margi|Q1.1|margi
0099940598|Q1.2|8888|Q1.3|5454|Q1.2|8888
0000234223|Q2.10|saigon|Q3.9|tango|Q1.1|money
I am trying to remove the duplicate fields that appear on the same line.
So, the line:
0000000540|Q1.1|margi|Q1.1|margi|Q1.1|margi
becomes
0000000540|Q1.1|margi
Similarly, the line:
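0099940598|Q1.2|8888|Q1.3|5454|Q1.2|8888
becomes
0099940598|Q1.2|8888|Q1.3|5454
The solution is written in awk. Although it is a command-line one-liner, you can also put the code between the single quotes in a .awk file and run it with awk's -f option. For each line, it splits the text on "|", always prints the first field, and then prints each later field only the first time it is seen on that line.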
awk ' { delete p; n = split($0, a, "|"); printf("%s", a[1]); for (i = 2; i <= n ; i++) { if (!(a[i] in p)) { printf("|%s", a[i]); p[a[i]] = ""; } } printf "\n"; }' YourFileName
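For readers who prefer the script-file form mentioned above, here is a minimal sketch of the same logic laid out with comments. The file names dedup.awk and sample.txt are hypothetical, the array is renamed to seen for readability, and split("", seen) is swapped in for delete p as a way to empty the array that should also work in older awk implementations.

```shell
# Save the awk program to a file (dedup.awk is a hypothetical name).
cat > dedup.awk <<'EOF'
{
    split("", seen)              # empty the per-line lookup array
    n = split($0, a, "|")        # break the line into fields on "|"
    printf "%s", a[1]            # the first field (the ID) is always kept
    for (i = 2; i <= n; i++) {
        if (!(a[i] in seen)) {   # print a field only on its first appearance
            printf "|%s", a[i]
            seen[a[i]] = ""
        }
    }
    printf "\n"
}
EOF

# Build a small sample file from the lines shown in the post.
printf '%s\n' \
    '0000000540|Q1.1|margi|Q1.1|margi|Q1.1|margi' \
    '0099940598|Q1.2|8888|Q1.3|5454|Q1.2|8888' \
    '0000234223|Q2.10|saigon|Q3.9|tango|Q1.1|money' > sample.txt

# Run the script against the sample file.
awk -f dedup.awk sample.txt
```

With the sample data this should print:
0000000540|Q1.1|margi
0099940598|Q1.2|8888|Q1.3|5454
0000234223|Q2.10|saigon|Q3.9|tango|Q1.1|money
Note that the comparison is made field by field, not pair by pair. If the fields are meant to be read as key/value pairs, a value that repeats under a different key on the same line (for example Q1.1|margi|Q1.2|margi) would lose its second occurrence.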