Remove lines with the same value in a given column

1

I have this input file (sorted by column 2 with -t,):

TOP,25424242,T0137,0.08,0.06,0.02,24
TOP,25424242,T0138,0.07,0.06,0.01,24
TOP,17236110,T0138,9.65,9.37,0.28,89
TOP,23525255,T0137,0.40,0.30,0.11,24
TOP,23525255,T0138,0.08,0.07,0.01,24
TOP,21627012,T0138,0.41,0.33,0.08,24
TOP,75856354,T0137,0.18,0.17,0.01,36
TOP,75856354,T0138,0.18,0.17,0.01,26
TOP,42401990,T0137,0.06,0.05,0.01,24

I want to get rid of every pair of lines that share the same value in column 2, so that only the lines with a unique value in field 2 remain. For the example above, that would be:

TOP,17236110,T0138,9.65,9.37,0.28,89
TOP,21627012,T0138,0.41,0.33,0.08,24
TOP,42401990,T0137,0.06,0.05,0.01,24
    
by DonJ 21.12.2017 / 10:54

3 answers

2

This should work:

 $ awk -F, '{a[$2]=$0; b[$2]++;} END{for(i in a){if(b[i]==1){print a[i]}}}' file
TOP,17236110,T0138,9.65,9.37,0.28,89
TOP,21627012,T0138,0.41,0.33,0.08,24
TOP,42401990,T0137,0.06,0.05,0.01,24
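
One caveat: awk does not define the iteration order of `for (i in a)`, so the lines may come out in a different order than they appear in the input. If the original order matters, a two-pass variant is one option. The following is a minimal sketch, assuming the file is small enough to read twice; the temporary file and the variable names `tmp`, `count` and `result` are illustrative and not part of the answer above.

```shell
# Build the sample input from the question in a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
TOP,25424242,T0137,0.08,0.06,0.02,24
TOP,25424242,T0138,0.07,0.06,0.01,24
TOP,17236110,T0138,9.65,9.37,0.28,89
TOP,23525255,T0137,0.40,0.30,0.11,24
TOP,23525255,T0138,0.08,0.07,0.01,24
TOP,21627012,T0138,0.41,0.33,0.08,24
TOP,75856354,T0137,0.18,0.17,0.01,36
TOP,75856354,T0138,0.18,0.17,0.01,26
TOP,42401990,T0137,0.06,0.05,0.01,24
EOF

# Pass 1 (NR==FNR): count how often each column-2 value occurs.
# Pass 2: print, in input order, the lines whose column-2 value occurred once.
result=$(awk -F, 'NR==FNR { count[$2]++; next } count[$2] == 1' "$tmp" "$tmp")
printf '%s\n' "$result"

rm -f "$tmp"
```

Because the file is passed twice, this variant does not need the input to be sorted and keeps only the counters in memory rather than whole lines.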
    
by 21.12.2017 / 11:01
2

A short uniq trick for your current input structure (where the first 2 fields have a fixed length):

uniq -s4 -w8 -u file
  • -s4 - skip the first 4 characters (i.e. TOP,)
  • -w8 - compare no more than 8 characters per line
  • -u - print only unique lines

The output:

TOP,17236110,T0138,9.65,9.37,0.28,89
TOP,21627012,T0138,0.41,0.33,0.08,24
TOP,42401990,T0137,0.06,0.05,0.01,24
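
To see which part of each line the flags above actually compare, the window can be printed with `cut`. This is a small sketch, assuming GNU coreutils (`-w` is a GNU extension to `uniq`); the temporary file and the variable names `tmp`, `keys` and `unique` are illustrative only.

```shell
# Build the sample input from the question in a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
TOP,25424242,T0137,0.08,0.06,0.02,24
TOP,25424242,T0138,0.07,0.06,0.01,24
TOP,17236110,T0138,9.65,9.37,0.28,89
TOP,23525255,T0137,0.40,0.30,0.11,24
TOP,23525255,T0138,0.08,0.07,0.01,24
TOP,21627012,T0138,0.41,0.33,0.08,24
TOP,75856354,T0137,0.18,0.17,0.01,36
TOP,75856354,T0138,0.18,0.17,0.01,26
TOP,42401990,T0137,0.06,0.05,0.01,24
EOF

# -s4 -w8 means "skip 4 characters, then compare 8": characters 5 through 12.
keys=$(cut -c5-12 "$tmp")
printf '%s\n' "$keys"

# Keep only the lines whose comparison window differs from both neighbours.
unique=$(uniq -s4 -w8 -u "$tmp")
printf '%s\n' "$unique"

rm -f "$tmp"
```

Note that `uniq` only compares adjacent lines, so this relies on equal column-2 values sitting next to each other, and on the first field always being exactly 3 characters and the second exactly 8.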
    
by 21.12.2017 / 11:26
0

You can achieve this using awk, uniq and sed:

# Delete, in place, every line containing a column-2 value that occurs more than once.
for k in $(awk -F "," '{print $2}' file.txt | uniq -D); do
  sed -i "/$k/d" file.txt
done

Output

TOP,17236110,T0138,9.65,9.37,0.28,89
TOP,21627012,T0138,0.41,0.33,0.08,24
TOP,42401990,T0137,0.06,0.05,0.01,24
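
The loop rewrites file.txt once per duplicated value, and its pattern matches the value anywhere in the line. A non-destructive alternative is to collect the duplicated values once and filter with `grep`. This is only a sketch, assuming equal column-2 values are adjacent and that a value wrapped in commas does not also appear in a later field; the file names `tmp` and `dups` and the variable `result` are illustrative.

```shell
# Build the sample input from the question in a temporary file.
tmp=$(mktemp)
dups=$(mktemp)
cat > "$tmp" <<'EOF'
TOP,25424242,T0137,0.08,0.06,0.02,24
TOP,25424242,T0138,0.07,0.06,0.01,24
TOP,17236110,T0138,9.65,9.37,0.28,89
TOP,23525255,T0137,0.40,0.30,0.11,24
TOP,23525255,T0138,0.08,0.07,0.01,24
TOP,21627012,T0138,0.41,0.33,0.08,24
TOP,75856354,T0137,0.18,0.17,0.01,36
TOP,75856354,T0138,0.18,0.17,0.01,26
TOP,42401990,T0137,0.06,0.05,0.01,24
EOF

# List each repeated column-2 value once and wrap it in commas,
# so that it only matches a complete field.
cut -d, -f2 "$tmp" | uniq -d | sed 's/.*/,&,/' > "$dups"

# Print every line that contains none of those fixed strings; the input is left untouched.
result=$(grep -v -F -f "$dups" "$tmp")
printf '%s\n' "$result"

rm -f "$tmp" "$dups"
```

Redirect the output to a new file if the filtered result should be kept.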
    
by 21.12.2017 / 16:34