I have a file. I need to print every line whose quantity has a unique value, without duplicates, using Perl

In the example below, the quantity is the field with tag '38'; for example, 38=100 on the first line.

Order:167342,9=205|21=1|553=2453|49=11342|56=MBT|10=085|55=/GCQ3|1=30532|114=Y|40=1|35=D|54=|60=20130624-09:45:02.046|34=388|11=|38=100|52=20130624-09:45:02.046|59=0|100=MBTX|43=Y|8=FIX.4.4|

Order:544291,52=20130624-09:45:02.046|59=0|100=MBTX|43=Y|8=FIX.4.4|54=|60=20130624-09:45:02.046|40=1|35=D|34=388|11=|38=100|56=MBT|1=30532|114=Y|10=085|55=/GCQ3|9=205|21=1|553=2453|49=11342|

Order:916070,35=D|40=1|54=|60=20130624-09:45:02.046|38=234|34=388|11=|59=0|52=20130624-09:45:02.046|8=FIX.4.4|100=MBTX|43=Y|9=205|553=2453|49=11342|21=1|56=MBT|55=/GCQ3|10=085|1=30532|114=Y|

Order:332907,9=205|49=11342|553=2453|21=1|56=MBT|114=Y|1=30532|55=/GCQ3|10=085|60=20130624-09:45:02.046|54=|35=D|40=1|38=26|11=|34=388|59=0|52=20130624-09:45:02.046|8=FIX.4.4|43=Y|100=MBTX|

Order:385327,38=100|34=388|11=|35=D|40=1|60=20130624-09:45:02.046|54=|8=FIX.4.4|43=Y|100=MBTX|59=0|52=20130624-09:45:02.046|553=2453|49=11342|21=1|9=205|55=/GCQ3|10=085|1=30532|114=Y|56=MBT|

Order:610550,59=0|52=20130624-09:45:02.046|8=FIX.4.4|100=MBTX|43=Y|35=D|40=1|54=|60=20130624-09:45:02.046|38=521|11=|34=388|56=MBT|55=/GCQ3|10=085|1=30532|114=Y|9=205|553=2453|49=11342|21=1|

Order:408689,59=0|52=20130624-09:45:02.046|8=FIX.4.4|43=Y|100=MBTX|35=D|40=1|60=20130624-09:45:02.046|54=|38=658|34=388|11=|56=MBT|55=/GCQ3|10=085|114=Y|1=30532|9=205|49=11342|553=2453|21=1|

Order:43899,56=MBT|10=085|55=/GCQ3|114=Y|1=30532|9=205|21=1|49=11342|553=2453|52=20130624-09:45:02.046|59=0|100=MBTX|43=Y|8=FIX.4.4|40=1|35=D|60=20130624-09:45:02.046|54=|11=|34=388|38=531|

Result: if three lines carry the same value for tag 38, only one of them should be printed. The output should look like this:

Order:167342,9=205|21=1|553=2453|49=11342|56=MBT|10=085|55=/GCQ3|1=30532|114=Y|40=1|35=D|54=|60=20130624-09:45:02.046|34=388|11=|38=100|52=20130624-09:45:02.046|59=0|100=MBTX|43=Y|8=FIX.4.4|

Order:916070,35=D|40=1|54=|60=20130624-09:45:02.046|38=234|34=388|11=|59=0|52=20130624-09:45:02.046|8=FIX.4.4|100=MBTX|43=Y|9=205|553=2453|49=11342|21=1|56=MBT|55=/GCQ3|10=085|1=30532|114=Y|

and so on ...

    
by Sonal 28.12.2016 / 20:47

2 answers

Since you tagged your question perl, how about using a Perl hash?

$ perl -ne '/[,|](38=\d+)/ ; print unless $seen{ $1 }++' file
Order:167342,9=205|21=1|553=2453|49=11342|56=MBT|10=085|55=/GCQ3|1=30532|114=Y|40=1|35=D|54=|60=20130624-09:45:02.046|34=388|11=|38=100|52=20130624-09:45:02.046|59=0|100=MBTX|43=Y|8=FIX.4.4|
Order:916070,35=D|40=1|54=|60=20130624-09:45:02.046|38=234|34=388|11=|59=0|52=20130624-09:45:02.046|8=FIX.4.4|100=MBTX|43=Y|9=205|553=2453|49=11342|21=1|56=MBT|55=/GCQ3|10=085|1=30532|114=Y|
Order:332907,9=205|49=11342|553=2453|21=1|56=MBT|114=Y|1=30532|55=/GCQ3|10=085|60=20130624-09:45:02.046|54=|35=D|40=1|38=26|11=|34=388|59=0|52=20130624-09:45:02.046|8=FIX.4.4|43=Y|100=MBTX|
Order:610550,59=0|52=20130624-09:45:02.046|8=FIX.4.4|100=MBTX|43=Y|35=D|40=1|54=|60=20130624-09:45:02.046|38=521|11=|34=388|56=MBT|55=/GCQ3|10=085|1=30532|114=Y|9=205|553=2453|49=11342|21=1|
Order:408689,59=0|52=20130624-09:45:02.046|8=FIX.4.4|43=Y|100=MBTX|35=D|40=1|60=20130624-09:45:02.046|54=|38=658|34=388|11=|56=MBT|55=/GCQ3|10=085|114=Y|1=30532|9=205|49=11342|553=2453|21=1|
Order:43899,56=MBT|10=085|55=/GCQ3|114=Y|1=30532|9=205|21=1|49=11342|553=2453|52=20130624-09:45:02.046|59=0|100=MBTX|43=Y|8=FIX.4.4|40=1|35=D|60=20130624-09:45:02.046|54=|11=|34=388|38=531|

To print the total number of lines at the end of processing, you can modify it to

perl -ne '
  /[,|](38=\d+)/ ; print unless $seen{ $1 }++ ; 
  END { print "Total lines: $.\n" }
' file

or, to print the total number of records (defined here as matches of the quantity pattern 38=\d+)

perl -ne '
  $c += () = /[,|](38=\d+)/ ; print unless $seen{ $1 }++ ; 
  END { print "Total records: $c\n" }
' file

If you want the number of unique quantities, you can use the scalar value of the hash keys:

$ perl -ne '
  /[,|](38=\d+)/ ; print unless $seen{ $1 }++ ;
  END { print "Unique records: ", scalar keys %seen, "\n" }
' file
Order:167342,9=205|21=1|553=2453|49=11342|56=MBT|10=085|55=/GCQ3|1=30532|114=Y|40=1|35=D|54=|60=20130624-09:45:02.046|34=388|11=|38=100|52=20130624-09:45:02.046|59=0|100=MBTX|43=Y|8=FIX.4.4|
Order:916070,35=D|40=1|54=|60=20130624-09:45:02.046|38=234|34=388|11=|59=0|52=20130624-09:45:02.046|8=FIX.4.4|100=MBTX|43=Y|9=205|553=2453|49=11342|21=1|56=MBT|55=/GCQ3|10=085|1=30532|114=Y|
Order:332907,9=205|49=11342|553=2453|21=1|56=MBT|114=Y|1=30532|55=/GCQ3|10=085|60=20130624-09:45:02.046|54=|35=D|40=1|38=26|11=|34=388|59=0|52=20130624-09:45:02.046|8=FIX.4.4|43=Y|100=MBTX|
Order:610550,59=0|52=20130624-09:45:02.046|8=FIX.4.4|100=MBTX|43=Y|35=D|40=1|54=|60=20130624-09:45:02.046|38=521|11=|34=388|56=MBT|55=/GCQ3|10=085|1=30532|114=Y|9=205|553=2453|49=11342|21=1|
Order:408689,59=0|52=20130624-09:45:02.046|8=FIX.4.4|43=Y|100=MBTX|35=D|40=1|60=20130624-09:45:02.046|54=|38=658|34=388|11=|56=MBT|55=/GCQ3|10=085|114=Y|1=30532|9=205|49=11342|553=2453|21=1|
Order:43899,56=MBT|10=085|55=/GCQ3|114=Y|1=30532|9=205|21=1|49=11342|553=2453|52=20130624-09:45:02.046|59=0|100=MBTX|43=Y|8=FIX.4.4|40=1|35=D|60=20130624-09:45:02.046|54=|11=|34=388|38=531|
Unique records: 6

If you want only the 38=qty field, simply print $1 from the regular expression match:

$ perl -lne '
  /[,|](38=\d+)/ ; print $1 unless $seen{ $1 }++ ;
  END { print "Unique records: ", scalar keys %seen }
' file
38=100
38=234
38=26
38=521
38=658
38=531
Unique records: 6

To output the counts, you need to wait until the END block and then loop over the hash. You can optionally sort the keys at that point:

$ perl -lne '
  $seen{ $1 }++ if /[,|](38=\d+)/ ;
  END {
    for $key (sort keys %seen) { print "$seen{$key} $key" };
    print "Unique records: ", scalar keys %seen
  }
' file
3 38=100
1 38=234
1 38=26
1 38=521
1 38=531
1 38=658
Unique records: 6
    
by steeldriver 28.12.2016 / 22:49

If I understand the question, this awk command should do what you want:

#!/bin/sh
awk -F "[,\|]" '
(NF>0){delete key
printf "%s,", $1
for (i=2; i<=NF-1; i++) {if (key[$i]!=1) printf "%s|", $i
            key[$i]=1}
printf "\n"}' <tf

(where the input is stored in the file tf)

Based on your input file, I get the output:

Order:167342,9=205|21=1|553=2453|49=11342|56=MBT|10=085|55=/GCQ3|1=30532|114=Y|40=1|35=D|54=|60=20130624-09:45:02.046|34=388|11=|38=100|52=20130624-09:45:02.046|59=0|100=MBTX|43=Y|8=FIX.4.4|
Order:544291,52=20130624-09:45:02.046|59=0|100=MBTX|43=Y|8=FIX.4.4|54=|60=20130624-09:45:02.046|40=1|35=D|34=388|11=|38=100|56=MBT|1=30532|114=Y|10=085|55=/GCQ3|9=205|21=1|553=2453|49=11342|
Order:916070,35=D|40=1|54=|60=20130624-09:45:02.046|38=234|34=388|11=|59=0|52=20130624-09:45:02.046|8=FIX.4.4|100=MBTX|43=Y|9=205|553=2453|49=11342|21=1|56=MBT|55=/GCQ3|10=085|1=30532|114=Y|
Order:332907,9=205|49=11342|553=2453|21=1|56=MBT|114=Y|1=30532|55=/GCQ3|10=085|60=20130624-09:45:02.046|54=|35=D|40=1|38=26|11=|34=388|59=0|52=20130624-09:45:02.046|8=FIX.4.4|43=Y|100=MBTX|
Order:385327,38=100|34=388|11=|35=D|40=1|60=20130624-09:45:02.046|54=|8=FIX.4.4|43=Y|100=MBTX|59=0|52=20130624-09:45:02.046|553=2453|49=11342|21=1|9=205|55=/GCQ3|10=085|1=30532|114=Y|56=MBT|
Order:610550,59=0|52=20130624-09:45:02.046|8=FIX.4.4|100=MBTX|43=Y|35=D|40=1|54=|60=20130624-09:45:02.046|38=521|11=|34=388|56=MBT|55=/GCQ3|10=085|1=30532|114=Y|9=205|553=2453|49=11342|21=1|
Order:408689,59=0|52=20130624-09:45:02.046|8=FIX.4.4|43=Y|100=MBTX|35=D|40=1|60=20130624-09:45:02.046|54=|38=658|34=388|11=|56=MBT|55=/GCQ3|10=085|114=Y|1=30532|9=205|49=11342|553=2453|21=1|
Order:43899,56=MBT|10=085|55=/GCQ3|114=Y|1=30532|9=205|21=1|49=11342|553=2453|52=20130624-09:45:02.046|59=0|100=MBTX|43=Y|8=FIX.4.4|40=1|35=D|60=20130624-09:45:02.046|54=|11=|34=388|38=531|
    
by Nick Sillito 28.12.2016 / 22:24