Filter a .CSV file based on the values in its fifth column and print those records to a new file

Question

score 12

I have a .CSV file in the format below:

"column 1","column 2","column 3","column 4","column 5","column 6","column 7","column 8","column 9","column 10
"12310","42324564756","a simple string with a , comma","string with or, without commas","string 1","USD","12","70%","08/01/2013",""
"23455","12312255564","string, with, multiple, commas","string with or, without commas","string 2","USD","433","70%","07/15/2013",""
"23525","74535243123","string , with commas, and - hypens and: semicolans","string with or, without commas","string 1","CAND","744","70%","05/06/2013",""
"46476","15467534544","lengthy string, with commas, multiple: colans","string with or, without commas","string 2","CAND","388","70%","09/21/2013",""

The fifth column of the file holds different strings. I need to filter the file based on the value of that column. Say I need a new file, built from the current one, that contains only the records whose fifth field has the value "string 1".

For that I tried the command below:

awk -F"," ' { if toupper($5) == "STRING 1") PRINT }' file1.csv > file2.csv

but it threw the following error:

awk: { if toupper($5) == "STRING 1") PRINT }
awk: ^ syntax error
awk: { if toupper($5) == "STRING 1") PRINT }
awk: ^ syntax error
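
Both errors are consistent with awk's if syntax, which requires the entire condition in parentheses, as in if (condition) statement. awk keywords are also case-sensitive, so PRINT is parsed as an ordinary variable name rather than the print statement. A minimal sketch of the corrected syntax, run on a hypothetical two-line input whose fields are unquoted so that a plain comma separator is enough:

```shell
#!/bin/sh
# Hypothetical input: unquoted fields, so -F',' splits them correctly.
# The whole if condition is parenthesized and print is lowercase.
printf '%s\n' 'a,b,c,d,String 1,f' 'a,b,c,d,string 2,f' |
    awk -F',' '{ if (toupper($5) == "STRING 1") print }'
# prints: a,b,c,d,String 1,f
```

On the question's actual data this corrected command still prints nothing: a bare comma also splits inside the quoted fields, and even a cleanly split fifth field would keep its double quotes.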

I then used the following command, which gives me strange output:

awk -F"," '$5="string 1" {print}' file1.csv > file2.csv

Output:

"column 1" "column 2" "column 3" "column 4" string 1 "column 6" "column 7" "column 8" "column 9" "column 10
"12310" "42324564756" "a simple string with a comma" string 1 without commas" "string 1" "USD" "12" "70%" "08/01/2013" ""
"23455" "12312255564" "string with string 1 commas" "string with or without commas" "string 2" "USD" "433" "70%" "07/15/2013" ""
"23525" "74535243123" "string with commas string 1 "string with or without commas" "string 1" "CAND" "744" "70%" "05/06/2013" ""
"46476" "15467534544" "lengthy string with commas string 1 "string with or without commas" "string 2" "CAND" "388" "70%" "09/21/2013" ""

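This output is consistent with = being assignment in awk, while == is comparison. Used as a pattern, $5="string 1" overwrites field 5 on every line, which makes awk rebuild the record using the default output separator (a single space); the value of the assignment is the non-empty string "string 1", so the pattern is true and every line is printed. A minimal sketch on a hypothetical two-line input:

```shell
#!/bin/sh
# Hypothetical input: two lines that differ only in field 5.
input='a,b,c,d,string 1,f
a,b,c,d,string 2,f'

# Assignment: every line passes the pattern and is rebuilt with spaces.
printf '%s\n' "$input" | awk -F',' '$5 = "string 1"'
# prints:
# a b c d string 1 f
# a b c d string 1 f

# Comparison: only the matching line is printed, unchanged.
printf '%s\n' "$input" | awk -F',' '$5 == "string 1"'
# prints: a,b,c,d,string 1,f
```
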
P.S.: I used toupper to be on the safe side, since I'm not sure whether the string will be in upper or lower case. Also, please tell me what is wrong with my code, and whether the space in the string matters when matching a pattern with AWK.
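
Regarding the P.S.: awk's == compares strings character by character, so toupper only removes case differences. The space inside "string 1" is significant, and so are extra blanks or CSV double quotes left attached to the field. A small sketch; matches is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: prints 1 if toupper(value) equals "STRING 1", else 0.
matches() {
    awk -v v="$1" 'BEGIN { print (toupper(v) == "STRING 1") }'
}

matches 'String 1'     # 1: case differences are removed by toupper
matches 'string1'      # 0: the space is part of the string
matches ' string 1'    # 0: a leading blank counts too
matches '"string 1"'   # 0: the surrounding quotes are still there
```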

awk sed csv linux filter

by Dhruuv 22.10.2013 / 03:46

3 answers

score 1

The problem with CSV is that there is no standard that everyone follows. If you have to deal with CSV-formatted data often, you may want to look into a more robust method rather than just using "," as your field separator. In this case, Perl's Text::CSV_XS module from CPAN is exceptionally well suited to the job:

$ perl -mText::CSV_XS -WlnE '
    BEGIN { our $csv = Text::CSV_XS->new({ binary => 1 }); }
    $csv->parse($_) or next;                # skip lines the parser rejects
    my @fields = $csv->fields();
    print if $fields[4] =~ /^string 1$/i;   # whole field, case-insensitive
' file1.csv
"12310","42324564756","a simple string with a , comma","string with or, without commas","string 1","USD","12","70%","08/01/2013",""
"23525","74535243123","string , with commas, and - hypens and: semicolans","string with or, without commas","string 1","CAND","744","70%","05/06/2013",""

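If Perl or Text::CSV_XS is not available, the same idea (interpret the quoting first, then compare field 5) can be approximated in plain POSIX awk. This is a different technique from the answer above: a minimal hand-written parser, assuming comma-separated fields that may be wrapped in double quotes, with "" standing for a literal quote inside a quoted field and no newlines inside fields. The function name filter_csv_col5 is hypothetical:

```shell
#!/bin/sh
# Sketch under stated assumptions: comma-separated fields, optionally wrapped in
# double quotes, "" inside quotes for a literal quote, no newlines inside fields.
# filter_csv_col5 is a hypothetical name. Usage: filter_csv_col5 VALUE [FILE...]
# It prints the records whose 5th field equals VALUE, ignoring case.
filter_csv_col5() {
    value=$1
    shift
    awk -v want="$value" '
    # Split the record s into array f, honoring quotes; return the field count.
    function parse_csv(s, f,    n, i, c, inq, field) {
        split("", f)                      # empty the array (portable idiom)
        n = 0; field = ""; inq = 0
        for (i = 1; i <= length(s); i++) {
            c = substr(s, i, 1)
            if (inq) {
                if (c == "\"" && substr(s, i + 1, 1) == "\"") {
                    field = field "\""; i++   # doubled quote: keep one quote
                } else if (c == "\"") {
                    inq = 0                   # closing quote
                } else {
                    field = field c           # any character inside quotes
                }
            } else if (c == "\"") {
                inq = 1                       # opening quote
            } else if (c == ",") {
                f[++n] = field; field = ""    # separator outside quotes
            } else {
                field = field c
            }
        }
        f[++n] = field                        # the last field
        return n
    }
    {
        if (parse_csv($0, fields) >= 5 && toupper(fields[5]) == toupper(want))
            print
    }' "$@"
}
```

Usage under those assumptions: filter_csv_col5 'string 1' file1.csv > file2.csv. Unlike splitting on a fixed separator, this also works when some fields are unquoted, but unlike a real CSV library it does not handle quoted newlines or validate malformed input.
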
by 14.04.2014 / 09:18

score -1

awk 'BEGIN {FS = "," }'  '{ (if toupper($5)  == "STRING 1") print; }'  file1.csv > file2.csv

by 22.10.2013 / 04:42

score 15 · Accepted Answer

awk -F '","'  'BEGIN {OFS=","} { if (toupper($5) == "STRING 1")  print }' file1.csv > file2.csv

Result

"12310","42324564756","a simple string with a , comma","string with or, without commas","string 1","USD","12","70%","08/01/2013",""
"23525","74535243123","string , with commas, and - hypens and: semicolans","string with or, without commas","string 1","CAND","744","70%","05/06/2013",""

I think this is what you want.
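
Why this works on the sample data: with -F '","' the field separator is the three-character sequence quote-comma-quote, so the commas inside quoted fields are not used for splitting, and interior fields come out without their quotes, which is what lets toupper($5) == "STRING 1" match. The first field keeps a leading quote and the last field a trailing one, and the approach assumes every field is quoted. A small check on one line from the question:

```shell
#!/bin/sh
# One record copied from the question.
line='"12310","42324564756","a simple string with a , comma","string with or, without commas","string 1","USD","12","70%","08/01/2013",""'

printf '%s\n' "$line" | awk -F '","' '{
    print NF    # 10
    print $3    # a simple string with a , comma
    print $5    # string 1
    print $1    # "12310  (the opening quote of the record stays attached)
}'

# Caveat: a line without quoted fields is not split at all.
printf '%s\n' 'a,b,c,d,string 1' | awk -F '","' '{ print NF }'   # 1
```

The BEGIN {OFS=","} part has no visible effect here, because print with no arguments writes the record unmodified; OFS is only used when awk rebuilds a record or prints several comma-separated arguments.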