Arranging data according to a header

I have two CSV files.

File 1 contains only the header; file 2 contains the data.

Format of file 1 (file1.csv):

id,abc,xyz,aaa,bbb,ccc

Format of file 2 (file2.csv):

id,source,value
1,abc,100
1,xyz,200
2,aaa,300
2,bbb,400
2,ccc,500

Now I need to match the values in the source column of file2.csv against the header in file1.csv, and the output should look like this:

id,abc,xyz,aaa,bbb,ccc
1,100,200,null,null,null
2,null,null,300,400,500
    
by prathima 17.10.2018 / 14:10

2 answers

can't ... resist ... line noise ...

perl -F, -slE'if(@ARGV){say;@h=@F}elsif($.>1){$d{$F[0]}->@{@h}=($F[0],("null")x@h)unless$d{$F[0]};$d{$F[0]}{$F[1]}=$F[2]}close ARGV if eof}END{say$d{$_}->@{@h}for keys%d' -- -,=, file{1,2}.csv

or a (slightly) more sensible one-liner

perl -F, -lane '
    if (@ARGV) {print; @sources = @F[1..$#F]}  # the first file
    elsif ($. > 1) {                           # the 2nd file, minus the header
        $data{$F[0]}->@{@sources} = ("null") x @sources unless $data{$F[0]};
        $data{$F[0]}{$F[1]} = $F[2]; 
    }
    close ARGV if eof;    # reset $. for each new file
  } END {
    $, = ",";
    print $_, $data{$_}->@{@sources} for sort keys %data
' file1.csv file2.csv

or, as a script, "combine.pl"

#!/usr/bin/env perl
use v5.24;             # postfix dereference (->@{...}) is stable from 5.24 on
use Path::Tiny;        # convenience module from CPAN

# read the header from the first file
my $file = path("file1.csv");
my @lines = $file->lines;
my $header = $lines[0];
chomp $header;
my @sources = split /,/, $header;

# read the data from the second file
$file = path("file2.csv");
chomp( @lines = $file->lines );
shift @lines;          # ignore the header
my %data;
for my $line (@lines) {
    my ($id, $source, $value) = split /,/, $line, 3;
    if (not exists $data{$id}) {
        # initialize the output data for a new id
        $data{$id}->@{ @sources } = ($id, ("null") x scalar(@sources));
    }
    # and store this value
    $data{$id}{$source} = $value;
}

# output the results
say $header;
$, = ",";
for my $id (sort {$a <=> $b} keys %data) {
    say $data{$id}->@{@sources};
}

then: perl combine.pl > output.csv

    
answered 17.10.2018 / 19:19

... and the inevitable awk proposal:

awk -F, '
function PRT()  {printf "%d", ID                                        # print first field: ID
                 for (i=2; i<=MAX; i++) printf ",%s",PF[i]?PF[i]:"null" # print populated fields in sequence, "null" if empty
                 printf ORS                                             # line feed
                }

NR==FNR         {print                                                  # in first file; print header
                 for (MAX=n=split ($0, T); n; n--) COL[T[n]] = n        # split header and find column No. per header field
                 next                                                   # no further processing of this line
                }
FNR > 2 &&                                                              # skip printout for first ID (in second line of file2)
$1 != ID        {PRT()                                                  # print if new ID found
                 split ("", PF)                                         # empty the print array
                }

                {ID = $1                                                # retain ID
                 PF[COL[$2]] = $3                                       # collect col values into respective column
                }

END             {PRT()                                                  # print the record of the last ID
                }
' file[12].csv                                                          # shell pattern matching expands to both files
id,abc,xyz,aaa,bbb,ccc
1,100,200,null,null,null
2,null,null,300,400,500
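
The awk script above prints a record each time the id changes, so it relies on the rows of file2.csv being grouped by id, and its `PF[i]?PF[i]:"null"` test would also print "null" for a value of 0. Below is a minimal sketch of a variant that collects everything first and prints at the end. It assumes the data fits in memory, and the function name `pivot_by_header` is a hypothetical helper that does not appear in the answers above.

```shell
# Hypothetical helper: pivot file2-style rows (id,source,value) into the
# column order given by a file1-style header line.
pivot_by_header() {
    # $1 = header file (e.g. file1.csv), $2 = data file (e.g. file2.csv)
    awk -F, '
    NR == FNR {                           # first file: the header
        sub(/[ \t\r]+$/, "")              # drop trailing whitespace
        print
        ncol = split($0, hdr, ",")        # hdr[1] is "id", hdr[2..ncol] are the sources
        next
    }
    FNR == 1 || NF < 3 { next }           # second file: skip its header and short lines
    {
        sub(/[ \t\r]+$/, "", $3)          # drop trailing whitespace from the value
        if (!($1 in seen)) {              # remember ids in order of first appearance
            seen[$1] = 1
            ids[++nid] = $1
        }
        val[$1, $2] = $3                  # store the value under (id, source)
    }
    END {
        for (k = 1; k <= nid; k++) {
            id = ids[k]
            line = id
            for (c = 2; c <= ncol; c++)   # membership test, so a value of 0 is kept
                line = line "," (((id, hdr[c]) in val) ? val[id, hdr[c]] : "null")
            print line
        }
    }' "$1" "$2"
}
```

Usage would be `pivot_by_header file1.csv file2.csv > output.csv`. Ids are printed in the order they first appear in the data file, not sorted.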
    
answered 17.10.2018 / 23:08