Classificar dados em várias colunas

1

ENTRADA (não classificada, ruim)

"XYZ-ZTE-43255 serverB618 agreed","XYZ-ZTE-44432"
,"XYZ-ZTE-43255"
"XYZ-ZTE-52775 serverB110 agreed",
"XYZ-ZTE-79213 - serverB688 agreed",
"XYZ-ZTE-77323 serverB617 agreed",
"XYZ-ZTE-81422 - serverB609 agreed","XYZ-ZTE-77323"
"XYZ-ZTE-32785 - serverA626 agreed","XYZ-ZTE-52775"
"XYZ-ZTE-43235 - serverA605 disagreed (asdfjlasdj yxvv il lkyeas sadfa)","XYZ-ZTE-43235"
"XYZ-ZTE-11591 serverB144 agreed",
,"XYZ-ZTE-11591"

OUTPUT (classificado, o necessário, o bom!)

"XYZ-ZTE-43255 serverB618 agreed","XYZ-ZTE-43255"
,"XYZ-ZTE-44432"
"XYZ-ZTE-52775 serverB110 agreed","XYZ-ZTE-52775"
"XYZ-ZTE-79213 - serverB688 agreed",
"XYZ-ZTE-77323 serverB617 agreed","XYZ-ZTE-77323"
"XYZ-ZTE-81422 - serverB609 agreed",
"XYZ-ZTE-32785 - serverA626 agreed",
"XYZ-ZTE-43235 - serverA605 disagreed (asdfjlasdj yxvv il lkyeas sadfa)","XYZ-ZTE-43235"
"XYZ-ZTE-11591 serverB144 agreed","XYZ-ZTE-11591"

Resumindo: este é um trecho de um XLS com duas colunas. As duas colunas devem ser classificadas juntas de forma que, por exemplo:

XYZ-ZTE-77323 na coluna da esquerda corresponde ao XYZ-ZTE-77323 na coluna da direita

Mas existem coisas como:

,"XYZ-ZTE-43255"

que tem que ir com a linha:

"XYZ-ZTE-43255 serverB618 agreed","XYZ-ZTE-43255"

mas a linha (no INPUT) já contém uma coluna da direita:

"XYZ-ZTE-43255 serverB618 agreed","XYZ-ZTE-44432"

e, e. se:

"XYZ-ZTE-44432"

não existe na coluna da esquerda, então deve estar em uma nova linha no OUTPUT

Alguém tem uma ideia de como fazer isso?

    
por LanceBaynes 17.06.2011 / 11:51

3 respostas

2

Eu tentei este programa, não o melhor (ele analisa duas vezes um arquivo e tem algum código duplicado), mas sinta-se à vontade para adaptá-lo às suas necessidades. Eu acho que faz o trabalho.

$ cat script.pl
use warnings;
use strict;
use Text::CSV_XS;

my (%col1, %col2);

my $csv = Text::CSV_XS->new(
        { empty_is_undef => 1 }
) or die "Error: " . Text::CSV_XS->error_diag();

chomp( my @data = <STDIN> );

## Read file and save first column in %col1 hash and second
## column in %col2 hash.
foreach my $line ( @data ) {
    die "Error in parse of CSV file\n" unless $csv->parse( $line );
    my @columns = $csv->fields();
    $col1{ $columns[0] }++ if defined $columns[0];
    $col2{ $columns[1] }++ if defined $columns[1];
}

LINE:
foreach my $line ( @data ) {
        die "Error in parse of CSV file\n" unless $csv->parse( $line );
        my @columns = $csv->fields();

        ## Discard line if both columns are undefined.
        next if !defined $columns[0] && !defined $columns[1];

        ## 1.- Undefined first column: Save second column in hash.
        do { $col2{ $columns[1] } = 1; next } unless defined $columns[0];

        ## 2.- Both columns are defined: Sort them.
        if ( defined $columns[0] && defined $columns[1] ) {
                if ( index( $columns[0], $columns[1] ) > -1 ) {
                        # Line is sorted, print it.
                        print quote($columns[0]), ",", quote($columns[1]), "\n";
                        delete $col2{ $columns[1] };
                } else {
                        # Line unsorted, search its equivalent in hash of second column
                        # and print.
                        my $key = $1 if $columns[0] =~ /^(\S*)/;
                        print quote($columns[0]), ",", ( exists $col2{ $key } ? quote($key) : "" ), "\n";
                        delete $col2{ $key } if exists $col2{ $key };
                        # Here, the second unsorted column, search its equivalent in first
                        # column. If not found print it now, else it will be printed later.
                        for my $str ( keys %col1 ) {
                                next LINE if index( $str, $columns[1] ) > -1;
                        }
                        print ",", quote($columns[1]), "\n";
                }
                next;
        }

        ## 3.- Undefined second column: Check if second column is saved in
        ## hash and join it with first column.
        unless ( defined $columns[1] ) {
                my $key = $1 if $columns[0] =~ /^(\S*)/;
                print quote($columns[0]), ",", ( exists $col2{ $key } ? quote($key) : "" ), "\n";
                delete $col2{ $key } if exists $col2{ $key };
        }
}

sub quote {
        my ($str) = $_[0];
        $str =~ s/^(.*)$/"$1"/;
        return $str;
}

Seu arquivo de dados:

"XYZ-ZTE-43255 serverB618 agreed","XYZ-ZTE-44432"
,"XYZ-ZTE-43255"
"XYZ-ZTE-52775 serverB110 agreed",
"XYZ-ZTE-79213 - serverB688 agreed",
"XYZ-ZTE-77323 serverB617 agreed",
"XYZ-ZTE-81422 - serverB609 agreed","XYZ-ZTE-77323"
"XYZ-ZTE-32785 - serverA626 agreed","XYZ-ZTE-52775"
"XYZ-ZTE-43235 - serverA605 disagreed (asdfjlasdj yxvv il lkyeas sadfa)","XYZ-ZTE-43235"
"XYZ-ZTE-11591 serverB144 agreed",
,"XYZ-ZTE-11591"

Os resultados:

$ perl script.pl <yourdatafile
"XYZ-ZTE-43255 serverB618 agreed","XYZ-ZTE-43255"
,"XYZ-ZTE-44432"
"XYZ-ZTE-52775 serverB110 agreed","XYZ-ZTE-52775"
"XYZ-ZTE-79213 - serverB688 agreed",
"XYZ-ZTE-77323 serverB617 agreed","XYZ-ZTE-77323"
"XYZ-ZTE-81422 - serverB609 agreed",
"XYZ-ZTE-32785 - serverA626 agreed",
"XYZ-ZTE-43235 - serverA605 disagreed (asdfjlasdj yxvv il lkyeas sadfa)","XYZ-ZTE-43235"
"XYZ-ZTE-11591 serverB144 agreed","XYZ-ZTE-11591"
    
por 28.06.2011 / 18:25
1

Coloque seus dados em um banco de dados e use as uniões para recuperá-las com base em qualquer critério que desejar. O SQLite é um bom sistema leve, sem nenhum componente de servidor, ótimo para manipular pequenos conjuntos de dados dessa maneira.

    
por 17.06.2011 / 12:56
0

Isso produz sua saída esperada, usando python. É um pouco hackish porque eu não conheço muito o Pyton, mas funciona:)

#!/bin/bash

python -c '
import sys, csv;
reader = csv.reader(sys.stdin)
nbli=0 # line number (to maintin input order)
nbwi=6 # width of line number.. (zero padded) 
left=[] 
rght=[] 
for row in reader:
    nbli+=1
    fnum=format(nbli, "0"+str(nbwi)+"d")
    if row[0] != "": left.append(row[0]+fnum) 
    if row[1] != "": rght.append(row[1]+fnum)
left.sort()
rght.sort()
coll = []
l ,r = 0, 0
while l < len(left) or r < len(rght):
    #
    if l >= len(left):
        numr = rght[r][-nbwi:]
        datr = rght[r][:len(rght[r])-nbwi]
        coll.append(numr + " " + ",\"" + datr + "\"")
        r+=1
    elif r >= len(rght):
        numl = left[l][-nbwi:]
        datl = left[l][:len(left[l])-nbwi]
        coll.append(numl + " " + "\"" + datl + "\",")
        l+=1
    else:
        numl = left[l][-nbwi:]
        numr = rght[r][-nbwi:]
        datl = left[l][:len(left[l])-nbwi]
        datr = rght[r][:len(rght[r])-nbwi]
        #
        if datl.startswith(datr+" "):
            coll.append(numl + " " + "\"" + datl + "\",\"" + datr + "\"")
            l+=1
            r+=1
        elif datl < datr + " ":
            coll.append(numl + " " + "\"" + datl + "\",")
            l+=1
        else:
            coll.append(numr + " " + ",\"" + datr + "\"")
            r+=1

coll.sort()
c = 0
while c < len(coll):
    print coll[c][nbwi+1:]
    c+=1
' \
<<-'STDIN'
"XYZ-ZTE-43255 serverB618 agreed","XYZ-ZTE-44432"
,"XYZ-ZTE-43255"
"XYZ-ZTE-52775 serverB110 agreed",
"XYZ-ZTE-79213 - serverB688 agreed",
"XYZ-ZTE-77323 serverB617 agreed",
"XYZ-ZTE-81422 - serverB609 agreed","XYZ-ZTE-77323"
"XYZ-ZTE-32785 - serverA626 agreed","XYZ-ZTE-52775"
"XYZ-ZTE-43235 - serverA605 disagreed (asdfjlasdj yxvv il lkyeas sadfa)","XYZ-ZTE-43235"
"XYZ-ZTE-11591 serverB144 agreed",
,"XYZ-ZTE-11591"
STDIN

Saída:

"XYZ-ZTE-43255 serverB618 agreed","XYZ-ZTE-43255"
,"XYZ-ZTE-44432"
"XYZ-ZTE-52775 serverB110 agreed","XYZ-ZTE-52775"
"XYZ-ZTE-79213 - serverB688 agreed",
"XYZ-ZTE-77323 serverB617 agreed","XYZ-ZTE-77323"
"XYZ-ZTE-81422 - serverB609 agreed",
"XYZ-ZTE-32785 - serverA626 agreed",
"XYZ-ZTE-43235 - serverA605 disagreed (asdfjlasdj yxvv il lkyeas sadfa)","XYZ-ZTE-43235"
"XYZ-ZTE-11591 serverB144 agreed","XYZ-ZTE-11591"
    
por 29.06.2011 / 06:18