Move a carriage return that is added when using join


I'm joining two pipe-delimited files, but after running this join command:

    join -a 1 -i -t"|" -o 1.3 1.1 2.2 1.4 1.5 2.3 2.4 2.5 2.6 2.7 2.8 2.9 <(sort -d -t"|" -z alt.csv) <(sort -d -t"|" -z ../original/alt.csv) > ../out/alt.csv

the output file contains a carriage return at the point where the join happened, for example:

IRN|EADUnitID|EADPhysicalTechnical|AdmPublishWebNoPassword|AdmPublishWebPassword
|EADUnitTitle|EADBiographyOrHistory|EADScopeAndContent|EADArrangement|EADAcquisitionInformationRef|EADRelatedMaterial|BibBibliographyRef_tab
51899|ga.1.1|GLS Add. GA 1/1|Yes|Yes
|Photographic negatives ||&lt;p&gt;The albums comprise of negatives of Gypsies and Gypsy life in Germany and eastern Europe. The albums have been indexed and the negatives numbered by Althaus in series I-IV; VII-VIII, though numbering is not continuous. The majority of the negatives have duplicates in slide or photograph format (GA 1/2 and GA 3) and reference has been made to these. The captions are those taken from the index except for unindexed negatives, whereupon the caption has been taken from a duplicate photograph or slide. Where there is no duplicate, the caption simply describes what can be seen in the negative. The list also includes 22 negatives that are indexed in the albums but are missing. There is a closed section from GA 1/1/53 - GA 1/1/68 due to the sensitive nature of the negatives. &lt;&#x0002F;p&gt;||||
51900|ga.1.1.1|GLS Add. GA 1/1/1|Yes|Yes
|Ehepaar Weltzel. ||||||
51901|ga.1.1.2|GLS Add. GA 1/1/2|Yes|Yes
|Ehepaar Weltzel. ||||||
51902|ga.1.1.3|GLS Add. GA 1/1/3|Yes|Yes
|Roßlau, Dessauerstr Kegli. Julius Braun, Bitterfield, 1939 Koitsch. ||||||
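One plausible cause, which the question does not confirm, is that the input files have Windows (CRLF) line endings. join splits fields only on the `-t` character and records only on newlines, so the `\r` before each newline stays attached to the last field of every line. When that field (presumably `1.5`, since the break appears right after it) is printed in the middle of the `-o` list, the `\r` lands mid-line, and some editors and viewers display a lone CR as a line break. The sketch reproduces this with two made-up one-line files; running `head -n 2 alt.csv | od -c` on the real file would show whether its lines end in `\r \n`.

```shell
#!/usr/bin/env bash
# Sketch: reproduce the stray break with two tiny, made-up CRLF files.
# Assumption: the real input files use Windows (CRLF) line endings.
set -euo pipefail

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical sample rows; the join key is field 1 in both files.
printf 'ga.1.1.1|51900|Yes\r\n' > "$tmp/left.csv"
printf 'ga.1.1.1|Ehepaar Weltzel. \r\n' > "$tmp/right.csv"

# join splits fields only on '|', so each line's trailing '\r' stays part of
# the last field. Output field 1.3 is the left file's last field, so its '\r'
# is printed in the middle of the joined line.
out=$(join -t '|' -o 1.2,1.1,1.3,2.2 "$tmp/left.csv" "$tmp/right.csv")

# od -c makes the CR visible: it shows up as \r right before the next '|'.
printf '%s\n' "$out" | od -c
```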

To be processed correctly, though, the file needs the carriage return after the last column instead:

IRN|EADUnitID|EADPhysicalTechnical|AdmPublishWebNoPassword|AdmPublishWebPassword|EADUnitTitle|EADBiographyOrHistory|EADScopeAndContent|EADArrangement|EADAcquisitionInformationRef|EADRelatedMaterial|BibBibliographyRef_tab
51899|ga.1.1|GLS Add. GA 1/1|Yes|Yes|Photographic negatives ||&lt;p&gt;The albums comprise of negatives of Gypsies and Gypsy life in Germany and eastern Europe. The albums have been indexed and the negatives numbered by Althaus in series I-IV; VII-VIII, though numbering is not continuous. The majority of the negatives have duplicates in slide or photograph format (GA 1/2 and GA 3) and reference has been made to these. The captions are those taken from the index except for unindexed negatives, whereupon the caption has been taken from a duplicate photograph or slide. Where there is no duplicate, the caption simply describes what can be seen in the negative. The list also includes 22 negatives that are indexed in the albums but are missing. There is a closed section from GA 1/1/53 - GA 1/1/68 due to the sensitive nature of the negatives. &lt;&#x0002F;p&gt;||||
51900|ga.1.1.1|GLS Add. GA 1/1/1|Yes|Yes|Ehepaar Weltzel. ||||||
51901|ga.1.1.2|GLS Add. GA 1/1/2|Yes|Yes|Ehepaar Weltzel. ||||||
51902|ga.1.1.3|GLS Add. GA 1/1/3|Yes|Yes|Roßlau, Dessauerstr Kegli. Julius Braun, Bitterfield, 1939 Koitsch. ||||||

Is there a way to use sed or awk to get the desired result? Do I first need to add another pipe to the end of the last column and then do a substitution based on the number of occurrences?
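If the stray break is a CR that join carried over from CRLF input lines (an assumption; the question does not show the raw bytes), awk can repair the joined file without an extra pipe or any occurrence counting: delete every CR that sits immediately before a `|` and leave the CR at the end of each line alone. A minimal sketch on a made-up line; the helper name `fix_joined` is hypothetical. With GNU sed, `sed 's/\r|/|/g'` should have the same effect.

```shell
#!/usr/bin/env bash
# Sketch: repair an already-joined file with awk.
# Assumption: each stray break is a '\r' (carried over from CRLF input lines)
# sitting directly in front of a '|' separator.
set -euo pipefail

# Hypothetical helper: reads joined lines on stdin, writes repaired lines.
fix_joined() {
  # Delete every CR that is immediately followed by '|'; a CR at the very
  # end of a line (just before the newline) is left untouched.
  awk '{ gsub(/\r[|]/, "|"); print }'
}

# Made-up line shaped like the question's join output.
fixed=$(printf '51900|ga.1.1.1|GLS Add. GA 1/1/1|Yes|Yes\r|Ehepaar Weltzel. ||||||\r\n' | fix_joined)
printf '%s\n' "$fixed" | od -c

# On the real file this would be, for example:
#   fix_joined < ../out/alt.csv > ../out/alt.fixed.csv
```

The bracket expression `[|]` is used instead of `\|` so the pipe is unambiguously literal in every awk implementation.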

    
by joesch 04.10.2018 / 12:30

1 answer


I sort of found a solution, but it isn't particularly elegant. I decided to add an extra pipe to the second file before joining, since that let me do some additional processing to get the correct format.

At the moment, these are the steps I have to take:

    # add pipe to the end of the line for ORIGINAL files only
    sed -i 's/$/|/' ../original/alt.csv

    # --- Do join and output joined file to ../out/alt.csv ---

    # match on last pipe and add a carriage return
    sed -i 's/\(.*\)\|/\0\r/' ../out/alt.csv

    # remove carriage return where join occurred (the use of pipe is simply to locate carriage return) and replace with pipe
    sed -i 's/\r|/|/' ../out/alt.csv

    # remove all blank lines
    sed -i '/^\s*$/d' ../out/alt.csv

    # remove pipe at the end of the line of output file and add a carriage return
    sed -i 's/[^\r\n].$/\r/' ../out/alt.csv

If there is an easier way to do this, I'd be happy to hear it.
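A possibly simpler route, assuming the stray breaks are CRs from CRLF input files: delete the CRs before join reads the data, so no field ever ends in one and no post-processing is needed. The sketch keeps the question's join options (`-a 1 -i -t '|'` and the `-o` field list, written comma-separated) but replaces `sort -d -t"|" -z` with `sort -f -t '|' -k1,1`, which sorts on the join field in the case-insensitive order `join -i` expects. (`-z` makes sort treat NUL rather than newline as the record separator, which is probably not intended here.) `LC_ALL=C`, the helper `prep`, and the sample rows are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch: strip CRs from both inputs *before* join, so no field carries one.
# Assumptions: the stray breaks are '\r' from CRLF line endings; GNU coreutils;
# the two sample files and their contents are made up for the demo.
set -euo pipefail
export LC_ALL=C   # make sort and join agree on collation order

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical inputs with CRLF endings: 5 fields on the left, 9 on the right.
printf 'ga.1.1.1|unused|51900|Yes|Yes\r\n' > "$tmp/left.csv"
printf 'ga.1.1.1|GLS Add. GA 1/1/1|Ehepaar Weltzel. ||||||\r\n' > "$tmp/right.csv"

# Hypothetical helper: remove every CR, then sort on the join field (field 1);
# -f folds case the same way join -i compares keys.
prep() { tr -d '\r' < "$1" | sort -f -t '|' -k1,1; }

out=$(join -a 1 -i -t '|' \
        -o 1.3,1.1,2.2,1.4,1.5,2.3,2.4,2.5,2.6,2.7,2.8,2.9 \
        <(prep "$tmp/left.csv") <(prep "$tmp/right.csv"))
printf '%s\n' "$out"

# If the consumer needs CRLF endings, add one CR per line at the very end, e.g.
#   printf '%s\n' "$out" | sed 's/$/\r/'    (GNU sed)
```

For the real files, the two `$tmp/...` paths would become `alt.csv` and `../original/alt.csv`, with the result redirected to `../out/alt.csv`.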

    
by 04.10.2018 / 15:15