Merging files that share the same column 1


I have several files named p1.bbmap.mapping.csv, p2.bbmap.mapping.csv, p3.bbmap.mapping.csv ... pn.bbmap.mapping.csv

Each file has two columns.

p1_60_length_504_cov_1.580902   12
p1_61_length_503_cov_4.457447   24
p1_62_length_500_cov_4.037534   35
p1_63_length_500_cov_1.718499   6
p5_1_length_5181_cov_48.147804  0
p5_2_length_4872_cov_28.387777  0
p5_4_length_4057_cov_39.930534  0
p5_5_length_3873_cov_30.397758  0
p5_6_length_3431_cov_43.591404  8
p5_8_length_3325_cov_10.154159  6
p5_10_length_3289_cov_30.577166 0
p5_11_length_3288_cov_48.411262 0
p5_12_length_3263_cov_28.849171 67
p5_13_length_3258_cov_16.862344 2
p5_14_length_3149_cov_24.703839 0
p5_15_length_3099_cov_329.678331    0
p5_16_length_3055_cov_34.035861 0
p5_17_length_3039_cov_29.560096 0
p5_18_length_2924_cov_22.790490 0
p5_20_length_2793_cov_13.807577 0
p5_21_length_2779_cov_35.737179 0
p5_22_length_2682_cov_23.347554 0
p5_23_length_2682_cov_17.336986 0
p5_24_length_2668_cov_23.246753 0
p5_25_length_2652_cov_46.648317 0
p5_26_length_2639_cov_9.353105  0
p5_27_length_2599_cov_20.695388 1
p5_28_length_2576_cov_28.790935 0
p5_29_length_2571_cov_14.885025 6
p5_30_length_2551_cov_26.988036 1
p5_31_length_2462_cov_10.844540 0
p5_32_length_2323_cov_22.107923 0
p5_33_length_2261_cov_41.717901 0
p5_34_length_2250_cov_34.612341 0
p5_35_length_2242_cov_7.208983  0
p5_37_length_2140_cov_15.349727 0
p8_280_length_1323_cov_4.788462 0
p8_281_length_1317_cov_21.436975    10
p8_282_length_1317_cov_13.748739    0
p8_283_length_1317_cov_5.379832 0
p8_284_length_1315_cov_10.328283    0

All files have the same number of lines. I want to merge them and put the file name (or part of it) above each numeric column. The output should look like this:

                                    p1  p2  p3  pn
    p1_60_length_504_cov_1.580902   12  51  0   51
    p1_61_length_503_cov_4.457447   24  121 0   21
    p1_62_length_500_cov_4.037534   35  151 0   51
    p1_63_length_500_cov_1.718499   6   5418    0   4
    p5_1_length_5181_cov_48.147804  0   0   0   1
    p5_2_length_4872_cov_28.387777  0   0   1561    0
    p5_4_length_4057_cov_39.930534  0   0   151 0
    p5_5_length_3873_cov_30.397758  0   0   0   0
    p5_6_length_3431_cov_43.591404  8   48  0   0
    p5_8_length_3325_cov_10.154159  6   0   2132    0
    p5_10_length_3289_cov_30.577166 0   21  0   0
    p5_11_length_3288_cov_48.411262 0   0   0   0
    p5_12_length_3263_cov_28.849171 67  0   0   0
    p5_13_length_3258_cov_16.862344 2   0   1521    0
    p5_14_length_3149_cov_24.703839 0   0   0   0
    p5_15_length_3099_cov_329.678331    0   0   0   15
    p5_16_length_3055_cov_34.035861 0   11  0   0
    p5_17_length_3039_cov_29.560096 0   0   115 121
    p5_18_length_2924_cov_22.790490 0   0   1251    0
    p5_20_length_2793_cov_13.807577 0   0   0   15
    p5_21_length_2779_cov_35.737179 0   0   0   0
    p5_22_length_2682_cov_23.347554 0   0   0   0
    p5_23_length_2682_cov_17.336986 0   6   151 0
    p5_24_length_2668_cov_23.246753 0   0   0   0
    p5_25_length_2652_cov_46.648317 0   0   0   0
    p5_26_length_2639_cov_9.353105  0   0   0   151
    p5_27_length_2599_cov_20.695388 1   0   0   0
    p5_28_length_2576_cov_28.790935 0   0   0   0
    p5_29_length_2571_cov_14.885025 6   0   0   0
    p5_30_length_2551_cov_26.988036 1   0   0   0
    p5_31_length_2462_cov_10.844540 0   0   0   0
    p5_32_length_2323_cov_22.107923 0   0   0   0
    p5_33_length_2261_cov_41.717901 0   0   0   0
    p5_34_length_2250_cov_34.612341 0   18  0   0
    p5_35_length_2242_cov_7.208983  0   0   0   0
    p5_37_length_2140_cov_15.349727 0   0   0   0
    p8_280_length_1323_cov_4.788462 0   0   0   0
    p8_281_length_1317_cov_21.436975    10  0   0   0
    p8_282_length_1317_cov_13.748739    0   0   0   0
    p8_283_length_1317_cov_5.379832 0   0   0   0
    p8_284_length_1315_cov_10.328283    0   0   0   0

Any ideas?

    
by k_a_r_o_l 07.06.2018 / 18:05

1 answer


This seems to do the trick:

$ printf "\t"; find input? | xargs printf "%s\t"; echo "";  paste input? | awk 'BEGIN { OFS="\t" } {printf "%s%s", $1,OFS; for( i=2;i<=NF;i++) { if( $1 != $i ) { printf "%s%s", $i, OFS } } printf "\n" }'
    input1  input2  input3
p1_61_length_503_cov_4.457447   24  24  24
p1_62_length_500_cov_4.037534   35  35  35
p1_63_length_500_cov_1.718499   6   6   6
p5_1_length_5181_cov_48.147804  0   0   0
p5_2_length_4872_cov_28.387777  0   0   0
p5_4_length_4057_cov_39.930534  0   0   0

The columns here are tab-separated to make further processing easier; lining them up vertically is fairly trivial from here.
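One way to do that alignment is sketched below, assuming the input is the tab-separated text produced by the command above. `align_columns` is a hypothetical helper name, and the widths (34 characters for the ID, 8 for each count) are guesses sized for IDs such as `p5_15_length_3099_cov_329.678331`.

```shell
# align_columns is a hypothetical helper: it reads tab-separated lines on stdin
# and prints them with fixed-width columns. The widths are assumptions.
align_columns() {
    awk 'BEGIN { FS = "\t" }
    {
        printf "%-34s", $1              # ID column, left-aligned, 34 chars wide
        for (i = 2; i <= NF; i++)
            if ($i != "")               # skip the empty field left by a trailing tab
                printf "%8s", $i        # each count right-aligned, 8 chars wide
        printf "\n"
    }'
}
```

To use it, group the whole command from the answer in braces, `{ printf "\t"; ...; }`, and pipe that group into `align_columns`. Fixed widths keep the empty first cell of the header row intact, which a tool that collapses adjacent separators might not.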

The awk script on its own:

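BEGIN {
   OFS = "\t"                          # separate output columns with tabs
}

# paste puts the files side by side, so each input line reads
# ID value ID value ID value ... (one ID/value pair per file)
{
   printf "%s%s", $1, OFS              # print the shared ID once
   for (i = 2; i <= NF; i++) {
      if ($1 != $i) {                  # skip the repeated copies of the ID
         printf "%s%s", $i, OFS        # print each file's value
      }
   }
   printf "\n"
}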
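Adapted to the question's own file names, the same idea might look like the sketch below. It assumes, as the question states, that all files list the same IDs in the same order. `merge_mappings` is a hypothetical function name; the header derived from the file names, the loop over even-numbered fields, and the ID check are additions that are not part of the answer above.

```shell
# merge_mappings is a hypothetical helper name, not an existing tool.
# Assumes every input file has two whitespace-separated columns (ID, count)
# and that all files list the same IDs in the same order.
merge_mappings() {
    # Header row: an empty first cell, then each file name up to its first "."
    # (e.g. dir/p1.bbmap.mapping.csv -> p1)
    for f in "$@"; do
        name=${f##*/}
        printf '\t%s' "${name%%.*}"
    done
    printf '\n'

    # paste yields "ID v1 ID v2 ID v3 ..." per line: keep the first ID and the
    # even-numbered fields (the counts), and abort if any file's ID differs.
    paste "$@" | awk 'BEGIN { OFS = "\t" }
    {
        line = $1
        for (i = 2; i <= NF; i += 2) {
            if ($(i - 1) != $1) {
                printf "line %d: ID mismatch (%s vs %s)\n", NR, $1, $(i - 1) > "/dev/stderr"
                exit 1
            }
            line = line OFS $i
        }
        print line
    }'
}
```

Usage: `merge_mappings p*.bbmap.mapping.csv > merged.tsv`. The shell expands `p*.bbmap.mapping.csv` in lexicographic order, so p10 would come before p2; list the files explicitly if the column order matters. Stepping through the even fields, instead of skipping every field equal to `$1`, also lets the script verify that the IDs really line up.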
    
answered 07.06.2018 / 18:44