Replace different strings based on a match against a list of strings in another file

Question


0

I have a tab-delimited file like this:

Chr1    mak   gene    120221  120946  .       +       .       ID=spa-h0003.02;Name=spa-h0003.02
Chr1    mak   mRNA    120221  120946  .       +       .       ID=spa-cap_Chr1_00M;Parent=spa-h0003.02;Name=spa-cap_Chr1_00M
Chr1    mak   exon    120221  120946  .       +       .       Parent=spa-cap_Chr1_00M
Chr1    mak   gene    18546165        18546939        .       +       .       ID=spa-h0004.02;Name=spa-h0004.02
Chr1    mak   mRNA    18546165        18546939        .       +       .       ID=spa-cap_Chr1_18;Parent=spa-h0004.02;Name=spa-cap_Chr1_18
Chr1    mak   exon    18546165        18546504        .       +       .       Parent=spa-cap_Chr1_18
Chr1    mak   exon    18546791        18546939        .       +       .       Parent=spa-cap_Chr1_18

I want to replace strings only on lines where the third column is "gene". The strings in the ninth column should be replaced according to the information in a second file like this (tab-separated):

spa-h0003.02  spa-cap_Chr1_00M
spa-h0004.02  spa-cap_Chr1_18

I don't know how to do this. I was thinking of something like the following (should XX be the information from the second file?):

cat file | awk '$3 == "gene" && $9 == "spa-" {$9 = "XX"} {print}'

But how can I use the information from the second file? Maybe:

while read n k; do sed -i 's/$n/$k/g' file1; done < fileA

awk sed replace

by Paul 18.10.2018 / 14:40

2 answers

1

An unpopular choice: Tcl. Tcl has a nice string map command that does exactly this. Unfortunately, Tcl is not really built for one-liners.

echo '
    # read the mapping file into a list
    set fh [open "mapping" r]
    set content [read $fh]
    close $fh
    set mapping [regexp -all -inline {\S+} $content]

    # read the contents of the data file
    # and apply mapping to field 9 when field 3 is "gene"
    set fh [open "file" r]
    while {[gets $fh line] != -1} {
        set fields [split $line \t]
        if {[lindex $fields 2] eq "gene"} {
            lset fields 8 [string map $mapping [lindex $fields 8]]
        }
        puts [join $fields \t]
    }
    close $fh
' | tclsh

With awk, I would write:

awk -F'\t' -v OFS='\t' '
    NR == FNR {repl[$1]= $2; next}
    $3 == "gene" {
        for (seek in repl) 
            while ((idx = index($9, seek)) > 0) 
                $9 = substr($9, 1, idx-1) repl[seek] substr($9, idx + length(seek))
    }
    {print}
' mapping file
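
To see the awk version above in action, here is a small self-contained run on two lines from the question. It is only a sketch: the scratch directory and the output name result are assumptions added for illustration, while the file names mapping and file follow the answer.

```shell
#!/bin/sh
# Hypothetical scratch directory so the demo does not touch real data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two-column, tab-separated mapping: old ID -> new ID.
printf 'spa-h0003.02\tspa-cap_Chr1_00M\n'  > mapping
printf 'spa-h0004.02\tspa-cap_Chr1_18\n'  >> mapping

# One gene line (should change) and one mRNA line (should stay as it is).
printf 'Chr1\tmak\tgene\t120221\t120946\t.\t+\t.\tID=spa-h0003.02;Name=spa-h0003.02\n' > file
printf 'Chr1\tmak\tmRNA\t120221\t120946\t.\t+\t.\tID=spa-cap_Chr1_00M;Parent=spa-h0003.02;Name=spa-cap_Chr1_00M\n' >> file

awk -F'\t' -v OFS='\t' '
    # First file: remember each old -> new pair.
    NR == FNR {repl[$1] = $2; next}
    # Second file: literal (non-regex) replacement in column 9 of gene lines.
    $3 == "gene" {
        for (seek in repl)
            while ((idx = index($9, seek)) > 0)
                $9 = substr($9, 1, idx-1) repl[seek] substr($9, idx + length(seek))
    }
    {print}
' mapping file > result

cat result
```

The ninth column of the gene line becomes ID=spa-cap_Chr1_00M;Name=spa-cap_Chr1_00M, and the mRNA line is printed unchanged. One caveat: the while loop would never terminate if a replacement string contained its own search string, which is not the case for the sample data.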

answered 18.10.2018 / 15:19


score 1 · Accepted Answer

Assuming file1 contains the text to be replaced, file2 contains the replacement text, and you can rely on the ID= attribute to do the lookup between the two, you can use this (more popular, I think) awk script. Note that gensub is a GNU awk extension, so this needs gawk:

awk -F'\t' '
  NR==FNR{
    a[$1]=$2                                   # fills the array a with the replacement text
    next
  }
  $3=="gene"{                                  # check only lines with 'gene'
    id=gensub("ID=([^;]*);.*","\\1",1,$9);    # extract the id string
    if(id in a)                                # if the id is part of the array a
       gsub(id,a[id])                          # replace it
  }
  1                                            # print the line
' file2 file1
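
If gawk is not available, or the IDs should be matched exactly rather than as a regular expression (gsub treats the dot in spa-h0003.02 as "any character"), the same ID-based lookup can be sketched with plain awk. This is a variation, not the answer's script: it assumes the ninth column is a ;-separated list of key=value pairs and that only the ID and Name values should be rewritten. The scratch directory and the output name result are made up for the demo.

```shell
#!/bin/sh
# Hypothetical scratch directory for the demo files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# file2: replacement table (old ID, new ID), tab-separated.
printf 'spa-h0004.02\tspa-cap_Chr1_18\n' > file2

# file1: one gene line and one exon line taken from the question.
printf 'Chr1\tmak\tgene\t18546165\t18546939\t.\t+\t.\tID=spa-h0004.02;Name=spa-h0004.02\n' > file1
printf 'Chr1\tmak\texon\t18546165\t18546504\t.\t+\t.\tParent=spa-cap_Chr1_18\n' >> file1

awk -F'\t' -v OFS='\t' '
  NR == FNR { a[$1] = $2; next }           # load the replacement table
  $3 == "gene" {
    n = split($9, attr, ";")               # break column 9 into key=value pairs
    out = ""
    for (i = 1; i <= n; i++) {
      eq  = index(attr[i], "=")
      key = substr(attr[i], 1, eq - 1)
      val = substr(attr[i], eq + 1)
      # Exact string lookup, so regex metacharacters in IDs are harmless.
      if ((key == "ID" || key == "Name") && (val in a))
        val = a[val]
      sep   = (i > 1) ? ";" : ""
      piece = (eq > 0) ? (key "=" val) : attr[i]
      out   = out sep piece
    }
    $9 = out
  }
  1                                        # print every line
' file2 file1 > result

cat result
```

Here the gene line ends in ID=spa-cap_Chr1_18;Name=spa-cap_Chr1_18 and the exon line passes through untouched.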