Edit taxon names in a phylogenetic tree in Newick format [closed]


I have a phylogenetic tree in Newick format and would like to remove parts of the taxon names. Here is part of the tree:

1_[genus_specie_1]_characters:0.2654682758,(((((((((((((((2_[genus_specie_2]_characters:0.0379334280,54_[genus_specie_2]_characters:0.0605802067)/1/100:0.0121248674,(3_[genus_specie_3]_characters:0.0206432295,4_[genus_specie_4]_characters:0.0141250479)/1/100:0.0647820408)/1/100:0.0235327264,30_[genus_specie_5]_characters

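I want to keep only the name inside the square brackets and drop the rest of each label (the leading number and the trailing suffix), so that the result looks like this: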

genus_specie_1:0.2654682758,(((((((((((((((genus_specie_2:0.0379334280,genus_specie_2:0.0605802067)/1/100:0.0121248674,(genus_specie_3:0.0206432295,genus_specie_4:0.0141250479)/1/100:0.0647820408)/1/100:0.0235327264,genus_specie_5

I tried removing all the square brackets with a Perl one-liner:

perl -i -pe 'y/[]//d' file.nwk

and I also tried the following sed command:

sed 's/[[:alnum:]_]*\[\([[:alnum:]_]*\)\][[:alnum:]_]*//g' 

but neither of them works.
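The sed attempt is close: the pattern already captures the bracket contents with \( \), but the replacement part is empty, so the whole label, name included, is deleted. A minimal sketch of a fix, assuming every label has the NUMBER_[name]_suffix shape shown above, is to write the capture back with \1. The closing "]" is left unescaped here because it is already literal outside a bracket expression; the sample string is copied from the question.

```shell
initial='1_[genus_specie_1]_characters:0.2654682758,(((((((((((((((2_[genus_specie_2]_characters:0.0379334280,54_[genus_specie_2]_characters:0.0605802067)/1/100:0.0121248674,(3_[genus_specie_3]_characters:0.0206432295,4_[genus_specie_4]_characters:0.0141250479)/1/100:0.0647820408)/1/100:0.0235327264,30_[genus_specie_5]_characters'

# \( \) captures the name inside the brackets and \1 writes it back, while the
# [[:alnum:]_]* runs on either side consume the "1_" prefix and "_characters" suffix.
result=$(printf '%s\n' "$initial" | sed 's/[[:alnum:]_]*\[\([[:alnum:]_]*\)][[:alnum:]_]*/\1/g')
printf '%s\n' "$result"
```

Because [[:alnum:]_]* cannot cross ":", ",", "(" or ")", each match stays inside a single label.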

by erick rodriguez, 11.10.2018 / 23:48

1 answer


Perl regexes work well here:

$ initial='1_[genus_specie_1]_characters:0.2654682758,(((((((((((((((2_[genus_specie_2]_characters:0.0379334280,54_[genus_specie_2]_characters:0.0605802067)/1/100:0.0121248674,(3_[genus_specie_3]_characters:0.0206432295,4_[genus_specie_4]_characters:0.0141250479)/1/100:0.0647820408)/1/100:0.0235327264,30_[genus_specie_5]_characters'
$ expected='genus_specie_1:0.2654682758,(((((((((((((((genus_specie_2:0.0379334280,genus_specie_2:0.0605802067)/1/100:0.0121248674,(genus_specie_3:0.0206432295,genus_specie_4:0.0141250479)/1/100:0.0647820408)/1/100:0.0235327264,genus_specie_5'

$ result=$( perl -pe 's/\d+_\[(.+?)\]_.*?(?=:|$)/$1/g' <<<"$initial" )

$ [[ $result = $expected ]] && echo yes
yes

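This uses non-greedy quantifiers (.+? and .*?) and a look-ahead ((?=:|$)), so each match ends just before the ":" that starts the branch length, or at the end of the line, without consuming that ":".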

answered 12.10.2018 / 00:46