Bash: combining parts of several log files based on a search pattern


I have a folder with many txt files. Each file has the following format:

Allowed overlap: -3
H-bond overlap reduction: 0.4
Ignore contacts between atoms separated by 4 bonds or less
Detect intra-residue contacts: False
Detect intra-molecule contacts: False

19 contacts
atom1  atom2  overlap  distance
:128.B@BB  :300.C@BB  -1.676  4.996
:179.B@BB  :17.C@BB   -1.898  5.218
:182.B@BB  :17.C@BB   -2.015  5.335

My goal is to loop over the files in the folder and combine them into one global output. As the example shows, I want to keep only the lines after (and including) "19 contacts" (the number is different in each file), which means skipping the first six lines of each file.

Possible workflow:

# Create the log file that will collect the information from all files looped over in the next step.
echo "This is the beginning of the global output" > ./final_output.txt
# Key phrase: marks the first line that should be taken from each file
# (written here as a regex meaning "any number followed by 'contacts'").
key='^[0-9]+ contacts'

# Now loop over the files and append every line after (and including) ${key} to final_output.txt.
for file in "${folder}"/*.txt; do
  file_title=$(basename "$file")
  # 1 - print ${file_title} into final_output.txt
  # 2 - append the lines from the file to final_output.txt
  # NB: only the lines after (and including) the key phrase are needed
done

by user3470313 07.12.2017 / 15:50

2 answers


Take these three files as an example:


Allowed overlap: -3
H-bond overlap reduction: 0.4
Ignore contacts between atoms separated by 4 bonds or less
Detect intra-residue contacts: False
Detect intra-molecule contacts: False

19 contacts
atom1  atom2  overlap  distance
:128.B@BB  :300.C@BB  -1.676  4.996
:179.B@BB  :17.C@BB   -1.898  5.218
:182.B@BB  :17.C@BB   -2.015  5.335


Allowed overlap: -3
H-bond overlap reduction: 0.4
Ignore contacts between atoms separated by 4 bonds or less
Detect intra-residue contacts: False
Detect intra-molecule contacts: False

17 contacts
atom1  atom2  overlap  distance
:128.B@BB  :300.C@BB  -1.676  4.996
:179.B@BB  :17.C@BB   -1.898  5.218
:182.B@BB  :17.C@BB   -2.015  5.335


Allowed overlap: -3
H-bond overlap reduction: 0.4
Ignore contacts between atoms separated by 4 bonds or less
Detect intra-residue contacts: False
Detect intra-molecule contacts: False

12 contacts
atom1  atom2  overlap  distance
:128.B@BB  :300.C@BB  -1.676  4.996
:179.B@BB  :17.C@BB   -1.898  5.218
:182.B@BB  :17.C@BB   -2.015  5.335

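The code below saves everything from the "19 contacts", "17 contacts", and "12 contacts" lines through the end of each file. sed -n '/^[0-9]/,$p' prints from the first line that starts with a digit to the last line, and in these files the first such line is the "N contacts" line: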

 for i in file1 file3 file4; do sed -n '/^[0-9]/,$p' "$i"; done > /var/tmp/outputfile.txt


19 contacts
atom1  atom2  overlap  distance
:128.B@BB  :300.C@BB  -1.676  4.996
:179.B@BB  :17.C@BB   -1.898  5.218
:182.B@BB  :17.C@BB   -2.015  5.335
17 contacts
atom1  atom2  overlap  distance
:128.B@BB  :300.C@BB  -1.676  4.996
:179.B@BB  :17.C@BB   -1.898  5.218
:182.B@BB  :17.C@BB   -2.015  5.335
12 contacts
atom1  atom2  overlap  distance
:128.B@BB  :300.C@BB  -1.676  4.996
:179.B@BB  :17.C@BB   -1.898  5.218
:182.B@BB  :17.C@BB   -2.015  5.335
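
The question also asked for each file's name to appear before its block, which the one-liner above does not do. A minimal sketch of that variant follows, assuming a hypothetical scratch directory with two shortened demo files (a.txt and b.txt are illustrative, not from the original post) and a slightly stricter pattern that requires the word "contacts" after the number:

```shell
#!/usr/bin/env bash
set -eu

# Illustrative setup (an assumption, not from the original post):
# two shortened log files in a scratch directory.
workdir=$(mktemp -d)
folder="$workdir/logs"
mkdir "$folder"
printf '%s\n' 'Allowed overlap: -3' 'Detect intra-molecule contacts: False' '' \
  '19 contacts' 'atom1  atom2  overlap  distance' \
  ':128.B@BB  :300.C@BB  -1.676  4.996' > "$folder/a.txt"
printf '%s\n' 'Allowed overlap: -3' 'Detect intra-molecule contacts: False' '' \
  '17 contacts' 'atom1  atom2  overlap  distance' \
  ':179.B@BB  :17.C@BB   -1.898  5.218' > "$folder/b.txt"

# Keep the output outside "$folder" so the *.txt glob never picks it up.
out="$workdir/final_output.txt"
echo "This is the beginning of the global output" > "$out"

for file in "$folder"/*.txt; do
  # 1 - write the file name as a header for this block
  printf '== %s ==\n' "$(basename "$file")" >> "$out"
  # 2 - append everything from the "<number> contacts" line to the end of the file
  sed -n '/^[0-9][0-9]* contacts/,$p' "$file" >> "$out"
done

cat "$out"
```

Writing the output to a location outside the scanned folder matters here: if final_output.txt lived next to the inputs, the glob would match it and the loop would try to process its own output.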
answered 07.12.2017 / 16:17

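I found another method that works with the same input files. It deletes the first six lines of each file, so it relies on every header being exactly six lines long (five settings lines plus one blank line):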


 for i in file1 file3 file4; do sed '1,6d' "$i"; done > /var/tmp/outputfile.txt


19 contacts
atom1  atom2  overlap  distance
:128.B@BB  :300.C@BB  -1.676  4.996
:179.B@BB  :17.C@BB   -1.898  5.218
:182.B@BB  :17.C@BB   -2.015  5.335
17 contacts
atom1  atom2  overlap  distance
:128.B@BB  :300.C@BB  -1.676  4.996
:179.B@BB  :17.C@BB   -1.898  5.218
:182.B@BB  :17.C@BB   -2.015  5.335
12 contacts
atom1  atom2  overlap  distance
:128.B@BB  :300.C@BB  -1.676  4.996
:179.B@BB  :17.C@BB   -1.898  5.218
:182.B@BB  :17.C@BB   -2.015  5.335
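
Deleting a fixed number of lines only works while every header has the same length. As an alternative sketch (a swapped-in technique, not part of either answer), a single awk call can match the "N contacts" line by pattern and handle all files without a shell loop. The demo files, their deliberately different header lengths, and the combined.txt name are assumptions made for illustration:

```shell
#!/usr/bin/env bash
set -eu

# Illustrative setup (an assumption): two demo files whose headers
# have different lengths, created in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir"
printf '%s\n' 'Allowed overlap: -3' 'H-bond overlap reduction: 0.4' '' \
  '19 contacts' 'atom1  atom2  overlap  distance' \
  ':128.B@BB  :300.C@BB  -1.676  4.996' > file1
printf '%s\n' 'Allowed overlap: -3' '' \
  '17 contacts' 'atom1  atom2  overlap  distance' \
  ':179.B@BB  :17.C@BB   -1.898  5.218' > file3

# FNR == 1 is true on the first line of each input file: reset the flag.
# On the "<number> contacts" line: set the flag and print the file name.
# The bare "keep" pattern prints every line while the flag is set.
awk '
  FNR == 1           { keep = 0 }
  /^[0-9]+ contacts/ { keep = 1; print "== " FILENAME " ==" }
  keep
' file1 file3 > combined.txt

cat combined.txt
```

Resetting the flag at FNR == 1 is what keeps the header of the second file from leaking into the output after the first file has switched printing on.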
answered 07.12.2017 / 16:25