awk :
Supondo um fragmento de teste do arquivo batch_1.catalog.tags.tsv
:
0 1 6 gi|586799556|ref|NW_006530744.1| 141 + consensus 0 1_33,14_43 CGGGCGGTGGTGGCGCACGCCTTTAATCCCAGCACTTGGGAGGCAGAGGCAGGTGGATCTTTGTGAGTTCGAGGCCAGCCTGGGCTACCAAGTGAGCTCC 0 0 0 0
1 2 7 hi|686711556|ref|NW_006530744.2| 141 + consensus 0 1_33,14_43 CGGGCGGTGGTGGCGCACGCCTTTAATCCCAGCACTTGGGAGGCAGAGGCAGGTGGATCTTTGTGAGTTCGAGGCCAGCCTGGGCTACCAAGTGAGCTCC 1 1 0 1
2 2 8 hi|686711556|ref|NW_006530744.2| 141 + consensus 0 1_33,14_43 CGGGCGGTGGTGGCGCACGCCTTTAATCCCAGCACTTGGGAGGCAGAGGCAGGTGGATCTTTGTGAGTTCGAGGCCAGCCTGGGCTACCAAGTGAGCTCC 1 1 1 1
3 3 9 th|776711556|ref|NW_006530744.2| 141 + consensus 1 1_33,14_43 CGGGCGGTGGTGGCGCACGCCTTTAATCCCAGCACTTGGGAGGCAGAGGCAGGTGGATCTTTGTGAGTTCGAGGCCAGCCTGGGCTACCAAGTGAGCTCC 1 0 1 1
E um fragmento de teste do arquivo whitelight.txt
:
6
7
9
O comando:
awk 'NR==FNR{ a[$0]++; next }{ if ($3 in a) {
$0=">"$3 FS $4 FS $5 FS $6 FS $11 FS $12 FS $13 FS $14 RS $10; print}}' whitelist.txt batch_1.catalog.tags.tsv > cat.fa
Conteúdo final de cat.fa
:
>6 gi|586799556|ref|NW_006530744.1| 141 + 0 0 0 0
CGGGCGGTGGTGGCGCACGCCTTTAATCCCAGCACTTGGGAGGCAGAGGCAGGTGGATCTTTGTGAGTTCGAGGCCAGCCTGGGCTACCAAGTGAGCTCC
>7 hi|686711556|ref|NW_006530744.2| 141 + 1 1 0 1
CGGGCGGTGGTGGCGCACGCCTTTAATCCCAGCACTTGGGAGGCAGAGGCAGGTGGATCTTTGTGAGTTCGAGGCCAGCCTGGGCTACCAAGTGAGCTCC
>9 th|776711556|ref|NW_006530744.2| 141 + 1 0 1 1
CGGGCGGTGGTGGCGCACGCCTTTAATCCCAGCACTTGGGAGGCAGAGGCAGGTGGATCTTTGTGAGTTCGAGGCCAGCCTGGGCTACCAAGTGAGCTCC
Detalhes :
NR==FNR
- executa a ação para o primeiro arquivo, por exemplo, whitelight.txt
a[$0]++;
- acumulando números do arquivo whitelight.txt
if ($3 in a)
- permite ação se um valor da 3ª coluna do 2º arquivo corresponder a qualquer um dos números acumulados
RS
- separador de registros do awk, padrão para o caractere de nova linha