Rearranging primer3 Boulder-IO output

0

I am trying to rearrange the output of primer3_core.

For example:

SEQUENCE_ID=ID_1
PRIMER_LEFT_0_SEQUENCE=ACGTGTAGCGGTTCAGACG
PRIMER_RIGHT_0_SEQUENCE=ACCATGCATGATCCATCCAGG
PRIMER_LEFT_1_SEQUENCE=CACAGCCACAGCAGCACAC
PRIMER_RIGHT_1_SEQUENCE=ATGCAGGTGATCAAGTTACGCC
=
SEQUENCE_ID=ID_2
PRIMER_LEFT_0_SEQUENCE=CACAGCCACAGCAGCACAC
PRIMER_RIGHT_0_SEQUENCE=GCAGGTGATCAAGTTACGCCATT
=

Each ID can therefore yield a different number of primers, anywhere from 0 to 20.

The output should look like this:

ID_1 ACGTGTAGCGGTTCAGACG
ID_1 ACCATGCATGATCCATCCAGG
ID_1 CACAGCCACAGCAGCACAC
ID_1 ATGCAGGTGATCAAGTTACGCC
ID_2 CACAGCCACAGCAGCACAC
ID_2 GCAGGTGATCAAGTTACGCCATT
    
by agp5432 02.10.2017 / 15:53

3 answers

1

awk -F= '$0 ~ "^SEQUENCE" {SEQ=$2} $0 !~ "^SEQUENCE" && $2 != "" { print SEQ" "$2 }' filename

Use awk with = as the field delimiter. When a line starts with SEQUENCE, set the variable SEQ to the second field. For every other line that has a value after the =, print SEQ followed by that value. The $2 != "" test skips the lone = lines that terminate each record, which would otherwise print the ID with nothing after it.
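A minimal sketch of the same idea wrapped in a shell function, in case the file is unfiltered primer3_core output that also carries other PRIMER_ tags. The function name rearrange_primers is made up for illustration, and the stricter tag pattern is an assumption based on the tag names shown in the question:

```shell
#!/bin/sh
# Hypothetical helper: reads Boulder-IO records on stdin, prints "ID primer" lines.
rearrange_primers() {
    awk -F= '
        # Remember the ID of the record currently being read.
        /^SEQUENCE_ID=/ { id = $2; next }
        # Assumption: only LEFT/RIGHT sequence tags are wanted; other tags are ignored.
        /^PRIMER_(LEFT|RIGHT)_[0-9]+_SEQUENCE=/ { print id, $2 }
    '
}

# Sample input: ID_1 has one primer pair, ID_2 has no primers at all.
printf '%s\n' \
    'SEQUENCE_ID=ID_1' \
    'PRIMER_LEFT_0_SEQUENCE=ACGTGTAGCGGTTCAGACG' \
    'PRIMER_RIGHT_0_SEQUENCE=ACCATGCATGATCCATCCAGG' \
    '=' \
    'SEQUENCE_ID=ID_2' \
    '=' | rearrange_primers
```

On this sample the function prints two lines for ID_1 and nothing for ID_2, since a record without primers matches neither rule. With a real file it would be called as rearrange_primers < filename.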

    
answered 02.10.2017 / 16:22
1

Approach:

awk -F'=' '/^SEQUENCE_ID/{ s = $2 }/^PRIMER/{ print s, $2 }' file

Output:

ID_1 ACGTGTAGCGGTTCAGACG
ID_1 ACCATGCATGATCCATCCAGG
ID_1 CACAGCCACAGCAGCACAC
ID_1 ATGCAGGTGATCAAGTTACGCC
ID_2 CACAGCCACAGCAGCACAC
ID_2 GCAGGTGATCAAGTTACGCCATT
    
answered 02.10.2017 / 17:08
0

Using a sed script:

# delete lines starting with '='
/^=/d

# handle sequence ID lines
/^SEQUENCE_ID=/{
    # remove everything up to and including the '='
    s///
    # put the sequence ID in the hold space
    h
    # delete the pattern space and continue with next line
    d
}

# handle primer lines
/^PRIMER.*=/{
    # remove everything up to and including the '='
    s///
    # append a newline and the sequence ID from the hold space to the pattern space
    G
    # swap the two bits of the pattern space around, replacing the newline with a space
    s/^\(.*\)\n\(.*\)$/\2 \1/
}
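
A small sketch for inspecting what G leaves in the pattern space before the final substitution runs. The helper name show_pattern_space is hypothetical; it reuses the two address blocks from the script but ends with sed's l command, which prints the pattern space with non-printing characters escaped:

```shell
#!/bin/sh
# Hypothetical helper: show the pattern space right after G, in the escaped "l" form.
show_pattern_space() {
    sed -n -e '/^SEQUENCE_ID=/{s///;h;}' -e '/^PRIMER.*=/{s///;G;l;}'
}

# One ID line followed by one primer line is enough to see the layout.
printf '%s\n' 'SEQUENCE_ID=ID_1' 'PRIMER_LEFT_0_SEQUENCE=ACGT' | show_pattern_space
```

With GNU sed this should show the primer first, then an escaped newline, then the ID, all on one line. That is the layout the last s command rearranges into "ID primer".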

Test:

$ sed -f script.sed file
ID_1 ACGTGTAGCGGTTCAGACG
ID_1 ACCATGCATGATCCATCCAGG
ID_1 CACAGCCACAGCAGCACAC
ID_1 ATGCAGGTGATCAAGTTACGCC
ID_2 CACAGCCACAGCAGCACAC
ID_2 GCAGGTGATCAAGTTACGCCATT

Without a separate script file:

$ sed -e '/^=/d' -e '/^SEQUENCE_ID=/{s///;h;d;}' -e '/^PRIMER.*=/{s///;G;s/^\(.*\)\n\(.*\)$/\2 \1/;}' file
ID_1 ACGTGTAGCGGTTCAGACG
ID_1 ACCATGCATGATCCATCCAGG
ID_1 CACAGCCACAGCAGCACAC
ID_1 ATGCAGGTGATCAAGTTACGCC
ID_2 CACAGCCACAGCAGCACAC
ID_2 GCAGGTGATCAAGTTACGCCATT

Shorter variant:

$ sed -n -e '/^SEQUENCE_ID=/{s///;h;}' -e '/^PRIMER.*=/{s///;G;s/^\(.*\)\n\(.*\)$/\2 \1/p;}' file
    
answered 02.10.2017 / 19:14