How to reformat KEGG Mapper output on Linux?

0

I need to reformat the output of KEGG Reconstruct Pathway. I have something like this in file1:

00550 Peptidoglycan biosynthesis (2)

K01000

K02563

00511 Other glycan degradation (8) K01190   K01191

K01192

K01201

K01227

K12309

I need something like this in file2:

00550 Peptidoglycan biosynthesis (2)   K01000   K02563
00511 Other glycan degradation (6)   K01190   K01191   K01192   K01201   K01227   K12309

How could I reformat this on Linux or in Python?

Thanks

    
by Dieunel Derilus 23.10.2018 / 15:42

1 answer

0

How far would this get you:

awk '
!NF             {next                                                   # do not process empty lines
                }
/^[0-9]+ /      {sub (/\([0-9]*\)/, "(" CNT ")", PRT)                   # for the pathway header lines (leading number):
                                                                        # correct the count in parentheses
                 if (PRT) print PRT                                     # print the PRT buffer (skipped before the first header, when it is empty)
                 PRT = ""                                               # empty it after printing
                 CNT = gsub (/K[0-9]+/, "&") - 1                        # count the "K..." IDs on this line, minus 1 to offset the increment below
                }
                {PRT = sprintf ("%s%s%s", PRT, PRT?" ":"", $0)          # append this line to the buffer
                 CNT++                                                  # increment the "K..." count
                }
END             {sub (/\([0-9]*\)/, "(" CNT ")", PRT)                   # see above
                 if (PRT) print PRT                                     # avoid printing a blank line on empty input
                }
' file
00550 Peptidoglycan biosynthesis (2) K01000 K02563
00511 Other glycan degradation (6) K01190   K01191 K01192 K01201 K01227 K12309
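
Since the question also mentions Python, the same idea could be sketched roughly as follows. This is only a minimal sketch under a few assumptions: the function name `reformat` is made up for illustration, a pathway header is taken to be any line starting with digits followed by whitespace, K numbers are taken to be "K" plus five digits, and runs of whitespace are collapsed to a single space (swap in a tab if file2 needs that).

```python
import re

HEADER = re.compile(r"^\d+\s")        # pathway header: leading numeric ID, then whitespace
K_NUMBER = re.compile(r"\bK\d{5}\b")  # assumed K number shape: "K" plus five digits
COUNT = re.compile(r"\(\d*\)")        # the "(n)" count in the header


def reformat(lines):
    """Join each pathway header with its K numbers and correct the count."""
    records = []
    for raw in lines:
        line = " ".join(raw.split())  # trim and collapse whitespace
        if not line:
            continue                  # skip empty lines
        if HEADER.match(line) or not records:
            records.append(line)      # start a new pathway record
        else:
            records[-1] += " " + line # continuation line: append to current record
    result = []
    for record in records:
        n = len(K_NUMBER.findall(record))
        # replace only the first "(n)" with the number of K numbers actually found
        result.append(COUNT.sub(f"({n})", record, count=1))
    return result


sample = """00550 Peptidoglycan biosynthesis (2)

K01000

K02563

00511 Other glycan degradation (8) K01190   K01191

K01192

K01201

K01227

K12309
"""

for row in reformat(sample.splitlines()):
    print(row)
```

To run it on a real file, something like `reformat(open("file1"))` should work, since the function accepts any iterable of lines; the result can then be written to file2.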
    
by 23.10.2018 / 16:25