Give grep with the Perl Compatible Regular Expressions module (pcregrep) a try:
- To remove two-letter combinations:
pcregrep -Mv '>.*\n([ACGT])\1*([ACGT])\2*(\1|\2)*$' file
Output:
>NB501013:9:HJJ75BGXX:4:21602:19346:16945/2
CTCGTCGCATCACAAAGGGAT
>NB501013:9:HJJ75BGXX:3:11407:17650:13229/2
CCGCGGGCCGGTGCGGGGGTTTTTTTGTTTTTTTGGTTACAACGGGTGGG
>NB501013:9:HJJ75BGXX:3:13509:1817:13239/2
CAGCCC
>NB501013:9:HJJ75BGXX:4:22611:20567:13384/2
GAATA
- To remove sequences of 5 letters or fewer:
pcregrep -Mv '>.*\n[ACGT]{1,5}$' file
Output:
>NB501013:9:HJJ75BGXX:4:13609:24076:18015/2
GGGGGGGAAAAAAA
>NB501013:9:HJJ75BGXX:4:21602:19346:16945/2
CTCGTCGCATCACAAAGGGAT
>NB501013:9:HJJ75BGXX:3:11407:17650:13229/2
CCGCGGGCCGGTGCGGGGGTTTTTTTGTTTTTTTGGTTACAACGGGTGGG
>NB501013:9:HJJ75BGXX:3:13509:1817:13239/2
CAGCCC
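If pcregrep is not available, the same two filters can be sketched in Python with the standard re module. This is only an illustration, not part of the pcregrep approach: the names TWO_LETTER, SHORT, read_records and filter_records are hypothetical, and the sketch assumes every record is exactly one header line starting with ">" followed by one sequence line.

```python
import re

# Hypothetical names; both patterns mirror the pcregrep expressions
# without the leading '>.*\n' header part.
# TWO_LETTER: a first letter, repeats of it, a second letter, repeats of it,
# then any mix of those same two letters.
TWO_LETTER = re.compile(r"([ACGT])\1*([ACGT])\2*(\1|\2)*")
# SHORT: a sequence of 1 to 5 letters.
SHORT = re.compile(r"[ACGT]{1,5}")


def read_records(lines):
    """Yield (header, sequence) pairs, assuming two-line records."""
    header = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            header = line
        elif header is not None:
            yield header, line
            header = None


def filter_records(lines, pattern):
    """Keep only records whose whole sequence does NOT match pattern."""
    return [
        (header, seq)
        for header, seq in read_records(lines)
        if not pattern.fullmatch(seq)
    ]
```

Calling filter_records(open("file"), TWO_LETTER) should then drop a record such as GGGGGGGAAAAAAA while keeping CAGCCC, and filter_records(open("file"), SHORT) should drop GAATA. fullmatch is used so the pattern has to cover the entire sequence line, which plays the role of the trailing $ in the pcregrep expressions.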