EQ963472 29264 . G A 212.0 . DP=170;VDB=0.0253;AF1=1;AC1=2;DP4=0,0,79,83;MQ=60;FQ=-282;ANN=A|stop_gained|HIGH|AFLA_072280|AFLA_072280|transcript|EED56534|protein_coding|
EQ963472 31777 . C T 222.0 . DP=179;VDB=0.0245;AF1=1;AC1=2;DP4=0,0,66,95;MQ=60;FQ=-282;ANN=T|stop_gained|HIGH|AFLA_072310|AFLA_072310|transcript|EED56537|protein_coding|
EQ963472 58523 . G A 222.0 . DP=161;VDB=0.0269;AF1=1;AC1=2;DP4=0,0,71,83;MQ=60;FQ=-282;ANN=A|start_lost|HIGH|AFLA_072370|AFLA_072370|transcript|EED56543|protein_coding|1
EQ963472 171022 . A C 222.0 . DP=164;VDB=0.0253;AF1=1;AC1=2;DP4=0,0,90,65;MQ=60;FQ=-282;ANN=C|stop_lost&splice_region_variant|HIGH|AFLA_072870|AFLA_072870|transcript|EED5
EQ963472 174382 . C T 136.0 . DP=159;VDB=0.0253;AF1=1;AC1=2;DP4=0,0,65,76;MQ=60;FQ=-282;ANN=T|stop_gained|HIGH|AFLA_072890|AFLA_072890|transcript|EED56595|protein_coding|
EQ963472 185314 . T C 77.0 . DP=168;VDB=0.0259;AF1=1;AC1=2;DP4=0,0,6,2;MQ=60;FQ=-51;ANN=C|stop_lost|HIGH|AFLA_072940|AFLA_072940|transcript|EED56600|protein_coding|1/1|c
EQ963472 188490 . C T 217.0 . DP=175;VDB=0.0267;AF1=1;AC1=2;DP4=0,1,86,78;MQ=60;FQ=-282;PV4=0.48,8.8e-08,1,1;ANN=T|stop_gained|HIGH|AFLA_072960|AFLA_072960|transcript|EED
Fetch IDs matching AFLA_* and the column IDs from the first match using pattern matching and regular expressions
I tried to fetch only the AFLA IDs using the following:
grep -o "AFLA_[0-9]" A1.SNP.contig.snpeff_high.out | less
This results in:
AFLA_0
AFLA_0
AFLA_0
AFLA_0
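A likely reason for the truncated matches, assuming the pattern is exactly as shown: `[0-9]` matches a single digit, so `-o` prints only `AFLA_` plus the first digit. A minimal sketch with an extended regex (`-E`) and `+` to match the whole run of digits, applied to a shortened sample of the annotation field:

```shell
# Sketch: [0-9]+ matches one or more digits, so the full ID is printed.
# The input is a shortened sample of the ANN field from the first record.
printf '%s\n' 'ANN=A|stop_gained|HIGH|AFLA_072280|AFLA_072280|transcript|' \
    | grep -oE 'AFLA_[0-9]+'
```

Each match is printed on its own line, so a record with two IDs yields two output lines.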
We can concatenate the first two columns with an _ to get a unique ID.
The output file should contain the first two columns (joined) and the AFLA_* IDs.
The output should be:
EQ963472_29264 AFLA_072280 AFLA_072280
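One possible way to produce this output is sketched below with awk. It assumes the columns are separated by whitespace (tabs or spaces) and that every `AFLA_` ID is followed by digits only. The function name `extract_afla_ids` is hypothetical and chosen for illustration.

```shell
# Hypothetical helper: reads annotated records on stdin and prints
# "<column1>_<column2> <AFLA id> <AFLA id> ..." for each record with a match.
extract_afla_ids() {
    awk '{
        line = $0
        ids = ""
        # Collect every AFLA_<digits> occurrence, left to right.
        while (match(line, /AFLA_[0-9]+/)) {
            ids = ids " " substr(line, RSTART, RLENGTH)
            line = substr(line, RSTART + RLENGTH)
        }
        # Skip records that contain no AFLA ID at all.
        if (ids != "") print $1 "_" $2 ids
    }'
}

# Example using the first record shown above.
printf '%s\n' 'EQ963472 29264 . G A 212.0 . DP=170;VDB=0.0253;AF1=1;AC1=2;DP4=0,0,79,83;MQ=60;FQ=-282;ANN=A|stop_gained|HIGH|AFLA_072280|AFLA_072280|transcript|EED56534|protein_coding|' \
    | extract_afla_ids
```

On the real file this could be run as `extract_afla_ids < A1.SNP.contig.snpeff_high.out`. Records whose annotation lists more than two IDs would print all of them on the same line.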