Vamos usar sed
:
sed -r 's/.*\|\|\|;(CSQ[^|]+)\|([^|]+)\|([^|]+)\|([^|]+)\|([^|]+)\|.*/\t\t\t\t/' file.txt
python
não é rápido na manipulação de arquivos muito grandes, isso seria muito mais rápido que python
.
Exemplo:
% cat file.txt
2 41620 . T G 100 PASS AC=3;AF=0.000599042;AN=5008;NS=2504;DP=18872;EAS_AF=0;AMR_AF=0;AFR_AF=0;EUR_AF=0;SAS_AF=0.0031;AA=.|||;CSQ=G|ENSG00000184731|ENST00000327669|Transcript|missense_variant|954|954|318|K/N|aaA/aaC|||-1|tolerated(0.47)|benign(0)||||;GENCODE=ENST00000327669
2 41620 . T G 100 PASS AC=3;AF=0.000599042;AN=5008;NS=2504;DP=18872;EAS_AF=0;AMR_AF=0;AFR_AF=0;EUR_AF=0;SAS_AF=0.0031;AA=.|||;CSQ=G|ENSG00000184731|ENST00000327669|Transcript|missense_variant|954|954|318|K/N|aaA/aaC|||-1|tolerated(0.47)|benign(0)||||;GENCODE=ENST00000327669
2 41620 . T G 100 PASS AC=3;AF=0.000599042;AN=5008;NS=2504;DP=18872;EAS_AF=0;AMR_AF=0;AFR_AF=0;EUR_AF=0;SAS_AF=0.0031;AA=.|||;CSQ=G|ENSG00000184731|ENST00000327669|Transcript|missense_variant|954|954|318|K/N|aaA/aaC|||-1|tolerated(0.47)|benign(0)||||;GENCODE=ENST00000327669
2 41620 . T G 100 PASS AC=3;AF=0.000599042;AN=5008;NS=2504;DP=18872;EAS_AF=0;AMR_AF=0;AFR_AF=0;EUR_AF=0;SAS_AF=0.0031;AA=.|||;CSQ=G|ENSG00000184731|ENST00000327669|Transcript|missense_variant|954|954|318|K/N|aaA/aaC|||-1|tolerated(0.47)|benign(0)||||;GENCODE=ENST00000327669
% sed -r 's/.*\|\|\|;(CSQ[^|]+)\|([^|]+)\|([^|]+)\|([^|]+)\|([^|]+)\|.*/\t\t\t\t/' file.txt
CSQ=G ENSG00000184731 ENST00000327669 Transcript missense_variant
CSQ=G ENSG00000184731 ENST00000327669 Transcript missense_variant
CSQ=G ENSG00000184731 ENST00000327669 Transcript missense_variant
CSQ=G ENSG00000184731 ENST00000327669 Transcript missense_variant