Como remover uma linha se a diferença entre duas colunas for menor que 2000

1

Eu tenho um conjunto de dados com a seguinte aparência:

chr1    HAVANA  gene    69091   70008   .   +   .   gene_id "ENSG00000186092.4"; transcript_id "ENSG00000186092.4"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "OR4F5"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "OR4F5"; level 2; havana_gene "OTTHUMG00000001094.1";
chr1    ENSEMBL gene    134901  139379  .   -   .   gene_id "ENSG00000237683.5"; transcript_id "ENSG00000237683.5"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "AL627309.1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "AL627309.1"; level 3;
chr1    HAVANA  gene    367640  368634  .   +   .   gene_id "ENSG00000235249.1"; transcript_id "ENSG00000235249.1"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "OR4F29"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "OR4F29"; level 2; havana_gene "OTTHUMG00000002860.1";
chr1    HAVANA  gene    621059  622053  .   -   .   gene_id "ENSG00000185097.2"; transcript_id "ENSG00000185097.2"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "OR4F16"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "OR4F16"; level 2; havana_gene "OTTHUMG00000002581.1";
chr1    ENSEMBL gene    738532  739137  .   -   .   gene_id "ENSG00000269831.1"; transcript_id "ENSG00000269831.1"; gene_type "protein_coding"; gene_status "NOVEL"; gene_name "AL669831.1"; transcript_type "protein_coding"; transcript_status "NOVEL"; transcript_name "AL669831.1"; level 3;

Eu gostaria de remover genes onde a diferença entre $ 5 e $ 4 é menor que 2000 usando o awk, se for possível. Embora sed seja aceitável também.

Por isso, retorna o seguinte:

 chr1   ENSEMBL gene    134901  139379  .   -   .   gene_id "ENSG00000237683.5"; transcript_id "ENSG00000237683.5"; gene_type "protein_coding"; gene_status "KNOWN"; gene_name "AL627309.1"; transcript_type "protein_coding"; transcript_status "KNOWN"; transcript_name "AL627309.1"; level 3;

Obrigado.

    
por System 14.01.2016 / 20:26

1 resposta

3

awk '$5-$4 >= 2000' file

se $ 5 sempre for maior que $ 4

    
por 14.01.2016 / 20:32