Deleting multiple strings with sed

0

I have a very large file from which I need to delete several strings. It looks like this:

CAM_READ_0623233309 /library_id=CAM_LIB_002149 /sample_id=CAM_SMPL_003380 raw_id=G9ALM7U02F5HAW length=383 /IP_notice=?This genetic information downloaded from CAMERA may be considered to be part of the genetic patrimony of Denmark, the country from which the sample was obtained. Users of this information agree to: 1) acknowledge Denmark as the country of origin in any country where the genetic information is presented and 2) contact the CBD focal point identified on the CBD website (http://www.cbd.int/countries/) if they intend to use the genetic information for commercial purposes.? TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT

My output should look like this:

CAM_READ_0623233309 TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT

Will the following sed command work?

sed -i '/ /library_id=CAM_LIB_\d{6} /sample_id=CAM_SMPL_\d{6} raw_id=G\d{1}[A-Z]{3}\d{1}[A-Z]{1}\d{2}[A-Z]{1}\d{1}[A-Z]{3} length=\d{3} /IP_notice=\?This genetic information downloaded from CAMERA may be considered to be part of the genetic patrimony of Denmark, the country from which the sample was obtained. Users of this information agree to: 1) acknowledge Denmark as the country of origin in any country where the genetic information is presented and 2) contact the CBD focal point identified on the CBD website (http://www.cbd.int/countries/) if they intend to use the genetic information for commercial purposes.\?/d' g1.fa

    
by meenalm 01.05.2016 / 01:30

1 answer

1

Since your input is a single long line and you want only its first and last items, awk can do exactly that by printing the first field ($1) and the last field ($NF). The command would be:

awk '{printf "%s\n%s\n", $1, $NF}' data.txt

Example output:

$> cat data.txt                                                                                                          
CAM_READ_0623233309 /library_id=CAM_LIB_002149 /sample_id=CAM_SMPL_003380 raw_id=G9ALM7U02F5HAW length=383 /IP_notice=?This genetic information downloaded from CAMERA may be considered to be part of the genetic patrimony of Denmark, the country from which the sample was obtained. Users of this information agree to: 1) acknowledge Denmark as the country of origin in any country where the genetic information is presented and 2) contact the CBD focal point identified on the CBD website (http://www.cbd.int/countries/) if they intend to use the genetic information for commercial purposes.? TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT
$> awk '{printf "%s\n%s\n", $1, $NF}' data.txt
CAM_READ_0623233309
TGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGTGT
$> 
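The desired output in the question keeps the identifier and the sequence on one line, separated by a space. A minimal sketch of that variant follows, assuming the fields are separated by whitespace and the sequence is always the last field. The `sample` record is a shortened, hypothetical stand-in for the real data, and the variable names are only illustrative.

```shell
# Hypothetical record, shortened from the one in the question.
sample='CAM_READ_0623233309 /library_id=CAM_LIB_002149 length=383 /IP_notice=?notice text (http://www.cbd.int/countries/).? TGTGTGTG'

# print with a comma joins the first and last fields with a single space.
one_line=$(printf '%s\n' "$sample" | awk '{print $1, $NF}')
printf '%s\n' "$one_line"
```

On a real file the same awk program would be run as awk '{print $1, $NF}' data.txt, with the output redirected to a new file rather than overwriting the original.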
    
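As for the sed command in the question, it is unlikely to do what is wanted: the d command deletes the whole matching line rather than just the matched text, sed regular expressions have no \d digit class ([0-9] is the portable form), intervals such as {6} need -E (or \{6\} in basic syntax), and any unescaped / inside the pattern ends the /.../ address early. A sketch that stays with sed uses a substitution instead. It assumes fields are separated by single spaces, that lines have no trailing whitespace, and that the sed in use accepts -E (GNU and BSD sed do). The `sample` record is again a shortened, hypothetical stand-in.

```shell
# Hypothetical record, shortened from the one in the question.
sample='CAM_READ_0623233309 /library_id=CAM_LIB_002149 length=383 /IP_notice=?notice text (http://www.cbd.int/countries/).? TGTGTGTG'

# Capture the first and last space-free runs and drop everything between them.
# Lines that do not match the pattern pass through unchanged.
trimmed=$(printf '%s\n' "$sample" | sed -E 's/^([^ ]+) .* ([^ ]+)$/\1 \2/')
printf '%s\n' "$trimmed"
```

With GNU sed, adding -i and naming the file (g1.fa in the question) would apply the change in place; it is safer to try this on a copy first.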
by Sergiy Kolodyazhnyy 30.06.2016 / 08:32