Assuming, as @Mark asked, that the CSV file contains one value per line, you can do this trivially by replacing your initial list with a command substitution:
for ACC in $(cat csvfile)
do
...
done
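The command-substitution form relies on the shell's word splitting, which is fine for accession numbers without spaces. A common alternative is a `while read` loop, sketched below. The function name `read_accessions` is made up for illustration, and stripping carriage returns is only an assumption in case the CSV was exported with Windows line endings.

```shell
# read_accessions FILE: print one accession per line from FILE.
# (hypothetical helper name, not part of the original answer)
read_accessions() {
    # tr drops carriage returns (assumption: the file may have CRLF endings).
    # "|| [ -n "$ACC" ]" keeps a last line that lacks a trailing newline.
    tr -d '\r' < "$1" | while IFS= read -r ACC || [ -n "$ACC" ]; do
        [ -n "$ACC" ] || continue   # skip blank lines
        printf '%s\n' "$ACC"        # replace this with the real per-accession work
    done
}
```

In the real script the `printf` line would be replaced by the body of the original loop.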
I am trying to replace these accession numbers "A00002 X53307 BB145968 CAA42669 V00181 AH002406 HQ844023" in the following loop with a new list of numbers. But my new list is a .CSV file, and it contains hundreds of numbers. My question is: can I read the .CSV file directly and have it work as the list in the for loop?
for ACC in A00002 X53307 BB145968 CAA42669 V00181 AH002406 HQ844023
do
echo -n -e "$ACC\t"
curl -s "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=${ACC}&rettype=fasta&retmode=xml" |\
grep TSeq_taxid |\
cut -d '>' -f 2 |\
cut -d '<' -f 1 |\
tr -d "\n"
echo
done
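To see what the grep/cut part of the loop does without contacting the server, the same pipeline can be run on a single sample line. This is only a sketch: the function name `extract_taxid` is made up, and the sample line mimics the `TSeq_taxid` element of the XML that efetch returns.

```shell
# extract_taxid: read efetch XML on stdin, print the content of <TSeq_taxid>.
# (hypothetical helper name; the pipeline is the one used in the loop)
extract_taxid() {
    grep TSeq_taxid |      # keep only the line holding the taxonomy id
    cut -d '>' -f 2 |      # drop everything up to the opening tag's ">"
    cut -d '<' -f 1 |      # drop the closing tag
    tr -d '\n'             # remove the trailing newline
}

# Offline check with one sample line
printf '  <TSeq_taxid>1423</TSeq_taxid>\n' | extract_taxid
echo
```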
The .csv file looks like this:
WP_004064712.1
WP_023555236.1
WP_051593235.1
KAJ52037.1
WP_012103448.1
WP_049740904.1
WP_003346264.1
WP_026134014.1
WP_051870539.1
AKF93952.1
XP_008397367.1
XP_014896959.1
XP_007567109.1
XP_014847432.1
EHG27035.1
EGX75147.1
WP_033630878.1
If you know which values will replace "A00002 X53307 BB145968 CAA42669 V00181 AH002406 HQ844023", you can do this:
CSV=csvfile   # name of the file that sed edits in place
for LINE in $(cat "$CSV")
do
sed -i "s/A00002/NewValue/g" "$CSV"
sed -i "s/X53307/NewValue/g" "$CSV"
...
done
Explanation of the sed command:
sed -i "s/X53307/NewValue/g" $CSV
What this command does: it replaces X53307 with NewValue directly in the file $CSV.
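Since `-i` modifies the file in place, it can be worth previewing the result first by running the same substitution without `-i`, which only prints to standard output. A minimal sketch, with a made-up helper name `preview_replace`; note that the first argument is treated as a regular expression, so a dot in an accession such as KAJ52037.1 would match any character.

```shell
# preview_replace OLD NEW FILE: print FILE with OLD replaced by NEW,
# leaving FILE itself untouched (hypothetical helper name).
preview_replace() {
    sed "s/$1/$2/g" "$3"
}
```

For example, `preview_replace X53307 NewValue csvfile` shows what the file would look like after the substitution.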
You are forgetting two things here:
So you do not need to replace the values in the string; you can simply overwrite them.
Old:
<?xml version="1.0"?>
<!DOCTYPE TSeqSet PUBLIC "-//NCBI//NCBI TSeq/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_TSeq.dtd">
<TSeqSet>
<TSeq>
<TSeq_seqtype value="nucleotide"/>
<TSeq_gi>39899</TSeq_gi>
<TSeq_accver>X53307.1</TSeq_accver>
<TSeq_taxid>1423</TSeq_taxid>
<TSeq_orgname>Bacillus subtilis</TSeq_orgname>
<TSeq_defline>Bacillus subtilis epr gene for a novel serine protease</TSeq_defline>
<TSeq_length>2521</TSeq_length>
<TSeq_sequence>GTTAACAGGATATCCGAGCTTATCGGCCCACTCGTTCCCAAACACACTCGCCATGAAATCAGCATACCCCGGAATCGGCAAGCTCGTTAAAATCAAGAAGACAGACCCGATAATAATCAGCGGCATGGACTGGATAATTCCGTCACGCAAAGCGCTGAGATGCCGCTGCCCGGCAATTTTCCCGGCGACAGGCATTATTTTTTCCTCCATCACCCGAGTGAATGTGCTCATCTTAAAAACCCCCTTTTCTCATTGCTTTGTGAACAACAACCTCCGCAATGTTTTCTTTATCTTATTTTGAAAACGCTTAGAAATTCATTTGGAAAATTTCCTCTTCATGCGGAAAAAATCTGCATTTTGCTAAACAACCCTGCCCATGAAAATTTTTTCCTTCTTACTATTAATCTCTCTTTTTTTCTCCGATATATATATCAAACATCATAGAAAAAGGAGATGAATCATGAAAAACATGTCTTGCAAACTTGTTGTATCAGTCACTCTGTTTTTCAGTTTTCTCACCATAGGCCCTCTCGCTCATGCGCAAAACAGCAGCGAGAAAGAGGTTATTGTGGTTTATAAAAACAAGGCCGGAAAGGAAACCATCCTGGACAGTGATGCTGATGTTGAACAGCAGTATAAGCATCTTCCCGCGGTAGCGGTCACAGCAGACCAGGAGACAGTAAAAGAATTAAAGCAGGATCCTGATATTTTGTATGTAGAAAACAACGTATCATTTACCGCAGCAGACAGCACGGATTTCAAAGTGCTGTCAGACGGCACTGACACCTCTGACAACTTTGAGCAATGGAACCTTGAGCCCATTCAGGTGAAACAGGCTTGGAAGGCAGGACTGACAGGAAAAAATATCAAAATTGCCGTCATTGACAGCGGGATCTCCCCCCACGATGACCTGTCGATTGCCGGCGGGTATTCAGCTGTCAGTTATACCTCTTCTTACAAAGATGATAACGGCCACGGAACACATGTCGCAGGGATTATCGGAGCCA
AGCATAACGGCTACGGAATTGACGGCATCGCACCGGAAGCACAAATATACGCGGTTAAAGCGCTTGATCAGAACGGCTCGGGGGATCTTCAAAGTCTTCTCCAAGGAATTGACTGGTCGATCGCAAACAGGATGGACATCGTCAATATGAGCCTTGGCACGACGTCAGACAGCAAAATCCTTCATGACGCCGTGAACAAAGCATATGAACAAGGTGTTCTGCTTGTTGCCGCAAGCGGTAACGACGGAAACGGCAAGCCAGTGAATTATCCGGCGGCATACAGCAGTGTCGTTGCGGTTTCAGCAACAAACGAAAAGAATCAGCTTGCCTCCTTTTCAACAACTGGAGATGAAGTTGAATTTTCAGCACCGGGGACAAACATCACAAGCACTTACTTAAACCAGTATTATGCAACGGGAAGCGGAACATCCCAAGCGACACCGCACGCCGCTGCCATGTTTGCCTTGTTAAAACAGCGTGATCCTGCCGAGACAAACGTCCAGCTTCGCGAGGAAATGCGGAAAAACATCGTTGATCTTGGTACCGCAGGCCGCGATCAGCAATTTGGCTACGGCTTAATCCAGTATAAAGCACAGGCAACAGATTCAGCGTACGCGGCAGCAGAGCAAGCGGTGAAAAAAGCGGAACAAACAAAAGCACAAATCGATATCAACAAAGCGCGAGAACTCATCAGCCAGCTGCCGAACTCCGACGCCAAAACTGCCCTGCACAAAAGACTGGATAAAGTACAGTCATACAGAAATGTAAAAGATGCGAAAGACAAAGTCGCAAAGGCAGAAAAATATAAAACACAGCAAACCGTTGACACAGCACAAACTGCCATCAACAAGCTGCCAAACGGAACAGACAAAAAGAACCTTCAAAAACGCTTAGACCAAGTAAAACGATACATCGCGTCAAAGCAAGCGAAAGACAAAGTTGCGAAAGCGGAAAAAAGCAAAAAGAAAACAGATGTGGACAGCGCACAATCAGCAATTGGCAAGCTGCCTGCAAGTTCAGAAAA
AACGTCCCTGCAGAAACGCCTTAACAAAGTGAAGAGCACCAATTTGAAGACGGCACAGCAATCCGTATCTGCGGCTGAAAAGAAATCAACTGATGCAAATGCGGCAAAAGCACAATCAGCCGTCAATCAGCTTCAAGCAGGCAAGGACAAAACGGCATTGCAAAAACGGTTAGACAAAGTGAAGAAAAAGGTGGCGGCGGCTGAAGCAAAAAAAGTGGAAACTGCAAAGGCAAAAGTGAAGAAAGCGGAAAAAGACAAAACAAAGAAATCAAAGACATCCGCTCAGTCTGCAGTGAATCAATTAAAAGCATCCAATGAAAAAACAAAGCTGCAAAAACGGCTGAACGCCGTCAAACCGAAAAAGTAACCAAAAACCTTTAAGATTTGCATTCCAAGTCTTAAAGGTTTTTTTCATTCTAAGAACACCACACACAACCTTTTTCCCATCCATTGTACAGGCTTTTCATACTATTGCTATACAGCCATGAAC</TSeq_sequence>
</TSeq>
</TSeqSet>
New:
<?xml version="1.0"?>
<!DOCTYPE TSeqSet PUBLIC "-//NCBI//NCBI TSeq/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_TSeq.dtd">
<TSeqSet>
<TSeq>
<TSeq_seqtype value="protein"/>
<TSeq_gi>490166065</TSeq_gi>
<TSeq_accver>WP_004064712.1</TSeq_accver>
<TSeq_taxid>97253</TSeq_taxid>
<TSeq_orgname>Eubacterium plexicaudatum</TSeq_orgname>
<TSeq_defline>hypothetical protein [Eubacterium plexicaudatum]</TSeq_defline>
<TSeq_length>1508</TSeq_length>
<TSeq_sequence>MKKSFMTRVLAVSLSAAMAFSMSSASNLVTASAASTVNLKTTFKTLKVGQTYKLTLKKNTLNWKITKVQTTNKKICTVYGKTASSVMLKGKGVGRAKISVKVKTTKRKYPKNIKIMKCTANVKAADGSGTTDEFKVTSATASSNTEVRVMFSKAIDAAEMTNFTVSDSVTVSKAELSEDKKSVLLTIAGAEYGKNYELTVNGIKVAGKEQAAQKVTFTTPSASEKYPTTLEAKDPVLASDGHSQTLVTFTIKDANGNPITDKGVEVAFATSLGKFAEQRVSIQNGVATVMYTSEALMETQTSAITATVVESTDNQELMGLSATSSITLTPNPDEFNIVPIITSITAPTADRVIAYFNEKVSASDFKTASGKLDHSKFTANVAWGFDNGFDELGNRLVGRSNVVGILDVPGSDNALQLLVDRPMTDNTNISVTFENKTKASSLVSASNTVYTKLTDAHQPSVLTAKGDGLRTVVVNFSEAVLPTAYCDNVETDKKNANQTLFAADNIENYLIDGKPLSYWGVTEVKTPDSETPDDTSSNLKKESSKNDATKTGSEKPGEIQVGSYKDGEDNRHVVTIKLSRERFLEPGTHSMTISNVGDWAAKTDRERNIVNTQTFDFVVENNDVIPTFEVEEQSPEQWLLKFNSDIEPVSETLTTPNSQYSDQASILKLQELVGSTWVDISDSDAAGKNPIRVSQVDDTRNYVVEVRKDWTEVYNTSSTKQNYFNKQLRLHIDAGKIVNIANNKQNGTIDIPLDGTIMRTPDVVSPEIGEVTPAEDTSGNVLDSYNVKLSEPVKLSDGTGGAGGANGEGLTPSQIQSANGSNSNNQGVPMPSAQFIRVDNGQTVEGIITSNVFVDAYDTTINIAPESALSAGKWRLVISSISDDYGNTASTVAHEIDVTQESVTTDFKIVWAAVSDQQTYAEDHIGVERGRYIFVKFSKPVTMTGNSVNAGVTGNYTVNGATLPTGTQIRANIVGYDDHDAVTDSVTIMLPTGNVNAGWGATGDYTV
SGKNAMLNVSRAITATTGENLSNGGLIRIPFQYGSATEDTGYNDYNDSLTALTDAVWGNYRSETRAGYDNLRDYYKALKSALENDKYRRVVLTAPLDLSNPDDNPNEDQKDAVAVFGRSHTLTIKRAVDFDLNGNNITGNVVISTTDAVNRIKLHSSKERAHIYGYANNKDNVATLTVNAGSAKEFLLDNVEVHETDKGNALNINDTWKASFVNNGVIDGKIRITDTNGCGFKNENTTDGFTNRTRFIIDSTGDVNLKGDLSALRNLTDEFGITVNQAAKLSFGVDSKDETTPCDISGVKIVVRGPGARVIFTPVATTTADTALTAEADNVRVQLSQANSGSGKIQFFTDRGGKIVAVDKDNKEVTSDSKDAVKISSDDIKVTGIQKALENLDVQTGVITDGKVDSTVTISCGAISGGSYNIEELAKNIKKAEFEYKGKPDTTGIVANYSLLSTNLLKKDSTHIWPKDNWTDQKDDVSDTIRVTLAYDGYTMVKYIKVTRV</TSeq_sequence>
</TSeq>
</TSeqSet>
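Comparing the two records, the old one has `TSeq_seqtype value="nucleotide"` and the new one has `value="protein"`, so the `db=nuccore` part of the URL may need to change for the new list. The sketch below makes the database a parameter. The helper name `efetch_url` is made up, and using `protein` as the `db` value for these accessions is an assumption to check against the NCBI documentation.

```shell
# efetch_url DB ACC: print the efetch URL for one accession.
# (hypothetical helper; the URL pattern is copied from the loop in the question)
efetch_url() {
    printf 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=%s&id=%s&rettype=fasta&retmode=xml\n' "$1" "$2"
}

efetch_url nuccore X53307
efetch_url protein WP_004064712.1   # "protein" as the db value is an assumption
```

Inside the loop this would be used as `curl -s "$(efetch_url protein "$ACC")"`.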