I have a very large, tree-structured XML file of about 1 GB.
I need to delete every <Sample> ... </Sample> block, including the child lines inside it, that does not contain the value <segmentation><![CDATA[0.11]]></segmentation>.
For example, the file contains lines with tags like the following:
<segmentation><![CDATA[0.11]]></segmentation>
<segmentation><![CDATA[0.25]]></segmentation>
<segmentation><![CDATA[0.61]]></segmentation>
In the example below, is it possible to delete all <Sample> blocks and their child lines, keeping only the <Sample> blocks (with their child lines) that contain the tag <segmentation><![CDATA[0.11]]></segmentation>?
Initial:
<Sample>
<title><![CDATA[South Park]]></title>
<date><![CDATA[Tue, 29 Nov 2016 00:00:00 EST]]></date>
<referencenumber><![CDATA[20983990]]></referencenumber>
<segmentation><![CDATA[0.11]]></segmentation>
<description><![CDATA[Some text goes here]]></description>
</Sample>
<Sample>
<title><![CDATA[South Park]]></title>
<date><![CDATA[Tue, 29 Nov 2016 00:00:00 EST]]></date>
<referencenumber><![CDATA[20983990]]></referencenumber>
<segmentation><![CDATA[0.25]]></segmentation>
<description><![CDATA[Some text goes here]]></description>
</Sample>
<Sample>
<title><![CDATA[South Park]]></title>
<date><![CDATA[Tue, 29 Nov 2016 00:00:00 EST]]></date>
<referencenumber><![CDATA[20983990]]></referencenumber>
<segmentation><![CDATA[0.61]]></segmentation>
<description><![CDATA[Some text goes here]]></description>
</Sample>
Result:
<Sample>
<title><![CDATA[South Park]]></title>
<date><![CDATA[Tue, 29 Nov 2016 00:00:00 EST]]></date>
<referencenumber><![CDATA[20983990]]></referencenumber>
<segmentation><![CDATA[0.11]]></segmentation>
<description><![CDATA[Some text goes here]]></description>
</Sample>
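
To make the goal concrete, here is a minimal sketch in Python of the kind of streaming filter I have in mind. It is not a tested solution for the real file. It assumes that <Sample> and </Sample> each sit on their own line exactly as in the example, that <Sample> blocks are never nested, and that the segmentation tag appears on a single line. The function name filter_samples is hypothetical. It reads line by line, so the 1 GB file is never loaded into memory, and it copies text verbatim, so the CDATA sections are left untouched.

```python
# Line of text a <Sample> block must contain in order to be kept.
KEEP = "<segmentation><![CDATA[0.11]]></segmentation>"


def filter_samples(src, dst, keep=KEEP):
    """Copy src to dst, dropping every <Sample> block that lacks `keep`.

    Assumption: <Sample> and </Sample> are each alone on a line and
    blocks are not nested. Lines outside any block are copied as-is.
    """
    block = None      # buffered lines of the current block, None when outside one
    matched = False   # whether the current block contains `keep`
    for line in src:
        stripped = line.strip()
        if block is None:
            if stripped == "<Sample>":
                block = [line]        # start buffering a new block
                matched = False
            else:
                dst.write(line)       # pass through header, root tags, etc.
        else:
            block.append(line)
            if keep in line:
                matched = True
            if stripped == "</Sample>":
                if matched:
                    dst.writelines(block)   # keep the whole block
                block = None                # otherwise discard it
```

It would be called with two open text files, for example an input opened for reading and an output opened with mode "w" (the file names are up to you); only one <Sample> block is held in memory at a time.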