Remove only the given strings from a given column?

1

INPUT:

<tr><td>FOOBAAR</td><td>FOOO</td><td>BAAR</td><td><font style=BACKGROUND-COLOR:red>2014-02-14 13:34</font></td><td><font style=BACKGROUND-COLOR:red>2014-02-17 13:34</font></td><td><font style=BACKGROUND-COLOR:red>2014-03-07 13:34</font></td></tr>

OUTPUT:

<tr><td>FOOBAAR</td><td>FOOO</td><td>BAAR</td><td>2014-02-14 13:34</td><td><font style=BACKGROUND-COLOR:red>2014-02-17 13:34</font></td><td><font style=BACKGROUND-COLOR:red>2014-03-07 13:34</font></td></tr>

Difference: the strings

<font style=BACKGROUND-COLOR:red>

and

</font>

were removed from the fourth column only.

My question: how can I remove only certain strings from one specific column?

</td><td>

is the delimiter.

    
by evachristine 03.06.2014 / 21:42

4 answers

3

I would recommend an HTML parsing tool instead of regular expressions. (A famous answer explains why here.)

Here is an example using an XML parser (note: it requires well-formed XML input, which your sample HTML is not; the attribute values are unquoted and there is no enclosing table element):

# change the value of the style attribute of the font tag of the 4th td tag 
# to the empty string
xmlstarlet ed -O -u '//table/tr/td[4]/font[@style]/@style' -v "" <<END
<html><head></head><body><table>
<tr><td>FOOBAAR</td><td>FOOO</td><td>BAAR</td><td><font style="BACKGROUND-COLOR:red">2014-02-14 13:34</font></td><td><font style="BACKGROUND-COLOR:red">2014-02-17 13:34</font></td><td><font style="BACKGROUND-COLOR:red">2014-03-07 13:34</font></td></tr>
</table></body></html>
END
<html>
  <head/>
  <body>
    <table>
      <tr>
        <td>FOOBAAR</td>
        <td>FOOO</td>
        <td>BAAR</td>
        <td>
          <font style="">2014-02-14 13:34</font>
        </td>
        <td>
          <font style="BACKGROUND-COLOR:red">2014-02-17 13:34</font>
        </td>
        <td>
          <font style="BACKGROUND-COLOR:red">2014-03-07 13:34</font>
        </td>
      </tr>
    </table>
  </body>
</html>
    
by 04.06.2014 / 04:09
2

This might work:

#!/bin/sh

# remove specific strings from the fourth column only
INSTRING="<tr><td>FOOBAAR</td><td>FOOO</td><td>BAAR</td><td><font style=BACKGROUND-COLOR:red>2014-02-14 13:34</font></td><td><font style=BACKGROUND-COLOR:red>2014-02-17 13:34</font></td><td><font style=BACKGROUND-COLOR:red>2014-03-07 13:34</font></td></tr>"

DEL_STRING1="<font style=BACKGROUND-COLOR:red>"
DEL_STRING2="</font>"
DELIM="</td><td>"
# keep the first four columns, rejoined with the delimiter
OUT_FIRST=$(echo "$INSTRING" | awk -F "$DELIM" '{print $1,$2,$3,$4}' OFS="$DELIM")
# drop the opening font tag (only column 4 can contain it at this point)
OUT_FIRST=$(echo "$OUT_FIRST" | awk -F "$DEL_STRING1" '{print $1,$2}' OFS="")
# drop the closing font tag
OUT_FIRST=$(echo "$OUT_FIRST" | awk -F "$DEL_STRING2" '{print $1}')
# everything from the fifth column onward, untouched
# (index() finds the first occurrence, so this assumes column 5 differs from the earlier columns)
OUT_LAST=$(echo "$INSTRING" | awk -F "$DELIM" '{print substr($0, index($0,$5))}')
echo "$OUT_FIRST$DELIM$OUT_LAST"

Hope this helps.

    
by 04.06.2014 / 01:12
1

An awk one-liner:

$ awk -F '</td><td>' 'BEGIN{OFS=FS} {gsub(/<font style=BACKGROUND-COLOR:red>/,"",$4); gsub(/<\/font>/,"",$4)}1' file
<tr><td>FOOBAAR</td><td>FOOO</td><td>BAAR</td><td>2014-02-14 13:34</td><td><font style=BACKGROUND-COLOR:red>2014-02-17 13:34</font></td><td><font style=BACKGROUND-COLOR:red>2014-03-07 13:34</font></td></tr>
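
The same idea can be generalized so that the column number is a parameter rather than hard-coded. The following is only a sketch: the function name strip_font_in_column and its argument order are made up for illustration and are not part of the original answer. It guards against column numbers beyond the last field, because assigning to a non-existent field in awk would otherwise append empty columns to the row.

```shell
#!/bin/sh

# strip_font_in_column COL [FILE...]
# Hypothetical helper: removes the red <font> wrapper from column COL only,
# treating "</td><td>" as the field separator. Reads stdin if no FILE is given.
strip_font_in_column() {
    col=$1
    shift
    awk -F '</td><td>' -v col="$col" '
        BEGIN { OFS = FS }                 # rejoin fields with the same delimiter
        col >= 1 && col <= NF {            # only touch columns that exist
            gsub(/<font style=BACKGROUND-COLOR:red>/, "", $col)
            gsub(/<\/font>/, "", $col)
        }
        { print }                          # print every row, changed or not
    ' "$@"
}
```

Calling strip_font_in_column 4 file should give the same result as the one-liner above, and strip_font_in_column 5 file would clean the fifth column instead.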
    
by 04.06.2014 / 04:23
1
sed 's|</td><td>|</td>\nTGT_LINE_MARKER<td>|4' |
sed '\|TGT_LINE_MARKER|{function applied to target field}'
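
One possible way to make the marker idea concrete in a single sed invocation is sketched below. Note that the third occurrence of the delimiter precedes column 4 and the fourth occurrence follows it, so both are marked with a newline to isolate the column. This is an assumption-laden sketch, not the answerer's actual command: the function name strip_font_col4 is hypothetical, and the use of \n in the replacement text assumes GNU sed.

```shell
#!/bin/sh

# strip_font_col4 [FILE...]
# Hypothetical helper, assuming GNU sed ("\n" in the replacement is not POSIX).
# 1. Put a newline after column 4 (4th delimiter), then before it (3rd delimiter).
#    The 4th is replaced first so the numbering of the earlier delimiters is unchanged.
# 2. Unwrap the <font> element that sits exactly between the two newlines.
# 3. Delete the newline markers to rejoin the row.
strip_font_col4() {
    sed -e 's|</td><td>|</td>\n<td>|4' \
        -e 's|</td><td>|</td>\n<td>|3' \
        -e 's|\n<td><font style=BACKGROUND-COLOR:red>\(.*\)</font></td>\n|\n<td>\1</td>\n|' \
        -e 's|\n||g' "$@"
}
```

Rows whose fourth column is not wrapped in the font tag, or that have fewer than five columns, pass through unchanged, because the unwrap step only matches a font element enclosed by both markers.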
    
by 04.06.2014 / 07:18
