List all HTML tags in a file

Question


I want to know if there is a way to list all the HTML tags in a file. Let's say I have a file file.html:

<html>
<head>
<title>Test</title>
</head>
<body>
This is a test
</body>
</html>

And I want to get a list of all the tags, that is:

<html>
<head>
<title>
</title>
</head>
<body>
</body>
</html>

I tried using sed:

cat file.html | sed 's/<[^>]*>//g'

But it removed all the HTML tags instead of listing them.

Tags: grep sed shell scripting

by gkmohit, 23.07.2014 / 20:17

2 answers

score 2

Using a real HTML parser is not that hard:

perl -MHTML::Parser -E '
  $handler = sub {say "<".shift.">"};
  HTML::Parser->new(start_h => [$handler,"tag"], end_h => [$handler,"tag"])
              ->parse_file(shift @ARGV)
' file.html

<html>
<head>
<title>
</title>
</head>
<body>
</body>
</html>

answered 23.07.2014 / 21:07


score 3 · Accepted Answer

A quick hack with perl:

perl -wlne 'print for(/<.*?>/g)' file.html
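Since the question is tagged grep, here is a minimal sketch of the same quick hack with `grep -o`, which prints only the matching part of each line, one match per line. Note that `-o` is supported by GNU and BSD grep but is not required by POSIX. It has the same limits as the perl version: it works line by line and assumes anything between `<` and `>` is a tag.

```shell
#!/bin/sh
# Recreate the sample file from the question so the example is self-contained
cat > file.html <<'EOF'
<html>
<head>
<title>Test</title>
</head>
<body>
This is a test
</body>
</html>
EOF

# -o: print each '<...>' match on its own line instead of the whole line
grep -o '<[^>]*>' file.html
```

For the sample file this prints the same eight tags listed in the question, in document order.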

But for a serious solution, you should use a tool that actually understands HTML/XML.