Bash script to extract values from HTML

0

I am trying to extract counter values from an HA7Net 1-Wire device server, but I am not very experienced with sed, awk, or bash scripting, so I am running into problems.

This script gives me a list of the counter IDs:

#!/bin/sh
Counters=$(curl -q "http://192.168.70.21/1Wire/Search.html?FamilyCode=1D" 2>/dev/null | sed --silent -e 's/.*<INPUT.*NAME="Address_\(.*\)".*VALUE="\(.*\)".*/\2/p')

# Iterate over the list to print each ID.
for x in $Counters; do echo "$x"; done

The result lists the devices and looks like this:

    D90000000C8A9A1D
    C00000000C8C9D1D
    2D0000000EE97D1D

Now I want to use that list in another curl request to get the actual counter readings. The URL for reading both counters of a device (each device has two counters, A and B) can be extended to read all devices at once, and looks like this:

curl -q "http://192.168.70.21/1Wire/ReadCounter.html?Address_Channel_Array={D90000000C8A9A1D,A},{D90000000C8A9A1D,B},{C00000000C8C9D1D,A},{C00000000C8C9D1D,B},{2D0000000EE97D1D,A},{2D0000000EE97D1D,B}"
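
A minimal sketch of how that query string could be assembled from the ID list, assuming every device exposes channels A and B as described above. The function name `build_query` and the hard-coded ID list are illustrative only; in the real script the list would come from the Search.html request.

```shell
#!/bin/sh
# Device IDs as printed by the first script (hard-coded here so the
# sketch runs without the HA7Net server).
Counters="D90000000C8A9A1D C00000000C8C9D1D 2D0000000EE97D1D"

# Print "{ID,A},{ID,B},..." for every ID in the whitespace-separated list $1.
build_query() {
    query=""
    for id in $1; do
        for channel in A B; do
            # Prefix a comma only when query already has an entry.
            query="${query:+$query,}{$id,$channel}"
        done
    done
    printf '%s\n' "$query"
}

url="http://192.168.70.21/1Wire/ReadCounter.html?Address_Channel_Array=$(build_query "$Counters")"
echo "$url"
```

Note that curl treats `{...}` in a URL as a globbing pattern unless `-g` (`--globoff`) is given, so `curl -g -q "$url"` is probably the safer way to send this request.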

And the resulting HTML page:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html><!-- InstanceBegin template="/Templates/1WireReply.dwt" codeOutsideHTMLIsLocked="false" -->
<head>
<!-- InstanceBeginEditable name="doctitle" -->
<title>Read Counter Reply</title><!-- InstanceEndEditable -->
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<!-- InstanceBeginEditable name="head" --><!-- InstanceEndEditable -->
<style type="text/css">
<!--
@import url("/eds.css");
-->
</style>
<!-- InstanceParam name="pagePreprocessor" type="text" value="preProcessReadCounter" --><!-- InstanceParam name="functionname" type="text" value="Read Counter" --><!-- InstanceParam name="nextpage" type="text" value="PgReadCounterResult" --><!-- InstanceParam name="enctype" type="text" value="application/x-www-form-urlencoded" --><!-- InstanceParam name="name" type="text" value="Read Counter Result" -->
</head><body>
<table width="100%" border="0" cellspacing="0" cellpadding="0" bgcolor="#EEEEEE">
<tr>
<td class="title" colspan="2"><h1>&nbsp;</h1><h1 class="title">Embedded Data Systems</h1><a class="title" href="http://www.embeddeddatasystems.com">http://www.embeddeddatasystems.com</a></td></tr><tr class="spacer">
<td><H2 class="spacer">Read Counter Reply</h2></td><td><p class="spacer">HA7Net: 1.0.0.22</p></td></tr><tr>
<td colspan="2"><FORM METHOD="POST" ACTION="/Forms/ReadCounterResult_1" name="Read Counter Result"><table name="Exceptions" ID="Exceptions">
<tr>
<td><INPUT CLASS="HA7Value" NAME="Exception_Code_0" ID="Exception_Code_0" TYPE="hidden" VALUE="0" Size="5" disabled></td><td><INPUT CLASS="HA7Value" NAME="Exception_String_0" ID="Exception_String_0" TYPE="hidden" VALUE="None" Size="5" disabled></td></tr></table><!-- InstanceBeginEditable name="WorkArea" -->
<table name="Counter" id="Counter">
<tr><td colspan=1>Address</td><td colspan=1>Count</td><td colspan=1>Status</td></tr>
<tr><td colspan=1><INPUT CLASS="HA7Value" NAME="Address_0" ID="Address_0" TYPE="text" VALUE="D90000000C8A9A1D"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Count_0" ID="Count_0" TYPE="text" VALUE="240155653"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Device_Exception_0" ID="Device_Exception_0" TYPE="text" VALUE="OK"></td><INPUT CLASS="HA7Value" NAME="Device_Exception_Code_0" ID="Device_Exception_Code_0" TYPE="hidden" VALUE="0"></tr>
<tr><td colspan=1><INPUT CLASS="HA7Value" NAME="Address_1" ID="Address_1" TYPE="text" VALUE="D90000000C8A9A1D"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Count_1" ID="Count_1" TYPE="text" VALUE="48719610"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Device_Exception_1" ID="Device_Exception_1" TYPE="text" VALUE="OK"></td><INPUT CLASS="HA7Value" NAME="Device_Exception_Code_1" ID="Device_Exception_Code_1" TYPE="hidden" VALUE="0"></tr>
<tr><td colspan=1><INPUT CLASS="HA7Value" NAME="Address_2" ID="Address_2" TYPE="text" VALUE="C00000000C8C9D1D"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Count_2" ID="Count_2" TYPE="text" VALUE="0"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Device_Exception_2" ID="Device_Exception_2" TYPE="text" VALUE="OK"></td><INPUT CLASS="HA7Value" NAME="Device_Exception_Code_2" ID="Device_Exception_Code_2" TYPE="hidden" VALUE="0"></tr>
<tr><td colspan=1><INPUT CLASS="HA7Value" NAME="Address_3" ID="Address_3" TYPE="text" VALUE="C00000000C8C9D1D"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Count_3" ID="Count_3" TYPE="text" VALUE="1"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Device_Exception_3" ID="Device_Exception_3" TYPE="text" VALUE="OK"></td><INPUT CLASS="HA7Value" NAME="Device_Exception_Code_3" ID="Device_Exception_Code_3" TYPE="hidden" VALUE="0"></tr>
<tr><td colspan=1><INPUT CLASS="HA7Value" NAME="Address_4" ID="Address_4" TYPE="text" VALUE="2D0000000EE97D1D"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Count_4" ID="Count_4" TYPE="text" VALUE="1973018"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Device_Exception_4" ID="Device_Exception_4" TYPE="text" VALUE="OK"></td><INPUT CLASS="HA7Value" NAME="Device_Exception_Code_4" ID="Device_Exception_Code_4" TYPE="hidden" VALUE="0"></tr>
<tr><td colspan=1><INPUT CLASS="HA7Value" NAME="Address_5" ID="Address_5" TYPE="text" VALUE="2D0000000EE97D1D"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Count_5" ID="Count_5" TYPE="text" VALUE="17260345"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Device_Exception_5" ID="Device_Exception_5" TYPE="text" VALUE="OK"></td><INPUT CLASS="HA7Value" NAME="Device_Exception_Code_5" ID="Device_Exception_Code_5" TYPE="hidden" VALUE="0"></tr>
</table><!-- InstanceEndEditable -->
<table name="Statistics" ID="Statistics">

    

I want to extract the value of each counter into a file or variable for later use.

I know it is possible to use owfs, but I wanted the flexibility of doing it this way.

    
by Alf Pettersson 01.05.2017 / 17:12

1 answer

1

HTML is not a regular language, so first be aware that trying to parse it with regular expressions is a first-class ticket to a descent into madness. That said, the HTML you are trying to munge looks like it should lend itself to fairly simple extraction. The data you want appears to come from these lines:

<tr><td colspan=1><INPUT CLASS="HA7Value" NAME="Address_0" ID="Address_0" TYPE="text" VALUE="D90000000C8A9A1D"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Count_0" ID="Count_0" TYPE="text" VALUE="240155653"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Device_Exception_0" ID="Device_Exception_0" TYPE="text" VALUE="OK"></td><INPUT CLASS="HA7Value" NAME="Device_Exception_Code_0" ID="Device_Exception_Code_0" TYPE="hidden" VALUE="0"></tr>
<tr><td colspan=1><INPUT CLASS="HA7Value" NAME="Address_1" ID="Address_1" TYPE="text" VALUE="D90000000C8A9A1D"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Count_1" ID="Count_1" TYPE="text" VALUE="48719610"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Device_Exception_1" ID="Device_Exception_1" TYPE="text" VALUE="OK"></td><INPUT CLASS="HA7Value" NAME="Device_Exception_Code_1" ID="Device_Exception_Code_1" TYPE="hidden" VALUE="0"></tr>
<tr><td colspan=1><INPUT CLASS="HA7Value" NAME="Address_2" ID="Address_2" TYPE="text" VALUE="C00000000C8C9D1D"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Count_2" ID="Count_2" TYPE="text" VALUE="0"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Device_Exception_2" ID="Device_Exception_2" TYPE="text" VALUE="OK"></td><INPUT CLASS="HA7Value" NAME="Device_Exception_Code_2" ID="Device_Exception_Code_2" TYPE="hidden" VALUE="0"></tr>
<tr><td colspan=1><INPUT CLASS="HA7Value" NAME="Address_3" ID="Address_3" TYPE="text" VALUE="C00000000C8C9D1D"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Count_3" ID="Count_3" TYPE="text" VALUE="1"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Device_Exception_3" ID="Device_Exception_3" TYPE="text" VALUE="OK"></td><INPUT CLASS="HA7Value" NAME="Device_Exception_Code_3" ID="Device_Exception_Code_3" TYPE="hidden" VALUE="0"></tr>
<tr><td colspan=1><INPUT CLASS="HA7Value" NAME="Address_4" ID="Address_4" TYPE="text" VALUE="2D0000000EE97D1D"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Count_4" ID="Count_4" TYPE="text" VALUE="1973018"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Device_Exception_4" ID="Device_Exception_4" TYPE="text" VALUE="OK"></td><INPUT CLASS="HA7Value" NAME="Device_Exception_Code_4" ID="Device_Exception_Code_4" TYPE="hidden" VALUE="0"></tr>
<tr><td colspan=1><INPUT CLASS="HA7Value" NAME="Address_5" ID="Address_5" TYPE="text" VALUE="2D0000000EE97D1D"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Count_5" ID="Count_5" TYPE="text" VALUE="17260345"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Device_Exception_5" ID="Device_Exception_5" TYPE="text" VALUE="OK"></td><INPUT CLASS="HA7Value" NAME="Device_Exception_Code_5" ID="Device_Exception_Code_5" TYPE="hidden" VALUE="0"></tr>

So we can probably sed this:

$ sed -n '/HA7Value.*Address_/{ s/VALUE="/%%%/;s/^.*%%%//; s/".*//; p; }' input.html
D90000000C8A9A1D
D90000000C8A9A1D
C00000000C8C9D1D
C00000000C8C9D1D
2D0000000EE97D1D
2D0000000EE97D1D

To break that down:

/HA7Value.*Address_/ # Only run on lines that match this expression
{                    # Begin code block
  s/VALUE="/%%%/     # Replace (only) the first 'VALUE="' with a special marker
  s/^.*%%%//         # Delete everything up to that marker
  s/".*//            # Delete from the first '"' to the end of the line
  p                  # Print what's left
}                    # End code block
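
The command above prints only the addresses. Below is a sketch of one way to get each address together with its count, assuming the row layout shown in the reply page (the `Address_N` input followed by the `Count_N` input on the same line). The function name `extract_counts` and the trimmed sample rows are illustrative, not part of the HA7Net output.

```shell
#!/bin/sh
# Two rows from the reply page, trimmed to the Address and Count inputs
# so the sketch runs without the server.
html='<tr><td colspan=1><INPUT CLASS="HA7Value" NAME="Address_0" ID="Address_0" TYPE="text" VALUE="D90000000C8A9A1D"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Count_0" ID="Count_0" TYPE="text" VALUE="240155653"></td></tr>
<tr><td colspan=1><INPUT CLASS="HA7Value" NAME="Address_1" ID="Address_1" TYPE="text" VALUE="D90000000C8A9A1D"></td><td colspan=1><INPUT CLASS="HA7Value" NAME="Count_1" ID="Count_1" TYPE="text" VALUE="48719610"></td></tr>'

# Read HTML on stdin; print "ADDRESS COUNT" for each matching row.
# [^>]* keeps each match inside one INPUT tag; [^"]* stops at the closing quote.
extract_counts() {
    sed -n 's/.*NAME="Address_[0-9]*"[^>]*VALUE="\([^"]*\)".*NAME="Count_[0-9]*"[^>]*VALUE="\([^"]*\)".*/\1 \2/p'
}

# Keep the pairs in a variable for later use.
readings=$(printf '%s\n' "$html" | extract_counts)

# Walk the pairs one row at a time.
printf '%s\n' "$readings" | while read -r address count; do
    echo "address=$address count=$count"
done
```

The reply does not label the channel, so which row is A and which is B has to be inferred from row order; in the sample page the rows appear in the same order as the request. Redirecting the output of `extract_counts` to a file instead of a variable would cover the "file" case.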
    
by 01.05.2017 / 17:56