Making grep understand byte escapes

4

I'm trying to match some UTF-8 characters. The problem is that grep doesn't translate \x byte escapes, so this fails:

echo -e '\xd8\xaa' | grep -P '\xd8\xaa'

while this succeeds:

echo -e '\xd8\xaa' | grep -P $(printf '\xd8\xaa')

Can grep understand byte escapes directly, without going through printf? If so, how?

    
by RYN 11.03.2018 / 23:37

1 answer

4

This fails:

$ echo -e '\xd8\xaa' | grep -P '\xd8\xaa' | hexdump

This succeeds:

$ echo -e '\xd8\xaa' | grep -P $'\xd8\xaa' | hexdump
0000000 aad8 000a                              
0000003
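
The difference lies in who decodes the escape: with $'...' it is bash, before grep ever runs. Here is a minimal sketch of that, assuming bash and a POSIX od (the variable names ansi and literal are only illustrative):

```shell
#!/usr/bin/env bash
# $'...' is expanded by bash itself, so the variable holds two raw bytes.
ansi=$'\xd8\xaa'
# Plain single quotes keep the backslashes: eight literal characters.
literal='\xd8\xaa'

# Dump each value byte by byte.
printf '%s' "$ansi" | od -An -tx1      # bytes: d8 aa
printf '%s' "$literal" | od -An -tx1   # bytes: 5c 78 64 38 5c 78 61 61
```

So grep -P $'\xd8\xaa' receives the raw bytes as its pattern, much as it does with $(printf '\xd8\xaa') in the question.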

Documentation

From man bash:

Words of the form $'string' are treated specially. The word expands to string, with backslash-escaped characters replaced as specified by the ANSI C standard. Backslash escape sequences, if present, are decoded as follows:

          \a     alert (bell)
          \b     backspace
          \e
          \E     an escape character
          \f     form feed
          \n     new line
          \r     carriage return
          \t     horizontal tab
          \v     vertical tab
          \\     backslash
          \'     single quote
          \"     double quote
          \?     question mark
          \nnn   the eight-bit character whose value is the octal value nnn (one to three digits)
          \xHH   the eight-bit character whose value is the hexadecimal value HH (one or two hex digits)
          \uHHHH the Unicode (ISO/IEC 10646) character whose value is the hexadecimal value HHHH (one to four hex digits)
          \UHHHHHHHH
                 the Unicode (ISO/IEC 10646) character whose value is the hexadecimal value HHHHHHHH (one to eight hex digits)
          \cx    a control-x character

The expanded result is single-quoted, as if the dollar sign had not been present.
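
As a quick check of the escapes listed above, the octal and hex forms can be compared directly. This is only a sketch, assuming bash (the names hex and oct are illustrative). The \u line additionally assumes bash 4.2 or later and a UTF-8 locale, so its output may differ elsewhere:

```shell
#!/usr/bin/env bash
# Octal 330 252 and hex d8 aa name the same two bytes.
hex=$'\xd8\xaa'
oct=$'\330\252'
if [ "$hex" = "$oct" ]; then
    echo "same bytes"
fi

# Assumption: in a UTF-8 locale, U+062A should encode to those same two bytes.
printf '%s' $'\u062a' | od -An -tx1
```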

    
by 11.03.2018 / 23:44