MySQL: valor da string incorreto

0

Eu tenho tido alguns problemas com alguns personagens em páginas da Web, que temos em UTF8, geralmente em pouco tempo.

Estou executando o MySQL 5.7 com o Debian 9. Meus bancos de dados estão usando o utf8 charset.

Hoje, ao depurar uma consulta do MySQL, executei SHOW WARNINGS; e vi no campo Mensagem:

 Incorrect string value: '\xF0\x9D\x8C\x86' for column 'xxxx';

O que está acontecendo?

    
por Rui F Ribeiro 02.03.2018 / 19:37

1 resposta

1

Por fim, pesquisando sobre o erro, encontrei alguns artigos sugerindo que no MySQL o conjunto de caracteres utf8 tem bugs / danos cerebrais e não deve ser usado.

Em sua substituição, deve-se usar o conjunto de caracteres utf8mb4 .

De No MySQL, nunca use “utf8”. Use "utf8mb4"

MySQL’s “utf8” isn’t UTF-8.

The “utf8” encoding only supports three bytes per character. The real UTF-8 encoding — which everybody uses, including you — needs up to four bytes per character.

MySQL developers never fixed this bug. They released a workaround in 2010: a new character set called “utf8mb4”.

In short:
- MySQL’s “utf8mb4” means “UTF-8”.
- MySQL’s “utf8” means “a proprietary character encoding”. This encoding can’t encode many Unicode characters.

I’ll make a sweeping statement here: all MySQL and MariaDB users who are currently using “utf8” should actually use “utf8mb4”. Nobody should ever use “utf8”.

De Como suportar Unicode completo em bancos de dados MySQL

Turns out MySQL’s utf8 charset only partially implements proper UTF-8 encoding. It can only store UTF-8-encoded symbols that consist of one to three bytes; encoded symbols that take up four bytes aren’t supported.

As shown above, this behavior can lead to data loss, but it gets worse — it can result in security vulnerabilities. Here are some examples, all of which were discovered after publishing this write-up:

  • PHP object injection vulnerability in WordPress < 3.6.1, leading to remote >code execution in combination with certain WordPress plugins
  • Email authentication bypass in Phabricator Stored XSS in WordPress 4.1.2
  • Remote command execution in the Joomla! CMS

TL;DR MySQL’s utf8 encoding is awkwardly named, as it’s different from proper UTF-8 encoding. It doesn’t offer full Unicode support, which can lead to data loss or security vulnerabilities.

    
por 02.03.2018 / 19:37