How to fix "Incorrect string value" errors?

Question

After noticing that an application tended to discard random emails due to incorrect string value errors, I went through and switched many text columns to use the utf8 column charset and the default column collation (utf8_general_ci) so that it would accept them. This fixed most of the errors, and also made the application stop getting SQL errors when it hit non-Latin emails.

Despite this, some of the emails are still causing the program to hit incorrect string value errors: (Incorrect string value: '\xE4\xC5\xCC\xC9\xD3\xD8...' for column 'contents' at row 1)

The contents column is a MEDIUMTEXT datatype which uses the utf8 column charset and the utf8_general_ci column collation. There are no flags that I can toggle in this column.

Keeping in mind that I don't want to touch or even look at the application source code unless absolutely necessary:

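  • What is causing this error? (Yes, I know the emails are full of random garbage, but I thought utf8 would be pretty permissive.)
  • How can I fix it?
  • What are the likely effects of such a fix?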

One thing I considered was switching to a utf8 varchar([some large number]) with the binary flag turned on, but I'm rather unfamiliar with MySQL, and have no idea if such a fix makes sense.

Answer

"\xE4\xC5\xCC\xC9\xD3\xD8" isn't valid UTF-8. Tested using Python:

>>> b"\xE4\xC5\xCC\xC9\xD3\xD8".decode("utf-8")
...
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 0: invalid continuation byte
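
The same check can be wrapped in a small helper, which is handy for testing suspect payloads outside the database. The sketch below uses Python 3; the helper name is_valid_utf8 is made up for illustration. The last line is only a guess that is not part of the original answer: the bytes happen to decode to a plausible Russian word under KOI8-R, which would suggest the email simply arrived in a legacy Cyrillic encoding rather than being random garbage.

```python
# Bytes quoted in the MySQL error message.
raw = b"\xE4\xC5\xCC\xC9\xD3\xD8"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8(raw))  # False

# Assumption: the sender used KOI8-R. Nothing in the error message proves
# this; it is just one legacy encoding under which the bytes look like text.
print(raw.decode("koi8_r"))  # Делись
```

If that guess holds, the data is not corrupt; it is merely labelled (or assumed) as UTF-8 when it is not.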

If you're looking for a way to avoid decoding errors within the database, the cp1252 encoding (aka "Windows-1252" aka "Windows Western European", which is what MySQL calls latin1) is about the most permissive encoding there is - in MySQL's version of it, every byte value is a valid code point.

Of course it's not going to understand genuine UTF-8 any more, nor any other non-cp1252 encoding, but it sounds like you're not too concerned about that?
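
A rough way to see that trade-off is to run the bytes through Python's codecs, as a stand-in for what the database column would do. This is only an illustration under that assumption; MySQL's own conversion rules are not being exercised here.

```python
# Bytes quoted in the MySQL error message.
raw = b"\xE4\xC5\xCC\xC9\xD3\xD8"

# Under cp1252 each of these bytes maps to some Western European character,
# so decoding never fails, and the result round-trips to the same bytes.
as_cp1252 = raw.decode("cp1252")
print(as_cp1252)  # äÅÌÉÓØ
round_trip = as_cp1252.encode("cp1252")
print(round_trip == raw)  # True

# The cost: genuine UTF-8 input is accepted too, but misread byte by byte.
utf8_bytes = "é".encode("utf-8")
mojibake = utf8_bytes.decode("cp1252")
print(mojibake)  # Ã©
```

So the bytes are stored losslessly, but anything that treats the column as text (display, search, collation) sees the wrong characters. Note that Python's cp1252 codec is slightly stricter than the answer describes: it rejects five unassigned bytes such as 0x81, whereas Python's latin-1 codec accepts all 256 byte values.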
