Python, convert 4-byte char to avoid MySQL error "Incorrect string value:"


Question

I need to convert (in Python) a 4-byte character into some other character, so that I can insert it into my utf-8 MySQL database without getting an error such as: "Incorrect string value: '\xF0\x9F\x94\x8E' for column 'line' at row 1"

The question "Warning raised by inserting 4-byte unicode to mysql" shows how to do it this way:

>>> import re
>>> highpoints = re.compile(u'[\U00010000-\U0010ffff]')
>>> example = u'Some example text with a sleepy face: \U0001f62a'
>>> highpoints.sub(u'', example)
u'Some example text with a sleepy face: '



However, I get the same error as the user in the comment there: "...bad character range...". This is apparently because my Python is a UCS-2 (not UCS-4) build, but it is not clear to me what I should do instead.

Recommended answer

In a UCS-2 build, Python internally uses two code units (a surrogate pair) for each Unicode character above the \U0000ffff code point. Regular expressions have to work with those code units, so you need the following regular expression to match such characters:

highpoints = re.compile(u'[\uD800-\uDBFF][\uDC00-\uDFFF]')



This regular expression matches any code point encoded with a UTF-16 surrogate pair (see "UTF-16: Code points U+10000 to U+10FFFF").
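To see why the two character classes above cover exactly the code points U+10000 to U+10FFFF, the surrogate-pair arithmetic can be sketched as follows. The helper name `to_surrogate_pair` is made up for this illustration and is not part of the original answer.

```python
def to_surrogate_pair(code_point):
    """Split a code point in U+10000..U+10FFFF into its UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is outside the supplementary planes")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits -> U+D800..U+DBFF
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits -> U+DC00..U+DFFF
    return high, low


# The sleepy face from the question, U+1F62A
high, low = to_surrogate_pair(0x1F62A)
print(hex(high), hex(low))  # 0xd83d 0xde2a
```

The high surrogate always lands in \uD800-\uDBFF and the low surrogate in \uDC00-\uDFFF, which is what the two bracketed ranges in the regular expression match in sequence.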

To make this work on both UCS-2 and UCS-4 builds of Python, you can use a try:/except to pick one pattern or the other:

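import re

try:
    # UCS-4 build: one code unit per character, so a plain range works
    highpoints = re.compile(u'[\U00010000-\U0010ffff]')
except re.error:
    # UCS-2 build: the range above is rejected, match surrogate pairs instead
    highpoints = re.compile(u'[\uD800-\uDBFF][\uDC00-\uDFFF]')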
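As an alternative sketch, not taken from the original answer, the build type can also be checked explicitly instead of relying on the compile error. This assumes that `sys.maxunicode` is 0xFFFF on UCS-2 builds and 0x10FFFF on UCS-4 builds (and on Python 3.3 and later, where the distinction no longer exists).

```python
import re
import sys

if sys.maxunicode > 0xFFFF:
    # UCS-4 build (or Python 3.3+): characters above U+FFFF are single units
    pattern = u'[\U00010000-\U0010ffff]'
else:
    # UCS-2 build: characters above U+FFFF are stored as surrogate pairs
    pattern = u'[\uD800-\uDBFF][\uDC00-\uDFFF]'

highpoints = re.compile(pattern)
```

Either approach yields a `highpoints` pattern that can be used the same way; the explicit check simply makes the reason for the branch visible.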

Demonstration on a UCS-2 Python build:

>>> import re
>>> highpoints = re.compile(u'[\uD800-\uDBFF][\uDC00-\uDFFF]')
>>> example = u'Some example text with a sleepy face: \U0001f62a'
>>> highpoints.sub(u'', example)
u'Some example text with a sleepy face: '
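The question asks for converting the character into some other character rather than dropping it. A minimal sketch of that variant, assuming U+FFFD (the Unicode replacement character) is an acceptable substitute; the function name `replace_highpoints` and its `replacement` parameter are hypothetical names chosen for this example.

```python
import re

try:
    # UCS-4 builds and Python 3.3+
    _HIGHPOINTS = re.compile(u'[\U00010000-\U0010ffff]')
except re.error:
    # UCS-2 builds: match surrogate pairs instead
    _HIGHPOINTS = re.compile(u'[\uD800-\uDBFF][\uDC00-\uDFFF]')


def replace_highpoints(text, replacement=u'\ufffd'):
    """Return text with every character above U+FFFF swapped for replacement."""
    return _HIGHPOINTS.sub(replacement, text)


cleaned = replace_highpoints(u'Some example text with a sleepy face: \U0001f62a')
```

Because U+FFFD encodes to three bytes in UTF-8, the cleaned string no longer contains any 4-byte sequence. Passing `replacement=u''` reproduces the removal behaviour shown in the demonstration.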
