mysql-python collation issue: how to force unicode datatype?

Question

For certain purposes I had to change field collations from utf8_unicode_ci to utf8_bin in a database. It turned out that the change also changed the datatypes that reach Python.

The question is how to force mysql-python to return unicode objects to Python.

Here is a sample that shows the problem (an explicit charset forces use_unicode=1):

>>> import MySQLdb
>>> con = MySQLdb.connect(..., charset='utf8')
>>> c = con.cursor()
>>> c.execute('SELECT %s COLLATE utf8_bin', u'м')
1L
>>> c.fetchone()
('\xd0\xbc',)
>>> c.description
(("'\xd0\xbc' COLLATE utf8_bin", 253, 2, 3, 3, 31, 0),)


>>> c.execute('SELECT %s COLLATE utf8_unicode_ci', u'м')
1L
>>> c.fetchone()
(u'\u043c',)
>>> c.description
(("'\xd0\xbc' COLLATE utf8_unicode_ci", 253, 2, 3, 3, 31, 0),)

In my database the fields are of type VARCHAR, but after the change they behave like BINARY, which is not what I want.

Recommended answer

It turns out that the problem is rather awkward. In short, most varieties of MySQL string datatypes map to a single datatype in MySQL's interface, distinguished only by an additional BINARY flag.

Thus, MySQL's VARCHAR, VARBINARY, and a string literal all map to the same MySQLdb.constants.FIELD_TYPE.VAR_STRING type in column type definitions, but carry an additional MySQLdb.constants.FLAG.BINARY flag when the type is VARBINARY or the string is collated with a *_bin collation.
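To make the mapping concrete, here is a minimal sketch of the check involved. It assumes the numeric values of the MySQLdb.constants.FIELD_TYPE and MySQLdb.constants.FLAG members (253 and 128 also show up in the session output in this thread), copied in so that the snippet runs without a database. The helper name looks_binary is made up for illustration and is not part of MySQLdb.

```python
# Values mirrored from MySQLdb.constants so the sketch runs without a database.
VAR_STRING = 253   # FIELD_TYPE.VAR_STRING
STRING = 254       # FIELD_TYPE.STRING
BINARY = 128       # FLAG.BINARY
NOT_NULL = 1       # FLAG.NOT_NULL


def looks_binary(field_type, flags):
    """Hypothetical helper: True when a string-typed column carries the BINARY flag."""
    return field_type in (VAR_STRING, STRING) and bool(flags & BINARY)


# VARCHAR with utf8_unicode_ci: VAR_STRING type, BINARY flag not set.
text_column = looks_binary(VAR_STRING, NOT_NULL)

# VARCHAR with utf8_bin, or VARBINARY: same type, BINARY flag set.
bin_column = looks_binary(VAR_STRING, NOT_NULL | BINARY)
```

The point is that the field type alone cannot tell the two columns apart; only the flag differs.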

Even though there is a MySQLdb.constants.FIELD_TYPE.VARCHAR type, I failed to find out when it is used. As I said, MySQL VARCHAR columns map to FIELD_TYPE.VAR_STRING.

The solution described below becomes rather fragile if your application uses true binary strings (for example, you store images and fetch them over the same connection as text), since it assumes that all binary strings are decoded to unicode. It works, though.

The official documentation states:

Because MySQL returns all data as strings and expects you to convert it yourself. This would be a real pain in the ass, but in fact, _mysql can do this for you. (And MySQLdb does do this for you.) To have automatic type conversion done, you need to create a type converter dictionary, and pass this to connect() as the conv keyword parameter.

In practice, the real pain in the ass might be constructing your own converter dictionary. But you can import the default one from MySQLdb.converters.conversions and patch it, or even patch it on an instance of Connection. The trick is to remove the special converter for the FLAG.BINARY flag and add a decoder for all cases. If you explicitly specify a charset parameter for MySQLdb.connect, it forces use_unicode=1, which adds the decoder for you, but you can do it yourself:

>>> import MySQLdb
>>> from MySQLdb.constants import FIELD_TYPE
>>> con = MySQLdb.connect(**params)
>>> con.converter[FIELD_TYPE.VAR_STRING]
[(128, <type 'str'>), (None, <function string_decoder at 0x01FFA130>)]
>>> con.converter[FIELD_TYPE.VAR_STRING] = [(None, con.string_decoder)]
>>> c = con.cursor()
>>> c.execute("SELECT %s COLLATE utf8_bin", u'м')
1L
>>> c.fetchone()
(u'\u043c',)
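Why the patch works can be sketched without a database. The snippet below is only a model of how I read the converter list, not MySQLdb's actual code: each entry is a (mask, function) pair, the first entry whose mask matches the column flags wins, and a None mask matches everything. The names pick_converter, string_decoder and keep_bytes are stand-ins invented for this illustration, and the decoder assumes a utf8 connection.

```python
BINARY = 128  # FLAG.BINARY, the 128 seen in the converter list above


def pick_converter(entries, flags):
    """Model of the lookup: the first matching mask wins, None matches always."""
    for mask, func in entries:
        if mask is None or flags & mask:
            return func
    return None


def string_decoder(raw):
    # Stand-in for the connection's decoder; assumes a utf8 connection.
    return raw.decode('utf8')


def keep_bytes(raw):
    # Stand-in for the `str` entry, which leaves the bytes untouched.
    return raw


default_entries = [(BINARY, keep_bytes), (None, string_decoder)]
patched_entries = [(None, string_decoder)]

raw = b'\xd0\xbc'  # the utf8 bytes of u'\u043c'

# Column with the BINARY flag, default list: the bytes come back as they are.
before = pick_converter(default_entries, BINARY)(raw)

# Same column, patched list: the decoder is the only entry, so it applies.
after = pick_converter(patched_entries, BINARY)(raw)

# Column without the BINARY flag: decoded even with the default list.
plain = pick_converter(default_entries, 0)(raw)
```

Under this model, dropping the BINARY entry is what makes utf8_bin columns fall through to the decoder, and it is also why true binary columns would be decoded too.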

You might need to apply the same hack to FIELD_TYPE.STRING as well.

Another solution is to pass an explicit use_unicode=0 to MySQLdb.connect and do all the decoding in your own code, but I would not.
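For completeness, doing the decoding yourself could look roughly like the sketch below. The helper decode_row is hypothetical (it is not part of MySQLdb), and the sample row is hard-coded so that the snippet runs without a connection. It assumes utf8 data.

```python
def decode_row(row, encoding='utf8'):
    """Hypothetical helper: decode every byte string in a fetched row, leave other values alone."""
    return tuple(
        value.decode(encoding) if isinstance(value, bytes) else value
        for value in row
    )


# A row as it might come back with use_unicode=0: raw utf8 bytes for text.
row = (b'\xd0\xbc', 42, None)
decoded = decode_row(row)
```

Like the converter patch, this decodes every byte string it sees, so columns holding true binary data would have to be excluded by hand.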

Hope this might be useful to someone.
