mysql-python collation issue: how to force unicode datatype?
Question
For certain purposes I had to change field collations from utf8_unicode_ci to utf8_bin in a database. It turned out that the change led to changes in the datatypes that reach Python.
The question is how to force mysql-python to return unicode objects to Python.
Here is a sample that shows the problem (an explicit charset forces use_unicode=1):
>>> con = MySQLdb.connect(..., charset='utf8')
>>> c = con.cursor()
>>> c.execute('SELECT %s COLLATE utf8_bin', u'м')
1L
>>> c.fetchone()
('\xd0\xbc',)
>>> c.description
(("'\xd0\xbc' COLLATE utf8_bin", 253, 2, 3, 3, 31, 0),)
>>> c.execute('SELECT %s COLLATE utf8_unicode_ci', u'м')
1L
>>> c.fetchone()
(u'\u043c',)
>>> c.description
(("'\xd0\xbc' COLLATE utf8_unicode_ci", 253, 2, 3, 3, 31, 0),)
In my database the fields are of type VARCHAR, but after the change they behave like BINARY, which is not what I want.
Recommended answer
It turns out that the problem is rather awkward. In short, most varieties of MySQL string datatypes map to a single datatype in MySQL's interface, distinguished only by an additional BINARY flag.
Thus, MySQL's VARCHAR, VARBINARY, and a string literal all map to the same MySQLdb.constants.FIELD_TYPE.VAR_STRING type in column type definitions, but carry an additional MySQLdb.constants.FLAG.BINARY flag when the type is VARBINARY or a string collated with a *_bin collation.
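To make the type/flag split concrete, here is a minimal sketch of how a column could be classified from its type code and flag bitmask. The numeric constants are copied by hand from MySQLdb.constants so the snippet runs without a database; the function name describe_string_field is made up for illustration. As far as I know, MySQLdb cursors expose the flags in cursor.description_flags, parallel to cursor.description, but treat that as an assumption to verify against your version.

```python
# Values copied from MySQLdb.constants for illustration only; in real code
# import FIELD_TYPE and FLAG from MySQLdb.constants instead.
VAR_STRING = 253  # FIELD_TYPE.VAR_STRING
BINARY = 128      # FLAG.BINARY


def describe_string_field(type_code, flags):
    """Classify a result column from its type code and flag bitmask.

    Hypothetical helper, not part of MySQLdb.
    """
    if type_code != VAR_STRING:
        return "not a VAR_STRING"
    if flags & BINARY:
        return "VAR_STRING with BINARY flag (VARBINARY or *_bin collation)"
    return "VAR_STRING without BINARY flag (text)"


# Hypothetical (type_code, flags) pairs, standing in for values taken from
# cursor.description and cursor.description_flags.
print(describe_string_field(253, 128))
print(describe_string_field(253, 0))
```

Both columns report the same type code, so only the flag tells them apart, which is why the default converter treats a utf8_bin string like binary data.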
Even though there is a MySQLdb.constants.FIELD_TYPE.VARCHAR type, I failed to find out when it is used. As I said, MySQL VARCHAR columns map to FIELD_TYPE.VAR_STRING.
The solution is rather fragile if your application uses true binary strings (for example, you store images and fetch them over the same connection as text), since it assumes that all binary strings are decoded to unicode. It works, though.
The official documentation states:
Because MySQL returns all data as strings and expects you to convert it yourself. This would be a real pain in the ass, but in fact, _mysql can do this for you. (And MySQLdb does do this for you.) To have automatic type conversion done, you need to create a type converter dictionary, and pass this to connect() as the conv keyword parameter.
In practice, the real pain in the ass might be the process of constructing your own converters dictionary. But you can import the default one from MySQLdb.converters.conversions and patch it, or even patch it on an instance of the Connection. The trick is to remove the special converter for the FLAG.BINARY flag and add a decoder for all cases. If you explicitly specify a charset parameter for MySQLdb.connect, it forces the use_unicode=1 parameter, which adds the decoder for you, but you can do it yourself:
>>> import MySQLdb
>>> from MySQLdb.constants import FIELD_TYPE
>>> con = MySQLdb.connect(**params)
>>> con.converter[FIELD_TYPE.VAR_STRING]
[(128, <type 'str'>), (None, <function string_decoder at 0x01FFA130>)]
>>> con.converter[FIELD_TYPE.VAR_STRING] = [(None, con.string_decoder)]
>>> c = con.cursor()
>>> c.execute("SELECT %s COLLATE utf8_bin", u'м')
1L
>>> c.fetchone()
(u'\u043c',)
You may need to make the same hack for FIELD_TYPE.STRING as well.
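If you want to apply the hack to both field types at once, it could be wrapped in a small helper. This is only a sketch: force_text_decoding, pick_converter and decode_utf8 are hypothetical names, the type codes are copied by hand from MySQLdb.constants.FIELD_TYPE, and pick_converter is a simplified model of how I understand a (mask, function) list to be resolved, not the library's actual code. It operates on a plain dict so it can be tried without a database.

```python
VAR_STRING = 253  # value of MySQLdb.constants.FIELD_TYPE.VAR_STRING
STRING = 254      # value of MySQLdb.constants.FIELD_TYPE.STRING


def force_text_decoding(converter, decoder, field_types=(VAR_STRING, STRING)):
    """Replace flag-specific converters with a single catch-all decoder."""
    for field_type in field_types:
        converter[field_type] = [(None, decoder)]
    return converter


def pick_converter(entries, flags):
    """Simplified model: the first entry whose mask matches the flags wins,
    and a mask of None matches everything."""
    for mask, func in entries:
        if mask is None or (mask & flags):
            return func
    return None


def decode_utf8(raw):
    """Stand-in for the connection's string decoder."""
    return raw.decode("utf8")


# A stand-in for con.converter, shaped like the default shown earlier.
conv = {VAR_STRING: [(128, bytes), (None, decode_utf8)]}

before = pick_converter(conv[VAR_STRING], 128)  # BINARY flag set
force_text_decoding(conv, decode_utf8)
after = pick_converter(conv[VAR_STRING], 128)

print(before is bytes)
print(after is decode_utf8)
```

With a real connection the call would presumably look like force_text_decoding(con.converter, con.string_decoder), done before creating the cursor.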
Another solution is to pass an explicit use_unicode=0 to MySQLdb.connect and do all the decoding in your own code, but I would not.
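For completeness, manual decoding with use_unicode=0 might look roughly like the sketch below. decode_row is a hypothetical helper, the sample row is made up, and the code is written against Python 3 bytes (under Python 2 the same check would be against str).

```python
def decode_row(row, encoding="utf8"):
    """Decode every byte-string value in a fetched row, leaving others as-is."""
    return tuple(
        value.decode(encoding) if isinstance(value, bytes) else value
        for value in row
    )


# Made-up row standing in for the result of cursor.fetchone().
row = (b"\xd0\xbc", 42, None)
decoded = decode_row(row)
print(decoded == (u"\u043c", 42, None))
```

Note that this has the same weakness as the converter hack: a column holding true binary data would be decoded as well, so such columns would have to be excluded by hand.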
Hope this might be useful to someone.