Django Encoding Issues with MySQL

Problem description

Okay, so I have a MySQL database set up. Most of the tables are latin1 and Django handles them fine. But, some of them are UTF-8 and Django does not handle them.

Here's a sample table (these tables are all from django-geonames):

DROP TABLE IF EXISTS `geoname`;
SET @saved_cs_client     = @@character_set_client;
SET character_set_client = utf8;
CREATE TABLE `geoname` (
  `id` int(11) NOT NULL,
  `name` varchar(200) NOT NULL,
  `ascii_name` varchar(200) NOT NULL,
  `latitude` decimal(20,17) NOT NULL,
  `longitude` decimal(20,17) NOT NULL,
  `point` point default NULL,
  `fclass` varchar(1) NOT NULL,
  `fcode` varchar(7) NOT NULL,
  `country_id` varchar(2) NOT NULL,
  `cc2` varchar(60) NOT NULL,
  `admin1_id` int(11) default NULL,
  `admin2_id` int(11) default NULL,
  `admin3_id` int(11) default NULL,
  `admin4_id` int(11) default NULL,
  `population` int(11) NOT NULL,
  `elevation` int(11) NOT NULL,
  `gtopo30` int(11) NOT NULL,
  `timezone_id` int(11) default NULL,
  `moddate` date NOT NULL,
  PRIMARY KEY  (`id`),
  KEY `country_id_refs_iso_alpha2_e2614807` (`country_id`),
  KEY `admin1_id_refs_id_a28cd057` (`admin1_id`),
  KEY `admin2_id_refs_id_4f9a0f7e` (`admin2_id`),
  KEY `admin3_id_refs_id_f8a5e181` (`admin3_id`),
  KEY `admin4_id_refs_id_9cc00ec8` (`admin4_id`),
  KEY `fcode_refs_code_977fe2ec` (`fcode`),
  KEY `timezone_id_refs_id_5b46c585` (`timezone_id`),
  KEY `geoname_52094d6e` (`name`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
SET character_set_client = @saved_cs_client;

Now, if I try to get data from the table directly using MySQLdb and a cursor, I get the text with the proper encoding:

>>> import MySQLdb
>>> from django.conf import settings
>>> 
>>> conn = MySQLdb.connect (host = "localhost",
... user = settings.DATABASES['default']['USER'],
... passwd = settings.DATABASES['default']['PASSWORD'],
... db = settings.DATABASES['default']['NAME'])
>>> cursor = conn.cursor ()
>>> cursor.execute("select name from geoname where name like 'Uni%Hidalgo'");
1L
>>> g = cursor.fetchone()
>>> g[0]
'Uni\xc3\xb3n Hidalgo'
>>> print g[0]
Unión Hidalgo

However, if I try to use the Geoname model (which is actually a django.contrib.gis.db.models.Model), it fails:

>>> from geonames.models import Geoname
>>> g = Geoname.objects.get(name__istartswith='Uni',name__icontains='Hidalgo')
>>> g.name
u'Uni\xc3\xb3n Hidalgo'
>>> print g.name
UniÃ³n Hidalgo

There's pretty clearly an encoding error here. In both cases the database is returning the bytes 'Uni\xc3\xb3n Hidalgo', but Django is (incorrectly?) turning '\xc3\xb3' into the two characters Ã³ instead of ó.

How can I fix this?

Okay, so this is strange:

>>> c = unicode('Uni\xc3\xb3n Hidalgo','utf-8')
>>> c
u'Uni\xf3n Hidalgo'
>>> print c
Unión Hidalgo

If I force Python to decode the string from UTF-8 into Unicode, it works. However, decoding it as latin1 recreates the mistake:

>>> c = unicode('Unión Hidalgo','latin1')
>>> c
u'Uni\xc3\xb3n Hidalgo'
>>> print c
UniÃ³n Hidalgo

So, my guess is that MySQL is sending UTF-8 but telling Python it is latin1?
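For anyone who wants to reproduce the effect without a database, here is a minimal sketch in Python 3 (the sessions above are Python 2). It only illustrates the mechanism guessed at above: UTF-8 bytes decoded as latin1 give the mangled text, and re-encoding as latin1 recovers the original bytes. The helper name repair_mojibake is made up for this example and is not part of Django or MySQLdb.

```python
# Python 3 sketch of the suspected mix-up; no database involved.
raw = "Unión Hidalgo".encode("utf-8")  # b'Uni\xc3\xb3n Hidalgo', the bytes MySQL returns

correct = raw.decode("utf-8")     # 'Unión Hidalgo'
mojibake = raw.decode("latin1")   # 'UniÃ³n Hidalgo', each byte read as one character


def repair_mojibake(text):
    """Undo a UTF-8-read-as-latin1 mix-up (hypothetical helper).

    Text that is not mangled this way is returned unchanged.
    """
    try:
        return text.encode("latin1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


print(ascii(mojibake))                   # 'Uni\xc3\xb3n Hidalgo'
print(ascii(repair_mojibake(mojibake)))  # 'Uni\xf3n Hidalgo'
```

The two escaped forms match the u'Uni\xc3\xb3n Hidalgo' and u'Uni\xf3n Hidalgo' values in the transcripts above. A repair like this only papers over the symptom; it does not fix whatever is mislabeling the data.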

Recommended answer

Looks like the problem was in MySQL after all. I dropped the tables, recreated them with the charset and collation set to UTF-8, and re-imported all of the data.

It's working now.
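Before dropping and recreating anything, it may help to see which text columns are not declared as UTF-8. The sketch below is only one possible check, not what was actually run here: it assumes you have already fetched (table, column, charset) rows from MySQL's information_schema.columns, for example with the query held in CHARSET_QUERY. The function name and the sample rows are invented for illustration.

```python
# Query one could pass to cursor.execute(CHARSET_QUERY, [database_name]).
# information_schema.columns reports the declared charset of each text column.
CHARSET_QUERY = (
    "SELECT table_name, column_name, character_set_name "
    "FROM information_schema.columns "
    "WHERE table_schema = %s AND character_set_name IS NOT NULL"
)


def non_utf8_columns(rows):
    """Return (table, column) pairs whose declared charset is not utf8/utf8mb4.

    rows: iterable of (table_name, column_name, character_set_name) tuples.
    """
    return [
        (table, column)
        for table, column, charset in rows
        if not charset.startswith("utf8")
    ]


# Made-up sample rows standing in for a real query result.
sample = [
    ("geoname", "name", "utf8"),
    ("geoname", "ascii_name", "utf8"),
    ("auth_user", "first_name", "latin1"),
]
print(non_utf8_columns(sample))  # [('auth_user', 'first_name')]
```

Note that this only reports how columns are declared. Data that was written through a mislabeled connection can still be stored wrongly in a column declared as utf8, which is presumably why re-importing the data was needed as well.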
