UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 47: ordinal not in range(128)


Problem description

I am using Python 2.7 and MySQLdb 1.2.3. I have tried everything I found on Stack Overflow and other forums to handle the encoding errors my script is throwing. The script reads data from all tables in a source MySQL database, writes it to a Python StringIO.StringIO object, and then loads it from that object into a Postgres database using the psycopg2 library's copy_from command. The Postgres database is apparently UTF-8 encoded (I found this under Properties -> Definition for the database in pgAdmin).

I found out that some tables in my source MySQL database use the latin1_swedish_ci collation, while others use UTF-8 (I found this from TABLE_COLLATION in information_schema.tables).

Based on my research on the internet, I put all of this code at the top of my Python script.

db_conn = MySQLdb.connect(host=host, user=user, passwd=passwd, db=db, charset="utf8", init_command='SET NAMES UTF8', use_unicode=True)
db_conn.set_character_set('utf8')
db_conn_cursor = db_conn.cursor()
db_conn_cursor.execute('SET NAMES utf8;')
db_conn_cursor.execute('SET CHARACTER SET utf8;')
db_conn_cursor.execute('SET character_set_connection=utf8;')

I wrote the following line of code to clean the cells in every table of the source MySQL database when writing to the StringIO object:

cell = str(cell).replace("\r", " ").replace("\n", " ").replace("\t", '').replace("\"", "")  # Remove unwanted characters from column value

Despite the settings above, this line still raises:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 47: ordinal not in range(128)

Please help.

Solution

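str(cell) is trying to convert cell to ASCII. In Python 2, calling str() on a unicode object encodes it with the default codec, which is ASCII, and ASCII only supports characters with ordinals below 128. Any other character, such as u'\u2019' (a right single quotation mark), raises this error. What type is cell?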
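
To see the failure in isolation, here is a minimal sketch that performs the ASCII encode explicitly. The helper name encodes_as_ascii and the sample strings are made up for illustration; the explicit encode("ascii") call stands in for what Python 2's str() does implicitly on a unicode object.

```python
def encodes_as_ascii(text):
    """Return True if text survives an explicit ASCII encode."""
    try:
        # Python 2's str(unicode_obj) performs this encode implicitly.
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


print(encodes_as_ascii(u"plain text"))  # every code point is below 128
print(encodes_as_ascii(u"It\u2019s"))   # contains U+2019, which is not ASCII
```

The u"" literals keep the snippet valid on both Python 2.7 and Python 3.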

If cell is a unicode string, just call cell.encode("utf8"), which returns a byte string encoded as UTF-8.

Or, if I recall correctly, if you pass MySQL a unicode string, the database will automagically convert it to UTF-8 for you.
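
As a rough sketch of the explicit encode step, the snippet below joins unicode cells into one tab-separated line and writes the UTF-8 bytes to an in-memory buffer. The write_row helper and the sample row are hypothetical, and io.BytesIO is used here in place of the StringIO.StringIO object from the question; whether such a buffer suits your copy_from call is an assumption you should check against your own setup.

```python
import io


def write_row(buf, cells):
    """Append one tab-separated, UTF-8 encoded line to a binary buffer."""
    line = u"\t".join(cells) + u"\n"
    # Explicit encode: no implicit ASCII conversion takes place here.
    buf.write(line.encode("utf8"))


buf = io.BytesIO()
write_row(buf, [u"1", u"It\u2019s fine"])
buf.seek(0)  # rewind so that a consumer reads from the start
```

Keeping the text as unicode until the final write means there is exactly one place where the encoding is chosen.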

You could also try:

cell = unicode(cell).replace("\r", " ").replace("\n", " ").replace("\t", '').replace("\"", "")
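
One caveat with the line above: in Python 2, unicode(cell) on a byte string that contains non-ASCII bytes decodes with the ASCII codec and fails in the same way. The sketch below is one possible way to guard against that. The clean_cell name is made up, and the assumption that byte strings from the driver are UTF-8 is mine, not something the question confirms.

```python
text_type = type(u"")  # unicode on Python 2, str on Python 3


def clean_cell(cell):
    """Return cell as cleaned text without going through the ASCII codec."""
    if isinstance(cell, bytes):
        # Assumption: byte strings coming from the driver are UTF-8.
        cell = cell.decode("utf8")
    elif not isinstance(cell, text_type):
        # Numbers, dates, None and so on: take their text representation.
        cell = text_type(cell)
    # Same replacements as the original line, applied to text.
    return (cell.replace(u"\r", u" ")
                .replace(u"\n", u" ")
                .replace(u"\t", u"")
                .replace(u'"', u""))


print(repr(clean_cell(u"It\u2019s\r\na \"test\"")))
```

The result stays as text, so it still needs a single explicit encode("utf8") at the point where it is written out as bytes.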

Or just use a third-party library; there is a good one that will fix text for you.
