Python: Convert Unicode to ASCII without errors for CSV file

Problem description

I've been reading all the questions on StackOverflow about converting Unicode to CSV in Python, and I'm still lost. Every time, I get "UnicodeEncodeError: 'ascii' codec can't encode character u'\xd1' in position 12: ordinal not in range(128)".

import csv
import cStringIO

buffer = cStringIO.StringIO()
writer = csv.writer(buffer, csv.excel)
cr.execute(query, query_param)
while True:
    row = cr.fetchone()
    if row is None:  # no more rows to fetch
        break
    writer.writerow([s.encode('ascii', 'ignore') for s in row])

The value of row is

(56, u"LIMPIADOR BA\xd1O 1'5 L")

where the value of \xd1 in the database is Ñ (an uppercase ñ), the n with a diacritical tilde used in Spanish. At first I tried to convert the value to something valid in ASCII, but after losing so much time I'm only trying to ignore those characters (I suppose I'd have the same problem with accented vowels).

I'd like to save the value to the CSV, preferably with the ñ ("LIMPIADOR BAÑO 1'5 L"), but if not possible, at least be able to save it ("LIMPIADOR BAO 1'5 L").

Solution

Correct: ñ is not an ASCII character, so it cannot be encoded to ASCII. You can ignore such characters, as your code above does. Another option is to remove the accents, which is covered here: What is the best way to remove accents in a python unicode string?
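A minimal sketch of both options, written for Python 3 (where str is already Unicode) rather than the Python 2 used in the question. The NFKD normalization shown here is one common way to strip accents, not necessarily the approach in the linked question, and it only removes combining marks: characters with no decomposition (such as ß or ø) survive, so the result is not guaranteed to be pure ASCII. The function names are illustrative.

```python
import unicodedata


def ignore_non_ascii(text):
    # Drop every character that has no ASCII representation.
    return text.encode('ascii', 'ignore').decode('ascii')


def strip_accents(text):
    # NFKD splits "Ñ" into "N" followed by a combining tilde (U+0303);
    # dropping the combining marks keeps only the base letters.
    decomposed = unicodedata.normalize('NFKD', text)
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))


name = "LIMPIADOR BA\xd1O 1'5 L"
print(ignore_non_ascii(name))  # LIMPIADOR BAO 1'5 L
print(strip_accents(name))     # LIMPIADOR BANO 1'5 L
```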

But note that both techniques can have bad effects, such as making words mean something different (in Spanish, stripping the tilde turns año, "year", into ano, a different word). So the best option is to keep the accents. Then you can't use ASCII, but you can use another encoding. UTF-8 is the safe bet. Latin-1 (ISO-8859-1) is a common one, but it covers only Western European characters. CP-1252 is common on Windows, and there are many others.
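To see how these encodings differ, here is a small Python 3 check; encodable is a hypothetical helper, and the sample strings are only illustrations. UTF-8 can represent every character, Latin-1 covers Ñ but not the euro sign, and CP-1252 adds the euro sign but, like Latin-1, has no Cyrillic.

```python
def encodable(text, codec):
    """Return True if every character of text can be encoded with codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


samples = ["BA\xd1O", "10 \u20ac", "\u0416"]  # BAÑO, "10 €", Cyrillic Ж
for codec in ("ascii", "latin-1", "cp1252", "utf-8"):
    print(codec, [encodable(s, codec) for s in samples])
# ascii [False, False, False]
# latin-1 [True, False, False]
# cp1252 [True, True, False]
# utf-8 [True, True, True]

# The same character becomes different bytes in different encodings.
print("BA\xd1O".encode("utf-8"))    # b'BA\xc3\x91O'
print("BA\xd1O".encode("latin-1"))  # b'BA\xd1O'
```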

So just switch "ascii" for whatever encoding you want.
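In Python 2, as in the question, that means encoding each unicode cell before passing it to csv.writer. In Python 3 the csv module works with str, so the encoding is chosen once, when the text becomes bytes. A sketch under that assumption, keeping the question's in-memory buffer idea but using io.StringIO in place of cStringIO; rows_to_csv_bytes is a hypothetical name.

```python
import csv
import io


def rows_to_csv_bytes(rows, encoding='utf-8'):
    """Write rows to an in-memory CSV and return the text encoded with `encoding`.

    Raises UnicodeEncodeError if a cell contains a character that
    `encoding` cannot represent (for 'ascii', any non-ASCII character).
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, dialect='excel')  # excel dialect ends rows with \r\n
    writer.writerows(rows)
    return buffer.getvalue().encode(encoding)


rows = [(56, "LIMPIADOR BA\xd1O 1'5 L")]
print(rows_to_csv_bytes(rows))             # b"56,LIMPIADOR BA\xc3\x91O 1'5 L\r\n"
print(rows_to_csv_bytes(rows, 'latin-1'))  # b"56,LIMPIADOR BA\xd1O 1'5 L\r\n"
```

To write straight to a file instead, open it with open(path, 'w', encoding='utf-8', newline='') and pass the file object to csv.writer. If the file is meant for Excel, the 'utf-8-sig' codec, which prepends a byte order mark, is often used so that Excel recognizes the file as UTF-8.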


Your actual code, according to your comment, is:

writer.writerow([s.encode('utf8') if type(s) is unicode else s for s in row]) 

where

row = (56, u"LIMPIADOR BA\xd1O 1'5 L")

Now, I believe that should work, but apparently it doesn't. I suspect unicode is being passed to the csv writer by mistake anyway. Unwrap that long line into its parts:

col1, col2 = row # Use the names of what is actually there instead
row = col1, col2.encode('utf8')
writer.writerow(row) 

Now your real error will no longer be hidden by the fact that everything is on one line. It could probably also have been avoided if you had included the full traceback.
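Along the same lines, a small diagnostic sketch (a hypothetical helper, in Python 3) that checks each text cell separately and reports exactly which column and character cannot be encoded, instead of failing somewhere inside one long expression. For the row from the question it reports column 1, position 12, the same position as in the original error message.

```python
def find_unencodable(row, encoding='ascii'):
    """Return (column, position, character) for every character in the
    text cells of `row` that `encoding` cannot represent."""
    problems = []
    for column, value in enumerate(row):
        if not isinstance(value, str):
            continue  # numbers and None are left for the csv writer to format
        for position, char in enumerate(value):
            try:
                char.encode(encoding)
            except UnicodeEncodeError:
                problems.append((column, position, char))
    return problems


row = (56, "LIMPIADOR BA\xd1O 1'5 L")
print(find_unencodable(row))             # [(1, 12, 'Ñ')]
print(find_unencodable(row, 'latin-1'))  # []
```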
