UnicodeDecodeError: 'ascii' codec can't decode byte 0xd1 in position 2: ordinal not in range(128)

Problem description

I am attempting to work with a very large dataset that has some non-standard characters in it. I need to use unicode, as per the job specs, but I am baffled. (And quite possibly doing it all wrong.)

I open the CSV using:

    ncesReader = csv.reader(open('geocoded_output.csv', 'rb'), delimiter='\t', quotechar='"')

Then, I attempt to encode it with:

    name=school_name.encode('utf-8'), street=row[9].encode('utf-8'), city=row[10].encode('utf-8'), state=row[11].encode('utf-8'), zip5=row[12], zip4=row[13], county=row[25].encode('utf-8'), lat=row[22], lng=row[23])

I'm encoding everything except the lat and lng because those need to be sent out to an API. When I run the program to parse the dataset into what I can use, I get the following Traceback.

Traceback (most recent call last):
  File "push_into_db.py", line 80, in <module>
    main()
  File "push_into_db.py", line 74, in main
    district_map = buildDistrictSchoolMap()
  File "push_into_db.py", line 32, in buildDistrictSchoolMap
    county=row[25].encode('utf-8'), lat=row[22], lng=row[23])
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd1 in position 2: ordinal not in range(128)

I think I should tell you that I'm using Python 2.7.2, and this is part of an app built on Django 1.4. I've read several posts on this topic, but none of them seem to directly apply. Any help will be greatly appreciated.

You might also want to know that some of the non-standard characters causing the issue are Ñ and possibly É.

Solution

Unicode is not equal to UTF-8. The latter is just an encoding for the former.
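
To make the distinction concrete, here is a minimal sketch showing one Unicode character turned into bytes by two different encodings. The variable names are made up for illustration, and Latin-1 is used only as a second encoding to compare against.

```python
# One abstract character: U+00D1 (capital N with tilde). No bytes involved yet.
n_tilde = u'\xd1'

# Encoding turns the character into bytes; the bytes depend on the encoding.
as_utf8 = n_tilde.encode('utf-8')      # two bytes: 0xC3 0x91
as_latin1 = n_tilde.encode('latin-1')  # one byte: 0xD1

print(repr(as_utf8))
print(repr(as_latin1))

# Decoding with the matching encoding gives back the same character.
same_char = as_utf8.decode('utf-8') == as_latin1.decode('latin-1')
print(same_char)
```

The character is the same in both cases; only its byte representation differs.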

You are doing it the wrong way around. You are reading UTF-8-encoded data, so you have to decode the UTF-8-encoded byte string into a unicode string.

So just replace .encode with .decode, and it should work (if your .csv is UTF-8-encoded).
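
A rough sketch of why the original call fails and what the swap looks like. In Python 2, calling .encode() on a byte string makes Python first decode it implicitly with the ASCII codec, which is why an encode call ends in a UnicodeDecodeError. The snippet makes that hidden step explicit so it also runs on Python 3. The value of `raw` is a made-up cell, assuming the file really is UTF-8.

```python
# Hypothetical cell value as read from the file: UTF-8 bytes, N-tilde at index 2.
raw = b'PE\xc3\x91A'

# The step Python 2 performs implicitly before .encode() on a byte string.
try:
    raw.decode('ascii')
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)
print(error_message)

# The fix: decode once when the data comes in...
text = raw.decode('utf-8')

# ...and encode only when bytes are needed again on the way out.
round_tripped = text.encode('utf-8')
print(round_tripped == raw)
```

In the question's code, this would mean writing `row[9].decode('utf-8')` where it currently says `row[9].encode('utf-8')`, and likewise for the other fields.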

Nothing to be ashamed of, though. I bet 3 in 5 programmers had trouble at first understanding this, if not more ;)

Update: If your input data is not UTF-8 encoded, then you have to .decode() with the appropriate encoding, of course. If no encoding is given, Python 2 assumes ASCII, which obviously fails on non-ASCII characters.
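
One hint about which encoding applies here: the byte in the traceback is 0xd1, which is how Latin-1 (and Windows-1252) encode Ñ, whereas UTF-8 would use the two bytes 0xC3 0x91. That suggests, but does not prove, that the file is Latin-1 rather than UTF-8. A small sketch under that assumption follows; `decode_cells` and the sample row are hypothetical.

```python
def decode_cells(row, encoding):
    """Decode every byte-string cell of a row into a unicode string."""
    return [cell.decode(encoding) for cell in row]


# Hypothetical row; 0xD1 is N-tilde in Latin-1 but is not valid UTF-8 here.
row = [b'PE\xd1A', b'CA']

# Trying UTF-8 first shows whether the assumed encoding fits the data.
try:
    decode_cells(row, 'utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print(utf8_ok)

# Decoding with the encoding the file was actually written in succeeds.
decoded = decode_cells(row, 'latin-1')
print(decoded == [u'PE\xd1A', u'CA'])
```

Note that Latin-1 accepts any byte, so a successful decode does not confirm the guess; checking a few decoded names by eye is still needed.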
