UnicodeDecodeError: 'ascii' codec can't decode byte 0xd1 in position 2: ordinal not in range(128)

Problem description

I am attempting to work with a very large dataset that has some non-standard characters in it. I need to use unicode, as per the job specs, but I am baffled. (And quite possibly doing it all wrong.)

I open the CSV using:

```python
ncesReader = csv.reader(open('geocoded_output.csv', 'rb'), delimiter='\t', quotechar='"')
```

Then, I attempt to encode it with:

```python
# Excerpt: keyword arguments to a call whose beginning is not shown
name=school_name.encode('utf-8'), street=row[9].encode('utf-8'),
city=row[10].encode('utf-8'), state=row[11].encode('utf-8'),
zip5=row[12], zip4=row[13], county=row[25].encode('utf-8'),
lat=row[22], lng=row[23])
```

I'm encoding everything except the lat and lng because those need to be sent out to an API. When I run the program to parse the dataset into what I can use, I get the following traceback:

```
Traceback (most recent call last):
  File "push_into_db.py", line 80, in <module>
    main()
  File "push_into_db.py", line 74, in main
    district_map = buildDistrictSchoolMap()
  File "push_into_db.py", line 32, in buildDistrictSchoolMap
    county=row[25].encode('utf-8'), lat=row[22], lng=row[23])
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd1 in position 2: ordinal not in range(128)
```

I think I should tell you that I'm using Python 2.7.2, and this is part of an app built on Django 1.4. I've read several posts on this topic, but none of them seem to directly apply. Any help will be greatly appreciated.

You might also want to know that some of the non-standard characters causing the issue are Ñ and possibly É.

Solution

Unicode is not equal to UTF-8. The latter is just an encoding for the former.
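To make the distinction concrete, here is a small sketch (it runs under Python 2.7 and Python 3) that takes one Unicode character, Ñ, and turns it into bytes with two different encodings. The variable names are illustrative only. Note that 0xd1, the byte in the traceback, is how Latin-1 encodes Ñ, whereas UTF-8 uses two bytes for it.

```python
# One Unicode character: U+00D1, LATIN CAPITAL LETTER N WITH TILDE.
n_tilde = u'\u00d1'

# The same character encoded two different ways:
utf8_bytes = n_tilde.encode('utf-8')      # two bytes: 0xc3 0x91
latin1_bytes = n_tilde.encode('latin-1')  # one byte: 0xd1

# Decoding with the matching encoding gives back the same Unicode text.
assert utf8_bytes.decode('utf-8') == n_tilde
assert latin1_bytes.decode('latin-1') == n_tilde

print(repr(utf8_bytes), repr(latin1_bytes))
```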

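You are doing it the wrong way around. You are reading UTF-8-encoded data, so you have to decode the UTF-8-encoded byte string into a unicode string. (In Python 2, calling .encode() on a byte string first decodes it implicitly with the default ASCII codec, which is why you get a UnicodeDecodeError even though you called encode.)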
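The error message can be reproduced without the CSV file. The sketch below uses a made-up three-byte value whose third byte is 0xd1, so the failure is reported at position 2. Because Python 3 byte strings have no .encode() method, it calls .decode('ascii') directly, which is the step Python 2 performs implicitly inside str.encode().

```python
# Hypothetical field value: two ASCII bytes followed by byte 0xd1,
# so the offending byte sits at position 2, as in the traceback.
raw = b'AB\xd1'

message = None
try:
    # Python 2's str.encode() performs this ASCII decode implicitly first;
    # Python 3 bytes objects have no .encode(), so we call .decode() directly.
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
# 'ascii' codec can't decode byte 0xd1 in position 2: ordinal not in range(128)
```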

So just replace .encode with .decode, and it should work (if your .csv is UTF-8-encoded).
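Applied to a whole row, the fix might look like the following sketch. `decode_row` is a hypothetical helper, not part of the csv module, and the sample row is made up. It assumes the file really is UTF-8 and that, as in Python 2 with a file opened in 'rb' mode, every field arrives as a byte string. If the API needs UTF-8 bytes, a common pattern is to keep unicode inside the program and encode once, where the data leaves it.

```python
def decode_row(row, encoding='utf-8'):
    """Hypothetical helper: decode every byte-string field in a CSV row.

    In Python 2, csv.reader over a file opened in 'rb' mode yields lists of
    byte strings (str), so each field needs .decode() before it is text.
    """
    return [field.decode(encoding) for field in row]


# Hypothetical row: an ASCII field and a UTF-8 encoded 'Ñ' (0xc3 0x91).
row = [b'Springfield', b'\xc3\x91']
fields = decode_row(row)

# Work with unicode internally; if an external API wants UTF-8 bytes,
# encode once, at the point where the data leaves the program.
payload = fields[1].encode('utf-8')
```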

Nothing to be ashamed of, though. I bet 3 in 5 programmers had trouble at first understanding this, if not more ;)

Update: If your input data is not UTF-8 encoded, then you have to .decode() with the appropriate encoding, of course. If no encoding is given, Python 2 assumes ASCII, which obviously fails on non-ASCII characters.
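If you are unsure which encoding the file uses, one quick check is to try candidate codecs on a sample of the raw bytes. In the sketch below, the `try_decode` helper and the sample bytes are hypothetical. A lone 0xd1 followed by an ASCII byte is not valid UTF-8, but Latin-1 (and cp1252) map 0xd1 to Ñ, so a traceback like the one in the question may well indicate a Latin-1 or cp1252 file rather than UTF-8. Latin-1 accepts every possible byte sequence, so a successful Latin-1 decode does not prove the guess is right; check that the decoded text actually looks correct.

```python
def try_decode(data, encoding):
    """Return data decoded with `encoding`, or None if the codec rejects it."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# Hypothetical raw field with byte 0xd1 at position 2, as in the traceback.
raw = b'AB\xd1C'

candidates = ('ascii', 'utf-8', 'latin-1')
results = {enc: try_decode(raw, enc) for enc in candidates}
for enc in candidates:
    print(enc, repr(results[enc]))
```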
