UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)

Problem description

I'm having problems dealing with unicode characters from text fetched from different web pages (on different sites). I am using BeautifulSoup.

The problem is that the error is not always reproducible; it sometimes works with some pages, and sometimes, it barfs by throwing a UnicodeEncodeError. I have tried just about everything I can think of, and yet I have not found anything that works consistently without throwing some kind of Unicode-related error.

One of the sections of code that is causing problems is shown below:

agent_telno = agent.find('div', 'agent_contact_number')
agent_telno = '' if agent_telno is None else agent_telno.contents[0]
p.agent_info = str(agent_contact + ' ' + agent_telno).strip()

Here is a stack trace produced on SOME strings when the snippet above is run:

Traceback (most recent call last):
  File "foobar.py", line 792, in <module>
    p.agent_info = str(agent_contact + ' ' + agent_telno).strip()
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)
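The error shows up only for some strings because it depends on the text's content, not on the page: the implicit conversion succeeds for pure-ASCII text and fails at the first character above U+007F. A minimal Python 3 sketch of that behavior follows; to_ascii_bytes is a hypothetical helper standing in for Python 2's implicit str() conversion, and the sample strings are made up (built so the offending character sits at index 20).

```python
import unicodedata


def to_ascii_bytes(text):
    # Hypothetical helper: mimics Python 2's str(some_unicode), which
    # encodes with the strict 'ascii' codec behind the scenes.
    return text.encode("ascii")


# Pure-ASCII text converts without complaint...
print(to_ascii_bytes("Jane Smith 01234 567890"))  # b'Jane Smith 01234 567890'

# ...but text containing U+00A0 fails. The sample is constructed so the
# offending character is at index 20, matching the traceback's position.
sample = "Call our agent today\xa001234 567890"
failure_position = None
try:
    to_ascii_bytes(sample)
except UnicodeEncodeError as exc:
    failure_position = exc.start
    print(exc.start, unicodedata.name(sample[exc.start]))  # 20 NO-BREAK SPACE
```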

I suspect that this is because some pages (or more specifically, pages from some of the sites) may be encoded, whilst others may be unencoded. All the sites are based in the UK and provide data meant for UK consumption, so there are no issues relating to internationalization or dealing with text written in anything other than English.
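Even English-only pages served as plain ASCII can produce u'\xa0': HTML's &nbsp; entity is written in ASCII but stands for U+00A0 NO-BREAK SPACE, and a parser hands it back as that character. The sketch below uses the standard library's html.unescape (Python 3.7+ for str.isascii) as a stand-in for the entity conversion BeautifulSoup performs; the sample fragment is hypothetical.

```python
import html

# Hypothetical fragment: the page source itself is pure ASCII.
raw_html_fragment = "Tel:&nbsp;01234 567890"

# Resolving entities turns &nbsp; into the single character U+00A0.
decoded = html.unescape(raw_html_fragment)

print(raw_html_fragment.isascii())  # True
print(repr(decoded))                # 'Tel:\xa001234 567890'
print(decoded.isascii())            # False
```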

Does anyone have any ideas as to how to solve this so that I can CONSISTENTLY fix this problem?

Recommended answer

You need to read the Python Unicode HOWTO. This error is the very first example.

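Basically, stop using str to convert from unicode to encoded text / bytes. In Python 2, calling str() on a unicode object implicitly encodes it with the default 'ascii' codec, which raises this error for any character outside the range 0-127, such as the no-break space u'\xa0'.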

Instead, properly use .encode() to encode the string:

p.agent_info = u' '.join((agent_contact, agent_telno)).encode('utf-8').strip()
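For comparison, here is a sketch of the same fix in Python 3, where str is already Unicode text and an explicit .encode() is needed only when bytes are genuinely required. The two input values are hypothetical stand-ins for the scraped fields.

```python
# Hypothetical stand-ins for the values scraped with BeautifulSoup.
agent_contact = "Jane Smith"
agent_telno = "\xa001234 567890"

# Join as text first, then encode explicitly with a codec that can
# represent every character (UTF-8) instead of relying on ASCII.
agent_info = " ".join((agent_contact, agent_telno)).encode("utf-8").strip()
print(agent_info)  # b'Jane Smith \xc2\xa001234 567890'
```

UTF-8 represents U+00A0 as the two bytes C2 A0, which is why it appears as \xc2\xa0 in the output. Note that bytes.strip() removes only ASCII whitespace, so it would not strip an encoded no-break space at either end.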

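or work entirely in unicode: decode once when the text comes in, keep it as unicode throughout, and encode only when bytes are actually required for output.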
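A minimal Python 3 sketch of the "decode at the input, work in text, encode at the output" pattern. The function names, the UTF-8 default and the sample bytes are assumptions for illustration; in practice the charset comes from the HTTP headers or the page's meta tag, and BeautifulSoup performs the decoding step itself.

```python
def decode_page(raw: bytes, charset: str = "utf-8") -> str:
    # Hypothetical input boundary: turn bytes into text exactly once.
    return raw.decode(charset)


def format_agent_info(contact: str, telno: str) -> str:
    # All processing happens on str; nothing is encoded here.
    # str.strip() treats U+00A0 as whitespace, so it is removed too.
    return " ".join((contact, telno)).strip()


def to_output(text: str) -> bytes:
    # Hypothetical output boundary: turn text into bytes exactly once.
    return text.encode("utf-8")


# Hypothetical UTF-8 bytes as they might arrive from a site.
page_bytes = b"01234 567890\xc2\xa0"
telno = decode_page(page_bytes)
info = format_agent_info("Jane Smith", telno)
print(to_output(info))  # b'Jane Smith 01234 567890'
```

Because the text stays as str until the end, the trailing no-break space can be stripped as ordinary whitespace, and since UTF-8 can represent every character in text like this, the final encode will not raise.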
