UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)

Problem description

I'm having problems dealing with unicode characters from text fetched from different web pages (on different sites). I am using BeautifulSoup.

The problem is that the error is not always reproducible; it sometimes works with some pages, and sometimes, it barfs by throwing a UnicodeEncodeError. I have tried just about everything I can think of, and yet I have not found anything that works consistently without throwing some kind of Unicode-related error.

One of the sections of code that is causing problems is shown below:

agent_telno = agent.find('div', 'agent_contact_number')
agent_telno = '' if agent_telno is None else agent_telno.contents[0]
p.agent_info = str(agent_contact + ' ' + agent_telno).strip()

Here is a stack trace produced on SOME strings when the snippet above is run:

Traceback (most recent call last):
  File "foobar.py", line 792, in <module>
    p.agent_info = str(agent_contact + ' ' + agent_telno).strip()
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)

I suspect that this is because some pages (or more specifically, pages from some of the sites) may be encoded, whilst others may be unencoded. All the sites are based in the UK and provide data meant for UK consumption, so there are no issues relating to internationalization or dealing with text written in anything other than English.

Does anyone have any ideas as to how to solve this so that I can CONSISTENTLY fix this problem?

Recommended answer

You need to read the Python Unicode HOWTO. This error is the very first example.

Basically, stop using str to convert from unicode to encoded text / bytes.
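To see why the implicit conversion fails, it can help to trigger the same encoding step explicitly. The sketch below is written for Python 3, where the ASCII encoding has to be requested by hand; in Python 2, calling str() on a unicode value performs that step implicitly. The sample strings and the helper name ascii_encode_error are hypothetical, chosen only so that the non-breaking space lands at position 20 as in the traceback.

```python
# Hypothetical stand-ins for the text scraped with BeautifulSoup.
agent_contact = "Jane Doe (Lettings)"     # 19 characters
agent_telno = "\u00a0" + "020 7946 0000"  # begins with a non-breaking space

combined = agent_contact + " " + agent_telno


def ascii_encode_error(value):
    """Return the UnicodeEncodeError raised by ASCII-encoding value, or None."""
    try:
        value.encode("ascii")
    except UnicodeEncodeError as exc:
        return exc
    return None


err = ascii_encode_error(combined)
if err is not None:
    # start is the index of the first character ASCII cannot represent.
    print(err.encoding, err.start, repr(combined[err.start]))
```

Pages whose text happens to contain only ASCII characters pass through this step untouched, which would explain why the error appears on some pages and not on others.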

Instead, properly use .encode() to encode the string:

p.agent_info = u' '.join((agent_contact, agent_telno)).encode('utf-8').strip()

or work entirely in unicode.
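To make the two options concrete, here is a minimal sketch written for Python 3, where str is already Unicode text and bytes is encoded data (the question's code targets Python 2, where unicode and str play those roles). The sample values and the helper names encode_agent_info and normalise_agent_info are hypothetical. Collapsing the non-breaking space into a plain space is an optional assumption about how the scraped data should look, not something the answer prescribes.

```python
# Hypothetical sample values standing in for the scraped BeautifulSoup text.
agent_contact = "Jane Doe (Lettings)"
agent_telno = "\u00a0" + "020 7946 0000"  # begins with a non-breaking space


def encode_agent_info(contact, telno):
    """Option 1: join as text, then encode explicitly to UTF-8 bytes."""
    return " ".join((contact, telno)).encode("utf-8").strip()


def normalise_agent_info(contact, telno):
    """Option 2: stay in text and collapse whitespace runs, U+00A0 included."""
    joined = " ".join((contact, telno))
    return " ".join(joined.split())


as_bytes = encode_agent_info(agent_contact, agent_telno)
as_text = normalise_agent_info(agent_contact, agent_telno)

print(repr(as_bytes))
print(repr(as_text))

# Encode only at the output boundary, for example just before writing a file.
payload = as_text.encode("utf-8")
```

Note that bytes.strip() removes only ASCII whitespace, so the encoded non-breaking space survives in option 1, whereas str.split() in option 2 treats U+00A0 as whitespace and drops it.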
