UnicodeEncodeError with BeautifulSoup 3.1.0.1 and Python 2.5.2


Problem description

I am using BeautifulSoup 3.1.0.1 with Python 2.5.2 and trying to parse a web page in French. However, as soon as I call findAll, I get the following error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 1146: ordinal not in range(128)

Below is the code I am currently running:

import urllib2
from BeautifulSoup import BeautifulSoup
page = urllib2.urlopen("http://fr.encarta.msn.com/encyclopedia_761561798/Paris.html")
soup = BeautifulSoup(page, fromEncoding="latin1")
r = soup.findAll("table")
print r

Does anybody have an idea why?

Thanks!

UPDATE: As requested, below is the full traceback:

Traceback (most recent call last):
  File "[...]\test.py", line 6, in <module>
    print r
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1146-1147: ordinal not in range(128)

Solution

Here is another idea. Your terminal is not capable of displaying a Unicode string from Python, so the interpreter tries to convert it to ASCII first, and that fails for non-ASCII characters such as u'\xe9'. You should encode it explicitly before printing. I don't know the exact semantics of soup.findAll(), but it is probably something like:

for t in soup.findAll("table"):
    print t.encode('latin1')

That works if t really is a string. Maybe it's just another object from which you have to build the data that you want to display.
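To make the encoding step concrete without depending on BeautifulSoup, here is a minimal sketch of the same idea on a plain Unicode string. The names `text` and `safe_encode` are hypothetical and only for illustration; the sample string simply contains the u'\xe9' character from the error message. It shows that encoding to ASCII fails, while an explicit encode with a suitable codec (or with the "replace" error handler) does not.

```python
# A Unicode string containing the character from the error message (e-acute).
text = u"caf\xe9"


def safe_encode(value, encoding="latin-1"):
    """Hypothetical helper: encode a Unicode string to bytes,
    substituting '?' for characters the codec cannot represent."""
    return value.encode(encoding, "replace")


# Encoding to ASCII without an error handler is the step that fails.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Explicit encoding succeeds when the codec covers the character.
latin1_bytes = safe_encode(text)

# With a codec that lacks the character, "replace" avoids the exception.
ascii_bytes = safe_encode(text, "ascii")
```

Which codec to pass depends on what the terminal actually expects; latin-1 is used here only because the answer above uses it.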
