Stuck with encodings in Python with BeautifulSoup
Problem description
The page is encoded in UTF-8, and Python's HTMLParser handles it fine (no UnicodeDecodeError), but I get an error when I parse it with BeautifulSoup.
I've tried adding the # -*- coding: utf-8 -*- declaration and calling .encode('utf-8') everywhere, and I still get the error:
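import urllib
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3, Python 2
args = urllib.urlencode({'keywords': 'magic'})  # form data for the request
doc = urllib.urlopen('http://www.example.com/submit', args)  # passing data makes this a POST
soup = BeautifulSoup(doc)  # BeautifulSoup converts the page to unicode
stuff = soup.findAll('section', id='banner')  # list of matching Tag objects
print stuff  # line 7: the traceback below points here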
Traceback (most recent call last):
  File "test.py", line 7, in <module>
    print stuff
UnicodeEncodeError: 'ascii' codec can't encode character u'\xed' in position 112: ordinal not in range(128)
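The traceback shows the exception comes from print stuff on line 7, not from the parser: Python 2 tried to encode u'\xed' ("í") with the ASCII codec, most likely while converting the result to bytes for output. A # -*- coding: utf-8 -*- line (PEP 263) only tells the interpreter how to decode the script's own source bytes; it does not change how output is encoded. The sketch below illustrates both points in Python 3 rather than the question's Python 2; the Latin-1 source snippet, the word "línea" and the helper encode_for are hypothetical stand-ins, not taken from the question.

```python
# Python 3 sketch. The Latin-1 source and the word "línea" are hypothetical.
# A script saved as Latin-1 bytes that declares its encoding (PEP 263).
source = "# -*- coding: latin-1 -*-\nword = 'l\xednea'\n".encode("latin-1")

namespace = {}
exec(compile(source, "<demo>", "exec"), namespace)
word = namespace["word"]  # the coding line made byte 0xED decode as 'í'


def encode_for(text, stream_encoding):
    """Encode text the way writing it to a stream with this encoding would."""
    return text.encode(stream_encoding)


try:
    encode_for(word, "ascii")  # the coding declaration has no effect here
except UnicodeEncodeError as exc:
    print(exc.reason, "at position", exc.start)  # ordinal not in range(128) at position 1

print(encode_for(word, "utf-8"))  # b'l\xc3\xadnea'
```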
Recommended answer
OK, I found the solution on my last try; maybe it will help others with the same problem. The results need to be encoded, not decoded:
print( [e.encode('utf-8', 'ignore') for e in stuff] )
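The same idea, encoding explicitly instead of relying on the output stream's default codec, can be sketched in Python 3 as follows. write_utf8 is a hypothetical helper, and the sample markup is an assumed stand-in for what str(tag) would return for a matched tag in BeautifulSoup 4 (the question itself uses BeautifulSoup 3). An in-memory io.BytesIO keeps the example self-contained; to write to the real terminal you would pass sys.stdout.buffer instead.

```python
import io


def write_utf8(items, binary_stream):
    """Encode each text item as UTF-8 and write it, one per line, to a byte stream."""
    for item in items:
        binary_stream.write(item.encode("utf-8"))
        binary_stream.write(b"\n")


# Hypothetical stand-in for the markup of the matched <section> tags.
stuff = ['<section id="banner">l\u00ednea</section>']

buf = io.BytesIO()  # pass sys.stdout.buffer here to write to the terminal
write_utf8(stuff, buf)
print(buf.getvalue())  # b'<section id="banner">l\xc3\xadnea</section>\n'
```

Writing each item separately matters: printing a whole list shows each element's repr, so the one-liner above displays escaped bytes such as \xc3\xad rather than the characters themselves. On Python 3.7+, sys.stdout.reconfigure(encoding="utf-8") is another option, and the PYTHONIOENCODING environment variable sets the standard streams' encoding for the whole process.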