Stuck with encodings in Python with BeautifulSoup

Problem description

The page is encoded in UTF-8, and Python's HTMLParser handles it without any UnicodeDecodeError, but I get an error when I try to parse it with BeautifulSoup. I've tried adding # -*- coding: utf-8 -*- at the top of the script and calling .encode('utf-8') everywhere, and I still get the error. Here is the code, followed by the traceback:

import urllib
from BeautifulSoup import BeautifulSoup
args = urllib.urlencode({'keywords': 'magic'})
doc = urllib.urlopen('http://www.example.com/submit', args)
soup = BeautifulSoup(doc)
stuff = soup.findAll('section', id='banner')
print stuff

Traceback (most recent call last):
  File "test.py", line 7, in <module>
    print stuff
UnicodeEncodeError: 'ascii' codec can't encode character u'\xed' in position 112: ordinal not in range(128)

Recommended answer

OK, I found the solution on my last try; maybe it will help others with the same problem. The error is a UnicodeEncodeError raised by the 'ascii' codec while printing, so the results need to be encoded, not decoded:

print( [e.encode('utf-8', 'ignore') for e in stuff] )
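
To make the encode-versus-decode point concrete, here is a minimal sketch that does not need BeautifulSoup or a network connection. It is an illustration under assumptions, not the original poster's code: it is written for Python 3 (the question uses Python 2), the sample strings and the helper name encode_all are hypothetical, and the only detail taken from the question is the character u'\xed' named in the traceback.

```python
# Hypothetical sample text standing in for the parsed elements.
# u'\xed' is the character reported in the traceback above.
items = [u'magic', u'Mar\xeda']


def encode_all(values, encoding='utf-8'):
    """Return each text value as bytes, silently dropping unencodable characters."""
    return [v.encode(encoding, 'ignore') for v in values]


# Strict ASCII encoding fails with the same exception type as the traceback.
try:
    items[1].encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Explicit UTF-8 encoding succeeds because UTF-8 can represent the character.
encoded = encode_all(items)

print(ascii_ok)
print(encoded)
```

Note that the 'ignore' error handler discards any character the target encoding cannot represent. With UTF-8 that rarely matters, but with a narrower encoding such as ASCII it silently loses text, so 'replace' or the default strict mode may be a safer choice when data loss needs to be visible.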
