How to convert BeautifulSoup.ResultSet to string
Question
I parsed an HTML page with .findAll (BeautifulSoup) into a variable named result. If I type result in the Python shell and press Enter, I see normal text as expected. But I want to post-process this result as a string object, and I noticed that str(result) returns garbage, like this sample:
\xd1\x87\xd0\xb8\xd0\xbb\xd0\xbd\xd0\xb8\xd1\x86\xd0\xb0</a><br />\n<hr />\n</div>
The HTML page source is UTF-8 encoded.
How should I handle this?
The code is basically this, in case it matters:
import urllib
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(urllib.urlopen(url).read())  # urlopen, not open (Python 2 urllib)
result = soup.findAll(something)
Python is 2.7.
Recommended answer
Python 2.6.7, BeautifulSoup version 3.2.0
This worked for me:
unicode.join(u'\n',map(unicode,result))
I'm pretty sure result is a BeautifulSoup.ResultSet object, which seems to be an extension of the standard Python list.
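For what it's worth, the same idea can be sketched so that it runs on both Python 2 and Python 3: convert each element on its own, then join the pieces, rather than calling str() on the whole list. The helper name result_set_to_text, the sep parameter, and the plain list standing in for a real ResultSet are assumptions made for illustration; none of them come from BeautifulSoup itself.

```python
# Pick the text type for the running interpreter.
try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str      # Python 3


def result_set_to_text(result, sep=u"\n"):
    """Join the text form of each element of a list-like result.

    Hypothetical helper: each item is converted separately, so the
    list's own str()/repr() is never used.
    """
    return sep.join(text_type(item) for item in result)


# Assumption: a plain list of strings stands in for a ResultSet here.
sample = [u"<a>\u0447\u0438\u043b\u043d\u0438\u0446\u0430</a>", u"<hr />"]
text = result_set_to_text(sample)
```

With a real result from soup.findAll(...), the call would presumably be result_set_to_text(result), which on Python 2 amounts to the unicode.join line above.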