How to convert BeautifulSoup.ResultSet to string


Question

I parsed an HTML page with .findAll (BeautifulSoup) and stored the matches in a variable named result. If I type result in the Python shell and press Enter, I see normal text as expected. But I want to post-process this result as a string object, and str(result) returns garbage, like this sample:

\xd1\x87\xd0\xb8\xd0\xbb\xd0\xbd\xd0\xb8\xd1\x86\xd0\xb0</a><br />\n<hr />\n</div>

The HTML page source is UTF-8 encoded.

How should I handle this?

The code is basically this, in case it matters:

import urllib
from BeautifulSoup import BeautifulSoup
# url and something are placeholders for the page address and the search criteria
soup = BeautifulSoup(urllib.urlopen(url).read())
result = soup.findAll(something)

The Python version is 2.7.

Recommended answer

Tested with Python 2.6.7 and BeautifulSoup version 3.2.0.

This works for me:

unicode.join(u'\n',map(unicode,result))

I'm pretty sure result is a BeautifulSoup.ResultSet object, which seems to be an extension of the standard Python list. That would explain the output: calling str() on a list formats each element with repr(), so non-ASCII characters appear as escaped bytes.
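
The one-liner above is specific to Python 2, since the unicode type no longer exists in Python 3. As a rough illustration of the same idea in Python 3, the sketch below uses only the standard library. ResultSetStandIn and the sample markup are hypothetical stand-ins invented for this example, not BeautifulSoup objects, and UTF-8 byte strings play the role of the parsed tags. It shows why str() on a list-like object produces escapes, and how converting each element separately before joining avoids them.

```python
# Python 3, standard library only. This is a sketch, not BeautifulSoup code.

class ResultSetStandIn(list):
    """Hypothetical stand-in for a list subclass such as BeautifulSoup.ResultSet."""


# Made-up UTF-8 encoded fragments playing the role of the parsed tags.
result = ResultSetStandIn([
    "<a>чилница</a>".encode("utf-8"),
    "<hr />".encode("utf-8"),
])

# str() on a list formats every element with repr(), so non-ASCII bytes
# show up as \x.. escapes instead of readable text.
as_list_text = str(result)

# Convert each element to text on its own, then join the pieces.
joined = "\n".join(item.decode("utf-8") for item in result)

print(as_list_text)
print(joined)
```

The design choice is the same as in the answer: never stringify the container itself, only its elements, and pick the separator explicitly.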
