Beautiful Soup and uTidy
Question
I want to pass the results of uTidy to Beautiful Soup, like so:
page = urllib2.urlopen(url)
options = dict(output_xhtml=1,add_xml_decl=0,indent=1,tidy_mark=0)
cleaned_html = tidy.parseString(page.read(), **options)
soup = BeautifulSoup(cleaned_html)
When run, the following error results:
Traceback (most recent call last):
  File "soup.py", line 34, in <module>
    soup = BeautifulSoup(cleaned_html)
  File "/var/lib/python-support/python2.6/BeautifulSoup.py", line 1499, in __init__
    BeautifulStoneSoup.__init__(self, *args, **kwargs)
  File "/var/lib/python-support/python2.6/BeautifulSoup.py", line 1230, in __init__
    self._feed(isHTML=isHTML)
  File "/var/lib/python-support/python2.6/BeautifulSoup.py", line 1245, in _feed
    smartQuotesTo=self.smartQuotesTo, isHTML=isHTML)
  File "/var/lib/python-support/python2.6/BeautifulSoup.py", line 1751, in __init__
    self._detectEncoding(markup, isHTML)
  File "/var/lib/python-support/python2.6/BeautifulSoup.py", line 1899, in _detectEncoding
    xml_encoding_match = re.compile(xml_encoding_re).match(xml_data)
TypeError: expected string or buffer
I gather that uTidy returns an XML document while BeautifulSoup wants a string. Is there a way to cast cleaned_html? Or am I doing it wrong, and should I take a different approach?
Recommended answer
Just wrap str() around cleaned_html when passing it to BeautifulSoup.
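To illustrate why the str() wrapper helps, here is a minimal sketch that reproduces the failure without either library installed. FakeTidyDocument and detect_xml_encoding are hypothetical stand-ins, not part of uTidy or Beautiful Soup. The sketch assumes only what the question and answer imply: that tidy.parseString() returns a document object rather than a string, and that calling str() on it yields the cleaned markup. The regular expression is a simplified approximation of the encoding check seen in the traceback.

```python
import re


class FakeTidyDocument(object):
    """Hypothetical stand-in for the object returned by tidy.parseString()."""

    def __init__(self, markup):
        self._markup = markup

    def __str__(self):
        # Assumption: str() on the real document yields the cleaned markup.
        return self._markup


def detect_xml_encoding(markup):
    """Simplified version of the failing step: match a regex against the markup."""
    pattern = r'''^<\?.*encoding=['"](.*?)['"].*\?>'''
    match = re.compile(pattern).match(markup)
    return match.group(1) if match else None


cleaned_html = FakeTidyDocument('<?xml version="1.0" encoding="utf-8"?><html></html>')

# Passing the document object itself fails, as in the traceback above.
try:
    detect_xml_encoding(cleaned_html)
    raised = False
except TypeError:
    raised = True

# Wrapping it in str() hands the regex a real string, so the match succeeds.
encoding = detect_xml_encoding(str(cleaned_html))
```

In the original script, the corresponding change would be soup = BeautifulSoup(str(cleaned_html)).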