How to get BeautifulSoup 4 to respect a self-closing tag?
This question is specific to BeautifulSoup4, which makes it different from the previous questions:
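Why is BeautifulSoup modifying my self-closing elements? (http://stackoverflow.com/questions/1567402/why-is-beautifulsoup-modifying-my-self-closing-elements)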
selfClosingTags in BeautifulSoup
Since BeautifulStoneSoup (the previous XML parser) is gone, how can I get bs4 to respect a new self-closing tag? For example:
import bs4
S = '''<foo> <bar a="3"/> </foo>'''
soup = bs4.BeautifulSoup(S, selfClosingTags=['bar'])
print(soup.prettify())
This does not self-close the bar tag, but the warning gives a hint. What is the tree builder that bs4 refers to, and how do I self-close the tag?
/usr/local/lib/python2.7/dist-packages/bs4/__init__.py:112: UserWarning: BS4 does not respect the selfClosingTags argument to the BeautifulSoup constructor. The tree builder is responsible for understanding self-closing tags.
"BS4 does not respect the selfClosingTags argument to the "
<html>
<body>
<foo>
<bar a="3">
</bar>
</foo>
</body>
</html>
To parse XML, pass "xml" as the second argument to the BeautifulSoup constructor.
soup = bs4.BeautifulSoup(S, 'xml')
You’ll need to have lxml installed.
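Because the "xml" feature depends on lxml, the constructor raises bs4.FeatureNotFound when lxml is missing. Here is a minimal sketch of guarding against that. The helper name parse_xml is hypothetical and not part of bs4:

```python
import bs4

S = '''<foo> <bar a="3"/> </foo>'''


def parse_xml(markup):
    """Hypothetical helper: parse with the XML tree builder.

    Returns None instead of raising when lxml is not installed.
    """
    try:
        return bs4.BeautifulSoup(markup, "xml")
    except bs4.FeatureNotFound:
        # bs4 could not find a tree builder for the "xml" feature.
        return None


soup = parse_xml(S)
if soup is None:
    print("lxml is not installed; try: pip install lxml")
else:
    print(soup.prettify())
```

Returning None is only one way to handle the missing parser; letting the exception propagate is just as reasonable if XML support is mandatory.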
You don't need to pass selfClosingTags anymore:
In [1]: import bs4
In [2]: S = '''<foo> <bar a="3"/> </foo>'''
In [3]: soup = bs4.BeautifulSoup(S, 'xml')
In [4]: print(soup.prettify())
<?xml version="1.0" encoding="utf-8"?>
<foo>
<bar a="3"/>
</foo>
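As for what the tree builder is: each BeautifulSoup object keeps the parser wrapper it was built with in soup.builder, and that object decides which tags may be empty. Below is a rough sketch for inspecting it. It uses Python's built-in "html.parser" rather than the parser from the question, purely so the example runs without extra installs. That choice is an assumption for illustration, and the printed markup will not match the output above exactly.

```python
import bs4

S = '''<foo> <bar a="3"/> </foo>'''

# "html.parser" ships with Python, so only bs4 itself is required here.
html_soup = bs4.BeautifulSoup(S, "html.parser")
builder = html_soup.builder

# An HTML tree builder is not an XML builder ...
print(builder.is_xml)

# ... and it only treats the known HTML void elements as self-closing.
print(builder.can_be_empty_element("br"))
print(builder.can_be_empty_element("bar"))

# So <bar/> is written back out with an explicit closing tag.
bar = html_soup.find("bar")
print(bar.is_empty_element)
print(html_soup.decode())
```

An HTML builder has a fixed set of empty-element tags, which is why bar gets a closing tag. The XML builder has no such restriction, so any tag without contents is serialized as self-closing.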