在BeautifulSoup selfClosingTags [英] selfClosingTags in BeautifulSoup
问题描述
使用BeautifulSoup解析我的XML
Using BeautifulSoup to parse my XML
import BeautifulSoup
soup = BeautifulSoup.BeautifulStoneSoup( """<alan x="y" /><anne>hello</anne>""" ) # selfClosingTags=['alan'])
print soup.prettify()
这将输出:
<alan x="y">
<anne>
hello
</anne>
</alan>
也就是说,安妮标签是艾伦标签的孩子。
ie, the anne tag is a child of the alan tag.
如果我通过selfClosingTags = ['阿伦']当我创建了汤,我得到:
If I pass selfClosingTags=['alan'] when I create the soup, I get:
<alan x="y" />
<anne>
hello
</anne>
大!
我的问题:为什么不能 /&gt;中presence;
用来表示一个自我结束标记
My question: why can't the presence of the />
be used to indicate a self closing tag?
推荐答案
您所问的是作者的心灵,在已经指出,他给了像美丽的[石]汤名称类/模块: - )
You are asking what was in the mind of an author, after having noted that he gives names like Beautiful[Stone]Soup to classes/modules :-)
下面是两个多BeautifulStoneSoup的行为的例子:
Here are two more examples of the behaviour of BeautifulStoneSoup:
>>> soup = BeautifulSoup.BeautifulStoneSoup(
"""<alan x="y" ><anne>hello</anne>"""
)
>>> print soup.prettify()
<alan x="y">
<anne>
hello
</anne>
</alan>
>>> soup = BeautifulSoup.BeautifulStoneSoup(
"""<alan x="y" ><anne>hello</anne>""",
selfClosingTags=['alan'])
>>> print soup.prettify()
<alan x="y" />
<anne>
hello
</anne>
>>>
我的看法:如果没有定义解析器自动关闭的标签是不合法的。因此,笔者决定如何处理非法的片段像&LT时不得不选择;艾伦X =Y/&GT;
(1)假定 /
是一个错误(2)治疗阿伦
作为完全独立的如何可能会在输入别处使用的自闭标签(3)使2越过输入在第一遍是如何使用的每个标签纳丁出来。哪种选择你preFER?
My take: a self-closing tag is not legal if it is not defined to the parser. So the author had choices when deciding how to handle an illegal fragment like <alan x="y" />
... (1) assume that the /
was a mistake (2) treat alan
as a self-closing tag quite independently of how it might be used elsewhere in the input (3) make 2 passes over the input nutting out in the first pass how each tag was used. Which choice do you prefer?
这篇关于在BeautifulSoup selfClosingTags的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!