BeautifulSoup 中的 selfClosingTags [英] selfClosingTags in BeautifulSoup
问题描述
使用 BeautifulSoup 解析我的 XML
导入 BeautifulSoup汤 = BeautifulSoup.BeautifulStoneSoup( """<alan x="y"/><anne>hello</anne>""" ) # selfClosingTags=['alan'])打印汤.美化()
这将输出:
<安妮>你好</安妮></艾伦>
即,anne 标签是 alan 标签的子标签.
如果我在创建汤时传递 selfClosingTags=['alan'],我会得到:
<安妮>你好</安妮>
太好了!
我的问题:为什么不能使用 />
来表示自结束标记?
在注意到作者为类/模块命名了诸如 Beautiful[Stone]Soup 之类的名称后,您是在问作者的想法:-)
以下是 BeautifulStoneSoup 行为的另外两个示例:
<预><代码>>>>汤 = BeautifulSoup.BeautifulStoneSoup("""<alan x="y" ><anne>你好</anne>""")>>>打印汤.美化()<alan x="y"><安妮>你好</安妮></艾伦>>>>汤 = BeautifulSoup.BeautifulStoneSoup("""<alan x="y" ><anne>你好</anne>""",selfClosingTags=['alan'])>>>打印汤.美化()<alan x="y"/><安妮>你好</安妮>>>>我的看法:如果没有为解析器定义自闭合标签,则它是不合法的.所以作者在决定如何处理像 <alan x="y"/>
这样的非法片段时有选择...... (1) 假设 /
是一个错误 (2) 将 alan
视为一个自闭合标签,完全独立于它在输入中的其他地方的使用方式 (3) 在第一次传递中对输入进行 2 次传递,每个标签如何被使用.你更喜欢哪个选择?
Using BeautifulSoup to parse my XML
import BeautifulSoup
soup = BeautifulSoup.BeautifulStoneSoup( """<alan x="y" /><anne>hello</anne>""" ) # selfClosingTags=['alan'])
print soup.prettify()
This will output:
<alan x="y">
<anne>
hello
</anne>
</alan>
ie, the anne tag is a child of the alan tag.
If I pass selfClosingTags=['alan'] when I create the soup, I get:
<alan x="y" />
<anne>
hello
</anne>
Great!
My question: why can't the presence of the />
be used to indicate a self closing tag?
You are asking what was in the mind of an author, after having noted that he gives names like Beautiful[Stone]Soup to classes/modules :-)
Here are two more examples of the behaviour of BeautifulStoneSoup:
>>> soup = BeautifulSoup.BeautifulStoneSoup(
"""<alan x="y" ><anne>hello</anne>"""
)
>>> print soup.prettify()
<alan x="y">
<anne>
hello
</anne>
</alan>
>>> soup = BeautifulSoup.BeautifulStoneSoup(
"""<alan x="y" ><anne>hello</anne>""",
selfClosingTags=['alan'])
>>> print soup.prettify()
<alan x="y" />
<anne>
hello
</anne>
>>>
My take: a self-closing tag is not legal if it is not defined to the parser. So the author had choices when deciding how to handle an illegal fragment like <alan x="y" />
... (1) assume that the /
was a mistake (2) treat alan
as a self-closing tag quite independently of how it might be used elsewhere in the input (3) make 2 passes over the input nutting out in the first pass how each tag was used. Which choice do you prefer?
这篇关于BeautifulSoup 中的 selfClosingTags的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!