BeautifulSoup 中的 selfClosingTags [英] selfClosingTags in BeautifulSoup

查看:15
本文介绍了BeautifulSoup 中的 selfClosingTags的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

使用 BeautifulSoup 解析我的 XML

导入 BeautifulSoup汤 = BeautifulSoup.BeautifulStoneSoup( """<alan x="y"/><anne>hello</anne>""" ) # selfClosingTags=['alan'])打印汤.美化()

这将输出:

<安妮>你好</安妮></艾伦>

即,anne 标签是 alan 标签的子标签.

如果我在创建汤时传递 selfClosingTags=['alan'],我会得到:

<安妮>你好</安妮>

太好了!

我的问题:为什么不能使用 /> 来表示自结束标记?

解决方案

在注意到作者为类/模块命名了诸如 Beautiful[Stone]Soup 之类的名称后,您是在问作者的想法:-)

以下是 BeautifulStoneSoup 行为的另外两个示例:

<预><代码>>>>汤 = BeautifulSoup.BeautifulStoneSoup("""<alan x="y" ><anne>你好</anne>""")>>>打印汤.美化()<alan x="y"><安妮>你好</安妮></艾伦>>>>汤 = BeautifulSoup.BeautifulStoneSoup("""<alan x="y" ><anne>你好</anne>""",selfClosingTags=['alan'])>>>打印汤.美化()<alan x="y"/><安妮>你好</安妮>>>>

我的看法:如果没有为解析器定义自闭合标签,则它是不合法的.所以作者在决定如何处理像 <alan x="y"/> 这样的非法片段时有选择...... (1) 假设 / 是一个错误 (2) 将 alan 视为一个自闭合标签,完全独立于它在输入中的其他地方的使用方式 (3) 在第一次传递中对输入进行 2 次传递,每个标签如何被使用.你更喜欢哪个选择?

Using BeautifulSoup to parse my XML

import BeautifulSoup

soup = BeautifulSoup.BeautifulStoneSoup( """<alan x="y" /><anne>hello</anne>""" ) # selfClosingTags=['alan'])

print soup.prettify()

This will output:

<alan x="y">
 <anne>
  hello
 </anne>
</alan>

ie, the anne tag is a child of the alan tag.

If I pass selfClosingTags=['alan'] when I create the soup, I get:

<alan x="y" />
<anne>
 hello
</anne>

Great!

My question: why can't the presence of the /> be used to indicate a self closing tag?

解决方案

You are asking what was in the mind of an author, after having noted that he gives names like Beautiful[Stone]Soup to classes/modules :-)

Here are two more examples of the behaviour of BeautifulStoneSoup:

>>> soup = BeautifulSoup.BeautifulStoneSoup(
    """<alan x="y" ><anne>hello</anne>"""
    )
>>> print soup.prettify()
<alan x="y">
 <anne>
  hello
 </anne>
</alan>

>>> soup = BeautifulSoup.BeautifulStoneSoup(
    """<alan x="y" ><anne>hello</anne>""",
    selfClosingTags=['alan'])
>>> print soup.prettify()
<alan x="y" />
<anne>
 hello
</anne>
>>>

My take: a self-closing tag is not legal if it is not defined to the parser. So the author had choices when deciding how to handle an illegal fragment like <alan x="y" /> ... (1) assume that the / was a mistake (2) treat alan as a self-closing tag quite independently of how it might be used elsewhere in the input (3) make 2 passes over the input nutting out in the first pass how each tag was used. Which choice do you prefer?

这篇关于BeautifulSoup 中的 selfClosingTags的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆