selfClosingTags in BeautifulSoup

Question

Using BeautifulSoup to parse my XML

import BeautifulSoup

soup = BeautifulSoup.BeautifulStoneSoup( """<alan x="y" /><anne>hello</anne>""" ) # selfClosingTags=['alan'])

print soup.prettify()

This outputs:

<alan x="y">
 <anne>
  hello
 </anne>
</alan>

i.e., the anne tag is a child of the alan tag.

If I pass selfClosingTags=['alan'] when I create the soup, I get:

<alan x="y" />
<anne>
 hello
</anne>

Great!

My question: why can't the presence of /> be used to indicate a self-closing tag?
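For comparison, a strict XML parser does treat /> as closing the element. In XML, <alan x="y" /> is an empty-element tag, equivalent to <alan x="y"></alan>. The sketch below uses Python 3's standard-library xml.etree.ElementTree, not BeautifulSoup. The parse_fragment helper and the <root> wrapper are hypothetical additions. The wrapper is needed because a well-formed XML document must have a single root element.

```python
import xml.etree.ElementTree as ET


def parse_fragment(fragment: str) -> ET.Element:
    """Parse an XML fragment that may contain several top-level elements.

    Hypothetical helper: wraps the fragment in a <root> element, because
    a well-formed XML document needs exactly one root.
    """
    return ET.fromstring("<root>" + fragment + "</root>")


root = parse_fragment('<alan x="y" /><anne>hello</anne>')

# alan and anne come out as siblings; alan is empty.
print([child.tag for child in root])   # ['alan', 'anne']
alan = root.find("alan")
print(alan.attrib, len(alan))          # {'x': 'y'} 0
print(root.find("anne").text)          # hello

# An unclosed <alan> is not well-formed XML, so a strict parser
# raises an error instead of guessing where the element ends.
try:
    parse_fragment('<alan x="y" ><anne>hello</anne>')
except ET.ParseError as err:
    print("not well-formed:", err)
```

A strict parser can rely on /> only because it refuses to guess about unclosed tags. BeautifulStoneSoup is designed to tolerate such input.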

Recommended answer

You are asking what was in the mind of an author who, as you may have noticed, gives classes/modules names like Beautiful[Stone]Soup :-)

Here are two more examples of the behaviour of BeautifulStoneSoup:

>>> soup = BeautifulSoup.BeautifulStoneSoup(
    """<alan x="y" ><anne>hello</anne>"""
    )
>>> print soup.prettify()
<alan x="y">
 <anne>
  hello
 </anne>
</alan>

>>> soup = BeautifulSoup.BeautifulStoneSoup(
    """<alan x="y" ><anne>hello</anne>""",
    selfClosingTags=['alan'])
>>> print soup.prettify()
<alan x="y" />
<anne>
 hello
</anne>
>>>

My take: a self-closing tag is not legal unless it has been defined to the parser. So the author had a choice to make when deciding how to handle an illegal fragment like <alan x="y" />: (1) assume that the / was a mistake; (2) treat alan as a self-closing tag, regardless of how it might be used elsewhere in the input; or (3) make two passes over the input, working out in the first pass how each tag was used. Which choice do you prefer?
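Option (3) might look something like the sketch below, assuming the input is small enough to scan twice. The first pass uses Python 3's standard-library html.parser.HTMLParser. Its handle_startendtag callback fires for tags written as <tag ... />, and its handle_endtag callback fires for </tag>. TagUsageScanner and infer_self_closing_tags are hypothetical names. A tag is reported only if it is never also closed with an explicit end tag. One caveat: HTMLParser lower-cases tag names, so the sketch assumes the XML uses lower-case names.

```python
from html.parser import HTMLParser


class TagUsageScanner(HTMLParser):
    """First pass (hypothetical helper): record how each tag is written.

    HTMLParser lower-cases tag names, so this assumes lower-case XML names.
    """

    def __init__(self):
        super().__init__()
        self.written_self_closing = set()  # names seen as <tag ... />
        self.closed_explicitly = set()     # names seen as </tag>

    def handle_startendtag(self, tag, attrs):
        # Called for the <tag ... /> syntax.
        self.written_self_closing.add(tag)

    def handle_endtag(self, tag):
        # Called for an explicit </tag>.
        self.closed_explicitly.add(tag)


def infer_self_closing_tags(markup: str) -> list:
    """Return tag names that only ever appear in <tag ... /> form."""
    scanner = TagUsageScanner()
    scanner.feed(markup)
    scanner.close()
    return sorted(scanner.written_self_closing - scanner.closed_explicitly)


markup = '<alan x="y" /><anne>hello</anne>'
print(infer_self_closing_tags(markup))  # ['alan']
```

The second pass would then hand that list to BeautifulSoup 3, e.g. BeautifulSoup.BeautifulStoneSoup(markup, selfClosingTags=infer_self_closing_tags(markup)). That call is not exercised here. Porting the helper to Python 2, where BeautifulSoup 3 runs, would need small changes, such as importing from the HTMLParser module instead.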
