Can BeautifulSoup parse XML when a certain tag is sometimes self-closing and sometimes not?


Question



The situation is as follows.
XML file:

<tag1/>  
<tag2>some_data</tag2>
<tag1>some_another_data</tag1>

tag1 is sometimes self-closing and sometimes has data inside.

Code:

from BeautifulSoup import BeautifulStoneSoup
s = '<tag1/><tag2>some_data</tag2><tag1>some_another_data</tag1>'
soup1 = BeautifulStoneSoup(s)
soup2 = BeautifulStoneSoup(s, selfClosingTags=["tag1"])
print soup1.prettify()
print
print soup2.prettify()

Output:

<tag1>
 <tag2>
  some_data
 </tag2>
</tag1>
<tag1>
 some_another_data
</tag1>

<tag1 />
<tag2>
 some_data
</tag2>
<tag1 />
some_another_data

In the first case, tag1 swallows the following tag (unless that tag is another tag1), because self-closing tags are not supported by default. In the second case, a tag declared as self-closing cannot have child content, so the data inside the second tag1 ends up outside it.

I just want to get the same structure as the original XML document. Is that possible with BeautifulSoup? And if it is, how can I make all tags self-closing by default? There are a lot of XML files, and I don't want to search for all such cases manually.

Recommended answer

I wouldn't recommend BeautifulSoup (not even for HTML parsing). Use ElementTree from the standard library, or lxml if you need a more powerful XML library.
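As a rough illustration of the ElementTree suggestion, here is a minimal sketch, assuming Python 3. The snippet from the question has no single root element, which an XML parser requires, so it is wrapped in a hypothetical <root> element here; a real file would normally already have its own root.

```python
import xml.etree.ElementTree as ET

s = '<tag1/><tag2>some_data</tag2><tag1>some_another_data</tag1>'

# Assumption: wrap the fragment in a made-up <root> element, because
# ElementTree only accepts documents with exactly one top-level element.
root = ET.fromstring('<root>' + s + '</root>')

# Collect (tag name, text) pairs for the direct children, in document order.
# An empty element such as <tag1/> has text set to None.
structure = [(child.tag, child.text) for child in root]

for tag, text in structure:
    print(tag, repr(text))
```

Both forms of tag1 are handled without any per-tag configuration: the empty one simply has no text, and the second one keeps its data as its own content.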
