Can BeautifulSoup parse XML when a certain tag is sometimes self-closing and sometimes not?
Question
The situation is as follows.
XML file:
<tag1/>
<tag2>some_data</tag2>
<tag1>some_another_data</tag1>
tag1 is sometimes self-closing and sometimes has data inside.
Code:
from BeautifulSoup import BeautifulStoneSoup
s = '<tag1/><tag2>some_data</tag2><tag1>some_another_data</tag1>'
soup1 = BeautifulStoneSoup(s)
soup2 = BeautifulStoneSoup(s, selfClosingTags=["tag1"])
print soup1.prettify()
print
print soup2.prettify()
Output:
<tag1>
<tag2>
some_data
</tag2>
</tag1>
<tag1>
some_another_data
</tag1>

<tag1 />
<tag2>
some_data
</tag2>
<tag1 />
some_another_data
In the first case, tag1 swallows the tags that follow it (up to the next tag1), because self-closing tags are not supported by default. In the second case, a tag declared as self-closing cannot have children, so the text of the second tag1 ends up outside the element.
I just want to get the same structure as the original XML document. Is that possible with BeautifulSoup? If it is, how can I make all tags self-closing by default? There are a lot of XML files, and I don't want to search for all such cases manually.
Recommended answer
I would not recommend BeautifulSoup (not even for HTML parsing). Use ElementTree from the standard library, or lxml if you need a more powerful XML library.
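As a minimal sketch of the ElementTree suggestion, the fragment from the question can be parsed as shown here. The fragment has no single root element, so this example wraps it in a <root> element; that wrapper and the variable name structure are assumptions added for illustration and are not part of the original question or answer.

```python
import xml.etree.ElementTree as ET

# The fragment from the question. It has several top-level elements,
# so it is wrapped in a hypothetical <root> to make it well-formed XML.
s = '<tag1/><tag2>some_data</tag2><tag1>some_another_data</tag1>'
root = ET.fromstring('<root>' + s + '</root>')

# Collect (tag, text) pairs in document order.
# An empty element such as <tag1/> has text None.
structure = [(child.tag, child.text) for child in root]

for tag, text in structure:
    print(tag, repr(text))
```

Because a real XML parser handles <tag1/> and <tag1>...</tag1> by the same rules, no list of self-closing tags is needed, and the three elements stay siblings as in the original document.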