How to remove the XML header in BeautifulSoup?
Question
I have imported and modified some XML, but when I write it out using test.prettify(), the top line of the XML changes from
<?xml version="1.0"?>
to
<?xml version="1.0" encoding="utf-8"?>
I don't want this change. How can I keep the first line unchanged? What is the easiest way to do this?
If it matters, I'm using the xml parser:
soup = BeautifulSoup(r.text,'xml')
Recommended answer
I'm sure there's a more elegant way to do this with BeautifulSoup's built-ins, but based on your comment, here is the "strip it out" version:
xml_string = '<?xml version="1.0" encoding="utf-8"?>'
# Keep everything before the space that precedes "encoding", then re-close the declaration
print(xml_string[:xml_string.find("encoding") - 1] + "?>")
This is general enough to strip any encoding (not just utf-8) from the header, as long as the header actually contains an encoding attribute.
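A slightly more robust variant of the same "strip it out" idea is sketched below, assuming you still have the original input text (e.g. r.text from the question). Instead of slicing a fixed string, the hypothetical helper restore_xml_declaration copies the declaration from the original document onto the serialized output, and drops the serializer's declaration if the original had none. It uses only the standard re module and operates on plain strings, so it is independent of BeautifulSoup's internals.

```python
import re

# Matches an XML declaration such as <?xml version="1.0" encoding="utf-8"?>
# at the very start of a document (leading whitespace allowed).
XML_DECL = re.compile(r'^\s*<\?xml\b[^?]*\?>')


def restore_xml_declaration(original: str, rendered: str) -> str:
    """Hypothetical helper: put the original declaration back on `rendered`.

    If `original` had no declaration, the one in `rendered` is removed.
    """
    match = XML_DECL.match(original)
    original_decl = match.group(0).strip() if match else ""
    # Remove the declaration the serializer emitted (if any) and the
    # newline(s) that followed it.
    body = XML_DECL.sub("", rendered, count=1).lstrip("\n")
    if not original_decl:
        return body
    return original_decl + "\n" + body


original = '<?xml version="1.0"?>\n<root><a>1</a></root>'
rendered = '<?xml version="1.0" encoding="utf-8"?>\n<root>\n <a>\n  1\n </a>\n</root>'
print(restore_xml_declaration(original, rendered))
```

With the objects from the question this would be called as restore_xml_declaration(r.text, soup.prettify()). The pattern assumes the declaration contains no literal "?" before its closing "?>", which holds for ordinary version/encoding/standalone attributes.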