How to remove the XML header in BeautifulSoup?


Problem description

I have imported and modified some XML, but when I write it out using test.prettify(), the top line of the XML changes from

<?xml version="1.0"?>

to

<?xml version="1.0" encoding="utf-8"?>

I don't want this change. How can I keep the first line unchanged? What is the easiest way to do this?

If it matters, I'm using the xml parser:

soup = BeautifulSoup(r.text,'xml')

Recommended answer

I'm sure there's a more elegant way to do this using BeautifulSoup's built-ins, but based on your comment, I'll give you the "strip it out" version:

xml_string = '<?xml version="1.0" encoding="utf-8"?>'
# Cut just before the space that precedes "encoding", then close the declaration again.
print(xml_string[:xml_string.find("encoding") - 1] + "?>")

This is general enough to strip any encoding from the header (not just utf-8). Note that it drops everything from encoding onward, so a standalone attribute, if present, is removed as well.
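If you want to apply the same idea to the whole prettified document rather than to the header line alone, here is a minimal sketch using only the standard library's re module. The names strip_declared_encoding and _DECL_ENCODING are made up for illustration, and the pretty string is a stand-in for whatever test.prettify() returns in your case. It assumes the XML declaration sits at the very start of the text.

```python
import re

# Matches an XML declaration at the very start of the text and captures
# the parts before and after the encoding pseudo-attribute.
# (_DECL_ENCODING is a hypothetical helper name, not part of BeautifulSoup.)
_DECL_ENCODING = re.compile(
    r'^(<\?xml\b[^>]*?)\s+encoding\s*=\s*(?:"[^"]*"|\'[^\']*\')([^>]*\?>)'
)


def strip_declared_encoding(xml_text):
    """Return xml_text with encoding="..." removed from its XML declaration.

    Text without a declaration, or without an encoding in it, is returned
    unchanged. Other attributes such as standalone are kept.
    """
    return _DECL_ENCODING.sub(r"\1\2", xml_text, count=1)


# Stand-in for the string returned by test.prettify().
pretty = '<?xml version="1.0" encoding="utf-8"?>\n<root>\n <item/>\n</root>'
print(strip_declared_encoding(pretty))
```

Unlike the slicing version, this leaves the string alone when no encoding is declared, and it keeps a standalone attribute if one follows the encoding.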
