Python strip XML tags from document


Question

I am trying to strip XML tags from a document using Python, a language I am a novice in. Here is my first attempt using regex, which was really a hope-for-the-best idea.

mfile = file("somefile.xml","w")

for line in mfile:
    re.sub('<./>',"",line) #trying to match elements between < and />

That failed miserably. I would like to know how it should be done with regex.

Secondly, I googled and found: http://code.activestate.com/recipes/440481-strips-xmlhtml-tags-from-string/

That seems to work, but is there a simpler way to get rid of all XML tags? Maybe using ElementTree?

Recommended answer

Please note that stripping XML with regular expressions is usually not the right approach. See Jeremiah's answer.

Try this:

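import re

# Read the whole document, then remove anything that looks like a tag.
with open("/path/to/file") as f:
    text = re.sub(r'<[^<]+>', "", f.read())

# Overwrite the same file with the tag-free text.
with open("/path/to/file", "w") as f:
    f.write(text)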
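
Since the question also asks about ElementTree, here is a minimal sketch of a parser-based alternative. It assumes the input is well-formed XML; `strip_tags` and the sample string are hypothetical names made up for illustration and are not part of the original answer.

```python
import xml.etree.ElementTree as ET


def strip_tags(xml_text):
    """Return the text content of a well-formed XML string, without tags."""
    root = ET.fromstring(xml_text)
    # itertext() yields the text inside and after every element,
    # in document order, so joining the pieces rebuilds the plain text.
    return "".join(root.itertext())


# Hypothetical sample input, for illustration only.
sample = "<doc><title>Hello</title> <body>wor<b>ld</b>!</body></doc>"
print(strip_tags(sample))  # Hello world!
```

Unlike the regex, the parser also decodes entities (`&lt;` becomes `<`), but it raises `xml.etree.ElementTree.ParseError` on malformed input, so the regex may still be the more forgiving choice for broken markup.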
