Python strip XML tags from document
Problem description
I am trying to strip XML tags from a document using Python, a language I am a novice in. Here is my first attempt, using a regex, which was really a hope-for-the-best idea.
mfile = file("somefile.xml","w")
for line in mfile:
    re.sub('<./>',"",line) #trying to match elements between < and />
That failed miserably. I would like to know how it should be done with regex.
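A quick way to see why the attempt above fails is to run both patterns on a small sample string. In `'<./>'` the `.` matches exactly one character, so it only catches one-letter self-closing tags such as `<x/>`. Also, `re.sub` returns a new string rather than changing `line` in place, and opening the file with mode `"w"` truncates it before anything is read. The sketch below (Python 3; the sample string and the names `sample`, `attempt` and `fixed` are made up for illustration) compares the original pattern with `<[^<]+>`, which matches a `<`, then one or more characters other than `<`, then `>`.

```python
import re

# Hypothetical sample document, used only for illustration.
sample = '<root><item id="1">Hello</item><br/><x/></root>'

# Original pattern: '<', exactly ONE arbitrary character, then '/>'.
# It only matches one-letter self-closing tags like <x/>.
attempt = re.sub(r"<./>", "", sample)

# Pattern used in the answer: '<', one or more non-'<' characters, then '>'.
# That covers opening, closing and self-closing tags alike.
fixed = re.sub(r"<[^<]+>", "", sample)

print(attempt)  # <root><item id="1">Hello</item><br/></root>
print(fixed)    # Hello
```

To apply this to a file, read it in mode `"r"`, do the substitution on the string, and write the result back in a separate step.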
Secondly, I googled and found: http://code.activestate.com/recipes/440481-strips-xmlhtml-tags-from-string/
which seems to work. But is there a simpler way to get rid of all XML tags? Maybe using ElementTree?
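With the standard library's `xml.etree.ElementTree`, one option is to parse the document and join every text node with `Element.itertext()`, which yields an element's text and the text and tails of its descendants in document order. This is a minimal sketch, assuming the input is well-formed XML (otherwise the parser raises `ET.ParseError`); the function name `xml_to_text` is hypothetical.

```python
import xml.etree.ElementTree as ET


def xml_to_text(xml_string):
    """Return all character data in xml_string with the markup removed."""
    root = ET.fromstring(xml_string)
    # itertext() walks the tree in document order, yielding element text
    # and the tail text that follows each child element.
    return "".join(root.itertext())


doc = "<doc><p>Fish &amp; chips</p> <p>Tea</p></doc>"
print(xml_to_text(doc))  # Fish & chips Tea
```

For a file on disk, `ET.parse(path).getroot()` gives the root element to call `itertext()` on. Unlike a regex, the parser also decodes entities such as `&amp;`.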
Recommended answer
Note that doing this with regular expressions is usually not a good idea. See Jeremiah's answer.
Try this:
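import re

# Read the whole document, then delete everything that looks like a tag
with open("/path/to/file") as f:
    text = re.sub(r"<[^<]+>", "", f.read())

# Overwrite the file with the tag-free text
with open("/path/to/file", "w") as f:
    f.write(text)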
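The note about regexes is worth taking seriously: `<[^<]+>` treats anything between `<` and `>` as a tag and does not understand XML syntax. The sketch below wraps the same pattern in a hypothetical helper, `strip_tags_regex`, and shows two inputs where the result is probably not what you want: entity references such as `&amp;` are left undecoded, and a comment containing a literal `<` is only partly removed.

```python
import re

TAG_RE = re.compile(r"<[^<]+>")  # same pattern as in the answer


def strip_tags_regex(text):
    """Delete every '<...>' run that contains no further '<'."""
    return TAG_RE.sub("", text)


# Entities survive untouched; an XML parser would give "Fish & chips".
print(strip_tags_regex("<p>Fish &amp; chips</p>"))  # Fish &amp; chips

# The comment's first '<' hits another '<' before any '>', so it never
# matches; only "< b -->" is removed and "<!-- a " leaks into the output.
print(strip_tags_regex("x<!-- a < b -->y"))  # x<!-- a y
```

If the input may contain comments, CDATA sections or entities, a parser-based approach is usually the safer choice.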