How to search and replace text in an XML file using Python?


Question

How do I search an entire XML file for a specific text pattern and then replace each occurrence of that text with a new text pattern in Python 3.5?

Everything else (formatting, attributes, comments, etc.) needs to remain as it is in the original XML file.

I am running Python 3.5.1 on Windows (win32).

Specifically, I would like to replace each occurrence of "FEATURE NAME" with "THIS WORKED" and each occurrence of "FEATURE NUMBER" with "12345".

I have been trying to learn Python and xml.etree.ElementTree but cannot figure this out. I have already looked at "Search and replace a line in a .xml file in Python", "Search and replace a line in a file in Python", "How to search and replace text in a file using Python?", and other existing Q&As on this site, without success. I'm not an experienced programmer, so please let me know if more input is needed. Your help is greatly appreciated!

Here is a copy of what the XML looks like when I open it in Notepad (except that I added spaces to indent each line and inserted some line breaks when I pasted it into this question):

<description-topic>
    <access-info>
        <index-term-set>
            <index-term>
                <primary>FID FEATURE NUMBER</primary>
            </index-term>
            <index-term>
                <primary>FEATURE NAME</primary>
            </index-term>
            <index-term>
                <primary>Common features</primary>
                <secondary>FID FEATURE NUMBER</secondary>
            </index-term>
        </index-term-set>
    </access-info>
    <title>FEATURE NUMBER - FEATURE NAME</title>
    <block>
        <label>Platform</label>
        <comment>REVIEWERS: I guessed at the FEATURE NAME</comment>
        <para>
            This feature applies to the following platforms: FEATURE NAME<!--Check the values--></para>
    </block>
    <block branch="no">
        <label>Feature Benefits</label>
        <para>
            <comment>REVIEWERS: What do we put here? See template (link given in review email) for more information.</comment>
        </para>
    </block>
    <block branch="no">
        <label>Dependencies</label>
        <para/>
        <subblock>
            <label>Features</label>
            <comment>What FEATURE NAME do we put here?</comment>
        </subblock>
        <subblock>
            <label>Hardware</label>
            <comment>What FEATURE NAME do we put here?</comment>
            <para>This feature applies to the following: FEATURE NUMBER and text.</para><?Pub Caret -1?>
        </subblock>
        <subblock>
            <label>Dependencies outside the eNodeB</label>
            <comment>What FEATURE NAME do we put here?</comment>
        </subblock>
    </block>
    <block branch="no">
        <label>Impacts</label>
        <comment>REVIEWERS: What FEATURE NUMBER do we put here?</comment>
        <para>
            <comment/>
        </para>
    </block>
</description-topic>

Here is the latest code I am trying to get to work:

from xml.etree import ElementTree as et
tree = et.parse('Atemplate2.xml')
tree.find('description-topic/access-info/index-term-set/index-term/primary/').text = '12345'
tree.write('Atemplate2.xml')

I get the following error:

Traceback (most recent call last):
  File "ajktest18.py", line 15, in <module>
    tree.find('description-topic/access-info/index-term-set/index-term/primary/').text = '12345'
AttributeError: 'NoneType' object has no attribute 'text'

I would prefer to be able to search and modify every occurrence in the entire file, but I can't figure out how to get to even one specific occurrence of the text I am searching for.
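A note on the `None` above: `find()` is evaluated relative to the root element, so a path that begins with the root's own tag (`description-topic`) looks for a child of that name and finds nothing. The minimal sketch below assumes a trimmed-down copy of the XML from this question (the `SAMPLE` string is an illustration, not the real file) and shows a relative path without the trailing `/`, plus `iter()` for reaching every `primary` element.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the file in the question (illustrative only).
SAMPLE = """<description-topic>
  <access-info>
    <index-term-set>
      <index-term><primary>FID FEATURE NUMBER</primary></index-term>
      <index-term><primary>FEATURE NAME</primary></index-term>
    </index-term-set>
  </access-info>
  <title>FEATURE NUMBER - FEATURE NAME</title>
</description-topic>"""

root = ET.fromstring(SAMPLE)

# Path from the question: it starts with the root's own tag, so nothing matches.
missing = root.find('description-topic/access-info/index-term-set/index-term/primary/')

# Path relative to the root, without the trailing slash: first match only.
first = root.find('access-info/index-term-set/index-term/primary')

# iter() walks the whole tree, so it reaches every <primary> at any depth.
all_primary = [p.text for p in root.iter('primary')]

print(missing, first.text, all_primary)
```

`find()` returns only the first match, so replacing every occurrence still needs a loop such as `iter()` or `findall()`.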

Here is the code I tried to use to find the path:

import xml.etree.ElementTree as ET
tree = ET.parse('Atemplate.xml')
root = tree.getroot()

print(root.tag, root.attrib, root.text)

for child in root:
    print(child.tag, child.attrib, child.text)
for label in root.iter('label'):
    print(label.tag, label.attrib, label.text)
for title in root.iter('title'):
    print(title.attrib)

I also tried the following code:

with open('Atemplate2.xml') as f:
    tree = ET.parse(f)
    root = tree.getroot()

for elem in root.getiterator():
    try:
        elem.text = elem.text.replace('FEATURE NAME', 'THIS WORKED')
        elem.text = elem.text.replace('FEATURE NUMBER', '12345')
    except AttributeError:
        pass

tree.write('output.xml')

But it produces the following error:

File "<pyshell#40>", line 2, in <module>
    tree = ET.parse(f)
File "C:\MyPath\Python35-32\lib\xml\etree\ElementTree.py", line 1182, in parse
    tree.parse(source, parser)
File "C:\ MyPath \Python35-32\lib\xml\etree\ElementTree.py", line 594, in parse
    self._root = parser._parse_whole(source)
File "C:\ MyPath \Python35-32\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1119: character maps to <undefined>

Final update - here is the code that ultimately worked for me (thank you, Jarad!):

import lxml.etree as ET
#using lxml instead of xml preserved the comments

#adding the encoding when the file is opened and written is needed to avoid a charmap error
with open('filename.xml', encoding="utf8") as f:
  tree = ET.parse(f)
  root = tree.getroot()


  for elem in root.getiterator():
    try:
      elem.text = elem.text.replace('FEATURE NAME', 'THIS WORKED')
      elem.text = elem.text.replace('FEATURE NUMBER', '123456')
    except AttributeError:
      pass

#tree.write('output.xml', encoding="utf8")
# Adding the xml_declaration and method helped keep the header info at the top of the file.
tree.write('output.xml', xml_declaration=True, method='xml', encoding="utf8")

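Recommended answer

Caveats:

  • I have never used xml.etree.ElementTree
  • I have never used it because I have never found myself manipulating XML
  • I don't know whether this is the "best" way compared with someone who knows the library inside and out
  • The commenters seem to have started judging you rather than helping you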

This is a modification of this excellent answer. The key point is that you need to read the XML file in and parse it before you can change its text.

import xml.etree.ElementTree as ET

with open('xmlfile.xml', encoding='latin-1') as f:
  tree = ET.parse(f)
  root = tree.getroot()

  # iter() replaces getiterator(), which was removed in Python 3.9
  for elem in root.iter():
    try:
      elem.text = elem.text.replace('FEATURE NAME', 'THIS WORKED')
      elem.text = elem.text.replace('FEATURE NUMBER', '123456')
    except AttributeError:
      # elem.text is None for elements that contain no text
      pass

tree.write('output.xml', encoding='latin-1')
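One thing the loop above does not cover: text that follows a child element is stored in that child's `tail`, not in the parent's `text`, so a match there would be skipped. Here is a minimal sketch of a variant that handles both. The helper names `replace_all` and `replace_in_tree` and the one-line sample document are made up for illustration, and the replacement values are the ones stated in the question.

```python
import xml.etree.ElementTree as ET

# Replacement pairs taken from the question.
REPLACEMENTS = {'FEATURE NAME': 'THIS WORKED', 'FEATURE NUMBER': '12345'}


def replace_all(value, replacements):
    """Apply every replacement to value; pass None through unchanged."""
    if value is None:
        return None
    for old, new in replacements.items():
        value = value.replace(old, new)
    return value


def replace_in_tree(root, replacements):
    """Rewrite both .text and .tail of every element under root, in place."""
    for elem in root.iter():
        elem.text = replace_all(elem.text, replacements)
        elem.tail = replace_all(elem.tail, replacements)


# Illustrative document: " and FEATURE NUMBER." is the tail of <b>.
root = ET.fromstring('<para>See FEATURE NAME<b>bold</b> and FEATURE NUMBER.</para>')
replace_in_tree(root, REPLACEMENTS)
result = ET.tostring(root, encoding='unicode')
print(result)
```

Checking for `None` explicitly also removes the need for the `try`/`except AttributeError` block.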

Note that you can change the encoding parameter to something else, such as utf-8, cp1252, or ISO-8859-1. The right value depends on your system and on how the file was saved.
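If you would rather not guess the encoding, one option is to hand the parser bytes instead of decoded text, so that it reads the encoding from the XML declaration itself. The sketch below assumes the file is UTF-8, which would be consistent with the 0x9d byte in the traceback (the last byte of a UTF-8 right curly quote, and undefined in cp1252), though that is only a guess about the real file. The `io.BytesIO` objects are stand-ins for `open('xmlfile.xml', 'rb')` and `open('output.xml', 'wb')`.

```python
import io
import xml.etree.ElementTree as ET

# Stand-in for the file on disk: UTF-8 bytes containing curly quotes.
document = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<para>FEATURE NAME \u201cquoted\u201d</para>'
)
source = io.BytesIO(document.encode('utf-8'))

# Binary input: the parser decodes according to the XML declaration,
# so the platform's default text encoding never gets involved.
tree = ET.parse(source)
root = tree.getroot()
root.text = root.text.replace('FEATURE NAME', 'THIS WORKED')

# Stand-in for the output file; xml_declaration=True keeps the header line.
output = io.BytesIO()
tree.write(output, encoding='utf-8', xml_declaration=True)
result = output.getvalue().decode('utf-8')
print(result)
```

Passing a filename straight to `ET.parse('xmlfile.xml')` has the same effect, because the parser then opens the file in binary mode itself.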
