Editing a local XML file using Python and regular expressions


Question

I am new to Python and am trying to modify some XML configuration files on my local system.

Input: I have an XML file (say Test.xml) with the following content.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <JavaHost xmlns="SomeInfo/v1.1">
        <Domain>
           <MessageProcessor>
              <!-- This comment should not be removed and all formating should be untouched -->
              <SocketTimeout>500</SocketTimeout>
           </MessageProcessor>
            <!-- This comment should not be removed and all formating should be untouched -->
           <Composer>
                <SocketTimeout>5000</SocketTimeout>
                <Enabled>true</Enabled>
           </Composer> 
       </Domain>
    </JavaHost>

What I want to achieve: I want to achieve the following two things:

Part 1: I want to change the value of the SocketTimeout tag (only the one under the Composer tag) to 60, and I also want to add a comment (e.g. "Changed this value to reduce SocketTimeout"). The file Test.xml should then look like this:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <JavaHost xmlns="SomeInfo/v1.1">
    <Domain>
       <MessageProcessor>
          <!-- This comment should not be removed and all formating should be untouched -->
          <SocketTimeout>500</SocketTimeout>
       </MessageProcessor>
        <!-- This comment should not be removed and all formating should be untouched -->
       <Composer>
       <!-- Changed this value to reduce SocketTimeout -->
            <SocketTimeout>60</SocketTimeout>
            <Enabled>true</Enabled>
       </Composer>
   </Domain>
</JavaHost>

Part 2: In the file Test.xml, I want to add a new tag under the Domain tag, as shown below:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <JavaHost xmlns="SomeInfo/v1.1">
    <Domain>
       <MessageProcessor>
          <!-- This comment should not be removed and all formating should be untouched -->
          <SocketTimeout>500</SocketTimeout>
       </MessageProcessor>
       <!-- comment should not be removed and all formatting should be untouched -->
       <Composer>
       <!-- Changed this value to reduce SocketTimeout -->
            <SocketTimeout>60</SocketTimeout>
            <Enabled>true</Enabled>
       </Composer>
       <New_tag>
       <!-- New Tag -->
            <Enabled>true</Enabled>
       </New_tag>
   </Domain>
</JavaHost>

That is all I want :)

What I have tried:

To achieve this task I considered the options below:

minidom/ElementTree/lxml: these remove the comments in the file and also change its formatting.

Regex: does not remove comments and does not disturb the formatting. Hence I opted for regex. Below is what I started with, but it is not working :(

import os, re
# set the working directory 
os.chdir('C:\\Users\\Dell\\Desktop\\')

# open the source file and read it
fh = open('C:\\Users\\Dell\\Desktop\\Test.xml', 'r')
subject = fh.read()
fh.close()

pattern = re.compile(r"\[<Composer>\].*?\[/<Composer>\]")
#Replace
result = pattern.sub(lambda match: match.group(0).replace('<SocketTimeout>500</SocketTimeout>','<SocketTimeout>60</SocketTimeout>') ,subject)

# write the file
f_out = open('C:\\Users\\Dell\\Desktop\\Test.xml', 'w')
f_out.write(result)
f_out.close()

Any ideas on how to implement this, or corrections to my mistakes, would be highly appreciated. Although I am new to Python, I will try my best to work through the suggestions.
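
For reference, three details keep the attempt above from changing anything. The pattern `\[<Composer>\]` matches literal square brackets, and the closing tag is written as `/<Composer>` instead of `</Composer>`. Without `re.DOTALL`, `.` does not cross line breaks. Finally, the replacement looks for the value 500, while the value under Composer is 5000. Here is a minimal sketch with those points addressed. It works on an inline sample rather than the real file, the helper name `patch_composer` is made up for illustration, and it assumes the Composer block is neither nested nor commented out, which a regex cannot check.

```python
import re

# Inline sample mirroring the relevant part of Test.xml from the question.
subject = """<Domain>
   <MessageProcessor>
      <SocketTimeout>500</SocketTimeout>
   </MessageProcessor>
   <Composer>
        <SocketTimeout>5000</SocketTimeout>
        <Enabled>true</Enabled>
   </Composer>
</Domain>
"""

# DOTALL lets "." cross line breaks; the lazy ".*?" stops at the first </Composer>.
composer_block = re.compile(r"<Composer>.*?</Composer>", re.DOTALL)

# Groups: (indentation)(opening tag) digits (closing tag)
timeout = re.compile(r"([ \t]*)(<SocketTimeout>)\d+(</SocketTimeout>)")


def patch_composer(match):
    """Set SocketTimeout to 60 inside one Composer block and put a comment above it."""
    return timeout.sub(
        # \g<2> is used because "\260" would be read as a different escape.
        r"\1<!-- Changed this value to reduce SocketTimeout -->\n\1\g<2>60\3",
        match.group(0),
        count=1,
    )


result = composer_block.sub(patch_composer, subject)
print(result)
```

To apply this to Test.xml, read the file into `subject` and write `result` back, as the original script already does. Everything outside the Composer block, including comments and whitespace, passes through unchanged because only the matched text is rewritten.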

Recommended answer

This is not exactly what you wanted, but it's close. For one thing, avoid regex for XML, HTML and similar processing like the plague. At the same time, don't be surprised if you run into occasional 'challenges' when using libraries like lxml.

I think that, this time, I found a bug.

from lxml import etree
tree = etree.parse('shivam.xml')
element_to_change = tree.xpath('.//Composer/SocketTimeout')[0]
print(element_to_change)
element_to_change.text='60'
comment_will_follow_this = tree.xpath('.//Composer')[0]
print(comment_will_follow_this)
comment = etree.Comment('This did not work')
comment_will_follow_this.append(comment)

comment = etree.Comment('Changed this value to reduce SocketTimeout')
element_to_change.addprevious(comment)

tree.write('see_it.xml', pretty_print=True)

  • I used xpath to find the element to change and the places in the file that should receive the comments.
  • The append method is supposed to add a comment or other element to a given element as a child. In this case, however, the 'This did not work' comment ended up at the very end of Composer, immediately before the closing tag, rather than where I wanted it.
  • addprevious, on the other hand, did add the comment in the desired location. The fly in the ointment is that it does not put a line break between the comment and the next XML element.

Here's the resulting file. Incidentally, you will note that the original comments are intact.

      <JavaHost>
          <Domain>
             <MessageProcessor>
                <!-- This comment should not be removed and all formating should be untouched -->
                <SocketTimeout>500</SocketTimeout>
             </MessageProcessor>
              <!-- This comment should not be removed and all formating should be untouched -->
             <Composer>
                  <!--Changed this value to reduce SocketTimeout--><SocketTimeout>60</SocketTimeout>
                  <Enabled>true</Enabled>
             <!--This did not work--></Composer> 
         </Domain>
      </JavaHost>
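
Two loose ends from the output above can probably be tidied up. `append` adds a node as the last child of its parent, which matches where the 'This did not work' comment landed. The missing line break after `addprevious` can be supplied by setting the comment's `tail`. The sketch below does that and also covers Part 2 of the question. It assumes the file still carries the `xmlns="SomeInfo/v1.1"` declaration shown in the question (the output above has none), so the xpath needs a namespace map. The prefix `j` and the inline sample are choices made here for illustration, and the whitespace copying assumes Composer is the last child of Domain.

```python
from lxml import etree

# Inline sample based on Test.xml from the question (bytes, because of the XML declaration).
XML = b"""<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<JavaHost xmlns="SomeInfo/v1.1">
    <Domain>
       <MessageProcessor>
          <!-- This comment should not be removed and all formating should be untouched -->
          <SocketTimeout>500</SocketTimeout>
       </MessageProcessor>
       <Composer>
            <SocketTimeout>5000</SocketTimeout>
            <Enabled>true</Enabled>
       </Composer>
    </Domain>
</JavaHost>
"""

root = etree.fromstring(XML)
ns_uri = root.nsmap[None]   # the default namespace declared on JavaHost
ns = {"j": ns_uri}          # "j" is an arbitrary prefix chosen for this sketch

# Part 1: change the value and add a comment on its own line.
timeout = root.xpath(".//j:Composer/j:SocketTimeout", namespaces=ns)[0]
timeout.text = "60"
composer = timeout.getparent()
comment = etree.Comment(" Changed this value to reduce SocketTimeout ")
timeout.addprevious(comment)
# The line break and indentation that precede Composer's children live in
# composer.text; reusing them as the comment's tail puts SocketTimeout on a new line.
comment.tail = composer.text

# Part 2: add New_tag as the last child of Domain, copying existing whitespace.
domain = composer.getparent()
child_indent = domain.text       # whitespace before Domain's children
closing_indent = composer.tail   # whitespace before </Domain>
composer.tail = child_indent
new_tag = etree.SubElement(domain, f"{{{ns_uri}}}New_tag")
new_tag.text = composer.text
new_tag.tail = closing_indent
note = etree.Comment(" New Tag ")
new_tag.append(note)
note.tail = composer.text
enabled = etree.SubElement(new_tag, f"{{{ns_uri}}}Enabled")
enabled.text = "true"
enabled.tail = child_indent

output = etree.tostring(root, encoding="unicode")
print(output)
```

For a real file, `etree.parse(...)` followed by `tree.write(...)` would take the place of `fromstring` and `tostring`, as in the answer's snippet. `pretty_print` is left out here so that the existing whitespace is written back as it was read.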
      
