Search and replace multiple lines in xml/text files using python


Problem description

---Update 3: I have finished the script that writes the required data into the XML files, but the following lines are being dropped from the written file. Why is this, and how can I put them back?

<?xml version="1.0" encoding="utf-8"?><?xml-stylesheet type='text/xsl' href='ANZMeta.xsl'?>

Current working code (except for issue mentioned above).

import os, arcpy, shutil
from xml.etree import ElementTree as et

path = os.getcwd()
arcpy.env.workspace = path

FileList = arcpy.ListFeatureClasses()
FileCount = len(FileList)
zone = "_Zone"

for File in FileList:
    FileDesc_obj = arcpy.Describe(File)
    sr = FileDesc_obj.spatialReference
    FileNm = FileDesc_obj.file
    newMetaFile = FileNm + "_BaseMetadata.xml"

    # Start from the feature class's own metadata if it exists, else from the master template.
    # The raw string keeps the backslashes in the Windows path from being read as escapes.
    check_meta = os.listdir(path)
    if FileNm + '.xml' in check_meta:
        shutil.copy2(FileNm + '.xml', newMetaFile)
    else:
        shutil.copy2(r'L:\Data_Admin\QA\Metadata_python_toolset\Master_Metadata.xml', newMetaFile)
    tree = et.parse(newMetaFile)

    print("Processing: " + str(File))

    for node in tree.findall('.//title'):
        node.text = str(FileNm)
    for node in tree.findall('.//northbc'):
        node.text = str(FileDesc_obj.extent.YMax)
    for node in tree.findall('.//southbc'):
        node.text = str(FileDesc_obj.extent.YMin)
    for node in tree.findall('.//westbc'):
        node.text = str(FileDesc_obj.extent.XMin)
    for node in tree.findall('.//eastbc'):
        node.text = str(FileDesc_obj.extent.XMax)
    for node in tree.findall('.//native/nondig/formname'):
        node.text = os.path.join(path, File)
    for node in tree.findall('.//native/digform/formname'):
        node.text = str(FileDesc_obj.featureType)
    for node in tree.findall('.//avlform/nondig/formname'):
        node.text = str(FileDesc_obj.extension)
    for node in tree.findall('.//avlform/digform/formname'):
        node.text = str(os.path.getsize(File) / 1024.0) + " KB"
    for node in tree.findall('.//theme'):
        node.text = sr.name + " ; EPSG: " + str(sr.factoryCode)
        print(node.text)

    Zone = sr.name
    if "GCS" in Zone:
        projection_info = [sr.GCSName, sr.angularUnitName, sr.datumName, sr.spheroidName]
        print("Geographic Coordinate system")
    else:
        projection_info = [sr.datumName, sr.spheroidName, sr.angularUnitName, Zone[Zone.rfind(zone)-3:]]
        print("Projected Coordinate system")

    # Fill the <keyword> elements under every <spdom> in document order;
    # zip() stops at the shorter sequence instead of raising IndexError.
    keywords = [kw for spdom in tree.findall('.//spdom') for kw in spdom.findall('.//keyword')]
    for node2, value in zip(keywords, projection_info):
        print(node2.text)
        node2.text = str(value)
        print(node2.text)


    tree.write(newMetaFile)
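On the Update 3 problem: ElementTree's default parser discards processing instructions such as <?xml-stylesheet ...?>. Even TreeBuilder(insert_pis=True), available since Python 3.8, only keeps the ones inside the root element. Separately, tree.write() omits the XML declaration for UTF-8 output unless asked, e.g. tree.write(newMetaFile, encoding="utf-8", xml_declaration=True). So both lines vanish on a parse/write round trip.

One workaround is to save the leading <?...?> lines before parsing and write them back in front of the serialized root. The sketch below assumes Python 3 and UTF-8 files. read_prolog and write_with_prolog are hypothetical helper names, and the regex captures only processing instructions, not comments or a DOCTYPE.

```python
import re
import xml.etree.ElementTree as et

# Whitespace and <?...?> processing instructions at the very top of a file:
# the XML declaration plus e.g. <?xml-stylesheet ...?>.
_PROLOG = re.compile(r"\s*(?:<\?.*?\?>\s*)*", re.DOTALL)


def read_prolog(filename):
    """Return the leading processing instructions of an XML file (may be '')."""
    # utf-8-sig drops a byte-order mark if the file has one
    with open(filename, encoding="utf-8-sig") as f:
        return _PROLOG.match(f.read()).group(0)


def write_with_prolog(tree, filename, prolog):
    """Write tree to filename with the saved prolog put back in front of it."""
    body = et.tostring(tree.getroot(), encoding="unicode")  # str, no declaration
    with open(filename, "w", encoding="utf-8") as f:
        f.write(prolog)
        f.write(body)
```

In the loop above, this would mean calling prolog = read_prolog(newMetaFile) right after the shutil.copy2 step. You would then replace tree.write(newMetaFile) with write_with_prolog(tree, newMetaFile, prolog).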

---Update 1 & 2: Thanks to Aleyna, I have the following basic code, which works:

import os, xml, arcpy, shutil
from xml.etree import ElementTree as et 

CodeString=['northbc','southbc', '<nondig><formname>']

nondig='nondigital'
path=os.getcwd()
arcpy.env.workspace = path
xmlfile = path+"\\test.xml"

FileList = arcpy.ListFeatureClasses()
FileCount = len(FileList)

for File in FileList:
    FileDesc_obj = arcpy.Describe(File)
    FileNm=FileDesc_obj.file
    newMetaFile=FileNm+"_Metadata.xml"
    shutil.copy2(r'L:\Data_Admin\QA\Metadata_python_toolset\Master_Metadata.xml', newMetaFile)
    tree=et.parse(newMetaFile)

    for node in tree.findall('.//northbc'):
        node.text = str(FileDesc_obj.extent.YMax)
    for node in tree.findall('.//southbc'):
        node.text = str(FileDesc_obj.extent.YMin)
    for node in tree.findall('.//westbc'):
        node.text = str(FileDesc_obj.extent.XMin)
    for node in tree.findall('.//eastbc'):
        node.text = str(FileDesc_obj.extent.XMax)        
    for node in tree.findall('.//native/nondig/formname'):
        node.text = nondig

    tree.write(newMetaFile)
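Each findall loop in the code above follows the same pattern: find every element on one XPath and set its text. A minimal sketch of collapsing those loops into a single XPath-to-value table follows. apply_updates is a hypothetical helper, the template is a cut-down made-up document, and the numbers are placeholders for what FileDesc_obj.extent would return.

```python
import xml.etree.ElementTree as et


def apply_updates(tree, updates):
    """Set the text of every element matched by each XPath key in `updates`.

    Returns {xpath: number of elements changed}, which makes paths that
    matched nothing easy to spot.
    """
    changed = {}
    for xpath, value in updates.items():
        nodes = tree.findall(xpath)
        for node in nodes:
            node.text = str(value)
        changed[xpath] = len(nodes)
    return changed


# Made-up template and placeholder values; in the real script the bounds would
# come from FileDesc_obj.extent.YMax, .YMin, .XMin and .XMax.
root = et.fromstring(
    "<anzmeta><bounding>"
    "<northbc>0</northbc><southbc>0</southbc>"
    "<westbc>0</westbc><eastbc>0</eastbc>"
    "</bounding><native><nondig><formname/></nondig></native></anzmeta>"
)
tree = et.ElementTree(root)
counts = apply_updates(tree, {
    ".//northbc": -10.5,
    ".//southbc": -11.0,
    ".//westbc": 142.0,
    ".//eastbc": 143.5,
    ".//native/nondig/formname": "nondigital",
})
```

Keeping the mapping in one dict also makes it easy to print counts and catch a template that lacks one of the expected elements.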

The remaining issue is dealing with XML like this:

<spdom>
  <keyword thesaurus="">GDA94</keyword>
  <keyword thesaurus="">GRS80</keyword>
  <keyword thesaurus="">Transverse Mercator</keyword>
  <keyword thesaurus="">Zone 55 (144E - 150E)</keyword>
</spdom>

As the <keyword thesaurus=""> element is not unique within <spdom>, can these elements be updated in order, using the values derived from the following attribute?

FileDesc_obj.spatialReference.name

u'GCS_GDA_1994'
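One way to handle the repeated <keyword> elements is to collect them in document order and pair them with a list of values. The sketch below assumes the values are already in a Python list; in the script they would come from the FileDesc_obj.spatialReference attributes. fill_in_order is a hypothetical name, and the sample values are illustrative only.

```python
import xml.etree.ElementTree as et


def fill_in_order(root, parent_path, child_tag, values):
    """Overwrite the text of each child_tag under parent_path, in document order.

    zip() stops at whichever sequence runs out first, so surplus elements or
    surplus values are left alone instead of raising IndexError.
    Returns the number of elements that were updated.
    """
    children = [child
                for parent in root.findall(parent_path)
                for child in parent.findall(child_tag)]
    updated = 0
    for child, value in zip(children, values):
        child.text = str(value)
        updated += 1
    return updated


root = et.fromstring(
    "<anzmeta><spdom>"
    '<keyword thesaurus="">GDA94</keyword>'
    '<keyword thesaurus="">GRS80</keyword>'
    '<keyword thesaurus="">Transverse Mercator</keyword>'
    '<keyword thesaurus="">Zone 55 (144E - 150E)</keyword>'
    "</spdom></anzmeta>"
)
# Illustrative values, ordered like a geographic (GCS) case:
# GCS name, angular unit, datum, spheroid
values = ["GCS_GDA_1994", "Degree", "D_GDA_1994", "GRS_1980"]
n = fill_in_order(root, ".//spdom", "keyword", values)
```

Because elements are matched purely by position, this relies on the template always listing the keywords in the same order as the values. The thesaurus attributes are left untouched.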

---ORIGINAL POST---

I am building a program to generate XML metadata files from the spatial files in our library. I have already written scripts that extract the required spatial and attribute data from the files and build an index of them as a shapefile and a text file. Now I want to write this information into a base metadata XML file, written to ANZLIC standards, by replacing the values held by its common/static elements.

For example, I want to replace the following XML

<northbc>8097970</northbc>
<southbc>8078568</southbc>

with

<northbc>GeneratedValue_[desc.extent.YMax]</northbc>
<southbc>GeneratedValue_[desc.extent.YMin]</southbc>

The issue is that the number/value between the opening and closing tags will obviously not be the same in every file.

The same applies to XML tags such as <title>, <nondig><formname>, etc. In the latter example both tags must be searched for together, because <formname> appears multiple times (it is not unique).
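With ElementTree, "search both tags together" can be expressed as a path. .//nondig/formname only matches <formname> elements whose parent is <nondig>, and adding the grandparent narrows it further. The small document below is a made-up stand-in for the real template.

```python
import xml.etree.ElementTree as et

doc = et.fromstring(
    "<distinfo>"
    "<native><nondig><formname>a</formname></nondig>"
    "<digform><formname>b</formname></digform></native>"
    "<avlform><nondig><formname>c</formname></nondig></avlform>"
    "</distinfo>"
)

every_formname = doc.findall(".//formname")               # matches a, b, c
nondig_formnames = doc.findall(".//nondig/formname")      # matches a, c
native_nondig = doc.findall(".//native/nondig/formname")  # matches a only

# Only the <formname> under <native><nondig> is changed.
for node in native_nondig:
    node.text = "nondigital"
```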

I am using the Python regular expression manual [here][1].
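If you stay with regular expressions, a non-greedy pattern anchored on the opening and closing tags replaces whatever value sits between them. This is a sketch only, with several limits:

- It assumes each element is written as an open/close pair, not self-closing.
- It assumes the element contains no nested element with the same tag.
- It does not understand comments or CDATA.

replace_element_text is a hypothetical helper name. A real XML parser avoids these limitations.

```python
import re
from xml.sax.saxutils import escape


def replace_element_text(xml_text, tag, new_value):
    """Replace the text of every <tag ...>...</tag> in xml_text with new_value."""
    pattern = re.compile(
        r"(<%s\b[^>]*>).*?(</%s\s*>)" % (re.escape(tag), re.escape(tag)),
        re.DOTALL,  # let the old value span several lines
    )
    # escape() turns &, < and > into entities; using a function as the
    # replacement stops re.sub from interpreting backslashes in the value.
    value = escape(str(new_value))
    return pattern.sub(lambda m: m.group(1) + value + m.group(2), xml_text)


sample = "<northbc>8097970</northbc>\n<southbc>8078568</southbc>"
updated = replace_element_text(sample, "northbc", 8100000)
updated = replace_element_text(updated, "southbc", 8075000)
```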

Solution

Using the given tag(s) above:

import os
import xml
from xml.etree import ElementTree as et 
path = r"/your/path/to/xml.file" 
tree = et.parse(path)
for node in tree.findall('.//northbc'):
    node.text = "New Value"
tree.write(path)

Here, the XPath expression .//northbc returns all the <northbc> elements anywhere in the XML document (ElementTree supports a limited subset of XPath). You can easily tailor the code to your needs.
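ElementTree's XPath subset also supports simple predicates. These help with repeated elements such as the <keyword> entries under <spdom>:

- keyword[3] selects the third <keyword> among its siblings (positions start at 1).
- keyword[last()] selects the final one.
- keyword[@thesaurus] selects only elements that carry that attribute.

A short sketch using the snippet from the question:

```python
import xml.etree.ElementTree as et

spdom = et.fromstring(
    "<spdom>"
    '<keyword thesaurus="">GDA94</keyword>'
    '<keyword thesaurus="">GRS80</keyword>'
    '<keyword thesaurus="">Transverse Mercator</keyword>'
    '<keyword thesaurus="">Zone 55 (144E - 150E)</keyword>'
    "</spdom>"
)

third = spdom.find("keyword[3]")        # 1-based position among <keyword> siblings
last = spdom.find("keyword[last()]")    # the final <keyword>
with_attr = spdom.findall("keyword[@thesaurus]")  # empty attribute values still match

third.text = "Transverse Mercator (updated)"
```

Against the whole metadata file, the equivalent lookup would be tree.find('.//spdom/keyword[3]').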
