How to find and edit tags in XML files with namespaces using ElementTree

Problem description

I would like to find specific tags in my XML document and edit their text or attributes. My XML file contains namespaces (and, if I understand correctly, nested namespaces). The tool I'd like to use for this is ElementTree. I managed to read the XML file with iterparse, but I don't know how to save the edited XML, because the object returned by iterparse has no write method. I need either a way to read the XML file with parse and strip its namespaces (including the nested ones), or a way to save the file I read with iterparse.

For this case, let's edit the "Rating" tag text.

import xml.etree.ElementTree as ET

it = ET.iterparse(adiPath)
for _, el in it:
    if '}' in el.tag:
        el.tag = el.tag.split('}', 1)[1]  # strip all namespaces
    for at in list(el.attrib):  # strip namespaces of attributes too
        if '}' in at:
            newat = at.split('}', 1)[1]
            el.attrib[newat] = el.attrib[at]
            del el.attrib[at]
root = it.root

# Search Rating tag and edit its value
for rating in root.iter('Rating'):
    print(rating.text)  # Prints 18
    rating.text = "999"
    print(rating.text)  # Prints 999

However, in this case the XML file remains unchanged.
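On the second option (saving a file read with iterparse): the code above never writes anything back, which is why the file on disk is unchanged. The iterator returned by iterparse has no write() method, but once the loop has finished, it.root is the root Element of an ordinary in-memory tree, and wrapping it in ET.ElementTree gives an object that does have write(). The sketch below is illustrative only: SAMPLE is a trimmed-down stand-in for the ADI3 file, strip_and_edit is a hypothetical helper, and output goes to an in-memory buffer instead of a real path such as adiPath.

```python
import io
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the ADI3 file (illustrative only)
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<ADI3 xmlns="urn:cablelabs:md:xsd:core:3.0"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:title="urn:cablelabs:md:xsd:title:3.0">
  <Asset xsi:type="title:TitleType">
    <title:Rating ratingSystem="PL">18</title:Rating>
  </Asset>
</ADI3>"""


def strip_and_edit(source, new_rating):
    """Hypothetical helper: iterparse, strip namespaces, edit Rating, return a tree."""
    it = ET.iterparse(source)
    for _, el in it:
        if "}" in el.tag:
            el.tag = el.tag.split("}", 1)[1]  # drop the {uri} part of the tag
        for at in list(el.attrib):
            if "}" in at:  # same for namespaced attributes such as xsi:type
                el.attrib[at.split("}", 1)[1]] = el.attrib.pop(at)
    # it.root is only set once the iterator has been exhausted
    root = it.root
    for rating in root.iter("Rating"):
        rating.text = new_rating
    # Wrapping the root in an ElementTree provides write()
    return ET.ElementTree(root)


tree = strip_and_edit(io.BytesIO(SAMPLE), "999")
# With a real file: tree.write(adiPath, encoding="utf-8", xml_declaration=True)
out = io.BytesIO()
tree.write(out, encoding="utf-8", xml_declaration=True)
result = out.getvalue().decode("utf-8")
print(result)
```

Because the namespaces were stripped before saving, the written file no longer declares them (xsi:type, for example, becomes a plain type attribute), so it may no longer validate against the original schemas.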

Here is the XML file:

<?xml version="1.0" encoding="utf-8"?>
<ADI3 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:content="urn:cablelabs:md:xsd:content:3.0" xmlns:core="urn:cablelabs:md:xsd:core:3.0" xmlns:offer="urn:cablelabs:md:xsd:offer:3.0" xmlns:terms="urn:cablelabs:md:xsd:terms:3.0" xmlns:title="urn:cablelabs:md:xsd:title:3.0" xmlns:adb="urn:adb:md:xsd:adb:01" xmlns:schemaLocation="urn:adb:md:xsd:adb:01 ADB-EXT-C01.xsd urn:cablelabs:md:xsd:core:3.0 MD-SP-CORE-C01.xsd urn:cablelabs:md:xsd:content:3.0 MD-SP-CONTENT-C01.xsd urn:cablelabs:md:xsd:offer:3.0 MD-SP-OFFER-C01.xsd urn:cablelabs:md:xsd:terms:3.0 MD-SP-TERMS-C01.xsd urn:cablelabs:md:xsd:title:3.0 MD-SP-TITLE-C01.xsd" xmlns:xml="http://www.w3.org/XML/1998/namespace" xmlns="urn:cablelabs:md:xsd:core:3.0">
  <Asset xsi:type="title:TitleType" uriId="ab://cc.com" providerVersionNum="1" internalVersionNum="0" creationDateTime="2020-01-28T08:55:19Z" startDateTime="2019-05-20T00:00:00Z" endDateTime="2028-08-20T23:59:00Z">
    <AlternateId identifierSystem="VOD1.1">ab://cc.com</AlternateId>
    <Ext>
        <adb:ExtensionType>
            <adb:TitleExt>
                <adb:SeriesInfo episodeNumber="6">
                    <adb:series seriesId="GOT" seasonCount="8"></adb:series>
                    <adb:season seasonId="GOTS08" number="8" episodeCount="6"></adb:season>
                </adb:SeriesInfo>
            </adb:TitleExt>
        </adb:ExtensionType>
    </Ext>
    <title:LocalizableTitle xml:lang="pol">
      <title:TitleLong>Game of Thrones VIII</title:TitleLong>
      <title:SummaryLong>Long summary, long summary, long summary...</title:SummaryLong>
      <title:Actor fullName="Peter Dinklage" firstName="Peter" lastName="Dinklage" />
      <title:Actor fullName="Nikolaj Coster-Waldau" firstName="Nikolaj" lastName="Coster-Waldau" />
      <title:Actor fullName="Emilia Clarke" firstName="Emilia" lastName="Clarke" />
      <title:Actor fullName="Lena Headey" firstName="Lena" lastName="Headey" />
      <title:Director fullName="David Nutter" firstName="David" lastname="Nutter" />
    </title:LocalizableTitle>
    <title:Rating ratingSystem="PL">18</title:Rating>
    <title:Audience>General</title:Audience>
    <title:DisplayRunTime>01:15</title:DisplayRunTime>
    <title:Year>2019</title:Year>
    <title:CountryOfOrigin>US</title:CountryOfOrigin>
    <title:Genre>Film fantasy</title:Genre>
    <title:ShowType>Movie</title:ShowType>
  </Asset>
  <Asset xsi:type="offer:CategoryType" uriId="cc.com/XX">
    <AlternateId identifierSystem="VOD1.1">cc.com/XX</AlternateId>
    <offer:CategoryPath>VOD/GOT/Season 8</offer:CategoryPath>
  </Asset>
  <Asset xsi:type="content:MovieType" uriId="GraoTronVIII_0_1080mp4">
    <AlternateId identifierSystem="VOD1.1">GraoTronVIII_0_1080mp4</AlternateId>
    <content:SourceUrl>GOTS08E06.mp4</content:SourceUrl>
    <content:Resolution>1080p</content:Resolution>
    <content:Duration>PT1H15M20S</content:Duration>
    <content:Language>pol</content:Language>
    <content:Language>eng</content:Language>
  </Asset>
  <Asset xsi:type="content:PreviewType" uriId="GraoTronVIII_1_1080mp4">
    <AlternateId identifierSystem="VOD1.1">GraoTronVIII_1_1080mp4</AlternateId>
    <content:SourceUrl>GOTS08E06_trailer.mp4</content:SourceUrl>
    <content:Resolution>1080p</content:Resolution>
    <content:Duration>PT0H01M48S</content:Duration>
    <content:Language>pol</content:Language>
    <content:Language>eng</content:Language>
  </Asset>
  <Asset xsi:type="content:PosterType" uriId="GraoTronVIIIPoster">
    <AlternateId identifierSystem="VOD1.1">GraoTronVIIIPoster</AlternateId>
    <content:SourceUrl>GOTS08E06.jpg</content:SourceUrl>
    <content:X_Resolution>600</content:X_Resolution>
    <content:Y_Resolution>900</content:Y_Resolution>
    <content:Language>pol</content:Language>
  </Asset>
  <Asset xsi:type="offer:ContentGroupType" uriId="abc">
    <AlternateId identifierSystem="VOD1.1">abc</AlternateId>
    <offer:TitleRef uriId="abc" />
    <offer:MovieRef uriId="GraoTronVIII_0_1080mp4" />
  </Asset>
  <Asset xsi:type="offer:ContentGroupType" uriId="abc">
    <AlternateId identifierSystem="VOD1.1">abc</AlternateId>
    <offer:TitleRef uriId="abc" />
    <offer:MovieRef uriId="GraoTronVIII_1_1080mp4" />
  </Asset>
  <Asset xsi:type="offer:ContentGroupType" uriId="abc">
    <AlternateId identifierSystem="VOD1.1">abc</AlternateId>
    <offer:TitleRef uriId="abc" />
    <offer:MovieRef uriId="GraoTronVIIIPoster" />
  </Asset>
</ADI3>

Recommended answer

Instead of stripping out the namespaces, I suggest using namespace wildcards ({*}). Support for them was added in Python 3.8.

from xml.etree import ElementTree as ET

tree = ET.parse(adiPath)

rating = tree.find(".//{*}Rating")  # Find the Rating element in any namespace
rating.text = "999"
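The snippet above only changes the tree in memory; to persist the edit, call write() on the same ElementTree. Here is a self-contained sketch of that, which also uses findall() with the same wildcard. SAMPLE is an assumed, trimmed-down stand-in for the ADI3 file, and the output goes to an in-memory buffer rather than to adiPath.

```python
import io
import xml.etree.ElementTree as ET  # namespace wildcards need Python 3.8+

# Trimmed-down stand-in for the ADI3 file (illustrative only)
SAMPLE = b"""<ADI3 xmlns="urn:cablelabs:md:xsd:core:3.0"
      xmlns:title="urn:cablelabs:md:xsd:title:3.0"
      xmlns:content="urn:cablelabs:md:xsd:content:3.0">
  <Asset><title:Rating ratingSystem="PL">18</title:Rating></Asset>
  <Asset>
    <content:Language>pol</content:Language>
    <content:Language>eng</content:Language>
  </Asset>
</ADI3>"""

tree = ET.parse(io.BytesIO(SAMPLE))

# find() returns the first matching element, or None
rating = tree.find(".//{*}Rating")
if rating is None:
    raise ValueError("no Rating element found")
rating.text = "999"

# findall() returns every match, in document order
languages = [el.text for el in tree.findall(".//{*}Language")]

# Persist the edit; with a real file this would be tree.write(adiPath, ...)
out = io.BytesIO()
tree.write(out, encoding="utf-8", xml_declaration=True)
result = out.getvalue().decode("utf-8")
print(result)  # prefixes come out as ns0:, ns1:, ... rather than title:, content:
```

Without further steps, ElementTree serializes the namespaces under generated prefixes such as ns0 and ns1; the namespace URIs are kept, so the document is still equivalent, but the original prefixes are lost.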

Note that wildcards only work with find(), findall() and the other ElementPath-based methods (findtext(), iterfind()). They do not work with iter().
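A quick way to see the difference: iter() compares tag names literally, so "{*}Rating" matches nothing, whereas iterfind() goes through ElementPath and expands the wildcard. iter() still works if you give it the fully qualified {uri}localname tag. The sample document and the TITLE_NS constant below are illustrative assumptions.

```python
import io
import xml.etree.ElementTree as ET

SAMPLE = b"""<ADI3 xmlns="urn:cablelabs:md:xsd:core:3.0"
      xmlns:title="urn:cablelabs:md:xsd:title:3.0">
  <Asset><title:Rating ratingSystem="PL">18</title:Rating></Asset>
</ADI3>"""

root = ET.parse(io.BytesIO(SAMPLE)).getroot()

# iter() does a plain string comparison, so the wildcard matches nothing
via_iter = list(root.iter("{*}Rating"))

# iterfind() uses ElementPath, which understands {*}
via_path = list(root.iterfind(".//{*}Rating"))

# iter() works with the fully qualified tag: "{urn:...:title:3.0}Rating"
TITLE_NS = "urn:cablelabs:md:xsd:title:3.0"  # assumed to be known in advance
via_qualified = list(root.iter(f"{{{TITLE_NS}}}Rating"))

print(len(via_iter), len(via_path), len(via_qualified))
```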

The following workaround can be used to preserve the original namespace prefixes when serializing the XML document; without it, ElementTree writes generated prefixes such as ns0 and ns1 (see also https://stackoverflow.com/a/42372404/407651 and https://stackoverflow.com/a/54491129/407651).

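# Collect the (prefix, uri) pair of every namespace declaration in the file
namespaces = dict(node for _, node in ET.iterparse("test1.xml", events=["start-ns"]))
# Register each pair so that write() reuses the original prefixes instead of ns0, ns1, ...
for prefix, uri in namespaces.items():
    ET.register_namespace(prefix, uri)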
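Combining the wildcard lookup with the prefix-preserving workaround, a full round trip (register the original prefixes, parse, find, edit, write) could look roughly like the sketch below. The stand-in document, the hypothetical edit_rating helper and the in-memory output buffer are assumptions; for the real file you would read from and write to adiPath. Since register_namespace() changes module-level state used during serialization, it has to run before write().

```python
import io
import xml.etree.ElementTree as ET  # namespace wildcards need Python 3.8+

# Trimmed-down stand-in for the ADI3 file (illustrative only)
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<ADI3 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:title="urn:cablelabs:md:xsd:title:3.0"
      xmlns="urn:cablelabs:md:xsd:core:3.0">
  <Asset xsi:type="title:TitleType">
    <title:Rating ratingSystem="PL">18</title:Rating>
  </Asset>
</ADI3>"""


def edit_rating(data: bytes, new_value: str) -> str:
    """Hypothetical helper: edit the first Rating element, keep original prefixes."""
    # 1. Register every prefix declared in the document ("" is the default namespace)
    for _, (prefix, uri) in ET.iterparse(io.BytesIO(data), events=["start-ns"]):
        ET.register_namespace(prefix, uri)
    # 2. Parse, locate with a namespace wildcard, edit
    tree = ET.parse(io.BytesIO(data))
    rating = tree.find(".//{*}Rating")
    if rating is None:
        raise ValueError("no Rating element found")
    rating.text = new_value
    # 3. Serialize (after registration); a real script would pass adiPath here
    out = io.BytesIO()
    tree.write(out, encoding="utf-8", xml_declaration=True)
    return out.getvalue().decode("utf-8")


result = edit_rating(SAMPLE, "999")
print(result)
```

Note that ElementTree only re-declares namespaces that are actually used by an element or attribute name, so unused declarations from the original root element (such as xmlns:offer in a document without offer elements) are dropped on output.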
