Merge two XML files by matching elements by attribute value


Problem description

I have two XML files that I'm trying to merge. I looked at previous questions, but I couldn't solve my problem from reading them. What I think makes my situation different is that I have to find elements by attribute value and then merge them into the other file.

I have two files. One is an English translation catalog and the other is a Japanese translation catalog. Please see below.

In the XML below there are three elements whose children I want to merge - MessageCatalogueEntry, MessageCatalogueFormEntry, and MessageCatalogueFormItemEntry. I have hundreds of files and each file has thousands of lines. There may be more element types than the three I just listed, but I know for sure that all of them have a "key" attribute.

My plan:

  • Iterate through File 1 and create a list of all the values of the "key" attribute.
    • In this example, the list would be key_values = [321, 260, 320]
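The first step of the plan can be sketched with the standard-library ElementTree module. This is only an illustration: the inline `FILE_1` string is a trimmed-down stand-in for File 1 (most attributes omitted), and the code uses Python 3 syntax. Note that attribute values come back as strings, so the list holds `'321'` rather than `321`.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for File 1 (hypothetical; most attributes omitted).
FILE_1 = """\
<PackageEntry>
  <MessageCatalogue lastKey="362">
    <MessageCatalogueEntry key="321"/>
    <MessageCatalogueFormEntry key="260"/>
    <MessageCatalogueFormItemEntry key="320"/>
  </MessageCatalogue>
</PackageEntry>
"""

root = ET.fromstring(FILE_1)

# "MessageCatalogue/*[@key]" selects every direct child of a MessageCatalogue
# element that carries a "key" attribute, whatever its tag name is.
entries = root.findall("MessageCatalogue/*[@key]")

# Attribute values are always strings in ElementTree.
key_values = [entry.get("key") for entry in entries]
print(key_values)
```

Because the path matches on the attribute rather than on the tag name, element types beyond the three listed would be picked up as well, as long as they sit directly under MessageCatalogue and have a "key" attribute.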

      File 1:

      <?xml version="1.0" encoding="utf-8"?>
      <!DOCTYPE MessageCatalogue []>
      <PackageEntry>
          <MessageCatalogue designNotes="Undefined" isPrivate="false" lastKey="362" name="AddKMRichSearchEngineAdmin_AutoTranslationCatalogue" nested="false" version="3.12.0">
            <MessageCatalogueEntry key="321">
              <MessageCatalogueEntry_loc locale="" message="active"/>
            </MessageCatalogueEntry>
            <MessageCatalogueFormEntry key="260">
              <MessageCatalogueFormEntry_loc locale="" shortTitle="Configuration" title="Spider Configuration"/>
            </MessageCatalogueFormEntry>
            <MessageCatalogueFormItemEntry key="320">
              <MessageCatalogueFormItemEntry_loc hintText="" label="Manage Recognised Phrases" locale="" mnemonic="" scriptText=""/>
            </MessageCatalogueFormItemEntry>
          </MessageCatalogue>
        </PackageEntry>
      

      File 2:

      <?xml version="1.0" encoding="utf-8"?>
      <!DOCTYPE MessageCatalogue[]>
      <PackageEntry>
        <MessageCatalogue designNotes="Undefined" isPrivate="false" lastKey="362" name="" nested="false" version="3.12.0">
          <MessageCatalogueEntry key="321">
            <MessageCatalogueEntry_loc locale="ja" message="アクティブ" />
          </MessageCatalogueEntry>
          <MessageCatalogueFormEntry key="260">
            <MessageCatalogueFormEntry_loc locale="ja" shortTitle="設定" title="Spider Configuration/スパイダー設定" />
          </MessageCatalogueFormEntry>
          <MessageCatalogueFormItemEntry key="320">
            <MessageCatalogueFormItemEntry_loc hintText="" label="認識されたフレーズを管理" locale="ja" mnemonic="" scriptText="" />
          </MessageCatalogueFormItemEntry>
        </MessageCatalogue>
      </PackageEntry>
      

      Output:

      <?xml version="1.0" encoding="utf-8"?>
      <!DOCTYPE MessageCatalogue []>
      <PackageEntry>
          <MessageCatalogue designNotes="Undefined" isPrivate="false" lastKey="362" name="AddKMRichSearchEngineAdmin_AutoTranslationCatalogue" nested="false" version="3.12.0">
            <MessageCatalogueEntry key="321">
              <MessageCatalogueEntry_loc locale="" message="active"/>
              <MessageCatalogueEntry_loc locale="ja" message="アクティブ" />
            </MessageCatalogueEntry>
            <MessageCatalogueFormEntry key="260">
              <MessageCatalogueFormEntry_loc locale="" shortTitle="Configuration" title="Spider Configuration"/>
              <MessageCatalogueFormEntry_loc locale="ja" shortTitle="設定" title="Spider Configuration/スパイダー設定" />
            </MessageCatalogueFormEntry>
            <MessageCatalogueFormItemEntry key="320">
              <MessageCatalogueFormItemEntry_loc hintText="" label="Manage Recognised Phrases" locale="" mnemonic="" scriptText=""/>
              <MessageCatalogueFormItemEntry_loc hintText="" label="認識されたフレーズを管理" locale="ja" mnemonic="" scriptText="" />
            </MessageCatalogueFormItemEntry>
          </MessageCatalogue>
        </PackageEntry>
      

      I'm having trouble even grabbing elements, never mind grabbing them by key value. For example, I've been playing with the elementtree library and I wrote this code hoping to get just the MessageCatalogueEntry elements, but I'm only getting their children:

      from xml.etree import ElementTree as et
      
      tree_japanese = et.parse('C:\\blah\\blah\\blah\\AddKMRichSearchEngineAdmin_AutoTranslationCatalogue_JA.xml')
      root_japanese = tree_japanese.getroot()
      MC_japanese =  root_japanese.findall("MessageCatalogue")
      
      for x in MC_japanese:
          messageCatalogueEntry = x.findall("MessageCatalogueEntry")
          for m in messageCatalogueEntry:
              print et.tostring(m[0], encoding='utf8')
      
      tree_english = et.parse('C:\\blah\\blah\\blah\\AddKMRichSearchEngineAdmin\\AddKMRichSearchEngineAdmin_AutoTranslationCatalogue.xml')
      root_english = tree_english.getroot()
      MC_english =  root_english.findall("MessageCatalogue")
      
      for x in MC_english:
          messageCatalogueEntry = x.findall("MessageCatalogueEntry")
          for m in messageCatalogueEntry:
              print et.tostring(m[0], encoding='utf8')
      

      Any help would be appreciated. I've been at this for a few work days now and I'm no closer to finishing than I was when I first started!

      Recommended answer

      Actually, you are getting the MessageCatalogueEntry elements. The problem is in the print statement. An element acts like a list of its children, so m[0] is the first child of the MessageCatalogueEntry. In

      messageCatalogueEntry = x.findall("MessageCatalogueEntry")
      for m in messageCatalogueEntry:
          print et.tostring(m[0], encoding='utf8')
      

      change the print to print et.tostring(m, encoding='utf8') to see the right element.
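The list-like behaviour described above can be checked with a small self-contained example. It is written for Python 3 (the snippets in this thread use the Python 2 print statement), and the inline element is a made-up stand-in for one catalogue entry.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for a single catalogue entry with one child.
entry = ET.fromstring(
    '<MessageCatalogueEntry key="321">'
    '<MessageCatalogueEntry_loc locale="" message="active"/>'
    '</MessageCatalogueEntry>'
)

# Indexing an element returns one of its children, not the element itself.
print(entry.tag)     # the entry
print(entry[0].tag)  # its first child
print(len(entry))    # number of direct children

# Serialising the element itself includes the start tag and all children.
# encoding="unicode" makes tostring() return str instead of bytes.
serialized = ET.tostring(entry, encoding="unicode")
print(serialized)
```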

      I personally prefer lxml to elementtree. Assuming you want to associate entries by the 'key' attribute, you could use xpath to build an index of one document keyed by that attribute, and then pull the matching entries' children into the other document.

      import lxml.etree
      
      tree_english = lxml.etree.parse('english.xml')
      tree_japanese = lxml.etree.parse('japanese.xml')
      
      # index the japanese catalog
      j_index = {}
      for catalog in tree_japanese.xpath('MessageCatalogue/*[@key]'):
          j_index[catalog.get('key')] = catalog
      
      # find catalog entries in english and merge the japanese
      for catalog in tree_english.xpath('MessageCatalogue/*[@key]'):
          j_catalog = j_index.get(catalog.get('key'))
          if j_catalog is not None:
              print 'found match'
              for child in j_catalog:
                  print 'add one'
                  catalog.append(child)
      
      print lxml.etree.tostring(tree_english, pretty_print=True, encoding='utf8')
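For readers who would rather stay with the standard library, the same index-then-merge idea can be sketched with ElementTree in Python 3. This is a sketch under stated assumptions, not a drop-in replacement: `merge_by_key` is a hypothetical helper name, the two inline documents are shortened stand-ins for the English and Japanese files, and it assumes "key" values are unique within a file. It copies the Japanese children rather than moving them, so the source tree is left unchanged.

```python
import copy
import xml.etree.ElementTree as ET

# Shortened stand-ins for the two catalogues (hypothetical sample data).
ENGLISH = """\
<PackageEntry>
  <MessageCatalogue lastKey="362">
    <MessageCatalogueEntry key="321">
      <MessageCatalogueEntry_loc locale="" message="active"/>
    </MessageCatalogueEntry>
    <MessageCatalogueFormEntry key="260">
      <MessageCatalogueFormEntry_loc locale="" shortTitle="Configuration"/>
    </MessageCatalogueFormEntry>
  </MessageCatalogue>
</PackageEntry>
"""

JAPANESE = """\
<PackageEntry>
  <MessageCatalogue lastKey="362">
    <MessageCatalogueEntry key="321">
      <MessageCatalogueEntry_loc locale="ja" message="アクティブ"/>
    </MessageCatalogueEntry>
    <MessageCatalogueFormEntry key="260">
      <MessageCatalogueFormEntry_loc locale="ja" shortTitle="設定"/>
    </MessageCatalogueFormEntry>
  </MessageCatalogue>
</PackageEntry>
"""

ENTRY_PATH = "MessageCatalogue/*[@key]"


def merge_by_key(target_root, source_root):
    """Append copies of each source entry's children to the target entry
    that has the same "key". Returns the number of entries matched."""
    # Index the source entries by key (assumes keys are unique per file).
    index = {e.get("key"): e for e in source_root.findall(ENTRY_PATH)}
    matched = 0
    for entry in target_root.findall(ENTRY_PATH):
        source_entry = index.get(entry.get("key"))
        if source_entry is None:
            continue  # no counterpart in the source file; leave as-is
        for child in source_entry:
            entry.append(copy.deepcopy(child))
        matched += 1
    return matched


english_root = ET.fromstring(ENGLISH)
japanese_root = ET.fromstring(JAPANESE)
matched = merge_by_key(english_root, japanese_root)
print(matched, "entries merged")
print(ET.tostring(english_root, encoding="unicode"))
```

For real files, `ET.parse(path).getroot()` would replace the `ET.fromstring` calls, and the helper could be called once per English/Japanese file pair.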
      
