Python get XML siblings into dictionary


Question

I have an XML file that looks like this:

<root>
    <G>
        <G1>1</G1>
        <G2>some text</G2>
        <G3>some text</G3>
        <GP>
            <GP1>1</GP1>
            <GP2>a</GP2>
            <GP3>a</GP3>
        </GP>
        <GP>
            <GP1>2</GP1>
            <GP2>b</GP2>
            <GP3>b</GP3>
        </GP>
        <GP>
            <GP1>3</GP1>
            <GP2>c</GP2>
            <GP3>c</GP3>
        </GP>
    </G>
    <G>
        <G1>2</G1>
        <G2>some text</G2>
        <G3>some text</G3>
        <GP>
            <GP1>1</GP1>
            <GP2>aa</GP2>
            <GP3>aa</GP3>
        </GP>
        <GP>
            <GP1>2</GP1>
            <GP2>bb</GP2>
            <GP3>bb</GP3>
        </GP>
        <GP>
            <GP1>3</GP1>
            <GP2>cc</GP2>
            <GP3>cc</GP3>
        </GP>
    </G>
    <G>
        <G1>3</G1>
        <G2>some text</G2>
        <G3>some text</G3>
        <GP>
            <GP1>1</GP1>
            <GP2>aaa</GP2>
            <GP3>aaa</GP3>
        </GP>
        <GP>
            <GP1>2</GP1>
            <GP2>bbb</GP2>
            <GP3>bbb</GP3>
        </GP>
        <GP>
            <GP1>3</GP1>
            <GP2>ccc</GP2>
            <GP3>ccc</GP3>
        </GP>
    </G>
</root>

I'm trying to transform this XML into a nested dictionary called "G":

{ 1: {G1: 1,
      G2: some text,
      G3: some text,
      GP: { 1: {GP1: 1,
                GP2: a,
                GP3: a},
            2: {GP1: 2,
                GP2: b,
                GP3: b},
            3: {GP1: 3,
                GP2: c,
                GP3: c}}
      },
  2: {G1: 2,
      G2: some text,
      G3: some text,
      GP: { 1: {GP1: 1,
                GP2: aa,
                GP3: aa},
            2: {GP1: 2,
                GP2: bb,
                GP3: bb},
            3: {GP1: 3,
                GP2: cc,
                GP3: cc}}
      },
  3: {G1: 3,
      G2: some text,
      G3: some text,
      GP: { 1: {GP1: 1,
                GP2: aaa,
                GP3: aaa},
            2: {GP1: 2,
                GP2: bbb,
                GP3: bbb},
            3: {GP1: 3,
                GP2: ccc,
                GP3: ccc}}
      }
    }

My code works fine for the elements directly under "G" (G1, G2, etc.), but for GP I either get only one record, or I get all of them with the same thing duplicated a couple of times, or I get all 9 GP elements under one single GP in the dictionary. Here is my code:

    f = 'path to file'
    tree = ET.parse(f)
    root = tree.getroot()
    self.tree = tree
    self.root = root
    gs = len(self.tree.getiterator('G'))
    g = {}
    for i in range(0, gs):
        d = {}
        for elem in self.tree.getiterator('G')[i]:
            if elem.text == "\n      " and elem.tag not in ['GP']:
                    dd = {}
                    for parent in elem:
                        if parent.text == "\n        ":
                            ddd = {}
                            for child in parent:
                                ddd[child.tag] = child.text
                            dd[parent.tag] = ddd
                        else:
                            dd[parent.tag] = parent.text
                    d[elem.tag] = dd
            else:
                d[elem.tag] = elem.text
        g[i+1] = d

    # Build GP
    count = 0
    gp = {}
    for elem in self.tree.getiterator('GP'):
        d = {}
        for parent in elem:
            if parent.text == "\n      ":
                dd = {}
                for child in parent:
                    dd[child.tag] = child.text
                d[parent.tag] = dd
            else:
                d[parent.tag] = parent.text
        count += 1
        gp[count] = d
    g["GP"] = gp


Recommended answer

code.py

#!/usr/bin/env python3

import sys
import xml.etree.ElementTree as ET
from pprint import pprint as pp


FILE_NAME = "data.xml"


def convert_node(node, depth_level=0):
    #print("  " * depth_level + node.tag)
    child_nodes = list(node)
    if not child_nodes:
        return (node.text or "").strip()
    ret_dict = dict()
    child_node_tags = [item.tag for item in child_nodes]
    for child_node in child_nodes:
        tag = child_node.tag
        if child_node_tags.count(tag) > 1:
            # Repeated tag: group the siblings in a sub-dictionary under the tag,
            # keyed by a 1-based index that is counted separately for each tag
            sub_obj_dict = ret_dict.setdefault(tag, dict())
            child_index = len(sub_obj_dict) + 1
            sub_obj_dict[str(child_index)] = convert_node(child_node, depth_level=depth_level + 1)
        else:
            # Unique tag: store the converted child directly under its tag
            ret_dict[tag] = convert_node(child_node, depth_level=depth_level + 1)
    return ret_dict


def main():
    tree = ET.parse(FILE_NAME)
    root_node = tree.getroot()
    converted_xml = convert_node(root_node)
    print("\nResulting dict(s):\n")
    # converted_xml should be a dictionary with a single key ("G" in our case);
    # only its value matters for matching the required output
    for key in converted_xml:
        pp(converted_xml[key])


if __name__ == "__main__":
    print("Python {:s} on {:s}\n".format(sys.version, sys.platform))
    main()

Notes


  • FILE_NAME contains the name of the file that holds the input XML. Feel free to change it to match yours
  • The conversion happens in convert_node. It's a recursive function that is called on each XML node and returns a Python dictionary (or a string). The algorithm:
    • For each node, get a list of its (direct) children. If the node has none (it's a leaf node, like the G# or GP# nodes), return its text
    • If the node has more than one child with a specific tag, each such child's content is added under a key representing its index (as for the G or GP nodes), in a sub-dictionary of the current dictionary stored under the child's tag
    • All children with unique tags have their content placed directly in the current dictionary, under a key equal to their tag
    • depth_level is not used (you can remove it); I used it to print the XML node tags in tree form. It's the depth in the XML tree (root - 0, G - 1, G#, GP - 2, GP# - 3, ...)
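
The question asked for integer keys (1, 2, 3), while convert_node produces string keys ('1', '2', '3'). Below is a minimal sketch of the same algorithm with integer keys. The name node_to_dict and the shortened SAMPLE_XML string are assumptions made for illustration, not part of the original answer; the sample is parsed from memory with ET.fromstring instead of from a file.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Shortened, hypothetical sample in the same shape as the question's XML.
SAMPLE_XML = """\
<root>
    <G>
        <G1>1</G1>
        <GP><GP1>1</GP1><GP2>a</GP2></GP>
        <GP><GP1>2</GP1><GP2>b</GP2></GP>
    </G>
    <G>
        <G1>2</G1>
        <GP><GP1>1</GP1><GP2>aa</GP2></GP>
        <GP><GP1>2</GP1><GP2>bb</GP2></GP>
    </G>
</root>
"""


def node_to_dict(node):
    """Hypothetical helper: same idea as convert_node, but with int keys."""
    children = list(node)
    if not children:
        # Leaf node: return its stripped text.
        return (node.text or "").strip()
    tag_counts = Counter(child.tag for child in children)
    result = {}
    for child in children:
        value = node_to_dict(child)
        if tag_counts[child.tag] > 1:
            # Repeated siblings: group under the tag, keyed 1, 2, 3, ...
            group = result.setdefault(child.tag, {})
            group[len(group) + 1] = value
        else:
            # Unique tag: store the converted child directly.
            result[child.tag] = value
    return result


# The root has only repeated <G> children, so the result has a single "G" key.
g = node_to_dict(ET.fromstring(SAMPLE_XML))["G"]
print(g[1]["GP"][2])  # {'GP1': '2', 'GP2': 'b'}
```

Leaf values stay strings here, as in the answer's output; converting them to int would need an extra step and a decision about which fields are numeric.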

  • General: notice there are no hardcoded key names
  • Scalable: if at some point the XML becomes more complex (e.g. under a GP node there is, say, a GPD node that has subnodes of its own, so the XML gains one more depth level), the code will handle it without change
  • Python 3 and Python 2 compatible

Output


      (py_064_03.05.04_test0) e:\Work\Dev\StackOverflow\q045799991>"e:\Work\Dev\VEnvs\py_064_03.05.04_test0\Scripts\python.exe" code.py
      Python 3.5.4 (v3.5.4:3f56838, Aug  8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)] on win32
      
      Resulting dict(s):
      
      {'1': {'G1': '1',
             'G2': 'some text',
             'G3': 'some text',
             'GP': {'1': {'GP1': '1', 'GP2': 'a', 'GP3': 'a'},
                    '2': {'GP1': '2', 'GP2': 'b', 'GP3': 'b'},
                    '3': {'GP1': '3', 'GP2': 'c', 'GP3': 'c'}}},
       '2': {'G1': '2',
             'G2': 'some text',
             'G3': 'some text',
             'GP': {'1': {'GP1': '1', 'GP2': 'aa', 'GP3': 'aa'},
                    '2': {'GP1': '2', 'GP2': 'bb', 'GP3': 'bb'},
                    '3': {'GP1': '3', 'GP2': 'cc', 'GP3': 'cc'}}},
       '3': {'G1': '3',
             'G2': 'some text',
             'G3': 'some text',
             'GP': {'1': {'GP1': '1', 'GP2': 'aaa', 'GP3': 'aaa'},
                    '2': {'GP1': '2', 'GP2': 'bbb', 'GP3': 'bbb'},
                    '3': {'GP1': '3', 'GP2': 'ccc', 'GP3': 'ccc'}}}}
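
As background on the symptom described in the question (all nine GP elements ending up under a single key): iterating over 'GP' from the tree or the root walks every GP element in the whole document, whereas findall('GP') on a single G element returns only that element's direct children. The sketch below illustrates the difference on a shortened in-memory sample, which is an assumption made for illustration. It uses iter() rather than the question's getiterator(), since getiterator() was deprecated and then removed in Python 3.9.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical sample: two G elements with two GP children each.
SAMPLE_XML = """\
<root>
    <G><G1>1</G1><GP><GP1>1</GP1></GP><GP><GP1>2</GP1></GP></G>
    <G><G1>2</G1><GP><GP1>1</GP1></GP><GP><GP1>2</GP1></GP></G>
</root>
"""

root = ET.fromstring(SAMPLE_XML)

# iter() walks the whole subtree, so from the root it finds every GP
# in the document, regardless of which G it belongs to.
all_gps = list(root.iter("GP"))

# findall() with a bare tag name matches only direct children,
# so each G reports just its own GP elements.
gps_per_g = [len(g_elem.findall("GP")) for g_elem in root.findall("G")]

print(len(all_gps))  # 4
print(gps_per_g)     # [2, 2]
```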
      

