How can this function be rewritten to implement OrderedDict?


Question

I have the following function, which does a crude job of parsing an XML file into a dictionary.

Unfortunately, since Python dictionaries are not ordered, I am unable to loop through the nodes in the order I would like.

How do I change this so that it outputs an ordered dictionary that reflects the original order of the nodes when iterated with a for loop?

def simplexml_load_file(file):
    import collections
    from lxml import etree

    tree = etree.parse(file)
    root = tree.getroot()

    def xml_to_item(el):
        item = None
        if el.text:
            item = el.text
        child_dicts = collections.defaultdict(list)
        for child in el.getchildren():
            child_dicts[child.tag].append(xml_to_item(child))
        return dict(child_dicts) or item

    def xml_to_dict(el):
        return {el.tag: xml_to_item(el)}

    return xml_to_dict(root)

x = simplexml_load_file('routines/test.xml')

print x

for y in x['root']:
    print y

Output:

{'root': {
    'a': ['1'],
    'aa': [{'b': [{'c': ['2']}, '2']}],
    'aaaa': [{'bb': ['4']}],
    'aaa': ['3'],
    'aaaaa': ['5']
}}

a
aa
aaaa
aaa
aaaaa

How can I use collections.OrderedDict here so that I can be sure of getting the nodes in the correct order?

XML file for reference:

<root>
    <a>1</a>
    <aa>
        <b>
            <c>2</c>
        </b>
        <b>2</b>
    </aa>
    <aaa>3</aaa>
    <aaaa>
        <bb>4</bb>
    </aaaa>
    <aaaaa>5</aaaaa>
</root>

Recommended answer

You could use OrderedDict, a dict subclass that was added to the standard library's collections module in version 2.7. What you actually need is a combination of OrderedDict and defaultdict, which doesn't exist in the standard library, but it's possible to create one by subclassing OrderedDict, as illustrated below.

If your version of Python doesn't have OrderedDict, you should be able to use Raymond Hettinger's "Ordered Dictionary for Py2.4" ActiveState recipe as the base class instead.

import collections

class OrderedDefaultdict(collections.OrderedDict):
    """ A defaultdict with OrderedDict as its base class. """

    def __init__(self, default_factory=None, *args, **kwargs):
        if not (default_factory is None or callable(default_factory)):
            raise TypeError('first argument must be callable or None')
        super(OrderedDefaultdict, self).__init__(*args, **kwargs)
        self.default_factory = default_factory  # called by __missing__()

    def __missing__(self, key):
        if self.default_factory is None:
            raise KeyError(key,)
        self[key] = value = self.default_factory()
        return value

    def __reduce__(self):  # Optional, for pickle support.
        args = (self.default_factory,) if self.default_factory else tuple()
        return self.__class__, args, None, None, iter(self.items())

    def __repr__(self):  # Optional.
        return '%s(%r, %r)' % (self.__class__.__name__, self.default_factory, list(self.items()))

def simplexml_load_file(file):
    from lxml import etree

    tree = etree.parse(file)
    root = tree.getroot()

    def xml_to_item(el):
        item = el.text or None
        child_dicts = OrderedDefaultdict(list)
        for child in el.getchildren():
            child_dicts[child.tag].append(xml_to_item(child))
        return collections.OrderedDict(child_dicts) or item

    def xml_to_dict(el):
        return {el.tag: xml_to_item(el)}

    return xml_to_dict(root)

x = simplexml_load_file('routines/test.xml')
print(x)

for y in x['root']:
    print(y)

The output produced from your test XML file looks like this (whitespace added for readability):

{'root':
    OrderedDict(
        [('a', ['1']),
         ('aa', [OrderedDict([('b', [OrderedDict([('c', ['2'])]), '2'])])]),
         ('aaa', ['3']),
         ('aaaa', [OrderedDict([('bb', ['4'])])]),
         ('aaaaa', ['5'])
        ]
    )
}

a
aa
aaa
aaaa
aaaaa

I think this is close to what you want.
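As a side note, on Python 3.7 and later a plain dict is guaranteed to preserve insertion order, so the same grouping can be sketched without a custom class. The version below is only an illustration under a few assumptions: it uses the standard library's xml.etree.ElementTree rather than lxml, parses the test XML from an inline string instead of a file, and the names XML and result are made up for the example.

```python
import xml.etree.ElementTree as ET

# The test document from the question, inlined so the sketch runs on its own.
XML = """<root>
    <a>1</a>
    <aa>
        <b>
            <c>2</c>
        </b>
        <b>2</b>
    </aa>
    <aaa>3</aaa>
    <aaaa>
        <bb>4</bb>
    </aaaa>
    <aaaaa>5</aaaaa>
</root>"""


def xml_to_item(el):
    # Group children by tag. On Python 3.7+ a plain dict keeps keys in the
    # order they were first inserted, which is the document order here.
    children = {}
    for child in el:
        children.setdefault(child.tag, []).append(xml_to_item(child))
    # Elements without children fall back to their text (None when empty).
    return children or el.text or None


def xml_to_dict(el):
    return {el.tag: xml_to_item(el)}


result = xml_to_dict(ET.fromstring(XML))
print(list(result['root']))  # ['a', 'aa', 'aaa', 'aaaa', 'aaaaa']
```

Here dict.setdefault() plays the role of the default factory. On older interpreters this does not preserve order, so the OrderedDefaultdict class is still the safer choice there.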

Minor update:

Added a __reduce__() method, which allows instances of the class to be pickled and unpickled properly. This wasn't necessary for this question, but it came up in a similar one.
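To see what __reduce__() buys you, here is a rough round-trip check. It repeats a condensed copy of the class so that it runs on its own (written for Python 3, hence the bare super()), and the variable names original and restored are just for illustration.

```python
import collections
import pickle


class OrderedDefaultdict(collections.OrderedDict):
    """Condensed copy of the class from the answer, repeated for self-containment."""

    def __init__(self, default_factory=None, *args, **kwargs):
        if not (default_factory is None or callable(default_factory)):
            raise TypeError('first argument must be callable or None')
        super().__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __missing__(self, key):
        if self.default_factory is None:
            raise KeyError(key)
        self[key] = value = self.default_factory()
        return value

    def __reduce__(self):
        # Tuple layout: (callable, args, state, listitems, dictitems).
        # pickle rebuilds the object as OrderedDefaultdict(*args) and then
        # re-inserts every (key, value) pair from dictitems in order.
        args = (self.default_factory,) if self.default_factory else ()
        return self.__class__, args, None, None, iter(self.items())


original = OrderedDefaultdict(list)
original['b'].append(1)  # keys deliberately inserted out of alphabetical order
original['a'].append(2)

restored = pickle.loads(pickle.dumps(original))
print(type(restored).__name__, list(restored.items()))
```

The restored object should keep the class, the default factory, and the insertion order of the keys, so accessing a missing key on it still creates a fresh list.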
