Find all occurrences of a key in nested dictionaries and lists


Question

I have a dictionary like this:

{ "id" : "abcde",
  "key1" : "blah",
  "key2" : "blah blah",
  "nestedlist" : [ 
    { "id" : "qwerty",
      "nestednestedlist" : [ 
        { "id" : "xyz",
          "keyA" : "blah blah blah" },
        { "id" : "fghi",
          "keyZ" : "blah blah blah" }],
      "anothernestednestedlist" : [ 
        { "id" : "asdf",
          "keyQ" : "blah blah" },
        { "id" : "yuiop",
          "keyW" : "blah" }] } ] } 

Basically, it is a dictionary of arbitrary depth containing nested lists, dictionaries, and strings.

What is the best way to traverse this and extract the value of every "id" key? I want to achieve the equivalent of an XPath query like "//id". The value of "id" is always a string.

So for my example, the output I need is basically:

["abcde", "qwerty", "xyz", "fghi", "asdf", "yuiop"]

Order is not important.

Recommended answer

I found this Q/A very interesting, since it provides several different solutions to the same problem. I took all of these functions and tested them with a complex dictionary object. I had to take two functions out of the test because they produced too many failing results and did not support returning lists or dicts as values, which I find essential, since a function should be prepared for almost any data it might receive.

So I ran the remaining functions through the timeit module with 100,000 iterations each and got the following results:

0.11 usec/pass on gen_dict_extract(k,o)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - -
6.03 usec/pass on find_all_items(k,o)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - -
0.15 usec/pass on findkeys(k,o)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - -
1.79 usec/pass on get_recursively(k,o)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - -
0.14 usec/pass on find(k,o)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - -
0.36 usec/pass on dict_extract(k,o)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - -

All functions were given the same needle to search for ('logging') and the same dictionary object, which is constructed like this:

o = { 'temparature': '50', 
      'logging': {
        'handlers': {
          'console': {
            'formatter': 'simple', 
            'class': 'logging.StreamHandler', 
            'stream': 'ext://sys.stdout', 
            'level': 'DEBUG'
          }
        },
        'loggers': {
          'simpleExample': {
            'handlers': ['console'], 
            'propagate': 'no', 
            'level': 'INFO'
          },
         'root': {
           'handlers': ['console'], 
           'level': 'DEBUG'
         }
       }, 
       'version': '1', 
       'formatters': {
         'simple': {
           'datefmt': "'%Y-%m-%d %H:%M:%S'", 
           'format': '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
         }
       }
     }, 
     'treatment': {'second': 5, 'last': 4, 'first': 4},   
     'treatment_plan': [[4, 5, 4], [4, 5, 4], [5, 5, 5]]
}
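The benchmark harness itself is not shown in the answer. As a rough sketch of how such a measurement might be set up with timeit on Python 3: the function `find_key` and the trimmed-down `sample` dictionary below are hypothetical stand-ins, not the functions or data actually measured. Note that calling a generator function only creates a generator object, so the sketch wraps the call in list() to make sure the traversal really runs.

```python
import timeit


def find_key(key, var):
    """Hypothetical stand-in for the functions under test (Python 3)."""
    if isinstance(var, dict):
        for k, v in var.items():
            if k == key:
                yield v
            # Keep searching below this value, whatever its type.
            yield from find_key(key, v)
    elif isinstance(var, list):
        for item in var:
            yield from find_key(key, item)


# Assumed, trimmed-down stand-in for the test object shown above.
sample = {
    "temperature": "50",
    "logging": {"version": "1", "loggers": {"root": {"level": "DEBUG"}}},
    "treatment_plan": [[4, 5, 4], [5, 5, 5]],
}

iterations = 100_000
# list(...) consumes the generator; timing the bare call would only
# measure the creation of the generator object.
total = timeit.timeit(lambda: list(find_key("logging", sample)), number=iterations)
usec_per_pass = total / iterations * 1e6
print(f"{usec_per_pass:.2f} usec/pass on find_key(k,o)")
```

Absolute numbers depend on the machine and Python version, so only the relative ordering of the candidates is meaningful.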

All functions delivered the same result, but the time differences are dramatic! The function gen_dict_extract(k,o) is my own, adapted from the functions posted here. It is much like Alfe's find function; the main difference is that I check whether the given object has an iteritems method, in case strings are passed in during recursion:

def gen_dict_extract(key, var):
    if hasattr(var,'iteritems'):
        for k, v in var.iteritems():
            if k == key:
                yield v
            if isinstance(v, dict):
                for result in gen_dict_extract(key, v):
                    yield result
            elif isinstance(v, list):
                for d in v:
                    for result in gen_dict_extract(key, d):
                        yield result
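The function above relies on iteritems, which exists only in Python 2. A minimal sketch of a Python 3 port, assuming only that iteritems is swapped for items and the inner loops for `yield from`, could look like this. The name `gen_dict_extract_py3` is made up here, and the sample data is a shortened copy of the dictionary from the question.

```python
def gen_dict_extract_py3(key, var):
    """Yield every value stored under `key` anywhere inside `var`."""
    # Only mappings expose items(); strings and other leaves are skipped.
    if hasattr(var, "items"):
        for k, v in var.items():
            if k == key:
                yield v
            if isinstance(v, dict):
                yield from gen_dict_extract_py3(key, v)
            elif isinstance(v, list):
                for d in v:
                    yield from gen_dict_extract_py3(key, d)


data = {
    "id": "abcde",
    "key1": "blah",
    "nestedlist": [
        {
            "id": "qwerty",
            "nestednestedlist": [
                {"id": "xyz", "keyA": "blah blah blah"},
                {"id": "fghi", "keyZ": "blah blah blah"},
            ],
            "anothernestednestedlist": [
                {"id": "asdf", "keyQ": "blah blah"},
                {"id": "yuiop", "keyW": "blah"},
            ],
        }
    ],
}

ids = list(gen_dict_extract_py3("id", data))
print(ids)
# On Python 3.7+ (insertion-ordered dicts) this prints:
# ['abcde', 'qwerty', 'xyz', 'fghi', 'asdf', 'yuiop']
```

Like the original, this sketch does not descend into lists nested directly inside other lists, because a list has no items method.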

So this variant is the fastest and safest of the functions here. find_all_items is incredibly slow, far behind even the second slowest, get_recursively, while the rest, except dict_extract, are close to each other. The functions fun and keyHole only work if you are looking for strings.

An interesting learning experience :)
