Map a function by key path in nested dict including slices, wildcards and ragged hierarchies


Question

This question builds on an earlier one.

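What is a good approach to mapping a function at a specified key path in nested dicts, supporting these kinds of path specification:

  1. A list of keys at a given path position
  2. Key slices (assuming the keys are sorted)
  3. Wildcards (i.e. all keys at a path position)
  4. Ragged hierarchies, handled by ignoring keys that do not appear at a given level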

If it makes things simpler, you can assume that only dicts are nested, with no lists of dicts, since a list can be turned into a dict with dict(enumerate(...)).

However, the hierarchy can be ragged, e.g.:

data = {0: {'a': 1, 'b': 2},
 1: {'a': 10, 'c': 13},
 2: {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23},
 3: {'a': 30, 'b': 31, 'c': {'d': 300}}}

I would like to be able to specify a key path like this:

map_at(f, ['*',['b','c'],'d'])

To return:

{0: {'a': 1, 'b': 2},
 1: {'a': 10, 'c': 13},
 2: {'a': 20, 'b': {'d': f(100), 'e': 101}, 'c': 23},
 3: {'a': 30, 'b': 31, 'c': {'d': f(300)}}}

Here f is mapped to key paths [2,b,d] and [3,c,d].

Slices would be specified as, for example, [0:3,b].

I think the path spec is unambiguous, though it could be generalized to, for example, match key path prefixes (in which case f would also be mapped at [0,b] and other paths).

Can this be implemented with comprehensions and recursion, or does it require heavy lifting such as catching KeyError?

Please do not suggest Pandas as an alternative.

Recommended answer

I'm not a big fan of pseudo-code, but in this kind of situation, you need to write down an algorithm. Here's my understanding of your requirements:

map_at(func, path_pattern, data):

  1. if path_pattern is not empty
    • if data is terminal, it's a failure: we did not match the full path_pattern, so there is no reason to apply the function. Just return data.
    • else, we have to explore every path in data. We consume the head of path_pattern if possible. That is, return a dict data key -> map_at(func, new_path, data value), where new_path is the tail of path_pattern if the key matches the head, else path_pattern itself.
  2. else, it's a success: the whole path_pattern was matched
    • if data is terminal, return func(data)
    • else, find the leaves and apply func: return a dict data key -> map_at(func, [], data value)

Notes:

  • I assume that the pattern *-b-d matches the path 0-a-b-c-d-e;
  • it's an eager algorithm: the head of the path is always consumed when possible;
  • if the path is fully consumed, every terminal should be mapped;
  • it's a simple DFS, thus I guess it's possible to write an iterative version with a stack.
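
On the last note, a minimal sketch of an iterative version with an explicit stack might look like the following. It is not part of the answer: map_at_iter is a hypothetical name, and the sketch assumes that non-terminals are exactly the dict instances (it checks isinstance instead of catching AttributeError on .items()).

```python
def map_at_iter(func, path_pattern, data):
    """Iterative sketch of the eager DFS described above (explicit stack)."""
    def matches(pattern, value):
        try:
            return pattern == '*' or value == pattern or value in pattern
        except TypeError:  # e.g. an int key tested against a str pattern
            return False

    if not isinstance(data, dict):
        # terminal at the root: map it only if the pattern is already consumed
        return data if path_pattern else func(data)

    result = {}
    # each entry: (source dict, remaining pattern, output dict to fill)
    stack = [(data, list(path_pattern), result)]
    while stack:
        src, pattern, out = stack.pop()
        for key, value in src.items():
            # consume the head of the pattern when the key matches it
            if pattern and matches(pattern[0], key):
                rest = pattern[1:]
            else:
                rest = pattern
            if isinstance(value, dict):
                out[key] = {}  # reserve the slot now, fill it when popped
                stack.append((value, rest, out[key]))
            else:
                # leaf: apply func only if the whole pattern was consumed
                out[key] = value if rest else func(value)
    return result


data = {0: {'a': 1, 'b': 2},
        1: {'a': 10, 'c': 13},
        2: {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23},
        3: {'a': 30, 'b': 31, 'c': {'d': 300}}}

result = map_at_iter(lambda x: -x, ['*', ['b', 'c'], 'd'], data)
```

Each output dict is created before its children are visited, so key order is the same as in the input even though sub-dicts are filled in later.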

Here's the recursive implementation:

def map_at(func, path_pattern, data):
    def matches(pattern, value):
        try:
            return pattern == '*' or value == pattern or value in pattern
        except TypeError: # EDIT: avoid "break" in the dict comprehension if pattern is not a list. 
            return False

    if path_pattern:
        head, *tail = path_pattern
        try: # try to consume head for each key of data
            return {k: map_at(func, tail if matches(head, k) else path_pattern, v) for k,v in data.items()}
        except AttributeError: # fail: terminal data but path_pattern was not consumed
            return data
    else: # success: path_pattern is empty.
        try: # not a leaf: map every leaf of every path
            return {k: map_at(func, [], v) for k,v in data.items()}
        except AttributeError: # a leaf: map it
            return func(data)

Note that tail if matches(head, k) else path_pattern means: consume head if possible. To use a range in the pattern, just use range(...).
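
As a quick check, the function can be run against the data from the question, once with the pattern from the question and once with a range standing in for the slice 0:3. The map_at below is the answer's code repeated so the example runs on its own; using negation as f is only an illustrative choice.

```python
def map_at(func, path_pattern, data):
    def matches(pattern, value):
        try:
            return pattern == '*' or value == pattern or value in pattern
        except TypeError:
            return False

    if path_pattern:
        head, *tail = path_pattern
        try:  # try to consume head for each key of data
            return {k: map_at(func, tail if matches(head, k) else path_pattern, v)
                    for k, v in data.items()}
        except AttributeError:  # terminal data but path_pattern was not consumed
            return data
    else:  # path_pattern is empty
        try:  # not a leaf: map every leaf of every path
            return {k: map_at(func, [], v) for k, v in data.items()}
        except AttributeError:  # a leaf: map it
            return func(data)


data = {0: {'a': 1, 'b': 2},
        1: {'a': 10, 'c': 13},
        2: {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23},
        3: {'a': 30, 'b': 31, 'c': {'d': 300}}}

def neg(x):
    return -x

# pattern from the question: only [2,b,d] and [3,c,d] are mapped
result = map_at(neg, ['*', ['b', 'c'], 'd'], data)

# range(0, 3) plays the role of the slice 0:3 in [0:3, b]
sliced = map_at(neg, [range(0, 3), 'b'], data)
```

With [range(0, 3), 'b'], the leaf at [0,b] becomes -2, and both leaves under [2,b] are negated as well: once the pattern is fully consumed, every leaf below that point is mapped. Key 1 has no 'b' and key 3 is outside the range, so both are returned unchanged.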

As you can see, you never escape from case 2: once path_pattern is empty, you just have to map all the leaves, whatever happens. This is clearer in this version:

def map_all_leaves(func, data):
    """Apply func to all leaves"""
    try:
        return {k: map_all_leaves(func, v) for k,v in data.items()}
    except AttributeError:
        return func(data)

def map_at(func, path_pattern, data):
    def matches(pattern, value):
        try:
            return pattern == '*' or value == pattern or value in pattern
        except TypeError: # EDIT: avoid "break" in the dict comprehension if pattern is not a list. 
            return False

    if path_pattern:
        head, *tail = path_pattern
        try: # try to consume head for each key of data
            return {k: map_at(func, tail if matches(head, k) else  path_pattern, v) for k,v in data.items()}
        except AttributeError: # fail: terminal data but path_pattern is not consumed
            return data
    else: # success: path_pattern is empty, map every leaf below this point
        return map_all_leaves(func, data)
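
Because a key that does not match leaves the pattern untouched, the code above lets a pattern element match at any depth, and it maps every leaf below a fully matched path. If a strictly positional reading of the question is preferred (one pattern element per level, f applied exactly at the end of the path), a sketch could look like this. The name map_at_strict and the isinstance(data, dict) test are assumptions made for the illustration, not part of the answer.

```python
def map_at_strict(func, path_pattern, data):
    """Apply func only to values located exactly at the end of path_pattern."""
    def matches(pattern, key):
        try:
            return pattern == '*' or key == pattern or key in pattern
        except TypeError:
            return False

    if not path_pattern:
        # whole pattern consumed: map whatever sits here (leaf or sub-dict)
        return func(data)
    if not isinstance(data, dict):
        # ragged hierarchy: the path runs past a leaf, leave it alone
        return data
    head, *tail = path_pattern
    # keys that do not match at this level are returned as-is (shared, not copied)
    return {k: map_at_strict(func, tail, v) if matches(head, k) else v
            for k, v in data.items()}


data = {0: {'a': 1, 'b': 2},
        1: {'a': 10, 'c': 13},
        2: {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23},
        3: {'a': 30, 'b': 31, 'c': {'d': 300}}}

def neg(x):
    return -x

strict = map_at_strict(neg, ['*', ['b', 'c'], 'd'], data)
untouched = map_at_strict(neg, ['d'], data)  # no top-level key is 'd'
```

For the pattern from the question, both readings give the same result. They differ on a pattern such as ['d']: the strict sketch changes nothing, whereas the eager version maps the leaves at [2,b,d] and [3,c,d].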

EDIT

If you want to handle lists, you can try this:

def map_at(func, path_pattern, data):
    def matches(pattern, value):
        try:
            return pattern == '*' or value == pattern or value in pattern
        except TypeError: # EDIT: avoid "break" in the dict comprehension if pattern is not a list. 
            return False

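    def get_items(data):
        # str/bytes are iterable but must count as leaves; otherwise
        # enumerate() would recurse into one-character strings forever
        if isinstance(data, (str, bytes)):
            raise TypeError("string-like values are leaves")
        try:
            return data.items()
        except AttributeError:
            return enumerate(data) # raises TypeError if data is not iterable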

    if path_pattern:
        head, *tail = path_pattern
        try: # try to consume head for each key of data
            return {k: map_at(func, tail if matches(head, k) else path_pattern, v) for k,v in get_items(data)}
        except TypeError: # fail: terminal data but path_pattern was not consumed
            return data
    else: # success: path_pattern is empty.
        try: # not a leaf: map every leaf of every path
            return {k: map_at(func, [], v) for k,v in get_items(data)}
        except TypeError: # a leaf: map it
            return func(data)

The idea is simple: for a list, enumerate is the equivalent of dict.items:

>>> list(enumerate(['a', 'b']))
[(0, 'a'), (1, 'b')]
>>> list({0:'a', 1:'b'}.items())
[(0, 'a'), (1, 'b')]

Hence, get_items is just a wrapper to return the dict items, the list items (index, value) or raise an error.

The flaw is that lists are converted to dicts in the process:

>>> data2 = [{'a': 1, 'b': 2}, {'a': 10, 'c': 13}, {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23}, {'a': 30, 'b': 31, 'c': {'d': 300}}]
>>> map_at(type,['*',['b','c'],'d'],data2)
{0: {'a': 1, 'b': 2}, 1: {'a': 10, 'c': 13}, 2: {'a': 20, 'b': {'d': <class 'int'>, 'e': 101}, 'c': 23}, 3: {'a': 30, 'b': 31, 'c': {'d': <class 'int'>}}}
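
If the lists should survive as lists, one possible workaround is to branch on the container type and rebuild each container in kind. This is only a sketch under assumptions: map_at_keep_lists is a hypothetical name, and containers are detected with isinstance(data, dict) and isinstance(data, list) instead of the duck typing used above, so strings and other iterables count as leaves.

```python
def map_at_keep_lists(func, path_pattern, data):
    """Same eager matching as map_at, but lists are rebuilt as lists."""
    def matches(pattern, value):
        try:
            return pattern == '*' or value == pattern or value in pattern
        except TypeError:
            return False

    def recurse(key, value):
        # consume the head of the pattern if the key (or list index) matches it
        if path_pattern and matches(path_pattern[0], key):
            rest = path_pattern[1:]
        else:
            rest = path_pattern
        return map_at_keep_lists(func, rest, value)

    if isinstance(data, dict):
        return {k: recurse(k, v) for k, v in data.items()}
    if isinstance(data, list):
        return [recurse(i, v) for i, v in enumerate(data)]
    # leaf: map it only if the whole pattern was consumed
    return data if path_pattern else func(data)


data2 = [{'a': 1, 'b': 2},
         {'a': 10, 'c': 13},
         {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23},
         {'a': 30, 'b': 31, 'c': {'d': 300}}]

kept = map_at_keep_lists(type, ['*', ['b', 'c'], 'd'], data2)
```

Here kept is a list of four dicts, with int in place of the values at [2,b,d] and [3,c,d].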

EDIT

Since you are looking for something like XPath for JSON, you could try https://pypi.org/project/jsonpath/ or https://pypi.org/project/jsonpath-rw/. (I did not test those libs.)
