Denormalize/flatten list of nested objects into dot separated key value pairs


Question

It would have been simpler if my nested objects were dictionaries, but these are lists of dictionaries. Example:

all_objs1 = [{
    'a': 1,
    'b': [{'ba': 2, 'bb': 3}, {'ba': 21, 'bb': 31}],
    'c': 4
}, {
    'a': 11,
    'b': [{'ba': 22, 'bb': 33, 'bc': [{'h': 1, 'e': 2}]}],
    'c': 44
}]

I expect output in the following format:

[
  {'a': 1, 'b.ba': 2, 'b.bb': 3, 'c': 4},
  {'a': 1, 'b.ba': 21, 'b.bb': 31, 'c': 4},
  {'a': 11, 'b.ba': 22, 'b.bb': 33, 'bc.h': 1, 'bc.e': 2, 'c': 44},
]

Basically, the number of flattened objects generated will be equal to (obj * depth).

With my current code:

def flatten(obj, flattened_obj, last_key=''):
  for k,v in obj.iteritems():
    if not isinstance(v, list):
      flattened_obj.update({last_key+k : v})
    else:
      last_key += k + '.'
      for nest_obj in v:
        flatten(nest_obj, flattened_obj, last_key)
        last_key = remove_last_key(last_key)

def remove_last_key(key_path):
    second_dot = key_path[:-1].rfind('.')
    if second_dot > 0:
      return key_path[:second_dot+1]
    return key_path

Output:

[
  {'a': 1, 'b.bb': 31, 'c': 4, 'b.ba': 21},
  {'a': 11, 'b.bc.e': 2, 'c': 44, 'b.bc.h': 1, 'b.bb': 33, 'b.ba': 22}
]


I am able to flatten the object (not accurately, though), but I am not able to create a new object for each nested object. I cannot use the pandas library because my app is deployed on App Engine.
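
A minimal sketch of why the code above likely produces only one dictionary per top-level object: every element of a nested list is written into the same target dictionary, so later elements overwrite the keys of earlier ones. The names `flat` and `nested_list` are illustrative and do not appear in the original code.

```python
# Both elements of the list under "b" share the same keys.
nested_list = [{"ba": 2, "bb": 3}, {"ba": 21, "bb": 31}]

flat = {}
for nested in nested_list:
    # Each pass writes "b.ba" and "b.bb" into the SAME dict,
    # so the second element replaces the values of the first.
    flat.update({"b." + key: value for key, value in nested.items()})

print(flat)
```

Only the last element survives, which matches the `'b.ba': 21, 'b.bb': 31` pair in the output shown above. Producing one row per element requires copying the partial result before descending into each list element.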

Recommended answer

code.py :

from itertools import product
from pprint import pprint as pp


all_objs = [{
    "a": 1,
    "b": [{"ba": 2, "bb": 3}, {"ba": 21, "bb": 31}],
    "c": 4,
    #"d": [{"da": 2}, {"da": 5}],
}, {
    "a": 11,
    "b": [{"ba": 22, "bb": 33, "bc": [{"h": 1, "e": 2}]}],
    "c": 44,
}]


def flatten_dict(obj, parent_key=None):
    base_dict = dict()
    complex_items = list()
    very_complex_items = list()
    for key, val in obj.items():
        new_key = ".".join((parent_key, key)) if parent_key is not None else key
        if isinstance(val, list):
            if len(val) > 1:
                very_complex_items.append((key, val))
            else:
                complex_items.append((key, val))
        else:
            base_dict[new_key] = val
    if not complex_items and not very_complex_items:
        return [base_dict]
    base_dicts = list()
    partial_dicts = list()
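    for key, val in complex_items:
        # Rebuild the dotted path for this key; reusing new_key from the loop
        # above would prefix with whichever key happened to be iterated last.
        new_key = ".".join((parent_key, key)) if parent_key is not None else key
        partial_dicts.append(flatten_dict(val[0], parent_key=new_key))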
    for product_tuple in product(*tuple(partial_dicts)):
        new_base_dict = base_dict.copy()
        for new_dict in product_tuple:
            new_base_dict.update(new_dict)
        base_dicts.append(new_base_dict)
    if not very_complex_items:
        return base_dicts
    ret = list()
    very_complex_keys = [item[0] for item in very_complex_items]
    very_complex_vals = tuple([item[1] for item in very_complex_items])
    for product_tuple in product(*very_complex_vals):
        for base_dict in base_dicts:
            new_dict = base_dict.copy()
            new_items = zip(very_complex_keys, product_tuple)
            for key, val in new_items:
                new_key = ".".join((parent_key, key)) if parent_key is not None else key
                new_dict.update(flatten_dict(val, parent_key=new_key)[0])
            ret.append(new_dict)
    return ret


def main():
    flatten = list()
    for obj in all_objs:
        flatten.extend(flatten_dict(obj))
    pp(flatten)


if __name__ == "__main__":
    main()

Notes:

  • As expected, recursion is used.
  • It is general: it also works for the case I mentioned in my 2nd comment (one input dict having more than one key whose value is a list with more than one element), which can be tested by uncommenting the "d" key in all_objs. In theory, it should also support any depth.
  • flatten_dict takes an input dictionary and outputs a list of dictionaries (since the input dictionary might yield more than one output dictionary):
    • Every key with a "simple" (non-list) value goes into the output dictionar(y/ies) unchanged.
    • At this point, a base output dictionary is complete (if the input dictionary generates more than one output dictionary, all of them will have the base dictionary's keys/values; if it generates only one, that will be the base one).
    • Next, the keys with "problematic" values (those that may generate more than one output dictionary), if any, are processed:
      • Keys having a list with a single element ("problematic"); each might generate more than one output dictionary:
        • Each of the values is flattened (which might yield more than one output dictionary); the corresponding key is used in the process.
        • Then, the cartesian product is computed over all the flattened dictionary lists (for the current input, there is only one list with one element).
        • Each product item needs to be in a distinct output dictionary, so the base dictionary is duplicated and updated with the keys/values of every element in the product item (for the current input, there is only one element per product item).
        • The new dictionary is appended to a list.
      • Keys having a list with more than one element (very_complex_items in the code):
        • First, the cartesian product is computed over all the values (lists with more than one element). In the current case, since there is only one such list, each product item contains a single element from that list.
        • Then, for each product item element, its key is established based on the order of the lists (for the current input, the product item contains only one element, and there is only one key).
        • Again, each product item needs to be in a distinct output dictionary, so the base dictionary is duplicated and updated with the keys/values of the flattened product item.
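
The "duplicate the base dictionary once per product item" step described in the notes can be sketched in isolation. This is a simplified illustration, not the answer's exact code: the helper name `combine` and the hand-written, already flattened rows are assumptions made for the example, and the `d_rows` list mimics what uncommenting the "d" key would contribute.

```python
from itertools import product


def combine(base, flattened_lists):
    """Return one merged copy of `base` per element of the cartesian product."""
    results = []
    for combo in product(*flattened_lists):
        merged = dict(base)         # copy, so the output rows stay independent
        for partial in combo:
            merged.update(partial)  # add this branch's dotted keys
        results.append(merged)
    return results


base = {"a": 1, "c": 4}
b_rows = [{"b.ba": 2, "b.bb": 3}, {"b.ba": 21, "b.bb": 31}]
d_rows = [{"d.da": 2}, {"d.da": 5}]

rows = combine(base, [b_rows, d_rows])
for row in rows:
    print(row)
```

Two branches of two rows each give four output rows, and every row carries the base keys "a" and "c". With no branches at all, `product()` yields a single empty tuple, so the result is just a copy of the base dictionary.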

Output:

c:\Work\Dev\StackOverflow\q046341856>c:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe code.py
[{'a': 1, 'b.ba': 2, 'b.bb': 3, 'c': 4},
 {'a': 1, 'b.ba': 21, 'b.bb': 31, 'c': 4},
 {'a': 11, 'b.ba': 22, 'b.bb': 33, 'b.bc.e': 2, 'b.bc.h': 1, 'c': 44}]

@EDIT0:

  • Made it more general (although this is not visible for the current input): values containing only one element can yield more than one output dictionary when flattened. That case is now handled (before, I was only considering the 1st output dictionary and simply ignoring the rest).
  • Corrected a logical error that was masked by tuple unpacking combined with the cartesian product: the "if not complex_items ..." part.

@EDIT1:

  • Modified the code to match a requirement change: keys in the flattened dictionaries must carry their full nested path from the input dictionary.
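
As a small illustration of the full-nested-path requirement, the key-building expression used in flatten_dict can be pulled out into a helper. The name `join_key` is hypothetical and is not part of the original answer.

```python
def join_key(parent_key, key, sep="."):
    """Prefix `key` with the dotted path of its parent, if it has one."""
    return key if parent_key is None else sep.join((parent_key, key))


top = join_key(None, "b")        # top-level key, no prefix
nested = join_key(top, "bc")     # one level down
leaf = join_key(nested, "h")     # two levels down

print(top, nested, leaf)
```

Passing the result of each call as the parent of the next one is what turns "h" into "b.bc.h" rather than the shorter "bc.h".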
