Unfold a nested dictionary with lists into a pandas DataFrame

Question

I have a nested dictionary whose sub-dictionaries hold lists as values:

nested_dict = {'string1': {69: [1231, 232], 67: [682, 12], 65: [1, 1]},
    'string2': {28672: [82, 23], 22736: [82, 93, 1102, 102], 19423: [64, 23]}, ... }

Each list in the sub-dictionaries has at least two elements, but there could be more.

I would like to "unfold" this dictionary into a pandas DataFrame, with one column for the outer dictionary keys (e.g. 'string1', 'string2', ...), one column for the sub-dictionary keys, one column for the first item in the list, one column for the next item, and so on.

This is what the output should look like:

col1       col2    col3     col4    col5    col6
string1    69      1231     232
string1    67      682      12
string1    65      1        1
string2    28672   82       23
string2    22736   82       93      1102    102
string2    19423   64       23

Naturally, I tried using pd.DataFrame.from_dict:

new_df = pd.DataFrame.from_dict({(i,j): nested_dict[i][j] 
                           for i in nested_dict.keys() 
                           for j in nested_dict[i].keys()
                           ... 

Now I'm stuck, and several problems remain:

  1. How do I parse the lists (i.e. the nested_dict[i].values()) so that each element becomes a new pandas DataFrame column?

  2. The above will not actually create a column for each field.

  3. The above will not fill up the columns with elements, e.g. string1 should appear in every row for its sub-dictionary key-value pairs. (For col5 and col6, I can fill the NA with zeros.)

  4. I'm not sure how to name these columns correctly.

Recommended answer

This should give you the result you are looking for, although it's probably not the most elegant solution. There is probably a better, more pandas-idiomatic way to do it.

I parsed your nested dict and built a list of dictionaries (one for each row).

import pandas as pd

# some sample input
nested_dict = {
    'string1': {69: [1231, 232], 67: [682, 12], 65: [1, 1]},
    'string2': {28672: [82, 23], 22736: [82, 93, 1102, 102], 19423: [64, 23]},
    'string3': {28673: [83, 24], 22737: [83, 94, 1103, 103], 19424: [65, 24]}
}

# new_list holds one dictionary per output row
new_list = []
for k1 in nested_dict:
    curr_dict = nested_dict[k1]
    for k2 in curr_dict:
        # the outer and inner keys fill the first two columns
        new_dict = {'col1': k1, 'col2': k2}
        # the list items go into col3, col4, ... in order
        new_dict.update({'col%d' % (i + 3): curr_dict[k2][i] for i in range(len(curr_dict[k2]))})
        new_list.append(new_dict)

# create a DataFrame from new_list; rows with fewer items get NaN in the extra columns
df = pd.DataFrame(new_list)

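Output (rows follow the dictionary's iteration order, so their order can differ between Python versions):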

      col1   col2  col3  col4    col5   col6
0  string2  28672    82    23     NaN    NaN
1  string2  22736    82    93  1102.0  102.0
2  string2  19423    64    23     NaN    NaN
3  string3  19424    65    24     NaN    NaN
4  string3  28673    83    24     NaN    NaN
5  string3  22737    83    94  1103.0  103.0
6  string1     65     1     1     NaN    NaN
7  string1     67   682    12     NaN    NaN
8  string1     69  1231   232     NaN    NaN
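
The question mentions filling the missing col5 and col6 entries with zeros. A minimal sketch of that step, assuming a DataFrame shaped like the output above; the two sample rows and the helper name value_cols are made up for illustration:

```python
import pandas as pd

# Two illustrative rows with different numbers of list items.
rows = [
    {'col1': 'string1', 'col2': 69, 'col3': 1231, 'col4': 232},
    {'col1': 'string2', 'col2': 22736, 'col3': 82, 'col4': 93,
     'col5': 1102, 'col6': 102},
]
df = pd.DataFrame(rows)

# Every column except the two key columns holds list items.
value_cols = [c for c in df.columns if c not in ('col1', 'col2')]

# Replace the missing entries with 0, then cast the value columns
# back to integers (NaN had forced col5 and col6 to float).
df = df.fillna(0).astype({c: int for c in value_cols})

print(df)
```

Casting back to int is optional; it only removes the trailing .0 that appears once a column has contained NaN.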

This assumes that the input always contains enough data to create col1 and col2.

I loop through nested_dict. Each element of nested_dict is assumed to be a dictionary as well, so we also loop through that dictionary (curr_dict). The keys k1 and k2 populate col1 and col2. For the remaining columns, we iterate through the list contents and add one column per element.
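
The answer suggests that a more pandas-oriented approach may exist. One possible sketch, assuming the same nested_dict shape, flattens the two key levels into tuples, builds a MultiIndex from them, and lets the DataFrame constructor pad the shorter lists. The names flat and index and the col1..colN naming scheme are illustrative choices, not something pandas requires:

```python
import pandas as pd

nested_dict = {
    'string1': {69: [1231, 232], 67: [682, 12], 65: [1, 1]},
    'string2': {28672: [82, 23], 22736: [82, 93, 1102, 102], 19423: [64, 23]},
}

# Flatten the two key levels into (outer_key, inner_key) tuples.
flat = {
    (k1, k2): values
    for k1, inner in nested_dict.items()
    for k2, values in inner.items()
}

# The tuples become a two-level row index named after the target columns.
index = pd.MultiIndex.from_tuples(list(flat), names=['col1', 'col2'])

# A list of unequal-length lists is padded with missing values.
df = pd.DataFrame(list(flat.values()), index=index)

# Rename the positional columns 0, 1, 2, ... to col3, col4, col5, ...
df.columns = ['col%d' % (i + 3) for i in range(df.shape[1])]

# Move the two index levels back into ordinary columns.
df = df.reset_index()

print(df)
```

Compared with the row-by-row loop, this keeps the key handling in one comprehension and names all columns in one place, at the cost of a short detour through a MultiIndex.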
