Unfold a nested dictionary with lists into a pandas DataFrame
Question
I have a nested dictionary in which the sub-dictionaries hold lists as values:
nested_dict = {'string1': {69: [1231, 232], 67: [682, 12], 65: [1, 1]},
               'string2': {28672: [82, 23], 22736: [82, 93, 1102, 102], 19423: [64, 23]}, ... }
Each list in the sub-dictionaries has at least two elements, but there could be more.
I would like to "unfold" this dictionary into a pandas DataFrame, with one column for the outer dictionary keys (e.g. 'string1', 'string2', ...), one column for the sub-dictionary keys, one column for the first item in the list, one column for the next item, and so on.
This is what the output should look like:
col1 col2 col3 col4 col5 col6
string1 69 1231 232
string1 67 682 12
string1 65 1 1
string2 28672 82 23
string2 22736 82 93 1102 102
string2 19423 64 23
Naturally, I tried using pd.DataFrame.from_dict:
new_df = pd.DataFrame.from_dict({(i,j): nested_dict[i][j]
                                 for i in nested_dict.keys()
                                 for j in nested_dict[i].keys()
                                 ...
Now I'm stuck, and several problems remain:
- How do I parse the lists (i.e. the nested_dict[i].values()) such that each element becomes a new pandas DataFrame column? The above will not actually create a column for each field.
- The above will not fill up the columns with elements, e.g. string1 should be in each row for the sub-dictionary key-value pair. (For col5 and col6, I can fill the NA with zeros.)
- I'm not sure how to name these columns correctly.
Recommended answer
This should give you the result you are looking for, although it's probably not the most elegant solution. There's probably a better (more pandas-idiomatic) way to do it.
I parsed your nested dict and built a list of dictionaries (one for each row).
import pandas as pd

# some sample input
nested_dict = {
    'string1': {69: [1231, 232], 67: [682, 12], 65: [1, 1]},
    'string2': {28672: [82, 23], 22736: [82, 93, 1102, 102], 19423: [64, 23]},
    'string3': {28673: [83, 24], 22737: [83, 94, 1103, 103], 19424: [65, 24]}
}

# new_list holds one dictionary per output row
new_list = []
for k1 in nested_dict:
    curr_dict = nested_dict[k1]
    for k2 in curr_dict:
        # outer key goes to col1, inner key to col2
        new_dict = {'col1': k1, 'col2': k2}
        # list items become col3, col4, ... in order
        new_dict.update({'col%d' % (i + 3): curr_dict[k2][i] for i in range(len(curr_dict[k2]))})
        new_list.append(new_dict)

# create a DataFrame from new_list; rows with shorter lists get NaN in the extra columns
df = pd.DataFrame(new_list)
Output:
col1 col2 col3 col4 col5 col6
0 string2 28672 82 23 NaN NaN
1 string2 22736 82 93 1102.0 102.0
2 string2 19423 64 23 NaN NaN
3 string3 19424 65 24 NaN NaN
4 string3 28673 83 24 NaN NaN
5 string3 22737 83 94 1103.0 103.0
6 string1 65 1 1 NaN NaN
7 string1 67 682 12 NaN NaN
8 string1 69 1231 232 NaN NaN
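The question mentions that the missing entries in col5 and col6 could be filled with zeros. A minimal sketch of that step follows, assuming a small stand-in DataFrame rather than the full one above; the names value_cols and filled are illustrative and not part of the original answer.

```python
import pandas as pd

# Small stand-in for the DataFrame built above: the shorter row gets NaN
# in col5 and col6 because its dictionary lacks those keys.
df = pd.DataFrame([
    {'col1': 'string2', 'col2': 28672, 'col3': 82, 'col4': 23},
    {'col1': 'string2', 'col2': 22736, 'col3': 82, 'col4': 93, 'col5': 1102, 'col6': 102},
])

# Columns that hold list items (everything after col1 and col2)
value_cols = list(df.columns[2:])

# Fill missing values with 0 in those columns only, then cast them back
# to integers (NaN had forced col5 and col6 to a float dtype).
filled = (
    df.fillna({c: 0 for c in value_cols})
      .astype({c: int for c in value_cols})
)
print(filled)
```

Restricting fillna to the value columns keeps col1 and col2 untouched, which matters if zero is not a sensible default for the keys.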
This assumes that the input will always contain enough data to create col1 and col2.
I loop through nested_dict. It is assumed that each element of nested_dict is also a dictionary. We loop through that dictionary as well (curr_dict). The keys k1 and k2 are used to populate col1 and col2. For the remaining columns, we iterate through the list contents and add a column for each element.
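The same loop can be written more compactly by building flat rows and letting the DataFrame constructor pad the shorter ones. This is only a sketch of one possible alternative, not the answer's own method; the variable rows is illustrative, and the sample input is shortened to the two keys shown in the question.

```python
import pandas as pd

nested_dict = {
    'string1': {69: [1231, 232], 67: [682, 12], 65: [1, 1]},
    'string2': {28672: [82, 23], 22736: [82, 93, 1102, 102], 19423: [64, 23]},
}

# One flat row per (outer key, inner key) pair, followed by the list items
rows = [
    [k1, k2, *values]
    for k1, inner in nested_dict.items()
    for k2, values in inner.items()
]

# Rows of unequal length are padded with missing values by the constructor
df = pd.DataFrame(rows)

# Name the columns col1, col2, ... to match the layout in the question
df.columns = [f'col{i + 1}' for i in range(df.shape[1])]
print(df)
```

Because the rows are plain lists, the number of columns follows the longest list automatically, so no column names need to be known in advance.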