Convert pandas DataFrame to a nested dict
Question
I'm looking for a generic way of turning a DataFrame into a nested dictionary.
This is a sample data frame:
  name  v1   v2  v3
0    A  A1  A11   1
1    A  A2  A12   2
2    B  B1  B12   3
3    C  C1  C11   4
4    B  B2  B21   5
5    A  A2  A21   6
The number of columns may differ, and so may the column names.
The desired output looks like this:
{
    'A' : {
        'A1' : { 'A11' : 1 },
        'A2' : { 'A12' : 2 , 'A21' : 6 }} ,
    'B' : {
        'B1' : { 'B12' : 3 } } ,
    'C' : {
        'C1' : { 'C11' : 4}}
}
What is the best way to achieve this?
The closest I got was with the zip function, but I haven't managed to make it work for more than one level (two columns).
Recommended answer
I don't understand why there isn't a B2 in your dict. I'm also not sure what you want to happen in the case of repeated column values (every one except the last, I mean). Assuming the first is an oversight, we could use recursion:
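def recur_dictify(frame):
    # Base case: one column left, so its values are the leaves.
    if len(frame.columns) == 1:
        if frame.values.size == 1:
            return frame.values[0][0]
        return frame.values.squeeze()
    # Group on the first column and recurse on the remaining columns.
    grouped = frame.groupby(frame.columns[0])
    d = {k: recur_dictify(g.iloc[:, 1:]) for k, g in grouped}
    return d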
which produces
>>> df
  name  v1   v2  v3
0    A  A1  A11   1
1    A  A2  A12   2
2    B  B1  B12   3
3    C  C1  C11   4
4    B  B2  B21   5
5    A  A2  A21   6
>>> pprint.pprint(recur_dictify(df))
{'A': {'A1': {'A11': 1}, 'A2': {'A12': 2, 'A21': 6}},
 'B': {'B1': {'B12': 3}, 'B2': {'B21': 5}},
 'C': {'C1': {'C11': 4}}}
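To make the repeated-values question concrete, here is a self-contained sketch of the same recursive idea that can be run as is. The name nested_dict_from_frame is hypothetical, and returning a list for repeated paths is an assumption about what might be wanted, not something the question specifies. It uses .iloc for positional slicing and converts the leaves to plain Python scalars.

```python
import pandas as pd


def nested_dict_from_frame(frame):
    """Hypothetical variant: every column but the last is a key level."""
    if len(frame.columns) == 1:
        leaves = frame.iloc[:, 0].tolist()  # plain Python scalars
        # One leaf is returned as a scalar; a repeated path yields a list.
        return leaves[0] if len(leaves) == 1 else leaves
    first = frame.columns[0]
    # Group on the first column, then recurse on the remaining columns.
    return {
        key: nested_dict_from_frame(group.iloc[:, 1:])
        for key, group in frame.groupby(first)
    }


# The sample frame from the question.
df = pd.DataFrame({
    "name": ["A", "A", "B", "C", "B", "A"],
    "v1": ["A1", "A2", "B1", "C1", "B2", "A2"],
    "v2": ["A11", "A12", "B12", "C11", "B21", "A21"],
    "v3": [1, 2, 3, 4, 5, 6],
})
result = nested_dict_from_frame(df)

# A frame in which the path A -> A1 occurs twice.
dup = pd.DataFrame({"name": ["A", "A"], "v1": ["A1", "A1"], "v3": [1, 2]})
dup_result = nested_dict_from_frame(dup)
```

With this sketch, a path that occurs once maps to a scalar, while the repeated path in dup maps to a list holding both values.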
It might be simpler to use a non-pandas approach, though:
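def retro_dictify(frame):
    d = {}
    for row in frame.values:
        here = d
        # Walk (and create) the nested dicts for all but the last two elements.
        for elem in row[:-2]:
            if elem not in here:
                here[elem] = {}
            here = here[elem]
        # The second-to-last element is the innermost key, the last is the value.
        here[row[-2]] = row[-1]
    return d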
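As a usage sketch of the plain-loop idea, the snippet below builds the sample frame and feeds its rows to a small helper. The helper name rows_to_nested_dict is hypothetical, and it assumes every row has at least two elements (one key and one value). It also shows one possible answer to the repeated-values question: with this approach a repeated path simply keeps the last value seen.

```python
import pandas as pd


def rows_to_nested_dict(rows):
    """Hypothetical helper: each row is (key1, ..., keyN, value), N >= 1."""
    result = {}
    for *keys, value in rows:
        node = result
        # Descend through the outer keys, creating dicts as needed.
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        # The innermost key maps to the value; a repeated path is overwritten.
        node[keys[-1]] = value
    return result


# The sample frame from the question.
df = pd.DataFrame({
    "name": ["A", "A", "B", "C", "B", "A"],
    "v1": ["A1", "A2", "B1", "C1", "B2", "A2"],
    "v2": ["A11", "A12", "B12", "C11", "B21", "A21"],
    "v3": [1, 2, 3, 4, 5, 6],
})
# itertuples(index=False, name=None) yields plain tuples, one per row.
nested = rows_to_nested_dict(df.itertuples(index=False, name=None))

# Two rows sharing the path A -> A1: the later value wins.
overwritten = rows_to_nested_dict([("A", "A1", 1), ("A", "A1", 2)])
```

Because the helper only needs an iterable of row tuples, it works equally well on data that never went through pandas.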