Creating complex nested dictionaries from Pandas DataFrame


Problem description




I'm trying to find a generic way of creating (possibly deeply) nested dictionaries from a flat Pandas DataFrame instance.

Suppose I have the following DataFrame:

import pandas as pd

dat = pd.DataFrame({'name' : ['John', 'John', 'John', 'John', 'Henry', 'Henry'],
                    'age' : [24, 24, 24, 24, 31, 31],
                    'gender' : ['Male','Male','Male','Male','Male','Male'],
                    'study' : ['Mathematics', 'Mathematics', 'Mathematics', 'Philosophy', 'Physics', 'Physics'],
                    'course' : ['Calculus 101', 'Calculus 101', 'Calculus 102', 'Aristotelean Ethics', 'Quantum mechanics', 'Quantum mechanics'],
                    'test' : ['Exam', 'Essay','Exam','Essay', 'Exam1','Exam2'],
                    'pass' : [True, True, True, True, True, True],
                    'grade' : ['A', 'A', 'B', 'A', 'C', 'C']})
dat = dat[['name', 'age', 'gender', 'study', 'course', 'test', 'grade', 'pass']] #re-order columns to better reflect data structure

I want to create a deeply nested dictionary (or list of nested dictionaries), that 'respects' the underlying structure of this data. That is, a grade is information about a test, which is part of a course, which is part of a study, that a person does. Also, age and gender are information about that same person.

An example desired output is this:

[{'John': {'age': 24,
           'gender': 'Male',
           'study': {'Mathematics': {'Calculus 101': {'Exam': {'grade': 'B',
                                                               'pass': True}}},
                     'Philosophy': {'Aristotelean Ethics': {'Essay': {'grade': 'A',
                                                                      'pass': True}}}}}},
 {'Henry': {'age': 31,
            'gender': 'Male',
            'study': {'Physics': {'Quantum mechanics': {'Exam1': {'grade': 'C',
                                                                  'pass': True},
                                                        'Exam2': {'grade': 'C',
                                                                  'pass': True}}}}}}]

(although there may be other, similar ways to structure such data).

I tried using groupby, which makes it easy to, for example, nest 'grade' and 'pass' under 'test', 'test' under 'course', 'course' under 'study', and 'study' under 'name'. But then I don't see how to add 'gender' and 'age' under 'name' as well. Something like this is the best I came up with:

dic = {}
# Select the value columns with a list; a bare tuple of column names
# is not accepted by current pandas versions.
for ind, row in dat.groupby(['name', 'study', 'course', 'test'])[['grade', 'pass']]:

    # this is ugly and not very generic, but just as an example
    if ind[0] not in dic:
        dic[ind[0]] = {}
    if ind[1] not in dic[ind[0]]:
        dic[ind[0]][ind[1]] = {}
    if ind[2] not in dic[ind[0]][ind[1]]:
        dic[ind[0]][ind[1]][ind[2]] = {}
    if ind[3] not in dic[ind[0]][ind[1]][ind[2]]:
        dic[ind[0]][ind[1]][ind[2]][ind[3]] = {}

    dic[ind[0]][ind[1]][ind[2]][ind[3]]['grade'] = row['grade'].values[0]
    dic[ind[0]][ind[1]][ind[2]][ind[3]]['pass'] = row['pass'].values[0]

But in this case, 'age' and 'gender' are not nested under 'name'. I can't seem to wrap my head around how to do this...
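As a point of comparison, here is a minimal sketch of how a plain row loop could carry 'age' and 'gender' along with each name. It uses `dict.setdefault` instead of the repeated membership checks and hard-codes the column roles, so it is an illustration under those assumptions rather than a generic solution. The variable names `result`, `person` and `course` are made up for this example.

```python
import pandas as pd

dat = pd.DataFrame({
    'name': ['John', 'John', 'John', 'John', 'Henry', 'Henry'],
    'age': [24, 24, 24, 24, 31, 31],
    'gender': ['Male'] * 6,
    'study': ['Mathematics', 'Mathematics', 'Mathematics',
              'Philosophy', 'Physics', 'Physics'],
    'course': ['Calculus 101', 'Calculus 101', 'Calculus 102',
               'Aristotelean Ethics', 'Quantum mechanics', 'Quantum mechanics'],
    'test': ['Exam', 'Essay', 'Exam', 'Essay', 'Exam1', 'Exam2'],
    'pass': [True] * 6,
    'grade': ['A', 'A', 'B', 'A', 'C', 'C'],
})

result = {}
# to_dict('records') yields one plain dict per row, which avoids
# attribute access on the column named 'pass' (a Python keyword).
for row in dat.to_dict('records'):
    # Person level: scalar attributes sit next to the nested 'study' dict.
    person = result.setdefault(
        row['name'],
        {'age': row['age'], 'gender': row['gender'], 'study': {}},
    )
    # Study -> course levels are created on first sight.
    course = person['study'].setdefault(row['study'], {}).setdefault(row['course'], {})
    # Test level holds the leaf values.
    course[row['test']] = {'grade': row['grade'], 'pass': row['pass']}
```

The person-level attributes are taken from the first row seen for each name, which assumes they are constant per person, as they are in this data.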

Another option is to set a MultiIndex and make a .to_dict('index') call. But then again, I don't see how I can nest both dicts and non-dicts under a single key...
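To make the MultiIndex route concrete, the sketch below shows what `.to_dict('index')` returns for a reduced version of the data (three rows, no 'age' or 'gender'), and how the tuple keys could be unfolded into nested dicts. The helper `nest` is a hypothetical name introduced here. It nests every index level uniformly, which is exactly why scalar attributes such as 'age' have no natural place in this approach.

```python
import pandas as pd

# Reduced data: only the nesting columns and the leaf values.
dat = pd.DataFrame({
    'name': ['John', 'John', 'Henry'],
    'study': ['Mathematics', 'Philosophy', 'Physics'],
    'course': ['Calculus 101', 'Aristotelean Ethics', 'Quantum mechanics'],
    'test': ['Exam', 'Essay', 'Exam1'],
    'grade': ['A', 'A', 'C'],
    'pass': [True, True, True],
})

# With a MultiIndex, each key of the resulting dict is a tuple of index values.
flat = dat.set_index(['name', 'study', 'course', 'test']).to_dict('index')


def nest(flat):
    """Unfold {(k1, k2, ...): leaf} into {k1: {k2: {...: leaf}}}."""
    nested = {}
    for keys, leaf in flat.items():
        node = nested
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = leaf
    return nested


nested = nest(flat)
```

Note that `.to_dict('index')` requires the index to be unique, so every combination of name, study, course and test may appear only once.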

My question is similar to this one: Convert pandas DataFrame to a nested dict, but I'm looking for a more complex nesting (e.g., not just one last column which should be nested under all other columns). Most other questions on Stack Overflow ask for the reverse: creating a (possibly MultiIndex) DataFrame from a deeply nested dictionary.

Edit: The question is also similar to this question: Pandas convert Dataframe to Nested Json, but in that question, only the last column (e.g., column n) should be nested under all other columns (n-1, n-2 etc; fully recursive nesting). In my question, column n and n-1 should be nested under n-2, but column n-2 and n-3 should be nested under n-4 (thus, importantly, n-2 is not nested under n-3 but under n-4). The MultiIndex partial solution offered by Mohammad Yusuf Ghazi depicts the structure nicely.

Solution

Not really concise, but it's the best I can get now:

>>> import pprint
>>> def rollup1(x):
...     return x.set_index('test')[['grade', 'pass']].to_dict(orient='index')
>>> def rollup2(x):
...     return x.groupby('course').apply(rollup1).to_dict()
>>> def rollup3(x):
...     return x.groupby('study').apply(rollup2).to_dict()

>>> df = dat.groupby(['name','age','gender']).apply(rollup3)
>>> df.name = 'study'
>>> res = df.reset_index(level=[1,2]).to_dict(orient='index')
>>> pprint.pprint(res)
{'Henry': {'age': 31L,
           'gender': 'Male',
           'study': {'Physics': {'Quantum mechanics': {'Exam1': {'grade': 'C',
                                                                 'pass': True},
                                                       'Exam2': {'grade': 'C',
                                                                 'pass': True}}}}},
 'John': {'age': 24L,
          'gender': 'Male',
          'study': {'Mathematics': {'Calculus 101': {'Essay': {'grade': 'A',
                                                               'pass': True},
                                                     'Exam': {'grade': 'A',
                                                              'pass': True}},
                                    'Calculus 102': {'Exam': {'grade': 'B',
                                                              'pass': True}}},
                    'Philosophy': {'Aristotelean Ethics': {'Essay': {'grade': 'A',
                                                                     'pass': True}}}}}}

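The idea is to roll the data up into dictionaries level by level, and to group by 'name', 'age' and 'gender' at the top so that the nested result becomes a 'study' column next to 'age' and 'gender'.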
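To illustrate what a single roll-up step produces, here is a small sketch on the rows of one study only. It applies the same `set_index(...).to_dict(orient='index')` call as `rollup1`, but iterates over the groups with a dict comprehension rather than `groupby(...).apply(...)`. That substitution is mine, chosen to keep the intermediate values easy to inspect. The frame `math_rows` is a hand-made subset of the data.

```python
import pandas as pd

# John's Mathematics rows, as rollup2 would receive them.
math_rows = pd.DataFrame({
    'course': ['Calculus 101', 'Calculus 101', 'Calculus 102'],
    'test': ['Exam', 'Essay', 'Exam'],
    'grade': ['A', 'A', 'B'],
    'pass': [True, True, True],
})

# Innermost step (rollup1): index by 'test' and turn each row into a dict.
# Next step (rollup2): one such dict per course.
courses = {
    course: rows.set_index('test')[['grade', 'pass']].to_dict(orient='index')
    for course, rows in math_rows.groupby('course')
}
```

Each further level wraps the result of the level below it in the same way, until the top-level grouping by 'name', 'age' and 'gender'.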

Update: I've tried to create a more generic solution, so that it also works for questions like this one:

def rollup_to_dict_core(x, values, columns, d_columns=None):
    if d_columns is None:
        d_columns = []

    if len(columns) == 1:
        if len(values) == 1:
            return x.set_index(columns)[values[0]].to_dict()
        else:
            return x.set_index(columns)[values].to_dict(orient='index')
    else:
        res = x.groupby([columns[0]] + d_columns).apply(lambda y: rollup_to_dict_core(y, values, columns[1:]))
        if len(d_columns) == 0:
            return res.to_dict()
        else:
            res.name = columns[1]
            # reset_index expects a list or tuple of levels, so materialize the range
            res = res.reset_index(level=list(range(1, len(d_columns) + 1)))
            return res.to_dict(orient='index')

def rollup_to_dict(x, values, d_columns=None):
    if d_columns is None:
        d_columns = []

    columns = [c for c in x.columns if c not in values and c not in d_columns]
    return rollup_to_dict_core(x, values, columns, d_columns)

>>> from pprint import pprint
>>> pprint(rollup_to_dict(dat, ['pass', 'grade'], ['age','gender']))
{'Henry': {'age': 31L,
           'gender': 'Male',
           'study': {'Physics': {'Quantum mechanics': {'Exam1': {'grade': 'C',
                                                                 'pass': True},
                                                       'Exam2': {'grade': 'C',
                                                                 'pass': True}}}}},
 'John': {'age': 24L,
          'gender': 'Male',
          'study': {'Mathematics': {'Calculus 101': {'Essay': {'grade': 'A',
                                                               'pass': True},
                                                     'Exam': {'grade': 'A',
                                                              'pass': True}},
                                    'Calculus 102': {'Exam': {'grade': 'B',
                                                              'pass': True}}},
                    'Philosophy': {'Aristotelean Ethics': {'Essay': {'grade': 'A',
                                                                     'pass': True}}}}}}
