如何在Python Pivot_table中使用agg_func ='All' [英] How to use agg_func = 'All' in Python Pivot_table

查看:675
本文介绍了如何在Python Pivot_table中使用agg_func ='All'的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个如下所示的输入数据框

I have an input dataframe like given below

df = pd.DataFrame({'person_id' :[1,1,1,2,2,2,2,2,2],'level_1': ['L1FR','L1Date','L1value','L1FR','L1Date','L1value','L2FR','L2Date','L2value'], 'val3':['Fasting','11/4/2005',1.33,'Random','18/1/2007',4.63,'Fasting','18/1/2017',8.63]})

如下图所示

我想将输出转换为如下图所示

I would like to convert my output to look like as shown below

这是我根据SO论坛的回复尝试的

This is what I tried based on response from SO forum

g = df.level_1.str[-2:]
df.pivot_table(index='person_id', columns=g, values='val3',aggfunc='first')

这仅提供了每个组(或每个人)的第一条记录,如下所示,它与我的预期输出非常接近.基本上,我正在尝试进行某种变换/重新排列输入数据的显示方式

This provides only the first record of each group (or person) like shown below which is very close to my expected output.Basically I am trying to do sort of transformation/rearrange the way input data is shown

但是我想拥有所有记录.在真实数据中,我有超过10万条记录,每个人可以重复多次.

but I want to have all the records. In real data, I have more than 100k records and each person can be repeated multiple times.

当我使用g.groupby(g).cumcount()作为索引时,我得到了一半记录的不适用值,我想避免使用它.我不知道为什么会发生这个问题.

When I use g.groupby(g).cumcount() as index, I get NA's for half of the records and I would like to avoid it. I don't know why this issue happens.

任何其他解决此问题的方法也很有帮助.你能帮我得到这个输出吗?

Any other way to approach this is also helpful. Can you help me get this output?

推荐答案

您已经很接近拥有您想要的东西了.

You're pretty close to have what you want.

这里的窍门是在第二级添加另一个索引,如下所示:

The trick here is to add another index to your second level like this :

df = pd.DataFrame({'person_id' :[1,1,1,2,2,2,2,2,2],'level_1': ['L1FR','L1Date','L1value','L1FR','L1Date','L1value','L2FR','L2Date','L2value'], 'val3':['Fasting','11/4/2005',1.33,'Random','18/1/2007',4.63,'Fasting','18/1/2017',8.63]})
g = df.level_1.str[-2:]
# Extracting level's number
df['lvl'] = df.level_1.apply(lambda x: int(''.join(filter(str.isdigit, x))))
# Then you pivot with person_id and lvl
df = df.pivot_table(index=['person_id', 'lvl'], columns=g, values='val3', aggfunc='first')

输出应为:

            level_1     FR          te          ue
person_id   lvl             
        1   1       Fasting     11/4/2005   1.33
        2   1           Random      18/1/2007   4.63
            2           Fasting     18/1/2017   8.63

然后,如果您像这样重置1级索引:

Then if you reset the level 1 index like this :

df.reset_index(level=1).drop("lvl", axis=1)

输出为:

level_1     FR          te          ue
person_id           
1           Fasting     11/4/2005   1.33
2           Random      18/1/2007   4.63
2           Fasting     18/1/2017   8.63

然后去!

这篇关于如何在Python Pivot_table中使用agg_func ='All'的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆