Applying a user-defined function to each subgroup of a groupby in Pandas


Problem description

I've been working with pandas a little bit now, but I'm really getting my feet wet in the group by function.

I have the following function defined, which ultimately sorts and assigns values to new columns R, F, M, and RFM:

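def get_rfm(dataframe):
    # DataFrame.sort no longer exists in later pandas releases; sort_values replaces it.
    # get_var is the asker's own helper (not shown); it presumably assigns values to the column.
    dfr = dataframe.sort_values('last_order_date', ascending=True)
    get_var(dfr.R)

    dff = dfr.sort_values('number_of_orders', ascending=True)
    get_var(dff.F)

    dfm = dff.sort_values('total_price', ascending=True)
    get_var(dfm.M)

    # Combine the three scores into a single RFM column.
    dfm['RFM'] = dfm['R'] + dfm['M'] + dfm['F']
    dfrfm = dfm.sort_values('RFM', ascending=True)
    dfrfm.info()  # info() prints its summary itself and returns None
    return dfrfm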

I run this function on my pandas dataframe, and get what looks like the expected results. I return it into a new df, which I then run some statistics on.

What I now want to do is run a group by function on the dataframe, grouping them by one of the other columns, and perform this analysis on the subgroup. I try

df.groupby('size_of_business').apply(get_rfm)

But the results are not what I expected. I get back a DataFrame that appears to have a MultiIndex:

<class 'pandas.core.frame.DataFrame'>
MultiIndex: 57196 entries, ( Did Not Answer, 67103) to (More than 10 people, 5617)
Data columns (total 11 columns):

which is then followed by the list of columns. The first level of the MultiIndex appears to hold the names I grouped the DataFrame by, and the second level looks like the original index.

I thought apply treated each group as a sub-DataFrame, which I could manipulate and then return. I believe my understanding of the structure is flawed, and I've had trouble finding anything to help me correct it.

Solution

You can use as_index=False:

df.groupby('size_of_business', as_index=False)
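
As a rough sketch of what is going on: apply does pass each group to the function as a sub-DataFrame, and then concatenates the returned pieces, putting the group key on as an outer index level. As far as I know, as_index mainly affects aggregated output, while group_keys is the groupby argument that controls whether apply adds the key to the index, so group_keys=False may be closer to what the question is after. The toy data and the sort_by_price function below are invented for illustration (sort_by_price is only a stand-in for get_rfm), and exact behaviour can differ between pandas versions.

```python
import pandas as pd

# Toy data; column names follow the question, the values are invented.
df = pd.DataFrame(
    {
        "size_of_business": ["small", "large", "small", "large"],
        "total_price": [30.0, 10.0, 20.0, 40.0],
    },
    index=[101, 102, 103, 104],
)


def sort_by_price(group):
    # Hypothetical stand-in for get_rfm: receives one sub-DataFrame per group.
    return group.sort_values("total_price")


# Select the columns to pass along so the sketch does not depend on how
# a given pandas version treats the grouping column inside apply.
cols = ["total_price"]

# group_keys=True: the group key becomes the outer level of a MultiIndex.
nested = df.groupby("size_of_business", group_keys=True)[cols].apply(sort_by_price)

# group_keys=False: only the original row index is kept.
flat = df.groupby("size_of_business", group_keys=False)[cols].apply(sort_by_price)

# With the nested result, one subgroup can still be pulled out by its key.
small_only = nested.loc["small"]

print(nested)
print(flat)
print(small_only)
```

Either form contains the same rows; the MultiIndex version is convenient when you want to look up a subgroup by name afterwards, and the flat version is convenient when you want the result to line up with the original index.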
