Applying a user-defined function to each subgroup of a Group By in Pandas

Views: 139 | Published: 2018/5/30 14:28:37 | Tags: python, group-by, pandas


Problem description

I've been working with pandas for a little while now, but I'm only just getting my feet wet with the groupby function.

I have the following function defined, which ultimately sorts and assigns values to new columns R, F, M, and RFM:
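
    def get_rfm(dataframe):
        # get_var is the asker's own helper; its definition is not shown.
        # DataFrame.sort was removed from pandas; sort_values replaces it.
        dfr = dataframe.sort_values('last_order_date', ascending=True)
        get_var(dfr.R)

        dff = dfr.sort_values('number_of_orders', ascending=True)
        get_var(dff.F)

        dfm = dff.sort_values('total_price', ascending=True)
        get_var(dfm.M)

        # Combined score: the sum of the three component scores.
        dfm['RFM'] = dfm['R'] + dfm['M'] + dfm['F']
        dfrfm = dfm.sort_values('RFM', ascending=True)
        dfrfm.info()  # info() prints its summary itself and returns None
        return dfrfm
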
I run this function on my pandas DataFrame and get what look like the expected results. I assign the returned value to a new DataFrame, on which I then run some statistics.

What I now want to do is group the DataFrame by one of the other columns and perform this analysis on each subgroup. I try

    df.groupby('size_of_business').apply(get_rfm)

but the results are not what I expected. I get back a DataFrame that appears to have a MultiIndex:

    <class 'pandas.core.frame.DataFrame'>
    MultiIndex: 57196 entries, ( Did Not Answer, 67103) to (More than 10 people, 5617)
    Data columns (total 11 columns):

This is followed by the list of columns. The first level of the MultiIndex appears to be the values I grouped the DataFrame by, and the second looks like the original index.

I thought apply treated each group as a sub-DataFrame, which I could manipulate and then return. I believe my understanding of the structure is flawed, and I've had trouble finding anything to help me correct it.

Solution

You can use as_index=False:

    df.groupby('size_of_business', as_index=False)
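
As a supplementary sketch (not part of the original answer), the snippet below shows where the MultiIndex in the question comes from and two ways to get a flat index back. apply does hand each group to the function as an ordinary DataFrame. It then stitches the returned pieces together and, by default, prepends the group label to each piece's index. The toy data, the `sort_group` stand-in for `get_rfm`, and the `cols` selection are all assumptions made for illustration. Selecting the columns explicitly is my choice, because recent pandas versions have changed how apply treats the grouping column. `group_keys` is a documented `groupby` parameter. The answer's `as_index=False` is quoted as given and is not exercised here.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the question.
df = pd.DataFrame(
    {
        "size_of_business": ["small", "large", "small", "large"],
        "total_price": [30.0, 40.0, 20.0, 10.0],
    },
    index=[101, 102, 103, 104],
)


def sort_group(group):
    # Stand-in for get_rfm: receives one group's rows as a DataFrame.
    return group.sort_values("total_price", ascending=True)


# Columns handed to the function (the grouping column is left out on purpose).
cols = ["total_price"]

# Default behaviour: the group label is prepended to each piece's index,
# which produces a two-level MultiIndex like the one in the question.
nested = df.groupby("size_of_business")[cols].apply(sort_group)

# Option 1: group_keys=False asks apply not to prepend the group labels.
flat = df.groupby("size_of_business", group_keys=False)[cols].apply(sort_group)

# Option 2: keep the default, then move the outer index level into a column.
labelled = nested.reset_index(level=0)

print(nested.index.nlevels, flat.index.nlevels)
print(labelled.columns.tolist())
```

Option 2 keeps the group label available as a regular column, which may be convenient if statistics are run per group afterwards. Option 1 is shorter when only the original index is needed.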


                        