Applying a user defined function to each subgroup of Group By in Pandas
Problem description
I've been working with pandas a little bit now, but I'm really getting my feet wet in the group by function.
I have the following function defined, which ultimately sorts and assigns values to new columns R, F, M, and RFM:
def get_rfm(dataframe):
    dfr = dataframe.sort('last_order_date', ascending=True)
    get_var(dfr.R)
    dff = dfr.sort('number_of_orders', ascending=True)
    get_var(dff.F)
    dfm = dff.sort('total_price', ascending=True)
    get_var(dfm.M)
    dfm.RFM[:] = dfm['R'] + dfm['M'] + dfm['F']
    dfrfm = dfm.sort('RFM', ascending=True)
    print(dfrfm.info())
    return dfrfm
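For anyone who wants to run something similar: get_var is not shown in the question, so the sketch below swaps in a hypothetical rank-based helper called score (my assumption, not the asker's code). It also uses sort_values, since newer pandas releases replaced DataFrame.sort with it. The sample data is made up for illustration.

```python
import pandas as pd


def score(series, bins=5):
    """Hypothetical stand-in for get_var: rank values and map them to 1..bins.

    Assumes the series has no missing values.
    """
    ranks = series.rank(method="first").astype(int)  # 1..n, ties broken by order
    return (ranks - 1) * bins // len(series) + 1


def get_rfm(dataframe):
    out = dataframe.copy()  # leave the caller's frame untouched
    out["R"] = score(out["last_order_date"])
    out["F"] = score(out["number_of_orders"])
    out["M"] = score(out["total_price"])
    out["RFM"] = out["R"] + out["F"] + out["M"]
    return out.sort_values("RFM", ascending=True)


# Made-up sample data, only to exercise the function.
orders = pd.DataFrame(
    {
        "last_order_date": pd.to_datetime(
            ["2014-01-05", "2014-03-01", "2013-12-24", "2014-02-10", "2014-01-20"]
        ),
        "number_of_orders": [3, 10, 1, 7, 5],
        "total_price": [120.0, 900.0, 15.0, 430.0, 260.0],
    }
)

scored = get_rfm(orders)
print(scored)
```

Working on a copy and assigning whole columns avoids the chained assignment in dfm.RFM[:] = ..., which can silently write to a temporary object.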
I run this function on my pandas dataframe, and get what looks like the expected results. I return it into a new df, which I then run some statistics on.
What I now want to do is run a group by function on the dataframe, grouping them by one of the other columns, and perform this analysis on the subgroup. I try
df.groupby('size_of_business').apply(get_rfm)
But the results are not what I expected. I get back a DataFrame that seems to have a MultiIndex:
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 57196 entries, (Did Not Answer, 67103) to (More than 10 people, 5617)
Data columns (total 11 columns):
which is then followed by the list of columns. The first level of the MultiIndex appears to be the names I grouped the DataFrame by, followed by what looks like the original index.
I thought apply treated each group as a sub-DataFrame, which I can then manipulate and return. I believe my understanding of the structure is flawed, and I've had trouble finding anything to help correct myself.
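A small sketch of what seems to be going on, using made-up data that borrows the group labels and index values from the output above. The helper sort_group and the column selection are my own assumptions for illustration. Each group does arrive as an ordinary DataFrame; the extra index level appears when apply glues the returned pieces back together and labels each piece with its group key.

```python
import pandas as pd

# Made-up frame; the index values mimic the ones in the question's output.
df = pd.DataFrame(
    {
        "size_of_business": [
            "Did Not Answer",
            "More than 10 people",
            "Did Not Answer",
            "More than 10 people",
        ],
        "total_price": [30.0, 40.0, 20.0, 10.0],
    },
    index=[67103, 5617, 101, 202],
)

# Iterating over the groupby shows that every group is a plain sub-DataFrame.
pieces = {key: group for key, group in df.groupby("size_of_business")}
print(type(pieces["Did Not Answer"]).__name__, list(pieces["Did Not Answer"].index))


def sort_group(group):
    # Hypothetical per-group function: reorder the rows of one group.
    return group.sort_values("total_price")


# apply concatenates the returned frames and prefixes each with its group key,
# so the result is indexed by (group key, original row label).
result = df.groupby("size_of_business")[["total_price"]].apply(sort_group)
print(result.index.tolist())

# One group's rows can be pulled back out by its key.
print(result.loc["Did Not Answer"])
```

So the understanding that apply works on sub-DataFrames is right; the MultiIndex is just how the combined result records which group each row came from.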
Solution
You can use as_index=False:
df.groupby('size_of_business', as_index=False)
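The line above stops before the apply call, so here is a hedged sketch of the full chain on made-up data, with sort_group as a hypothetical stand-in for get_rfm. As far as I can tell, as_index=False on its own may still leave a two-level index after apply (with integer group positions in place of the group names), so two alternatives that return the original row labels are shown as well: group_keys=False, and dropping the outer level afterwards.

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame(
    {
        "size_of_business": [
            "Did Not Answer",
            "More than 10 people",
            "Did Not Answer",
            "More than 10 people",
        ],
        "total_price": [30.0, 40.0, 20.0, 10.0],
    },
    index=[67103, 5617, 101, 202],
)


def sort_group(group):
    # Hypothetical stand-in for get_rfm: returns the group with rows reordered.
    return group.sort_values("total_price")


cols = ["total_price"]

# The suggestion from the answer, completed with apply. Depending on the pandas
# version this may still carry an outer level of integer group positions.
positional = df.groupby("size_of_business", as_index=False)[cols].apply(sort_group)
print(positional.index.nlevels)

# Alternative 1: ask apply not to add the group keys to the index at all.
flat = df.groupby("size_of_business", group_keys=False)[cols].apply(sort_group)

# Alternative 2: keep the default behaviour, then drop the outer level.
dropped = (
    df.groupby("size_of_business")[cols]
    .apply(sort_group)
    .reset_index(level=0, drop=True)
)

print(flat.index.tolist())
print(dropped.index.tolist())
```

Selecting the value columns before apply keeps the grouping column out of the function's input, which sidesteps version differences in whether pandas passes that column to the function.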