Pandas GroupBy.apply method duplicates first group


Question

My first SO question: I am confused by the behavior of the apply method of groupby in pandas (0.12.0-4). It appears to apply the function twice to the first row of the data frame. For example:

>>> import pandas as pd
>>> df = pd.DataFrame({'class': ['A', 'B', 'C'], 'count': [1, 0, 2]})
>>> print(df)
  class  count
0     A      1
1     B      0
2     C      2

I first check that the groupby function works ok, and it seems to be fine:

>>> for group in df.groupby('class', group_keys=True):
...     print(group)
...
('A',   class  count
0     A      1)
('B',   class  count
1     B      0)
('C',   class  count
2     C      2)

Then I try to do something similar using apply on the groupby object and I get the first row output twice:

>>> def checkit(group):
...     print(group)
...
>>> df.groupby('class', group_keys=True).apply(checkit)
  class  count
0     A      1
  class  count
0     A      1
  class  count
1     B      0
  class  count
2     C      2

Any help would be appreciated! Thanks.

@Jeff provides the answer below. I am dense and did not understand it immediately, so here is a simple example to show that despite the double printout of the first group in the example above, the apply method operates only once on the first group and does not mutate the original data frame:

>>> def addone(group):
...     group['count'] += 1
...     return group
...
>>> df.groupby('class', group_keys=True).apply(addone)
>>> print(df)
  class  count
0     A      1
1     B      0
2     C      2

But by assigning the return of the method to a new object, we see that it works as expected:

>>> df2 = df.groupby('class', group_keys=True).apply(addone)
>>> print(df2)
  class  count
0     A      2
1     B      1
2     C      3

Recommended answer

This is by design, as described here.

The apply function needs to know the shape of the returned data to intelligently figure out how it will be combined. To do this, it calls the function (checkit in your case) twice on the first group.
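One way to observe this without relying on printed output is to record each call in a list. The following is a minimal sketch, not part of the original answer: the `calls` list and the scalar return value are illustrative assumptions. On old pandas versions such as the 0.12 used in the question, the first group's row labels are expected to appear twice. Newer pandas releases may call the function only once per group, so the count can differ by version.

```python
import pandas as pd

df = pd.DataFrame({'class': ['A', 'B', 'C'], 'count': [1, 0, 2]})

calls = []  # side effect (illustrative): remember which rows each call received

def checkit(group):
    # Record the row labels of the group instead of printing it.
    calls.append(group.index.tolist())
    return len(group)

result = df.groupby('class').apply(checkit)

print(calls)       # the first group's labels may appear twice on old pandas
print(len(calls))  # number of calls, not necessarily the number of groups
```

If `len(calls)` is larger than the number of groups, the extra entry is the repeated call on the first group; the combined `result` still contains one value per group.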

Depending on your actual use case, you can replace the call to apply with aggregate, transform or filter, as described in detail here. These functions require the return value to be a particular shape, and so don't call the function twice.
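As a rough illustration of the three alternatives, the sketch below applies each one to the `count` column of the question's data frame. The particular functions (a sum, an add-one lambda, a non-zero predicate) are assumptions chosen only to show the shape each method returns.

```python
import pandas as pd

df = pd.DataFrame({'class': ['A', 'B', 'C'], 'count': [1, 0, 2]})
grouped = df.groupby('class')['count']

# aggregate: reduces each group to a single value, indexed by group key
totals = grouped.aggregate('sum')

# transform: returns a result with the same index as the original rows
plus_one = grouped.transform(lambda s: s + 1)

# filter: keeps only the rows of groups for which the predicate is True
nonzero = grouped.filter(lambda s: s.sum() > 0)

print(totals)
print(plus_one)
print(nonzero)
```

Here `plus_one` plays the role of the `addone` function from the question, without modifying the group in place.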

However - if the function you are calling does not have side-effects, it most likely does not matter that the function is being called twice on the first value.
