Python pandas groupby object apply method duplicates first group


Problem description


My first SO question: I am confused by the behavior of the apply method of groupby in pandas (0.12.0-4). It appears to apply the function twice to the first row of a data frame. For example:

>>> from pandas import Series, DataFrame
>>> import pandas as pd
>>> df = pd.DataFrame({'class': ['A', 'B', 'C'], 'count':[1,0,2]})
>>> print(df)
   class  count  
0     A      1  
1     B      0    
2     C      2

I first check that the groupby function works ok, and it seems to be fine:

>>> for group in df.groupby('class', group_keys=True):
...     print(group)
('A',   class  count
0     A      1)
('B',   class  count
1     B      0)
('C',   class  count
2     C      2)

Then I try to do something similar using apply on the groupby object and I get the first row output twice:

>>> def checkit(group):
...     print(group)
...
>>> df.groupby('class', group_keys=True).apply(checkit)
  class  count
0     A      1
  class  count
0     A      1
  class  count
1     B      0
  class  count
2     C      2

Any help would be appreciated! Thanks.

Edit: @Jeff provides the answer below. I am dense and did not understand it immediately, so here is a simple example showing that, despite the double printout of the first group in the example above, the apply method operates only once on the first group and does not mutate the original data frame:

>>> def addone(group):
...     group['count'] += 1
...     return group

>>> df.groupby('class', group_keys = True).apply(addone)
>>> print(df)

  class  count
0     A      1
1     B      0
2     C      2

But by assigning the return of the method to a new object, we see that it works as expected:

>>> df2 = df.groupby('class', group_keys=True).apply(addone)
>>> print(df2)

  class  count
0     A      2
1     B      1
2     C      3

Solution

This is by design, as described here and here

The apply function needs to know the shape of the returned data to intelligently figure out how it will be combined. To do this, it calls the function (checkit in your case) twice on the first group.
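One way to observe this without reading printed output is to record each call. The following is only a minimal sketch using the data frame from the question; `seen` and `record` are hypothetical names introduced here for illustration. How many times the first group shows up depends on the installed pandas version, so no expected output is shown.

```python
import pandas as pd

df = pd.DataFrame({'class': ['A', 'B', 'C'], 'count': [1, 0, 2]})

# Hypothetical helper list: collects one entry per call of the function.
seen = []

def record(group):
    # Side effect: remember the 'count' value of the group that was passed in.
    # Each group in this example has exactly one row, so the value identifies it.
    seen.append(int(group['count'].iloc[0]))
    return group

df.groupby('class', group_keys=True).apply(record)

# If the first group was evaluated twice, its value appears twice at the front.
print(seen)
print(len(seen), "calls for", df['class'].nunique(), "groups")
```

If `len(seen)` is larger than the number of groups, the extra call is the shape-inference call described above.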

Depending on your actual use case, you can replace the call to apply with aggregate, transform or filter, as described in detail here. These functions require the return value to be a particular shape, and so don't call the function twice.
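As a rough illustration of those alternatives, here is a sketch that assumes the same data frame as in the question. The names `df2`, `totals` and `kept` are made up for this example, and selecting the `count` column before calling transform or agg is a choice made here, not something the original answer prescribes.

```python
import pandas as pd

df = pd.DataFrame({'class': ['A', 'B', 'C'], 'count': [1, 0, 2]})

# transform: returns a result aligned with the original rows,
# so it can be assigned straight back to a column.
df2 = df.copy()
df2['count'] = df.groupby('class')['count'].transform(lambda s: s + 1)
print(df2)

# aggregate: reduces each group to a single value.
totals = df.groupby('class')['count'].agg('sum')
print(totals)

# filter: keeps or drops whole groups based on a boolean result.
kept = df.groupby('class').filter(lambda g: g['count'].sum() > 0)
print(kept)
```

Here transform reproduces the effect of addone from the question, while agg and filter show the other two result shapes.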

However, if the function you are calling does not have side effects, it most likely does not matter that it is called twice on the first group.
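If you want to be sure the function has no side effects, one option is to work on a copy of the group. This is just a sketch under that assumption; `addone_pure` is a hypothetical variant of the addone function from the question.

```python
import pandas as pd

df = pd.DataFrame({'class': ['A', 'B', 'C'], 'count': [1, 0, 2]})

def addone_pure(group):
    # Work on a copy so the caller's data is never touched,
    # no matter how many times pandas calls this function.
    out = group.copy()
    out['count'] += 1
    return out

result = df.groupby('class', group_keys=True).apply(addone_pure)

print(result)
print(df)  # the original frame is unchanged
```

Because the function only depends on its input and returns a new object, an extra call on the first group cannot change the result.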
