Is the pandas 0.16.1 groupby().apply() method applying the function more than once to the same group?

Question

I have noticed that in some cases with pandas 0.16.1, the apply() function on groupby() is being applied more than once to one or more of the output groups. Here is a reproduction:

In [1]: 
from pandas import DataFrame, Series
# Eight rows in three groups: alpha (3 rows), beta (4 rows), gamma (1 row)
df2 = DataFrame({"a": ["alpha", "alpha", "alpha", "beta", "beta", "beta", "beta", "gamma"]})
df2["b"] = Series(range(len(df2)))
df2

Out [1]:
       a  b
0  alpha  0
1  alpha  1
2  alpha  2
3   beta  3
4   beta  4
5   beta  5
6   beta  6
7  gamma  7

In [2]: 
def my_func (df):
    print(df.index)

In [3]: 
df2.groupby("a").apply(my_func)

Out [3]:
Int64Index([0, 1, 2], dtype='int64')
Int64Index([0, 1, 2], dtype='int64')
Int64Index([3, 4, 5, 6], dtype='int64')
Int64Index([7], dtype='int64')

Notice the [0,1,2] index appearing twice in the output. This would seem to indicate that the function was applied to the alpha group twice.

This is not a huge issue, since it's good practice for these functions to be idempotent in the first place. However, if the functions are costly in terms of runtime (think big regression runs, etc.), it can be more of a problem.

Am I using the API incorrectly and/or misinterpreting this output, or is there a possible issue here?

Recommended answer

According to the documentation:

In the current implementation apply calls func twice on the first column/row to decide whether it can take a fast or slow code path.
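If the extra call on the first group is a concern because the function is expensive or has side effects, one possible workaround is to iterate over the groups explicitly instead of going through apply(). The sketch below is only an illustration under that assumption: the names `expensive` and `calls` are hypothetical, and the sum stands in for whatever costly computation is actually being run.

```python
import pandas as pd

df2 = pd.DataFrame({
    "a": ["alpha", "alpha", "alpha", "beta", "beta", "beta", "beta", "gamma"],
    "b": range(8),
})

calls = []  # hypothetical log of which group indexes the function has seen


def expensive(group):
    """Stand-in for a costly per-group computation (hypothetical)."""
    calls.append(list(group.index))
    return group["b"].sum()


# Iterating over the GroupBy object yields (name, sub-frame) pairs,
# so the function runs exactly once per group.
results = {name: expensive(group) for name, group in df2.groupby("a")}
summary = pd.Series(results)

print(calls)
print(summary)
```

Here each group's index is recorded exactly once, at the cost of assembling the result by hand rather than letting apply() combine it.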
