Is the pandas 0.16.1 groupby().apply() method applying the function more than once to the same group?
Problem description
I have noticed that in some cases with pandas 0.16.1, the function passed to apply() on a groupby() is being applied more than once to one or more of the output groups. Here is a reproduction:
In [1]:
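from pandas import DataFrame, Series
# Three groups of different sizes in column "a"; column "b" mirrors the row number.
df2 = DataFrame({"a": ["alpha", "alpha", "alpha", "beta", "beta", "beta", "beta", "gamma"]})
df2["b"] = Series(list(range(len(df2))))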
df2
Out [1]:
       a  b
0  alpha  0
1  alpha  1
2  alpha  2
3   beta  3
4   beta  4
5   beta  5
6   beta  6
7  gamma  7
In [2]:
def my_func(df):
    # Print the row labels of each group the function receives.
    print(df.index)
In [3]:
df2.groupby("a").apply(my_func)
Out [3]:
Int64Index([0, 1, 2], dtype='int64')
Int64Index([0, 1, 2], dtype='int64')
Int64Index([3, 4, 5, 6], dtype='int64')
Int64Index([7], dtype='int64')
Notice the [0,1,2] index appearing twice in the output. This would seem to indicate that the function was applied to the alpha group twice.
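To check this without relying on printed output, the calls can be counted per group. The following is a minimal sketch, not part of the original reproduction: the names `calls` and `counting_func` are hypothetical, and the counts it reports depend on the installed pandas version, so no expected output is shown.

```python
from collections import Counter

import pandas as pd

# Same data as in the reproduction above.
df2 = pd.DataFrame({"a": ["alpha"] * 3 + ["beta"] * 4 + ["gamma"]})
df2["b"] = list(range(len(df2)))

# Hypothetical helper state: maps each group's row labels to the
# number of times the function was invoked with that group.
calls = Counter()

def counting_func(group):
    # Use the group's row labels as a hashable key for counting.
    calls[tuple(group.index)] += 1
    return len(group)

result = df2.groupby("a").apply(counting_func)

# A count above 1 for any key means the function ran more than once on that group.
print(dict(calls))
```

If a key such as `(0, 1, 2)` shows a count of 2, the function was indeed invoked twice for the alpha group, while `result` still holds a single value per group.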
This is not a huge issue, since it's good practice for these functions to be idempotent in the first place. However, if the functions are costly in terms of runtime (think big regression runs, etc.), it can be more of a problem.
Am I using the API incorrectly and/or misinterpreting this output, or is there a possible issue here?