Pandas groupby/apply has different behaviour with int and string types
Question
I have the following dataframe
X Y
0 A 10
1 A 9
2 A 8
3 A 5
4 B 100
5 B 90
6 B 80
7 B 50
and two different functions that are very similar:
def func1(x):
    if x.iloc[0]['X'] == 'A':
        x['D'] = 1
    else:
        x['D'] = 0
    return x[['X', 'D']]

def func2(x):
    if x.iloc[0]['X'] == 'A':
        x['D'] = 'u'
    else:
        x['D'] = 'v'
    return x[['X', 'D']]
Now I can groupby/apply these functions:
df.groupby('X').apply(func1)
df.groupby('X').apply(func2)
The first line gives me what I want, i.e.
X D
0 A 1
1 A 1
2 A 1
3 A 1
4 B 0
5 B 0
6 B 0
7 B 0
But the second line returns something quite strange:
X D
0 A u
1 A u
2 A u
3 A u
4 A u
5 A u
6 A u
7 A u
So my questions are:
- Can anybody explain why the behavior of groupby/apply is different when the type changes?
- How can I get something similar with func2?
Recommended answer
The problem is simply that a function applied to a GroupBy should never try to modify the dataframe it receives. Whether that dataframe is a copy (which can safely be changed, although the changes will not appear in the original dataframe) or a view is implementation-dependent. pandas makes that choice internally, so as a user you should simply treat modifying it as forbidden.
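As a minimal sketch of the "copy" side of that distinction: the snippet below rebuilds the dataframe from the question, takes one group's rows by boolean selection (an assumption standing in for the piece a grouped function would receive), and adds the new column to an explicit copy only. The names group_a and safe are illustrative and do not come from the original post.

```python
import pandas as pd

# Dataframe from the question
df = pd.DataFrame({
    "X": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "Y": [10, 9, 8, 5, 100, 90, 80, 50],
})

# Stand-in for the sub-frame a grouped function would be handed (illustrative)
group_a = df[df["X"] == "A"]

# Work on an explicit copy, so neither group_a nor df is modified
safe = group_a.copy()
safe["D"] = "u"

print(safe[["X", "D"]])
print("D" in df.columns)  # False: the original dataframe is untouched
```

Because the assignment targets the copy, the result does not depend on whether the selected rows were a view or a copy of the original data.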
The correct way is to force a copy:
def func2(x):
    x = x.copy()
    if x.iloc[0]['X'] == 'A':
        x['D'] = 'u'
    else:
        x['D'] = 'v'
    return x[['X', 'D']]
Then df.groupby('X').apply(func2).reset_index(level=0, drop=True) gives the expected result:
X D
0 A u
1 A u
2 A u
3 A u
4 B v
5 B v
6 B v
7 B v
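When the new column depends only on the group key, as it does here, the same table can also be produced without groupby/apply at all. The sketch below is an alternative that is not part of the original answer: it maps each value of X to its label through a plain dictionary (the name labels is illustrative) and builds a new frame with assign, so nothing is modified in place.

```python
import pandas as pd

# Dataframe from the question
df = pd.DataFrame({
    "X": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "Y": [10, 9, 8, 5, 100, 90, 80, 50],
})

# Illustrative mapping from group key to the label func2 would assign
labels = {"A": "u", "B": "v"}

# assign() returns a new frame; df itself is left unchanged
result = df[["X"]].assign(D=df["X"].map(labels))

print(result)
```

This keeps the original index (0 to 7), so no reset_index call is needed afterwards.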