Pandas groupby/apply has different behaviour with int and string types
Question
I have the following dataframe
X Y
0 A 10
1 A 9
2 A 8
3 A 5
4 B 100
5 B 90
6 B 80
7 B 50
and two different functions that are very similar
def func1(x):
    if x.iloc[0]['X'] == 'A':
        x['D'] = 1
    else:
        x['D'] = 0
    return x[['X', 'D']]

def func2(x):
    if x.iloc[0]['X'] == 'A':
        x['D'] = 'u'
    else:
        x['D'] = 'v'
    return x[['X', 'D']]
Now I can groupby/apply these functions
df.groupby('X').apply(func1)
df.groupby('X').apply(func2)
The first line gives me what I want, i.e.
X D
0 A 1
1 A 1
2 A 1
3 A 1
4 B 0
5 B 0
6 B 0
7 B 0
But the second line returns something quite strange
X D
0 A u
1 A u
2 A u
3 A u
4 A u
5 A u
6 A u
7 A u
So my questions are:
- Can anybody explain why the behavior of groupby/apply is different when the type changes?
- How can I get something similar with func2?
Recommended answer
The problem is simply that a function applied to a GroupBy should never try to modify the dataframe it receives. Whether that object is a copy (which can safely be changed, although the changes will not be seen in the original dataframe) or a view is implementation dependent. The choice is made internally by pandas, so as a user you should simply treat in-place modification of the group as forbidden.
The correct way is to force a copy:
def func2(x):
    x = x.copy()
    if x.iloc[0]['X'] == 'A':
        x['D'] = 'u'
    else:
        x['D'] = 'v'
    return x[['X', 'D']]
After that, df.groupby('X').apply(func2).reset_index(level=0, drop=True) gives the expected result:
X D
0 A u
1 A u
2 A u
3 A u
4 B v
5 B v
6 B v
7 B v
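The same "do not modify the group you are handed" idea can also be sketched without an explicit copy. The snippet below is only an illustration, not part of the original answer: the name label_group is hypothetical, the dataframe is rebuilt from the table in the question, and the groups are iterated explicitly rather than passed through apply, because the way apply treats the grouping column has changed between pandas releases. It relies on DataFrame.assign, which returns a new object instead of writing into its input.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    "X": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "Y": [10, 9, 8, 5, 100, 90, 80, 50],
})


def label_group(group):
    """Hypothetical non-mutating counterpart of func2."""
    label = "u" if group["X"].iloc[0] == "A" else "v"
    # assign() returns a new DataFrame; the group passed in is left untouched.
    return group.assign(D=label)[["X", "D"]]


# Iterate over the groups explicitly and stitch the pieces back together.
pieces = [label_group(group) for _, group in df.groupby("X")]
result = pd.concat(pieces)

# Cross-check: when the label depends only on the group key,
# a plain Series.map over the key column yields the same values.
mapped = df["X"].map({"A": "u", "B": "v"})

print(result)
```

When the new column depends only on the group key, as it does here, the map line alone may be all that is needed; the per-group function is mainly useful when the value has to be computed from the rows of each group.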