Pandas transform() vs apply()
Problem description
I don't understand why apply and transform return different dtypes when called on the same data frame. The way I explained the two functions to myself before went something along the lines of "apply collapses the data, and transform does exactly the same thing as apply but preserves the original index and doesn't collapse." Consider the following.
import pandas as pd
df = pd.DataFrame({'id': [1,1,1,2,2,2,2,3,3,4],
                   'cat': [1,1,0,0,1,0,0,0,0,1]})
Let's identify those ids which have a nonzero entry in the cat column.
>>> df.groupby('id')['cat'].apply(lambda x: (x == 1).any())
id
1 True
2 True
3 False
4 True
Name: cat, dtype: bool
Great. If we wanted to create an indicator column, however, we could do the following.
>>> df.groupby('id')['cat'].transform(lambda x: (x == 1).any())
0 1
1 1
2 1
3 1
4 1
5 1
6 1
7 0
8 0
9 1
Name: cat, dtype: int64
I don't understand why the dtype is now int64 instead of the boolean returned by the any() function.
When I change the original data frame to contain some booleans (note that the zeros remain), the transform approach returns booleans in an object column. This is an extra mystery to me since all of the values are boolean, but it's listed as object, apparently to match the dtype of the original mixed-type column of integers and booleans.
df = pd.DataFrame({'id': [1,1,1,2,2,2,2,3,3,4],
                   'cat': [True,True,0,0,True,0,0,0,0,True]})
>>> df.groupby('id')['cat'].transform(lambda x: (x == 1).any())
0 True
1 True
2 True
3 True
4 True
5 True
6 True
7 False
8 False
9 True
Name: cat, dtype: object
However, when I use all booleans, the transform function returns a boolean column.
df = pd.DataFrame({'id': [1,1,1,2,2,2,2,3,3,4],
                   'cat': [True,True,False,False,True,False,False,False,False,True]})
>>> df.groupby('id')['cat'].transform(lambda x: (x == 1).any())
0 True
1 True
2 True
3 True
4 True
5 True
6 True
7 False
8 False
9 True
Name: cat, dtype: bool
Using my acute pattern-recognition skills, it appears that the dtype of the resulting column mirrors that of the original column. I would appreciate any hints about why this occurs or what's going on under the hood in the transform function. Cheers.
It looks like SeriesGroupBy.transform() tries to cast the result dtype to the same one as the original column has, but DataFrameGroupBy.transform() doesn't seem to do that:
In [139]: df.groupby('id')['cat'].transform(lambda x: (x == 1).any())
Out[139]:
0 1
1 1
2 1
3 1
4 1
5 1
6 1
7 0
8 0
9 1
Name: cat, dtype: int64
#                         v       v
In [140]: df.groupby('id')[['cat']].transform(lambda x: (x == 1).any())
Out[140]:
cat
0 True
1 True
2 True
3 True
4 True
5 True
6 True
7 False
8 False
9 True
In [141]: df.dtypes
Out[141]:
cat int64
id int64
dtype: object
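If the goal is simply a boolean indicator column, one way to sidestep the casting question is to make the grouped column boolean before calling transform, or to cast the result explicitly. The following is a minimal sketch under that assumption; the names has_cat and has_cat_cast are illustrative only, and the cast-back behaviour shown above may differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2, 2, 3, 3, 4],
                   'cat': [1, 1, 0, 0, 1, 0, 0, 0, 0, 1]})

# Option 1: build the boolean mask first, then broadcast the per-group
# reduction. The grouped Series is already bool, so there is no integer
# dtype for the result to be cast back to.
has_cat = df['cat'].eq(1).groupby(df['id']).transform('any')

# Option 2: keep the original lambda and cast the result explicitly.
has_cat_cast = (
    df.groupby('id')['cat']
      .transform(lambda x: (x == 1).any())
      .astype(bool)
)

# Either result is aligned with the original index and can be assigned
# directly as an indicator column.
df['has_cat'] = has_cat
print(df)
```

Both options return one value per original row, so the result lines up with df row by row, unlike the apply version, which returns one value per id.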