Pandas transform() vs apply()

Problem description

I don't understand why apply and transform return different dtypes when called on the same data frame. The way I explained the two functions to myself before went something along the lines of "apply collapses the data, and transform does exactly the same thing as apply but preserves the original index and doesn't collapse." Consider the following.

import pandas as pd

df = pd.DataFrame({'id': [1,1,1,2,2,2,2,3,3,4],
                   'cat': [1,1,0,0,1,0,0,0,0,1]})

Let's identify those ids which have a nonzero entry in the cat column.

>>> df.groupby('id')['cat'].apply(lambda x: (x == 1).any())
id
1     True
2     True
3    False
4     True
Name: cat, dtype: bool

Great. If we wanted to create an indicator column, however, we could do the following.

>>> df.groupby('id')['cat'].transform(lambda x: (x == 1).any())
0    1
1    1
2    1
3    1
4    1
5    1
6    1
7    0
8    0
9    1
Name: cat, dtype: int64

I don't understand why the dtype is now int64 instead of the boolean returned by the any() function.

When I change the original data frame to contain some booleans (note that the zeros remain), the transform approach returns booleans in an object column. This is an extra mystery to me since all of the values are boolean, but it's listed as object apparently to match the dtype of the original mixed-type column of integers and booleans.

df = pd.DataFrame({'id': [1,1,1,2,2,2,2,3,3,4],
                   'cat': [True,True,0,0,True,0,0,0,0,True]})

>>> df.groupby('id')['cat'].transform(lambda x: (x == 1).any())
0     True
1     True
2     True
3     True
4     True
5     True
6     True
7    False
8    False
9     True
Name: cat, dtype: object
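
As a small check of the "mixed-type column" reading above, the sketch below builds the same values as a standalone Series (the name `mixed` is only illustrative) and compares the dtype of the source column with the dtype of the elementwise comparison. It shows only what pandas infers for the inputs; it does not show what transform does internally.

```python
import pandas as pd

# Same values as the 'cat' column above: Python bools mixed with ints.
mixed = pd.Series([True, True, 0, 0, True, 0, 0, 0, 0, True], name='cat')

# A list mixing bool and int is stored as a generic object column.
print(mixed.dtype)

# The elementwise comparison itself yields a proper boolean Series.
comparison = mixed == 1
print(comparison.dtype)
```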

However, when I use all booleans, the transform function returns a boolean column.

df = pd.DataFrame({'id': [1,1,1,2,2,2,2,3,3,4],
                   'cat': [True,True,False,False,True,False,False,False,False,True]})

>>> df.groupby('id')['cat'].transform(lambda x: (x == 1).any())
0     True
1     True
2     True
3     True
4     True
5     True
6     True
7    False
8    False
9     True
Name: cat, dtype: bool

Using my acute pattern-recognition skills, it appears that the dtype of the resulting column mirrors that of the original column. I would appreciate any hints about why this occurs or what's going on under the hood in the transform function. Cheers.
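
The "collapse vs. preserve the index" part of that mental model can be checked independently of the dtype question. A minimal sketch, assuming the integer-valued df from the first example; the helper name `has_one` is introduced here for illustration only:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2, 2, 3, 3, 4],
                   'cat': [1, 1, 0, 0, 1, 0, 0, 0, 0, 1]})


def has_one(x):
    """Return True if any value in the group equals 1."""
    return (x == 1).any()


grouped = df.groupby('id')['cat']

collapsed = grouped.apply(has_one)      # one entry per group, indexed by id
broadcast = grouped.transform(has_one)  # one entry per original row

print(collapsed.index.tolist())          # group labels
print(broadcast.index.equals(df.index))  # same index as the original frame
print(len(collapsed), len(broadcast))
```

The dtype of `broadcast` is deliberately not printed here, since that is the behaviour under question and it may depend on the pandas version.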

Solution

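It looks like SeriesGroupBy.transform() tries to cast the result back to the dtype of the original column, whereas DataFrameGroupBy.transform() does not appear to do that. Note the double brackets in the second call, which select a one-column DataFrame rather than a Series: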

In [139]: df.groupby('id')['cat'].transform(lambda x: (x == 1).any())
Out[139]:
0    1
1    1
2    1
3    1
4    1
5    1
6    1
7    0
8    0
9    1
Name: cat, dtype: int64

#                         v       v
In [140]: df.groupby('id')[['cat']].transform(lambda x: (x == 1).any())
Out[140]:
     cat
0   True
1   True
2   True
3   True
4   True
5   True
6   True
7  False
8  False
9   True

In [141]: df.dtypes
Out[141]:
cat    int64
id     int64
dtype: object
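
If the goal is simply a boolean indicator column regardless of any such casting, two approaches that do not depend on it are sketched below. This is a hedged sketch rather than a description of pandas internals; the variable names `is_one`, `indicator` and `indicator_cast` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2, 2, 3, 3, 4],
                   'cat': [1, 1, 0, 0, 1, 0, 0, 0, 0, 1]})

# Option 1: make the column boolean *before* grouping, so that even a
# cast back to the input dtype would still produce booleans.
is_one = df['cat'].eq(1)
indicator = is_one.groupby(df['id']).transform('any')

# Option 2: keep the original lambda and cast the result explicitly.
indicator_cast = (
    df.groupby('id')['cat']
      .transform(lambda x: (x == 1).any())
      .astype(bool)
)

print(indicator.dtype, indicator_cast.dtype)
```

Option 1 also avoids calling a Python lambda once per group, which tends to matter on larger frames.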
