Pandas transform() vs apply()

Question

I don't understand why apply and transform return different dtypes when called on the same data frame. The way I explained the two functions to myself before went something along the lines of "apply collapses the data, and transform does exactly the same thing as apply but preserves the original index and doesn't collapse." Consider the following.

import pandas as pd

df = pd.DataFrame({'id': [1,1,1,2,2,2,2,3,3,4],
                   'cat': [1,1,0,0,1,0,0,0,0,1]})

Let's identify those ids which have a nonzero entry in the cat column.

>>> df.groupby('id')['cat'].apply(lambda x: (x == 1).any())
id
1     True
2     True
3    False
4     True
Name: cat, dtype: bool

Great. If we wanted to create an indicator column, however, we could do the following.

>>> df.groupby('id')['cat'].transform(lambda x: (x == 1).any())
0    1
1    1
2    1
3    1
4    1
5    1
6    1
7    0
8    0
9    1
Name: cat, dtype: int64

I don't understand why the dtype is now int64 instead of the boolean returned by the any() function.

When I change the original data frame to contain some booleans (note that the zeros remain), the transform approach returns booleans in an object column. This is an extra mystery to me since all of the values are boolean, but it's listed as object apparently to match the dtype of the original mixed-type column of integers and booleans.

df = pd.DataFrame({'id': [1,1,1,2,2,2,2,3,3,4],
                   'cat': [True,True,0,0,True,0,0,0,0,True]})

>>> df.groupby('id')['cat'].transform(lambda x: (x == 1).any())
0     True
1     True
2     True
3     True
4     True
5     True
6     True
7    False
8    False
9     True
Name: cat, dtype: object

However, when I use all booleans, the transform function returns a boolean column.

df = pd.DataFrame({'id': [1,1,1,2,2,2,2,3,3,4],
                   'cat': [True,True,False,False,True,False,False,False,False,True]})

>>> df.groupby('id')['cat'].transform(lambda x: (x == 1).any())
0     True
1     True
2     True
3     True
4     True
5     True
6     True
7    False
8    False
9     True
Name: cat, dtype: bool

Using my acute pattern-recognition skills, it appears that the dtype of the resulting column mirrors that of the original column. I would appreciate any hints about why this occurs or what's going on under the hood in the transform function. Cheers.

Recommended answer

It looks like SeriesGroupBy.transform() tries to cast the result back to the dtype of the original column, whereas DataFrameGroupBy.transform() doesn't seem to do that:

In [139]: df.groupby('id')['cat'].transform(lambda x: (x == 1).any())
Out[139]:
0    1
1    1
2    1
3    1
4    1
5    1
6    1
7    0
8    0
9    1
Name: cat, dtype: int64

#                         v       v
In [140]: df.groupby('id')[['cat']].transform(lambda x: (x == 1).any())
Out[140]:
     cat
0   True
1   True
2   True
3   True
4   True
5   True
6   True
7  False
8  False
9   True

In [141]: df.dtypes
Out[141]:
cat    int64
id     int64
dtype: object
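
If the goal is simply a boolean indicator column, one way to sidestep the cast is to do the comparison before grouping, so the column being transformed is already boolean, and then cast explicitly. This is only a sketch, not the library's prescribed approach: the column name has_one is made up for illustration, and whether the final astype(bool) is strictly needed may depend on the pandas version in use.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2, 2, 3, 3, 4],
                   'cat': [1, 1, 0, 0, 1, 0, 0, 0, 0, 1]})

# Compare first so the grouped column is already boolean, then broadcast
# the per-group result back onto the original index.
has_one = df['cat'].eq(1).groupby(df['id']).transform('any')

# Explicit cast: a no-op when the result is already bool, and a safeguard
# when the installed version hands back some other dtype.
# 'has_one' is a hypothetical column name chosen for this example.
df['has_one'] = has_one.astype(bool)
```

Because the dtype is pinned at the end, the indicator column does not depend on how transform() decides to cast its result.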
