pandas groupby transform custom function

Question

Is it possible to do a groupby transform with custom functions?

data = {
        'a':['a1','a2','a3','a4','a5'],
        'b':['b1','b1','b2','b2','b1'],
        'c':[55,44.2,33.3,-66.5,0],
        'd':[10,100,1000,10000,100000],
        }

import pandas as pd
df = pd.DataFrame.from_dict(data)

df['e'] = df.groupby(['b'])['c'].transform(sum) #this works as expected
print (df)
#    a   b     c       d     e
#0  a1  b1  55.0      10  99.2
#1  a2  b1  44.2     100  99.2
#2  a3  b2  33.3    1000 -33.2
#3  a4  b2 -66.5   10000 -33.2
#4  a5  b1   0.0  100000  99.2

def custom_calc(x, y):
    return (x * y)

#obviously wrong code here: custom_calc is called immediately,
#so transform receives a Series rather than a function
df['e'] = df.groupby(['b'])['c'].transform(custom_calc(df['c'], df['d']))

As the example above shows, I want to explore whether a custom function can be passed into .transform().

I am aware that .apply() exists, but I want to find out if it is possible to use .transform() exclusively.

More importantly, I want to understand how to formulate a proper function that can be passed into .transform() for it to apply correctly.

P.S. Currently, I know that built-in functions such as 'count', sum, and 'sum' work.

Recommended answer

One way I like to see what is happening is to create a small custom function that prints what is passed in and its type. Then you can see what you have to work with.

def f(x):
    print(type(x))
    print('\n')
    print(x)
    print(x.index)
    return df.loc[x.index,'d']*x

df['f'] = df.groupby('b')['c'].transform(f)
print(df)

#Output from print statements in function
<class 'pandas.core.series.Series'>


0    55.0
1    44.2
4     0.0
Name: b1, dtype: float64
Int64Index([0, 1, 4], dtype='int64')
<class 'pandas.core.series.Series'>


2    33.3
3   -66.5
Name: b2, dtype: float64
Int64Index([2, 3], dtype='int64')
#End output from print statements in custom function

    a   b     c       d     e         f
0  a1  b1  55.0      10  99.2     550.0
1  a2  b1  44.2     100  99.2    4420.0
2  a3  b2  33.3    1000 -33.2   33300.0
3  a4  b2 -66.5   10000 -33.2 -665000.0
4  a5  b1   0.0  100000  99.2       0.0

Here, I am transforming column 'c', but inside my custom function I make an "external" call to the dataframe object to get 'd'.
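
The print statements above show that the function receives each group's values as a Series. As a minimal sketch of what such a function can return, the example below assumes two hypothetical helpers, `demean` and `value_range` (neither is from the original answer): one returns a Series with the same length and index as the group, the other returns a single scalar that .transform() broadcasts to every row of the group.

```python
import pandas as pd

df = pd.DataFrame({
    'b': ['b1', 'b1', 'b2', 'b2', 'b1'],
    'c': [55, 44.2, 33.3, -66.5, 0],
})

def demean(x):
    # x is one group's 'c' values as a Series;
    # the result keeps the same length and index as x
    return x - x.mean()

def value_range(x):
    # Returns one scalar per group; transform repeats it
    # for every row that belongs to the group
    return x.max() - x.min()

grouped = df.groupby('b')['c']
df['c_demeaned'] = grouped.transform(demean)      # illustrative column name
df['c_range'] = grouped.transform(value_range)    # illustrative column name
print(df)
```

In both cases the result lines up with the original rows, which is what lets it be assigned straight back as a new column.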

You can also pass the "external" column name in as an argument, like this:

def f(x, col):
    return df.loc[x.index, col]*x

df['g'] = df.groupby('b')['c'].transform(f, col='d')

print(df)

Output:

    a   b     c       d     e         f         g
0  a1  b1  55.0      10  99.2     550.0     550.0
1  a2  b1  44.2     100  99.2    4420.0    4420.0
2  a3  b2  33.3    1000 -33.2   33300.0   33300.0
3  a4  b2 -66.5   10000 -33.2 -665000.0 -665000.0
4  a5  b1   0.0  100000  99.2       0.0       0.0
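
The functions above still read the global `df` from inside the function body. A possible variation, sketched here under the assumption that extra keyword arguments are forwarded to the function in the same way as `col='d'`, is to pass the dataframe itself as an argument. The names `multiply_by`, `frame`, `h`, and `direct` are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['a1', 'a2', 'a3', 'a4', 'a5'],
    'b': ['b1', 'b1', 'b2', 'b2', 'b1'],
    'c': [55, 44.2, 33.3, -66.5, 0],
    'd': [10, 100, 1000, 10000, 100000],
})

def multiply_by(x, frame, col):
    # x is one group's slice of 'c'; x.index selects the
    # matching rows of `frame`, so no global lookup is needed
    return frame.loc[x.index, col] * x

df['h'] = df.groupby('b')['c'].transform(multiply_by, frame=df, col='d')

# The same row-wise product computed without groupby, for comparison
df['direct'] = df['c'] * df['d']
print(df[['b', 'c', 'd', 'h', 'direct']])
```

Because this particular calculation is purely row-wise, the two columns should match; the groupby version only becomes necessary when the function also uses a per-group quantity such as the group mean.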
