Why does pandas apply calculate twice?


Question

I'm using the apply method on a pandas DataFrame object. When my DataFrame has a single column, the applied function appears to be called twice. Why does this happen, and can I stop that behavior?

Code:

import pandas as pd

def mul2(x):
    print('hello')
    return 2*x

df = pd.DataFrame({'a': [1,2,0.67,1.34]})
df.apply(mul2)

Output:

hello
hello

0  2.00
1  4.00
2  1.34
3  2.68

I'm printing 'hello' from within the function being applied. I know it is being called twice because 'hello' is printed twice. What's more, if I had two columns, 'hello' would print 3 times. And when I call apply on just the column, 'hello' prints 4 times.

Code:

df.a.apply(mul2)

Output:

hello
hello
hello
hello
0    2.00
1    4.00
2    1.34
3    2.68
Name: a, dtype: float64

Recommended answer

This behavior was fixed in pandas 1.1, please upgrade!

Now, apply and applymap on a DataFrame evaluate the first row/column only once.

Initially, GroupBy.apply and Series/df.apply evaluated the first group twice. The first group was evaluated twice because apply wants to know whether it can "optimize" the calculation (sometimes this is possible if apply receives a numpy or cythonized function). In pandas 0.25, this behavior was fixed for GroupBy.apply. In pandas 1.1, it was also fixed for df.apply.
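To check which behavior an installed pandas version has without relying on printed output, the calls can be counted instead. The following is a minimal sketch; the count_calls helper is a hypothetical name introduced here for illustration and is not part of pandas.

```python
import pandas as pd

def count_calls(func):
    """Hypothetical helper: wrap func and tally each call on wrapper.calls."""
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

@count_calls
def mul2(x):
    return 2 * x

df = pd.DataFrame({'a': [1, 2, 0.67, 1.34]})
result = df.apply(mul2)  # default axis=0: mul2 receives each column as a Series

# With one column, a count of 2 indicates the old double evaluation of the
# first column; a count of 1 indicates the fixed behavior.
print(pd.__version__, mul2.calls)
```

If the function has side effects that must not run twice (writing to a file, incrementing a counter), upgrading is the simplest remedy; on older versions the side effect would need to be moved out of the applied function.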

Old Behavior [pandas <= 1.0.X]

pd.__version__ 
# '1.0.4'

df.apply(mul2)
hello
hello

      a
0  2.00
1  4.00
2  1.34
3  2.68

New Behavior [pandas >= 1.1]

pd.__version__
# '1.1.0.dev0+2004.g8d10bfb6f'

df.apply(mul2)
hello

      a
0  2.00
1  4.00
2  1.34
3  2.68
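A note on the four prints from df.a.apply(mul2) in the question: Series.apply with a plain Python function calls it once per element, so four values produce four calls regardless of the fix described above. The sketch below contrasts that with calling the function on the whole Series once; the variable names are illustrative only.

```python
import pandas as pd

calls = 0

def mul2(x):
    global calls
    calls += 1  # count invocations instead of printing
    return 2 * x

s = pd.Series([1, 2, 0.67, 1.34], name='a')

elementwise = s.apply(mul2)  # mul2 receives one scalar per element
n_elementwise = calls

calls = 0
vectorized = mul2(s)         # mul2 receives the whole Series in a single call
n_vectorized = calls

print(n_elementwise, n_vectorized)  # 4 1
```

For simple arithmetic like this, the vectorized call gives the same values with a single invocation.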
