pandas apply: difference if function name is in quotes or not

Question

Example of a simple dataframe definition:

import pandas as pd

df = pd.DataFrame({'A': [2, 4, 1], 'B': [8, 4, 1], 'C': [6, 2, 7]})
df

    A   B   C
0   2   8   6
1   4   4   2
2   1   1   7

I am trying to understand the difference between the two ways of passing the function argument below:

df.apply(sum)

df.apply('sum')

Both seem to give the same correct result:

A     7
B    13
C    15
dtype: int64

I understand that for this simple example I could have called the DataFrame sum() method directly, but the question came from more convoluted code.

Recommended answer

According to the documentation, DataFrame.apply() only accepts a function as its first argument. The source code of pandas.core.apply, however, shows that the following happens in the method FrameApply.get_result:

# string dispatch
if isinstance(self.f, str):
    # Support for `frame.transform('method')`
    # Some methods (shift, etc.) require the axis argument, others
    # don't, so inspect and insert if necessary.
    func = getattr(self.obj, self.f)
    sig = inspect.getfullargspec(func)
    if "axis" in sig.args:
        self.kwds["axis"] = self.axis
    return func(*self.args, **self.kwds)

Here self.f is the argument passed to DataFrame.apply (normally a function, but in your case a string) and self.obj is the DataFrame. The interesting part is the line

func = getattr(self.obj, self.f)

That means that if you execute df.apply("function_name"), the variable func is set to df.function_name (that is how getattr works). The remaining lines of the source code above aren't relevant to your question; they just finalize the execution of apply by filling in additional keyword arguments such as axis.
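The lookup described above can be reproduced outside of pandas internals. The following is only a minimal sketch using the dataframe from the question; the variable names via_getattr, via_apply and row_sums are introduced here for illustration and do not come from the pandas source.

```python
import pandas as pd

df = pd.DataFrame({'A': [2, 4, 1], 'B': [8, 4, 1], 'C': [6, 2, 7]})

# getattr looks up an attribute by its name given as a string,
# so func is the bound method df.sum.
func = getattr(df, "sum")

via_getattr = func()         # same call as df.sum()
via_apply = df.apply("sum")  # goes through the string dispatch in apply

print(via_getattr.equals(df.sum()))  # True
print(via_apply.equals(df.sum()))    # True

# The axis argument given to apply is forwarded to the looked-up method.
row_sums = df.apply("sum", axis=1)
print(row_sums.equals(df.sum(axis=1)))  # True
```

In other words, with a string argument the method name is resolved on the DataFrame itself, so any DataFrame method name can be passed this way.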

So, in your case, df.apply(sum) uses Python's built-in sum function, called on each column, while df.apply("sum") looks up and calls the DataFrame.sum method.
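The two calls agree on the data in the question, but they are not interchangeable in general. As a hedged illustration, the sketch below uses a hypothetical dataframe df_nan with a missing value (this data is not from the question). It relies on two assumptions: the built-in sum simply adds the values one by one, and DataFrame.sum skips missing values by default (skipna=True).

```python
import pandas as pd

# Hypothetical data with one missing value; not part of the original question.
df_nan = pd.DataFrame({'A': [2.0, None, 1.0], 'B': [8.0, 4.0, 1.0]})

builtin_result = df_nan.apply(sum)   # Python's built-in sum on each column
method_result = df_nan.apply("sum")  # dispatches to df_nan.sum()

print(builtin_result)  # column A is NaN, because NaN propagates through +
print(method_result)   # column A is 3.0, because DataFrame.sum skips NaN
```

Column B, which has no missing values, comes out as 13.0 in both cases.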

Some justification for why passing strings is possible at all can be found in the comment in the source code, though I've personally never encountered this use case and I can't read the developers' minds. All in all, you should stick to the documentation and only pass functions to DataFrame.apply unless you really know what you are doing.
