Cannot find col function in pyspark
Question
In pyspark 1.6.2, I can import the col function by

from pyspark.sql.functions import col
but when I try to look it up in the GitHub source code I find no col function in the functions.py file. How can Python import a function that doesn't exist?
Answer
It exists. It just isn't explicitly defined. Functions exported from pyspark.sql.functions are thin wrappers around JVM code and, with a few exceptions that require special treatment, are generated automatically using helper methods.
If you check the source carefully you'll find col listed among the other _functions. This dictionary is then iterated over, and _create_function is used to generate the wrappers. Each generated function is assigned directly to its corresponding name in globals().
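The generation step can be sketched roughly as follows. This is a simplified, hypothetical mimic of the pattern in pyspark.sql.functions, not the actual PySpark code; the real wrappers delegate to the JVM via py4j, which the tag tuple below merely stands in for:

```python
# Simplified sketch (an assumption mirroring the pyspark.sql.functions
# pattern; the real wrappers call into the JVM via py4j).

_functions = {
    "col": "Returns a Column based on the given column name.",
    "lit": "Creates a Column of literal value.",
}

def _create_function(name, doc=""):
    """Build a wrapper function and attach the given name and docstring."""
    def _(col):
        # Stand-in for the JVM call: return a tag so the result is visible.
        return ("column", name, col)
    _.__name__ = name
    _.__doc__ = doc
    return _

# Assign each generated function to the matching name in globals()
for _name, _doc in _functions.items():
    globals()[_name] = _create_function(_name, _doc)

print(col("x"))  # ('column', 'col', 'x') -- col now exists as a module attribute
```

After the loop runs, col and lit behave like ordinary module-level functions even though no `def col(...)` statement appears anywhere in the file.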
Finally, __all__, which defines the list of items exported from the module, simply exports all of globals() excluding the names contained in a blacklist.
If this mechanism is still not clear, you can create a toy example:
Create a Python module called foo.py with the following content:
# Creates a function assigned to the name foo
globals()["foo"] = lambda x: "foo {0}".format(x)
# Exports all entries from globals which start with foo
__all__ = [x for x in globals() if x.startswith("foo")]
Place it somewhere on the Python path (for example in the working directory).
Import foo:
from foo import foo
foo(1)
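For a quick check without writing a file, the same toy module can be built in memory with the standard library's types.ModuleType (this in-memory construction is only an illustration; the behavior matches importing foo.py from disk):

```python
import types

# Build the toy module in memory instead of writing foo.py to disk
foo_mod = types.ModuleType("foo")
exec(
    'globals()["foo"] = lambda x: "foo {0}".format(x)\n'
    '__all__ = [x for x in globals() if x.startswith("foo")]',
    foo_mod.__dict__,
)

print(foo_mod.foo(1))   # foo 1
print(foo_mod.__all__)  # ['foo']
```

Note that __all__ ends up containing only 'foo': the module's dunder attributes are filtered out by the startswith check, exactly as the blacklist filtering works in the real module.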
An undesired side effect of this metaprogramming approach is that the defined functions might not be recognized by tools that rely purely on static code analysis. This is not a critical issue and can be safely ignored during development.
Depending on the IDE, installing type annotations might resolve the problem (see for example zero323/pyspark-stubs#172).