Cannot find col function in pyspark
Problem description
In pyspark 1.6.2, I can import the col function with

from pyspark.sql.functions import col

but when I try to look it up in the GitHub source code I find no col function in functions.py. How can Python import a function that doesn't exist?
Recommended answer
It exists. It just isn't explicitly defined. Functions exported from pyspark.sql.functions are thin wrappers around JVM code and, with a few exceptions which require special treatment, are generated automatically using helper methods.
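A quick runtime check (assuming pyspark is installed) confirms that col really is there, even though no def col statement appears in functions.py:

# col exists at runtime even though functions.py contains no "def col":
from pyspark.sql import functions as F

print(F.col)             # e.g. <function col at 0x...>
print("col" in dir(F))   # True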
If you carefully check the source you'll find col listed among the other _functions. This dictionary is then iterated over, and _create_function is used to generate the wrappers. Each generated function is directly assigned to its corresponding name in globals().
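The pattern can be sketched roughly like this (a simplified version of what the pyspark source does; the real module also attaches version decorators to each wrapper and lists many more names, which is trimmed here):

from pyspark import SparkContext
from pyspark.sql.column import Column

def _create_function(name, doc=""):
    """Build a thin wrapper that forwards the call to the JVM function `name`."""
    def _(col):
        sc = SparkContext._active_spark_context
        jc = getattr(sc._jvm.functions, name)(
            col._jc if isinstance(col, Column) else col)
        return Column(jc)
    _.__name__ = name
    _.__doc__ = doc
    return _

# Names to generate, mapped to their docstrings (only two shown here).
_functions = {
    "col": "Returns a Column based on the given column name.",
    "lit": "Creates a Column of literal value.",
}

# Assign each generated wrapper to the matching name in the module's
# globals, which is why `from pyspark.sql.functions import col` works.
for _name, _doc in _functions.items():
    globals()[_name] = _create_function(_name, _doc)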
Finally, __all__, which defines the list of items exported from the module, simply exports all globals except the ones contained in the blacklist.
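In simplified form, that export step looks something like the following (the blacklist contents here are hypothetical placeholders, not the literal pyspark list):

# Hypothetical blacklist of helper names that should not be re-exported.
_blacklisted = {"SparkContext", "Column", "since"}

# Export every public name in globals() that is not blacklisted,
# including the wrappers generated above.
__all__ = [name for name in globals()
           if not name.startswith("_") and name not in _blacklisted]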
If this mechanism is still not clear, you can create a toy example:
- Create a Python module called foo.py with the following content:
# Creates a function assigned to the name foo
globals()["foo"] = lambda x: "foo {0}".format(x)
# Exports all entries from globals which start with foo
__all__ = [x for x in globals() if x.startswith("foo")]
- Place it somewhere on the Python path (for example in the working directory).
- Import foo:
from foo import foo

foo(1)  # returns 'foo 1'
An undesired side effect of such a metaprogramming approach is that the defined functions might not be recognized by tools that rely purely on static code analysis. This is not a critical issue and can be safely ignored during development.
Depending on the IDE, installing type annotations might resolve the problem (see, for example, zero323/pyspark-stubs#172).
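Such stubs declare the generated names statically so that linters and IDEs can see them. A minimal sketch of what a stub entry for col could look like (a hypothetical functions.pyi fragment, not the actual pyspark-stubs content):

# functions.pyi (hypothetical fragment): declares col for static analysis only.
from pyspark.sql.column import Column

def col(col: str) -> Column: ...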