Cannot find col function in pyspark


Problem description

In pyspark 1.6.2, I can import the col function with

from pyspark.sql.functions import col

but when I look it up in the GitHub source code I find no col function in the functions.py file. How can Python import a function that doesn't exist?

Answer

It exists, it just isn't explicitly defined. Functions exported from pyspark.sql.functions are thin wrappers around JVM code and, with a few exceptions that require special treatment, are generated automatically using helper methods.

If you check the source carefully you'll find col listed among the other _functions. That dictionary is then iterated over, and _create_function is used to generate the wrappers. Each generated function is directly assigned to its corresponding name in globals().
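
A simplified sketch of that pattern (the names _functions and _create_function match the pyspark source, but the wrapper body here is hypothetical; the real wrapper forwards to the JVM function of the same name):

# A minimal sketch of the generation pattern in pyspark/sql/functions.py.
# The real wrapper calls getattr(sc._jvm.functions, name)(...) and wraps
# the result in a Column; that JVM call is stubbed out here.
_functions = {
    'col': 'Returns a Column based on the given column name.',
    'lit': 'Creates a Column of literal value.',
}

def _create_function(name, doc=""):
    """Create a named wrapper function and attach its docstring."""
    def _(col):
        raise NotImplementedError("JVM call stubbed out in this sketch")
    _.__name__ = name
    _.__doc__ = doc
    return _

# Assign each generated wrapper to its name in the module namespace,
# which is why col is importable although it is never defined with `def`.
for _name, _doc in _functions.items():
    globals()[_name] = _create_function(_name, _doc)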

Finally __all__, which defines the list of items exported from the module, simply exports all globals excluding the ones contained in a blacklist.
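
Continuing the sketch, __all__ can be built from globals() along these lines (the exact filter and the blacklist contents in pyspark differ; the entry below is hypothetical):

# Export every public callable from globals(), skipping blacklisted names.
__all_blacklist = ['blacklisted_helper']  # hypothetical contents

__all__ = [k for k, v in globals().items()
           if callable(v) and not k.startswith('_')
           and k not in __all_blacklist]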

If this mechanism is still not clear, you can create a toy example:

  • Create a Python module called foo.py with the following content:

# Creates a function assigned to the name foo
globals()["foo"] = lambda x: "foo {0}".format(x)

# Exports all entries from globals which start with foo
__all__ = [x for x in globals() if x.startswith("foo")]

  • Place it somewhere on the Python path (for example in the working directory).

  • Import foo:

    from foo import foo

    foo(1)  # returns 'foo 1'
    

  • An undesired side effect of this metaprogramming approach is that the defined functions might not be recognized by tools that rely purely on static code analysis. This is not a critical issue and can be safely ignored during development.

    Depending on the IDE, installing type annotations might resolve the problem (see for example zero323/pyspark-stubs#172).
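
    For example, a minimal stub file along these lines (a hypothetical, stripped-down version of what the pyspark-stubs project provides) is enough for a static analyzer to recognize col:

    # functions.pyi -- hypothetical minimal stub; the real stubs in the
    # pyspark-stubs project cover the entire module.
    from pyspark.sql.column import Column

    def col(col: str) -> Column: ...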
