Cannot find col function in pyspark


Problem description

In pyspark 1.6.2, I can import the col function with

from pyspark.sql.functions import col

but when I try to look it up in the GitHub source code I find no col function in the functions.py file. How can Python import a function that doesn't exist?

Accepted answer

It exists. It just isn't explicitly defined. Functions exported from pyspark.sql.functions are thin wrappers around JVM code and, with a few exceptions that require special treatment, are generated automatically using helper methods.

If you carefully check the source you'll find col listed among the other _functions entries. This dictionary is then iterated over, and _create_function is used to generate the wrappers. Each generated function is assigned directly to the corresponding name in globals().
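The mechanism looks roughly like the sketch below, simplified from the pyspark source; the docstring text and the exact set of entries in _functions are illustrative rather than a verbatim copy:

from pyspark import SparkContext
from pyspark.sql.column import Column

def _create_function(name, doc=""):
    """Create a wrapper that delegates to the JVM function of the same name."""
    def _(col):
        sc = SparkContext._active_spark_context
        # Resolve the JVM-side function by name and wrap its result in a Column
        jc = getattr(sc._jvm.functions, name)(
            col._jc if isinstance(col, Column) else col)
        return Column(jc)
    _.__name__ = name
    _.__doc__ = doc
    return _

# name -> docstring mapping; col is just one of the entries
_functions = {
    "col": "Returns a Column based on the given column name.",
    "lit": "Creates a Column of literal value.",
}

# Generate a wrapper for every listed name and bind it in the module namespace
for _name, _doc in _functions.items():
    globals()[_name] = _create_function(_name, _doc)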

Finally __all__, which defines the list of items exported from the module, just exports all of these globals excluding the ones contained in a blacklist.
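The export step itself can be sketched as follows; the specific blacklist entries shown here are illustrative, not the exact names pyspark uses:

# Export every public, lower-case callable defined in the module,
# except for helpers that should stay internal
_blacklist = ["since", "ignore_unicode_prefix"]  # illustrative entries

__all__ = [name for name, obj in globals().items()
           if callable(obj)
           and not name.startswith("_")
           and name not in _blacklist]
__all__.sort()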

If this mechanism is still not clear, you can create a toy example:

  • Create a Python module called foo.py with the following content:

# Creates a function assigned to the name foo
globals()["foo"] = lambda x: "foo {0}".format(x)

# Exports all entries from globals which start with foo
__all__ = [x for x in globals() if x.startswith("foo")]

  • Place it somewhere on the Python path (for example, in the working directory).

  • Import foo:

    from foo import foo

    foo(1)  # returns 'foo 1'
    

  • An undesired side effect of such a metaprogramming approach is that the defined functions might not be recognized by tools that depend purely on static code analysis. This is not a critical issue and can be safely ignored during development.

    Depending on the IDE, installing type annotations might resolve the problem (see for example zero323/pyspark-stubs#172).
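    If you go down that route, the stubs referenced above are distributed as a separate package; assuming the PyPI name matches the repository, installing them is a matter of pip install pyspark-stubs.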
