How to register a UDF with no arguments in PySpark
Question
I have tried a Spark UDF with a parameter, using a lambda function, and registered it. But how can I create a UDF with no arguments and register it? I have tried the following; my sample code is expected to show the current time:
from datetime import datetime
from pyspark.sql.functions import udf

def getTime():
    timevalue=datetime.now()
    return timevalue

udfGateTime=udf(getTime,TimestampType())
But PySpark shows:
NameError: name 'TimestampType' is not defined
which probably means my UDF is not registered. I was comfortable with this format:
spark.udf.register('GATE_TIME', lambda():getTime(), TimestampType())
But can a lambda function take an empty argument list? I haven't tried it, and I am a bit confused. How should I write the code that registers this getTime() function?
Recommended answer
A lambda expression can be nullary. You are just using incorrect syntax:
spark.udf.register('GATE_TIME', lambda: getTime(), TimestampType())
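The syntax point can be checked in plain Python, without Spark. The following is a minimal sketch assuming Python 3; the variable names `nullary`, `result`, and `parenthesised_ok` are illustrative and do not come from the question.

```python
from datetime import datetime

def getTime():
    """Return the current local time as a datetime."""
    return datetime.now()

# A lambda with no parameters: nothing goes between `lambda` and the colon.
nullary = lambda: getTime()
result = nullary()

# `lambda(): ...` is not valid Python 3; compiling it raises SyntaxError.
try:
    compile("lambda(): 1", "<example>", "eval")
    parenthesised_ok = True
except SyntaxError:
    parenthesised_ok = False
```

Here `result` holds a `datetime`, and `parenthesised_ok` ends up `False` because the parenthesised form is rejected by the parser before anything runs.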
There is nothing special about lambda expressions in the context of Spark. You can use getTime directly:
spark.udf.register('GetTime', getTime, TimestampType())
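Putting the pieces together, here is a hedged sketch of a full registration. It assumes an existing SparkSession is passed in as `spark`; the helper name `register_gate_time` is hypothetical and used only for illustration. The import from `pyspark.sql.types` is what resolves the `NameError: name 'TimestampType' is not defined` shown in the question, since `TimestampType` is not provided by `pyspark.sql.functions`.

```python
from datetime import datetime

def getTime():
    """Return the current local time as a datetime."""
    return datetime.now()

def register_gate_time(spark):
    """Register getTime as the SQL function GATE_TIME (hypothetical helper).

    `spark` is assumed to be an already-created SparkSession.
    """
    # TimestampType lives in pyspark.sql.types; without this import the
    # register call below fails with the NameError from the question.
    # It is imported inside the helper so the fix sits next to its use.
    from pyspark.sql.types import TimestampType

    # Pass the function object itself; no lambda wrapper is required.
    spark.udf.register('GATE_TIME', getTime, TimestampType())

    # A nullary UDF is called from SQL with empty parentheses.
    return spark.sql("SELECT GATE_TIME() AS gate_time")
```

Calling `register_gate_time(spark).show()` in a session should display one row containing the time at which the UDF was evaluated.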
There is no need for an inefficient udf at all. Spark provides the required function out of the box:
spark.sql("SELECT current_timestamp()")
or
from pyspark.sql.functions import current_timestamp
spark.range(0, 2).select(current_timestamp())