How to register a UDF with no arguments in PySpark
Question
I have tried a Spark UDF that takes a parameter, using a lambda function, and registered it. But how can I create a UDF with no arguments and register it? I have tried the following; my sample code is expected to show the current time:
from datetime import datetime
from pyspark.sql.functions import udf

def getTime():
    timevalue = datetime.now()
    return timevalue

udfGateTime = udf(getTime, TimestampType())
But PySpark shows:
NameError: name 'TimestampType' is not defined
which probably means my UDF is not registered. I was comfortable with this format:
spark.udf.register('GATE_TIME', lambda():getTime(), TimestampType())
But does a lambda function take an empty argument list? Although I haven't tried it, I am a bit confused. How can I write the code to register this getTime() function?
Recommended answer
- A lambda expression can be nullary. You're just using incorrect syntax:
spark.udf.register('GATE_TIME', lambda: getTime(), TimestampType())
- There is nothing special about lambda expressions in the context of Spark. You can use getTime directly:
spark.udf.register('GetTime', getTime, TimestampType())
- There is no need for an inefficient udf at all. Spark provides the required function out of the box:
spark.sql("SELECT current_timestamp()")
or
from pyspark.sql.functions import current_timestamp
spark.range(0, 2).select(current_timestamp())
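To tie the points above together, here is a minimal sketch. It assumes the NameError in the question comes from TimestampType never being imported (it lives in pyspark.sql.types), not from a failed registration. The names get_time, nullary, is_valid_expression and register_gate_time are illustrative and not from the original post, and register_gate_time assumes an existing SparkSession is passed in as spark.

```python
from datetime import datetime


def get_time():
    """Return the current time on whichever machine evaluates the call."""
    return datetime.now()


# A lambda with no parameters is written `lambda: ...`, with no parentheses.
nullary = lambda: get_time()


def is_valid_expression(source):
    """Report whether `source` compiles as a Python expression."""
    try:
        compile(source, "<demo>", "eval")
        return True
    except SyntaxError:
        return False


def register_gate_time(spark):
    """Register get_time as the SQL function GATE_TIME (hypothetical helper).

    `spark` is assumed to be an existing SparkSession.
    """
    # Imported inside the function so the plain-Python part above runs
    # even without Spark installed. This is the import that the code in
    # the question is missing.
    from pyspark.sql.types import TimestampType

    # The function is passed directly; wrapping it in a lambda is optional.
    return spark.udf.register("GATE_TIME", get_time, TimestampType())
```

With a running session, calling register_gate_time(spark) and then spark.sql("SELECT GATE_TIME()") would be the intended usage. The built-in current_timestamp() remains the simpler choice when only the current time is needed.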