How to register a UDF with no arguments in PySpark


Question

I have used Spark UDFs that take parameters, defined with a lambda function and then registered. But how can I create and register a UDF that takes no arguments? Here is what I tried; my sample code is expected to return the current time:

from datetime import datetime
from pyspark.sql.functions import udf

def getTime():
    timevalue=datetime.now()
    return timevalue 

udfGateTime=udf(getTime,TimestampType())

But PySpark shows:

NameError: name 'TimestampType' is not defined

which probably means my UDF is not registered. I was comfortable with this format:

spark.udf.register('GATE_TIME', lambda():getTime(), TimestampType())

But can a lambda function take no arguments? I haven't tried it, and I am a bit confused. How should I write the code to register this getTime() function?

Recommended answer

    • A lambda expression can be nullary. You are just using incorrect syntax:

      spark.udf.register('GATE_TIME', lambda: getTime(), TimestampType())
      

    • There is nothing special about lambda expressions in the context of Spark. You can pass getTime directly:

      spark.udf.register('GetTime', getTime, TimestampType())
      

    • There is no need for an inefficient UDF at all. Spark provides the required function out of the box:

      spark.sql("SELECT current_timestamp()")
      

      from pyspark.sql.functions import current_timestamp
      
      spark.range(0, 2).select(current_timestamp())
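
The NameError in the question most likely comes from TimestampType never being imported (it lives in pyspark.sql.types), rather than from a failed registration. As a minimal sketch of the first two points, assuming an existing SparkSession is passed in as `spark`: the helper name `register_time_udfs` and the variable `get_time_lambda` are hypothetical, introduced only for illustration.

```python
from datetime import datetime


def getTime():
    """Return the current time on the machine where this runs."""
    return datetime.now()


# A lambda with no parameters is written `lambda: ...`;
# `lambda(): ...` is a SyntaxError.
get_time_lambda = lambda: getTime()


def register_time_udfs(spark):
    """Register getTime under two SQL names (hypothetical helper).

    `spark` is assumed to be an existing SparkSession.
    """
    # Imported inside the function so the plain-Python part of this
    # sketch still runs where PySpark is not installed. This is the
    # import that the question's code is missing.
    from pyspark.sql.types import TimestampType

    # Nullary lambda wrapping getTime.
    spark.udf.register('GATE_TIME', get_time_lambda, TimestampType())
    # The function itself, passed directly; no lambda needed.
    spark.udf.register('GetTime', getTime, TimestampType())


# Both forms are ordinary zero-argument callables returning a datetime.
print(type(get_time_lambda()).__name__)
print(type(getTime()).__name__)
```

With a live session, calling register_time_udfs(spark) and then spark.sql("SELECT GATE_TIME()") should return a single row holding a timestamp. The built-in current_timestamp() shown above remains the better choice when only the current time is needed.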
      
