Can we use keyword arguments in a UDF?


Question

Can keyword arguments be used with a UDF in PySpark, as I attempted below? The conv method has a keyword argument conv_type, which defaults to a specific formatter, but I want to specify a different formatter in some places. That does not get through the udf because of the keyword argument. Is there a different approach to using a keyword argument here?

from datetime import datetime as dt, timedelta as td, date

from pyspark.sql import functions
from pyspark.sql.types import StringType

tpid_date_dict = {'69': '%d/%m/%Y', '62': '%Y/%m/%d', '70201': '%m/%d/%y', '66': '%d.%m.%Y', '11': '%d-%m-%Y', '65': '%Y-%m-%d'}

def date_formatter_based_on_id(column, date_format):
    val = dt.strptime(str(column),'%Y-%m-%d').strftime(date_format)
    return val

def generic_date_formatter(column, date_format):
    val = dt.strptime(str(column),date_format).strftime('%Y-%m-%d')
    return val

def conv(column, id, conv_type=date_formatter_based_on_id):
    try:
        date_format=tpid_date_dict[id]
    except KeyError as e:
        print("Key value not found!")
    val = None
    if column:
        try:
            val = conv_type(column, date_format)
        except Exception as err:
            val = column
    return val

conv_func = functions.udf(conv, StringType())

date_formatted = renamed_cols.withColumn("check_in_std", 
conv_func(functions.col("check_in"), functions.col("id"), 
generic_date_formatter))

So the problem is with the last statement (date_formatted = renamed_cols.withColumn("check_in_std", conv_func(functions.col("check_in"), functions.col("id"), generic_date_formatter))), since the third argument, generic_date_formatter, is a keyword argument.

When I try this, I get the following error: AttributeError: 'function' object has no attribute '_get_object_id'

Recommended answer

Unfortunately, you cannot use a udf with keyword arguments. UserDefinedFunction.__call__ is defined with positional arguments only:

def __call__(self, *cols):
    judf = self._judf
    sc = SparkContext._active_spark_context
    return Column(judf.apply(_to_seq(sc, cols, _to_java_column)))

However, the problem you have is not really related to keyword arguments. You get the exception because generic_date_formatter is not a Column object but a function.
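To illustrate the first point without a Spark session, here is a minimal plain-Python sketch. FakeUdf and add are hypothetical stand-ins invented for this example and are not part of PySpark; the only thing they share with UserDefinedFunction is the `__call__(self, *cols)` signature quoted above.

```python
class FakeUdf:
    """Hypothetical stand-in for a UDF wrapper; NOT part of PySpark."""

    def __init__(self, func):
        self.func = func

    def __call__(self, *cols):
        # Same shape as the signature quoted above: positional arguments only.
        return self.func(*cols)


def add(a, b, scale=1):
    # Illustrative function with one keyword argument.
    return (a + b) * scale


fake = FakeUdf(add)

# Positional arguments pass straight through.
positional_result = fake(1, 2)

# A keyword argument is rejected by __call__ before add is ever reached.
try:
    fake(1, 2, scale=10)
    keyword_error = None
except TypeError as exc:
    keyword_error = str(exc)
```

The error in the question is a different one: there the function was passed positionally, so the call itself was accepted, but every argument is then treated as a column.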

You can create the udf dynamically:

def conv(conv_type=date_formatter_based_on_id):
    def _(column, id):
        try:
            date_format=tpid_date_dict[id]
        except KeyError as e:
            print("Key value not found!")
        val = None
        if column:
            try:
                val = conv_type(column, date_format)
            except Exception as err:
                val = column
        return val
    return functions.udf(_, StringType())

which can be called as:

conv(generic_date_formatter)(functions.col("check_in"), functions.col("id"))
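The closure pattern can be tried in plain Python before involving Spark. The sketch below is an assumption-laden variant of the answer's code, not the answer itself: make_conv is a hypothetical name, the dictionary is trimmed to two entries, and the lookup uses dict.get so that an unknown id returns the input unchanged instead of relying on the inner except to swallow an unbound date_format.

```python
from datetime import datetime as dt

# Trimmed copy of the question's id -> date format mapping.
tpid_date_dict = {'69': '%d/%m/%Y', '65': '%Y-%m-%d'}


def date_formatter_based_on_id(column, date_format):
    # ISO date -> format that belongs to the id.
    return dt.strptime(str(column), '%Y-%m-%d').strftime(date_format)


def generic_date_formatter(column, date_format):
    # Format that belongs to the id -> ISO date.
    return dt.strptime(str(column), date_format).strftime('%Y-%m-%d')


def make_conv(conv_type=date_formatter_based_on_id):
    """Hypothetical factory: fixes conv_type and returns a two-argument function."""
    def _conv(column, id):
        if not column:
            return None
        date_format = tpid_date_dict.get(id)
        if date_format is None:
            return column  # unknown id: leave the value as it is
        try:
            return conv_type(column, date_format)
        except ValueError:
            return column  # unparseable date: leave the value as it is
    return _conv


to_iso = make_conv(generic_date_formatter)
to_local = make_conv()  # falls back to the default formatter

iso_value = to_iso('25/12/2020', '69')
local_value = to_local('2020-12-25', '69')
```

In Spark, the function returned by the factory would then be wrapped with functions.udf(..., StringType()) as in the answer, so that only column arguments remain at call time.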

See "Passing a data frame column and external list to udf under withColumn" for details.
