How to pass a constant value to Python UDF?

Published: 2020/9/4 0:58:29. Tags: python, apache-spark, pyspark, apache-spark-sql, user-defined-functions


Question

I was wondering whether it is possible to create a UDF that receives two arguments, a Column and another variable (an object, a dictionary, or any other type), performs some operations, and returns the result.

I attempted this but got an exception, so I would like to know whether there is a way to avoid the problem.

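from pyspark.sql.functions import col, udf
from pyspark.sql.types import BooleanType

df = sqlContext.createDataFrame([("Bonsanto", 20, 2000.00),
                                 ("Hayek", 60, 3000.00),
                                 ("Mises", 60, 1000.0)],
                                ["name", "age", "balance"])

comparatorUDF = udf(lambda c, n: c == n, BooleanType())

# Fails: the bare string "Bonsanto" is resolved as a column name
df.where(comparatorUDF(col("name"), "Bonsanto")).show()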

This produces the following error:

AnalysisException: u"cannot resolve 'Bonsanto' given input columns name, age, balance;"

So the UDF evidently treats the string "Bonsanto" as a column name, whereas I am trying to compare each record's value with the second argument.

On the other hand, I know that operators can be used directly inside a where clause (but I want to know whether the same is achievable with a UDF), as follows:

df.where(col("name") == "Bonsanto").show()

#+--------+---+-------+
#|    name|age|balance|
#+--------+---+-------+
#|Bonsanto| 20| 2000.0|
#+--------+---+-------+

Recommended answer

Everything passed to a UDF is interpreted as a column or a column name. To pass a literal, you have two options:

Pass the argument using currying:

def comparatorUDF(n):
    return udf(lambda c: c == n, BooleanType())

df.where(comparatorUDF("Bonsanto")(col("name")))

This works with an argument of any type, as long as it is serializable.
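As a minimal sketch of the currying approach with a non-string argument, the factory below closes over a Python set instead of a single name. The names `make_name_filter` and `make_name_filter_udf` are hypothetical and not part of the original answer. The pyspark imports sit inside the wrapper only so that the plain predicate can be tried without a Spark installation.

```python
def make_name_filter(allowed_names):
    """Return a plain Python predicate that closes over `allowed_names`."""
    # Captured by the closure and serialized together with the function.
    allowed = frozenset(allowed_names)

    def is_allowed(name):
        # A SQL NULL arrives as None; treat it as "not allowed".
        return name is not None and name in allowed

    return is_allowed


def make_name_filter_udf(allowed_names):
    """Wrap the predicate as a Spark UDF (hypothetical helper)."""
    # Imported lazily so the predicate above stays usable without pyspark.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import BooleanType

    return udf(make_name_filter(allowed_names), BooleanType())


# Usage against the DataFrame from the question (needs a running Spark session):
#   from pyspark.sql.functions import col
#   df.where(make_name_filter_udf({"Bonsanto", "Mises"})(col("name"))).show()
```

The set never becomes a Spark column here; it travels with the serialized function, which is why the serializability requirement applies.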

Use a SQL literal with the original two-argument UDF from the question:

from pyspark.sql.functions import lit

df.where(comparatorUDF(col("name"), lit("Bonsanto")))

This works only with supported types (strings, numerics, booleans). For non-atomic types, see "How to add a constant column in a Spark DataFrame?"
