How to pass a constant value to Python UDF?

Views: 26 Published: 2021/11/14 21:30:33 Tags: python apache-spark pyspark apache-spark-sql user-defined-functions

Problem description

I was wondering whether it is possible to create a UDF that receives two arguments, a Column and another variable (an object, a dictionary, or any other type), performs some operations, and returns the result.

I actually attempted this but got an exception, so I would like to know whether there is a way to avoid the problem.

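from pyspark.sql.functions import col, udf
from pyspark.sql.types import BooleanType

df = sqlContext.createDataFrame([("Bonsanto", 20, 2000.00),
                                 ("Hayek", 60, 3000.00),
                                 ("Mises", 60, 1000.0)],
                                ["name", "age", "balance"])

comparatorUDF = udf(lambda c, n: c == n, BooleanType())

# Fails: the plain string "Bonsanto" is resolved as a column name
df.where(comparatorUDF(col("name"), "Bonsanto")).show()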

I got the following error:

AnalysisException: u"cannot resolve 'Bonsanto' given input columns name, age, balance;"

So the UDF evidently "sees" the string "Bonsanto" as a column name, whereas I am trying to compare each record's value with the second argument.

On the other hand, I know that operators can be used inside a where clause (but I want to know whether the same is achievable with a UDF), as follows:

df.where(col("name") == "Bonsanto").show()

#+--------+---+-------+
#|    name|age|balance|
#+--------+---+-------+
#|Bonsanto| 20| 2000.0|
#+--------+---+-------+

Recommended answer

Everything that is passed to a UDF is interpreted as a column or a column name. If you want to pass a literal, you have two options:

Pass the argument using currying:

def comparatorUDF(n):
    return udf(lambda c: c == n, BooleanType())

df.where(comparatorUDF("Bonsanto")(col("name")))

This works with an argument of any type, as long as it is serializable.
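As a minimal sketch of currying with a non-literal argument, the factory below closes over a set of names rather than a single string. The helper names `make_membership_check` and `membership_udf` are hypothetical and not part of the original answer; the pyspark import is deferred so that the plain-Python part can be tried without a Spark installation.

```python
def make_membership_check(allowed):
    """Return a one-argument function that closes over `allowed`."""
    allowed = frozenset(allowed)  # immutable and picklable, so it can be shipped to executors

    def is_allowed(value):
        return value in allowed

    return is_allowed


def membership_udf(allowed):
    """Wrap the closure in a Spark UDF (hypothetical helper, assumes pyspark is installed)."""
    # Imported lazily so the plain-Python check below runs without Spark.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import BooleanType

    return udf(make_membership_check(allowed), BooleanType())


# Plain-Python check of the closure itself
check = make_membership_check({"Hayek", "Mises"})
print(check("Hayek"))
print(check("Bonsanto"))
```

With the DataFrame from the question, this would presumably be used as `df.where(membership_udf({"Hayek", "Mises"})(col("name")))`. The set travels inside the function's closure, so it never has to be expressed as a column.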

Use a SQL literal with the current implementation (the original two-argument comparatorUDF from the question):

from pyspark.sql.functions import lit

df.where(comparatorUDF(col("name"), lit("Bonsanto")))

This works only with supported types (strings, numerics, booleans). For non-atomic types, see "How to add a constant column in a Spark DataFrame?".
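For a non-atomic value, one possible approach is to assemble it from atomic literals with `array` from `pyspark.sql.functions`. The sketch below is an assumption about how that might look, not code from the answer. The names `is_one_of` and `filter_by_names` are hypothetical, and the pyspark import is deferred so the UDF body can be checked in plain Python.

```python
def is_one_of(value, candidates):
    """Plain-Python body of the UDF; an array column arrives here as a list."""
    return value in candidates


def filter_by_names(df, names):
    """Keep rows whose "name" is in `names` (hypothetical helper, assumes pyspark)."""
    # Imported lazily so the plain-Python check below runs without Spark.
    from pyspark.sql.functions import array, col, lit, udf
    from pyspark.sql.types import BooleanType

    is_one_of_udf = udf(is_one_of, BooleanType())
    # Build an array-typed literal column out of atomic string literals.
    names_col = array(*[lit(n) for n in names])
    return df.where(is_one_of_udf(col("name"), names_col))


# Plain-Python check of the UDF body
print(is_one_of("Hayek", ["Hayek", "Mises"]))
print(is_one_of("Bonsanto", ["Hayek", "Mises"]))
```

With the DataFrame from the question, `filter_by_names(df, ["Hayek", "Mises"]).show()` would be the intended call. Compared with currying, the constant here is part of the query plan as a column expression rather than captured in the Python closure.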
