Curried UDF - Pyspark
Problem description
I am trying to implement a UDF in Spark that can take both a literal and a column as arguments. To achieve this, I believe I can use a curried UDF.
The function matches a string literal against each value in a DataFrame column. I have summarized the code below:
def matching(match_string_1):
    def matching_inner(match_string_2):
        return difflib.SequenceMatcher(None, match_string_1, match_string_2).ratio()
    return matching

hc.udf.register("matching", matching)
matching_udf = F.udf(matching, StringType())

df_matched = df.withColumn("matching_score", matching_udf(lit("match_string"))(df.column))
"match_string" is actually a value from a list that I am iterating over.
Unfortunately, this is not working as I had hoped, and I am receiving "TypeError: 'Column' object is not callable". I believe I am not calling this function correctly.
Recommended answer
It should be something like this:
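import difflib

from pyspark.sql import functions as F
from pyspark.sql.types import DoubleType

def matching(match_string_1):
    def matching_inner(match_string_2):
        return difflib.SequenceMatcher(
            a=match_string_1, b=match_string_2).ratio()
    # Create the UDF here; ratio() returns a float, hence DoubleType.
    return F.udf(matching_inner, DoubleType())

df.withColumn("matching_score", matching("match_string")(df.column))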
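The currying itself can be checked without a Spark session. The following is a minimal sketch in plain Python, assuming the same difflib-based score. The name make_matcher and the sample strings are hypothetical and used only for illustration; in Spark the inner function would still have to be wrapped with F.udf as in the answer.

```python
import difflib


def make_matcher(match_string_1):
    """Return a one-argument scorer with match_string_1 fixed (hypothetical helper)."""
    def matching_inner(match_string_2):
        # ratio() returns a float between 0.0 and 1.0
        return difflib.SequenceMatcher(a=match_string_1, b=match_string_2).ratio()
    return matching_inner


# Hypothetical list of literals to iterate over, as described in the question.
match_strings = ["apple", "banana"]
# Stands in for the values of a DataFrame column.
column_values = ["apple", "bandana"]

# Build one scorer per literal and apply it to every value.
scores = {m: [make_matcher(m)(v) for v in column_values] for m in match_strings}
```

Each entry of scores holds the similarity of one literal against every value, which is roughly what one withColumn call per literal would compute on the DataFrame.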
If you want to support a Column argument for match_string_1, you'll have to rewrite it like this:
def matching(match_string_1):
    def matching_inner(match_string_2):
        # Both arguments are passed to the UDF as Columns.
        return F.udf(
            lambda a, b: difflib.SequenceMatcher(a=a, b=b).ratio(),
            DoubleType())(match_string_1, match_string_2)
    return matching_inner

df.withColumn("matching_score", matching(F.lit("match_string"))(df.column))
Your current code doesn't work because matching_udf is a UDF, so matching_udf(lit("match_string")) creates a Column expression instead of calling the inner function. Calling that Column with (df.column) then raises the TypeError.
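The failure mode can be imitated in plain Python. This is only a sketch and not PySpark code: FakeColumn and fake_udf are hypothetical stand-ins that mimic just the behavior relevant here, namely that calling a UDF wrapper returns an expression object rather than running the wrapped function.

```python
class FakeColumn:
    """Hypothetical stand-in for a Column expression; it defines no __call__."""

    def __init__(self, description):
        self.description = description


def fake_udf(func):
    """Hypothetical stand-in for F.udf: calling the wrapper only builds an
    expression object and does not run func at that point."""
    def wrapper(*args):
        return FakeColumn(f"{func.__name__}({', '.join(map(str, args))})")
    return wrapper


def matching(match_string_1):
    def matching_inner(match_string_2):
        return 1.0  # placeholder score; never reached in this sketch
    return matching_inner


matching_udf = fake_udf(matching)
expr = matching_udf("match_string")  # a FakeColumn, not matching_inner

try:
    expr("some value")  # the second call in the curried chain
    error_message = None
except TypeError as exc:
    error_message = str(exc)  # the object is reported as not callable
```

Wrapping the inner function instead of the outer one, as in the answer, avoids this because the outer call then runs as ordinary Python and returns something callable.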