Joining two tables on a UDF in Hive
Question
A basic question before I write a UDF to be used in Hive. I want to join two tables based on a custom UDF that takes one argument from table A and another from table B. I have seen examples of UDFs that take arguments from only one of the tables being joined. Does taking arguments from both tables work equally well?
Recommended answer
It sounds like you want a function like this:
function my_udf(val_A, val_B):
    trans_A = <do something to val_A>
    trans_B = <do something to val_B>
    return trans_A cmp trans_B
The UDF will return a boolean, which you can use in an ON clause.
I'm not sure you can do this directly in Hive, but you can always use two UDFs to transform val_A to trans_A and val_B to trans_B, then join with a normal ON:
SELECT *
FROM
  (SELECT *, udf_A(some_column) AS trans_A FROM A) AS AA
JOIN
  (SELECT *, udf_B(some_column) AS trans_B FROM B) AS BB
  ON AA.trans_A = BB.trans_B
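To see why the two-UDF rewrite gives the same rows, here is a minimal Python sketch, not Hive code. It assumes the `cmp` in the pseudocode is plain equality. The bodies of `udf_a` and `udf_b`, and the sample rows, are hypothetical stand-ins invented for illustration. The sketch compares a join driven by the two-argument predicate with the transform-then-equi-join form.

```python
# Hypothetical stand-ins for udf_A and udf_B: each normalizes a key.
def udf_a(value):
    return value.strip().lower()

def udf_b(value):
    return value.replace("-", "").lower()

def my_udf(val_a, val_b):
    """Two-argument predicate: transform each side, then test equality."""
    return udf_a(val_a) == udf_b(val_b)

def predicate_join(rows_a, rows_b):
    """Join on the predicate by testing every (a, b) pair."""
    return [(a, b) for a in rows_a for b in rows_b if my_udf(a, b)]

def equi_join(rows_a, rows_b):
    """Transform each side once, then match equal keys via a dict lookup."""
    index = {}
    for b in rows_b:
        index.setdefault(udf_b(b), []).append(b)
    return [(a, b) for a in rows_a for b in index.get(udf_a(a), [])]

# Made-up sample data for the two tables.
table_a = [" Ab12 ", "cd34", "zz99"]
table_b = ["AB-12", "CD-34", "EF-56"]

pairs_predicate = predicate_join(table_a, table_b)
pairs_equi = equi_join(table_a, table_b)
print(pairs_predicate == pairs_equi)
```

Both functions return the same pairs here. The difference is cost: the predicate form tests every pair of rows, while the equi-join form transforms each row once and matches on the key, which is the shape a normal ON clause expects. If the comparison is something other than equality, this rewrite does not apply as written.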