Joining two tables on a UDF in Hive

Question

A basic question before I write a UDF to be used in Hive. I want to join two tables based on a custom UDF that takes one argument from table A and another from table B. I have seen examples of UDFs that take arguments from only one of the tables being joined. Does taking arguments from both tables work equally well?

Recommended answer

It sounds like you want a function like this:

function my_udf(val_A, val_B):
    trans_A = <do something to val_A>
    trans_B = <do something to val_B>
    return trans_A cmp trans_B

The UDF returns a boolean, which you can use in an ON clause.

I'm not sure you can do this directly in Hive, but you can always use two UDFs, one to transform val_A into trans_A and one to transform val_B into trans_B, and then join with a normal ON:

select *
from
    (select *, udf_A(some_column) as trans_A from A) as AA
    join
    (select *, udf_B(some_column) as trans_B from B) as BB
    on AA.trans_A = BB.trans_B
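
To see why the rewrite gives the same result, here is a minimal Python sketch (not Hive code) that models both approaches on in-memory lists. The transformations inside udf_a and udf_b and the sample rows are made up for illustration, and the sketch assumes the comparison inside my_udf is plain equality; for any other comparison, such as less-than, the two-UDF rewrite into an equality join would not apply.

```python
# Hypothetical stand-ins for udf_A and udf_B; the normalisation logic is illustrative only.
def udf_a(value: str) -> str:
    return value.strip().lower()


def udf_b(value: str) -> str:
    return value.strip().lower()


def my_udf(val_a: str, val_b: str) -> bool:
    # Boolean UDF from the pseudocode above, assuming "cmp" means equality.
    return udf_a(val_a) == udf_b(val_b)


def predicate_join(table_a, table_b):
    # Evaluate the boolean UDF on every pair of rows.
    return [(a, b) for a in table_a for b in table_b if my_udf(a, b)]


def equi_join(table_a, table_b):
    # Transform each side once, then match rows whose transformed keys are equal.
    index = {}
    for b in table_b:
        index.setdefault(udf_b(b), []).append(b)
    return [(a, b) for a in table_a for b in index.get(udf_a(a), [])]


table_a = [" Apple", "banana ", "Cherry"]
table_b = ["apple", "BANANA", "durian"]

pairs_predicate = predicate_join(table_a, table_b)
pairs_equi = equi_join(table_a, table_b)
print(pairs_predicate == pairs_equi)
```

The equality form also lets the join match rows by key instead of testing every pair of rows, which is one practical reason to prefer it when the comparison really is an equality.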
