Hive加入获取日期明智的图片 [英] Hive join to get the date wise picture
问题描述
我试图在加入时使用表B中的一个关键字和日期以及 >相应的来自表A 的前一个最近日期记录加入HIVE中的2个表。例如:以下是2个输入表
< ---------- TABLE A --- ----------> < ------------表B ------------>
A_id A_date changed_col B_id B_date B_value A_id
**** ****** *********** **** ****** **** *** *****
A01 2017-03-20 ABC B01 2017-04-02 200 A01
A01 2017-04-01 XYZ B01 2017-04-04 500 A01
A01 2017-04-05 LLL B02 2017-04-07 900 A02
A02 2017-04-06 KKK B02 2017-04-30 800 A02
A02 2017-04-12 JJJ
但是,当我使用表A连接表B时,它应该查找表A中最近的最低日期援助)。以下是预期的产出表:
B_id B_date A_id A_date changed_col B_value
**** ***** * **** ****** *********** *******
B01 2017-02-04 A01 2017-01-04 XYZ 200
B01 2017-04-04 A01 2017-01-04 XYZ 500
B02 2017-04-07 A02 2017-04-06 KKK 900
B02 2017-04-30 A02 2017-04-12 JJJ 800
非常感谢任何帮助。谢谢
我自己想出了这个查询
<
(select b。*,a。*,
row_number()over(由b.a_id分区,b.b_date
按b.b_date排序, datediff(to_date(b.b_date),to_date(a.a_date)))as rnk
FROM b JOIN a
ON b.a_id = a.a_id
WHERE a.a_date< = b.b_date
)j1其中j1.rnk = 1
B_id B_date B_value A_id A_date changed_col rnk
B01 2017-04-02 200 A01 A01 2017-04-01 XYZ 1
B01 2017-04-04 500 A01 A01 2017-04-01 XYZ 1
B02 2017-04-07 900 A02 A02 2017-04-06 KKK 1
B02 2017-04-30 800 A02 A02 2017-04-12 JJJ 1
您可以在第1行中选择所需的列。现在我已经选择了A和B的所有专栏,包括排名。
I am trying to join 2 tables in HIVE using a key and date from table B and the corresponding previous closest date record from table A at the time of join. For example: Below are the 2 input tables
<----------TABLE A-------------> <------------TABLE B------------>
A_id A_date changed_col B_id B_date B_value A_id
**** ****** *********** **** ****** ******* *****
A01 2017-03-20 ABC B01 2017-04-02 200 A01
A01 2017-04-01 XYZ B01 2017-04-04 500 A01
A01 2017-04-05 LLL B02 2017-04-07 900 A02
A02 2017-04-06 KKK B02 2017-04-30 800 A02
A02 2017-04-12 JJJ
However when I JOIN table B with Table A, it should look for the nearest lowest date in table A for the same key(A_id). Below is the expected output table:
B_id B_date A_id A_date changed_col B_value
**** ****** **** ****** *********** *******
B01 2017-02-04 A01 2017-01-04 XYZ 200
B01 2017-04-04 A01 2017-01-04 XYZ 500
B02 2017-04-07 A02 2017-04-06 KKK 900
B02 2017-04-30 A02 2017-04-12 JJJ 800
Any help is much appreciated. Thanks
Figured out the query myself
select * from
(select b.*, a.*,
row_number() over (partition by b.a_id, b.b_date
order by b.b_date, datediff(to_date(b.b_date), to_date(a.a_date)) ) as rnk
FROM b JOIN a
ON b.a_id= a.a_id
WHERE a.a_date <= b.b_date
) j1 where j1.rnk = 1
B_id B_date B_value A_id A_date changed_col rnk
B01 2017-04-02 200 A01 A01 2017-04-01 XYZ 1
B01 2017-04-04 500 A01 A01 2017-04-01 XYZ 1
B02 2017-04-07 900 A02 A02 2017-04-06 KKK 1
B02 2017-04-30 800 A02 A02 2017-04-12 JJJ 1
You can select the required columns in line 1. For now I have selected all column from A and B including the rank.
这篇关于Hive加入获取日期明智的图片的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!