Join Tables on Date Range in Hive
Question
I need to join tableA to tableB on employee_id, and cal_date from tableA needs to fall between date_start and date_end from tableB. I ran the query below and received the error message below. Could you please help me correct the query? Thank you for your help!
Both left and right aliases encountered in JOIN 'date_start'.
select a.*, b.skill_group
from tableA a
left join tableB b
on a.employee_id= b.employee_id
and a.cal_date >= b.date_start
and a.cal_date <= b.date_end
Recommended answer
RTFM - quoting LanguageManual Joins:
Hive does not support join conditions that are not equality conditions as it is very difficult to express such conditions as a map/reduce job.
You may try to move the BETWEEN filter to a WHERE clause, resulting in a lousy partially Cartesian join followed by a post-processing cleanup. Yuck. Depending on the actual cardinality of your "skill group" table, it may run quickly or take whole days.
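A minimal sketch of that workaround, using the table and column names from the question. Simply moving the range test into WHERE would also discard the tableA rows that have no matching range, which defeats the LEFT JOIN, so this version computes the matches in a subquery and then left-joins tableA back to them using equality conditions only. The alias names a2 and m are made up for illustration, and the DISTINCT assumes you want at most one output row per tableA row and skill_group. The same manual page also notes that later Hive releases (2.2.0 onward) accept complex expressions in the ON clause, so on a recent enough version the original query may run unchanged.

```sql
-- Step 1 (subquery m): equi-join on employee_id only, then apply the
--   date range in WHERE. This is the partially Cartesian part: every
--   cal_date of an employee is paired with every range of that employee
--   before the filter runs.
-- Step 2 (outer query): left join tableA to the surviving matches on
--   equality conditions, so dates without a matching range are kept
--   with a NULL skill_group.
select a.*, m.skill_group
from tableA a
left join (
    select distinct a2.employee_id, a2.cal_date, b.skill_group
    from tableA a2
    join tableB b
      on a2.employee_id = b.employee_id
    where a2.cal_date >= b.date_start
      and a2.cal_date <= b.date_end
) m
  on a.employee_id = m.employee_id
 and a.cal_date = m.cal_date
```

If tableB holds overlapping ranges for one employee with different skill groups, a tableA row will still appear once per matching skill_group, much as in the original query.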