Join Tables on Date Range in Hive

Question

I need to join tableA to tableB on employee_id, and the cal_date from tableA needs to fall between date_start and date_end from tableB. I ran the query below and received the error message below. Could you please help me correct the query? Thank you for your help!

Both left and right aliases encountered in JOIN 'date_start'

select a.*, b.skill_group 
from tableA a 
  left join tableB b 
    on a.employee_id= b.employee_id 
    and a.cal_date >= b.date_start 
    and a.cal_date <= b.date_end

Recommended answer

RTFM - quoting LanguageManual Joins:

Hive does not support join conditions that are not equality conditions as it is very difficult to express such conditions as a map/reduce job.

You could try moving the date-range filter to the WHERE clause, but that results in a lousy partially Cartesian join (every tableA row is paired with every tableB row for the same employee_id) followed by a post-processing cleanup. Yuck. Depending on the actual cardinality of your "skill group" table, it may run fast, or it may take whole days.
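One way that workaround might look is sketched below; treat it as an illustration under stated assumptions rather than a tuned solution. Simply moving the range test into WHERE would turn the left join into an inner join, dropping tableA rows that have no covering range in tableB. The sketch therefore applies the filter inside a subquery and left-joins the result back to tableA using equality conditions only. It assumes that (employee_id, cal_date) identifies a row of tableA uniquely; the aliases a2 and m are made up for this example.

```sql
-- Sketch only. Assumes (employee_id, cal_date) is unique in tableA;
-- the aliases a2 and m are hypothetical.
select a.*, m.skill_group
from tableA a
  left join (
    -- Equality-only join: pairs each tableA row with every tableB row
    -- for the same employee (the partially Cartesian step).
    select a2.employee_id, a2.cal_date, b.skill_group
    from tableA a2
      join tableB b
        on a2.employee_id = b.employee_id
    -- Post-processing cleanup: keep only pairs where the date is in range.
    where a2.cal_date >= b.date_start
      and a2.cal_date <= b.date_end
  ) m
    -- Joining back on equality keeps unmatched tableA rows,
    -- with skill_group left as NULL.
    on a.employee_id = m.employee_id
   and a.cal_date = m.cal_date
```

The intermediate result inside the subquery grows with the number of tableB rows per employee_id, which is why the cardinality of that table decides whether the query is practical.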
