Hive: workaround for non-equi left join


Problem description

Hive does not support non-equi joins. The common workaround is to move the join condition to the WHERE clause, which works fine when you want an inner join. But what about a left join?

Contrived example: let's say we have an OrderLineItem table, and we need to join it to a ProductPrice table that has a ProductID, a price, and a date range for which the price applies. We want to join where ProductID = ProductID and OrderDate is between the start and end dates. If a ProductID or a valid date range does not match, I'd still want to see all the order line items.

This SQL fiddle is an example of how we'd do this in MSSQL: http://sqlfiddle.com/#!6/fb877/7

Problem: if I apply the typical workaround and move the non-equi filter to the WHERE clause, it becomes an inner join. In the case above (in the SQL fiddle and below), I have a product ID that is not in the lookup.

Question: given that Hive does not support non-equi joins, how can a non-equi left join be achieved?

[SQLFiddle Content]

Tables:

CREATE TABLE OrderLineItem(
  LineItemIDId int IDENTITY(1,1),
  OrderID int  NOT NULL,
  ProductID int NOT NULL,
  OrderDate Date
);


CREATE TABLE ProductPrice(
  ProductID int,
  Cost float,
  startDate Date,
  EndDate Date
);

Loading the data, and how we'd join in MSSQL:

--Old Price. Should be ignored
INSERT INTO ProductPrice(ProductID, COST,startDate,EndDate) VALUES  (1, 50,'12/1/2012','1/1/2013');
INSERT INTO ProductPrice(ProductID, COST,startDate,EndDate) VALUES (2, 55,'12/1/2012','1/1/2013');

--Price for Order 1. Should be applied to Order 1
INSERT INTO ProductPrice (ProductID, COST,startDate,EndDate) VALUES(1, 20,'12/1/2013','1/1/2014');
INSERT INTO ProductPrice (ProductID, COST,startDate,EndDate) VALUES(2, 25,'12/1/2013','1/1/2014');

--Price for Order 2. Should be applied to Order 2
INSERT INTO ProductPrice (ProductID, COST,startDate,EndDate) VALUES(1, 15,'1/2/2014','3/1/2014');
INSERT INTO ProductPrice (ProductID, COST,startDate,EndDate) VALUES(2, 20,'1/2/2014','3/1/2014');


--January 1st 2014 Order
INSERT INTO OrderLineItem(OrderID,ProductID,OrderDate) VALUES (1, 1,'1/1/2014') ;
INSERT INTO OrderLineItem(OrderID,ProductID,OrderDate) VALUES (1, 2,'1/1/2014');

--Feb 1st 2014 Order
INSERT INTO OrderLineItem(OrderID,ProductID,OrderDate) VALUES (2, 1,'2/1/2014');
INSERT INTO OrderLineItem (OrderID,ProductID,OrderDate) VALUES(2, 2,'2/1/2014');
INSERT INTO OrderLineItem (OrderID,ProductID,OrderDate) VALUES(2, 3,'2/1/2014'); -- no price

SELECT * FROM OrderLineItem;

SELECT * FROM OrderLineItem li LEFT OUTER JOIN  ProductPrice p on
p.ProductID=li.ProductID AND  OrderDate BETWEEN  startDate AND  EndDate;
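The effect described in the problem is easy to reproduce without a Hive cluster. The sketch below is only an illustration, not Hive itself: it loads the sample data into an in-memory SQLite database through Python's `sqlite3` module (an assumption made purely for convenience), rewrites the dates in ISO format so that string comparison orders them correctly, omits the identity column, and runs the join both ways.

```python
import sqlite3

# In-memory SQLite stands in for Hive here (an assumption for illustration).
# Dates are stored as ISO strings so BETWEEN compares them correctly.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE OrderLineItem (OrderID INT, ProductID INT, OrderDate TEXT);
CREATE TABLE ProductPrice (ProductID INT, Cost REAL, StartDate TEXT, EndDate TEXT);
INSERT INTO ProductPrice VALUES
  (1, 50, '2012-12-01', '2013-01-01'), (2, 55, '2012-12-01', '2013-01-01'),
  (1, 20, '2013-12-01', '2014-01-01'), (2, 25, '2013-12-01', '2014-01-01'),
  (1, 15, '2014-01-02', '2014-03-01'), (2, 20, '2014-01-02', '2014-03-01');
INSERT INTO OrderLineItem VALUES
  (1, 1, '2014-01-01'), (1, 2, '2014-01-01'),
  (2, 1, '2014-02-01'), (2, 2, '2014-02-01'), (2, 3, '2014-02-01');
""")

# Range predicate inside ON: every order line survives (the MSSQL query).
on_rows = conn.execute("""
SELECT li.OrderID, li.ProductID, p.Cost
FROM OrderLineItem li
LEFT OUTER JOIN ProductPrice p
  ON p.ProductID = li.ProductID
 AND li.OrderDate BETWEEN p.StartDate AND p.EndDate
ORDER BY li.OrderID, li.ProductID
""").fetchall()

# Range predicate moved to WHERE: the NULL-extended row for product 3
# fails the filter, so the query behaves like an inner join.
where_rows = conn.execute("""
SELECT li.OrderID, li.ProductID, p.Cost
FROM OrderLineItem li
LEFT OUTER JOIN ProductPrice p
  ON p.ProductID = li.ProductID
WHERE li.OrderDate BETWEEN p.StartDate AND p.EndDate
ORDER BY li.OrderID, li.ProductID
""").fetchall()

print(len(on_rows), "rows with the predicate in ON")
print(len(where_rows), "rows with the predicate in WHERE")
```

The line for product 3 appears with a NULL cost in the first result and is missing from the second, which is exactly the behaviour the question is trying to avoid.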

Solution

Why not use a WHERE clause that allows for NULL cases separately?

SELECT * FROM OrderLineItem li 
LEFT OUTER JOIN  ProductPrice p 
ON p.ProductID=li.ProductID 
WHERE ( StartDate IS NULL OR OrderDate BETWEEN startDate AND EndDate);

That should take care of it: if the left join finds a match, the date logic applies; if it doesn't, the NULL values are kept intact, as a left join should.
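One caveat worth checking: the NULL test only rescues order lines whose product has no price rows at all. If a product has price rows but none of them covers the order date, every joined row has a non-NULL StartDate and fails the BETWEEN test, so that order line still disappears. The sketch below illustrates this, using SQLite through Python's `sqlite3` as a stand-in for Hive. The third order (dated 2014-06-01) and the `LineItemID` key column are hypothetical, added only for illustration. It also shows one alternative that needs only equality joins: compute the matched prices with an inner join plus a WHERE filter, then left join the order lines back to that result on the key. This is a swapped-in pattern, not the method proposed in the answer.

```python
import sqlite3

# SQLite stands in for Hive; LineItemID and the third order are
# hypothetical additions used to expose the edge case.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE OrderLineItem (
  LineItemID INTEGER PRIMARY KEY, OrderID INT, ProductID INT, OrderDate TEXT);
CREATE TABLE ProductPrice (ProductID INT, Cost REAL, StartDate TEXT, EndDate TEXT);
INSERT INTO ProductPrice VALUES
  (1, 20, '2013-12-01', '2014-01-01'),
  (1, 15, '2014-01-02', '2014-03-01');
INSERT INTO OrderLineItem (OrderID, ProductID, OrderDate) VALUES
  (1, 1, '2014-01-01'),  -- covered by the first price range
  (2, 3, '2014-02-01'),  -- product 3 has no price rows at all
  (3, 1, '2014-06-01');  -- product 1 exists, but no range covers this date
""")

# The query from the answer: NULL-tolerant WHERE clause.
answer_rows = conn.execute("""
SELECT li.LineItemID, p.Cost
FROM OrderLineItem li
LEFT OUTER JOIN ProductPrice p ON p.ProductID = li.ProductID
WHERE p.StartDate IS NULL
   OR li.OrderDate BETWEEN p.StartDate AND p.EndDate
ORDER BY li.LineItemID
""").fetchall()

# Alternative sketch: resolve the range match in an inner-join subquery,
# then left join back on the line-item key (equality joins only).
two_step_rows = conn.execute("""
SELECT li.LineItemID, m.Cost
FROM OrderLineItem li
LEFT OUTER JOIN (
  SELECT li2.LineItemID, p.Cost
  FROM OrderLineItem li2
  JOIN ProductPrice p ON p.ProductID = li2.ProductID
  WHERE li2.OrderDate BETWEEN p.StartDate AND p.EndDate
) m ON m.LineItemID = li.LineItemID
ORDER BY li.LineItemID
""").fetchall()

print("answer query:", answer_rows)
print("two-step query:", two_step_rows)
```

With this data the answer's query keeps line items 1 and 2 but drops line item 3, while the two-step query returns all three, with a NULL cost for the two unpriced lines. On Hive the same shape would need some unique key per order line to join back on.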
