Rewrite SQL query to accept date difference condition in where clause
Problem description
Below is the data in TestingTable1:
BUYER_ID | ITEM_ID | CREATED_TIME
-----------+-------------------+------------------------
1345653 110909316904 2012-07-09 21:29:06
1345653 151851771618 2012-07-09 19:57:33
1345653 221065796761 2012-07-09 19:31:48
1345653 400307563710 2012-07-09 18:57:33
1345653 310411560125 2012-07-09 16:09:49
1345653 120945302103 2012-07-09 13:40:23
1345653 261060982989 2012-07-09 09:02:21
Below is the data in TestingTable2:
USER_ID | PRODUCT_ID | LAST_TIME
-----------+-------------------+-------------------
1345653 110909316904 2012-07-09 21:30:06
1345653 152851771618 2012-07-09 19:57:33
1345653 221065796761 2012-07-09 19:31:48
1345653 400307563710 2012-07-09 18:57:33
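I need to compare TestingTable2 with TestingTable1 on BUYER_ID and USER_ID, and find all the missing and mismatched entries in TestingTable2 (basically the count) after comparing against TestingTable1. I created a SQL Fiddle for this:
http://sqlfiddle.com/#!3/d87b2/1
If you run my query in the SQL Fiddle, you will get this output:
BUYER_ID ERROR
1345653 5
This is right, because the last three rows of TestingTable1 are missing from TestingTable2 and two of the other rows are mismatches after comparing on BUYER_ID and USER_ID.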
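The count of 5 can be checked with a small, self-contained sketch. It loads the sample rows into an in-memory SQLite database from Python and counts TestingTable1 rows that have no exactly matching row in TestingTable2. The query here is an assumed reconstruction for illustration, not the query stored in the SQL Fiddle, and the function name `exact_match_errors` is hypothetical.

```python
import sqlite3

# Sample rows copied from the two tables in the question.
TABLE1 = [
    (1345653, 110909316904, "2012-07-09 21:29:06"),
    (1345653, 151851771618, "2012-07-09 19:57:33"),
    (1345653, 221065796761, "2012-07-09 19:31:48"),
    (1345653, 400307563710, "2012-07-09 18:57:33"),
    (1345653, 310411560125, "2012-07-09 16:09:49"),
    (1345653, 120945302103, "2012-07-09 13:40:23"),
    (1345653, 261060982989, "2012-07-09 09:02:21"),
]
TABLE2 = [
    (1345653, 110909316904, "2012-07-09 21:30:06"),
    (1345653, 152851771618, "2012-07-09 19:57:33"),
    (1345653, 221065796761, "2012-07-09 19:31:48"),
    (1345653, 400307563710, "2012-07-09 18:57:33"),
]


def exact_match_errors():
    """Count TestingTable1 rows with no exactly matching TestingTable2 row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE testingtable1 "
        "(buyer_id INTEGER, item_id INTEGER, created_time TEXT)"
    )
    conn.execute(
        "CREATE TABLE testingtable2 "
        "(user_id INTEGER, product_id INTEGER, last_time TEXT)"
    )
    conn.executemany("INSERT INTO testingtable1 VALUES (?, ?, ?)", TABLE1)
    conn.executemany("INSERT INTO testingtable2 VALUES (?, ?, ?)", TABLE2)
    # Assumed baseline: a row is an error unless buyer, item and time all match.
    rows = conn.execute(
        """
        SELECT t1.buyer_id, COUNT(*) AS error
        FROM testingtable1 t1
        LEFT JOIN testingtable2 t2
          ON t1.item_id = t2.product_id
         AND t1.buyer_id = t2.user_id
         AND t1.created_time = t2.last_time
        WHERE t2.product_id IS NULL
        GROUP BY t1.buyer_id
        """
    ).fetchall()
    conn.close()
    return rows


print(exact_match_errors())
```

Under this assumed query, the first row fails on the timestamp, the second on the item ID, and the last three have no counterpart at all, which accounts for the five errors.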
Now the complicated part starts.
Problem statement:
In my current output, I am getting an ERROR count of 5. If you look at the first row in both tables, ITEM_ID and PRODUCT_ID are the same but CREATED_TIME and LAST_TIME are not, and the difference between the two times is only 1 minute. Currently I report that as a mismatch, but if the difference is within a 15-minute range, I don't want to report it as an error. After implementing this in my current query, I should get an error count of 4, because the difference for the first row is within the 15-minute range.
After getting help on Stack Overflow, I found a solution: the SQL query below works fine in SQL Server (it gives an error count of 4) but not in Hive, because Hive supports only equality JOINs. So I need some other way to solve this. Is it possible to put the date difference condition in the where clause somehow? Basically, how can I rewrite the SQL query below so that it fulfils all the requirements above?
SELECT TT.BUYER_ID,
       COUNT(*)
FROM (
    SELECT testingtable1.buyer_id,
           testingtable1.item_id,
           testingtable1.created_time
    FROM testingtable2
    RIGHT JOIN testingtable1
    ON (
        testingtable1.item_id = testingtable2.product_id
        AND testingtable1.BUYER_ID = testingtable2.USER_ID
        -- non-equality join condition: this is the part Hive rejects
        AND ABS(DATEDIFF(mi, testingtable1.created_time, testingtable2.last_time)) <= 15
    )
    -- keep only TestingTable1 rows that found no partner
    WHERE testingtable2.product_id IS NULL
) TT
GROUP BY TT.BUYER_ID;
Expected output after implementing the above feature:
BUYER_ID ERROR
1345653 4
UPDATE: As per WEST's comment below, the output shows an ERROR count of only 1, but it should be 4. Also, after removing the last row he added in his SQL Fiddle, it does not work and I get zero errors, which is not right because there is already one error in the time difference.
Recommended answer
What if you do an equijoin and put your time comparison logic inside a CASE expression with a SUM, instead of a COUNT? You will need to convert the date arithmetic to whatever Hive uses. Here is an MS SQL Server SQLFiddle which returns 4 errors.
SELECT TT1.BUYER_ID,
       -- unmatched rows yield NULL in the comparison and fall through to ELSE 1
       SUM(CASE WHEN ABS(DATEDIFF(mi, TT1.created_time, TT2.last_time)) <= 15 THEN 0
                ELSE 1
           END) AS ERROR
FROM testingtable1 TT1
LEFT JOIN testingtable2 TT2
ON (
    -- equality-only join conditions
    TT1.item_id = TT2.product_id
    AND TT1.BUYER_ID = TT2.USER_ID
)
GROUP BY TT1.BUYER_ID;
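To try the equijoin-plus-CASE idea without a SQL Server instance, the sketch below runs it against an in-memory SQLite database from Python. SQLite has no DATEDIFF, so the time difference is computed in seconds with `strftime('%s', ...)` and compared against 900 seconds. That substitution and the function name `tolerant_errors` are assumptions made for illustration; the rest follows the answer's query.

```python
import sqlite3

# Sample rows copied from the two tables in the question.
TABLE1 = [
    (1345653, 110909316904, "2012-07-09 21:29:06"),
    (1345653, 151851771618, "2012-07-09 19:57:33"),
    (1345653, 221065796761, "2012-07-09 19:31:48"),
    (1345653, 400307563710, "2012-07-09 18:57:33"),
    (1345653, 310411560125, "2012-07-09 16:09:49"),
    (1345653, 120945302103, "2012-07-09 13:40:23"),
    (1345653, 261060982989, "2012-07-09 09:02:21"),
]
TABLE2 = [
    (1345653, 110909316904, "2012-07-09 21:30:06"),
    (1345653, 152851771618, "2012-07-09 19:57:33"),
    (1345653, 221065796761, "2012-07-09 19:31:48"),
    (1345653, 400307563710, "2012-07-09 18:57:33"),
]

TOLERANCE_SECONDS = 15 * 60  # the 15-minute window from the question


def tolerant_errors():
    """Equality-only join; the time window is checked inside SUM(CASE ...)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE testingtable1 "
        "(buyer_id INTEGER, item_id INTEGER, created_time TEXT)"
    )
    conn.execute(
        "CREATE TABLE testingtable2 "
        "(user_id INTEGER, product_id INTEGER, last_time TEXT)"
    )
    conn.executemany("INSERT INTO testingtable1 VALUES (?, ?, ?)", TABLE1)
    conn.executemany("INSERT INTO testingtable2 VALUES (?, ?, ?)", TABLE2)
    # Unmatched rows have a NULL last_time, so the comparison is NULL
    # and the CASE falls through to ELSE 1 (counted as an error).
    rows = conn.execute(
        """
        SELECT t1.buyer_id,
               SUM(CASE WHEN ABS(
                            CAST(strftime('%s', t1.created_time) AS INTEGER)
                          - CAST(strftime('%s', t2.last_time) AS INTEGER)
                        ) <= ? THEN 0
                        ELSE 1
                   END) AS error
        FROM testingtable1 t1
        LEFT JOIN testingtable2 t2
          ON t1.item_id = t2.product_id
         AND t1.buyer_id = t2.user_id
        GROUP BY t1.buyer_id
        """,
        (TOLERANCE_SECONDS,),
    ).fetchall()
    conn.close()
    return rows


print(tolerant_errors())
```

Two caveats apply to this sketch. SQL Server's `DATEDIFF(mi, ...)` counts minute boundaries crossed rather than elapsed seconds, so results can differ right at the edge of the window. In Hive, the same difference could presumably be expressed with `unix_timestamp(...)` on both columns, which returns seconds; that translation is not tested here.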