Rewrite SQL query to accept date difference condition in where clause


Problem description


Below is the data in TestingTable1

BUYER_ID   |   ITEM_ID         |    CREATED_TIME
-----------+-------------------+------------------------
1345653        110909316904         2012-07-09 21:29:06
1345653        151851771618         2012-07-09 19:57:33
1345653        221065796761         2012-07-09 19:31:48
1345653        400307563710         2012-07-09 18:57:33
1345653        310411560125         2012-07-09 16:09:49
1345653        120945302103         2012-07-09 13:40:23
1345653        261060982989         2012-07-09 09:02:21

Below is the data in TestingTable2

USER_ID    |   PRODUCT_ID      |    LAST_TIME
-----------+-------------------+------------------------
1345653       110909316904         2012-07-09 21:30:06
1345653       152851771618         2012-07-09 19:57:33
1345653       221065796761         2012-07-09 19:31:48
1345653       400307563710         2012-07-09 18:57:33

I need to compare TestingTable1 with TestingTable2, matching BUYER_ID to USER_ID, and find (basically count) all the entries from TestingTable1 that are missing or mismatched in TestingTable2. I created a SQL Fiddle for this:

http://sqlfiddle.com/#!3/d87b2/1

If you run my query in the SQL Fiddle, you will get this output:

BUYER_ID    ERROR
1345653       5

This is right: the last three rows of TestingTable1 are missing from TestingTable2, and two more rows are mismatches after comparing on BUYER_ID and USER_ID.

Now comes the complicated part.

Problem statement:

My current output gives an ERROR count of 5. If you look at the first row in both tables, ITEM_ID and PRODUCT_ID are the same but CREATED_TIME and LAST_TIME are not, and the difference between those two times is only 1 minute. Currently I report that as a mismatch, but if the difference is within a 15-minute range, I don't want to report it as an error. So after implementing this in my current query, the error count should be 4, because the difference for the first row is within the 15-minute range.
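To make the tolerance rule concrete, here is a minimal Python sketch of the check applied to the first row of both tables. The function name `within_tolerance` and the constant `TOLERANCE` are made up for illustration; only the timestamps and the 15-minute threshold come from the question.

```python
from datetime import datetime, timedelta

# Threshold taken from the question: differences up to 15 minutes are not errors.
TOLERANCE = timedelta(minutes=15)

def within_tolerance(created_time: str, last_time: str) -> bool:
    """Return True when the two timestamps differ by at most TOLERANCE."""
    fmt = "%Y-%m-%d %H:%M:%S"
    delta = datetime.strptime(created_time, fmt) - datetime.strptime(last_time, fmt)
    return abs(delta) <= TOLERANCE

# First row of TestingTable1 vs. first row of TestingTable2 (1 minute apart).
first_row_ok = within_tolerance("2012-07-09 21:29:06", "2012-07-09 21:30:06")
print(first_row_ok)
```

With the sample data, the first row passes the check, which is why the expected count drops from 5 to 4.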

After getting help on Stack Overflow, I found a solution. The SQL query below works fine in SQL Server (it gives an error count of 4) but not in Hive, because Hive supports only equality JOINs, so I cannot run it there. I need some other way of solving this problem. Is it possible to put the date difference condition in the where clause somehow? Basically, how can I rewrite the SQL query below so that it still fulfills all the requirements above?

SELECT  TT.BUYER_ID,
        COUNT(*)
FROM    (
          SELECT    testingtable1.buyer_id,
                    testingtable1.item_id,
                    testingtable1.created_time
          FROM      testingtable2
                    RIGHT JOIN testingtable1
                        ON (
                             testingtable1.item_id = testingtable2.product_id
                             AND testingtable1.BUYER_ID = testingtable2.USER_ID
                             AND ABS(DATEDIFF(mi, testingtable1.created_time, testingtable2.last_time)) <= 15
                           )
          WHERE     testingtable2.product_id IS NULL
        ) TT
GROUP BY TT.BUYER_ID;

Expected output after implementing the above feature:

BUYER_ID    ERROR
1345653       4

UPDATE:

As per WEST's comment below, the output only shows an ERROR count of 1, but it should be 4. Also, after removing the last row he added in his SQL Fiddle, it stops working and I get zero errors, which is not right since there is already one error in the time difference.

Solution

What if you do an equijoin, and put your time comparison logic inside of a CASE expression with a SUM, instead of a COUNT?

SELECT  TT1.BUYER_ID,
        SUM(CASE WHEN ABS(DATEDIFF(mi, TT1.created_time, TT2.last_time)) <= 15 THEN 0
                 ELSE 1
            END) AS ERROR
FROM    testingtable1 TT1
        LEFT JOIN testingtable2 TT2
            ON (
                 TT1.item_id = TT2.product_id
                 AND TT1.BUYER_ID = TT2.USER_ID
               )
GROUP BY TT1.BUYER_ID;

You will need to convert the date arithmetic to whatever Hive uses...
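As a rough check of the equijoin-plus-CASE idea without SQL Server's DATEDIFF, here is a sketch that runs the same query shape against an in-memory SQLite database from Python. SQLite is only a stand-in here (an assumption, not what the question uses): `strftime('%s', ...)` converts each timestamp to epoch seconds, and the difference is compared against 15 * 60. In Hive, `unix_timestamp()` on a `yyyy-MM-dd HH:mm:ss` string should play the same role, but that part is untested here. Note that a seconds-based comparison is not exactly the same as `DATEDIFF(mi, ...)`, which counts minute boundaries crossed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE testingtable1 (buyer_id INTEGER, item_id INTEGER, created_time TEXT);
CREATE TABLE testingtable2 (user_id INTEGER, product_id INTEGER, last_time TEXT);
""")

# Sample rows copied from the question.
conn.executemany("INSERT INTO testingtable1 VALUES (?, ?, ?)", [
    (1345653, 110909316904, "2012-07-09 21:29:06"),
    (1345653, 151851771618, "2012-07-09 19:57:33"),
    (1345653, 221065796761, "2012-07-09 19:31:48"),
    (1345653, 400307563710, "2012-07-09 18:57:33"),
    (1345653, 310411560125, "2012-07-09 16:09:49"),
    (1345653, 120945302103, "2012-07-09 13:40:23"),
    (1345653, 261060982989, "2012-07-09 09:02:21"),
])
conn.executemany("INSERT INTO testingtable2 VALUES (?, ?, ?)", [
    (1345653, 110909316904, "2012-07-09 21:30:06"),
    (1345653, 152851771618, "2012-07-09 19:57:33"),
    (1345653, 221065796761, "2012-07-09 19:31:48"),
    (1345653, 400307563710, "2012-07-09 18:57:33"),
])

# The join uses only equality conditions; the time tolerance lives in the CASE.
# For unmatched rows last_time is NULL, the comparison is NULL, so ELSE 1 applies.
QUERY = """
SELECT tt1.buyer_id,
       SUM(CASE WHEN ABS(CAST(strftime('%s', tt1.created_time) AS INTEGER)
                       - CAST(strftime('%s', tt2.last_time) AS INTEGER)) <= 15 * 60
                THEN 0
                ELSE 1
           END) AS error
FROM   testingtable1 tt1
       LEFT JOIN testingtable2 tt2
              ON tt1.item_id = tt2.product_id
             AND tt1.buyer_id = tt2.user_id
GROUP  BY tt1.buyer_id
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

On the sample data this gives one row for buyer 1345653 with an error count of 4: three rows with no match in TestingTable2, plus the row whose PRODUCT_ID differs.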

Here's an MS SQL Server SQLFiddle that returns 4 errors.
