计算不匹配并丢失 [英] Count the mismatch and missing
问题描述
下面是TestingTable1中的数据
BUYER_ID | ITEM_ID | CREATED_TIME
----------- + ------------------- + ------------- -----------
1345653 110909316904 2012-07-09 21:29:06
1345653 151851771618 2012-07-09 19:57:33
1345653 221065796761 2012 -07-09 19:31:48
1345653 400307563710 2012-07-09 18:57:33
1345653 310411560125 2012-07-09 16:09:49
1345653 120945302103 2012-07 -09 13:40:23
1345653 261060982989 2012-07-09 09:02:21
以下是TestingTable2中的数据
USER_ID | PRODUCT_ID | LAST_TIME
----------- + ------------------- + ------------- ------
1345653 110909316904 2012-07-09 21:30:06
1345653 152851771618 2012-07-09 19:57:33
1345653 221065796761 2012-07-09 19 :31:48
1345653 400307563710 2012-07-09 18:57:33
I需要比较 TestingTable2
与 TestingTable1
于 BUYER_ID
和 USER_ID
。我需要从 TestingTable1
进行比较后找到 TestingTable2
中缺失和不匹配的项目(基本上是计数)。我为此创建了SQL小提琴 -
http ://sqlfiddle.com/#!3 / d87b2 / 1
如果您在SQL Fiddle中运行我的查询,您将得到as-
BUYER_ID错误
1345653 5
$在$ TestingTable1
中缺少 $ b
这是最后一个三个
行。 c $ c> TestingTable1 on BUYER_ID
和 USER_ID
。
现在复杂的事情开始了。
问题陈述 -
在我当前的输出中,作为 5
。因此,如果您在表格中看到第一行 ITEM_ID
和 PRODUCT_ID
相同但 LAST_TIME
不相同,并且这两次之间的差异仅为 1分钟
。所以目前我报告说这是不匹配,但我需要的是,如果它们之间的差异在 15分钟范围
内,那么我不想报告为错误。因此,在我当前的查询中实现此功能后,我将获得错误计数为4,因为差异在第一行的15分钟范围内。
P.S-我正与Hive和Hive一起支持sql语法。所以我认为任何修改都可以在我当前的查询中运行。
您可以修改SQL Server SQL Fiddle查询,上班。但是,我不确定Hive是否支持datediff
SELECT TT.BUYER_ID,COUNT(*)FROM
(SELECT testingtable1.buyer_id,testingtable1.item_id,testingtable1.created_time FROM
testingtable2 RIGHT JOIN testingtable1
ON(testingtable1.item_id = testingtable2.product_id
AND testingtable1.BUYER_ID = testingtable2.USER_ID
AND abs(datediff(mi,testingtable1.created_time,testingtable2.last_time))<= 15)
where testingtable2.product_id IS NULL)TT GROUP BY TT.BUYER_ID;
Below is the data in TestingTable1
BUYER_ID | ITEM_ID | CREATED_TIME
-----------+-------------------+------------------------
1345653 110909316904 2012-07-09 21:29:06
1345653 151851771618 2012-07-09 19:57:33
1345653 221065796761 2012-07-09 19:31:48
1345653 400307563710 2012-07-09 18:57:33
1345653 310411560125 2012-07-09 16:09:49
1345653 120945302103 2012-07-09 13:40:23
1345653 261060982989 2012-07-09 09:02:21
Below is the data in TestingTable2
USER_ID | PRODUCT_ID | LAST_TIME
-----------+-------------------+-------------------
1345653 110909316904 2012-07-09 21:30:06
1345653 152851771618 2012-07-09 19:57:33
1345653 221065796761 2012-07-09 19:31:48
1345653 400307563710 2012-07-09 18:57:33
I need to Compare TestingTable2
with TestingTable1
on BUYER_ID
and USER_ID
. And I need to find all (basically the count) the missing and mismatch entries in TestingTable2
after comparing from TestingTable1
. I created SQL fiddle for this-
http://sqlfiddle.com/#!3/d87b2/1
If you run my query in the SQL Fiddle, you will get output as-
BUYER_ID ERROR
1345653 5
which is right as last three
rows from TestingTable1
is missing in TestingTable2
and rest two
are mismatch after comparison from TestingTable1
on BUYER_ID
and USER_ID
.
Now the complicated thing is starting.
Problem Statement-
In my current output, I am getting ERROR count as 5
. So if you see first row in both the tables ITEM_ID
and PRODUCT_ID
are same but CREATED_TIME
and LAST_TIME
is not same, and difference between those two times is of only 1 minute
. So currently I am reporting that as a mismatch, but what I need is that if the difference between them is within 15 minutes range
, then I don't want to report as an error. So after implementing this feature in my current query, I will be getting error count as 4 because difference is within 15 minute range for the first row.
So how can I implement this feature in my current query? That's my question.
P.S- I am working with Hive and Hive supports sql like syntax. So I think any modification will work in my current query.
your SQL Server SQL Fiddle query could be modified as follows and made to work. However, i am not sure if Hive supports datediff
SELECT TT.BUYER_ID , COUNT(*) FROM
(SELECT testingtable1.buyer_id, testingtable1.item_id, testingtable1.created_time FROM
testingtable2 RIGHT JOIN testingtable1
ON (testingtable1.item_id = testingtable2.product_id
AND testingtable1.BUYER_ID = testingtable2.USER_ID
AND abs(datediff(mi, testingtable1.created_time,testingtable2.last_time)) <= 15)
where testingtable2.product_id IS NULL) TT GROUP BY TT.BUYER_ID;
这篇关于计算不匹配并丢失的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!