多联接在一个SQL查询中 [英] Multi Join in a single SQL query
问题描述
下面是TestingTable1中的数据,按日期降序排列,总是
BUYER_ID | ITEM_ID | CREATED_TIME
---------- + ----------------- + ---------------- ------
1345653 110909316904 2012-07-09 21:29:06
1345653 151851771618 2012-07-09 19:57:33
1345653 221065796761 2012-07-09 19 :31:48
1345653 400307563710 2012-07-09 18:57:33
和如果这是以日期降序排序的TestingTable2中的以下数据,则始终
USER_ID | PRODUCT_ID | LAST_TIME
--------- + ---------------- + ------------------ -----
1345653 110909316904 2012-07-09 22:29:06
1345653 150851771618 2012-07-09 19:57:33
TestingTable1
中的每一行都应该与 TestingTable2
,如果不匹配或TestingTable2中没有数据,那么我需要在输出中显示它们,如 TestingTable1
我有这个数据但对应于 TestingTable2
我有这个数据(这将是错误的数据),以便我可以看到什么是不匹配的,哪些数据丢失。
我需要比较 TestingTable2
和 TestingTable1
on BUYER_ID
和 USER_ID
。我需要看到,如果 BUYER_ID
和 USER_ID
匹配,那么我需要比较 ITEM_ID
与 PRODUCT_ID
和 CREATED_TIME
与 LAST_TIME
并且如果在与 TestingTable1
中的任何一个或两者进行比较后 TestingTable2
中存在不匹配,则我需要显示结果。
所以,如果你看看上面的例子 - 我有三个场景基本上是:
- 首先 - 在
TestingTable1
中,第一行ITEM_ID
与<$ c匹配code> TestTable2 但CREATED_TIME
的第一行中的$ c> PRODUCT_ID 与LAST_TIME
作为这两个表中的第一行
- 其次 - 在
TestingTable1
中,在第二行CREATED_TIME
与TestingTable2 $的第二行中的
LAST_TIME
匹配c $ c>但是ITEM_ID
不是ma使用PRODUCT_ID
针对两个表中的第二行进行配置
- 第三 - 在
TestingTable1 code>,最后两行(行)根本不存在于
TestingTable2
中。我写的查询中没有涉及这种情况。我希望这种情况也出现在我的查询中。 - 其次 - 在
因此,这三种情况是我需要在比较 TestingTable2
与 TestingTable1
always。并且 TestingTable1
是总是需要进行比较的MAIN表,所以它意味着 TestingTable1
中的数据总是准确的。
所以我需要显示下面的结果,考虑上面的例子,如果不匹配它们中的任何一个或者数据不在 TestingTable2
at all - TestingTable1
数据然后在它旁边同样 TestingTable2
数据,以便我可以看到 TestingTable1
与 TestingTable2
<$相比有什么价值p $ p>
BUYER_ID | ITEM_ID | CREATED_TIME | USER_ID | PRODUCT_ID | LAST_TIME
----------- + ----------------- + --------------- ------------ + ---------------- + -------------------- + -----------------------
1345653 110909316904 2012-07-09 21:29:06 1345653 110909316904 2012-07-09 22:29 :06
1345653 151851771618 2012-07-09 19:57:33 1345653 150851771618 2012-07-09 19:57:33
1345653 221065796761 2012-07-09 19:31:48 NULL NULL NULL
1345653 400307563710 2012-07-09 18:57:33 NULL NULL NULL
以下是查询我写的只覆盖了我上面提到的两个场景
,它工作正常,我将得到类似上面的输出,从输出中留下最后两行。但是,我还需要在这个(下面)查询中添加第三个场景
,以便它可以提供类似上面的输出。
SELECT *
FROM(SELECT T2.buyer_id,
T2.item_id,
T2 .created_time AS created_time,
subq.user_id,
subq.product_id,
subq.LAST_TIME
FROM TestingTable2 subq
JOIN TestingTable1 T2
ON T2.buyer_id = subq.user_id
AND subq.LAST_TIME =(T2.created_time)
WHERE(subq.product_id<> T2.item_id)
UNION ALL
SELECT T2.buyer_id,
T2.item_id AS item_id,
T2.created_time,
subq.user_id,
subq.product_id AS product_id,
subq.LAST_TIME
FROM TestingTable2 subq
JOIN TestingTable1 T2
ON T2.buyer_id = subq.user_id
AND subq.product_id = T2.item_id
WHERE(subq.LAST_T IME<> (T2.created_time)))finalResult
ORDER BY finalResult.BUYER_ID;
任何建议将不胜感激。
PS在过去的几天中,我已经提出了几个与JOIN相关的问题,但这只涉及到我的两种情况,而不是我在此查询中需要的第三种情况。
更新: -
我不能使用SQL的 NOT EXISTS
语法,因为我使用Hive和Hive不支持 NOT IN
或 NOT EXISTS
,所以我需要其他方式处理这个问题。
我需要使用我的查询来修改它以适用于第三方案,因为Hive将支持SQL语法。
以下是我的 SQL Fiddle ,它满足了我上面的两种情况,但不是第三种情况。任何人都可以帮助我修改我的查询以适用于第三种情况吗?
http://sqlfiddle.com/#!3/102dd/1/0 。
Table1中的数据应该在Table2中,如果它不存在,那么我需要显示Table1比较后的数据之间的不匹配,并且它也是可能的话,Table1中的数据将不会出现在表2中,我也想表明这一点。
更新的小提琴输出 http://sqlfiddle.com/#!3/102dd/3/0
BUYER_ID | ITEM_ID | CREATED_TIME | USER_ID | PRODUCT_ID | LAST_TIME
----------- + ----------------- + --------------- ------------ + ---------------- + -------------------- + -----------------------
1345653 151851771618 July,09 2012 19:57:33 1345653 150851771618 July,09 2012 19:57:33
1345653 221065796761 2012年7月9日19:31:48 1345653 221165796761 2012年7月9日19:31:48
1345653 110909316904 2012年7月21日21:29:06 1345653 110909316904 2012年7月09日22:29 :06
1345653 400307563710 July,09 2012 18:57:33 NULL NULL NULL
1345653 310411560125 July,09 2012 16:09:49 NULL NULL NULL
更新的SQL查询给我错误
code> TestingTable1 with this query -
(SELECT BUYER_ID,ITEM_ID,rank(BUYER_ID ),CREATED_TIME
FROM (
SELECT BUYER_ID,ITEM_ID,CREATED_TIME
FROM testingtable1
where to_date(from_unixtime(cast(UNIX_TIMESTAMP(CREATED_TIME)as int)))='2012-07-09'
DISTRIBUTE BY BUYER_ID
通过BUYER_ID排序,CREATED_TIME desc
)T1
WHERE排名(BUYER_ID)< 5)
和 TestingTable2
(SELECT USER_ID,PROD_AND_TS.PRODUCT_ID as PRODUCT_ID,PROD_AND_TS.TIMESTAMPS as TIMESTAMPS FROM testingtable2 lateral view explode(PURCHASED_ITEM)exploded_table as PROD_AND_TS其中to_date(from_unixtime(cast(PROD_AND_TS.TIMESTAMPS as BIGINT)))='2012-07-09')
@eggyal发布的干净解决方案 code>因为数据库管理系统的限制,另一种选择可能是完全复制你的原始联盟,并在 LEFT JOIN
中使用这些结果。
应用于您的 sqlFiddle ,以下语句返回所需的结果。
SQL语句
SELECT *
FROM(
SELECT *
FROM TestingTable1 A
JOIN TestingTable2 B ON A.BUYER_ID = B.USER_ID AND B.LAST_TIME = A.Created_TIME
WHERE B.PRODUCT_ID<> A.ITEM_ID
UNION ALL
SELECT *
FROM TestingTable1 A
INNER JOIN TestingTable2 B ON A.BUYER_ID = B.USER_ID AND B.PRODUCT_ID = A.ITEM_ID
WHERE B.LAST_TIME<> A.Created_TIME
)X
UNION ALL
SELECT A. *,NULL,NULL,NULL
FROM TestingTable1 A
LEFT OUTER JOIN(
SELECT *
FROM TestingTable1 A
JOIN TestingTable2 B ON A.BUYER_ID = B.USER_ID AND B.LAST_TIME = A.Created_TIME
WHERE B.PRODUCT_ID<> A.ITEM_ID
UNION ALL
SELECT *
FROM TestingTable1 A
INNER JOIN TestingTable2 B ON A.BUYER_ID = B.USER_ID AND B.PRODUCT_ID = A.ITEM_ID
WHERE B.LAST_TIME<> A.Created_TIME
)X ON A.BUYER_ID = X.BUYER_ID AND A.ITEM_ID = X.ITEM_ID
WHERE X.BUYER_ID is NULL
Below is the data in TestingTable1 sorted by date in descending order always
BUYER_ID | ITEM_ID | CREATED_TIME
----------+-----------------+----------------------
1345653 110909316904 2012-07-09 21:29:06
1345653 151851771618 2012-07-09 19:57:33
1345653 221065796761 2012-07-09 19:31:48
1345653 400307563710 2012-07-09 18:57:33
And if this is the below data in TestingTable2 sorted by date in descending order always
USER_ID | PRODUCT_ID | LAST_TIME
---------+----------------+-----------------------
1345653 110909316904 2012-07-09 22:29:06
1345653 150851771618 2012-07-09 19:57:33
Each row in TestingTable1
should match with TestingTable2
, if doesn't match or data is not there in TestingTable2, then I need to show them in the output as in TestingTable1
I have this data but corresponding to TestingTable2
I have this data(which will be wrong data), so that I can see what is mismatch and what data is missing.
I need to compare TestingTable2
with TestingTable1
on BUYER_ID
and USER_ID
. I need to see, if BUYER_ID
and USER_ID
gets matched then I need to compare ITEM_ID
with PRODUCT_ID
and CREATED_TIME
with LAST_TIME
and if there is a mismatch in TestingTable2
after comparing with TestingTable1
in either one of them or both of them, then I need to show the result.
So if you look at the above example- I have three scenarios basically
- Firstly- In
TestingTable1
, in the First row ITEM_ID
is matching with PRODUCT_ID
in the First row of TestingTable2
but CREATED_TIME
is not matching with LAST_TIME
for the first row in both the tables
- Secondly- In
TestingTable1
, in the Second row CREATED_TIME
is matching with LAST_TIME
in the second row of TestingTable2
but ITEM_ID
is not matching with PRODUCT_ID
for the second row in both the tables
- Thirdly- In
TestingTable1
, last two lines(rows) are not there at all in TestingTable2
. This scenario is not covered in my query that I wrote. I want this scenario also in my query.
So these are the three cases that I need to cover while comparing TestingTable2
with TestingTable1
always. And TestingTable1
is the MAIN table through which comparisons need to be made always, so it means data in TestingTable1
is always accurate.
So I need to show the result like below considering the above example if not matching either one of them or data is not there in TestingTable2
at all- TestingTable1
data then next to it same TestingTable2
data, so that I can see what value was there in TestingTable1
as compared to TestingTable2
BUYER_ID | ITEM_ID | CREATED_TIME | USER_ID | PRODUCT_ID | LAST_TIME
-----------+-----------------+---------------------------+----------------+--------------------+-----------------------
1345653 110909316904 2012-07-09 21:29:06 1345653 110909316904 2012-07-09 22:29:06
1345653 151851771618 2012-07-09 19:57:33 1345653 150851771618 2012-07-09 19:57:33
1345653 221065796761 2012-07-09 19:31:48 NULL NULL NULL
1345653 400307563710 2012-07-09 18:57:33 NULL NULL NULL
Below is the query I wrote that covers only my two scenarios
that I mentioned above and it works fine and I will get the output like above leaving last two rows from the Output. But I need to add third scenario
also in this(below) query so that it gives output like above.
SELECT *
FROM (SELECT T2.buyer_id,
T2.item_id,
T2.created_time AS created_time,
subq.user_id,
subq.product_id,
subq.LAST_TIME
FROM TestingTable2 subq
JOIN TestingTable1 T2
ON T2.buyer_id = subq.user_id
AND subq.LAST_TIME = ( T2.created_time )
WHERE ( subq.product_id <> T2.item_id )
UNION ALL
SELECT T2.buyer_id,
T2.item_id AS item_id,
T2.created_time,
subq.user_id,
subq.product_id AS product_id,
subq.LAST_TIME
FROM TestingTable2 subq
JOIN TestingTable1 T2
ON T2.buyer_id = subq.user_id
AND subq.product_id = T2.item_id
WHERE ( subq.LAST_TIME <> ( T2.created_time ) )) finalResult
ORDER BY finalResult.BUYER_ID;
Any suggestion will be appreciated.
P.S. I have asked few questions related to JOIN in past few days, but that covers only my two scenarios, not the third scenario that I need in this query.
Update:-
I cannot use NOT IN
or NOT EXISTS
syntax for SQL, as I am working with Hive and Hive doesn't support NOT IN
or NOT EXISTS
that so I need some other way to deal with this.
I need to use my query only to modify it to work for Third Scenario, as that way Hive will support SQL syntax.
Below is my SQL Fiddle, that fulfills my two scenario above but not the third scenario. Can anyone help me to modify my query to work for third scenario as well?
http://sqlfiddle.com/#!3/102dd/1/0.
Data in Table1 should be there in Table2, if it is not there, then I need to show the mismatch between data after comparing from Table1 and also it is possible, data from Table1 will not be there in Table2, and I want to show that too.
Updated Output for the fiddle http://sqlfiddle.com/#!3/102dd/3/0
BUYER_ID | ITEM_ID | CREATED_TIME | USER_ID | PRODUCT_ID | LAST_TIME
-----------+-----------------+---------------------------+----------------+--------------------+-----------------------
1345653 151851771618 July, 09 2012 19:57:33 1345653 150851771618 July, 09 2012 19:57:33
1345653 221065796761 July, 09 2012 19:31:48 1345653 221165796761 July, 09 2012 19:31:48
1345653 110909316904 July, 09 2012 21:29:06 1345653 110909316904 July, 09 2012 22:29:06
1345653 400307563710 July, 09 2012 18:57:33 NULL NULL NULL
1345653 310411560125 July, 09 2012 16:09:49 NULL NULL NULL
UPDATED SQL QUERY THAT IS GIVING ME ERROR
I replaced TestingTable1
with this query-
(SELECT BUYER_ID, ITEM_ID, rank(BUYER_ID), CREATED_TIME
FROM (
SELECT BUYER_ID, ITEM_ID, CREATED_TIME
FROM testingtable1
where to_date(from_unixtime(cast(UNIX_TIMESTAMP(CREATED_TIME) as int))) = '2012-07-09'
DISTRIBUTE BY BUYER_ID
SORT BY BUYER_ID, CREATED_TIME desc
) T1
WHERE rank(BUYER_ID) < 5)
And TestingTable2
with this query-
(SELECT USER_ID, PROD_AND_TS.PRODUCT_ID as PRODUCT_ID, PROD_AND_TS.TIMESTAMPS as TIMESTAMPS FROM testingtable2 lateral view explode(PURCHASED_ITEM) exploded_table as PROD_AND_TS where to_date(from_unixtime(cast(PROD_AND_TS.TIMESTAMPS as BIGINT))) = '2012-07-09')
解决方案 If you can't use NOT IN
or the clean solution posted by @eggyal
because of DBMS restrictions, another option might be to completely duplicate your original union and use those results in a LEFT JOIN
.
Applied to your sqlFiddle, following statement returns the required results.
SQL Statement
SELECT *
FROM(
SELECT *
FROM TestingTable1 A
JOIN TestingTable2 B ON A.BUYER_ID = B.USER_ID AND B.LAST_TIME = A.Created_TIME
WHERE B.PRODUCT_ID <> A.ITEM_ID
UNION ALL
SELECT *
FROM TestingTable1 A
INNER JOIN TestingTable2 B ON A.BUYER_ID = B.USER_ID AND B.PRODUCT_ID = A.ITEM_ID
WHERE B.LAST_TIME <> A.Created_TIME
) X
UNION ALL
SELECT A.*, NULL, NULL, NULL
FROM TestingTable1 A
LEFT OUTER JOIN (
SELECT *
FROM TestingTable1 A
JOIN TestingTable2 B ON A.BUYER_ID = B.USER_ID AND B.LAST_TIME = A.Created_TIME
WHERE B.PRODUCT_ID <> A.ITEM_ID
UNION ALL
SELECT *
FROM TestingTable1 A
INNER JOIN TestingTable2 B ON A.BUYER_ID = B.USER_ID AND B.PRODUCT_ID = A.ITEM_ID
WHERE B.LAST_TIME <> A.Created_TIME
) X ON A.BUYER_ID = X.BUYER_ID AND A.ITEM_ID = X.ITEM_ID
WHERE X.BUYER_ID IS NULL
这篇关于多联接在一个SQL查询中的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!