多联接在一个SQL查询中 [英] Multi Join in a single SQL query

查看:70
本文介绍了多联接在一个SQL查询中的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

下面是TestingTable1中的数据,按日期降序排列,总是

  BUYER_ID | ITEM_ID | CREATED_TIME 
---------- + ----------------- + ---------------- ------
1345653 110909316904 2012-07-09 21:29:06
1345653 151851771618 2012-07-09 19:57:33
1345653 221065796761 2012-07-09 19 :31:48
1345653 400307563710 2012-07-09 18:57:33

和如果这是以日期降序排序的TestingTable2中的以下数据,则始终

  USER_ID | PRODUCT_ID | LAST_TIME 
--------- + ---------------- + ------------------ -----
1345653 110909316904 2012-07-09 22:29:06
1345653 150851771618 2012-07-09 19:57:33

TestingTable1 中的每一行都应该与 TestingTable2 ,如果不匹配或TestingTable2中没有数据,那么我需要在输出中显示它们,如 TestingTable1 我有这个数据但对应于 TestingTable2 我有这个数据(这将是错误的数据),以便我可以看到什么是不匹配的,哪些数据丢失。



我需要比较 TestingTable2 TestingTable1 on BUYER_ID USER_ID 。我需要看到,如果 BUYER_ID USER_ID 匹配,那么我需要比较 ITEM_ID PRODUCT_ID CREATED_TIME LAST_TIME 并且如果在与 TestingTable1 中的任何一个或两者进行比较后 TestingTable2 中存在不匹配,则我需要显示结果。



所以,如果你看看上面的例子 - 我有三个场景基本上是:


  1. 首先 - 在 TestingTable1 中,第一行 ITEM_ID 与<$ c匹配code> TestTable2 但 CREATED_TIME 的第一行中的$ c> PRODUCT_ID 与 LAST_TIME 作为这两个表中的第一行
  2. 其次 - 在 TestingTable1 中,在第二行 CREATED_TIME TestingTable2 LAST_TIME 匹配c $ c>但是 ITEM_ID 不是ma使用 PRODUCT_ID 针对两个表中的第二行进行配置

  3. 第三 - 在 TestingTable1 code>,最后两行(行)根本不存在于 TestingTable2 中。我写的查询中没有涉及这种情况。我希望这种情况也出现在我的查询中。

因此,这三种情况是我需要在比较 TestingTable2 TestingTable1 always。并且 TestingTable1 是总是需要进行比较的MAIN表,所以它意味着 TestingTable1 中的数据总是准确的。

所以我需要显示下面的结果,考虑上面的例子,如果不匹配它们中的任何一个或者数据不在 TestingTable2 at all - TestingTable1 数据然后在它旁边同样 TestingTable2 数据,以便我可以看到 TestingTable1 TestingTable2



<$相比有什么价值p $ p> BUYER_ID | ITEM_ID | CREATED_TIME | USER_ID | PRODUCT_ID | LAST_TIME
----------- + ----------------- + --------------- ------------ + ---------------- + -------------------- + -----------------------
1345653 110909316904 2012-07-09 21:29:06 1345653 110909316904 2012-07-09 22:29 :06
1345653 151851771618 2012-07-09 19:57:33 1345653 150851771618 2012-07-09 19:57:33
1345653 221065796761 2012-07-09 19:31:48 NULL NULL NULL
1345653 400307563710 2012-07-09 18:57:33 NULL NULL NULL

以下是查询我写的只覆盖了我上面提到的两个场景,它工作正常,我将得到类似上面的输出,从输出中留下最后两行。但是,我还需要在这个(下面)查询中添加第三个场景,以便它可以提供类似上面的输出。

  SELECT * 
FROM(SELECT T2.buyer_id,
T2.item_id,
T2 .created_time AS created_time,
subq.user_id,
subq.product_id,
subq.LAST_TIME
FROM TestingTable2 subq
JOIN TestingTable1 T2
ON T2.buyer_id = subq.user_id
AND subq.LAST_TIME =(T2.created_time)
WHERE(subq.product_id<> T2.item_id)
UNION ALL
SELECT T2.buyer_id,
T2.item_id AS item_id,
T2.created_time,
subq.user_id,
subq.product_id AS product_id,
subq.LAST_TIME
FROM TestingTable2 subq
JOIN TestingTable1 T2
ON T2.buyer_id = subq.user_id
AND subq.product_id = T2.item_id
WHERE(subq.LAST_T IME<> (T2.created_time)))finalResult
ORDER BY finalResult.BUYER_ID;

任何建议将不胜感激。

PS在过去的几天中,我已经提出了几个与JOIN相关的问题,但这只涉及到我的两种情况,而不是我在此查询中需要的第三种情况。



更新: -
我不能使用SQL的 c> NOT IN
NOT EXISTS 语法,因为我使用Hive和Hive不支持 NOT IN NOT EXISTS ,所以我需要其他方式处理这个问题。



我需要使用我的查询来修改它以适用于第三方案,因为Hive将支持SQL语法。

以下是我的 SQL Fiddle ,它满足了我上面的两种情况,但不是第三种情况。任何人都可以帮助我修改我的查询以适用于第三种情况吗?



http://sqlfiddle.com/#!3/102dd/1/0

Table1中的数据应该在Table2中,如果它不存在,那么我需要显示Table1比较后的数据之间的不匹配,并且它也是可能的话,Table1中的数据将不会出现在表2中,我也想表明这一点。

更新的小提琴输出 http://sqlfiddle.com/#!3/102dd/3/0

  BUYER_ID | ITEM_ID | CREATED_TIME | USER_ID | PRODUCT_ID | LAST_TIME 
----------- + ----------------- + --------------- ------------ + ---------------- + -------------------- + -----------------------
1345653 151851771618 July,09 2012 19:57:33 1345653 150851771618 July,09 2012 19:57:33
1345653 221065796761 2012年7月9日19:31:48 1345653 221165796761 2012年7月9日19:31:48
1345653 110909316904 2012年7月21日21:29:06 1345653 110909316904 2012年7月09日22:29 :06
1345653 400307563710 July,09 2012 18:57:33 NULL NULL NULL
1345653 310411560125 July,09 2012 16:09:49 NULL NULL NULL

更新的SQL查询给我错误



code> TestingTable1 with this query -

 (SELECT BUYER_ID,ITEM_ID,rank(BUYER_ID ),CREATED_TIME 
FROM (
SELECT BUYER_ID,ITEM_ID,CREATED_TIME
FROM testingtable1
where to_date(from_unixtime(cast(UNIX_TIMESTAMP(CREATED_TIME)as int)))='2012-07-09'
DISTRIBUTE BY BUYER_ID
通过BUYER_ID排序,CREATED_TIME desc
)T1
WHERE排名(BUYER_ID)< 5)

TestingTable2

 (SELECT USER_ID,PROD_AND_TS.PRODUCT_ID as PRODUCT_ID,PROD_AND_TS.TIMESTAMPS as TIMESTAMPS FROM testingtable2 lateral view explode(PURCHASED_ITEM)exploded_table as PROD_AND_TS其中to_date(from_unixtime(cast(PROD_AND_TS.TIMESTAMPS as BIGINT)))='2012-07-09')


NOT IN 或由 @eggyal发布的干净解决方案 code>因为数据库管理系统的限制,另一种选择可能是完全复制你的原始联盟,并在 LEFT JOIN 中使用这些结果。



应用于您的 sqlFiddle ,以下语句返回所需的结果。

SQL语句

  SELECT * 
FROM(
SELECT *
FROM TestingTable1 A
JOIN TestingTable2 B ON A.BUYER_ID = B.USER_ID AND B.LAST_TIME = A.Created_TIME
WHERE B.PRODUCT_ID<> A.ITEM_ID
UNION ALL
SELECT *
FROM TestingTable1 A
INNER JOIN TestingTable2 B ON A.BUYER_ID = B.USER_ID AND B.PRODUCT_ID = A.ITEM_ID
WHERE B.LAST_TIME<> A.Created_TIME
)X
UNION ALL
SELECT A. *,NULL,NULL,NULL
FROM TestingTable1 A
LEFT OUTER JOIN(
SELECT *
FROM TestingTable1 A
JOIN TestingTable2 B ON A.BUYER_ID = B.USER_ID AND B.LAST_TIME = A.Created_TIME
WHERE B.PRODUCT_ID<> A.ITEM_ID
UNION ALL
SELECT *
FROM TestingTable1 A
INNER JOIN TestingTable2 B ON A.BUYER_ID = B.USER_ID AND B.PRODUCT_ID = A.ITEM_ID
WHERE B.LAST_TIME<> A.Created_TIME
)X ON A.BUYER_ID = X.BUYER_ID AND A.ITEM_ID = X.ITEM_ID
WHERE X.BUYER_ID is NULL


Below is the data in TestingTable1 sorted by date in descending order always

BUYER_ID  |   ITEM_ID       |  CREATED_TIME
----------+-----------------+----------------------
1345653      110909316904     2012-07-09 21:29:06
1345653      151851771618     2012-07-09 19:57:33
1345653      221065796761     2012-07-09 19:31:48
1345653      400307563710     2012-07-09 18:57:33

And if this is the below data in TestingTable2 sorted by date in descending order always

USER_ID  |   PRODUCT_ID    |    LAST_TIME
---------+----------------+-----------------------
1345653     110909316904      2012-07-09 22:29:06
1345653     150851771618      2012-07-09 19:57:33    

Each row in TestingTable1 should match with TestingTable2, if doesn't match or data is not there in TestingTable2, then I need to show them in the output as in TestingTable1 I have this data but corresponding to TestingTable2 I have this data(which will be wrong data), so that I can see what is mismatch and what data is missing.

I need to compare TestingTable2 with TestingTable1 on BUYER_ID and USER_ID. I need to see, if BUYER_ID and USER_ID gets matched then I need to compare ITEM_ID with PRODUCT_ID and CREATED_TIME with LAST_TIME and if there is a mismatch in TestingTable2 after comparing with TestingTable1 in either one of them or both of them, then I need to show the result.

So if you look at the above example- I have three scenarios basically

  1. Firstly- In TestingTable1, in the First row ITEM_ID is matching with PRODUCT_ID in the First row of TestingTable2 but CREATED_TIME is not matching with LAST_TIME for the first row in both the tables
  2. Secondly- In TestingTable1, in the Second row CREATED_TIME is matching with LAST_TIME in the second row of TestingTable2 but ITEM_ID is not matching with PRODUCT_ID for the second row in both the tables
  3. Thirdly- In TestingTable1, last two lines(rows) are not there at all in TestingTable2. This scenario is not covered in my query that I wrote. I want this scenario also in my query.

So these are the three cases that I need to cover while comparing TestingTable2 with TestingTable1 always. And TestingTable1 is the MAIN table through which comparisons need to be made always, so it means data in TestingTable1 is always accurate.

So I need to show the result like below considering the above example if not matching either one of them or data is not there in TestingTable2 at all- TestingTable1 data then next to it same TestingTable2 data, so that I can see what value was there in TestingTable1 as compared to TestingTable2

BUYER_ID   |   ITEM_ID       |    CREATED_TIME           |      USER_ID   |     PRODUCT_ID     |     LAST_TIME   
-----------+-----------------+---------------------------+----------------+--------------------+-----------------------
1345653      110909316904       2012-07-09 21:29:06            1345653          110909316904      2012-07-09 22:29:06
1345653      151851771618       2012-07-09 19:57:33            1345653          150851771618      2012-07-09 19:57:33
1345653      221065796761       2012-07-09 19:31:48            NULL             NULL              NULL
1345653      400307563710       2012-07-09 18:57:33            NULL             NULL              NULL

Below is the query I wrote that covers only my two scenarios that I mentioned above and it works fine and I will get the output like above leaving last two rows from the Output. But I need to add third scenario also in this(below) query so that it gives output like above.

SELECT *
FROM   (SELECT T2.buyer_id,
               T2.item_id,
               T2.created_time AS created_time,
               subq.user_id,
               subq.product_id,
               subq.LAST_TIME
        FROM   TestingTable2 subq
               JOIN TestingTable1 T2
                 ON T2.buyer_id = subq.user_id
                    AND subq.LAST_TIME = ( T2.created_time )
        WHERE  ( subq.product_id <> T2.item_id )
        UNION ALL
        SELECT T2.buyer_id,
               T2.item_id AS item_id,
               T2.created_time,
               subq.user_id,
               subq.product_id AS product_id,
               subq.LAST_TIME
        FROM   TestingTable2 subq
               JOIN TestingTable1 T2
                 ON T2.buyer_id = subq.user_id
                    AND subq.product_id = T2.item_id
        WHERE  ( subq.LAST_TIME <> ( T2.created_time ) )) finalResult
ORDER  BY finalResult.BUYER_ID;

Any suggestion will be appreciated.

P.S. I have asked few questions related to JOIN in past few days, but that covers only my two scenarios, not the third scenario that I need in this query.

Update:- I cannot use NOT IN or NOT EXISTS syntax for SQL, as I am working with Hive and Hive doesn't support NOT IN or NOT EXISTS that so I need some other way to deal with this.

I need to use my query only to modify it to work for Third Scenario, as that way Hive will support SQL syntax.

Below is my SQL Fiddle, that fulfills my two scenario above but not the third scenario. Can anyone help me to modify my query to work for third scenario as well?

http://sqlfiddle.com/#!3/102dd/1/0.

Data in Table1 should be there in Table2, if it is not there, then I need to show the mismatch between data after comparing from Table1 and also it is possible, data from Table1 will not be there in Table2, and I want to show that too.

Updated Output for the fiddle http://sqlfiddle.com/#!3/102dd/3/0

BUYER_ID   |   ITEM_ID       |    CREATED_TIME           |      USER_ID   |     PRODUCT_ID     |     LAST_TIME   
-----------+-----------------+---------------------------+----------------+--------------------+-----------------------
1345653       151851771618      July, 09 2012 19:57:33      1345653            150851771618         July, 09 2012 19:57:33
1345653       221065796761      July, 09 2012 19:31:48      1345653            221165796761         July, 09 2012 19:31:48
1345653       110909316904      July, 09 2012 21:29:06      1345653            110909316904         July, 09 2012 22:29:06
1345653       400307563710      July, 09 2012 18:57:33      NULL               NULL                 NULL
1345653       310411560125      July, 09 2012 16:09:49      NULL               NULL                 NULL

UPDATED SQL QUERY THAT IS GIVING ME ERROR

I replaced TestingTable1 with this query-

(SELECT BUYER_ID, ITEM_ID, rank(BUYER_ID), CREATED_TIME
FROM (
    SELECT BUYER_ID, ITEM_ID, CREATED_TIME
    FROM testingtable1
    where to_date(from_unixtime(cast(UNIX_TIMESTAMP(CREATED_TIME) as int))) = '2012-07-09'
    DISTRIBUTE BY BUYER_ID
    SORT BY BUYER_ID, CREATED_TIME desc
) T1
WHERE rank(BUYER_ID) < 5)

And TestingTable2 with this query-

(SELECT USER_ID, PROD_AND_TS.PRODUCT_ID as PRODUCT_ID, PROD_AND_TS.TIMESTAMPS as TIMESTAMPS FROM testingtable2 lateral view explode(PURCHASED_ITEM) exploded_table as PROD_AND_TS where to_date(from_unixtime(cast(PROD_AND_TS.TIMESTAMPS as BIGINT))) = '2012-07-09')

解决方案

If you can't use NOT IN or the clean solution posted by @eggyal because of DBMS restrictions, another option might be to completely duplicate your original union and use those results in a LEFT JOIN.

Applied to your sqlFiddle, following statement returns the required results.

SQL Statement

SELECT * 
FROM(
    SELECT *
    FROM TestingTable1 A
    JOIN TestingTable2 B ON A.BUYER_ID = B.USER_ID AND B.LAST_TIME = A.Created_TIME 
    WHERE B.PRODUCT_ID <> A.ITEM_ID
    UNION ALL
    SELECT * 
    FROM TestingTable1 A
    INNER JOIN TestingTable2 B ON A.BUYER_ID = B.USER_ID AND B.PRODUCT_ID = A.ITEM_ID  
    WHERE B.LAST_TIME <> A.Created_TIME      
 ) X
UNION ALL
SELECT A.*, NULL, NULL, NULL
FROM   TestingTable1 A
       LEFT OUTER JOIN (
            SELECT *
            FROM TestingTable1 A
            JOIN TestingTable2 B ON A.BUYER_ID = B.USER_ID AND B.LAST_TIME = A.Created_TIME 
            WHERE B.PRODUCT_ID <> A.ITEM_ID
            UNION ALL
            SELECT * 
            FROM TestingTable1 A
            INNER JOIN TestingTable2 B ON A.BUYER_ID = B.USER_ID AND B.PRODUCT_ID = A.ITEM_ID  
            WHERE B.LAST_TIME <> A.Created_TIME      
       ) X ON A.BUYER_ID = X.BUYER_ID AND A.ITEM_ID = X.ITEM_ID
WHERE  X.BUYER_ID IS NULL

这篇关于多联接在一个SQL查询中的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆