使用HiveQL(Hadoop)在Hive中加入两个表 [英] Joining two Tables in Hive using HiveQL(Hadoop)

查看:130
本文介绍了使用HiveQL(Hadoop)在Hive中加入两个表的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述


可能存在重复:

SQL查询JOIN与表格






  CREATE EXTERNAL TABLE IF NOT EXESTTS TestingTable1(这是通过它进行比较的MAIN表)

BUYER_ID BIGINT,
ITEM_ID BIGINT,
CREATED_TIME STRING

这就是上面第一张表中的数据

  ** BUYER_ID ** | ** ITEM_ID ** | ** CREATED_TIME ** 
-------------- + ------------------ + ------- ------------------
1015826235 220003038067 * 2001-11-03 19:40:21 *
1015826235 300003861266 2001-11-08 18:19 :59
1015826235 140002997245 2003-08-22 09:23:17
1015826235 * 210002448035 * 2001-11-11 22:21:11

这是Hive中的第二张表 - 它还包含有关我们正在购买的物品的信息。

  CREATE EXTERNAL TABLE IF NOT EXISTS TestingTable2 

USER_ID BIGINT,
PURCHASED_ITEM ARRAY< STRUCT< PRODUCT_ID:BIGINT,TIMESTAMPS:STRING>>

这就是上面第二个表中的数据( TestingTable2

  ** USER_ID ** ** PURCHASED_ITEM ** 
1015826235 [{product_id :220003038067,timestamps:1004941621},{product_id:300003861266,timestamps:1005268799},{product_id:140002997245times tamps:1061569397},{product_id:200002448035,timestamps:1005542471}]

TestingTable2 TestingTable1 进行比较,以满足以下情况。



找到 PRODUCT_ID AND TIMESTAMPS TestingTable2 与TestingTable1中的 ITEM_ID AND CREATED_TIME 不匹配 c> BUYER_ID(USER_ID) TestingTable1 进行比较后。



TestingTable2 数据this(last) ITEM_ID 210002448035 from TestingTable1 is不符合 TestingTable2 PRODUCT_ID- 200002448035 数据,并且与时间戳相似。所以我想用HiveQL查询来显示下面的结果。

  ** BUYER_ID ** | ** ITEM_ID ** | ** CREATED_TIME ** | ** PRODUCT_ID ** | ** TIMESTAMPS ** 
-------------- + ------------------ + ------- ------------------------- + ------------------------ + ----------------------
1015826235 * 210002448035 * 2001-11-11 22:21:11 200002448035 1005542471
1015826235 220003038067 * 2001-11-03 19:40:21 * 220003038067 1004941621

任何人都可以帮助我解决这个问题。因为我是HiveQL的新手,所以有很多问题。



更新: -

<我已经写了这个查询,但它不按我想要的方式工作。

  select * from 
(选择* from
(选择user_id,prod_and_ts.product_id作为product_id,prod_and_ts.timestamps作为时间戳
from testingtable2 LATERAL VIEW
explode(purchased_item)exploded_table作为prod_and_ts)
prod_and_ts
LEFT OUTER JOIN testingtable1
ON(prod_and_ts.user_id = testingtable1.buyer_id AND testingtable1.item_id = prod_and_ts.product_id
AND prod_and_ts.timestamps = UNIX_TIMESTAMP(testingtable1.created_time)

where testingtable1.buyer_id IS NULL)
set_a LEFT OUTER JOIN testingtable1
ON(set_a.user_id = testingtable1.buyer_id AND
(set_a.product_id = testingtable1.item_id或set_a.timestamps = UNIX_TIMESTAMP (testingtable1.created_time))
) ;

更多更新



根据 user1166147 评论。我按照他的查询编写了我的查询。
在配置单元中,我猜< INNER JOIN
是简单的 JOIN




$ b

 select * from(select t2.buyer_id,t2.item_id, t2.created_time as created_time,subq.user_id,subq.product_id,subq.timestamps as timestamps 
from
(选择user_id,prod_and_ts.product_id作为product_id,prod_and_ts.timestamps作为来自testingtable2横向视图爆炸的时间戳(purchased_item )exploded_table as prod_and_ts)subq JOIN testingtable1 t2 on t2.buyer_id = subq.user_id
AND subq.timestamps = unix_timestamp(t2.created_time)
WHERE(subq.product_id<> t2.item_id)
union all
select t2.buyer_id,t2.item_id as item_id,t2.created_time,subq.user_id,subq.product_id as product_id,subq.timestamps
from
(select user_id, prod_and_ts.product_id作为product_id,prod_and_ts.timestamps作为来自testingtable2侧视图的时间戳explode(purchased_item)exploded_table作为prod_and_ts)subq JO IN testingtable1 t2 on t2.buyer_id = subq.user_id
and subq.product_id = t2.item_id
WHERE(subq.timestamps<> unix_timestamp(t2.created_time)))unionall;

运行上述查询后,我得到零结果。



最后一次更新: -



我不好,我没有准确的数据这是我没有得到结果的原因。是的,它正在实际上面的查询。

解决方案

编辑 - 第1部分
好​​的 - 出于某种原因,我要解释我自己 - 所以首先我因为SQL标签而偶然发现了这个问题,并且看到了Hive,并开始不去查看,只是跳过它。但是,我发现它已经过了一天,你没有得到答案。我看了一下 - 我发现原来的查询中有一个SQL逻辑更正,我知道这是需要的,会有所帮助,所以我只发布了,因为没有人回答。我会尽力解决这个最后的问题 - 但之后,我会向自己保留我的建议,因为我可能会提出不好的建议。祝你好运!我试过了!现在你似乎正在得到答案,所以......



在TSQL中,我可以通过下面的单个查询来解决整个问题

pre $ SELECT *
FROM SO_Table1HIVE A
全外连接SO_Table2HIVE B ON A.BUYER_ID = B [ USER_ID] AND(B.t1time = A.Created_TIME OR B.PRODUCTID = A.ITEM_ID)

它将返回所有内容,包括您的匹配buyer_id / user_id。它不会与另一个表中的时间或产品中没有匹配的buyer_id / user_id行相匹配,但它会将其作为单独的行在其他表的字段中与NULLS一起返回。我不会以任何方式匹配 - 没有提供准确的信息来做到这一点,如下所述。



结束编辑第1部分



如果您不能使用 OR 执行 FULL OUTER JOIN 在Hive中,满足原始标准的最简单方法是使用 UNION ALL 2 INNER JOIN s。在其中一个查询中,除了加入匹配的user_ids之外,加入PRODUCT_ID并在 WHERE 中查找与CREATED_TIME不匹配的TIMESTAMPS。在第二个查询中,除了加入匹配的user_ids之外,加入时间AND在 WHERE 中寻找不匹配的产品。

编辑第二部分 - 更新评论问题补充标准

最后一个标准是任何表中的任何记录都有匹配的user_id = buyer_id,但没有其他匹配的记录。带有 OR 条件的 FULL OUTER JOIN 会返回它们,但没有足够的提供信息来描述关联记录彼此。我们可以很容易地识别它们,但无法将它们绑回到对方。如果你这样做,并且你在两个表中都有一个以上的记录没有匹配,那么每个记录都会有多个条目。



我试图在没有更多信息的情况下尝试绑定它们的任何查询都可能是猜测并且不准确。



例如,在第一个表中,如果有这两个(示例假记录)记录在第二个表中除user_id外没有匹配:

  1015826235 420003038067 2011-11-03 19:40:21.000 
1015826235 720003038067 2004-11-03 19:40:21.000

AND在table2中 - 这些不匹配:

  1015826235 {product_id:520003038067,timestamps:10 ... 
1015826235 {product_id:620003038067,timestamps:10 ...

您可以识别它们,但是如果您在没有更多条件的情况下将它们匹配,您将得到4而不是2:

  1015826235 420003038067 2011-11-03 19:40:21.000 1015826235 520003038067 
1015826235 420003038067 2011-11-03 19:40:21.000 1015826235 620003038067
1015826235 720003038067 2004 -11-03 19:40:2 1.000 1015826235 520003038067
1015826235 720003038067 2004-11-03 19:40:21.000 1015826235 620003038067



  BUYER_ID ITEM_ID CREATED_TIME USER_ID PRODUCTID时间戳
- -------------------------------------------------- -------------------
NULL NULL NULL 1015826235 520003038067 2009-11-11 22:21:11.000
NULL NULL NULL 1015826235 620003038067 2008-11 -11 22:21:11.000
1015826235 420003038067 2011-11-03 19:40:21.000 NULL NULL NULL
1015826235 720003038067 2004-11-03 19:40:21.000 NULL NULL NULL

结束编辑第2部分 - 更新评论问题附加标准 - 第1部分



我正在使用TSQL,所以我无法为您测试一个确切的曲线你的语法,但连接的概念是相同的,这将返回你想要的。我确实接受了您的查询并尝试您的语法,并根据需要进行修改。我使用TSQL进行测试。您可以通过HiveQL中的功能来完善它并进行改进。还有其他方法可以做到这一点 - 但这是最​​直接的,这将转化为HiveQL。



删除后,您已加入此部分,并在此后包含
$ b 根据需要再次修改语法)**

 SELECT * 
FROM(
)SELECT BUYER_ID,ITEM_ID,CREATED_TIME,PRODUCT_ID,TIMESTAMPS $ b $ FROM FROM testingtable2横向视图
explode(purchased_item)exploded_table作为prod_and_ts )
prod_and_ts
INNER JOIN table2 A ON A.BUYER_ID = prod_and_ts。[USER_ID] AND prod_and_ts.timestamps = UNIX_TIMESTAMP(table2.created_time)
WHERE prod_and_ts.product_id<> A.ITEM_ID
UNION ALL
SELECT BUYER_ID,ITEM_ID,CREATED_TIME,PRODUCT_ID,TIMESTAMPS
FROM testingtable2 LATERAL VIEW
explode(purchased_item)exploded_table作为prod_and_ts)
prod_and_ts
INNER JOIN table2 A ON A.BUYER_ID = prod_and_ts。[USER_ID] AND prod_and_ts.product_id = A.ITEM_ID
WHERE prod_and_ts.timestamps<> UNIX_TIMESTAMP(table2.created_time)
)X

这是我的测试过的TSQL版本与我的表名称作为参考:

  SELECT * 
FROM(
SELECT *
FROM SO_Table1HIVE A
INNER JOIN SO_Table2HIVE B ON A.BUYER_ID = B。[USER_ID] AND B.t1time = A.Created_TIME
WHERE B.PRODUCTID<> A.ITEM_ID
UNION ALL
SELECT *
FROM SO_Table1HIVE A
INNER JOIN SO_Table2HIVE B ON A.BUYER_ID = B. [USER_ID] AND B.PRODUCTID = A.ITEM_ID
WHERE B. t1time<> A.Created_TIME
)X

* 编辑第3部分在TSQL中,整个查询(没有联合)可以使用 FULL来运行。 OUTER JOIN 加入 OR 条件

  SELECT * 
FROM SO_Table1HIVE A
FULL OUTER JOIN SO_Table2HIVE BO N A.BUYER_ID = B. [USER_ID] AND(B.t1time = A.Created_TIME OR B.PRODUCTID = A.ITEM_ID)

如果不能简单地执行上述操作,对于新条件的SQL逻辑 - 从两个表中获取不匹配的SQL逻辑,并将它们显示为 NULL code>在另一个表中使用 RIGHT JOIN LEFT JOIN
RIGHT JOIN 将抓取第一个表中的所有内容,并匹配第二个表中的所有内容,并且 LEFT 相反。将新查询添加到 UNION 中。


TSQL示例 - 修改HIVE

  SELECT * 
FROM SO_Table1HIVE A
RIGHT JOIN SO_Table2HIVE B ON A.BUYER_ID = B. [USER_ID] AND(B.t1time = A.Created_TIME OR B.PRODUCTID = A.ITEM_ID)
WHERE A.BUYER_ID是NULL
UNION ALL
SELECT *
FROM SO_Table1HIVE A
LEFT JOIN SO_Table2HIVE B ON A.BUYER_ID = B。[USER_ID] AND(B.t1time = A.Created_TIME or B.PRODUCTID = A.ITEM_ID)
WHERE B. [USER_ID] IS NULL

或者,如果你想要抓住它们并将它们作为重复项添加到 UNION



TSQL

  SELECT * 
FROM SO_Table1HIVE A
JOIN SO_Table2HIVE B ON A.BUYER_ID = B。[USER_ID]
WHERE B.t1time NOT IN(SELECT Created_TIME FROM SO_Table1HIVE)
AND A.Created_TIME NOT IN(SELECT t1time FROM SO_Table2HIVE)
AND B.PRODUCTID NOT IN(SELECT ITEM_ID FROM SO_Table1HIVE)
AND A.ITEM_ID NOT IN (从SO_Table2HIVE中选择产品)

再次祝您好运!


Possible Duplicate:
SQL Query JOIN with Table

CREATE EXTERNAL TABLE IF NOT EXISTS TestingTable1   (This is the MAIN table through which comparisons need to be made)
(
BUYER_ID BIGINT,
ITEM_ID BIGINT,
CREATED_TIME STRING
)

And this is the data in the above first table

**BUYER_ID**  |  **ITEM_ID**     |      **CREATED_TIME**   
--------------+------------------+-------------------------
 1015826235      220003038067       *2001-11-03 19:40:21*
 1015826235      300003861266        2001-11-08 18:19:59
 1015826235      140002997245        2003-08-22 09:23:17
 1015826235     *210002448035*       2001-11-11 22:21:11

This is Second table in Hive- It also contains information about the items we are purchasing.

CREATE EXTERNAL TABLE IF NOT EXISTS TestingTable2
(
USER_ID BIGINT,
PURCHASED_ITEM ARRAY<STRUCT<PRODUCT_ID: BIGINT,TIMESTAMPS:STRING>>
)

And this is the data in the above second table (TestingTable2)-

**USER_ID**    **PURCHASED_ITEM**
1015826235     [{"product_id":220003038067,"timestamps":"1004941621"},    {"product_id":300003861266,"timestamps":"1005268799"},    {"product_id":140002997245,"timestamps":"1061569397"},{"product_id":200002448035,"timestamps":"1005542471"}]

Compare TestingTable2 with TestingTable1 so that below scenario is fulfilled.

Find the PRODUCT_ID AND TIMESTAMPS from TestingTable2 WHICH IS NOT MATCHING WITH ITEM_ID AND CREATED_TIME from TestingTable1 CORRESPONDING TO BUYER_ID(USER_ID) after comparing from TestingTable1.

So If you look TestingTable2 data this(last) ITEM_ID 210002448035 from TestingTable1 is not matching with TestingTable2 PRODUCT_ID- 200002448035 data and similarly with timestamps. So I want to show the below result using the HiveQL query.

**BUYER_ID**  |  **ITEM_ID**     |      **CREATED_TIME**          |  **PRODUCT_ID**    |     **TIMESTAMPS** 
--------------+------------------+--------------------------------+------------------------+----------------------
1015826235          *210002448035*           2001-11-11 22:21:11            200002448035           1005542471 
1015826235       220003038067           *2001-11-03 19:40:21*           220003038067          1004941621

Can anyone help me with this. As I am new to HiveQL so having lot of Problem.

Updated:-

I have written this query, but it doesn't working the way I wanted to.

select * from 
  (select * from 
     (select user_id, prod_and_ts.product_id as product_id, prod_and_ts.timestamps as timestamps 
      from testingtable2 LATERAL VIEW
      explode(purchased_item) exploded_table as prod_and_ts)
      prod_and_ts 
      LEFT OUTER JOIN testingtable1 
  ON ( prod_and_ts.user_id = testingtable1.buyer_id AND testingtable1.item_id =    prod_and_ts.product_id
     AND prod_and_ts.timestamps = UNIX_TIMESTAMP (testingtable1.created_time)
  )
  where testingtable1.buyer_id IS NULL) 
  set_a LEFT OUTER JOIN testingtable1 
  ON (set_a.user_id = testingtable1.buyer_id AND  
  ( set_a.product_id = testingtable1.item_id OR set_a.timestamps = UNIX_TIMESTAMP(testingtable1.created_time) )
 );

One More UPDATE

As per user1166147 comments. I wrote my query as per his query. In hive I guess INNER JOIN are written by simply JOIN.

This is my below query.

select * from (select t2.buyer_id, t2.item_id, t2.created_time as created_time, subq.user_id, subq.product_id, subq.timestamps as timestamps 
from
(select user_id, prod_and_ts.product_id as product_id, prod_and_ts.timestamps as timestamps from testingtable2 lateral view explode(purchased_item) exploded_table as prod_and_ts) subq JOIN testingtable1 t2 on t2.buyer_id = subq.user_id 
AND subq.timestamps = unix_timestamp(t2.created_time)
WHERE (subq.product_id <> t2.item_id)
union all
select t2.buyer_id, t2.item_id as item_id, t2.created_time, subq.user_id, subq.product_id as product_id, subq.timestamps
from 
(select user_id, prod_and_ts.product_id as product_id, prod_and_ts.timestamps as timestamps from testingtable2 lateral view explode(purchased_item) exploded_table as prod_and_ts) subq JOIN testingtable1 t2 on t2.buyer_id = subq.user_id 
    and subq.product_id = t2.item_id 
    WHERE (subq.timestamps <> unix_timestamp(t2.created_time))) unionall;

And after running the above query, I am getting zero result back.

One Final UPDATE:-

My Bad, I didn't have the accurate data in the tables so that is the reason I was not getting the result back. Yes it is working the actual above query.

解决方案

EDIT - PART 1 Okay - For some reason I am going to explain myself - so to start with I stumbled upon this question because of the SQL tag, and saw Hive, and started to not look and just skip it. BUT then I noticed it had been over a day and you had gotten no answers. I looked - I saw a SQL logic correction in the original query posted that I knew would be needed and would help, so I posted ONLY because no one had answered. I will try to address this last question - but after that I am keeping my advice to myself, as I may be giving bad advice. Good luck! I tried! And you seem to be getting answers now, so...

In TSQL, I could solve this entire problem with the below single query:

SELECT * 
FROM SO_Table1HIVE A
FULL OUTER JOIN SO_Table2HIVE B ON A.BUYER_ID = B.[USER_ID] AND (B.t1time = A.Created_TIME OR B.PRODUCTID = A.ITEM_ID)

It would return everything, including your match buyer_id/user_id only. It won't match a buyer_id/user_id row with no matches in either time or product in the other table, but it will return it as a separate row with NULLS in the other table's fields. I would not match these any way - there is no accurate information provided to do it with as explained below.

END EDIT PART 1

If you can't do FULL OUTER JOIN with OR in Hive, the simplest way to meet the original criteria is to UNION ALL 2 INNER JOINs. On one of the queries, in addition to joining the matching user_ids, join on the PRODUCT_ID AND in your WHERE look for TIMESTAMPS that don't match CREATED_TIME. On the second query, in addition to joining the matching user_ids, join on the times AND in your WHERE look for products that don't match.

EDIT PART 2 - UPDATE FOR COMMENT QUESTION ADDITIONAL CRITERIA

If I understand the last criteria it is any record in either table that has a matching user_id = buyer_id, but nothing else matches. The FULL OUTER JOIN with OR condition will return them, but there isn't enough provided info for a way to relate the records to each other. We can easily identify them, but have no way to tie them back to each other. If you do so and you have more than one record without a match in either OR both tables, there are going to be multiple entries for each.

Any query I wrote to try to tie them without more info (and probably with) would be a guess and inaccurate.

For example, in the first table if there were these 2 (sample fake) records with nothing matching in the second except user_id:

1015826235  420003038067    2011-11-03 19:40:21.000
1015826235  720003038067    2004-11-03 19:40:21.000

AND in table2 - these non matching:

1015826235  {"product_id":520003038067,"timestamps":"10...
1015826235  {"product_id":620003038067,"timestamps":"10...

You can identify them, but if you match them without more criteria you get 4 instead of 2:

1015826235  420003038067    2011-11-03 19:40:21.000 1015826235 520003038067
1015826235  420003038067    2011-11-03 19:40:21.000 1015826235 620003038067
1015826235  720003038067    2004-11-03 19:40:21.000 1015826235 520003038067
1015826235  720003038067    2004-11-03 19:40:21.000 1015826235 620003038067

My suggestion would be simply to identify them and show them, as below.

BUYER_ID        ITEM_ID      CREATED_TIME           USER_ID PRODUCTID   timestamps  
----------------------------------------------------------------------
NULL            NULL         NULL                   1015826235  520003038067    2009-11-11 22:21:11.000
NULL            NULL         NULL                   1015826235  620003038067    2008-11-11 22:21:11.000
1015826235      420003038067 2011-11-03 19:40:21.000    NULL    NULL    NULL    
1015826235      720003038067 2004-11-03 19:40:21.000    NULL    NULL    NULL    

END EDIT PART 2 - UPDATE FOR COMMENT QUESTION ADDITIONAL CRITERIA - PART 1

I am working with TSQL, so I can't test for you an exact query with your syntax, but the concepts of the joins are the same, and this will return what you want. I did take your query and attempt your syntax, modify as needed. I tested in TSQL. You may be able to take this and improve upon it with functionality in HiveQL. There are other ways to do this - but this is the most straightforward and this will translate to HiveQL.

REMOVED, YOU GOT THIS PART AND IT IS INCLUDED LATER

(Again modify syntax as needed)**

SELECT *
FROM (
    SELECT BUYER_ID,ITEM_ID,CREATED_TIME,PRODUCT_ID,TIMESTAMPS 
    FROM testingtable2 LATERAL VIEW
        explode(purchased_item) exploded_table as prod_and_ts)
        prod_and_ts 
    INNER JOIN table2 A  ON A.BUYER_ID = prod_and_ts.[USER_ID] AND prod_and_ts.timestamps = UNIX_TIMESTAMP (table2.created_time) 
    WHERE prod_and_ts.product_id <> A.ITEM_ID
    UNION ALL
    SELECT BUYER_ID,ITEM_ID,CREATED_TIME,PRODUCT_ID,TIMESTAMPS 
    FROM testingtable2 LATERAL VIEW
            explode(purchased_item) exploded_table as prod_and_ts)
            prod_and_ts 
    INNER JOIN table2 A  ON A.BUYER_ID = prod_and_ts.[USER_ID] AND prod_and_ts.product_id = A.ITEM_ID
    WHERE  prod_and_ts.timestamps <> UNIX_TIMESTAMP (table2.created_time) 
) X

And here is my tested TSQL version with my table names for reference:

SELECT * 
FROM(
    SELECT *
    FROM SO_Table1HIVE A
    INNER JOIN SO_Table2HIVE B ON A.BUYER_ID = B.[USER_ID] AND B.t1time = A.Created_TIME 
    WHERE B.PRODUCTID <> A.ITEM_ID
    UNION ALL
    SELECT * 
    FROM SO_Table1HIVE A
    INNER JOIN SO_Table2HIVE B ON A.BUYER_ID = B.[USER_ID] AND B.PRODUCTID = A.ITEM_ID  
    WHERE B.t1time <> A.Created_TIME  
 ) X

*EDIT PART 3 - UPDATE FOR COMMENT QUESTION ADDITIONAL CRITERIA -PART 2

In TSQL the entire query (no unions) can be run using a FULL OUTER JOIN with an OR condition on the join

SELECT * 
FROM SO_Table1HIVE A
FULL OUTER JOIN SO_Table2HIVE B ON A.BUYER_ID = B.[USER_ID] AND (B.t1time = A.Created_TIME OR B.PRODUCTID = A.ITEM_ID)

If you can't simply do the above, For the SQL logic for the new criteria - to grab those that don't match from both tables and display them as NULL in the other table use RIGHT JOIN and LEFT JOIN. RIGHT JOIN will grab anything in the first table the matches the second and everything in the second, and LEFT does the opposite. Add the new queries to your UNION.

TSQL EXAMPLE - MODIFY FOR HIVE

SELECT * 
FROM SO_Table1HIVE A
RIGHT JOIN SO_Table2HIVE B ON A.BUYER_ID = B.[USER_ID] AND (B.t1time = A.Created_TIME    OR B.PRODUCTID = A.ITEM_ID)
WHERE A.BUYER_ID IS NULL 
UNION ALL
SELECT * 
FROM SO_Table1HIVE A
LEFT JOIN SO_Table2HIVE B ON A.BUYER_ID = B.[USER_ID] AND (B.t1time = A.Created_TIME OR    B.PRODUCTID = A.ITEM_ID)
WHERE B.[USER_ID] IS NULL

Or, If you wanted to grab them and match them as duplicates add to UNION:

TSQL

SELECT * 
FROM SO_Table1HIVE A
JOIN SO_Table2HIVE B ON A.BUYER_ID = B.[USER_ID] 
WHERE B.t1time NOT IN(SELECT Created_TIME FROM SO_Table1HIVE)
AND A.Created_TIME  NOT IN(SELECT t1time FROM SO_Table2HIVE) 
AND B.PRODUCTID NOT IN(SELECT ITEM_ID FROM SO_Table1HIVE)
AND A.ITEM_ID NOT IN(SELECT PRODUCTID FROM SO_Table2HIVE) 

Again, Good luck!

这篇关于使用HiveQL(Hadoop)在Hive中加入两个表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆