How to do left outer join in Spark SQL?


Problem description

I am trying to do a left outer join in Spark (1.6.2) and it doesn't work. My SQL query is like this:

sqlContext.sql("""select t.type, t.uuid, p.uuid
from symptom_type t LEFT JOIN plugin p
ON t.uuid = p.uuid
where t.created_year = 2016
and p.created_year = 2016""").show()

The result looks like this:

+--------------------+--------------------+--------------------+
|                type|                uuid|                uuid|
+--------------------+--------------------+--------------------+
|              tained|89759dcc-50c0-490...|89759dcc-50c0-490...|
|             swapper|740cd0d4-53ee-438...|740cd0d4-53ee-438...|

I got the same result using either LEFT JOIN or LEFT OUTER JOIN (the second uuid is not null).

I would expect the second uuid column to be null for the unmatched rows. How do I do a left outer join correctly?

=== Additional information ===

If I use the DataFrame API to do the left outer join, I get the correct result.

s = sqlCtx.sql('select * from symptom_type where created_year = 2016')
p = sqlCtx.sql('select * from plugin where created_year = 2016')

s.join(p, s.uuid == p.uuid, 'left_outer') \
 .select(s.type, s.uuid.alias('s_uuid'),
         p.uuid.alias('p_uuid'), s.created_date, p.created_year, p.created_month).show()

I get a result like this:

+-------------------+--------------------+-----------------+--------------------+------------+-------------+
|               type|              s_uuid|           p_uuid|        created_date|created_year|created_month|
+-------------------+--------------------+-----------------+--------------------+------------+-------------+
|             tained|6d688688-96a4-341...|             null|2016-01-28 00:27:...|        null|         null|
|             tained|6d688688-96a4-341...|             null|2016-01-28 00:27:...|        null|         null|
|             tained|6d688688-96a4-341...|             null|2016-01-28 00:27:...|        null|         null|

Thanks.

Recommended answer

I don't see any issues in your code. Either "left join" or "left outer join" will work fine. Please check the data again; the rows you are showing are the matching ones.

You can also perform a Spark SQL join by using:

# explicit left outer join
df1.join(df2, df1["col1"] == df2["col1"], "left_outer")
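One standard SQL detail is worth checking here: in a LEFT JOIN, a filter on the right table placed in the WHERE clause (such as `p.created_year = 2016` in the question's query) is evaluated after the join, so the null-extended rows fail the comparison and are discarded, and the query behaves like an inner join. Moving that condition into the ON clause, or pre-filtering the right table as the DataFrame version does, preserves the unmatched rows. The sketch below demonstrates the same semantics with Python's built-in sqlite3; the table and column names mirror the question, but the sample rows are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE symptom_type (type TEXT, uuid TEXT, created_year INT)")
cur.execute("CREATE TABLE plugin (uuid TEXT, created_year INT)")
cur.executemany("INSERT INTO symptom_type VALUES (?, ?, ?)",
                [("tained", "uuid-1", 2016), ("swapper", "uuid-2", 2016)])
# Only uuid-1 has a matching plugin row; uuid-2 is unmatched.
cur.executemany("INSERT INTO plugin VALUES (?, ?)", [("uuid-1", 2016)])

# Right-table filter in WHERE: the null-extended row for uuid-2 fails
# "p.created_year = 2016" (NULL is not 2016), so only matches survive.
inner_like = cur.execute("""
    SELECT t.type, t.uuid, p.uuid
    FROM symptom_type t LEFT JOIN plugin p ON t.uuid = p.uuid
    WHERE t.created_year = 2016 AND p.created_year = 2016
""").fetchall()

# Right-table filter in the ON clause: unmatched left rows are kept,
# with NULL in the columns coming from plugin.
true_left = cur.execute("""
    SELECT t.type, t.uuid, p.uuid
    FROM symptom_type t
    LEFT JOIN plugin p ON t.uuid = p.uuid AND p.created_year = 2016
    WHERE t.created_year = 2016
""").fetchall()

print(inner_like)  # only the matched row
print(true_left)   # both rows; the unmatched one has NULL for p.uuid
```

The DataFrame version in the question works precisely because both tables are filtered on `created_year` before the join, which is equivalent to putting the condition in the ON clause rather than the post-join WHERE.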

