How to join Spark DataFrames without duplicate columns in Java

Question

How can we merge two DataFrames without ending up with duplicate columns?

a.show()

+-----+-------------------+--------+------+
| Name|           LastTime|Duration|Status|
+-----+-------------------+--------+------+
|  Bob|2015-04-23 12:33:00|       1|logout|
|Alice|2015-04-20 12:33:00|       5| login|
+-----+-------------------+--------+------+

b.show()
+-----+-------------------+--------+------+
| Name|           LastTime|Duration|Status|
+-----+-------------------+--------+------+
|  Bob|2015-04-24 00:33:00|       1|login |
+-----+-------------------+--------+------+

I want to form a new DataFrame that contains all the data in DataFrame A, but with rows updated from B wherever B has a matching row:

+-----+-------------------+--------+------+
| Name|           LastTime|Duration|Status|
+-----+-------------------+--------+------+
|  Bob|2015-04-24 00:33:00|       1|login |
|Alice|2015-04-20 12:33:00|       5| login|
+-----+-------------------+--------+------+
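
To make the intended result concrete, the following is a minimal plain-Java sketch of the semantics, not Spark code. It assumes rows are matched on `Name` (the first column) and that B has at most one row per name; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of "keep all of A, but take B's row when B has one".
final class UpdateRowsSketch {

    /** Returns every row of a, replaced by b's row when b has the same Name (column 0). */
    static List<String[]> update(List<String[]> a, List<String[]> b) {
        // Index B by Name so each row of A can look up its replacement.
        Map<String, String[]> updates = new HashMap<>();
        for (String[] row : b) {
            updates.put(row[0], row);
        }
        // Walk A in order; rows without a match in B are kept unchanged.
        List<String[]> result = new ArrayList<>();
        for (String[] row : a) {
            result.add(updates.getOrDefault(row[0], row));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> a = Arrays.asList(
                new String[] {"Bob", "2015-04-23 12:33:00", "1", "logout"},
                new String[] {"Alice", "2015-04-20 12:33:00", "5", "login"});
        List<String[]> b = Collections.singletonList(
                new String[] {"Bob", "2015-04-24 00:33:00", "1", "login"});
        for (String[] row : update(a, b)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

The result has exactly the columns of A and one row per row of A, which is what the join should produce.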

I am able to join the two and form the DataFrame in Scala, but I have not been able to do it in Java.

DataFrame f = a.join(b, a.col("Name").equalTo(b.col("Name")).and(a.col("LastTime").equalTo(b.col("LastTime"))).and(a.col("Duration").equalTo(b.col("Duration"))), "outer");

When I perform the join like this, I get duplicate columns.

Recommended answer

I think we can try this through Spark SQL, which can be executed from Java as well.

spark.sql("""SELECT a.Name as Name,
CASE WHEN b.Name is null THEN a.LastTime ELSE b.LastTime END AS LastTime,
CASE WHEN b.Name is null THEN a.Duration ELSE b.Duration END AS Duration,
CASE WHEN b.Name is null THEN a.Status ELSE b.Status END AS Status 
FROM a a left outer join  b b on a.Name=b.Name 
""").show(false)

+-----+-------------------+--------+------+
|Name |LastTime           |Duration|Status|
+-----+-------------------+--------+------+
|Bob  |2015-04-24 00:33:00|1       |login |
|Alice|2015-04-20 12:33:00|5       |login |
+-----+-------------------+--------+------+

The join condition can be adjusted to fit the use case.
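
Since Java only gained multi-line text blocks in Java 15, one way to carry the query above into Java is to assemble it as an ordinary string. The sketch below is only an illustration: the class `UpdateQueryBuilder` and its parameters are hypothetical, the aliases `a` and `b` are fixed, and column names are assumed to be simple identifiers that need no quoting. Passing more than one key column changes the join condition accordingly.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper that builds the "take B's value when B has a matching row" query.
final class UpdateQueryBuilder {

    /**
     * @param left   name of the temp view holding the full data (aliased as a)
     * @param right  name of the temp view holding the updates (aliased as b)
     * @param keys   columns that identify a row; they form the join condition
     * @param values columns taken from the right side when a matching row exists
     */
    static String build(String left, String right, List<String> keys, List<String> values) {
        if (keys.isEmpty()) {
            throw new IllegalArgumentException("at least one key column is required");
        }
        // After a left outer join, a null right-side key means no row of b matched.
        String probe = "b." + keys.get(0);
        // Key columns are read from the left side only, so each appears once.
        Stream<String> keyCols = keys.stream().map(k -> "a." + k + " AS " + k);
        Stream<String> valueCols = values.stream().map(v ->
                "CASE WHEN " + probe + " IS NULL THEN a." + v + " ELSE b." + v + " END AS " + v);
        String select = Stream.concat(keyCols, valueCols).collect(Collectors.joining(", "));
        String on = keys.stream()
                .map(k -> "a." + k + " = b." + k)
                .collect(Collectors.joining(" AND "));
        return "SELECT " + select
                + " FROM " + left + " a LEFT OUTER JOIN " + right + " b ON " + on;
    }

    public static void main(String[] args) {
        String query = build("a", "b",
                Arrays.asList("Name"),
                Arrays.asList("LastTime", "Duration", "Status"));
        System.out.println(query);
        // Assuming Spark 2.x or later, a SparkSession named `spark`, and both
        // DataFrames registered with a.createOrReplaceTempView("a") and
        // b.createOrReplaceTempView("b"), the string could then be run as:
        //   Dataset<Row> result = spark.sql(query);
    }
}
```

Because every output column is selected explicitly and given a single alias, the result has no duplicate columns regardless of how many columns the two inputs share.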
