How to join Spark DataFrames without duplicate columns in Java


Problem description

How can we merge two DataFrames without ending up with duplicate columns?

a.show()

+-----+-------------------+--------+------+
| Name|           LastTime|Duration|Status|
+-----+-------------------+--------+------+
|  Bob|2015-04-23 12:33:00|       1|logout|
|Alice|2015-04-20 12:33:00|       5| login|
+-----+-------------------+--------+------+

b.show()
+-----+-------------------+--------+------+
| Name|           LastTime|Duration|Status|
+-----+-------------------+--------+------+
|  Bob|2015-04-24 00:33:00|       1|login |
+-----+-------------------+--------+------+

I want to build a new DataFrame that contains all the data in DataFrame A, but with rows updated from the data in B wherever B has a matching row:

+-----+-------------------+--------+------+
| Name|           LastTime|Duration|Status|
+-----+-------------------+--------+------+
|  Bob|2015-04-24 00:33:00|       1|login |
|Alice|2015-04-20 12:33:00|       5| login|
+-----+-------------------+--------+------+

I am able to do the join and build the DataFrame in Scala, but not in Java.

DataFrame f = a.join(b, a.col("Name").equalTo(b.col("Name")).and(a.col("LastTime").equalTo(b.col("LastTime"))).and(a.col("Duration").equalTo(b.col("Duration"))), "outer");

When I perform the join like this, I get duplicate columns.

Recommended answer

I think this can be done with Spark SQL, which can be executed from Java as well.

spark.sql("""SELECT a.Name as Name,
CASE WHEN b.Name is null THEN a.LastTime ELSE b.LastTime END AS LastTime,
CASE WHEN b.Name is null THEN a.Duration ELSE b.Duration END AS Duration,
CASE WHEN b.Name is null THEN a.Status ELSE b.Status END AS Status 
FROM a a left outer join  b b on a.Name=b.Name 
""").show(false)

+-----+-------------------+--------+------+
|Name |LastTime           |Duration|Status|
+-----+-------------------+--------+------+
|Bob  |2015-04-24 00:33:00|1       |login |
|Alice|2015-04-20 12:33:00|5       |login |
+-----+-------------------+--------+------+
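
The query above uses a Scala triple-quoted string and expects tables named a and b to exist. A minimal sketch of how it might be run from Java follows, assuming Spark 2.x or later (where the Java API uses SparkSession and Dataset<Row>). The class name SqlUpdateJoin and the method updateWithSql are hypothetical names chosen for illustration; the temporary view names a and b are taken from the query.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SqlUpdateJoin {

    // Hypothetical helper: returns every row of `a`, taking LastTime, Duration
    // and Status from `b` whenever `b` has a row with the same Name.
    public static Dataset<Row> updateWithSql(SparkSession spark, Dataset<Row> a, Dataset<Row> b) {
        // Register both inputs as temporary views so the SQL text can refer to them.
        a.createOrReplaceTempView("a");
        b.createOrReplaceTempView("b");

        // Same query as in the answer, written as an ordinary Java string.
        String query =
              "SELECT a.Name AS Name, "
            + "CASE WHEN b.Name IS NULL THEN a.LastTime ELSE b.LastTime END AS LastTime, "
            + "CASE WHEN b.Name IS NULL THEN a.Duration ELSE b.Duration END AS Duration, "
            + "CASE WHEN b.Name IS NULL THEN a.Status ELSE b.Status END AS Status "
            + "FROM a LEFT OUTER JOIN b ON a.Name = b.Name";

        return spark.sql(query);
    }
}
```

Because the SELECT list names each output column exactly once, the result has only the four original columns rather than one copy per side of the join.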

The join condition can be adjusted to fit the use case.
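
For readers who prefer to stay in the DataFrame API, the same left outer join can probably be expressed without SQL text. This is only a sketch under the same assumption of Spark 2.x or later; the class UpdateJoin and its methods are hypothetical names, not part of the original answer. It aliases both inputs, joins on Name, and then selects one column per name so that no duplicates remain.

```java
import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.when;

public class UpdateJoin {

    // Mirrors "CASE WHEN b.Name IS NULL THEN a.<name> ELSE b.<name> END AS <name>".
    private static Column pick(String name) {
        return when(col("b.Name").isNull(), col("a." + name))
                .otherwise(col("b." + name))
                .alias(name);
    }

    // Hypothetical helper: keeps all rows of `a`, overriding values from `b` on a Name match.
    public static Dataset<Row> updateFrom(Dataset<Row> a, Dataset<Row> b) {
        return a.alias("a")
                .join(b.alias("b"), col("a.Name").equalTo(col("b.Name")), "left_outer")
                // Selecting explicitly is what removes the duplicate columns.
                .select(col("a.Name").alias("Name"),
                        pick("LastTime"),
                        pick("Duration"),
                        pick("Status"));
    }
}
```

To match on more columns, the join expression can be extended with `.and(...)`, as in the question's original attempt; the explicit `select` is still needed to keep a single copy of each column.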
