Spark: merge two DataFrames; if an ID is duplicated in both, the row in df1 overwrites the row in df2

Views: 22 Published: 2021/11/14 23:07:43 Tags: scala dataframe apache-spark apache-spark-sql

Problem description

There are two DataFrames, df1 and df2, with the same schema. ID is the primary key.

I need to merge df1 and df2. This can be done with union, except for one special requirement: if df1 and df2 both contain a row with the same ID, I need to keep the one from df1.

df1:

ID col1 col2
1  AA   2019
2  B    2018

df2:

ID col1 col2
1  A    2019
3  C    2017

I need the following output:

df1:

ID col1 col2
1  AA   2019
2  B    2018
3  C    2017

How can I do this? Thanks. I think it is possible to register two temp tables, do a full join, and use coalesce, but I would rather not go that way because there are actually about 40 columns, not the 3 in the example above.
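For reference, the full-join-and-coalesce idea mentioned in the question does not require typing out every column, since the coalesce expressions can be generated from the column list. The following is only a sketch under assumptions: `mergePreferLeft` is a hypothetical helper name, the session is a local one created just for the demo, and the column names are assumed to contain no dots. Note that coalesce works per column, so a null in df1 would be filled from df2, which is not quite the same as keeping the whole df1 row.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{coalesce, col}

// Local session for the demo only (assumption; reuse your own session in practice)
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("full-join-coalesce")
  .getOrCreate()
import spark.implicits._

// Sample data taken from the question
val df1 = Seq((1, "AA", 2019), (2, "B", 2018)).toDF("ID", "col1", "col2")
val df2 = Seq((1, "A", 2019), (3, "C", 2017)).toDF("ID", "col1", "col2")

// Hypothetical helper: full outer join on `key`, then for every other
// column take the left value if it is non-null, otherwise the right value.
def mergePreferLeft(left: DataFrame, right: DataFrame, key: String): DataFrame = {
  val l = left.alias("l")
  val r = right.alias("r")
  // One coalesce expression per non-key column, built from the schema
  val valueCols = left.columns.filter(_ != key).map { c =>
    coalesce(col(s"l.$c"), col(s"r.$c")).as(c)
  }
  l.join(r, Seq(key), "full_outer").select(col(key) +: valueCols: _*)
}

val mergedByCoalesce = mergePreferLeft(df1, df2, "ID")
mergedByCoalesce.orderBy("ID").show()
```

Because the expressions are derived from `left.columns`, the same call would cover 40 columns as well as 3.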

Recommended answer

Given that the two DataFrames have the same schema, you can simply union df1 with the left_anti join of df2 and df1, which keeps only the df2 rows whose ID does not appear in df1:

df1.union(df2.join(df1, Seq("ID"), "left_anti")).show
// +---+----+----+
// | ID|col1|col2|
// +---+----+----+
// |  1|  AA|2019|
// |  2|   B|2018|
// |  3|   C|2017|
// +---+----+----+
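To try the approach end to end, here is a minimal self-contained sketch that builds the two sample DataFrames from the question and applies the same union plus left_anti join. The local SparkSession, the app name, and the intermediate names `onlyInDf2` and `merged` are assumptions made for illustration; Spark SQL is assumed to be on the classpath.

```scala
import org.apache.spark.sql.SparkSession

// Local session for the demo only (assumption; reuse your own session in practice)
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("merge-prefer-df1")
  .getOrCreate()
import spark.implicits._

// Sample data taken from the question
val df1 = Seq((1, "AA", 2019), (2, "B", 2018)).toDF("ID", "col1", "col2")
val df2 = Seq((1, "A", 2019), (3, "C", 2017)).toDF("ID", "col1", "col2")

// left_anti keeps only the rows of df2 that have no matching ID in df1
val onlyInDf2 = df2.join(df1, Seq("ID"), "left_anti")

// union matches columns by position, so both sides must have the same column order
val merged = df1.union(onlyInDf2)

// union does not guarantee row order, so sort before displaying
merged.orderBy("ID").show()
```

Since `union` resolves columns by position rather than by name, `unionByName` (available since Spark 2.3) may be the safer choice when the column order of the two sides could differ, for example if the join key is not the first column.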
