Spark Scala数据框合并多个数据框 [英] spark scala dataframe merge multiple dataframes

查看:341
本文介绍了Spark Scala数据框合并多个数据框的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有三个文件,

## +---+----+----+---+
## |pk1|pk2|val1|val2|
## +---+----+----+---+
## |  1| aa|  ab|  ac|
## |  2| bb|  bc|  bd|
## +---+----+----+---+

## +---+----+----+---+
## |pk1|pk2|val1|val2|
## +---+----+----+---+
## |  1| aa|  ab|  ad|
## |  2| bb|  bb|  bd|
## +---+----+----+---+

## +---+----+----+---+
## |pk1|pk2|val1|val2|
## +---+----+----+---+
## |  1| aa|  ac|  ad|
## |  2| bb|  bc|  bd|
## +---+----+----+---+

我需要比较前两个文件(我正在以数据帧的形式读取)并仅识别更改,然后与第三个文件合并,所以我的输出应该是

I need to compare the first two files (which I'm reading as dataframe) and identify only the changes and then merge with the third file, so my output should be,

## +---+----+----+---+
## |pk1|pk2|val1|val2|
## +---+----+----+---+
## |  1| aa|  ac|  ad|
## |  2| bb|  bb|  bd|
## +---+----+----+---+

如何仅选择更改的列?并更新另一个数据框?

How to pick only the changed columns? and update another dataframe?

推荐答案

我还可以通过将数据帧创建为临时视图来执行此操作,然后选择case语句.像这样

I can also do this by creating the dataframe as a temp view and then do select case statement. Like this,

df1.createTempView("df1")
df2.createTempView("df2")
df3.createTempView("df3")

select case when df1.val1=df2.val1 and df1.val1<>df3.val1 then df3.val1 end

这要快得多.

这篇关于Spark Scala数据框合并多个数据框的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆