Spark Scala DataFrame: merging multiple DataFrames
Question
I have three files:
## +---+---+----+----+
## |pk1|pk2|val1|val2|
## +---+---+----+----+
## |  1| aa|  ab|  ac|
## |  2| bb|  bc|  bd|
## +---+---+----+----+

## +---+---+----+----+
## |pk1|pk2|val1|val2|
## +---+---+----+----+
## |  1| aa|  ab|  ad|
## |  2| bb|  bb|  bd|
## +---+---+----+----+

## +---+---+----+----+
## |pk1|pk2|val1|val2|
## +---+---+----+----+
## |  1| aa|  ac|  ad|
## |  2| bb|  bc|  bd|
## +---+---+----+----+
I need to compare the first two files (which I'm reading as DataFrames), identify only the changes, and then merge those changes into the third file. My output should be:
## +---+---+----+----+
## |pk1|pk2|val1|val2|
## +---+---+----+----+
## |  1| aa|  ac|  ad|
## |  2| bb|  bb|  bd|
## +---+---+----+----+
How can I pick only the changed columns and use them to update another DataFrame?
Recommended answer
I can also do this by registering each DataFrame as a temp view and then selecting with a CASE expression, like this:
df1.createTempView("df1")
df2.createTempView("df2")
df3.createTempView("df3")
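// Assumes a SparkSession named `spark`. Keep df3's value where df1 and df2 agree; otherwise take df2's changed value.
spark.sql("""select df3.pk1, df3.pk2,
    case when df1.val1 = df2.val1 and df1.val1 <> df3.val1 then df3.val1 else df2.val1 end as val1,
    case when df1.val2 = df2.val2 and df1.val2 <> df3.val2 then df3.val2 else df2.val2 end as val2
  from df1 join df2 on df1.pk1 = df2.pk1 and df1.pk2 = df2.pk2 join df3 on df1.pk1 = df3.pk1 and df1.pk2 = df3.pk2""").show()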
This is much faster.
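For comparison, the same CASE-style merge can be sketched with the DataFrame API instead of SQL. This is only an illustration under stated assumptions: the helper names `suffixed` and `mergeChanges` are hypothetical (they are not part of Spark or of the original answer), the three inputs share the key columns `pk1` and `pk2`, and every key appears in all three DataFrames. A value counts as "changed" when it differs between the first and second DataFrame.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

object MergeChanges {
  // Hypothetical helper: rename each value column to "<name>_<tag>" so the
  // three inputs can be joined without ambiguous column names.
  def suffixed(df: DataFrame, values: Seq[String], tag: String): DataFrame =
    values.foldLeft(df)((acc, v) => acc.withColumnRenamed(v, s"${v}_$tag"))

  // Hypothetical helper: for every value column, take df2's value when it
  // differs from df1 (a change); otherwise keep df3's value.
  // Note: =!= yields null when either side is null, so null values are
  // not treated as changes in this sketch.
  def mergeChanges(df1: DataFrame, df2: DataFrame, df3: DataFrame,
                   keys: Seq[String], values: Seq[String]): DataFrame = {
    val joined = suffixed(df1, values, "old")
      .join(suffixed(df2, values, "new"), keys)   // inner join on the keys
      .join(suffixed(df3, values, "base"), keys)
    val merged = values.map { v =>
      when(col(s"${v}_old") =!= col(s"${v}_new"), col(s"${v}_new"))
        .otherwise(col(s"${v}_base"))
        .as(v)
    }
    joined.select(keys.map(col) ++ merged: _*)
  }

  def main(args: Array[String]): Unit = {
    // A local session is assumed here purely for demonstration.
    val spark = SparkSession.builder()
      .master("local[*]").appName("merge-changes").getOrCreate()
    import spark.implicits._

    // The three inputs from the question, built in memory.
    val cols = Seq("pk1", "pk2", "val1", "val2")
    val df1 = Seq((1, "aa", "ab", "ac"), (2, "bb", "bc", "bd")).toDF(cols: _*)
    val df2 = Seq((1, "aa", "ab", "ad"), (2, "bb", "bb", "bd")).toDF(cols: _*)
    val df3 = Seq((1, "aa", "ac", "ad"), (2, "bb", "bc", "bd")).toDF(cols: _*)

    mergeChanges(df1, df2, df3, Seq("pk1", "pk2"), Seq("val1", "val2"))
      .orderBy("pk1")
      .show()

    spark.stop()
  }
}
```

On the sample data this should yield the rows (1, aa, ac, ad) and (2, bb, bb, bd), matching the output requested in the question. Because the joins are inner joins, keys missing from any of the three inputs are dropped; a left join from the third DataFrame would be one way to keep them, at the cost of handling nulls in the comparison.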