INSERT IF NOT EXISTS ELSE UPDATE in Spark SQL
Question
Is there any provision for doing "INSERT IF NOT EXISTS ELSE UPDATE" in Spark SQL?
I have a Spark SQL table "ABC" that has some records. I then have another batch of records that I want to insert or update in this table, depending on whether they already exist in it.
Is there a SQL command I can use in a SQL query to make this happen?
Answer
In regular Spark this could be achieved with a join followed by a map, like this:
import spark.implicits._

val df1 = spark.sparkContext.parallelize(List(("id1", "original"), ("id2", "original"))).toDF("df1_id", "df1_status")
val df2 = spark.sparkContext.parallelize(List(("id1", "new"), ("id3", "new"))).toDF("df2_id", "df2_status")

val df3 = df1
  .join(df2, 'df1_id === 'df2_id, "outer")  // full outer join keeps rows from both sides
  .map(row => {
    if (row.isNullAt(2))                    // no match in df2: keep the existing record
      (row.getString(0), row.getString(1))
    else                                    // matched or new in df2: take df2's values
      (row.getString(2), row.getString(3))
  })
This yields:
scala> df3.show
+---+--------+
| _1| _2|
+---+--------+
|id3| new|
|id1| new|
|id2|original|
+---+--------+
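Since the question asks specifically for a SQL command, the same upsert can also be sketched in pure Spark SQL by registering both DataFrames as temporary views and combining a FULL OUTER JOIN with COALESCE, which prefers the new batch's values whenever they are present. This is a minimal sketch assuming the df1/df2 DataFrames from the example above; the view names `existing` and `updates` are made up for illustration:

```scala
// Sketch: pure-SQL upsert via FULL OUTER JOIN + COALESCE.
// Assumes df1 and df2 from the example above; view names are illustrative.
df1.createOrReplaceTempView("existing")
df2.createOrReplaceTempView("updates")

val merged = spark.sql("""
  SELECT COALESCE(u.df2_id, e.df1_id)         AS id,
         COALESCE(u.df2_status, e.df1_status) AS status
  FROM existing e
  FULL OUTER JOIN updates u
    ON e.df1_id = u.df2_id
""")
```

COALESCE returns its first non-null argument, so matched and brand-new ids take the status from `updates`, while unmatched existing ids keep their original status.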
You could also use select with udfs instead of map, but in this particular case with null values, I personally prefer the map variant.
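For completeness, the select-with-UDF alternative mentioned above could look roughly like the sketch below. The UDF name `pickNewer` and the output column names are made up for illustration; note the UDF must handle the nulls that the outer join produces, which is the extra care the answer alludes to:

```scala
// Sketch of the select + UDF alternative (names are illustrative).
// Assumes df1 and df2 from the example above.
import org.apache.spark.sql.functions.{col, udf}

// Prefer the df2 value when present, otherwise fall back to df1's.
val pickNewer = udf((oldV: String, newV: String) =>
  if (newV != null) newV else oldV)

val df3 = df1
  .join(df2, col("df1_id") === col("df2_id"), "outer")
  .select(
    pickNewer(col("df1_id"), col("df2_id")).as("id"),
    pickNewer(col("df1_status"), col("df2_status")).as("status"))
```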