在 Spark SQL 中插入如果不存在 ELSE 更新 [英] INSERT IF NOT EXISTS ELSE UPDATE in Spark SQL

查看:29
本文介绍了在 Spark SQL 中插入如果不存在 ELSE 更新的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

Spark SQL 中是否有执行INSERT IF NOT EXISTS ELSE UPDATE"的规定.

Is there any provision of doing "INSERT IF NOT EXISTS ELSE UPDATE" in Spark SQL.

我有包含一些记录的 Spark SQL 表ABC".然后我有另一批记录,我想根据它们是否存在于该表中来在该表中插入/更新.

I have Spark SQL table "ABC" that has some records. And then i have another batch of records that i want to Insert/update in this table based on whether they exist in this table or not.

是否有我可以在 SQL 查询中使用的 SQL 命令来实现这一点?

is there a SQL command that i can use in SQL query to make this happen?

推荐答案

在常规 Spark 中,这可以通过 join 后跟 map 来实现,如下所示:

In regular Spark this could be achieved with a join followed by a map like this:

import spark.implicits._
val df1 = spark.sparkContext.parallelize(List(("id1", "orginal"), ("id2", "original"))).toDF("df1_id", "df1_status")
val df2 = spark.sparkContext.parallelize(List(("id1", "new"), ("id3","new"))).toDF("df2_id", "df2_status")

val df3 = df1
  .join(df2, 'df1_id === 'df2_id, "outer")
  .map(row => {
    if (row.isNullAt(2))
      (row.getString(0), row.getString(1))
    else
      (row.getString(2), row.getString(3))
  })

这产生:

scala> df3.show
+---+--------+
| _1|      _2|
+---+--------+
|id3|     new| 
|id1|     new|
|id2|original|
+---+--------+

您也可以将 selectudfs 一起使用,而不是 map,但在这种具有空值的特殊情况下,我个人更喜欢 <代码>地图变体.

You could also use select with udfs instead of map, but in this particular case with null-values, I personally prefer the map variant.

这篇关于在 Spark SQL 中插入如果不存在 ELSE 更新的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆