How can you perform an insert overwrite using Spark?

Views: 327 Published: 2020/9/4 20:47:30 Tags: scala apache-spark apache-spark-sql spark-dataframe

Problem description

I'm trying to migrate one of our Hive ETL scripts to Spark. The Hive script maintains a table from which part of the data must be deleted every night before the new sync. It removes data older than 3 days from the main table using insert overwrite: it creates a temp table containing only the data from the last three days and then overwrites the main table with it.

With Spark (using Scala), I keep getting an error saying that I cannot write to the same source I am reading from. Here's my code:

// Keep only the rows from the last three days and expose them as a temp table
spark.sql("SELECT * FROM mytbl_hive WHERE dt > date_sub(current_date, 3)").registerTempTable("tmp_mytbl")

val mytbl = sqlContext.table("tmp_mytbl")
mytbl.write.mode("overwrite").saveAsTable("tmp_mytbl")

// writing back to Hive ...

mytbl.write.mode("overwrite").insertInto("mytbl_hive")

I get an error saying that I cannot write to the table I'm reading from.

Does anyone know of a better way to do this?
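
One workaround that is often suggested for this kind of error is to materialize the filtered rows into a separate physical table first, so that the final overwrite no longer reads from the table it is writing to. The following is only a minimal sketch under assumptions, not a confirmed fix for this exact setup: the object name `InsertOverwriteSketch`, the helper names, and the staging table name `mytbl_hive_staging` are hypothetical, while `mytbl_hive` and the `dt` column come from the question. It also assumes a Spark 2.x+ `SparkSession` with Hive support enabled.

```scala
import org.apache.spark.sql.SparkSession

object InsertOverwriteSketch {

  // Builds the retention query from the question.
  // The table name and number of days are parameters for illustration.
  def recentRowsQuery(table: String, days: Int): String =
    s"SELECT * FROM $table WHERE dt > date_sub(current_date(), $days)"

  // Hypothetical helper: keeps only the last `days` days of `target`.
  def overwriteWithRecent(spark: SparkSession,
                          target: String,
                          staging: String,
                          days: Int): Unit = {
    // 1. Write the rows to keep into a separate staging table.
    //    After this step the data no longer depends on `target`.
    spark.sql(recentRowsQuery(target, days))
      .write.mode("overwrite").saveAsTable(staging)

    // 2. Read from the staging table and overwrite the target.
    //    insertInto matches columns by position, not by name.
    spark.table(staging)
      .write.mode("overwrite").insertInto(target)

    // 3. Clean up the staging table.
    spark.sql(s"DROP TABLE IF EXISTS $staging")
  }
}
```

It could be called as `InsertOverwriteSketch.overwriteWithRecent(spark, "mytbl_hive", "mytbl_hive_staging", 3)`. The cost is one extra write of the retained data. If `mytbl_hive` is partitioned, the overwrite behavior depends on the table format and the partition overwrite settings, so that case would need to be checked separately.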
