Spark Dataframes UPSERT to Postgres Table
Question
I am using Apache Spark DataFrames to join two data sources and get the result as another DataFrame. I want to write the result to another Postgres table. I see this option:
myDataFrame.write.jdbc(url, table, connectionProperties)
But what I want to do is UPSERT the dataframe into the table based on the table's primary key. How can this be done? I am using Spark 1.6.0.
Answer
It is not supported. DataFrameWriter can either append to or overwrite an existing table. If your application requires more complex logic, you will have to handle it manually.
One option is to use an action (foreach, foreachPartition) with a standard JDBC connection. Another is to write to a temporary table and handle the merge directly in the database.
See also SPARK-19335 (Spark should support doing an efficient DataFrame Upsert via JDBC) and related proposals.
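The temporary-table route mentioned in the answer could look roughly like this: append the DataFrame into a staging table with the supported write.jdbc, then run a single merge statement inside the database. The staging/target table names and columns below are hypothetical, and PostgreSQL 9.5+ is assumed for ON CONFLICT:

```python
# Sketch of the staging-table approach. The Spark and JDBC calls are shown as
# comments; the merge statement itself is built below.

def build_merge_sql(target, staging, columns, key_columns):
    """Build a statement that upserts every staging row into the target table."""
    cols = ", ".join(columns)
    keys = ", ".join(key_columns)
    updates = ", ".join(
        "{0} = EXCLUDED.{0}".format(c) for c in columns if c not in key_columns
    )
    return (
        "INSERT INTO {target} ({cols}) SELECT {cols} FROM {staging} "
        "ON CONFLICT ({keys}) DO UPDATE SET {updates}"
    ).format(target=target, staging=staging, cols=cols,
             keys=keys, updates=updates)

# Driver side (not executed here):
#   myDataFrame.write.mode("append").jdbc(url, "staging_table", connectionProperties)
# then, over a plain database connection:
#   cur.execute(build_merge_sql("target_table", "staging_table",
#                               ["id", "name", "score"], ["id"]))
#   cur.execute("TRUNCATE staging_table")
```

This trades the per-row round trips of the foreachPartition approach for one bulk append plus one set-based merge, which is usually faster for large result sets.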