spark Dataframe execute UPDATE statement
Question
我需要使用Apache Spark DataFrame执行jdbc操作. 基本上,我有一个历史记录的jdbc表,称为"Measures",在其中必须执行两项操作:
I need to perform jdbc operation using Apache Spark DataFrame. Basically I have an historical jdbc table called Measures where I have to do two operations:
1. Set the endTime validity attribute of the old measure record to the current time
2. Insert a new measure record with endTime set to 9999-12-31
Can someone tell me how to perform (if possible) an update statement for the first operation and an insert for the second?
I tried to use this statement for the first operation:
val dfWriter = df.write.mode(SaveMode.Overwrite)
dfWriter.jdbc("jdbc:postgresql:postgres", tableName, prop)
But it doesn't work because there is a duplicate key violation. If an update is possible, how can we also do a delete statement?
Thanks.
Answer
I don't think Spark supports this out of the box yet. What you can do is iterate over the DataFrame's partitions using foreachPartition() and manually update/delete the table through the JDBC API.
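The per-partition approach described above can be sketched as follows. This is a minimal sketch, not the answerer's actual code: the table layout Measures(id, value, startTime, endTime), the record shape (id, value), and the convention that the current record carries endTime = '9999-12-31' are all assumptions for illustration.

```scala
import java.sql.{Connection, DriverManager, PreparedStatement}

// SQL for the two operations; table and column names are assumed for illustration.
object MeasureSql {
  // 1. Close the validity window of the old record
  val closeOld: String =
    "UPDATE Measures SET endTime = now() " +
      "WHERE id = ? AND endTime = '9999-12-31'"

  // 2. Insert the new record, open-ended until 9999-12-31
  val insertNew: String =
    "INSERT INTO Measures (id, value, startTime, endTime) " +
      "VALUES (?, ?, now(), '9999-12-31')"
}

// Runs both statements for every row of one partition over a single JDBC connection.
def upsertPartition(rows: Iterator[(Long, Double)], url: String): Unit = {
  val conn: Connection = DriverManager.getConnection(url)
  try {
    val close: PreparedStatement  = conn.prepareStatement(MeasureSql.closeOld)
    val insert: PreparedStatement = conn.prepareStatement(MeasureSql.insertNew)
    rows.foreach { case (id, value) =>
      close.setLong(1, id)
      close.executeUpdate()
      insert.setLong(1, id)
      insert.setDouble(2, value)
      insert.executeUpdate()
    }
  } finally conn.close()
}
```

From Spark this would be invoked roughly as df.rdd.map(r => (r.getLong(0), r.getDouble(1))).foreachPartition(it => upsertPartition(it, url)), so that each executor opens one connection per partition instead of one per row.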
Here is a link to a similar question: Spark Dataframes UPSERT to Postgres Table