How to iterate through multiple tables in Azure Databricks and delete the matching records in a Postgres database?

Problem description

I am extracting data from an Azure Databricks table and loading only the first row (rank = 1) into a similar table (same table structure) in Postgres. Before loading, I check whether the column5 value in the Postgres table matches the one in the Databricks table. If it does, that row has to be deleted from the Postgres table, and then the extracted values have to be loaded.

I want to iterate over the rows of the extracted Databricks table and run a DELETE command in Postgres for each row. Is there a way to achieve this in SQL without using cursors?

Recommended answer

Create a temporary staging table, stg.

Load it from the Databricks table. Ideally, export from Databricks using:

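SELECT * FROM databricks_table WHERE rank_column = 1
-- or, if the rank is computed on the fly (a window function cannot appear directly in WHERE):
-- SELECT * FROM databricks_table QUALIFY RANK() OVER (PARTITION BY any ORDER BY what_else) = 1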
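When the rank is not already stored in a column, it has to be computed in a subquery (or a CTE) before it can be filtered, because standard SQL does not allow a window function directly in WHERE. The sketch below illustrates that pattern. It uses Python's built-in sqlite3 module purely as a stand-in engine so it runs without a cluster, and the table src with columns grp and ts is hypothetical sample data, not part of the original question; only column5 comes from the question.

```python
import sqlite3

# In-memory stand-in for the source table; src, grp and ts are hypothetical names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src (grp TEXT, ts INTEGER, column5 INTEGER);
    INSERT INTO src VALUES ('a', 1, 10), ('a', 2, 11), ('b', 5, 20);
""")

# Compute the rank in a subquery, then keep only rank 1 in the outer query.
first_rows = conn.execute("""
    SELECT grp, ts, column5
    FROM (
        SELECT grp, ts, column5,
               RANK() OVER (PARTITION BY grp ORDER BY ts DESC) AS rnk
        FROM src
    ) AS ranked
    WHERE rnk = 1
    ORDER BY grp
""").fetchall()
print(first_rows)
```

Note that RANK() can return several rows per partition when there are ties; ROW_NUMBER() is the usual substitute if exactly one row per group is required.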

Then, on PostgreSQL, use the MERGE statement (available since PostgreSQL 15):

MERGE INTO tgt USING stg ON tgt.column5 = stg.column5
WHEN MATCHED THEN UPDATE SET
  col1 = stg.col1
, col2 = stg.col2
[. . .]
WHEN NOT MATCHED THEN INSERT (col1,col2, ... , coln)
                      VALUES(stg.col1,stg.col2 ...)
;

Check the PostgreSQL documentation for the MERGE statement for more details.
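If MERGE is not available, or if a literal "delete the matching rows, then load" is preferred, the same result can be reached with two set-based statements inside one transaction, with no cursor or per-row loop. The following is only a minimal sketch under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in so it can run without a server, and the three-column layout (col1, col2, column5) is a simplification of the tables in the question. The two SQL statements are plain standard SQL and should carry over to PostgreSQL, where DELETE FROM tgt USING stg WHERE tgt.column5 = stg.column5 is an equivalent way to write the delete.

```python
import sqlite3

def delete_then_load(conn):
    """Delete tgt rows whose column5 also appears in stg, then load all of stg."""
    with conn:  # one transaction: commits on success, rolls back on error
        # Set-based delete: every matching row is removed in a single statement.
        conn.execute(
            "DELETE FROM tgt WHERE column5 IN (SELECT column5 FROM stg)"
        )
        # Load the staged rows into the target table.
        conn.execute(
            "INSERT INTO tgt (col1, col2, column5) "
            "SELECT col1, col2, column5 FROM stg"
        )

# Hypothetical sample data: one row matches on column5 = 1, one row is new.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tgt (col1 TEXT, col2 TEXT, column5 INTEGER);
    CREATE TABLE stg (col1 TEXT, col2 TEXT, column5 INTEGER);
    INSERT INTO tgt VALUES ('old', 'a', 1), ('keep', 'b', 2);
    INSERT INTO stg VALUES ('new', 'x', 1), ('new', 'y', 3);
""")
delete_then_load(conn)
rows = conn.execute(
    "SELECT col1, col2, column5 FROM tgt ORDER BY column5"
).fetchall()
print(rows)
```

Running both statements in a single transaction matters here: if the insert fails, the delete is rolled back as well, so the target table is never left with the matching rows removed but not replaced.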
