Delete functionality with spark sql dataframe
Problem Description
I have a requirement to load/delete specific records from a Postgres database for my Spark application. For loading, I am using a Spark DataFrame in the format below:
sqlContext.read.format("jdbc").options(Map(
  "url" -> "postgres url",
  "user" -> "user",
  "password" -> "xxxxxx",
  "dbtable" -> "(select * from employee where emp_id > 1000) as filtered_emp")).load()
To delete the data, I am writing direct SQL instead of using DataFrames:
delete from employee where emp_id > 1000
The question is: is there a Spark way of deleting records in the database, something similar to the below? Or is direct SQL the only way?
sqlContext.read.format("jdbc").options(Map(
  "url" -> "postgres url",
  "user" -> "user",
  "password" -> "xxxxxx",
  "dbtable" -> "(delete from employee where emp_id > 1000) as filtered_emp")).load()
Recommended Answer
If you want to modify (delete records from) the actual source of the data, i.e. the tables in Postgres, then Spark is not a great fit. You can use a JDBC client directly to achieve the same thing.
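A minimal sketch of that direct JDBC approach, run from the driver (or any plain JVM process). The connection URL, credentials, and table name are placeholders taken from the question, not a tested setup:

```scala
import java.sql.DriverManager

// Open a plain JDBC connection to Postgres (placeholder URL and credentials).
val conn = DriverManager.getConnection(
  "jdbc:postgresql://host:5432/mydb", "user", "xxxxxx")
try {
  // Parameterized delete, equivalent to the raw SQL in the question.
  val stmt = conn.prepareStatement("delete from employee where emp_id > ?")
  stmt.setInt(1, 1000)
  val deleted = stmt.executeUpdate() // number of rows removed
  println(s"deleted $deleted rows")
} finally {
  conn.close()
}
```

This runs the delete in a single statement on the database side, so there is no need to pull any data through Spark at all.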
If you want to do this anyway (in a distributed manner, based on keys you are computing as part of a DataFrame), you can write the same JDBC client code against the DataFrame that holds the logic/trigger info for deleting records, and have it executed on multiple workers in parallel.