Delete functionality with spark sql dataframe


Problem description

I have a requirement to load/delete specific records from a Postgres database for my Spark application. For loading, I am using a Spark DataFrame in the format below:

sqlContext.read.format("jdbc").options(Map(
      "url" -> "postgres url",
      "user" -> "user",
      "password" -> "xxxxxx",
      "dbtable" -> "(select * from employee where emp_id > 1000) as filtered_emp")).load()

To delete the data, I am writing direct SQL instead of using DataFrames:

delete from employee where emp_id > 1000

The question is: is there a Spark way of deleting records in the database, something similar to the below? Or is the only way to use direct SQL?

sqlContext.read.format("jdbc").options(Map(
      "url" -> "postgres url",
      "user" -> "user",
      "password" -> "xxxxxx",
      "dbtable" -> "(delete from employee where emp_id > 1000) as filtered_emp")).load()

Answer

If you want to modify (delete records from) the actual source of the data, i.e. the tables in Postgres, then Spark isn't a great fit. You can use a JDBC client directly to achieve the same thing.
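
For example, a minimal sketch of such a plain JDBC delete, run from the driver or any other client process. The connection URL, user and password are placeholders matching the question, and the PostgreSQL JDBC driver is assumed to be on the classpath:

import java.sql.DriverManager

// Placeholder connection details; replace with your real Postgres URL and credentials.
val conn = DriverManager.getConnection("jdbc:postgresql://host:5432/mydb", "user", "xxxxxx")
try {
  val stmt = conn.createStatement()
  // Same delete as the direct SQL in the question.
  val deleted = stmt.executeUpdate("delete from employee where emp_id > 1000")
  println(s"Deleted $deleted rows")
  stmt.close()
} finally {
  conn.close()
}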

If you want to do this anyway (in a distributed manner, based on some criteria you are computing as part of a DataFrame), you can write the same JDBC client code against the DataFrame partitions that carry the logic/trigger information for deleting records, and have it executed on multiple workers in parallel.
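
A rough sketch of that idea, assuming a DataFrame named idsToDelete holds the emp_id values you computed and want to remove. The connection details are again placeholders, and each partition opens its own connection because JDBC connections are not serializable:

import java.sql.DriverManager

idsToDelete.select("emp_id").foreachPartition { rows =>
  // One connection per partition, opened on the executor.
  val conn = DriverManager.getConnection("jdbc:postgresql://host:5432/mydb", "user", "xxxxxx")
  val stmt = conn.prepareStatement("delete from employee where emp_id = ?")
  try {
    rows.foreach { row =>
      // emp_id is assumed to be a bigint here; use getInt/setInt for an integer column.
      stmt.setLong(1, row.getLong(0))
      stmt.addBatch()
    }
    stmt.executeBatch()   // run the batched deletes for this partition
  } finally {
    stmt.close()
    conn.close()
  }
}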
