Spark JDBC connection reuse


Question

In my Spark application, I use the following code to retrieve data from a SQL Server database using the JDBC driver.

 Dataset<Row> dfResult = sparksession.read().jdbc("jdbc:sqlserver://server\\dbname", tableName, partitionColumn, lowerBound, upperBound, numberOfPartitions, properties);

I then apply a map operation to the dfResult dataset.

While running the application in standalone mode, I see that Spark creates a unique connection for each RDD. From the API description, I understand that Spark takes care of closing the connection.

Is there a way to reuse the connection instead of opening and closing a JDBC connection for each RDD partition?

Thanks.

Recommended answer

Even when you push data into a database manually through an API, the recommendation I usually see is to create one connection per partition.

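// pseudo-code (Scala-style); SomeAPI is a placeholder for a database client
rdd.foreachPartition { iterator =>
  val connection = SomeAPI.connect()  // one connection per partition
  try iterator.foreach(record => connection.insert(record))
  finally connection.close()          // release it once the partition is done
}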

So if the JDBC data source is already doing that, it suggests that this is how the pattern is meant to work.
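The per-partition pattern can be tried out in plain Java without a cluster. The sketch below is only a simulation: `FakeConnection`, `PerPartitionDemo`, `writePartition`, and `writeAll` are hypothetical names invented for illustration, and partitions are modeled as ordinary lists rather than a real RDD.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class PerPartitionDemo {
    // Mirrors the body of foreachPartition: one connection, many rows.
    static int writePartition(Iterator<Integer> rows) {
        try (FakeConnection connection = new FakeConnection()) {
            while (rows.hasNext()) {
                connection.insert(rows.next());
            }
            return connection.inserted;
        }
    }

    // Simulates a dataset split into partitions; returns how many
    // connections were opened while writing all of them.
    static int writeAll(List<List<Integer>> partitions) {
        int before = FakeConnection.opened;
        for (List<Integer> partition : partitions) {
            writePartition(partition.iterator());
        }
        return FakeConnection.opened - before;
    }

    public static void main(String[] args) {
        List<List<Integer>> partitions = Arrays.asList(
            Arrays.asList(1, 2, 3),
            Arrays.asList(4, 5),
            Arrays.asList(6, 7, 8, 9));
        System.out.println("connections opened: " + writeAll(partitions));
    }
}

// Hypothetical stand-in for a database client; not a Spark or JDBC class.
class FakeConnection implements AutoCloseable {
    static int opened = 0;  // total connections created so far
    int inserted = 0;       // rows written through this connection

    FakeConnection() { opened++; }

    void insert(int row) { inserted++; }

    @Override
    public void close() { /* a real client would release its socket here */ }
}
```

With three partitions holding nine rows in total, the simulation opens three connections. The cost scales with the number of partitions, not the number of rows.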

Here's another example of this pattern being recommended:

http://www.slideshare.net/databricks/strata-sj-everyday-im-shuffling-tips-for-writing-better-spark-programs (slide 27)

I presume this is the recommended pattern because, in a multi-node cluster, you never know on which node a particular partition will be evaluated, so you want to make sure each partition has a DB connection available wherever it runs.
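For the partitioned read in the question, each partition corresponds to its own range query over partitionColumn, which is one way to see why every partition needs a connection of its own. The following is a simplified sketch of that range splitting, not Spark's actual implementation. The class `PartitionPredicates`, the method `split`, and the column name `id` are assumptions made for illustration. As in Spark, the bounds here only set the stride and do not filter rows, so the first and last ranges are open-ended.

```java
import java.util.ArrayList;
import java.util.List;

class PartitionPredicates {
    // Builds one WHERE predicate per partition from the bounds.
    // Illustration only; Spark's real stride arithmetic differs in detail.
    static List<String> split(String column, long lower, long upper, int numPartitions) {
        List<String> predicates = new ArrayList<>();
        long stride = (upper - lower) / numPartitions;
        for (int i = 0; i < numPartitions; i++) {
            long start = lower + i * stride;
            long end = start + stride;
            // The first partition has no lower limit, the last no upper limit.
            String lowerClause = (i == 0) ? null : column + " >= " + start;
            String upperClause = (i == numPartitions - 1) ? null : column + " < " + end;
            if (lowerClause == null && upperClause == null) {
                predicates.add("1 = 1");  // a single partition reads everything
            } else if (lowerClause == null) {
                predicates.add(upperClause + " OR " + column + " IS NULL");
            } else if (upperClause == null) {
                predicates.add(lowerClause);
            } else {
                predicates.add(lowerClause + " AND " + upperClause);
            }
        }
        return predicates;
    }

    public static void main(String[] args) {
        // Each printed predicate stands for one query on one connection.
        for (String predicate : split("id", 0, 100, 4)) {
            System.out.println(predicate);
        }
    }
}
```

Each predicate would be evaluated as an independent query, possibly on a different node, so a connection opened for one partition cannot simply be handed to the next.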
