Spark JDBC fetchsize option
Question
I currently have an application that connects to different types of databases, runs a specific query on each database using Spark's JDBC options, and then writes the resulting DataFrame to HDFS.
Performance was extremely bad for Oracle (I didn't check all of the databases). It turned out this was because of the fetchSize property, which defaults to 10 rows for the Oracle driver. So I increased it to 1000, and the performance gain was quite visible. Then I changed it to 10000, but some of the tables started failing with an out-of-memory issue in the executors (6 executors, 4 GB memory each, 2 GB driver memory).
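As a minimal sketch of the setup being described, here is how the fetchsize option is passed to a Spark JDBC read. The URL, table name, and credentials are placeholders, not values from the original question; the actual read requires a running SparkSession and the Oracle JDBC driver on the classpath.

```python
# Hypothetical connection details; adjust for your environment.
jdbc_options = {
    "url": "jdbc:oracle:thin:@//dbhost:1521/ORCL",
    "dbtable": "SCHEMA.SOME_TABLE",
    "user": "spark_user",
    "password": "secret",
    # Rows fetched per round trip to the database. The Oracle driver
    # defaults to 10, which is why reads are slow without this option.
    "fetchsize": "1000",
}

# In a real job (needs a SparkSession and the Oracle JDBC jar):
# df = spark.read.format("jdbc").options(**jdbc_options).load()
# df.write.parquet("hdfs:///path/to/output")
```

Raising fetchsize reduces round trips but increases the row buffer each task holds in memory, which is consistent with the OOM failures seen at 10000.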
My questions are:
- Is the data fetched by Spark's JDBC source kept in executor memory for the duration of each run? Is there any way to avoid persisting it while the job is running?
- Where can I get more information about the fetchSize property? I'm guessing it isn't supported by all JDBC drivers.
- Are there any other JDBC-related things I need to take care of to avoid OOM errors?
Answer
Fetch size: it's just the value set on the underlying JDBC PreparedStatement. You can see it in JDBCRDD.scala:
stmt.setFetchSize(options.fetchSize)
You can read more about JDBC FetchSize here
One thing you can also improve is to set all 4 partitioning parameters (partitionColumn, lowerBound, upperBound, numPartitions), which causes the read to be parallelized. See more here. Your read can then be split across many machines, so the memory usage on each of them may be smaller.
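A sketch of such a partitioned read, under the assumption that the table has a roughly uniformly distributed numeric key column; the column name and bounds below are hypothetical, not from the original question:

```python
# Hypothetical partitioned JDBC read configuration.
partition_options = {
    "url": "jdbc:oracle:thin:@//dbhost:1521/ORCL",
    "dbtable": "SCHEMA.SOME_TABLE",
    "user": "spark_user",
    "password": "secret",
    "fetchsize": "1000",
    # The four options that enable parallel reads:
    "partitionColumn": "ID",  # numeric, date, or timestamp column
    "lowerBound": "1",
    "upperBound": "1000000",
    "numPartitions": "12",    # e.g. 6 executors x 2 tasks each
}

# Spark splits the [lowerBound, upperBound) key range into numPartitions
# slices, one query per task, so each task holds a smaller result set:
rows_per_partition = (1_000_000 - 1) // 12  # rough rows per task if IDs are dense

# In a real job:
# df = spark.read.format("jdbc").options(**partition_options).load()
```

Note that lowerBound and upperBound only control how the range is sliced; rows outside the bounds are still read (by the first and last partitions), so skewed or sparse key columns can still produce uneven partitions.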
For details on which JDBC options are supported and how, you must consult your driver's documentation - every driver may have its own behaviour.