What is spark.driver.maxResultSize?


Problem description

The reference says:

Limit of total size of serialized results of all partitions for each Spark action (e.g. collect). Should be at least 1M, or 0 for unlimited. Jobs will be aborted if the total size is above this limit. Having a high limit may cause out-of-memory errors in driver (depends on spark.driver.memory and memory overhead of objects in JVM). Setting a proper limit can protect the driver from out-of-memory errors.

What does this attribute do, exactly? I mean, at first (since I am not battling a job that fails due to out-of-memory errors) I thought I should increase it.

On second thought, it seems that this attribute defines the maximum size of the result a worker can send to the driver, so leaving it at the default (1G) would be the best way to protect the driver.

But what happens in that case? Will the worker have to send more messages, so that the only overhead is a slower job?

If I understand correctly, assuming that a worker wants to send 4G of data to the driver, then having spark.driver.maxResultSize=1G would cause the worker to send 4 messages (instead of 1 with an unlimited spark.driver.maxResultSize). If so, then increasing that attribute to protect my driver from being killed by Yarn should be wrong.

But the question above still remains: if I set it to 1M (the minimum), will that be the most protective approach?

Recommended answer

"assuming that a worker wants to send 4G of data to the driver, then having spark.driver.maxResultSize=1G, will cause the worker to send 4 messages (instead of 1 with unlimited spark.driver.maxResultSize)."

No. If the estimated size of the data is larger than maxResultSize, the given job will be aborted. The goal here is to protect your application from driver loss, nothing more.
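The behaviour the answer describes can be sketched in plain Python (an illustrative model, not Spark's actual implementation; `parse_size`, `collect_results`, and `ResultSizeExceeded` are hypothetical names): the driver keeps a running total of serialized task-result sizes and aborts the job once the total exceeds the limit, rather than splitting the transfer into smaller messages.

```python
def parse_size(s):
    """Parse size strings like '1k', '1m', '1g' into bytes (hypothetical helper)."""
    units = {"k": 1 << 10, "m": 1 << 20, "g": 1 << 30}
    s = s.strip().lower()
    if s and s[-1] in units:
        return int(float(s[:-1]) * units[s[-1]])
    return int(s)

class ResultSizeExceeded(Exception):
    """Raised when the running total of result sizes exceeds the limit."""

def collect_results(task_result_sizes, max_result_size="1g"):
    """Model of the maxResultSize guard: sum serialized task-result sizes
    and abort (raise) once the total exceeds the limit; 0 means unlimited.
    Returns the total size in bytes if all results fit under the limit."""
    limit = parse_size(max_result_size)
    total = 0
    for size in task_result_sizes:
        total += size
        if limit > 0 and total > limit:
            raise ResultSizeExceeded(
                f"Total size of serialized results ({total} bytes) "
                f"is bigger than maxResultSize ({limit} bytes)")
    return total
```

In this model, a 4G result collected against a 1G limit is never broken into 4 messages; the guard simply aborts the job partway through.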

"if I set it to 1M (the minimum), will it be the most protective approach?"

In a sense, yes, but it is obviously not useful in practice. A good value should allow the application to proceed normally while protecting it from unexpected conditions.
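To make "a good value" concrete, one rule of thumb (my assumption, not part of the answer) is to cap the result size at a fraction of spark.driver.memory, leaving headroom for the JVM object overhead the documentation mentions:

```python
def suggest_max_result_size(driver_memory_bytes, fraction=0.5):
    """Hypothetical heuristic: allow collected results to use at most a
    fraction of driver memory, so deserialized objects still fit."""
    return int(driver_memory_bytes * fraction)

# e.g. with a 4 GB driver, cap collected results at roughly 2 GB
```

The exact fraction depends on what else the driver holds in memory; the point is only that the limit should be tied to spark.driver.memory rather than set to the minimum or to unlimited.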

