What is spark.driver.maxResultSize?


Problem description

The reference says:

Limit of total size of serialized results of all partitions for each Spark action (e.g. collect). Should be at least 1M, or 0 for unlimited. Jobs will be aborted if the total size is above this limit. Having a high limit may cause out-of-memory errors in driver (depends on spark.driver.memory and memory overhead of objects in JVM). Setting a proper limit can protect the driver from out-of-memory errors.

What does this attribute do, exactly? I mean, at first (since I am not battling a job that fails due to out-of-memory errors) I thought I should increase it.

On second thought, it seems that this attribute defines the maximum size of the result a worker can send to the driver, so leaving it at the default (1G) would be the best way to protect the driver.

But what happens in that case? Will the worker have to send more messages, so that the only overhead is a slower job?

If I understand correctly, assuming that a worker wants to send 4G of data to the driver, then having spark.driver.maxResultSize=1G would cause the worker to send 4 messages (instead of 1 with an unlimited spark.driver.maxResultSize). If so, then increasing that attribute to protect my driver from being killed by Yarn should be wrong.

But the question above still remains: if I set it to 1M (the minimum), will that be the most protective approach?

Recommended answer

"assuming that a worker wants to send 4G of data to the driver, then having spark.driver.maxResultSize=1G, will cause the worker to send 4 messages (instead of 1 with unlimited spark.driver.maxResultSize)."

No. If the estimated size of the data is larger than maxResultSize, the given job will be aborted. The goal here is to protect your application from driver loss, nothing more.
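The behaviour the answer describes can be sketched in plain Python (an illustrative model, not Spark's actual implementation; `parse_size`, `collect_results`, and `ResultSizeExceeded` are hypothetical names): the driver keeps a running total of serialized task-result sizes and aborts the job once the total exceeds the limit, rather than splitting the transfer into smaller messages.

```python
def parse_size(s):
    """Parse size strings like '1k', '1m', '1g' into bytes (hypothetical helper)."""
    units = {"k": 1 << 10, "m": 1 << 20, "g": 1 << 30}
    s = s.strip().lower()
    if s and s[-1] in units:
        return int(float(s[:-1]) * units[s[-1]])
    return int(s)

class ResultSizeExceeded(Exception):
    """Raised when the running total of result sizes exceeds the limit."""

def collect_results(task_result_sizes, max_result_size="1g"):
    """Model of the maxResultSize guard: sum serialized task-result sizes
    and abort (raise) once the total exceeds the limit; 0 means unlimited.
    Returns the total size in bytes if all results fit under the limit."""
    limit = parse_size(max_result_size)
    total = 0
    for size in task_result_sizes:
        total += size
        if limit > 0 and total > limit:
            raise ResultSizeExceeded(
                f"Total size of serialized results ({total} bytes) "
                f"is bigger than maxResultSize ({limit} bytes)")
    return total
```

In this model, a 4G result collected against a 1G limit is never broken into 4 messages; the guard simply aborts the job partway through.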

"if I set it to 1M (the minimum), will it be the most protective approach?"

In a sense, yes, but it is obviously not useful in practice. A good value should allow the application to proceed normally while protecting it from unexpected conditions.
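To make "a good value" concrete, one rule of thumb (my assumption, not part of the answer) is to cap the result size at a fraction of spark.driver.memory, leaving headroom for the JVM object overhead the documentation mentions:

```python
def suggest_max_result_size(driver_memory_bytes, fraction=0.5):
    """Hypothetical heuristic: allow collected results to use at most a
    fraction of driver memory, so deserialized objects still fit."""
    return int(driver_memory_bytes * fraction)

# e.g. with a 4 GB driver, cap collected results at roughly 2 GB
```

The exact fraction depends on what else the driver holds in memory; the point is only that the limit should be tied to spark.driver.memory rather than set to the minimum or to unlimited.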

