What is spark.driver.maxResultSize?


Question

The documentation says:

Limit of total size of serialized results of all partitions for each Spark action (e.g. collect). Should be at least 1M, or 0 for unlimited. Jobs will be aborted if the total size is above this limit. Having a high limit may cause out-of-memory errors in driver (depends on spark.driver.memory and memory overhead of objects in JVM). Setting a proper limit can protect the driver from out-of-memory errors.
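The property can be set on the command line (`--conf spark.driver.maxResultSize=2g` with spark-submit) or when building a session. A minimal PySpark sketch, assuming pyspark is installed; the application name and values here are illustrative, not a recommendation:

```python
# Sketch: configuring spark.driver.maxResultSize when building a session.
# Requires a working Spark installation; values below are illustrative.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("maxResultSize-demo")                # hypothetical app name
    .config("spark.driver.maxResultSize", "2g")   # raise the 1g default
    .config("spark.driver.memory", "4g")          # leave the driver headroom
    .getOrCreate()
)
```

Note that the two settings interact: results that fit under `spark.driver.maxResultSize` must still fit, once deserialized, inside `spark.driver.memory`.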

What does this attribute do, exactly? I mean, at first (since I am not battling a job that fails due to out-of-memory errors) I thought I should increase it.

On second thought, it seems that this attribute defines the maximum size of the result a worker can send to the driver, so leaving it at the default (1G) would be the best approach to protect the driver.

But what will happen in that case? The worker will have to send more messages, so is the only overhead that the job will be slower?

If I understand correctly, assuming a worker wants to send 4G of data to the driver, then setting spark.driver.maxResultSize=1G will cause the worker to send 4 messages (instead of 1 with an unlimited spark.driver.maxResultSize). If so, then increasing that attribute to protect my driver from being killed by YARN would be wrong.

But the question above still remains: if I set it to 1M (the minimum), would that be the most protective approach?

Answer

"assuming that a worker wants to send 4G of data to the driver, then having spark.driver.maxResultSize=1G, will cause the worker to send 4 messages (instead of 1 with unlimited spark.driver.maxResultSize)."

No. If the estimated size of the data is larger than maxResultSize, the given job will be aborted. The goal here is to protect your application from driver loss, nothing more.
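In other words, the limit is a hard cutoff, not a chunking threshold. A pure-Python sketch of the check the driver effectively performs (the helper names are hypothetical, not Spark's actual internals):

```python
# Illustrative sketch of the maxResultSize check: if the total serialized
# result size exceeds the configured limit, the job is aborted rather than
# split into smaller messages. Helper names here are hypothetical.

_UNITS = {"k": 1 << 10, "m": 1 << 20, "g": 1 << 30, "t": 1 << 40}

def parse_size(value: str) -> int:
    """Parse a Spark-style size string like '1g' or '512m' into bytes."""
    value = value.strip().lower()
    if value and value[-1] in _UNITS:
        return int(value[:-1]) * _UNITS[value[-1]]
    return int(value)  # plain byte count

def check_result_size(estimated_bytes: int, max_result_size: str) -> None:
    """Raise (abort the job) if the estimated result exceeds the limit.
    A limit of 0 means unlimited, mirroring the documented behaviour."""
    limit = parse_size(max_result_size)
    if limit > 0 and estimated_bytes > limit:
        raise RuntimeError(
            f"Total size of serialized results ({estimated_bytes} bytes) "
            f"is bigger than spark.driver.maxResultSize ({limit} bytes)"
        )

# A 4 GiB result against a 1 GiB limit is aborted, not chunked:
try:
    check_result_size(4 * (1 << 30), "1g")
except RuntimeError as err:
    print("aborted:", err)
```

This is why raising the limit does not change how many messages are sent; it only changes the point at which the driver gives up on collecting the result.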

"if I set it to 1M (the minimum), will it be the most protective approach?"

In a sense, yes, but it is obviously not useful in practice. A good value should allow the application to proceed normally while still protecting it from unexpected conditions.

