What is spark.python.worker.memory?


Question


Could anyone give me a more precise description of this Spark parameter and how it affects program execution? I cannot tell exactly what this parameter does "under the hood" from the documentation.

Answer


The parameter influences the memory limit for Python workers. If the RSS of a Python worker process is larger than the memory limit, then it will spill data from memory to disk, which will reduce the memory utilization but is generally an expensive operation.


Note that this value applies per Python worker, and there will be multiple workers per executor.
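Since the limit applies per worker, you may want to set it explicitly when tuning executor memory. One common way (the application name `my_app.py` and the value `512m` below are just placeholders) is via `spark-submit`:

```shell
# Set the per-Python-worker memory limit to 512 MB for this job.
spark-submit --conf spark.python.worker.memory=512m my_app.py
```

The same setting can also be placed in `spark-defaults.conf` or set on a `SparkConf` object before the context is created.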


If you want to take a look under the hood, then look at the python/pyspark directory in the Spark source tree, e.g. the ExternalMerger implementation: https://github.com/apache/spark/blob/41afa16500e682475eaa80e31c0434b7ab66abcb/python/pyspark/shuffle.py#L280
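To make the spilling behavior concrete, here is a heavily simplified sketch, not Spark's actual `ExternalMerger` code, of the pattern it implements: aggregate key/value pairs in memory, and once a rough memory estimate exceeds the configured limit, write the partial aggregates to disk and start over. The class name, the fixed 64-byte per-entry cost, and the pickle-based spill format are all illustrative assumptions:

```python
import os
import pickle
import tempfile

class ToyExternalMerger:
    """Toy illustration of limit-driven spilling (not Spark's real code):
    sums integer values per key, spilling partial results to disk
    whenever a crude memory estimate exceeds the limit."""

    def __init__(self, memory_limit_bytes):
        self.memory_limit = memory_limit_bytes
        self.data = {}           # in-memory partial aggregates
        self.used = 0            # rough running memory estimate
        self.spill_files = []    # paths of spilled partials

    def merge(self, key, value):
        if key in self.data:
            self.data[key] += value
        else:
            self.data[key] = value
            self.used += 64      # assumed flat per-entry cost
        if self.used > self.memory_limit:
            self._spill()

    def _spill(self):
        # Dump the current in-memory map to a temp file and reset.
        fd, path = tempfile.mkstemp()
        with os.fdopen(fd, "wb") as f:
            pickle.dump(self.data, f)
        self.spill_files.append(path)
        self.data = {}
        self.used = 0

    def items(self):
        # Merge all spilled partials back with the in-memory remainder.
        result = dict(self.data)
        for path in self.spill_files:
            with open(path, "rb") as f:
                for k, v in pickle.load(f).items():
                    result[k] = result.get(k, 0) + v
            os.remove(path)
        return result
```

The real implementation is considerably more sophisticated (it partitions keys, samples object sizes, and merges spill files in a streaming fashion), but the shape is the same: correctness is preserved across the limit, at the cost of extra disk I/O whenever a spill happens.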
