What is spark.python.worker.memory?

Question
Could anyone give me a more precise description of this Spark parameter and how it affects program execution? I cannot tell exactly what this parameter does "under the hood" from the documentation.
Answer
The parameter sets the memory limit for Python workers. If the RSS of a Python worker process grows larger than this limit, the worker spills data from memory to disk, which reduces memory usage but is generally an expensive operation.
Note that this value applies per Python worker, and there will be multiple workers per executor.
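The setting accepts the usual Spark size suffixes (`512m`, `2g`, and so on). A minimal sketch of how such a string can be converted to megabytes, roughly mirroring how PySpark interprets the value (the helper name here is illustrative, not PySpark's actual API):

```python
def parse_memory(mem_str):
    """Convert a Spark-style memory string (e.g. '512m', '2g') to megabytes."""
    units = {'k': 1.0 / 1024, 'm': 1, 'g': 1024, 't': 1024 * 1024}
    suffix = mem_str[-1].lower()
    if suffix not in units:
        raise ValueError("invalid memory string: %r" % mem_str)
    return int(float(mem_str[:-1]) * units[suffix])

# The setting itself would typically be applied via SparkConf, e.g.:
# conf = SparkConf().set("spark.python.worker.memory", "512m")
print(parse_memory("512m"))  # 512
print(parse_memory("2g"))    # 2048
```

Remember that since the value applies per worker, the total Python memory on an executor can be several multiples of this figure.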
If you want to take a look under the hood, then look at the python/pyspark directory in the Spark source tree, e.g. the ExternalMerger
implementation: https://github.com/apache/spark/blob/41afa16500e682475eaa80e31c0434b7ab66abcb/python/pyspark/shuffle.py#L280
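To make the spill behavior concrete, here is a toy sketch of a combine-by-key merger that writes partial results to disk once a threshold is crossed. This is not PySpark's actual ExternalMerger (which triggers on the worker's RSS versus spark.python.worker.memory); an item-count limit stands in for the memory check:

```python
import os
import pickle
import tempfile
from collections import defaultdict

class TinyExternalMerger:
    """Toy merger that spills its in-memory map to disk past a threshold.

    Stand-in for PySpark's ExternalMerger, which spills when the
    worker's RSS exceeds the spark.python.worker.memory limit.
    """
    def __init__(self, limit_items=1000):
        self.limit = limit_items          # stands in for the memory limit
        self.data = defaultdict(int)
        self.spill_files = []

    def merge(self, items):
        for key, value in items:
            self.data[key] += value
            if len(self.data) > self.limit:
                self._spill()

    def _spill(self):
        # The expensive step: serialize the in-memory map to a temp file.
        fd, path = tempfile.mkstemp(suffix=".spill")
        with os.fdopen(fd, "wb") as f:
            pickle.dump(dict(self.data), f)
        self.spill_files.append(path)
        self.data.clear()

    def results(self):
        # Merge spilled partial results back with what is still in memory.
        merged = defaultdict(int, self.data)
        for path in self.spill_files:
            with open(path, "rb") as f:
                for key, value in pickle.load(f).items():
                    merged[key] += value
            os.remove(path)
        return dict(merged)

m = TinyExternalMerger(limit_items=3)
m.merge([("a", 1), ("b", 1), ("c", 1), ("d", 1), ("a", 1)])
out = m.results()
print(out["a"])  # 2
```

The real implementation follows the same shape: accumulate in memory while cheap, pay the serialization and disk I/O cost only when the limit is hit, and merge the spilled partitions back at the end.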