Spark Executors RAM and File Size


Problem description

I am reading text files with a total size of 8.2 GB (all the files in a folder) using the wholeTextFiles method.
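
For reference, a minimal sketch of this kind of job. The folder path and the session setup are assumptions for illustration, not details from the original post:

    import org.apache.spark.sql.SparkSession

    object WholeTextFilesJob {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("WholeTextFilesJob")
          .getOrCreate()

        // wholeTextFiles returns an RDD of (filePath, fileContent) pairs,
        // one pair per file; the path below is a hypothetical example.
        val files = spark.sparkContext.wholeTextFiles("hdfs:///data/input-folder")

        // A simple action to force the read; this counts files, not bytes.
        println(s"Number of files read: ${files.count()}")

        spark.stop()
      }
    }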

The job that reads the files got 3 executors, each with 4 cores and 4 GB of memory, as shown in the picture.

Though the job page shows 3 executors, only 2 of them are actually working on the data (I can tell from the stderr logs, which print the files being read). The 3rd executor shows no trace of processing any files.

The wholeTextFiles API produced 2 partitions.
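
With only 2 partitions there are only 2 tasks, which is consistent with the 3rd executor sitting idle. For what it's worth, wholeTextFiles also accepts a minPartitions hint; a sketch (the path is a hypothetical example):

    // minPartitions is only a hint: each file remains a single record,
    // but Spark may spread the files across more, smaller partitions.
    val files = spark.sparkContext.wholeTextFiles("hdfs:///data/input-folder", minPartitions = 6)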

The 2 executors had 4 GB each, 8 GB of memory in total, but my files amounted to 8.2 GB.

Can anyone explain how 2 executors with 8 GB of RAM in total handled 8.2 GB of files?

My job completed successfully.

Answer

Each and every executor has a memory overhead [which is 10% of the allocated memory, with a minimum of 384 MB].
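
As a rough illustration of that rule (the 10% and 384 MB figures come from the answer above; the helper itself is just a sketch, and in Spark on YARN the value can also be set explicitly via spark.yarn.executor.memoryOverhead, renamed spark.executor.memoryOverhead in newer versions):

    // Sketch of the overhead rule: max(384 MB, 10% of executor memory).
    object MemoryOverhead {
      val MinOverheadMb = 384
      val OverheadFraction = 0.10

      def overheadMb(executorMemoryMb: Int): Int =
        math.max(MinOverheadMb, (executorMemoryMb * OverheadFraction).toInt)

      def main(args: Array[String]): Unit = {
        // For a 4 GB executor, 10% is 409 MB, above the 384 MB floor,
        // so YARN is asked for roughly 4096 + 409 = 4505 MB per container.
        println(overheadMb(4096)) // prints 409
      }
    }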

You can see the actual allocated memory on the YARN Running Jobs page.

Also, there is something called container memory [min and max limit] allocation.
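
For context, those limits come from the YARN scheduler settings yarn.scheduler.minimum-allocation-mb and yarn.scheduler.maximum-allocation-mb, and with the default scheduler settings YARN rounds each request up to a multiple of the minimum allocation. A sketch of that sizing, with hypothetical limit values:

    // Requests are clamped to [min, max] and rounded up to a multiple
    // of the minimum allocation (example values, not from the post).
    object ContainerMemory {
      val MinAllocationMb = 1024  // yarn.scheduler.minimum-allocation-mb
      val MaxAllocationMb = 8192  // yarn.scheduler.maximum-allocation-mb

      def containerMb(requestedMb: Int): Int = {
        val rounded = math.ceil(requestedMb.toDouble / MinAllocationMb).toInt * MinAllocationMb
        math.min(math.max(rounded, MinAllocationMb), MaxAllocationMb)
      }

      def main(args: Array[String]): Unit = {
        // The 4505 MB request from the previous sketch is rounded up
        // to the next 1 GB increment: a 5120 MB container.
        println(containerMb(4096 + 409)) // prints 5120
      }
    }

So the memory YARN actually grants per executor can be noticeably larger than the 4 GB shown on the Spark job page.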
