Spark Worker asking for absurd amounts of virtual memory


Problem description

I am running a Spark job on a 2-node YARN cluster. My dataset is not large (< 100 MB), just for testing, and the worker is getting killed because it is asking for too much virtual memory. The amounts here are absurd: 2 GB out of 11 GB physical memory used, 300 GB of virtual memory used.

16/02/12 05:49:43 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 2.1 (TID 22, ip-172-31-6-141.ec2.internal): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Container marked as failed: container_1455246675722_0023_01_000003 on host: ip-172-31-6-141.ec2.internal. Exit status: 143. Diagnostics: Container [pid=23206,containerID=container_1455246675722_0023_01_000003] is running beyond virtual memory limits. Current usage: 2.1 GB of 11 GB physical memory used; 305.3 GB of 23.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1455246675722_0023_01_000003 :
    |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
    |- 23292 23213 23292 23206 (python) 15 3 101298176 5514 python -m pyspark.daemon
    |- 23206 1659 23206 23206 (bash) 0 0 11431936 352 /bin/bash -c /usr/lib/jvm/java-7-openjdk-amd64/bin/java -server -XX:OnOutOfMemoryError='kill %p' -Xms10240m -Xmx10240m -Djava.io.tmpdir=/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1455246675722_0023/container_1455246675722_0023_01_000003/tmp '-Dspark.driver.port=37386' -Dspark.yarn.app.container.log.dir=/mnt/yarn/logs/application_1455246675722_0023/container_1455246675722_0023_01_000003 -XX:MaxPermSize=256m org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@172.31.0.92:37386 --executor-id 2 --hostname ip-172-31-6-141.ec2.internal --cores 8 --app-id application_1455246675722_0023 --user-class-path file:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1455246675722_0023/container_1455246675722_0023_01_000003/app.jar 1> /mnt/yarn/logs/application_1455246675722_0023/container_1455246675722_0023_01_000003/stdout 2> /mnt/yarn/logs/application_1455246675722_0023/container_1455246675722_0023_01_000003/stderr
    |- 23341 23292 23292 23206 (python) 87 8 39464374272 23281 python -m pyspark.daemon
    |- 23350 23292 23292 23206 (python) 86 7 39463976960 24680 python -m pyspark.daemon
    |- 23329 23292 23292 23206 (python) 90 6 39464521728 23281 python -m pyspark.daemon
    |- 23213 23206 23206 23206 (java) 1168 61 11967115264 359820 /usr/lib/jvm/java-7-openjdk-amd64/bin/java -server -XX:OnOutOfMemoryError=kill %p -Xms10240m -Xmx10240m -Djava.io.tmpdir=/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1455246675722_0023/container_1455246675722_0023_01_000003/tmp -Dspark.driver.port=37386 -Dspark.yarn.app.container.log.dir=/mnt/yarn/logs/application_1455246675722_0023/container_1455246675722_0023_01_000003 -XX:MaxPermSize=256m org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@172.31.0.92:37386 --executor-id 2 --hostname ip-172-31-6-141.ec2.internal --cores 8 --app-id application_1455246675722_0023 --user-class-path file:/tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1455246675722_0023/container_1455246675722_0023_01_000003/app.jar
    |- 23347 23292 23292 23206 (python) 87 10 39464783872 23393 python -m pyspark.daemon
    |- 23335 23292 23292 23206 (python) 83 9 39464112128 23216 python -m pyspark.daemon
    |- 23338 23292 23292 23206 (python) 81 9 39463714816 24614 python -m pyspark.daemon
    |- 23332 23292 23292 23206 (python) 86 6 39464374272 24812 python -m pyspark.daemon
    |- 23344 23292 23292 23206 (python) 85 30 39464374272 23281 python -m pyspark.daemon
Container killed on request. Exit code is 143

Does anyone know why this might be happening? I've been trying to modify various YARN and Spark configurations, but I know something is deeply wrong for it to be asking for this much vmem.
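
As context for those numbers: the 23.1 GB ceiling in the diagnostics is YARN's virtual-memory limit. The NodeManager multiplies the container's physical allocation (11 GB here) by yarn.nodemanager.vmem-pmem-ratio (default 2.1), so 11 * 2.1 = 23.1 GB, and it kills any container whose vmem exceeds that while yarn.nodemanager.vmem-check-enabled is true. A minimal yarn-site.xml sketch of those two NodeManager settings (the values shown are the defaults, purely illustrative and not taken from this cluster):

<property>
  <!-- When true, the NodeManager enforces the virtual-memory limit and kills offending containers -->
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>true</value>
</property>
<property>
  <!-- Allowed vmem per container = requested physical memory * this ratio -->
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>2.1</value>
</property>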

Recommended answer

The command I was running:

spark-submit --executor-cores 8 ...

It turns out the executor-cores flag doesn't do what I thought it did. It makes 8 copies of the pyspark.daemon process, running 8 copies of the worker process to run jobs. Each process was using 38 GB of virtual memory, which is unnecessarily large, but 8 * 38 ≈ 300, so that explains it.

It's actually a very poorly named flag. If I set executor-cores to 1, it makes one daemon, but the daemon will use multiple cores, as seen via htop.
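
For completeness, a minimal sketch of an adjusted submission along those lines (my_job.py is a placeholder, the flags hidden behind the "..." in the original command are left out rather than guessed, and the 10g executor memory simply mirrors the -Xmx10240m visible in the container dump):

spark-submit --master yarn --executor-cores 1 --executor-memory 10g my_job.py

If one daemon per executor is not enough parallelism, raising --num-executors spreads the work across more containers instead of packing more pyspark.daemon copies into a single one; that trade-off isn't measured here.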
