I am getting the executor running beyond memory limits when running a big join in Spark

Problem description

I am getting the following error in the driver when running a big join on Spark.

We have 3 nodes with 32GB of RAM, and the total input size of the join is 150GB. (The same app runs fine when the input file size is 50GB.)

I have set storage.memoryFraction to 0.2 and shuffle.memoryFraction to 0.2 (roughly as in the sketch below), but I still keep getting the error about running beyond physical memory limits.
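For reference, a minimal sketch of how these two (Spark 1.x) fractions can be set on a SparkConf; the app name here is just a placeholder, not taken from the actual job:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical setup: only the two fraction values mirror the question;
// the app name is a placeholder.
val conf = new SparkConf()
  .setAppName("big-join")
  .set("spark.storage.memoryFraction", "0.2")   // fraction of heap for cached/storage blocks
  .set("spark.shuffle.memoryFraction", "0.2")   // fraction of heap for shuffle aggregation
val sc = new SparkContext(conf)
```

The error from the driver log: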

15/04/07 19:58:17 INFO yarn.YarnAllocator: Container marked as failed: container_1426882329798_0674_01_000002. Exit status: 143. Diagnostics: Container [pid=51382,containerID=container_1426882329798_0674_01_000002] is running beyond physical memory limits. Current usage: 16.1 GB of 16 GB physical memory used; 16.8 GB of 33.6 GB virtual memory used. Killing container. Dump of the process-tree for container_1426882329798_0674_01_000002 : |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE |- 51387 51382 51382 51382 (java) 717795 50780 17970946048 4221191 /usr/jdk64/jdk1.7.0_45/bin/java -server -XX:OnOutOfMemoryError=kill %p -Xms14336m -Xmx14336m -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+StartAttachListener -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9010 -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dlog4j.configuration=file:/softwares/log4j.properties -Djava.io.tmpdir=/hadoop/yarn/local/usercache/hdfs/appcache/application_1426882329798_0674/container_1426882329798_0674_01_000002/tmp -Dspark.driver.port=20763 -Dspark.ui.port=0 -Dspark.yarn.app.container.log.dir=/hadoop/yarn/log/application_1426882329798_0674/container_1426882329798_0674_01_000002 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url akka.tcp://sparkDriver@maxiq2.augmentiq.in:20763/user/CoarseGrainedScheduler --executor-id 1 --hostname maxiq1.augmentiq.in --cores 4 --app-id application_1426882329798_0674 --user-class-path file:/hadoop/yarn/local/usercache/hdfs/appcache/application_1426882329798_0674/container_1426882329798_0674_01_000002/app.jar

Can anyone please help me with this?

Recommended answer

We have faced a similar issue before. We tried changing all of the Spark configurations, but had no luck.

Later we found that it was an issue with the data. The key we used in the join had many duplicate rows: some keys had around 4,000-5,000 rows in both tables, so Spark produced roughly 5k * 5k (about 25 million) records for each such key, which made the executor run out of memory.
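To see why this blows up, here is a toy illustration (Scala, assuming an existing SparkContext `sc`, e.g. the one in spark-shell): a key that appears n times on each side of a join produces n * n output rows.

```scala
// Toy skew demo, assuming a SparkContext `sc` (e.g. in spark-shell).
// A single key repeated 5,000 times on each side explodes into 25 million joined rows.
val left  = sc.parallelize(Seq.fill(5000)(("hotKey", 1)))
val right = sc.parallelize(Seq.fill(5000)(("hotKey", 2)))
println(left.join(right).count())   // prints 25000000
```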

You may want to check your data. Run some profiling on the input data, such as a group-by on the join key followed by a count per key; that may give you some insight.
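A sketch of that kind of check in Scala, assuming tab-delimited text input with the join key in the first field (the path, delimiter, and object name are placeholders, not from the original job):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object KeySkewProfile {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("key-skew-profile"))

    // Hypothetical input: adjust the path, delimiter, and key position to match your data.
    val rows = sc.textFile("hdfs:///data/left_table")

    // Count rows per join key.
    val keyCounts = rows
      .map(line => (line.split("\t")(0), 1L))
      .reduceByKey(_ + _)

    // Print the 20 heaviest keys; any key with thousands of rows on both
    // sides of the join is a likely cause of the memory blow-up.
    keyCounts
      .top(20)(Ordering.by(_._2))
      .foreach { case (key, count) => println(s"$key\t$count") }

    sc.stop()
  }
}
```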
