How to avoid Spark executor from getting lost and yarn container killing it due to memory limit?


Problem description


I have the following code, which fires hiveContext.sql() most of the time. My task is to create a few tables and insert values into them after processing, for every Hive table partition.

So I first run show partitions and, using its output in a for-loop, I call a few methods that create each table (if it doesn't exist) and insert into it using hiveContext.sql.

Now, we can't use hiveContext inside an executor, so I have to run this loop in the driver program, and it runs serially, one partition at a time. When I submit this Spark job on a YARN cluster, almost every time an executor gets lost because of a "shuffle not found" exception.

This happens because YARN kills my executor for exceeding its memory limit. I don't understand why, since I have a very small data set for each Hive partition, yet YARN still kills my executor.

Does the following code do everything in parallel and try to hold the data of all Hive partitions in memory at the same time?

public static void main(String[] args) throws IOException {
    SparkConf conf = new SparkConf();
    SparkContext sc = new SparkContext(conf);
    HiveContext hc = new HiveContext(sc);
    FileSystem fs = FileSystem.get(sc.hadoopConfiguration());

    // List all partitions of the source table for the given date
    DataFrame partitionFrame = hc.sql("show partitions dbdata partition(date=\"2015-08-05\")");

    Row[] rowArr = partitionFrame.collect();
    for (Row row : rowArr) {
        // Each partition string looks like "server=<server>/date=<date>"
        String[] splitArr = row.getString(0).split("/");
        String server = splitArr[0].split("=")[1];
        String date = splitArr[1].split("=")[1];
        String csvPath = "hdfs:///user/db/ext/" + server + ".csv";
        if (fs.exists(new Path(csvPath))) {
            hc.sql("ADD FILE " + csvPath);
        }
        // Create each target table (if it doesn't exist) and insert this partition's data;
        // 'entity' is defined elsewhere in the original program
        createInsertIntoTableABC(hc, entity, date);
        createInsertIntoTableDEF(hc, entity, date);
        createInsertIntoTableGHI(hc, entity, date);
        createInsertIntoTableJKL(hc, entity, date);
        createInsertIntoTableMNO(hc, entity, date);
    }
}

Solution

Generally, you should always dig into the logs to get the real exception out (at least in Spark 1.3.1).

tl;dr
A safe configuration for Spark on YARN:
spark.shuffle.memoryFraction=0.5 - this lets the shuffle use more of the allocated executor memory
spark.yarn.executor.memoryOverhead=1024 - this is set in MB. YARN kills an executor when its memory usage exceeds (executor-memory + executor.memoryOverhead)
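
For concreteness, here is a minimal sketch (not part of the original answer) of one way these two properties could be applied, either on the SparkConf before the SparkContext is created or as --conf flags to spark-submit; the application name and class name are placeholders.

// Hypothetical example: applying the suggested settings programmatically.
// The equivalent spark-submit flags would be:
//   spark-submit --conf spark.shuffle.memoryFraction=0.5 \
//                --conf spark.yarn.executor.memoryOverhead=1024 ...
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.sql.hive.HiveContext;

public class PartitionLoaderSubmit {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
            .setAppName("partition-loader")                     // placeholder app name
            .set("spark.shuffle.memoryFraction", "0.5")          // let the shuffle use more of the executor heap
            .set("spark.yarn.executor.memoryOverhead", "1024");  // extra off-heap headroom for YARN, in MB
        SparkContext sc = new SparkContext(conf);
        HiveContext hc = new HiveContext(sc);
        // ... run the per-partition create/insert loop from the question ...
        sc.stop();
    }
}

Note that these are per-application settings; raising memoryOverhead trades a larger YARN container per executor for fewer container kills.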

Little more info

From your question, you mention that you get a "shuffle not found" exception.

In the case of org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle, you should increase spark.shuffle.memoryFraction, for example to 0.5.

The most common reason for YARN killing off my executors was memory usage beyond what it expected. To avoid that, increase spark.yarn.executor.memoryOverhead; I've set it to 1024 even though my executors use only 2-3 GB of memory.

Hope this helps
