How to avoid Spark executor from getting lost and yarn container killing it due to memory limit?

Problem description

I have the following code, which fires hiveContext.sql() most of the time. My task is to create a few tables and insert values into them, after processing, for every Hive table partition.

So I first fire show partitions and, using its output in a for-loop, call a few methods that create the tables (if they don't exist) and insert into them using hiveContext.sql.

Now, we can't execute hiveContext inside an executor, so I have to run this in a for-loop in the driver program, serially, one partition at a time. When I submit this Spark job on a YARN cluster, almost every time my executors get lost because of a shuffle not found exception.

This is happening because YARN kills my executors due to memory overload. I don't understand why, since each Hive partition holds a very small data set, but it still causes YARN to kill my executors.

Will the following code do everything in parallel and try to hold all of the Hive partition data in memory at the same time?

public static void main(String[] args) throws IOException {
    SparkConf conf = new SparkConf();
    SparkContext sc = new SparkContext(conf);
    HiveContext hc = new HiveContext(sc);
    FileSystem fs = FileSystem.get(sc.hadoopConfiguration());

    // List the partitions of the source table; this runs on the driver.
    DataFrame partitionFrame = hc.sql("show partitions dbdata partition(date='2015-08-05')");

    // The partition list is small, so collecting it to the driver is cheap.
    Row[] rowArr = partitionFrame.collect();
    for (Row row : rowArr) {
        // Each partition row looks like "server=<server>/date=<date>".
        String[] splitArr = row.getString(0).split("/");
        String server = splitArr[0].split("=")[1];
        String date = splitArr[1].split("=")[1];
        String csvPath = "hdfs:///user/db/ext/" + server + ".csv";
        if (fs.exists(new Path(csvPath))) {
            hc.sql("ADD FILE " + csvPath);
        }
        // Each helper fires CREATE TABLE IF NOT EXISTS / INSERT statements via hc.sql(),
        // serially, one partition at a time.
        createInsertIntoTableABC(hc, server, date);
        createInsertIntoTableDEF(hc, server, date);
        createInsertIntoTableGHI(hc, server, date);
        createInsertIntoTableJKL(hc, server, date);
        createInsertIntoTableMNO(hc, server, date);
    }
}

Recommended answer

Generally, you should always dig into the logs to get the real exception out (at least in Spark 1.3.1).
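
For example (an illustration, not part of the original answer): with YARN log aggregation enabled, the container logs of a finished application can be pulled with the YARN CLI and searched for the real error; the application id is a placeholder:

yarn logs -applicationId <applicationId>

Messages such as "Container ... is running beyond physical memory limits" in those logs point directly at the memory-overhead problem discussed below.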

tl;dr: a safe config for Spark under YARN:

spark.shuffle.memoryFraction=0.5 - this allows the shuffle to use more of the allocated memory
spark.yarn.executor.memoryOverhead=1024 - this is set in MB. YARN kills an executor when its memory usage is larger than (executor-memory + executor.memoryOverhead)
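
As an illustration only (the executor memory size, class name, and jar name below are placeholders, not values from the original answer), both properties can be passed with --conf when submitting the job to YARN:

spark-submit \
  --master yarn-cluster \
  --executor-memory 2g \
  --conf spark.shuffle.memoryFraction=0.5 \
  --conf spark.yarn.executor.memoryOverhead=1024 \
  --class com.example.HivePartitionJob \
  my-job.jar

The same two properties can also be set on the SparkConf in the driver, as sketched at the end of the answer.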

More information

From reading your question, you mention that you get a shuffle not found exception.

In the case of org.apache.spark.shuffle.MetadataFetchFailedException: Missing an output location for shuffle, you should increase spark.shuffle.memoryFraction, for example to 0.5.

The most common reason for YARN killing off my executors was memory usage beyond what it expected. To avoid that, increase spark.yarn.executor.memoryOverhead; I've set it to 1024, even though my executors use only 2-3 GB of memory.
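
For completeness, here is a minimal sketch (the app name and the 2g executor memory are assumptions, not values from the original post) of setting these properties programmatically in a Java driver like the one in the question, before the HiveContext is created:

import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.sql.hive.HiveContext;

public class ConfiguredDriver {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("hive-partition-job")                   // hypothetical app name
                .set("spark.executor.memory", "2g")                 // assumed executor heap size
                .set("spark.shuffle.memoryFraction", "0.5")         // let the shuffle use more of the heap (Spark 1.x)
                .set("spark.yarn.executor.memoryOverhead", "1024"); // extra off-heap headroom for YARN, in MB
        // YARN sizes each executor container at roughly executor-memory + memoryOverhead,
        // i.e. about 2 GB + 1 GB = 3 GB per executor here; an executor that grows past that is killed.
        SparkContext sc = new SparkContext(conf);
        HiveContext hc = new HiveContext(sc);
        // ... the per-partition loop from the question would go here ...
    }
}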
