Issue in Hive Query due to memory

Problem Description

We have an insert query in which we are trying to insert data into a partitioned table by reading data from a non-partitioned table.

Query:

 insert into db1.fact_table PARTITION(part_col1, part_col2) 
 ( col1,
 col2,
 col3,
 col4,
 col5,
 col6,
 ...
 col32,
 LOAD_DT,
 part_col1,
 Part_col2 ) 
 select 
 col1,
 col2,
 col3,
 col4,
 col5,
 col6,
 ...
 col32,
 part_col1,
 Part_col2
 from db1.main_table WHERE col1=0;

The table has 34 columns. The number of records in the main table depends on the size of the input file we receive on a daily basis, and the number of partitions (part_col1, part_col2) we insert in each run may vary from 4000 to 5000.

Sometimes this query fails with the issue below.

2019-04-28 13:23:31,715 Stage-1 map = 95%, reduce = 0%, Cumulative CPU 177220.23 sec
2019-04-28 13:24:25,989 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 163577.82 sec
MapReduce Total cumulative CPU time: 1 days 21 hours 26 minutes 17 seconds 820 msec
Ended Job = job_1556004136988_155295 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1556004136988_155295_m_000003 (and more) from job job_1556004136988_155295
Examining task ID: task_1556004136988_155295_m_000004 (and more) from job job_1556004136988_155295
Task with the most failures(4):
-----
Task ID: task_1556004136988_155295_m_000000
-----
Diagnostic Messages for this Task:
Exception from container-launch.
Container id: container_e81_1556004136988_155295_01_000015
Exit code: 255
Stack trace: ExitCodeException exitCode=255:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:563)
    at org.apache.hadoop.util.Shell.run(Shell.java:460)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:748)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:305)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:356)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:88)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Shell output: main : command provided 1
main : user is bldadmin
main : requested yarn user is bldadmin
Container exited with a non-zero exit code 255
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 10 Cumulative CPU: 163577.82 sec MAPRFS Read: 0 MAPRFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 1 days 21 hours 26 minutes 17 seconds 820 msec

Current Hive properties.

Using the Tez engine:

set hive.execution.engine=tez;
set hive.tez.container.size=3072;
set hive.tez.java.opts=-Xmx1640m;
set hive.vectorized.execution.enabled=false;
set hive.vectorized.execution.reduce.enabled=false;
set hive.enforce.bucketing=true;
set hive.exec.parallel=true;
set hive.auto.convert.join=false;
set hive.enforce.bucketmapjoin=true;
set hive.optimize.bucketmapjoin.sortedmerge=true;
set hive.optimize.bucketmapjoin=true;
set hive.exec.tmp.maprfsvolume=false;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
set hive.stats.fetch.partition.stats=true;
set hive.support.concurrency=true;
set hive.exec.max.dynamic.partitions=999999999;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.compactor.initiator.on=true;

Based on input from other teams, we changed the engine to MR, and the properties are:

set hive.execution.engine=mr;
set hive.auto.convert.join=false;
set mapreduce.map.memory.mb=16384;
set mapreduce.map.java.opts=-Xmx14745m;
set mapreduce.reduce.memory.mb=16384;
set mapreduce.reduce.java.opts=-Xmx14745m;

With these properties, the query completed without any errors a few times.

How can I debug these issues, and are there any Hive properties we can set so that we don't run into them in the future?

Recommended Answer

Add distribute by partition key. Each reducer will then process only one partition instead of every partition, which reduces memory consumption: each reducer creates fewer files and keeps fewer buffers open.

insert into db1.fact_table PARTITION(part_col1, part_col2) 
select 
col1,
...
col32,
part_col1,
Part_col2
 from db1.main_table WHERE col1=0
distribute by part_col1, Part_col2; --add this
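
You can verify the effect with EXPLAIN: with distribute by in place, the plan should include a reduce stage keyed on the partition columns, rather than mappers writing files for every partition they encounter.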

Use Predicate Push Down; it may help with filtering if the source files are ORC:

SET hive.optimize.ppd=true;
SET hive.optimize.ppd.storage=true;
SET hive.optimize.index.filter=true;
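
With ORC files, hive.optimize.index.filter lets the reader apply the predicate against ORC's built-in stripe and row-group min/max statistics, so row groups that cannot contain matches for WHERE col1=0 are skipped without being read.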

Tune proper mapper and reducer parallelism: https://stackoverflow.com/a/48487306/2700344
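
For reference, a minimal sketch of the kind of knobs that answer covers; the property names are standard MapReduce/Hive ones, but the values below are illustrative assumptions only, not tuned recommendations for this workload:

-- values are illustrative assumptions; adjust to your cluster and data size
set mapreduce.input.fileinputformat.split.maxsize=256000000;  -- lower => more mappers
set mapreduce.input.fileinputformat.split.minsize=16000000;
set hive.exec.reducers.bytes.per.reducer=67108864;            -- lower => more reducers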

If your data is too big and the distribution by partition key is uneven, add distribute by random in addition to the partition keys. This will help with skewed data:

distribute by part_col1, Part_col2, FLOOR(RAND()*100.0)%20;
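
The FLOOR(RAND()*100.0)%20 expression spreads each partition's rows across up to 20 reducers, so one oversized partition no longer overloads a single reducer; the trade-off is up to 20 output files per partition instead of one.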

Also read https://stackoverflow.com/a/55375261/2700344
