Issue in Hive Query due to memory

Problem Description

We have an insert query in which we are trying to insert data into a partitioned table by reading data from a non-partitioned table.

Query:

 insert into db1.fact_table PARTITION(part_col1, part_col2) 
 ( col1,
 col2,
 col3,
 col4,
 col5,
 col6,
 ...
 col32,
 LOAD_DT,
 part_col1,
 Part_col2 ) 
 select 
 col1,
 col2,
 col3,
 col4,
 col5,
 col6,
 ...
 col32,
 part_col1,
 Part_col2
 from db1.main_table WHERE col1=0;

The table has 34 columns. The number of records in the main table depends on the size of the input file we receive on a daily basis, and the number of partitions (part_col1, part_col2) we insert in each run may vary from 4000 to 5000.

Sometimes this query fails with the issue below.

2019-04-28 13:23:31,715 Stage-1 map = 95%, reduce = 0%, Cumulative CPU 177220.23 sec
2019-04-28 13:24:25,989 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 163577.82 sec
MapReduce Total cumulative CPU time: 1 days 21 hours 26 minutes 17 seconds 820 msec
Ended Job = job_1556004136988_155295 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1556004136988_155295_m_000003 (and more) from job job_1556004136988_155295
Examining task ID: task_1556004136988_155295_m_000004 (and more) from job job_1556004136988_155295
Task with the most failures(4):
-----
Task ID: task_1556004136988_155295_m_000000
-----
Diagnostic Messages for this Task:
Exception from container-launch.
Container id: container_e81_1556004136988_155295_01_000015
Exit code: 255
Stack trace: ExitCodeException exitCode=255:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:563)
    at org.apache.hadoop.util.Shell.run(Shell.java:460)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:748)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:305)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:356)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:88)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Shell output: main : command provided 1
main : user is bldadmin
main : requested yarn user is bldadmin
Container exited with a non-zero exit code 255
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: Map: 10 Cumulative CPU: 163577.82 sec MAPRFS Read: 0 MAPRFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 1 days 21 hours 26 minutes 17 seconds 820 msec

Current Hive properties.

Using the Tez engine:

set hive.execution.engine=tez;
set hive.tez.container.size=3072;
set hive.tez.java.opts=-Xmx1640m;
set hive.vectorized.execution.enabled=false;
set hive.vectorized.execution.reduce.enabled=false;
set hive.enforce.bucketing=true;
set hive.exec.parallel=true;
set hive.auto.convert.join=false;
set hive.enforce.bucketmapjoin=true;
set hive.optimize.bucketmapjoin.sortedmerge=true;
set hive.optimize.bucketmapjoin=true;
set hive.exec.tmp.maprfsvolume=false;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
set hive.stats.fetch.partition.stats=true;
set hive.support.concurrency=true;
set hive.exec.max.dynamic.partitions=999999999;
set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
set hive.compactor.initiator.on=true;

Based on input from other teams, we changed the engine to MR, and the properties are:

set hive.execution.engine=mr;
set hive.auto.convert.join=false;
set mapreduce.map.memory.mb=16384;
set mapreduce.map.java.opts=-Xmx14745m;
set mapreduce.reduce.memory.mb=16384;
set mapreduce.reduce.java.opts=-Xmx14745m;

With these properties, the query completed without any errors a few times.

How can I debug these issues, and are there any Hive properties we can set so that we don't run into them in the future?

Recommended Answer

Add distribute by partition key. Each reducer will then process only one partition instead of every partition, which reduces memory consumption: each reducer creates fewer files and keeps fewer buffers open.

insert into db1.fact_table PARTITION(part_col1, part_col2) 
select 
col1,
...
col32,
part_col1,
Part_col2
 from db1.main_table WHERE col1=0
distribute by part_col1, Part_col2; --add this
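
You can verify the effect with EXPLAIN: with distribute by in place, the plan should include a reduce stage keyed on the partition columns, rather than mappers writing files for every partition they encounter.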

Use Predicate Push Down; it may help with filtering if the source files are ORC:

SET hive.optimize.ppd=true;
SET hive.optimize.ppd.storage=true;
SET hive.optimize.index.filter=true;
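
With ORC files, hive.optimize.index.filter lets the reader apply the predicate against ORC's built-in stripe and row-group min/max statistics, so row groups that cannot contain matches for WHERE col1=0 are skipped without being read.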

Tune proper mapper and reducer parallelism: https://stackoverflow.com/a/48487306/2700344
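
For reference, a minimal sketch of the kind of knobs that answer covers; the property names are standard MapReduce/Hive ones, but the values below are illustrative assumptions only, not tuned recommendations for this workload:

-- values are illustrative assumptions; adjust to your cluster and data size
set mapreduce.input.fileinputformat.split.maxsize=256000000;  -- lower => more mappers
set mapreduce.input.fileinputformat.split.minsize=16000000;
set hive.exec.reducers.bytes.per.reducer=67108864;            -- lower => more reducers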

If your data is too big and the distribution by partition key is uneven, add distribute by random in addition to the partition keys. This will help with skewed data:

distribute by part_col1, Part_col2, FLOOR(RAND()*100.0)%20;
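
The FLOOR(RAND()*100.0)%20 expression spreads each partition's rows across up to 20 reducers, so one oversized partition no longer overloads a single reducer; the trade-off is up to 20 output files per partition instead of one.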

Also read https://stackoverflow.com/a/55375261/2700344
