HIVE very long field gives OOM Heap

Problem description

We store string fields in a HIVE table whose length varies from small (a few kB) to very long (under 400 MB). We now hit OOM errors when copying data from one table to another (without any conditions or joins). This is not exactly what we run in production, but it is the simplest use case in which the problem occurs. The HQL is basically just:

INSERT INTO new_table
SELECT * FROM old_table;

The container and Java heap were set to 16 GB. We tried different file formats (RCFile, ORC), with and without compression, and different engines (MR, TEZ), but nothing helped and we always ran into OOM.

We are not sure what exactly is happening there. We expected the Java process to need only a few times the memory of the longest single record, which is ~400 MB, not the whole 16 GB heap.

Can you suggest something we should try or focus on?

Version used: HDP 2.4.2

Sample log when using TEZ+ORC+8G of RAM: https://pastebin.com/uza84t6F

Recommended answer

  1. Try to use TEXTFILE instead of ORC. Writing an ORC file requires much more memory.

  2. Try to increase parallelism by adding more mappers. Experiment with these Tez parameters to increase the number of mappers:

--min and max split size:

set tez.grouping.min-size=16777216;
set tez.grouping.max-size=1073741824;

See here: https://community.hortonworks.com/articles/14309/demystify-tez-tuning-step-by-step.html
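
Putting both suggestions together, a copy job might look roughly like the sketch below. It is only an illustration: the table name new_table_text is hypothetical, and the lowered max-size of 128 MiB is an assumed starting point for experimentation, not a recommended value. old_table is the table from the question.

```sql
-- Sketch only: new_table_text and the 128 MiB value are illustrative assumptions.
SET hive.execution.engine=tez;

-- Smaller grouped splits generally lead to more (and smaller) map tasks.
SET tez.grouping.min-size=16777216;   -- 16 MiB, as in the answer
SET tez.grouping.max-size=134217728;  -- 128 MiB, lowered from the 1 GiB shown in the answer

-- Write the copy as TEXTFILE instead of ORC (create-table-as-select).
CREATE TABLE new_table_text
STORED AS TEXTFILE
AS SELECT * FROM old_table;
```

Two caveats apply. More mappers reduce how many rows each task handles, but a single ~400 MB value still has to fit in one task's heap. Also, the default TEXTFILE row format uses newline as the row delimiter, so string values containing embedded newlines would need a different delimiter or format.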
