Optimize Hive Query. java.lang.OutOfMemoryError: Java heap space/GC overhead limit exceeded


Problem description

How can I optimize a query of this form since I keep running into this OOM error? Or come up with a better execution plan? If I removed the substring clause, the query would work fine, suggesting that this takes a lot of memory.

When the job fails, the beeline output shows the OOM Java heap space error. Advice online suggested increasing export HADOOP_HEAPSIZE, but this still results in the error. Another thing I tried was increasing hive.tez.container.size and hive.tez.java.opts (the Tez heap), but the error remains. The YARN logs show GC overhead limit exceeded, suggesting a combination of not enough memory and/or an extremely inefficient query plan, since the JVM cannot reclaim enough memory.
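For reference, the two Tez settings mentioned above can be adjusted per session in beeline; the values below are only illustrative, not tuned recommendations for this cluster:

```sql
-- Illustrative session-level settings (example values, not recommendations).
-- hive.tez.java.opts (the JVM -Xmx) should stay at roughly 80% of the container size.
SET hive.tez.container.size=10240;    -- Tez container memory, in MB
SET hive.tez.java.opts=-Xmx8192m;     -- JVM heap inside each container
```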

I am using Azure HDInsight Interactive Query 4.0, with 20 worker nodes (D13v2: 8 cores and 56 GB RAM each).

Source table

create external table database.sourcetable(
  a,
  b,
  c,
  ...
  (183 total columns)
  ...
)
PARTITIONED BY ( 
  W string, 
  X int, 
  Y string, 
  Z int
)

Target table

create external table database.NEWTABLE(
  a,
  b,
  c,
  ...
  (187 total columns)
  ...
  W,
  X,
  Y,
  Z
)
PARTITIONED BY (
  aAAA,
  bBBB,
  cCCC
)

Query

insert overwrite table database.NEWTABLE partition(aAAA, bBBB, cCCC)
select
a,
b,
c,
...
(187 total columns)
...
W,
X,
Y,
Z,
cast(a as string) as aAAA, 
from_unixtime(unix_timestamp(b,'yyMMdd'),'yyyyMMdd') as bBBB,
substring(upper(c),1,2) as cCCC
from database.sourcetable
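For context, the two computed partition keys behave as follows on illustrative input values (not taken from the actual data):

```sql
-- Hive's substring is 1-based; upper() is applied before taking the prefix.
SELECT from_unixtime(unix_timestamp('200115','yyMMdd'),'yyyyMMdd');  -- '20200115'
SELECT substring(upper('abcdef'),1,2);                               -- 'AB'
```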

Recommended answer

If everything else is okay, try adding distribute by the partition keys at the end of your query:

  from database.sourcetable 
  distribute by aAAA, bBBB, cCCC

As a result, each reducer will create only one partition file, consuming less memory.
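Applied to the query in the question, the statement would then end like this (column list elided as above):

```sql
insert overwrite table database.NEWTABLE partition(aAAA, bBBB, cCCC)
select
  ...
  cast(a as string) as aAAA,
  from_unixtime(unix_timestamp(b,'yyMMdd'),'yyyyMMdd') as bBBB,
  substring(upper(c),1,2) as cCCC
from database.sourcetable
distribute by aAAA, bBBB, cCCC
```

The distribute by clause routes all rows of a given target partition to the same reducer, so each reducer holds open the writer buffers for only one output partition at a time instead of one per partition it happens to receive.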
