OOM in tez/hive


Problem description



    [After a few answers and comments I asked a new question based on the knowledge gained here: Out of memory in Hive/tez with LATERAL VIEW json_tuple ]

    One of my queries consistently fails with the error:

    ERROR : Status: Failed
    ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1516602562532_3606_2_03, diagnostics=[Task failed, taskId=task_1516602562532_3606_2_03_000001, diagnostics=[TaskAttempt 0 failed, info=[Container container_e113_1516602562532_3606_01_000008 finished with diagnostics set to [Container failed, exitCode=255. Exception from container-launch.
    Container id: container_e113_1516602562532_3606_01_000008
    Exit code: 255
    Stack trace: ExitCodeException exitCode=255: 
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:933)
        at org.apache.hadoop.util.Shell.run(Shell.java:844)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1123)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:237)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
    
    Container exited with a non-zero exit code 255
    ]], TaskAttempt 1 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
    

    The keyword here seems to be java.lang.OutOfMemoryError: Java heap space.

    I looked around but none of what I thought I understood from Tez helps me:

    • yarn-site/yarn.nodemanager.resource.memory-mb is maxed out => I use all the memory I can
    • yarn-site/yarn.scheduler.maximum-allocation-mb: same as yarn.nodemanager.resource.memory-mb
    • yarn-site/yarn.scheduler.minimum-allocation-mb = 1024
    • hive-site/hive.tez.container.size = 4096 (a multiple of yarn.scheduler.minimum-allocation-mb; see the session-level sketch below)
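
    For reference, these container and heap knobs can be checked and overridden per session from beeline. A minimal sketch, assuming Hive on Tez as described here; the 4096 / 3276 values simply mirror the sizing above and are illustrative, not a recommendation:

    -- Minimal sketch: session-level container/heap overrides in beeline (Hive on Tez).
    -- hive.tez.container.size is the YARN container requested for each Tez task;
    -- hive.tez.java.opts bounds the Java heap inside that container.
    SET hive.tez.container.size=4096;    -- MB, a multiple of yarn.scheduler.minimum-allocation-mb
    SET hive.tez.java.opts=-Xmx3276m;    -- roughly 0.8 * hive.tez.container.size (illustrative)
    SET hive.tez.container.size;         -- with no value, prints the effective setting to verify it took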

    My query has 4 mappers; 3 go very fast, the 4th dies every time. Here is the Tez graphical view of the query:

    From this image:

    • table contact has 150M rows, 283GB of ORC compressed data (there is one large json field, LATERAL VIEW'ed)
    • table m has 1M rows, 20MB of ORC compressed data
    • table c has 2k rows, < 1MB ORC compressed
    • table e has 800k rows, 7GB of ORC compressed
    • e is LEFT JOIN'ed with all the other tables

    e and contact are partitioned, and only one partition is selected in the WHERE clause.

    I thus tried to increase the number of maps:

    • tez.grouping.max-size: 650MB by default; even lowering it to tez.grouping.min-size (16MB) makes no difference
    • tez.grouping.split-count: even increased to 1000, it makes no difference
    • tez.grouping.split-wave: 1.7 by default; even increased to 5 it makes no difference (see the session-level sketch below)
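
    A minimal sketch of trying those grouping knobs per session in beeline; the byte values just restate the sizes mentioned above, and tez.grouping.split-count, when set, acts as a hint for the desired number of split groups:

    -- Minimal sketch: per-session Tez split-grouping overrides (values restate the attempts above).
    SET tez.grouping.min-size=16777216;    -- 16 MB lower bound for a grouped split
    SET tez.grouping.max-size=16777216;    -- lowered from the 650 MB default to the minimum, to force more mappers
    -- or hint an explicit number of split groups instead of relying on sizes:
    SET tez.grouping.split-count=1000;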

    If it's relevant, here are some other memory settings:

    • mapred-site/mapreduce.map.memory.mb = 1024 (Min container size)
    • mapred-site/mapreduce.reduce.memory.mb = 2048 (2 * min container size)
    • mapred-site/mapreduce.map.java.opts = 819 (0.8 * min container size)
    • mapred-site/mapreduce.reduce.java.opts = 1638 (0.8 * mapreduce.reduce.memory.mb)
    • mapred-site/yarn.app.mapreduce.am.resource.mb = 2048 (2 * min container size)
    • mapred-site/yarn.app.mapreduce.am.command-opts = 1638 (0.8 * yarn.app.mapreduce.am.resource.mb)
    • mapred-site/mapreduce.task.io.sort.mb = 409 (0.4 * min container size)
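
    With the Tez execution engine these mapreduce.* values largely serve as fallbacks, and the Tez-side knobs are separate settings. A hedged sketch of the counterparts, reusing the same illustrative sizing as above:

    -- Sketch: Tez-side counterparts of the MapReduce settings above (values are illustrative).
    SET tez.am.resource.memory.mb=2048;     -- Tez ApplicationMaster container, counterpart of yarn.app.mapreduce.am.resource.mb
    SET tez.am.launch.cmd-opts=-Xmx1638m;   -- AM heap, counterpart of yarn.app.mapreduce.am.command-opts
    SET tez.runtime.io.sort.mb=409;         -- task sort buffer, counterpart of mapreduce.task.io.sort.mb
    SET tez.task.resource.memory.mb=4096;   -- task container size when hive.tez.container.size is not set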

    My understanding was that tez can split the work into many small loads, thus taking a long time but eventually completing. Am I wrong, or is there a way I have not found?

    Context: HDP 2.6, 8 datanodes with 32GB RAM; the query uses a chunky LATERAL VIEW based on JSON and is run via beeline.

    Solution

    The issue is clearly due to SKEWED data. I would recommend that you add DISTRIBUTE BY COL to your SELECT query from the source so that the reducers get evenly distributed data. In the example below, COL3 is a more evenly distributed column, such as an ID column. Example:

    ORIGINAL QUERY : INSERT OVERWRITE TABLE X SELECT COL1, COL2, COL3 FROM Y
    NEW QUERY      : INSERT OVERWRITE TABLE X SELECT COL1, COL2, COL3 FROM Y DISTRIBUTE BY COL3
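
    Applied to the failing query above, a hedged sketch of the same idea; the table and column names (contact_flat, json_blob, contact_id, part_date) are hypothetical stand-ins for the real schema:

    -- Sketch only: hypothetical table/column names, not the actual schema.
    INSERT OVERWRITE TABLE contact_flat
    SELECT c.contact_id, j.first_name, j.last_name
    FROM contact c
    LATERAL VIEW json_tuple(c.json_blob, 'first_name', 'last_name') j AS first_name, last_name
    WHERE c.part_date = '2018-01-01'    -- only one partition selected, as in the original query
    DISTRIBUTE BY c.contact_id;         -- spread rows evenly across reducers to counter the skew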
    
