Hive Map join: out of memory Exception


Problem description


I am trying to perform a map-side join between one big table (10 GB) and one small table (230 MB). After joining on the key columns, I will use all the columns of the small table to produce the output records.

I have used the settings below:

set hive.auto.convert.join=true;

set hive.mapjoin.smalltable.filesize=262144000;

Logs:

2013-09-20 02:43:50     Starting to launch local task to process map join;      maximum       memory = 1065484288
2013-09-20 02:44:05     Processing rows:        200000  Hashtable size: 199999  Memory usage:   430269904       rate:0.404
2013-09-20 02:44:14     Processing rows:        300000  Hashtable size: 299999  Memory usage:   643070664       rate:0.604
Exception in thread "Thread-0" java.lang.OutOfMemoryError: Java heap space
        at java.util.jar.Manifest$FastInputStream.<init>(Manifest.java:313)
        at java.util.jar.Manifest$FastInputStream.<init>(Manifest.java:308)
        at java.util.jar.Manifest.read(Manifest.java:176)
        at java.util.jar.Manifest.<init>(Manifest.java:50)
        at java.util.jar.JarFile.getManifestFromReference(JarFile.java:168)
        at java.util.jar.JarFile.getManifest(JarFile.java:149)
        at sun.misc.URLClassPath$JarLoader$2.getManifest(URLClassPath.java:696)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:228)
        at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
        at org.apache.hadoop.util.RunJar$1.run(RunJar.java:126)
Execution failed with exit status: 3
Obtaining error information
Task failed!
Task ID:
  Stage-7
Logs:
FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.MapredLocalTask
ATTEMPT: Execute BackupTask: org.apache.hadoop.hive.ql.exec.MapRedTask

But I am still facing an OOM exception. The heap size set in my cluster is 1 GB. Please advise which properties I need to consider and tune to make this map-side join work.

Solution

Processing rows: 300000 Hashtable size: 299999 Memory usage: 643070664 rate:0.604

At 300k rows the hash table (HT) already uses 60% of your heap. First question to ask: are you sure you got the table order right? Is the small table in the join really the smaller table in your data? When writing the query, the large table should be the last one in the JOIN clause. Which Hive version are you on, 0.9 or 0.11?

If you are on Hive 0.11 and you are specifying the join correctly, then the first thing to try would be to increase the heap size. From the above data (300k rows ~> 650 MB heap) you can figure out how much heap you need.
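
As a rough illustration of that estimate, the two "Processing rows" samples from the log can be extrapolated linearly. This is only a sketch: it assumes hash table memory grows linearly with the row count, the `max_usage` headroom factor is an assumption, and `hypothetical_rows` is a placeholder because the question does not state how many rows the small table has.

```python
# Figures copied from the local-task log in the question.
MAX_HEAP_BYTES = 1_065_484_288
SAMPLES = [(200_000, 430_269_904), (300_000, 643_070_664)]  # (rows, bytes used)


def bytes_per_row(samples):
    """Marginal heap cost per row between the first and last log sample."""
    (r0, m0), (r1, m1) = samples[0], samples[-1]
    return (m1 - m0) / (r1 - r0)


def estimate_heap_bytes(total_rows, samples, max_usage=0.9):
    """Linearly extrapolate the heap needed to hold total_rows rows.

    max_usage is an assumed headroom factor: the hash table should not
    fill more than this fraction of the heap.
    """
    r1, m1 = samples[-1]
    projected = m1 + bytes_per_row(samples) * (total_rows - r1)
    return projected / max_usage


if __name__ == "__main__":
    print(f"approx. bytes per row: {bytes_per_row(SAMPLES):.0f}")
    hypothetical_rows = 1_000_000  # assumption: replace with the real row count
    needed = estimate_heap_bytes(hypothetical_rows, SAMPLES)
    print(f"estimated heap: {needed / 2**30:.2f} GiB")
    print(f"current max heap: {MAX_HEAP_BYTES / 2**30:.2f} GiB")
```

The real row count of the small table (for example from a `SELECT COUNT(*)`) should replace the placeholder. If the estimate comes out well above the configured heap, the local task will keep failing until the heap is raised or the join order is corrected.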
