Apache Pig: OutOfMemory exception with simple GROUP BY in local mode


Question

I'm getting an OutOfMemory exception from Pig when trying to execute a very simple GROUP BY on a tiny (3KB), randomly-generated, example data set.

The Pig script:

$ cat example.pig
raw =
LOAD 'example-data'
    USING PigStorage()
    AS (thing1_id:int,
        thing2_id:int,
        name:chararray,
        timestamp:long);

grouped =
GROUP raw BY thing1_id;

DUMP grouped;

The data:

$ cat example-data
281906  13636091    hide    1334350350
174952  20148444    save    1334427826
1082780 16033108    hide    1334500374
2932953 14682185    save    1334501648
1908385 28928536    hide    1334367665
[snip]

$ wc example-data
 100  400 3239 example-data
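The asker's generator isn't shown; a hypothetical one-liner producing rows of the same shape (two integer ids, an action name, a unix timestamp, tab-separated) could look like this:

```shell
# Hypothetical data generator (not the asker's actual one): emit n rows
# shaped like example-data using awk's rand().
awk -v n=5 'BEGIN {
    srand(42);
    split("hide save", names, " ");
    for (i = 0; i < n; i++)
        printf "%d\t%d\t%s\t%d\n",
               int(rand() * 3000000) + 1,        # thing1_id
               int(rand() * 30000000) + 1,       # thing2_id
               names[int(rand() * 2) + 1],       # name
               1334300000 + int(rand() * 300000) # timestamp
}'
```

Redirecting such output to a file of 100 rows reproduces a data set of roughly the size quoted by wc above.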

Here we go:

$ pig -x local example.pig

[snip]

java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:949)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:674)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)

[snip]

And some extra info:

$ apt-cache show hadoop | grep Version
Version: 1.0.2

$ pig --version
Apache Pig version 0.9.2 (r1232772) 
compiled Jan 17 2012, 23:49:20

$ echo $PIG_HEAPSIZE
4096

At this point, I feel like I must be doing something drastically wrong because I can't see any reason why 3 kB of text would ever cause the heap to fill up.

Answer

Check this: http://sumedha.blogspot.in/2012/01/solving-apache-pig-javalangoutofmemorye.html

Neil, you are right. Let me explain: in the bin/pig script file, the source is:

JAVA_HEAP_MAX=-Xmx1000m

# check envvars which might override default args
if [ "$PIG_HEAPSIZE" != "" ]; then
    JAVA_HEAP_MAX="-Xmx""$PIG_HEAPSIZE""m"
fi
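Pulled out on its own, that override logic does behave as expected (a standalone sketch, not the full bin/pig script):

```shell
#!/bin/sh
# Standalone sketch of the bin/pig heap logic: the default -Xmx1000m is
# replaced only when PIG_HEAPSIZE is set in the environment.
PIG_HEAPSIZE=4096

JAVA_HEAP_MAX=-Xmx1000m
if [ "$PIG_HEAPSIZE" != "" ]; then
    JAVA_HEAP_MAX="-Xmx""$PIG_HEAPSIZE""m"
fi
echo "$JAVA_HEAP_MAX"   # prints -Xmx4096m
```

So the flag itself is built correctly in isolation; one possible reading is that the problem lies in how (or whether) that flag reaches the JVM that actually runs the local-mode map task, though the answer below does not pin this down.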

It sets the Java heap size to the maximum using the -Xmx switch only, but I don't know why this script override is not working. That is why I asked you to specify the Java heap size directly, using the parameters given in the link. I have not had time to check why this problem arises; if anyone has an idea, please post it here.
