Apache Pig: OutOfMemory exception with simple GROUP BY in local mode
Question
I'm getting an OutOfMemory exception from Pig when trying to execute a very simple GROUP BY on a tiny (3 kB), randomly generated example data set.
The Pig script:
$ cat example.pig
raw =
    LOAD 'example-data'
    USING PigStorage()
    AS (thing1_id:int,
        thing2_id:int,
        name:chararray,
        timestamp:long);

grouped =
    GROUP raw BY thing1_id;

DUMP grouped;
The data:
$ cat example-data
281906 13636091 hide 1334350350
174952 20148444 save 1334427826
1082780 16033108 hide 1334500374
2932953 14682185 save 1334501648
1908385 28928536 hide 1334367665
[snip]
$ wc example-data
100 400 3239 example-data
And here we go:
$ pig -x local example.pig
[snip]
java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:949)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:674)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
[snip]
And some extra info:
$ apt-cache show hadoop | grep Version
Version: 1.0.2
$ pig --version
Apache Pig version 0.9.2 (r1232772)
compiled Jan 17 2012, 23:49:20
$ echo $PIG_HEAPSIZE
4096
At this point, I feel like I must be doing something drastically wrong, because I can't see any reason why 3 kB of text would ever cause the heap to fill up.
Recommended answer
Check this: http://sumedha.blogspot.in/2012/01/solving-apache-pig-javalangoutofmemorye.html
Neil, you are right. Let me explain it like this. In the bin/pig script file, the relevant source code is:
JAVA_HEAP_MAX=-Xmx1000m

# check envvars which might override default args
if [ "$PIG_HEAPSIZE" != "" ]; then
    JAVA_HEAP_MAX="-Xmx""$PIG_HEAPSIZE""m"
fi
The script sets the maximum Java heap size using only the -Xmx switch, but I do not know why this override in the script is not taking effect. That is why I asked you to specify the Java heap size directly, using the parameters described in the link. I have not had time to check why this problem occurs. If anyone has an idea, please post it here.
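To see what heap flag the quoted logic would produce for a given PIG_HEAPSIZE, the snippet can be reproduced outside of Pig. This is only a minimal sketch: pig_heap_flag is a hypothetical helper name, not something bin/pig defines, and it mirrors only the lines quoted above, not the rest of the script.

```shell
#!/bin/sh
# Hypothetical helper (not part of Pig): mirrors the heap logic quoted from bin/pig.
# $1 is the PIG_HEAPSIZE value in megabytes; empty or missing means "use the default".
pig_heap_flag() {
    if [ "${1:-}" != "" ]; then
        # Same concatenation as the quoted script: -Xmx + value + m
        echo "-Xmx${1}m"
    else
        # Default taken from the quoted script
        echo "-Xmx1000m"
    fi
}

# Print the flag the quoted logic would yield for the current environment.
pig_heap_flag "${PIG_HEAPSIZE:-}"
```

With PIG_HEAPSIZE set to 4096, as in the question, this prints -Xmx4096m. That only shows what the quoted lines compute. It does not show which options actually reach the JVM that runs the local job.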