Apache Pig: OutOfMemory exception with simple GROUP BY in local mode
Problem description
I'm getting an OutOfMemory exception from Pig when trying to execute a very simple GROUP BY on a tiny (3KB), randomly-generated, example data set.
Pig script:
$ cat example.pig
raw =
LOAD 'example-data'
USING PigStorage()
AS (thing1_id:int,
thing2_id:int,
name:chararray,
timestamp:long);
grouped =
GROUP raw BY thing1_id;
DUMP grouped;
Data:
$ cat example-data
281906 13636091 hide 1334350350
174952 20148444 save 1334427826
1082780 16033108 hide 1334500374
2932953 14682185 save 1334501648
1908385 28928536 hide 1334367665
[snip]
$ wc example-data
100 400 3239 example-data
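To reproduce the problem without the original file, a data set of the same shape can be generated. The script below is a hypothetical sketch: it writes 100 tab-separated rows of (thing1_id, thing2_id, name, timestamp). Tabs are used because PigStorage() splits on tab by default. The value ranges and the hide/save names are guesses based on the five sample rows shown, not the asker's actual generator.

```shell
#!/usr/bin/env bash
# Hypothetical generator for data shaped like example-data:
# 100 tab-separated rows of thing1_id, thing2_id, name, timestamp.
# Value ranges and names are guessed from the sample rows above.
set -euo pipefail

out="${1:-example-data}"   # output file, defaults to example-data
names=(hide save)

for ((i = 0; i < 100; i++)); do
  # RANDOM is 0..32767, so combine two draws for larger ids
  thing1_id=$(( (RANDOM * 32768 + RANDOM) % 3000000 ))
  thing2_id=$(( (RANDOM * 32768 + RANDOM) % 30000000 ))
  name=${names[RANDOM % 2]}
  # Unix timestamps near the ones in the sample (April 2012)
  timestamp=$(( 1334300000 + (RANDOM * 32768 + RANDOM) % 250000 ))
  printf '%d\t%d\t%s\t%d\n' "$thing1_id" "$thing2_id" "$name" "$timestamp"
done > "$out"
```

After running it, `wc example-data` should report 100 lines and 400 words, matching the counts above; the byte count will differ slightly because the values are random.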
Here we go:
$ pig -x local example.pig
[snip]
java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:949)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:674)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:756)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
[snip]
Some additional information:
$ apt-cache show hadoop | grep Version
Version: 1.0.2
$ pig --version
Apache Pig version 0.9.2 (r1232772)
compiled Jan 17 2012, 23:49:20
$ echo $PIG_HEAPSIZE
4096
At this point, I feel like I must be doing something drastically wrong because I can't see any reason why 3 kB of text would ever cause the heap to fill up.
Recommended answer
Check this: http://sumedha.blogspot.in/2012/01/solving-apache-pig-javalangoutofmemorye.html
Neil, you are right. Let me explain it like this: in the bin/pig script file, the relevant source code is:
JAVA_HEAP_MAX=-Xmx1000m
# check envvars which might override default args
if [ "$PIG_HEAPSIZE" != "" ]; then
  JAVA_HEAP_MAX="-Xmx""$PIG_HEAPSIZE""m"
fi
It sets the maximum Java heap size with the -Xmx switch, and that is all it does. I don't know why this override isn't working, which is why I asked you to specify the Java heap size directly, using the parameters described in the link. I haven't had time to look into why this problem occurs; if anyone has an idea, please post it here.
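The override logic from bin/pig can be exercised on its own. The sketch below copies it into a hypothetical function, pig_heap_flag (not part of Pig), that prints the -Xmx flag bin/pig would compute for a given PIG_HEAPSIZE. It only shows how the value is derived (the number is treated as megabytes); it does not explain why the setting had no effect in the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the default/override logic quoted from bin/pig,
# so the resulting -Xmx flag can be inspected without launching Pig or a JVM.
pig_heap_flag() {
  local java_heap_max=-Xmx1000m              # bin/pig's default
  # check envvars which might override default args
  if [ "${PIG_HEAPSIZE:-}" != "" ]; then
    java_heap_max="-Xmx""$PIG_HEAPSIZE""m"   # "m" suffix: value is in megabytes
  fi
  printf '%s\n' "$java_heap_max"
}

PIG_HEAPSIZE= pig_heap_flag       # prints -Xmx1000m
PIG_HEAPSIZE=4096 pig_heap_flag   # prints -Xmx4096m
```

With the question's PIG_HEAPSIZE=4096 the computed flag is -Xmx4096m. If the error persists, one way to narrow it down is to look at the command line of the Java process Pig actually starts (for example with ps) and see which -Xmx value it received.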