Mahout runs out of heap space

Question

I am running NaiveBayes on a set of tweets using Mahout. There are two files, one 100 MB and one 300 MB. I changed JAVA_HEAP_MAX to JAVA_HEAP_MAX=-Xmx2000m (it was previously -Xmx1000m). Even then, Mahout ran for a few hours (2, to be precise) before it complained of a heap space error. What should I do to resolve this?

Some more info, if it helps: I am running on a single node, my laptop in fact, and it has only 3 GB of RAM.

Thanks.

I ran it a third time with less than half of the data I used the first time (the first time I used 5.5 million tweets, the second time 2 million) and I still got a heap space problem. I am posting the complete error for completeness:

17 May, 2011 2:16:22 PM org.apache.hadoop.mapred.JobClient monitorAndPrintJob
INFO:  map 50% reduce 0%

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.lang.AbstractStringBuilder.<init>(AbstractStringBuilder.java:62)
    at java.lang.StringBuilder.<init>(StringBuilder.java:85)
    at org.apache.hadoop.mapred.JobClient.monitorAndPrintJob(JobClient.java:1283)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1251)
    at org.apache.mahout.classifier.bayes.mapreduce.common.BayesFeatureDriver.runJob(BayesFeatureDriver.java:63)
    at org.apache.mahout.classifier.bayes.mapreduce.bayes.BayesDriver.runJob(BayesDriver.java:44)
    at org.apache.mahout.classifier.bayes.TrainClassifier.trainNaiveBayes(TrainClassifier.java:54)
    at org.apache.mahout.classifier.bayes.TrainClassifier.main(TrainClassifier.java:162)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:184)
17 May, 2011 7:14:53 PM org.apache.hadoop.mapred.LocalJobRunner$Job run
WARNING: job_local_0001
java.lang.OutOfMemoryError: Java heap space
    at java.lang.String.substring(String.java:1951)
    at java.lang.String.subSequence(String.java:1984)
    at java.util.regex.Pattern.split(Pattern.java:1019)
    at java.util.regex.Pattern.split(Pattern.java:1076)
    at org.apache.mahout.classifier.bayes.mapreduce.common.BayesFeatureMapper.map(BayesFeatureMapper.java:78)
    at org.apache.mahout.classifier.bayes.mapreduce.common.BayesFeatureMapper.map(BayesFeatureMapper.java:46)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)

Here is the part of the bin/mahout script that I changed. Original:

JAVA=$JAVA_HOME/bin/java
JAVA_HEAP_MAX=-Xmx1000m

if [ "$MAHOUT_HEAPSIZE" != "" ]; then
  #echo "run with heapsize $MAHOUT_HEAPSIZE"
  JAVA_HEAP_MAX="-Xmx""$MAHOUT_HEAPSIZE""m"
  #echo $JAVA_HEAP_MAX
fi

Modified:

JAVA=$JAVA_HOME/bin/java
JAVA_HEAP_MAX=-Xmx2000m

if [ "$MAHOUT_HEAPSIZE" != "" ]; then
  #echo "run with heapsize $MAHOUT_HEAPSIZE"
  JAVA_HEAP_MAX="-Xmx""$MAHOUT_HEAPSIZE""m"
  #echo $JAVA_HEAP_MAX
fi

Recommended answer

You're not specifying which process ran out of memory, which is important. You need to set MAHOUT_HEAPSIZE rather than editing JAVA_HEAP_MAX: as the quoted part of bin/mahout shows, a non-empty MAHOUT_HEAPSIZE replaces JAVA_HEAP_MAX with -Xmx<MAHOUT_HEAPSIZE>m.
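To make the override concrete, here is a minimal sketch that replays the heap-selection logic quoted from bin/mahout in the question. The value 2000 is only an illustration taken from the question, not a recommendation, and the snippet does not invoke Mahout itself.

```shell
# Sketch of the heap-selection logic quoted from bin/mahout in the question.
# MAHOUT_HEAPSIZE is read by the script; 2000 is an illustrative value (in MB).
MAHOUT_HEAPSIZE=2000

# Script default, used only when MAHOUT_HEAPSIZE is empty.
JAVA_HEAP_MAX=-Xmx1000m

# A non-empty MAHOUT_HEAPSIZE replaces the default.
if [ "$MAHOUT_HEAPSIZE" != "" ]; then
  JAVA_HEAP_MAX="-Xmx${MAHOUT_HEAPSIZE}m"
fi

echo "$JAVA_HEAP_MAX"
```

In practice this would presumably mean running `export MAHOUT_HEAPSIZE=2000` in the shell before calling bin/mahout with your usual arguments, so the script itself stays unmodified.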
