Spark gives a StackOverflowError when training using ALS


Problem description


When attempting to train a machine learning model using ALS in Spark's MLlib, I kept receiving a StackOverflowError. Here's a small sample of the stack trace:

Traceback (most recent call last):
  File "/Users/user/Spark/imf.py", line 31, in <module>
    model = ALS.train(rdd, rank, numIterations)
  File "/usr/local/Cellar/apache-spark/1.3.1_1/libexec/python/pyspark/mllib/recommendation.py", line 140, in train
    lambda_, blocks, nonnegative, seed)
  File "/usr/local/Cellar/apache-spark/1.3.1_1/libexec/python/pyspark/mllib/common.py", line 120, in callMLlibFunc
    return callJavaFunc(sc, api, *args)
  File "/usr/local/Cellar/apache-spark/1.3.1_1/libexec/python/pyspark/mllib/common.py", line 113, in callJavaFunc
    return _java2py(sc, func(*args))
  File "/usr/local/Cellar/apache-spark/1.3.1_1/libexec/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/usr/local/Cellar/apache-spark/1.3.1_1/libexec/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o35.trainALSModel.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 40.0 failed 1 times, most recent failure: Lost task 0.0 in stage 40.0 (TID 35, localhost): java.lang.StackOverflowError
        at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2296)
        at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2589)


This error would also appear when attempting to run .mean() to calculate the mean squared error. It appeared in both version 1.3.1_1 and version 1.4.1 of Spark. I was using PySpark, and increasing the available memory did not help.

Recommended answer


The solution was to add checkpointing, which prevents the recursion in the codebase from overflowing the stack. First, create a new directory to store the checkpoints. Then, have your SparkContext use that directory for checkpointing. Here is the example in Python:

sc.setCheckpointDir('checkpoint/')


You may need to add checkpointing to the ALS itself as well, but I haven't been able to determine whether that makes a difference. To set the checkpoint interval there (probably not necessary), just do:

ALS.checkpointInterval = 2
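Putting both pieces together, a minimal sketch of the fix might look like the following. This assumes the RDD-based MLlib API from Spark 1.x used in the question; the checkpoint path, the toy ratings, and the rank/iteration values are placeholders, not the asker's actual data:

```python
from pyspark import SparkContext
from pyspark.mllib.recommendation import ALS, Rating

sc = SparkContext(appName="ALSCheckpointExample")

# Checkpointing periodically truncates the RDD lineage, so the deep
# recursion built up by iterative training no longer overflows the stack.
sc.setCheckpointDir('checkpoint/')
ALS.checkpointInterval = 2  # checkpoint every 2 iterations (possibly optional)

# Hypothetical ratings: (userID, productID, rating) triples.
ratings = sc.parallelize([
    Rating(1, 1, 5.0),
    Rating(1, 2, 1.0),
    Rating(2, 1, 1.0),
    Rating(2, 2, 5.0),
])

rank = 10            # number of latent factors (placeholder)
numIterations = 20   # high enough to trigger the error without checkpointing
model = ALS.train(ratings, rank, numIterations)
```

Note that setCheckpointDir must be called before training starts; on a cluster the directory should live on shared storage such as HDFS rather than a local path.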

