Checkpointing In ALS Spark Scala
Problem Description
I just want to ask about the specifics of how to successfully use checkpointInterval in Spark, and what is meant by this comment in the ALS code: https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala
If the checkpoint directory is not set in [[org.apache.spark.SparkContext]], this setting is ignored.
- How can we set the checkpoint directory? Can we use any HDFS-compatible directory for this?
- Is using setCheckpointInterval the correct way to implement checkpointing in ALS to avoid StackOverflowError?
Answer:
How can we set the checkpoint directory? Can we use any HDFS-compatible directory for this?
You can use SparkContext.setCheckpointDir. As far as I remember, in local mode both local and DFS paths work just fine, but on a cluster the directory must be an HDFS path.
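For illustration, a minimal sketch of setting the directory (the application name and paths are assumptions; substitute your own):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("als-checkpointing"))

// Local mode: a plain local filesystem path is usually accepted.
// sc.setCheckpointDir("/tmp/spark-checkpoints")

// Cluster mode: the directory must be on HDFS (or another DFS)
// so that it is visible to all executors.
sc.setCheckpointDir("hdfs:///tmp/spark-checkpoints")
```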
Is using setCheckpointInterval the correct way to implement checkpointing in ALS to avoid StackOverflowError?
It should help: checkpointing truncates the RDD lineage, which otherwise grows with each ALS iteration and can eventually overflow the stack when the DAG is traversed or serialized. See SPARK-1006.
PS: It seems that in order to actually perform checkpointing in ALS, the checkpointDir must be set, or checkpointing won't be effective [Ref. here].
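Putting both answers together, here is a minimal end-to-end sketch (the toy ratings, rank, iteration counts, and checkpoint path are illustrative assumptions, not tuned values):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{ALS, Rating}

object ALSCheckpointExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ALSCheckpointExample"))

    // Without this call, setCheckpointInterval below is silently ignored.
    sc.setCheckpointDir("hdfs:///tmp/als-checkpoints")

    // Toy ratings; in practice these come from your data source.
    val ratings = sc.parallelize(Seq(
      Rating(1, 1, 5.0), Rating(1, 2, 1.0),
      Rating(2, 1, 4.0), Rating(2, 3, 2.0)
    ))

    val model = new ALS()
      .setRank(10)
      .setIterations(50)         // long lineages are where stack overflows appear
      .setCheckpointInterval(10) // checkpoint factors every 10 iterations
      .run(ratings)

    println(s"Predicted rating for user 1, product 3: ${model.predict(1, 3)}")

    sc.stop()
  }
}
```

The interval trades checkpointing I/O against lineage length; the DataFrame-based API (org.apache.spark.ml.recommendation.ALS) exposes the same setCheckpointInterval parameter.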