Checkpointing in ALS Spark Scala


Problem description



I just want to ask specifically how to successfully use checkpointInterval in Spark. And what do you mean by this comment in the ALS code: https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala

If the checkpoint directory is not set in [[org.apache.spark.SparkContext]], this setting is ignored.

  1. How can we set checkPoint directory? Can we use any hdfs-compatible directory for this?
  2. Is using setCheckpointInterval the correct way to implement checkpointing in ALS to avoid Stack Overflow errors?


Solution

How can we set checkPoint directory? Can we use any hdfs-compatible directory for this?

You can use SparkContext.setCheckpointDir. As far as I remember, in local mode both local and DFS paths work just fine, but on a cluster the directory must be an HDFS path.
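As a minimal sketch, setting the checkpoint directory before training could look like this (the application name and path are illustrative assumptions, not required values):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative setup; adjust master/app name for your environment.
val conf = new SparkConf().setAppName("ALSCheckpointExample").setMaster("local[*]")
val sc = new SparkContext(conf)

// In local mode a local filesystem path works; on a cluster use an HDFS path.
sc.setCheckpointDir("hdfs:///tmp/spark-checkpoints")
```

Calling setCheckpointDir must happen before the job that triggers checkpointing, otherwise the interval setting is silently ignored, as the comment in the ALS source states.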

Is using setCheckpointInterval the correct way to implement checkpointing in ALS to avoid Stack Overflow errors?

It should help. See SPARK-1006.
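A sketch of wiring the interval into the mllib ALS builder (the API the question links to); the rank, iteration count, interval, and sample ratings here are assumptions chosen for illustration:

```scala
import org.apache.spark.mllib.recommendation.{ALS, Rating}

// Tiny illustrative dataset; `sc` is an existing SparkContext with
// setCheckpointDir already called, or the interval below is ignored.
val ratings = sc.parallelize(Seq(
  Rating(1, 1, 5.0),
  Rating(1, 2, 1.0),
  Rating(2, 1, 4.0)
))

val model = new ALS()
  .setRank(10)
  .setIterations(50)          // many iterations -> long RDD lineage without checkpointing
  .setCheckpointInterval(10)  // truncate the lineage every 10 iterations
  .run(ratings)
```

Checkpointing every few iterations truncates the RDD lineage that ALS builds up across iterations, which is what prevents the stack overflow described in SPARK-1006.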

PS: It seems that in order to actually perform checkpointing in ALS, the checkpointDir must be set, or checkpointing won't be effective [Ref. here].

