Checkpointing In ALS Spark Scala


Question


I just want to ask about the specifics of how to successfully use checkpointInterval in Spark, and what is meant by this comment in the code for ALS: https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala

If the checkpoint directory is not set in [[org.apache.spark.SparkContext]], this setting is ignored.

  1. How can we set the checkpoint directory? Can we use any HDFS-compatible directory for this?
  2. Is using setCheckpointInterval the correct way to implement checkpointing in ALS to avoid stack overflow errors?


Solution

How can we set the checkpoint directory? Can we use any HDFS-compatible directory for this?

You can use SparkContext.setCheckpointDir. As far as I remember, in local mode both local and DFS paths work just fine, but on a cluster the directory must be an HDFS path.
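As a minimal sketch (the app name and paths are illustrative, not prescriptive), the directory is set once on the SparkContext before any checkpointing happens:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative setup; app name and master are placeholders.
val conf = new SparkConf().setAppName("als-checkpoint-demo").setMaster("local[*]")
val sc = new SparkContext(conf)

// A local path works in local mode; on a cluster use an HDFS path instead,
// e.g. sc.setCheckpointDir("hdfs:///tmp/spark-checkpoints")
sc.setCheckpointDir("/tmp/spark-checkpoints")
```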

Is using setCheckpointInterval the correct way to implement checkpointing in ALS to avoid stack overflow errors?

It should help. See SPARK-1006.

PS: It seems that in order to actually perform checkpointing in ALS, the checkpointDir must be set, or checkpointing won't be effective [Ref. here.]
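Putting the two pieces together, a minimal MLlib sketch might look like the following. The ratings data and parameter values are made up purely for illustration; the key point is that setCheckpointInterval only takes effect because the checkpoint directory was set first.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{ALS, Rating}

val sc = new SparkContext(
  new SparkConf().setAppName("als-checkpointing").setMaster("local[*]"))

// Without this, the checkpoint interval below is silently ignored.
sc.setCheckpointDir("/tmp/spark-checkpoints")

// Tiny made-up ratings set, just for illustration.
val ratings = sc.parallelize(Seq(
  Rating(1, 1, 5.0), Rating(1, 2, 1.0),
  Rating(2, 1, 1.0), Rating(2, 2, 5.0)))

val model = new ALS()
  .setRank(10)
  .setIterations(50)          // long lineages are what cause the stack overflow
  .setCheckpointInterval(10)  // truncate the lineage every 10 iterations
  .run(ratings)
```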

