Spark 的 StreamingLinearRegressionWithSGD 是如何工作的? [英] How does Spark's StreamingLinearRegressionWithSGD work?
问题描述
我正在研究 StreamingLinearRegressionWithSGD
有两种方法 trainOn
和 predictOn
.这个类有一个 model 对象,随着训练数据到达 trainOn
参数中指定的流而更新.
I am working on StreamingLinearRegressionWithSGD
which has two methods trainOn
and predictOn
. This class has a model object that is updated as training data arrives in the stream specified in trainOn
argument.
同时使用相同的模型进行预测.
Simultaneously It give prediction using same model.
我想知道模型权重如何在工作人员/执行程序之间更新和同步.
I want to know that how the model weights are updated and synchronized across workers/executors.
任何链接或参考都会有所帮助.谢谢.
Any link or reference will be helpful. Thanks.
推荐答案
这里没有魔法.StreamingLinearAlgorithm
保持对当前GeneralizedLinearModel
的可变引用.
There is no magic here. StreamingLinearAlgorithm
keeps a mutable reference to the current GeneralizedLinearModel
.
trainOn
使用DStream.foreachRDD
在每个batch上训练一个新模型,然后更新model
.同样 predictOn
使用 DStream.map
来预测与 model
的当前版本.
trainOn
uses DStream.foreachRDD
to train a new model on each batch, and then updates the model
. Similarly predictOn
uses DStream.map
to predict with the current version of the model
.
由于 Spark 会为每个阶段序列化闭包,因此不需要任何额外的同步.Spark 每次计算闭包时都会使用 model
的当前值.
Since Spark will serialize closures for each stage there is no need for any additional synchronization. Spark will use the current value of the model
each time it computes the closure.
实际上它相当于在驱动程序上运行一个循环,其中交错 run
和 predict
.
Effectively it equivalent to running a loop on the driver with interleaving run
and predict
.
这篇关于Spark 的 StreamingLinearRegressionWithSGD 是如何工作的?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!