How to update a ML model during a spark streaming job without restarting the application?

Question
I've got a Spark Streaming job whose goal is to:

- read a batch of messages
- predict a variable Y from those messages, using a pre-trained ML pipeline
The problem is, I'd like to be able to update the model used by the executors without restarting the application.
Simply put, here's what it looks like:
model = # model initialization

def preprocess(keyValueList):
    # do some preprocessing

def predict(preprocessedRDD):
    if not preprocessedRDD.isEmpty():
        df = # create df from rdd
        df = model.transform(df)
        # more things to do

stream = KafkaUtils.createDirectStream(ssc, [kafkaTopic], kafkaParams)
stream.mapPartitions(preprocess).foreachRDD(predict)
In this case, the model is simply used, never updated.
I've thought about several possibilities, but I have now crossed them all out:

- broadcasting the model each time it changes (broadcast variables are read-only, so they cannot be updated)
- reading the model from HDFS on the executors (this requires a SparkContext, so it's not possible)
Any idea?

Thanks a lot!
Answer

I've solved this issue before in two different ways:
- a TTL on the model
- re-reading the model every batch
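A minimal sketch of the TTL approach, kept independent of Spark so the caching logic is clear. The `loader` callable and the `ttl_seconds` value are hypothetical stand-ins; in the real job the loader would deserialize the pre-trained pipeline from HDFS or a model store:

```python
import time

class TTLModel:
    """Caches a model and transparently reloads it once the TTL expires."""

    def __init__(self, loader, ttl_seconds):
        self.loader = loader          # callable that returns a fresh model
        self.ttl = ttl_seconds
        self.model = None
        self.loaded_at = float("-inf")  # force a load on first access

    def get(self, now=None):
        now = time.time() if now is None else now
        if now - self.loaded_at >= self.ttl:
            self.model = self.loader()
            self.loaded_at = now
        return self.model
```

Inside `predict`, `model.transform(df)` would become `ttl_model.get().transform(df)`, so a retrained model picked up by the loader replaces the old one at most `ttl_seconds` after it is published.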
Both of these solutions assume an additional job that regularly trains on the data you've accumulated (e.g. once a day).