Spark Structured Streaming and Spark ML Regression
Question
Is it possible to apply Spark ML regression to streaming sources? I see there is StreamingLogisticRegressionWithSGD, but it's for the older RDD API and I couldn't use it with Structured Streaming sources.
- How should I apply regression to a Structured Streaming source?
- (Slightly off topic) If I can't use the streaming API for regression, how can I commit offsets in batch mode? (Kafka sink)
Answer
As of today (Spark 2.2 / 2.3) there is no support for machine learning in Structured Streaming, and no ongoing work in this direction. Please follow SPARK-16424 to track future progress.
However, you can:
Train iterative, non-distributed models using the foreach sink and some form of external state storage. At a high level, a regression model could be implemented like this:

- Fetch the latest model when ForeachWriter.open is called, and initialize a loss accumulator (not in the Spark sense, just a local variable) for the partition.
- Compute the loss for each record in ForeachWriter.process and update the accumulator.
- Push the losses to the external store when ForeachWriter.close is called.
- This leaves the external storage in charge of computing gradients and updating the model, with the implementation depending on the store.
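The open/process/close cycle described above can be sketched without Spark at all. The following self-contained Python sketch mimics the ForeachWriter contract for a one-feature linear regression against an in-memory store; both `ModelStore` and `RegressionWriter` are invented names for illustration, and in a real deployment the store would be an external system such as Redis or a database, not a local object.

```python
class ModelStore:
    """Stand-in for the external state store: holds the weights and
    applies pushed gradient sums (the update step lives in the store)."""
    def __init__(self, dim, lr=0.1):
        self.weights = [0.0] * dim
        self.lr = lr

    def fetch(self):
        # Return a snapshot of the latest model.
        return list(self.weights)

    def push(self, grad_sum, count):
        # Average the accumulated gradient and take one SGD step.
        if count:
            self.weights = [w - self.lr * g / count
                            for w, g in zip(self.weights, grad_sum)]


class RegressionWriter:
    """Mirrors the ForeachWriter.open/process/close protocol."""
    def __init__(self, store):
        self.store = store

    def open(self, partition_id, epoch_id):
        self.model = self.store.fetch()      # fetch latest model
        self.grad = [0.0] * len(self.model)  # local accumulator, per partition
        self.count = 0
        return True

    def process(self, row):
        features, label = row
        pred = sum(w * x for w, x in zip(self.model, features))
        err = pred - label                   # squared-loss gradient: err * x
        for i, x in enumerate(features):
            self.grad[i] += err * x
        self.count += 1

    def close(self, error):
        # Push the accumulated loss/gradient to the external store.
        self.store.push(self.grad, self.count)


# Simulate micro-batches arriving as partitions of a stream (y = 2x).
store = ModelStore(dim=1)
partitions = [[([1.0], 2.0), ([2.0], 4.0)], [([3.0], 6.0)]]
for epoch in range(200):
    for pid, part in enumerate(partitions):
        writer = RegressionWriter(store)
        writer.open(pid, epoch)
        for row in part:
            writer.process(row)
        writer.close(None)
```

After enough micro-batches the stored weight converges toward 2.0. The key design point matches the bullets above: each writer instance only accumulates locally between open and close, and all cross-partition coordination happens in the store.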
Try to hack SQL queries (see https://github.com/holdenk/spark-structured-streaming-ml by Holden Karau).