Online (incremental) logistic regression in Spark

Views: 318 | Posted: 2020/9/4 18:42:44 | Tags: apache-spark pyspark spark-streaming apache-spark-mllib apache-spark-ml

Question

In Spark MLlib (the RDD-based API) there is StreamingLogisticRegressionWithSGD for incremental training of a logistic regression model. However, this class has been deprecated and offers little functionality (e.g., no access to model coefficients and output probabilities).

In Spark ML (the DataFrame-based API) I can only find the class LogisticRegression, which has only the fit method for batch training. This doesn't allow for a pattern of saving a model, reloading it, and training it incrementally.

Needless to say, some applications benefit greatly from incremental learning. Is there any solution available in Spark?
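
To make "incremental training" concrete, here is a minimal sketch in plain Scala with no Spark dependency: a logistic regression model that takes one stochastic-gradient step per labelled example, so it can keep learning as new data arrives. The names OnlineLogReg, update, and predictProba, the learning rate, and the toy data stream are all assumptions made for illustration; this is not how any Spark class is implemented.

```scala
// Minimal online logistic regression: one SGD step per labelled example.
final case class OnlineLogReg(weights: Vector[Double], bias: Double, learningRate: Double) {
  private def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  /** Predicted probability that the label is 1. */
  def predictProba(x: Vector[Double]): Double =
    sigmoid(weights.zip(x).map { case (w, xi) => w * xi }.sum + bias)

  /** Returns a new model after one gradient step on (x, y), with y in {0, 1}. */
  def update(x: Vector[Double], y: Double): OnlineLogReg = {
    val error = predictProba(x) - y // gradient of the log loss w.r.t. the logit
    copy(
      weights = weights.zip(x).map { case (w, xi) => w - learningRate * error * xi },
      bias = bias - learningRate * error
    )
  }
}

object OnlineLogRegDemo {
  def main(args: Array[String]): Unit = {
    // Hypothetical toy stream: the label is 1 when the single feature is positive.
    val stream = Seq.fill(200)(Seq((Vector(1.0), 1.0), (Vector(-1.0), 0.0))).flatten
    val initial = OnlineLogReg(Vector(0.0), 0.0, learningRate = 0.1)
    // Fold the stream through the model, one example at a time.
    val trained = stream.foldLeft(initial) { case (m, (x, y)) => m.update(x, y) }
    println(f"P(y=1 | x=1.0)  = ${trained.predictProba(Vector(1.0))}%.3f")
    println(f"P(y=1 | x=-1.0) = ${trained.predictProba(Vector(-1.0))}%.3f")
  }
}
```

Because the model state is just the weights and the bias, it can be persisted between batches and training can resume from where it stopped, which is the save/reload/continue pattern the question asks about.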

Recommended answer

In Spark ML, when you call LogisticRegression.fit() you get a LogisticRegressionModel. You can put the LogisticRegression stage in a Pipeline, fit it, and save/load the fitted pipeline for incremental training.

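import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression

// `data` is a training DataFrame with "features" and "label" columns
val lr = new LogisticRegression()
val pipeline = new Pipeline().setStages(Array(lr))
val model = pipeline.fit(data)  // returns a fitted PipelineModel
model.write.overwrite().save("/tmp/saved_model")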
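
As a sketch of the reload half of that pattern, the saved pipeline can be read back with PipelineModel.load, and the fitted LogisticRegressionModel stage exposes the coefficients and output probabilities the question asks about. The helper name inspectSavedModel and the newData argument are hypothetical; newData is assumed to carry a "features" vector column compatible with the training data. The sketch covers loading and scoring only, not continued training.

```scala
import org.apache.spark.ml.PipelineModel
import org.apache.spark.ml.classification.LogisticRegressionModel
import org.apache.spark.sql.DataFrame

// Hypothetical helper: `newData` is assumed to have a "features" vector column.
def inspectSavedModel(newData: DataFrame): DataFrame = {
  // Reload the fitted pipeline written by model.write.overwrite().save(...)
  val loaded = PipelineModel.load("/tmp/saved_model")

  // The logistic regression is the last (and here the only) stage.
  val lrModel = loaded.stages.last.asInstanceOf[LogisticRegressionModel]
  println(s"coefficients = ${lrModel.coefficients}, intercept = ${lrModel.intercept}")

  // transform() appends "rawPrediction", "probability" and "prediction" columns.
  loaded.transform(newData).select("probability", "prediction")
}
```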

If you want to train the model with streaming data or apply it to streaming data, you can define a Structured Streaming DataFrame and pass it to the pipeline.

For example (taken from the Spark documentation):

import org.apache.spark.sql.types.StructType

// Read all the csv files written atomically in a directory
// (`spark` is an existing SparkSession)
val userSchema = new StructType().add("name", "string").add("age", "integer")
val csvDF = spark
  .readStream
  .option("sep", ";")
  .schema(userSchema)           // Specify schema of the csv files
  .csv("/path/to/directory")    // Equivalent to format("csv").load("/path/to/directory")
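
One way to apply a saved pipeline to such a stream is sketched below. It rests on several assumptions that are not in the answer: the helper name scoreStream is hypothetical, the model under /tmp/saved_model is assumed to have been trained on a one-element "features" vector built from age, the age column is assumed to contain no nulls, and the console sink is used only for demonstration.

```scala
import org.apache.spark.ml.PipelineModel
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.streaming.StreamingQuery

// Hypothetical helper: scores each micro-batch of `csvDF` with the saved pipeline.
def scoreStream(csvDF: DataFrame): StreamingQuery = {
  // Assumption: the model expects a "features" vector built from "age" alone.
  val assembler = new VectorAssembler()
    .setInputCols(Array("age"))
    .setOutputCol("features")

  val model = PipelineModel.load("/tmp/saved_model")

  // transform() is applied lazily to every micro-batch of the stream.
  val scored = model.transform(assembler.transform(csvDF))

  scored
    .select("name", "probability", "prediction")
    .writeStream
    .outputMode("append")
    .format("console") // demo sink; replace with a durable sink in practice
    .start()
}
```

Calling scoreStream(csvDF).awaitTermination() keeps the query running until it is stopped.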
