Spark ML - Save OneVsRestModel

Question

I am in the middle of refactoring my code to take advantage of DataFrames, Estimators, and Pipelines. I was originally using MLlib's multiclass LogisticRegressionWithLBFGS on RDD[LabeledPoint]. I am enjoying learning and using the new API, but I am not sure how to save my new model and apply it to new data.

Currently, the ML implementation of LogisticRegression only supports binary classification, so I am using OneVsRest instead, like so:

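import org.apache.spark.ml.classification.{LogisticRegression, OneVsRest}

// `training` is a DataFrame with "label" and "features" columns.
val lr = new LogisticRegression().setFitIntercept(true)
val ovr = new OneVsRest()
ovr.setClassifier(lr)
val ovrModel = ovr.fit(training)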

I would now like to save my OneVsRestModel, but this does not seem to be supported by the API. I have tried:

ovrModel.save("my-ovr") // Cannot resolve symbol save
ovrModel.models.foreach(_.save("model-" + _.uid)) // Cannot resolve symbol save

Is there a way to save this model so that I can load it in a new application and make new predictions?

Recommended answer

Spark 2.0.0 and later

OneVsRestModel implements MLWritable, so it should be possible to save it directly. The method shown below can still be useful for saving the individual models separately.
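As a minimal sketch of the direct route on Spark 2.x, the following fits a small model, saves it, and loads it back as a new application would. The object name `SaveOneVsRest`, the helper `saveAndReload`, the toy three-class data, and the `my-ovr` path are all made up for illustration; only `write.overwrite().save` and `OneVsRestModel.load` come from the Spark API.

```scala
import org.apache.spark.ml.classification.{LogisticRegression, OneVsRest, OneVsRestModel}
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

object SaveOneVsRest {
  // Hypothetical helper: fits a OneVsRest model on toy data, saves it to
  // `path`, and returns the model read back from disk.
  def saveAndReload(spark: SparkSession, path: String): OneVsRestModel = {
    // Made-up three-class training set: (label, features).
    val training = spark.createDataFrame(Seq(
      (0.0, Vectors.dense(0.0, 0.1)),
      (0.0, Vectors.dense(0.1, 0.0)),
      (1.0, Vectors.dense(1.0, 1.1)),
      (1.0, Vectors.dense(1.1, 1.0)),
      (2.0, Vectors.dense(2.0, 2.1)),
      (2.0, Vectors.dense(2.1, 2.0))
    )).toDF("label", "features")

    val lr = new LogisticRegression().setFitIntercept(true)
    val ovrModel = new OneVsRest().setClassifier(lr).fit(training)

    // overwrite() replaces an existing directory instead of failing.
    ovrModel.write.overwrite().save(path)

    // This is the call a separate application would make.
    OneVsRestModel.load(path)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("save-ovr")
      .master("local[*]")
      .getOrCreate()
    val loaded = saveAndReload(spark, "my-ovr")
    println(s"Loaded ${loaded.models.length} binary models")
    spark.stop()
  }
}
```

The loaded model can then be applied to new data with `loaded.transform(newData)`, provided `newData` has the same features column as the training set.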

Spark < 2.0.0

The problem here is that models returns an Array of ClassificationModel[_, _], not an Array of LogisticRegressionModel (or MLWritable). To make it work, you'll have to be specific about the types:

import org.apache.spark.ml.classification.LogisticRegressionModel

ovrModel.models.zipWithIndex.foreach { 
  case (model: LogisticRegressionModel, i: Int) => 
    model.save(s"model-${model.uid}-$i")
}

Or, more generally:

import org.apache.spark.ml.util.MLWritable

ovrModel.models.zipWithIndex.foreach { 
  case (model: MLWritable, i: Int) =>
    model.save(s"model-${model.uid}-$i")
}

Unfortunately, as of now (Spark 1.6), OneVsRestModel doesn't implement MLWritable, so it cannot be saved on its own.

Note:

All models in the OneVsRest seem to use the same uid, hence we need an explicit index. The index will also be useful for identifying each model later.
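For the pre-2.0 case, saving each binary model separately only helps if the models can be read back in the same order. A rough sketch of that round trip follows, assuming the classifier is LogisticRegression. The object `PerClassModels`, its methods, and the `<dir>/model-<i>` naming scheme are hypothetical; `write.overwrite().save` and `LogisticRegressionModel.load` are the Spark calls being relied on.

```scala
import org.apache.spark.ml.classification.{LogisticRegressionModel, OneVsRestModel}

object PerClassModels {
  // Hypothetical naming scheme: the class index alone identifies a model,
  // since the uid appears to be shared by all of them.
  def path(dir: String, i: Int): String = s"$dir/model-$i"

  // Saves every binary model and returns how many were written.
  def saveAll(ovrModel: OneVsRestModel, dir: String): Int = {
    ovrModel.models.zipWithIndex.foreach {
      case (model: LogisticRegressionModel, i) =>
        model.write.overwrite().save(path(dir, i))
      case (other, i) =>
        // Fail loudly rather than with a MatchError for other classifiers.
        sys.error(s"Model $i is a ${other.getClass.getName}, not a LogisticRegressionModel")
    }
    ovrModel.models.length
  }

  // Loads the models back in class-index order.
  def loadAll(dir: String, numClasses: Int): Array[LogisticRegressionModel] =
    (0 until numClasses).map(i => LogisticRegressionModel.load(path(dir, i))).toArray
}
```

Note that this only restores the individual binary models. It does not rebuild a OneVsRestModel, so combining their outputs into a multiclass prediction is left to the caller.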
