Is it possible to access estimator attributes in spark.ml pipelines?

Views: 112 Published: 2020/5/28 0:43:20 Tags: scala apache-spark pipeline apache-spark-ml


Problem description

I have a spark.ml pipeline in Spark 1.5.1 which consists of a series of transformers followed by a k-means estimator. I want to be able to access KMeansModel.clusterCenters after fitting the pipeline, but can't figure out how. Is there a spark.ml equivalent of sklearn's pipeline.named_steps feature?

I found this answer, which gives two options. The first works if I take the k-means model out of my pipeline and fit it separately, but that kinda defeats the purpose of a pipeline. The second option doesn't work; I get error: value getModel is not a member of org.apache.spark.ml.PipelineModel.

Example pipeline:

import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer}
import org.apache.spark.ml.clustering.{KMeans, KMeansModel}
import org.apache.spark.ml.Pipeline

// create example dataframe (Tuple1 wraps each string so createDataFrame receives a Seq of Products)
val sentenceData = sqlContext.createDataFrame(Seq(
  Tuple1("Hi I heard about Spark"),
  Tuple1("I wish Java could use case classes"),
  Tuple1("K-means models are neat")
)).toDF("sentence")

// initialize pipeline stages
val tokenizer = new Tokenizer().setInputCol("sentence").setOutputCol("words")
val hashingTF = new HashingTF().setInputCol("words").setOutputCol("features").setNumFeatures(20)
val kmeans = new KMeans()
val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, kmeans))

// fit the pipeline
val fitKmeans = pipeline.fit(sentenceData)

So now fitKmeans is of type org.apache.spark.ml.PipelineModel. My question is, how do I access the cluster centers calculated by the k-means model contained within this pipeline? As noted above, when the model is not contained in a pipeline, this can be done with fitKmeans.clusterCenters.
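One possible approach, offered as a minimal sketch rather than a confirmed answer from this page: a fitted PipelineModel exposes its fitted stages through its stages array, so the k-means stage can be picked out by type. The helper name clusterCentersOf is hypothetical, and the usage lines assume the spark-shell session above, where fitKmeans is already defined.

```scala
import org.apache.spark.ml.PipelineModel
import org.apache.spark.ml.clustering.KMeansModel

// Hypothetical helper (not part of Spark): return the cluster centers of the
// first KMeansModel found among the fitted stages, or None if there is none.
def clusterCentersOf(model: PipelineModel) =
  model.stages.collectFirst { case km: KMeansModel => km.clusterCenters }

// Usage, assuming fitKmeans is the PipelineModel fitted above.
val centers = clusterCentersOf(fitKmeans)
centers.foreach(_.foreach(println))
```

Matching on the stage type avoids hard-coding the stage position. If the position is known, indexing directly, as in fitKmeans.stages(2).asInstanceOf[KMeansModel].clusterCenters, should behave the same way for this pipeline.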
