How to map variable names to features after pipeline

Views: 21  Published: 2021/11/14 20:59:29  Tags: scala apache-spark apache-spark-mllib apache-spark-ml

Question

I have modified the OneHotEncoder example to actually train a LogisticRegression. How can I map the generated weights back to the categorical variables?

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.{LogisticRegression, LogisticRegressionModel}
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer}
import org.apache.spark.sql.SQLContext

def oneHotEncoderExample(sqlContext: SQLContext): Unit = {

  // Toy data: one categorical column and a binary label.
  val df = sqlContext.createDataFrame(Seq(
    (0, "a", 1.0),
    (1, "b", 1.0),
    (2, "c", 0.0),
    (3, "d", 1.0),
    (4, "e", 1.0),
    (5, "f", 0.0)
  )).toDF("id", "category", "label")
  df.show()

  // Map each category string to a numeric index.
  val indexer = new StringIndexer()
    .setInputCol("category")
    .setOutputCol("categoryIndex")
    .fit(df)
  val indexed = indexer.transform(df)
  indexed.select("id", "categoryIndex").show()

  // One-hot encode the index into the "features" vector column.
  val encoder = new OneHotEncoder()
    .setInputCol("categoryIndex")
    .setOutputCol("features")
  val encoded = encoder.transform(indexed)
  encoded.select("id", "features").show()

  val lr = new LogisticRegression()
    .setMaxIter(10)
    .setRegParam(0.01)

  // The indexer is already fitted; a fitted model is also a valid pipeline stage.
  val pipeline = new Pipeline()
    .setStages(Array(indexer, encoder, lr))

  // Fit the pipeline to the training data.
  val pipelineModel = pipeline.fit(df)

  // The last stage of the fitted pipeline is the logistic regression model.
  val lorModel = pipelineModel.stages.last.asInstanceOf[LogisticRegressionModel]
  println(s"LogisticRegression: $lorModel")
  // Print the weights and intercept for logistic regression.
  println(s"Weights: ${lorModel.coefficients} Intercept: ${lorModel.intercept}")
}

Output

Weights: [1.5098946631236487,-5.509833649232324,1.5098946631236487,1.5098946631236487,-5.509833649232324] Intercept: 2.6679020381781235

Recommended answer

I assume what you want here is access to the features metadata. Let's start by transforming the existing DataFrame:

val transformedDF = pipelineModel.transform(df)

Next, you can extract the metadata object:

val meta: org.apache.spark.sql.types.Metadata = transformedDF
  .schema(transformedDF.schema.fieldIndex("features"))
  .metadata

Finally, let's extract the attributes:

meta.getMetadata("ml_attr").getMetadata("attrs")
//  org.apache.spark.sql.types.Metadata = {"binary":[
//    {"idx":0,"name":"e"},{"idx":1,"name":"f"},{"idx":2,"name":"a"},
//    {"idx":3,"name":"b"},{"idx":4,"name":"c"}]}

These can be used to relate the weights back to the original features: the "idx" of each attribute is the position of the corresponding weight in the coefficient vector.
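As a minimal sketch of that last step, the attribute names can be paired with the coefficients by index. The helper name `namedCoefficients` and the `feature_<idx>` fallback label are hypothetical and not part of the original answer. It reads the same "ml_attr" metadata through Spark's `AttributeGroup.fromStructField` rather than navigating the `Metadata` object by hand, and assumes the features column carries per-feature attributes, as it does after OneHotEncoder:

```scala
import org.apache.spark.ml.attribute.AttributeGroup
import org.apache.spark.sql.types.StructField

// Hypothetical helper (an illustration, not the answer's own code):
// pairs each attribute name found in the features column metadata
// with the weight stored at the same index.
def namedCoefficients(
    featuresField: StructField,
    weights: Array[Double]): Seq[(String, Double)] = {
  // AttributeGroup parses the "ml_attr" metadata attached to the column.
  val group = AttributeGroup.fromStructField(featuresField)
  // attributes is None when the column has no per-feature metadata.
  val attrs = group.attributes.map(_.toSeq).getOrElse(Seq.empty)
  attrs.flatMap { attr =>
    attr.index
      .filter(idx => idx >= 0 && idx < weights.length) // skip indices with no weight
      .map(idx => (attr.name.getOrElse(s"feature_$idx"), weights(idx)))
      .toList
  }
}

// Usage with the names from the question and answer above:
//   namedCoefficients(
//     transformedDF.schema("features"),
//     lorModel.coefficients.toArray
//   ).foreach { case (name, weight) => println(s"$name -> $weight") }
```

With the default OneHotEncoder settings the last category is dropped, so one category has no weight of its own and is absorbed into the intercept. That is why the metadata above lists five attributes for six categories.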
