How to map the coefficients obtained from a logistic regression model to the feature names in PySpark

Views: 20 | Published: 2022/4/19 21:23:32 | Tags: pyspark, logistic-regression, feature-extraction


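Problem description

I built a logistic regression model using the pipeline flow described by Databricks: https://docs.databricks.com/spark/latest/mllib/binary-classification-mllib-pipelines.html

The features (both numeric and string features) were encoded with OneHotEncoderEstimator and then transformed with StandardScaler.

I would like to know how to map the weights (coefficients) obtained from the logistic regression to the feature names in the original DataFrame.

In other words, how do I get the feature that corresponds to each weight or coefficient produced by the model?

Thanks.

I tried extracting the features from the schema (predictions.schema in the code below), which gives a list of StructField objects showing the features.

I then tried to pull the feature names out of that schema and map them to the weights, but without success.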

from pyspark.ml.classification import LogisticRegression

# Create initial LogisticRegression model
lr = LogisticRegression(labelCol="label", featuresCol="scaledFeatures", maxIter=10)

# Train model with Training Data

lrModel = lr.fit(trainingData)

predictions = lrModel.transform(trainingData)

LRschema = predictions.schema

Expected result: a list of tuples of the form (feature weight, feature name).

Recommended answer

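Assuming you are using LogisticRegression, this pandas workaround should give you the result. It reads the feature names from the ML attribute metadata stored on the features column.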

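import pandas as pd
from pyspark.ml.classification import LogisticRegression

lr = LogisticRegression(labelCol="label", featuresCol="features", maxIter=50, threshold=0.5)
lr_model = lr.fit(train_set)
print("Intercept: " + str(lr_model.intercept))

# Collect attributes from every group (numeric, binary, ...), not only 'numeric',
# so that one-hot encoded columns are included, then order them by vector index.
attrs = train_set.schema["features"].metadata["ml_attr"]["attrs"]
names = [a["name"] for a in sorted((a for g in attrs.values() for a in g), key=lambda a: a["idx"])]

# coefficients is a Spark vector; convert it to a NumPy array for pandas.
pd.DataFrame({"coefficients": lr_model.coefficients.toArray(), "feature": names})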
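The question asks for a list of (feature weight, feature name) tuples rather than a pandas DataFrame. The following is a minimal sketch of that mapping in plain Python. The helper names `feature_names_from_metadata` and `map_coefficients` are hypothetical, and `example_metadata` is hand-written to mimic the shape of the `ml_attr` metadata, so the values are illustrative only.

```python
from typing import Dict, List, Sequence, Tuple


def feature_names_from_metadata(metadata: Dict) -> List[str]:
    """Return feature names ordered by their index in the feature vector."""
    attrs = metadata.get("ml_attr", {}).get("attrs", {})
    # Attributes are grouped by type (e.g. "numeric", "binary"); flatten all groups.
    flat = [attr for group in attrs.values() for attr in group]
    return [attr["name"] for attr in sorted(flat, key=lambda attr: attr["idx"])]


def map_coefficients(metadata: Dict, coefficients: Sequence[float]) -> List[Tuple[float, str]]:
    """Pair each coefficient with its feature name as (weight, name) tuples."""
    names = feature_names_from_metadata(metadata)
    if len(names) != len(coefficients):
        raise ValueError(f"{len(names)} feature names but {len(coefficients)} coefficients")
    return [(float(weight), name) for weight, name in zip(coefficients, names)]


# Hand-written stand-in for train_set.schema["features"].metadata (assumed shape).
example_metadata = {
    "ml_attr": {
        "attrs": {
            "numeric": [{"idx": 2, "name": "age"}],
            "binary": [{"idx": 0, "name": "gender_M"}, {"idx": 1, "name": "gender_F"}],
        },
        "num_attrs": 3,
    }
}

pairs = map_coefficients(example_metadata, [0.8, -0.3, 0.05])
print(pairs)

# With a fitted model, the call would look like this (not run here):
# map_coefficients(train_set.schema["features"].metadata, lr_model.coefficients.toArray())
```

If the model is trained on a scaled column such as scaledFeatures, that column may not carry the attribute names. StandardScaler does not change the position of features within the vector, so reading the metadata from the column that was fed into the scaler should give the same ordering.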
