ChiSqSelector - Real Features - Spark

Published: 2021/11/14 21:07:54 | Tags: scala, apache-spark, apache-spark-mllib

Question

I am building a Naive Bayes (NB) model with Spark 1.6 and using ChiSqSelector to identify the top features. I have 7 features in total and am looking for the top 3. The process runs fine, but how do I identify which actual features were rated as the top ones? Since the data is categorized, I am not able to map the output back to the actual input columns.

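import org.apache.spark.ml.feature.ChiSqSelector
import org.apache.spark.mllib.linalg.Vectors  // Spark 1.6; from Spark 2.0 on, use org.apache.spark.ml.linalg.Vectors
// Build (target, features) rows, then keep the 3 features that ChiSqSelector ranks highest against the target
val chidata = cat_recs.map(r => (r.getDouble(targetInd), Vectors.dense(featuresidx.map(r.getDouble(_)).toArray))).toDF("target","features")
val sel = new ChiSqSelector().setNumTopFeatures(3).setFeaturesCol("features").setLabelCol("target").setOutputCol("selectedFeatures")
val chiresult = sel.fit(chidata).transform(chidata)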

The output is:

scala> chiresult.foreach(println)
[1.0,[0.0,2.0,0.0,5.0,7.0,5.0,1.0],[0.0,5.0,7.0]]
[1.0,[4.0,3.0,0.0,5.0,7.0,5.0,1.0],[0.0,5.0,7.0]]
[0.0,[3.0,2.0,0.0,5.0,7.0,5.0,3.0],[0.0,5.0,7.0]]
[1.0,[1.0,2.0,0.0,1.0,7.0,5.0,2.0],[0.0,1.0,7.0]]
[1.0,[0.0,2.0,0.0,1.0,7.0,5.0,3.0],[0.0,1.0,7.0]]

The structure is target: double, features: vector, selectedFeatures: vector. From the above, let's take the first row as an example:

[1.0,[0.0,2.0,0.0,5.0,7.0,5.0,1.0],[0.0,5.0,7.0]]

How can I identify which 0.0 in the features vector the 0.0 in selectedFeatures refers to? The same question applies to the 5th row.

Please help.

Thanks,
Bala

Recommended answer

In your example:

[1.0,[0.0,2.0,0.0,5.0,7.0,5.0,1.0],[0.0,5.0,7.0]]

The last column, [0.0,5.0,7.0], holds the values of the selected features, in this case features 2, 3 and 4 (counting from 0). To extract the feature indices, just use:

val model = sel.fit(chidata)
val importantFeatures = model.selectedFeatures
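
Once the indices are known, mapping them back to the input columns is plain array indexing. The sketch below assumes the indices are 2, 3 and 4, as stated above for this example, and uses a hypothetical featureNames array in place of the real column names, which the question does not show. It runs without Spark, so the index array is written out by hand rather than read from the model.

```scala
// Hypothetical column names, in the same order the features vector was built.
// Replace these with the real names that correspond to featuresidx.
val featureNames = Array("f0", "f1", "f2", "f3", "f4", "f5", "f6")

// Assumed stand-in for model.selectedFeatures (0-based indices into the features vector).
val selectedIndices = Array(2, 3, 4)

// Look up the name of each selected feature.
val selectedNames = selectedIndices.map(i => featureNames(i))

// Sanity check against the first row of the output: picking the same
// indices out of the full features vector gives the selectedFeatures values.
val firstRow = Array(0.0, 2.0, 0.0, 5.0, 7.0, 5.0, 1.0)
val selectedValues = selectedIndices.map(i => firstRow(i))

println(selectedNames.mkString(", "))
println(selectedValues.mkString(", "))
```

For the first row, this shows that the 0.0 in selectedFeatures is the value at index 2 of the features vector, not the one at index 0.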
