ChiSqSelector - Real Features - Spark

Problem description

I am building a Naive Bayes (NB) model with Spark 1.6 and using ChiSqSelector to identify the top features. I have 7 features in total and am looking for the top 3. The process runs fine, but how do I identify which actual features were rated as the top ones? Since the data is categorized, I am not able to map the output back to the actual input columns.

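import org.apache.spark.ml.feature.ChiSqSelector
import org.apache.spark.mllib.linalg.Vectors
// Build (target, features) rows, then keep the 3 features most dependent on the target.
val chidata = cat_recs.map(r => (r.getDouble(targetInd), Vectors.dense(featuresidx.map(r.getDouble(_)).toArray))).toDF("target","features")
val sel = new ChiSqSelector().setNumTopFeatures(3).setFeaturesCol("features").setLabelCol("target").setOutputCol("selectedFeatures")
val chiresult = sel.fit(chidata).transform(chidata)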

The output is

scala> chiresult.foreach(println)
[1.0,[0.0,2.0,0.0,5.0,7.0,5.0,1.0],[0.0,5.0,7.0]]
[1.0,[4.0,3.0,0.0,5.0,7.0,5.0,1.0],[0.0,5.0,7.0]]
[0.0,[3.0,2.0,0.0,5.0,7.0,5.0,3.0],[0.0,5.0,7.0]]
[1.0,[1.0,2.0,0.0,1.0,7.0,5.0,2.0],[0.0,1.0,7.0]]
[1.0,[0.0,2.0,0.0,1.0,7.0,5.0,3.0],[0.0,1.0,7.0]]

Structure -- target: double, features: vector, selectedFeatures: vector. From the above, let's take the first row as an example:

[1.0,[0.0,2.0,0.0,5.0,7.0,5.0,1.0],[0.0,5.0,7.0]]

How can I identify which 0.0 the selectedFeatures column is referring to? The same question applies to the 5th row.

Please help.

Thanks,
Bala

Recommended answer

In your example:

[1.0,[0.0,2.0,0.0,5.0,7.0,5.0,1.0],[0.0,5.0,7.0]]

the last column [0.0,5.0,7.0] holds the values of the selected features, in this case features 2, 3 and 4 (counting from 0). To extract the feature indices, just use

val model = sel.fit(chidata)
val importantFeatures = model.selectedFeatures
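
Once you have the indices, mapping them back to the input columns is a plain array lookup. Here is a minimal sketch of that step; the names in featureNames are hypothetical placeholders (the question does not give the real column names), and the importantFeatures array is a stand-in for the value returned by model.selectedFeatures, filled with the indices 2, 3 and 4 from the example above.

```scala
// Hypothetical names for the 7 input columns, listed in the same order
// as the values packed into the "features" vector (the order of featuresidx).
val featureNames: Array[String] = Array("f0", "f1", "f2", "f3", "f4", "f5", "f6")

// Stand-in for model.selectedFeatures; 2, 3, 4 are the indices from the example rows.
val importantFeatures: Array[Int] = Array(2, 3, 4)

// Look up the column name for each selected index.
val selectedNames: Array[String] = importantFeatures.map(i => featureNames(i))
```

This only works if featureNames follows exactly the same order that was used to build the features vector, so it is probably safest to derive both from the same list of column indices.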
