Understanding Representation of Vector Column in Spark SQL

Views: 38 Published: 2021/11/14 20:58:55 Tags: apache-spark apache-spark-sql apache-spark-mllib apache-spark-ml

Problem description

Before I used VectorAssembler() to consolidate some OneHotEncoded categorical features, my data frame looked like this:

| Numerical | HotEncoded1    | HotEncoded2   |
| 14460.0   | (44,[5],[1.0]) | (3,[0],[1.0]) |
| 14460.0   | (44,[9],[1.0]) | (3,[0],[1.0]) |
| 15181.0   | (44,[1],[1.0]) | (3,[0],[1.0]) |

The first column is numerical, and the other two columns hold the transformed output for two OneHotEncoded categorical features. After applying VectorAssembler(), my output becomes:

[(48,[0,1,9],[14460.0,1.0,1.0])]
[(48,[0,3,25],[12827.0,1.0,1.0])]
[(48,[0,1,18],[12828.0,1.0,1.0])]

I am unsure what these numbers mean and cannot make sense of this transformed data set. Some clarification of what this output means would be great!
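As a rough aid to reading the output above: Spark prints a sparse vector as (size, [indices], [values]), meaning a vector of length `size` that is zero everywhere except at the listed indices. VectorAssembler concatenates its input columns in the order given, so the indices of later columns are shifted by the total width of the earlier ones. The sketch below is plain Python rather than Spark code. The helpers `sparse_to_dense` and `assemble` are hypothetical names introduced only for illustration, and the input order (Numerical, HotEncoded1, HotEncoded2) is an assumption, since the question does not show the inputCols that were actually used.

```python
def sparse_to_dense(size, indices, values):
    """Expand a (size, [indices], [values]) triple into a full list."""
    dense = [0.0] * size
    for i, v in zip(indices, values):
        dense[i] = v
    return dense


def assemble(parts):
    """Concatenate scalars and sparse triples, mimicking (not calling) VectorAssembler.

    Each part is either a float (one slot) or a (size, indices, values) triple.
    Returns a single (size, indices, values) triple.
    """
    offset = 0
    indices, values = [], []
    for part in parts:
        if isinstance(part, tuple):
            size, idx, vals = part
            # Shift this column's indices by the width of everything before it.
            indices.extend(offset + i for i in idx)
            values.extend(vals)
            offset += size
        else:
            # A plain number occupies exactly one slot; keep it only if nonzero.
            if part != 0.0:
                indices.append(offset)
                values.append(part)
            offset += 1
    return offset, indices, values


# First output row from the question: length 48, nonzero at positions 0, 1 and 9.
dense = sparse_to_dense(48, [0, 1, 9], [14460.0, 1.0, 1.0])
print(len(dense), dense[0], dense[1], dense[9])  # 48 14460.0 1.0 1.0

# First input row from the question, under the assumed column order.
assembled = assemble([14460.0, (44, [5], [1.0]), (3, [0], [1.0])])
print(assembled)  # (48, [0, 6, 45], [14460.0, 1.0, 1.0])
```

The length 48 is 1 + 44 + 3, one slot for the numerical column plus the widths of the two encoded columns. The indices in the question's printed output differ from what this assumed order produces, so the rows shown or the actual column order probably differ from the sketch.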
