Spark ML VectorAssembler returns strange output


Problem Description

I am experiencing some very strange behaviour from VectorAssembler and I was wondering if anyone else has seen this.

My scenario is pretty straightforward. I parse data from a CSV file that has some standard Int and Double fields, and I also compute some extra columns. My parsing function returns this:

val joinedCounts = countPerChannel ++ countPerSource // two arrays of Doubles joined together
(label, orderNo, pageNo, Vectors.dense(joinedCounts))
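
For reference, a minimal, self-contained sketch of what such a parsing function could look like; the field positions and the countPerChannel/countPerSource computations are placeholders, not from the original post (assuming Spark 2.x, where spark.ml uses org.apache.spark.ml.linalg):

import org.apache.spark.ml.linalg.{Vector, Vectors}

// Hypothetical reconstruction of parseLine: field positions and the
// per-channel/per-source counts are placeholders, not the asker's code.
def parseLine(line: String): (Double, Double, Double, Vector) = {
  val fields  = line.split(",")
  val label   = fields(0).toDouble
  val orderNo = fields(1).toDouble
  val pageNo  = fields(2).toDouble
  val countPerChannel = Array.fill(8)(0.0) // placeholder per-channel counts
  val countPerSource  = Array.fill(8)(0.0) // placeholder per-source counts
  val joinedCounts = countPerChannel ++ countPerSource
  (label, orderNo, pageNo, Vectors.dense(joinedCounts))
}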

My main function uses the parsing function like this:

val parsedData = rawData.filter(row => row != header).map(parseLine)
val data = sqlContext.createDataFrame(parsedData).toDF("label", "orderNo", "pageNo","joinedCounts")

And then I use the VectorAssembler like this:

val assembler = new VectorAssembler()
                           .setInputCols(Array("orderNo", "pageNo", "joinedCounts"))
                           .setOutputCol("features")

val assemblerData = assembler.transform(data)

So when I print a row of my data before it goes into the VectorAssembler, it looks like this:

[3.2,17.0,15.0,[0.0,0.0,0.0,0.0,3.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,4.0,0.0,0.0,2.0]]

After the transform function of the VectorAssembler, I print the same row of data and get this:

[3.2,(18,[0,1,6,9,14,17],[17.0,15.0,3.0,1.0,4.0,2.0])]

What on earth is going on? What has the VectorAssembler done? I've double-checked all the calculations and even followed the simple Spark examples, and I cannot see what is wrong with my code. Can you?

Recommended Answer

There is nothing strange about the output. Your vector seems to have lots of zero elements, so Spark used its sparse representation.

To explain further:

Your vector is composed of 18 elements (its dimension): orderNo, pageNo, and the 16 values of joinedCounts.

The indices [0,1,6,9,14,17] are the positions in the vector that hold non-zero elements; their values, in that order, are [17.0,15.0,3.0,1.0,4.0,2.0].
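
To make the correspondence concrete, here is a minimal sketch (assuming Spark 2.x and org.apache.spark.ml.linalg) showing that the sparse and dense forms encode exactly the same vector:

import org.apache.spark.ml.linalg.Vectors

// (18, [0,1,6,9,14,17], [17.0,15.0,3.0,1.0,4.0,2.0]) reads as: size 18,
// non-zero values at the listed indices, implicit zeros everywhere else.
val sparse = Vectors.sparse(18, Array(0, 1, 6, 9, 14, 17),
                                Array(17.0, 15.0, 3.0, 1.0, 4.0, 2.0))
val dense = Vectors.dense(17.0, 15.0, 0.0, 0.0, 0.0, 0.0, 3.0, 0.0, 0.0,
                          1.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 2.0)
println(sparse == dense) // true: both represent the same 18-dimensional vector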

The sparse vector representation is a way to save space and make computation easier and faster: here Spark stores only 6 index/value pairs instead of all 18 doubles. For more on sparse representations, see the Spark MLlib documentation on local vector types.

Now, of course, you can convert that sparse representation back to a dense representation, but it comes at a cost.
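
If you do want the dense form, one way is a small UDF; a minimal sketch, assuming Spark 2.x and the column names from the question's code:

import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.functions.{col, udf}

// Convert the assembled vectors to their dense representation.
// Note: this materialises every zero, so memory use grows with dimensionality.
val toDense = udf { v: Vector => v.toDense }
val denseData = assemblerData.withColumn("features", toDense(col("features")))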

In case you are interested in getting feature importances from models trained on these vectors, mapping the assembled indices back to the original input columns is worth a separate look.
