How to convert ArrayType to DenseVector in PySpark DataFrame?
Question

I'm getting the following error trying to build an ML Pipeline:
pyspark.sql.utils.IllegalArgumentException: 'requirement failed: Column features must be of type org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7 but was actually ArrayType(DoubleType,true).'
My features column contains an array of floating-point values. It sounds like I need to convert those to some type of vector (it's not sparse, so a DenseVector?). Is there a way to do this directly on the DataFrame, or do I need to convert to an RDD?
Recommended answer
You can use a UDF:

udf(lambda vs: Vectors.dense(vs), VectorUDT())
In Spark < 2.0 import:
from pyspark.mllib.linalg import Vectors, VectorUDT
In Spark 2.0+ import:
from pyspark.ml.linalg import Vectors, VectorUDT
Please note that these classes are not compatible, despite their identical implementations.
It is also possible to extract the individual features and assemble them with VectorAssembler. Assuming the input column is called features:
from pyspark.ml.feature import VectorAssembler

n = ...  # Size of features

assembler = VectorAssembler(
    inputCols=["features[{0}]".format(i) for i in range(n)],
    outputCol="features_vector")

assembler.transform(df.select(
    "*", *(df["features"].getItem(i) for i in range(n))
))