How to merge multiple feature vectors in DataFrame?

Views: 521 Posted: 2016/5/22 15:19:57 Tags: apache-spark machine-learning apache-spark-sql


Question

Using Spark ML transformers, I arrived at a DataFrame where each row looks like this:

Row(object_id, text_features_vector, color_features, type_features)

where text_features is a sparse vector of term weights, color_features is a small 20-element one-hot-encoded dense vector of colors, and type_features is also a one-hot-encoded dense vector of types.

What would be a good approach (using Spark's facilities) to merge these features into one single, large array, so that I can measure things like the cosine distance between any two objects?

Recommended answer

You can use VectorAssembler:

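import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.DataFrame

// Placeholder: substitute the DataFrame produced by your transformers.
val df: DataFrame = ???

// Concatenate the listed vector columns, in order, into a single vector column.
// The names must match the columns that actually exist in df.
val assembler = new VectorAssembler()
  .setInputCols(Array("text_features", "color_features", "type_features"))
  .setOutputCol("features")

// Adds a "features" column holding the merged vector for each row.
val transformed = assembler.transform(df)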

For a PySpark example, see: Encode and assemble multiple features in PySpark
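Since the question mentions cosine distance, here is a minimal sketch in plain Scala (no Spark dependency) of what the merged vector looks like and how the distance between two such vectors could be computed. The object name CosineSketch, the helper names, and the sample values are hypothetical and not part of the original answer. It assumes dense Array[Double] inputs; a Spark ML vector can be converted with toArray, though densifying a large sparse text vector may be costly.

```scala
object CosineSketch {
  // Conceptually what VectorAssembler does per row: concatenate the
  // input vectors, in the given order, into one flat feature vector.
  def assemble(parts: Array[Double]*): Array[Double] =
    parts.flatMap(_.toSeq).toArray

  // Cosine distance = 1 - (a . b) / (|a| * |b|).
  def cosineDistance(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same length")
    val dot = a.zip(b).map { case (x, y) => x * y }.sum
    val normA = math.sqrt(a.map(x => x * x).sum)
    val normB = math.sqrt(b.map(x => x * x).sum)
    require(normA > 0 && normB > 0, "cosine distance is undefined for zero vectors")
    1.0 - dot / (normA * normB)
  }
}

// Hypothetical feature values for two objects (text, color, type).
val objA = CosineSketch.assemble(Array(0.0, 1.5, 0.0), Array(0.0, 1.0), Array(1.0, 0.0))
val objB = CosineSketch.assemble(Array(0.7, 0.0, 0.2), Array(0.0, 1.0), Array(0.0, 1.0))
val distance = CosineSketch.cosineDistance(objA, objB)
```

Because the three blocks are simply concatenated, features on a larger scale (for example, raw term weights) can dominate the distance; whether to rescale each block before merging depends on the data.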
