Custom aggregation on PySpark dataframes

Views: 112 Published: 2020/6/2 20:45:53 Tags: apache-spark pyspark apache-spark-sql aggregate-functions user-defined-functions


Problem description

I have a PySpark DataFrame in which one column holds one-hot encoded vectors. After a groupby, I want to aggregate the one-hot encoded vectors in each group by vector addition.

e.g. df[userid, action]
Row1: ["1234", [1, 0, 0]]
Row2: ["1234", [0, 1, 0]]

I want the output to be the row ["1234", [1, 1, 0]], so that the vector is the sum of all vectors grouped by userid.

How can I achieve this? The PySpark sum aggregate operation does not support vector addition.
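One possible approach, offered as a minimal sketch and not as the accepted answer: drop to the RDD API and combine the vectors of each userid pairwise with reduceByKey. The helper names add_vectors, sum_by_key_local and sum_by_key_spark are hypothetical, and the sketch assumes the action column holds either a plain array or an ML vector that exposes toArray(). The plain-Python function shows the intended result without needing a Spark session.

```python
from collections import defaultdict
from functools import reduce


def add_vectors(a, b):
    """Element-wise sum of two equal-length sequences of numbers."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [x + y for x, y in zip(a, b)]


def sum_by_key_local(rows):
    """Plain-Python reference: group (key, vector) pairs and add the vectors."""
    grouped = defaultdict(list)
    for key, vec in rows:
        grouped[key].append(vec)
    return {key: reduce(add_vectors, vecs) for key, vecs in grouped.items()}


def sum_by_key_spark(df, key_col="userid", vec_col="action"):
    """Same aggregation on a PySpark DataFrame (requires an active SparkSession).

    Assumption: vec_col is an array column or an ML vector column.
    """
    def to_list(v):
        # ML vectors expose toArray(); array columns arrive as Python lists.
        values = v.toArray() if hasattr(v, "toArray") else v
        return [float(x) for x in values]

    pairs = df.select(key_col, vec_col).rdd.map(lambda r: (r[0], to_list(r[1])))
    # reduceByKey applies add_vectors pairwise within each key.
    return pairs.reduceByKey(add_vectors).toDF([key_col, vec_col])


# The two rows from the question, as (userid, action) pairs.
rows = [("1234", [1, 0, 0]), ("1234", [0, 1, 0])]
print(sum_by_key_local(rows))  # {'1234': [1, 1, 0]}
```

With a DataFrame df shaped like the example above, sum_by_key_spark(df) would return one row per userid with the summed vector as an array column. Going through the RDD gives up some DataFrame optimizations, so this is a trade of simplicity for speed.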
