How to split a column of vectors into two columns?

Question

I use PySpark.

The output DataFrame of Spark ML's Random Forest has a column "probability", which is a vector with two values. I just want to add two columns to the output DataFrame, "prob1" and "prob2", which correspond to the first and second values in the vector.

I tried the following:

output2 = output.withColumn('prob1', output.map(lambda r: r['probability'][0]))

But I get the error 'col should be Column'.

Any suggestions on how to transform a column of vectors into columns of its values?

Recommended answer

I ran into the same problem. Below is the code, adjusted for the case where the vector has length n (here tstDF is the DataFrame and 'features' is the vector column).

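from pyspark.sql.functions import udf
from pyspark.sql.types import FloatType
splits = [udf(lambda value, i=i: value[i].item(), FloatType()) for i in range(n)]  # i=i binds each index at creation
out = tstDF.select(*[s('features').alias("Column" + str(i)) for i, s in enumerate(splits)])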
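
Applied to the DataFrame from the question, the same idea might look like the sketch below. withColumn expects a Column expression, which is why passing the result of output.map(...) raised 'col should be Column'; wrapping the element lookup in a UDF yields a Column instead. The helper names vector_element and add_probability_columns are hypothetical, and DoubleType is an assumption chosen so the values are not narrowed to single precision.

```python
def vector_element(vector, index):
    """Return element `index` of a vector-like value as a built-in float."""
    if vector is None:
        return None
    # float() converts numpy.float64 (what DenseVector indexing returns)
    # into a built-in float that Spark can store as DoubleType.
    return float(vector[index])


def add_probability_columns(df, vector_col="probability", names=("prob1", "prob2")):
    """Add one double column per vector element (hypothetical helper)."""
    # Imported here so vector_element can be tried without a Spark install.
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import DoubleType

    for index, name in enumerate(names):
        # index=index binds the current value; a bare closure over `index`
        # may only see the last value by the time the UDF is serialized.
        extract = udf(lambda v, index=index: vector_element(v, index), DoubleType())
        df = df.withColumn(name, extract(col(vector_col)))
    return df
```

With the question's DataFrame this would be called as output2 = add_probability_columns(output). On Spark 3.0 or later, pyspark.ml.functions.vector_to_array converts a vector column into an array column whose elements can be selected by index, which avoids a Python UDF altogether.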
