How to split a column of vectors into two columns?
Problem description
I use PySpark.
Spark ML's Random Forest output DataFrame has a column "probability", which is a vector with two values. I just want to add two columns to the output DataFrame, "prob1" and "prob2", which correspond to the first and second values in the vector.
I tried the following:
output2 = output.withColumn('prob1', output.map(lambda r: r['probability'][0]))
But I get the error 'col should be Column'.
Any suggestions on how to transform a column of vectors into columns of its values?
Recommended answer
I had the same problem. Below is the code, adapted to the general case of a vector of length n.
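from pyspark.sql.functions import udf
from pyspark.sql.types import FloatType
splits = [udf(lambda value, i=i: value[i].item(), FloatType()) for i in range(n)]  # i=i gives each UDF its own index
out = tstDF.select(*[s('features').alias("Column"+str(i)) for i, s in enumerate(splits)])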
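One thing to watch when building one function per index in a list comprehension, as the answer does: a lambda that refers to the loop variable looks it up when it is called, not when it is defined. The plain-Python sketch below illustrates this without Spark; the names `vector`, `late_bound` and `early_bound` are made up for the illustration. Whether the issue shows up in a real job likely depends on when Spark serializes the UDF, so treat this as a precaution rather than a confirmed diagnosis.

```python
# Plain-Python stand-in for a 3-element vector; no Spark needed for this illustration.
vector = [0.1, 0.7, 0.2]
n = len(vector)

# Bare closure: every lambda reads the same variable i, which ends at n - 1.
late_bound = [lambda value: value[i] for i in range(n)]

# Default argument: each lambda stores its own copy of i when it is defined.
early_bound = [lambda value, i=i: value[i] for i in range(n)]

late_values = [f(vector) for f in late_bound]
early_values = [f(vector) for f in early_bound]

print(late_values)   # [0.2, 0.2, 0.2]
print(early_values)  # [0.1, 0.7, 0.2]
```

Binding the index through a default argument (`i=i`) is one common way to keep each function tied to its own position; a small factory function that takes `i` and returns the lambda would work as well.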