How to explode columns?
Problem description
After running:
val df = Seq((1, Vector(2, 3, 4)), (1, Vector(2, 3, 4))).toDF("Col1", "Col2")
I have this DataFrame in Apache Spark:
+------+---------+
| Col1 | Col2 |
+------+---------+
| 1 |[2, 3, 4]|
| 1 |[2, 3, 4]|
+------+---------+
How do I convert it into:
+------+------+------+------+
| Col1 | Col2 | Col3 | Col4 |
+------+------+------+------+
| 1 | 2 | 3 | 4 |
| 1 | 2 | 3 | 4 |
+------+------+------+------+
Recommended answer
A solution that does not convert to and from an RDD:
df.select($"Col1", $"Col2"(0) as "Col2", $"Col2"(1) as "Col3", $"Col2"(2) as "Col4")
Or, arguably better:
val nElements = 3
df.select(($"Col1" +: Range(0, nElements).map(idx => $"Col2"(idx) as "Col" + (idx + 2))): _*)
The size of a Spark array column is not fixed; you could, for instance, have:
+----+------------+
|Col1| Col2|
+----+------------+
| 1| [2, 3, 4]|
| 1|[2, 3, 4, 5]|
+----+------------+
So the number of output columns cannot be read from the schema and used to create them. If you know the array size is always the same, you can take it from the first row and set nElements like this:
val nElements = df.select("Col2").first.getList(0).size
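If the array sizes do vary, as in the second table above, one possible workaround is to size the output by the longest array found in the data instead of by the first row. The following is only a sketch under stated assumptions: `ExplodeColumns` and `spread` are hypothetical names, not part of Spark; the DataFrame is assumed to be non-empty with no null arrays; and `spark.sql.ansi.enabled` is assumed to be off, so that indexing past the end of a shorter array yields null rather than an error.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, max, size}

object ExplodeColumns {
  // Hypothetical helper (not a Spark API): spreads the elements of `arrayCol`
  // into columns named Col2, Col3, ... next to `keyCol`.
  def spread(df: DataFrame, keyCol: String, arrayCol: String): DataFrame = {
    // One extra pass over the data to find the longest array.
    // Assumes a non-empty DataFrame whose arrays are never null.
    val nElements = df.agg(max(size(col(arrayCol)))).first.getInt(0)

    // Same indexing trick as in the answer: arrayColumn(idx) extracts one element.
    val elementCols: Seq[Column] =
      (0 until nElements).map(idx => col(arrayCol)(idx).as(s"Col${idx + 2}"))

    df.select((col(keyCol) +: elementCols): _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("explode-columns")
      // Assumption: with ANSI mode off, an out-of-range index returns null.
      .config("spark.sql.ansi.enabled", "false")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, Seq(2, 3, 4)), (1, Seq(2, 3, 4, 5))).toDF("Col1", "Col2")
    spread(df, "Col1", "Col2").show()

    spark.stop()
  }
}
```

The trade-off is an extra job to compute the maximum size, and rows with shorter arrays end up with null in the trailing columns.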