Spark SQL列操作 [英] Spark SQL Column Manipulation
本文介绍了Spark SQL列操作的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
我有一个数据集,该数据集的下方低于下方.
I have a Dataset which has below Below Cols .
df.show();
+--------+---------+---------+---------+---------+
| Col1 | Col2 | Expend1 | Expend2 | Expend3 |
+--------+---------+---------+---------+---------+
| Value1 | Cvalue1 | 123 | 2254 | 22 |
| Value1 | Cvalue2 | 124 | 2255 | 23 |
+--------+---------+---------+---------+---------+
我希望使用某些联接或多维数据集或任何操作将其更改为以下格式.
I want that to be changed to this below format using some joins or cube or any Operations.
1.
+--------+---------+------+
| Value1 | Cvalue1 | 123 |
| Value1 | Cvalue1 | 2254 |
| Value1 | Cvalue1 | 22 |
| Value1 | Cvalue1 | 124 |
| Value1 | Cvalue1 | 2255 |
| Value1 | Cvalue1 | 23 |
+--------+---------+------+
或者如果使用这种格式,则更好
or Better if this format
2.
+--------+---------+---------+------+
| Value1 | Cvalue1 | Expend1 | 123 |
| Value1 | Cvalue1 | Expend2 | 2254 |
| Value1 | Cvalue1 | Expend3 | 22 |
| Value1 | Cvalue1 | Expend1 | 124 |
| Value1 | Cvalue1 | Expend2 | 2255 |
| Value1 | Cvalue1 | Expend3 | 23 |
+--------+---------+---------+------+
我能否实现以上两种可能的格式.如果在#1的情况下,我是否可以获取Last value的列名称,无论它是Expend1还是Expend 2还是Expend3.
Can I be able to achieve this above two Possible format. If In case of #1 , can I get the Column name of Last value , whether it is Expend1 or Expend 2 or Expend3.
推荐答案
功能 map
然后 explode
可以使用:
val data = List(
("Value1", "Cvalue1", 123, 2254, 22),
("Value1", "Cvalue2", 124, 2255, 23)
)
val df = data.toDF("Col1", "Col2", "Expend1", "Expend2", "Expend3")
// action
val unpivotedColumns = List("Expend1", "Expend2", "Expend3")
val columnMapping = unpivotedColumns.foldLeft(new ArrayBuffer[Column]())((acc, current) => {
acc += lit(current)
acc += col(current)
})
val mapped = df.select($"Col1", $"Col2", map(columnMapping: _*).alias("result"))
val result = mapped.select($"Col1", $"Col2", explode($"result"))
result.show(false)
结果是:
+------+-------+-------+-----+
|Col1 |Col2 |key |value|
+------+-------+-------+-----+
|Value1|Cvalue1|Expend1|123 |
|Value1|Cvalue1|Expend2|2254 |
|Value1|Cvalue1|Expend3|22 |
|Value1|Cvalue2|Expend1|124 |
|Value1|Cvalue2|Expend2|2255 |
|Value1|Cvalue2|Expend3|23 |
+------+-------+-------+-----+
这篇关于Spark SQL列操作的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
查看全文