Spark SQL Column Manipulation
Question
I have a Dataset with the columns below.
df.show();
+--------+---------+---------+---------+---------+
| Col1 | Col2 | Expend1 | Expend2 | Expend3 |
+--------+---------+---------+---------+---------+
| Value1 | Cvalue1 | 123 | 2254 | 22 |
| Value1 | Cvalue2 | 124 | 2255 | 23 |
+--------+---------+---------+---------+---------+
I want to convert it to the format below using joins, cube, or any other operation.
1.
+--------+---------+------+
| Value1 | Cvalue1 | 123 |
| Value1 | Cvalue1 | 2254 |
| Value1 | Cvalue1 | 22 |
| Value1 | Cvalue1 | 124 |
| Value1 | Cvalue1 | 2255 |
| Value1 | Cvalue1 | 23 |
+--------+---------+------+
Or this format, if it is better:
2.
+--------+---------+---------+------+
| Value1 | Cvalue1 | Expend1 | 123 |
| Value1 | Cvalue1 | Expend2 | 2254 |
| Value1 | Cvalue1 | Expend3 | 22 |
| Value1 | Cvalue1 | Expend1 | 124 |
| Value1 | Cvalue1 | Expend2 | 2255 |
| Value1 | Cvalue1 | Expend3 | 23 |
+--------+---------+---------+------+
Can I achieve either of the two formats above? In the case of #1, can I also get the column name for the last value (whether it is Expend1, Expend2, or Expend3)?
Recommended answer
The functions map and then explode can be used:
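import org.apache.spark.sql.functions.{col, explode, lit, map}
import spark.implicits._ // assumes a SparkSession named spark, as in spark-shell

val data = List(
  ("Value1", "Cvalue1", 123, 2254, 22),
  ("Value1", "Cvalue2", 124, 2255, 23)
)
val df = data.toDF("Col1", "Col2", "Expend1", "Expend2", "Expend3")

// Columns to unpivot into (key, value) rows
val unpivotedColumns = List("Expend1", "Expend2", "Expend3")
// map() expects alternating key and value columns: lit(name), col(name), ...
val columnMapping = unpivotedColumns.flatMap(c => Seq(lit(c), col(c)))
// Pack the columns into one map column, then explode it into one row per entry
val mapped = df.select($"Col1", $"Col2", map(columnMapping: _*).alias("result"))
val result = mapped.select($"Col1", $"Col2", explode($"result"))
result.show(false)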
The result is:
+------+-------+-------+-----+
|Col1 |Col2 |key |value|
+------+-------+-------+-----+
|Value1|Cvalue1|Expend1|123 |
|Value1|Cvalue1|Expend2|2254 |
|Value1|Cvalue1|Expend3|22 |
|Value1|Cvalue2|Expend1|124 |
|Value1|Cvalue2|Expend2|2255 |
|Value1|Cvalue2|Expend3|23 |
+------+-------+-------+-----+
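The result above corresponds to format #2. As a rough sketch of how format #1 could be obtained, and of an alternative to map and explode, the snippet below uses Spark SQL's built-in stack generator through selectExpr and then drops the key column. This is not the answer's method; the names withNames and valuesOnly, the application name, and the local[*] master are assumptions made for illustration only.

```scala
import org.apache.spark.sql.SparkSession

// Local session for illustration only; in spark-shell a session named `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("unpivot-sketch") // hypothetical application name
  .getOrCreate()
import spark.implicits._

val df = List(
  ("Value1", "Cvalue1", 123, 2254, 22),
  ("Value1", "Cvalue2", 124, 2255, 23)
).toDF("Col1", "Col2", "Expend1", "Expend2", "Expend3")

// stack(n, k1, v1, k2, v2, ...) emits n rows per input row,
// here one (key, value) pair per Expend column: this is format #2.
val withNames = df.selectExpr(
  "Col1",
  "Col2",
  "stack(3, 'Expend1', Expend1, 'Expend2', Expend2, 'Expend3', Expend3) as (key, value)"
)

// Format #1: the same rows without the source column name.
val valuesOnly = withNames.select($"Col1", $"Col2", $"value")

withNames.show(false)
valuesOnly.show(false)
```

Since format #1 is just format #2 without the key column, keeping the key and dropping it only where it is not needed covers both requests, including knowing which Expend column each value came from.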