Spark SQL Column Manipulation


Question

I have a Dataset with the columns below.

df.show();

+--------+---------+---------+---------+---------+
|  Col1  |  Col2   | Expend1 | Expend2 | Expend3 |
+--------+---------+---------+---------+---------+
| Value1 | Cvalue1 |     123 |    2254 |      22 |
| Value1 | Cvalue2 |     124 |    2255 |      23 |
+--------+---------+---------+---------+---------+

I want to change it to one of the formats below, using a join, cube, or any other operation.

1.

    +--------+---------+------+
    | Value1 | Cvalue1 |  123 |
    | Value1 | Cvalue1 | 2254 |
    | Value1 | Cvalue1 |   22 |
    | Value1 | Cvalue2 |  124 |
    | Value1 | Cvalue2 | 2255 |
    | Value1 | Cvalue2 |   23 |
    +--------+---------+------+

Or, if this format works better:

2.

+--------+---------+---------+------+
| Value1 | Cvalue1 | Expend1 |  123 |
| Value1 | Cvalue1 | Expend2 | 2254 |
| Value1 | Cvalue1 | Expend3 |   22 |
| Value1 | Cvalue2 | Expend1 |  124 |
| Value1 | Cvalue2 | Expend2 | 2255 |
| Value1 | Cvalue2 | Expend3 |   23 |
+--------+---------+---------+------+

Is it possible to achieve either of the two formats above? In case of #1, can I also get the column name each value came from, whether it is Expend1, Expend2, or Expend3?

Answer

The functions map and then explode can be used:

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, explode, lit, map}
import scala.collection.mutable.ArrayBuffer
import spark.implicits._  // assumes a SparkSession in scope named `spark` (e.g. in spark-shell)

val data = List(
  ("Value1", "Cvalue1", 123, 2254, 22),
  ("Value1", "Cvalue2", 124, 2255, 23)
)
val df = data.toDF("Col1", "Col2", "Expend1", "Expend2", "Expend3")

// build an alternating sequence of (column-name literal, column value) for map()
val unpivotedColumns = List("Expend1", "Expend2", "Expend3")
val columnMapping = unpivotedColumns.foldLeft(new ArrayBuffer[Column]())((acc, current) => {
  acc += lit(current)
  acc += col(current)
})

// pack the pairs into a map column, then explode it into key/value rows
val mapped = df.select($"Col1", $"Col2", map(columnMapping: _*).alias("result"))
val result = mapped.select($"Col1", $"Col2", explode($"result"))
result.show(false)

The result is:

+------+-------+-------+-----+
|Col1  |Col2   |key    |value|
+------+-------+-------+-----+
|Value1|Cvalue1|Expend1|123  |
|Value1|Cvalue1|Expend2|2254 |
|Value1|Cvalue1|Expend3|22   |
|Value1|Cvalue2|Expend1|124  |
|Value1|Cvalue2|Expend2|2255 |
|Value1|Cvalue2|Expend3|23   |
+------+-------+-------+-----+
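
If you prefer not to build the Column list by hand, the same unpivot can also be expressed with the SQL stack function (a minimal sketch, assuming Spark 2.x or later; the column names mirror the example above):

// stack(3, ...) emits three key/value rows per input row, one per Expend column
val stacked = df.selectExpr(
  "Col1", "Col2",
  "stack(3, 'Expend1', Expend1, 'Expend2', Expend2, 'Expend3', Expend3) as (key, value)"
)
stacked.show(false)

This produces the same key/value layout as format #2; the trade-off is that the unpivoted column names are hard-coded in the expression string. Spark 3.4 and later also ship a built-in Dataset.unpivot (melt) method for this kind of reshaping.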
