How to transpose dataframe in Spark 1.5 (no pivot operator available)?
Question
I want to transpose the following table using Spark with Scala, without the pivot function. I am using Spark 1.5.1, and the pivot function is not supported in 1.5.1. Please suggest a suitable method to transpose the following table:
Customer  Day  Sales
1         Mon  12
1         Tue  10
1         Thu  15
1         Fri  2
2         Sun  10
2         Wed  5
2         Thu  4
2         Fri  3
Output table:
Customer  Sun  Mon  Tue  Wed  Thu  Fri
1         0    12   10   0    15   2
2         10   0    0    5    4    3
The following code does not work, as I am using Spark 1.5.1 and the pivot function is only available from Spark 1.6:
val Trans = Cust_Sales.groupBy("Customer").pivot("Day").sum("Sales")
Answer
Not sure how efficient that is, but you can use collect to get all the distinct days, then add these columns, then use groupBy and sum:
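(The code below assumes the sample data is already loaded in a DataFrame named df. As a hypothetical setup for trying it out, assuming an existing SQLContext named sqlContext, it could be built like this:)

```scala
// Hypothetical setup: build the question's sample data as a DataFrame
// named df. Assumes a SQLContext called sqlContext is in scope.
import sqlContext.implicits._

val rows = Seq(
  (1, "Mon", 12), (1, "Tue", 10), (1, "Thu", 15), (1, "Fri", 2),
  (2, "Sun", 10), (2, "Wed", 5), (2, "Thu", 4), (2, "Fri", 3)
)

val df = rows.toDF("Customer", "Day", "Sales")
```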
// get distinct days from data (this assumes there are not too many of them):
val days: Array[String] = df.select("Day")
    .distinct()
    .collect()
    .map(_.getAs[String]("Day"))

// add column for each day with the Sale value if days match:
val withDayColumns = days.foldLeft(df) {
  case (data, day) => data.selectExpr("*", s"IF(Day = '$day', Sales, 0) AS $day")
}

// wrap it up
val result = withDayColumns
    .drop("Day")
    .drop("Sales")
    .groupBy("Customer")
    .sum(days: _*)
result.show()
Which prints (almost) what you wanted:
+--------+--------+--------+--------+--------+--------+--------+
|Customer|sum(Tue)|sum(Thu)|sum(Sun)|sum(Fri)|sum(Mon)|sum(Wed)|
+--------+--------+--------+--------+--------+--------+--------+
| 1| 10| 15| 0| 2| 12| 0|
| 2| 0| 4| 10| 3| 0| 5|
+--------+--------+--------+--------+--------+--------+--------+
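As a side note, the pivot-by-fold idea itself can be sanity-checked without Spark on plain Scala collections. This is a minimal sketch using the question's sample data (the Sale case class is a hypothetical name introduced here for illustration):

```scala
// Pure-Scala sketch of the same pivot: group by customer, then for
// each day sum the matching sales (0 if that day is absent).
case class Sale(customer: Int, day: String, sales: Int)

val rows = Seq(
  Sale(1, "Mon", 12), Sale(1, "Tue", 10), Sale(1, "Thu", 15), Sale(1, "Fri", 2),
  Sale(2, "Sun", 10), Sale(2, "Wed", 5), Sale(2, "Thu", 4), Sale(2, "Fri", 3)
)

val days = Seq("Sun", "Mon", "Tue", "Wed", "Thu", "Fri")

val pivoted: Map[Int, Seq[Int]] = rows.groupBy(_.customer).map {
  case (cust, sales) =>
    cust -> days.map(d => sales.filter(_.day == d).map(_.sales).sum)
}

pivoted.toSeq.sortBy(_._1).foreach { case (c, vals) =>
  println((c +: vals).mkString(" "))
}
// prints:
// 1 0 12 10 0 15 2
// 2 10 0 0 5 4 3
```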
I'll leave it to you to rename / reorder the columns if needed.
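For instance, one sketch of that cleanup (assuming the column names follow the sum(Day) pattern shown above, and that result is the DataFrame produced earlier):

```scala
// Hypothetical cleanup: strip the sum(...) wrappers and put the day
// columns back in weekday order. Assumes `result` from the code above.
val dayOrder = Seq("Sun", "Mon", "Tue", "Wed", "Thu", "Fri")

val renamed = dayOrder.foldLeft(result) {
  case (data, day) => data.withColumnRenamed(s"sum($day)", day)
}

val ordered = renamed.select("Customer", dayOrder: _*)
ordered.show()
```

withColumnRenamed is a no-op for a non-existent column, so folding over the full day list is safe even if some day never occurs in the data.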