Pivot in Spark Scala
Problem description
I have a df like this.
+---+-----+-----+----+
| M|M_Max|Sales|Rank|
+---+-----+-----+----+
| M1| 100| 200| 1|
| M1| 100| 175| 2|
| M1| 101| 150| 3|
| M1| 100| 125| 4|
| M1| 100| 90| 5|
| M1| 100| 85| 6|
| M2| 200| 1001| 1|
| M2| 200| 500| 2|
| M2| 201| 456| 3|
| M2| 200| 345| 4|
| M2| 200| 231| 5|
| M2| 200| 123| 6|
+---+-----+-----+----+
I am doing a pivot operation on top of this df, like this:
df.groupBy("M").pivot("Rank").agg(first("Sales")).show
+---+----+---+---+---+---+---+
| M| 1| 2| 3| 4| 5| 6|
+---+----+---+---+---+---+---+
| M1| 200|175|150|125| 90| 85|
| M2|1001|500|456|345|231|123|
+---+----+---+---+---+---+---+
But my expected output is like below: I need to get an extra column, Max(M_Max), in the output, where M_Max is the maximum of the M_Max column. Is this possible with the pivot function, without using df joins?
+---+----+---+---+---+---+---+-----+
| M| 1| 2| 3| 4| 5| 6|M_Max|
+---+----+---+---+---+---+---+-----+
| M1| 200|175|150|125| 90| 85| 101|
| M2|1001|500|456|345|231|123| 201|
+---+----+---+---+---+---+---+-----+
Answer
The trick is to apply window functions. The solution is given below:
scala> val df = Seq(
     |   ("M1",100,200,1),
     |   ("M1",100,175,2),
     |   ("M1",101,150,3),
     |   ("M1",100,125,4),
     |   ("M1",100,90,5),
     |   ("M1",100,85,6),
     |   ("M2",200,1001,1),
     |   ("M2",200,500,2),
     |   ("M2",200,456,3),
     |   ("M2",200,345,4),
     |   ("M2",200,231,5),
     |   ("M2",201,123,6)
     | ).toDF("M","M_Max","Sales","Rank")
df: org.apache.spark.sql.DataFrame = [M: string, M_Max: int ... 2 more fields]
scala> import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.expressions.Window
scala> import org.apache.spark.sql.functions.{first, max}
import org.apache.spark.sql.functions.{first, max}
scala> val w = Window.partitionBy("M")
w: org.apache.spark.sql.expressions.WindowSpec = org.apache.spark.sql.expressions.WindowSpec@49b4e11c
scala> df.withColumn("new", max("M_Max") over (w)).groupBy("M", "new").pivot("Rank").agg(first("Sales")).withColumnRenamed("new", "M_Max").show
+---+-----+----+---+---+---+---+---+
| M|M_Max| 1| 2| 3| 4| 5| 6|
+---+-----+----+---+---+---+---+---+
| M1| 101| 200|175|150|125| 90| 85|
| M2| 201|1001|500|456|345|231|123|
+---+-----+----+---+---+---+---+---+
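Note that this output puts M_Max right after M, while the question's expected layout has it last. The pivoted columns can be reordered with a select before showing the result. Below is a minimal self-contained sketch of the same window trick plus that reordering, assuming a local SparkSession; the object and helper names (`PivotReorder`, `moveToEnd`) are illustrative, not part of any Spark API:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{first, max}

object PivotReorder {
  // pure helper: move one column name to the end of a column list
  def moveToEnd(cols: Seq[String], col: String): Seq[String] =
    cols.filterNot(_ == col) :+ col

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[1]").appName("pivot-reorder").getOrCreate()
    import spark.implicits._

    val df = Seq(
      ("M1",100,200,1),  ("M1",100,175,2), ("M1",101,150,3),
      ("M1",100,125,4),  ("M1",100,90,5),  ("M1",100,85,6),
      ("M2",200,1001,1), ("M2",200,500,2), ("M2",200,456,3),
      ("M2",200,345,4),  ("M2",200,231,5), ("M2",201,123,6)
    ).toDF("M","M_Max","Sales","Rank")

    // same trick as above: overwrite M_Max with its per-group maximum,
    // so it can be carried through the groupBy without a later rename
    val w = Window.partitionBy("M")
    val pivoted = df
      .withColumn("M_Max", max("M_Max").over(w))
      .groupBy("M", "M_Max")
      .pivot("Rank")
      .agg(first("Sales"))

    // reorder so M_Max comes last, matching the expected output
    val ordered = moveToEnd(pivoted.columns.toSeq, "M_Max")
    pivoted.select(ordered.map(pivoted.col): _*).show()

    spark.stop()
  }
}
```

Overwriting M_Max with `withColumn` before the groupBy avoids the separate `withColumnRenamed` step used in the REPL session above.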
scala> df.show
+---+-----+-----+----+
| M|M_Max|Sales|Rank|
+---+-----+-----+----+
| M1| 100| 200| 1|
| M1| 100| 175| 2|
| M1| 101| 150| 3|
| M1| 100| 125| 4|
| M1| 100| 90| 5|
| M1| 100| 85| 6|
| M2| 200| 1001| 1|
| M2| 200| 500| 2|
| M2| 200| 456| 3|
| M2| 200| 345| 4|
| M2| 200| 231| 5|
| M2| 201| 123| 6|
+---+-----+-----+----+
Let me know if it helps!!