火花sql窗口功能滞后 [英] spark sql window function lag
本文介绍了火花sql窗口功能滞后的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
我正在查看Scala中Spark DataFrame
的window
幻灯片功能.
I am looking at the window
slide function for a Spark DataFrame
in Scala.
我有一个DataFrame
,其中有Col1
,Col2
,Col3
,date
,volume
和new_col
列.
I have a DataFrame
with columns Col1
, Col2
, Col3
, date
, volume
and new_col
.
Col1 Col2 Col3 date volume new_col
201601 100.5
201602 120.6 100.5
201603 450.2 120.6
201604 200.7 450.2
201605 121.4 200.7`
现在,我想添加一个名称为(new_col
)的新列,并向下滑动一行,如上所示.
Now I want to add a new column with name(new_col
) with one row slided down, as shown above.
我尝试了以下使用窗口功能的选项.
I tried below option to use the window function.
val windSldBrdrxNrx_df = df.withColumn("Prev_brand_rx", lag("Prev_brand_rx",1))
您有什么建议吗?
推荐答案
您所做的正确是lag
val df = sc.parallelize(Seq((201601, 100.5),
(201602, 120.6),
(201603, 450.2),
(201604, 200.7),
(201605, 121.4))).toDF("date", "volume")
val w = org.apache.spark.sql.expressions.Window.orderBy("date")
import org.apache.spark.sql.functions.lag
val leadDf = df.withColumn("new_col", lag("volume", 1, 0).over(w))
leadDf.show()
+------+------+-------+
| date|volume|new_col|
+------+------+-------+
|201601| 100.5| 0.0|
|201602| 120.6| 100.5|
|201603| 450.2| 120.6|
|201604| 200.7| 450.2|
|201605| 121.4| 200.7|
+------+------+-------+
此代码在Spark shell 2.0.2上运行
This code was run on Spark shell 2.0.2
这篇关于火花sql窗口功能滞后的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
查看全文