How to divide the value of the current row by the following one?
Problem description
In Spark SQL version 1.6, using DataFrames, is there a way to calculate, for a specific column, the ratio of the current row to the next one, for every row?
For example, if I have a table with one column, like so
Age
100
50
20
4
I would like the following output:
Fraction
2
2.5
5
The last row is dropped because it has no "next row" to be divided by.
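The intended computation can be sketched in plain Scala on an in-memory collection (no Spark involved; the object name `ConsecutiveRatio` is hypothetical and the `ages` values are the example column above, assumed to already be in display order). Pairing each value with its successor via `zip` drops the last row naturally, because `zip` stops at the shorter sequence:

```scala
object ConsecutiveRatio {
  // Ratio of each value to the value that follows it.
  // drop(1) is safe on an empty Seq; zip stops at the shorter side,
  // so the last element has no partner and is dropped.
  def fractions(ages: Seq[Double]): Seq[Double] =
    ages.zip(ages.drop(1)).map { case (current, next) => current / next }

  def main(args: Array[String]): Unit = {
    val ages = Seq(100.0, 50.0, 20.0, 4.0)
    println(fractions(ages)) // List(2.0, 2.5, 5.0)
  }
}
```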
Right now I am doing it by ranking the table and joining it with itself where rank equals rank + 1.
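For comparison, the rank-and-self-join approach described above can be mimicked on a plain Scala collection: number the rows, then match each row of rank `r` with the row of rank `r + 1`. This is only an in-memory illustration of the join condition, not Spark code, and `RankSelfJoin` is a hypothetical name:

```scala
object RankSelfJoin {
  // Assign ranks 1, 2, 3, ... in the given (already sorted) order.
  def ranked(ages: Seq[Double]): Seq[(Int, Double)] =
    ages.zipWithIndex.map { case (age, i) => (i + 1, age) }

  // Self-join on nextRank == rank + 1; like an inner join,
  // the last rank has no partner and produces no output row.
  def fractions(ages: Seq[Double]): Seq[Double] = {
    val rows = ranked(ages)
    for {
      (rank, age) <- rows
      (nextRank, nextAge) <- rows
      if nextRank == rank + 1
    } yield age / nextAge
  }

  def main(args: Array[String]): Unit =
    println(fractions(Seq(100.0, 50.0, 20.0, 4.0))) // List(2.0, 2.5, 5.0)
}
```

The nested loop compares every pair of rows, which is what a join on a computed condition amounts to; a window function avoids materializing those pairs.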
Is there a better way to do this? Can this be done with a Window function?
Recommended answer
A Window function can do only part of the job; the rest can be done by defining a udf function:
// div = previous (lagged) Age divided by the current Age
val div = udf((age: Double, lag: Double) => lag / age)
First, we need to find the lag using a Window function, and then pass that lag and the age to the udf function to compute div:
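// Note: in Spark 1.x, window functions such as lag require sqlContext to be a HiveContext
import sqlContext.implicits._
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._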
val dataframe = Seq(
("A",100),
("A",50),
("A",20),
("A",4)
).toDF("person", "Age")
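// Within each person, order rows by Age, largest first
val windowSpec = Window.partitionBy("person").orderBy(col("Age").desc)
// lag = Age of the previous row in that order (null for the first row)
val newDF = dataframe.withColumn("lag", lag(dataframe("Age"), 1).over(windowSpec))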
Finally, call the udf function:
newDF.filter(newDF("lag").isNotNull).withColumn("div", div(newDF("Age"), newDF("lag"))).drop("Age").drop("lag").show()
The final output should be:
+------+---+
|person|div|
+------+---+
| A|2.0|
| A|2.5|
| A|5.0|
+------+---+
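To see what `lag` contributes, the same steps can be emulated on a plain Scala collection. This is an illustration only, not how Spark evaluates the query, and `LagEmulation` and `lag1` are hypothetical names: sort by Age descending as `windowSpec` does, shift the column down by one row so each row sees the previous value (`None` for the first row, where Spark's `lag` yields null), drop the row without a lag, and compute `lag / Age`:

```scala
object LagEmulation {
  // Emulates lag(col, 1): each row sees the previous row's value;
  // the first row gets None (null in Spark).
  def lag1(values: Seq[Double]): Seq[Option[Double]] =
    if (values.isEmpty) Seq.empty
    else None +: values.init.map(Option(_))

  def div(ages: Seq[Double]): Seq[Double] = {
    val ordered = ages.sortBy(a => -a) // like orderBy(col("Age").desc)
    ordered.zip(lag1(ordered)).collect {
      // keep only rows whose lag is non-null, then compute lag / Age
      case (age, Some(prev)) => prev / age
    }
  }

  def main(args: Array[String]): Unit =
    println(div(Seq(100.0, 50.0, 20.0, 4.0))) // List(2.0, 2.5, 5.0)
}
```

Each result is attached to the later row of a pair (50, 20, 4), whereas the question attaches it to the earlier row (100, 50, 20); the values are the same either way.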
Edit:
As @Jacek suggested, a better solution is to use .na.drop instead of .filter(newDF("lag").isNotNull) together with the built-in / operator, so we don't even need to call the udf function:
newDF.na.drop().withColumn("div", newDF("lag") / newDF("Age")).drop("Age").drop("lag").show()