How to use orderby() with descending order in Spark window functions?


Question

I need a window function that partitions by some keys (= column names), orders by another column name, and returns the rows with the top x ranks.

This works fine for ascending order:

def getTopX(df: DataFrame, top_x: String, top_key: String, top_value: String): DataFrame = {
  val top_keys: List[String] = top_key.split(",").map(_.trim).toList
  val w = Window.partitionBy(top_keys.head, top_keys.tail: _*) // first key, then the remaining keys
    .orderBy(top_value)
  // Note: "rn < top_x" keeps ranks 1 to top_x - 1; use "<=" to include rank top_x.
  val rankCondition = "rn < " + top_x
  df.withColumn("rn", row_number().over(w))
    .where(rankCondition)
    .drop("rn")
}

But when I try to change it to orderBy(desc(top_value)) or orderBy(top_value.desc) in line 4, I get a syntax error. What's the correct syntax here?

Recommended answer

There are two versions of orderBy: one that takes column names as strings and one that takes Column objects (API). Your code uses the string version, which does not let you change the sort order. You need to switch to the Column version and then call the desc method on the column, e.g., myCol.desc.

Now, we get into API design territory. The advantage of passing Column parameters is that you have a lot more flexibility, e.g., you can pass expressions rather than only plain column names. If you want to maintain an API that takes a string as opposed to a Column, you need to convert the string to a Column. There are a number of ways to do this, and the easiest is to use org.apache.spark.sql.functions.col(myColName).
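As a rough illustration of the string-to-Column conversion described above, the sketch below shows two ways to build a descending sort Column from a column name. The object name and the column names "dept" and "salary" are hypothetical and only there for the example; col and desc are the functions from org.apache.spark.sql.functions.

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, desc}

object DescendingOrderSketch {
  // Hypothetical column names, used only for illustration.
  val partitionKey = "dept"
  val sortKey = "salary"

  // Option 1: wrap the name with col(...) and call .desc on the resulting Column.
  val byColDesc: Column = col(sortKey).desc

  // Option 2: functions.desc(...) builds a descending sort Column directly from a name.
  val byDescFn: Column = desc(sortKey)

  // Either Column can be passed to the Column-based orderBy overload.
  val w1 = Window.partitionBy(partitionKey).orderBy(byColDesc)
  val w2 = Window.partitionBy(partitionKey).orderBy(byDescFn)
}
```

Note that top_value.desc cannot compile when top_value is a plain String, because String has no desc method. desc(top_value) should compile once org.apache.spark.sql.functions.desc is imported, so a missing import may have been the cause of the error in the question; that is a guess, since the question does not show its imports.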

Putting it all together, we get

.orderBy(org.apache.spark.sql.functions.col(top_value).desc)
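For context, here is one way the whole function from the question might look with the descending order applied. This is a sketch under a few assumptions that are not in the original: the limit is passed as an Int instead of a String, the rank filter uses <= so that exactly topX rows per partition are kept, and the object name, the local SparkSession, and the sample data are made up for the demonstration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

object TopXDescending {
  // Keeps the topX rows per partition, ranked by topValue in descending order.
  // topKey is a comma-separated list of partition column names.
  def getTopX(df: DataFrame, topX: Int, topKey: String, topValue: String): DataFrame = {
    val keys = topKey.split(",").map(_.trim).toList
    val w = Window
      .partitionBy(keys.map(col(_)): _*) // Column-based overload
      .orderBy(col(topValue).desc)       // descending sort on the value column
    df.withColumn("rn", row_number().over(w))
      .where(col("rn") <= topX)          // assumption: include rank topX itself
      .drop("rn")
  }

  def main(args: Array[String]): Unit = {
    // Local session and sample data are illustrative only.
    val spark = SparkSession.builder().master("local[1]").appName("top-x-desc").getOrCreate()
    import spark.implicits._

    val df = Seq(("a", 1), ("a", 5), ("a", 3), ("b", 2), ("b", 9)).toDF("key", "value")
    getTopX(df, 2, "key", "value").show()
    spark.stop()
  }
}
```

With the sample data, the two largest values per key are kept: 5 and 3 for key a, 9 and 2 for key b. The row order of the printed result is not guaranteed.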
