How to use orderBy() with descending order in Spark window functions?

Problem description

I need a window function that partitions by some keys (column names), orders by another column, and returns the rows with the top x ranks.

This works fine for ascending order:

def getTopX(df: DataFrame, top_x: String, top_key: String, top_value: String): DataFrame = {
    val top_keys: List[String] = top_key.split(", ").map(_.trim).toList  // comma-separated partition keys
    val w = Window.partitionBy(top_keys.head, top_keys.tail: _*)  // first key, then the remaining keys
       .orderBy(top_value)  // String overload: ascending order only
    val rankCondition = "rn < " + top_x
    val dfTop = df.withColumn("rn", row_number().over(w))
      .where(rankCondition).drop("rn")
    dfTop
}

But when I try to change it to orderBy(desc(top_value)) or orderBy(top_value.desc) in line 4, I get a syntax error. What's the correct syntax here?

Recommended answer

There are two versions of orderBy: one that takes column names as strings and one that takes Column objects (see the API). Your code uses the first version, which does not allow changing the sort order. You need to switch to the Column version and then call the desc method on the column, e.g., myCol.desc.

Now we get into API design territory. The advantage of passing Column parameters is that you have a lot more flexibility, e.g., you can use expressions. If you want to keep an API that takes a string rather than a Column, you need to convert the string to a column. There are a number of ways to do this; the easiest is to use org.apache.spark.sql.functions.col(myColName).
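
As a rough illustration of the string-to-Column conversion described above, the sketch below builds a descending sort expression in two ways. The column names "score" and "key" and the object name are hypothetical, chosen only for the example; functions.desc is a second built-in route to the same kind of expression and is not part of the original answer.

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.expressions.{Window, WindowSpec}
import org.apache.spark.sql.functions.{col, desc}

object DescendingOrderSketch {
  // Hypothetical column names, used only for illustration.
  val myColName = "score"
  val keyColName = "key"

  // Convert the string to a Column, then ask for descending order.
  val viaCol: Column = col(myColName).desc

  // functions.desc builds a descending sort expression directly from a column name.
  val viaDesc: Column = desc(myColName)

  // Either expression can be passed to the Column overload of orderBy.
  val w1: WindowSpec = Window.partitionBy(col(keyColName)).orderBy(viaCol)
  val w2: WindowSpec = Window.partitionBy(col(keyColName)).orderBy(viaDesc)
}
```

Both window specifications should sort each partition by the same column in descending order; which form to use is largely a matter of taste.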

Putting it all together, we get:

.orderBy(org.apache.spark.sql.functions.col(top_value).desc)
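
For context, here is a minimal sketch of how the whole function might look with that change applied. It is not the asker's exact code: the names getTopXDesc, keyCols, valueCol, and topX are hypothetical, the keys are passed as a Seq instead of a comma-separated string, and the filter uses rn <= topX (an assumption, so that exactly topX rows per partition are kept) where the question's code uses rn <. The sample data and the local SparkSession are only there to make the example runnable.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

object TopXExample {
  // Keeps the first `topX` rows of each partition, ranked by `valueCol` in descending order.
  def getTopXDesc(df: DataFrame, topX: Int, keyCols: Seq[String], valueCol: String): DataFrame = {
    val w = Window
      .partitionBy(keyCols.map(col): _*) // Column overload of partitionBy
      .orderBy(col(valueCol).desc)       // Column overload of orderBy, descending
    df.withColumn("rn", row_number().over(w))
      .where(col("rn") <= topX)          // assumption: keep ranks 1..topX inclusive
      .drop("rn")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("top-x-desc")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data for illustration.
    val df = Seq(("a", 1), ("a", 5), ("a", 3), ("b", 2), ("b", 9)).toDF("key", "value")

    getTopXDesc(df, 2, Seq("key"), "value").show()
    spark.stop()
  }
}
```

Taking the keys as a Seq[String] avoids parsing a delimited string inside the function; a caller that only has a comma-separated string can split and trim it before the call.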
