How do we rank a DataFrame?

Views: 57 | Published: 2020/9/4 6:03:37 | Tags: scala apache-spark apache-spark-sql


Question

I have a sample DataFrame like this:

Input:

accountNumber   assetValue  
A100            1000         
A100            500          
B100            600          
B100            200

Desired output:

AccountNumber   assetValue  Rank
A100            1000         1
A100            500          2
B100            600          1
B100            200          2

My question is: how do we add this rank column to the DataFrame, which is sorted by account number? I am not expecting a huge volume of rows, so I am open to ideas that do the ranking outside of the DataFrame.

I am using Spark version 1.5 with SQLContext, and therefore cannot use window functions.

Recommended answer

You can use the row_number function together with a Window expression, which lets you specify the partition and order columns:

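import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number
// Needed for toDF and the $"col" syntax; already in scope in spark-shell,
// where the session is available as `spark`.
import spark.implicits._

val df = Seq(("A100", 1000), ("A100", 500), ("B100", 600), ("B100", 200)).toDF("accountNumber", "assetValue")

// Number the rows within each account, highest assetValue first.
val byAccount = Window.partitionBy($"accountNumber").orderBy($"assetValue".desc)
df.withColumn("rank", row_number().over(byAccount)).show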

+-------------+----------+----+
|accountNumber|assetValue|rank|
+-------------+----------+----+
|         A100|      1000|   1|
|         A100|       500|   2|
|         B100|       600|   1|
|         B100|       200|   2|
+-------------+----------+----+
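
If window functions are not an option (as far as I know, on Spark 1.x they require a HiveContext rather than a plain SQLContext), the question already allows ranking outside the DataFrame for small data. The following is only a minimal sketch of that idea using plain Scala collections, assuming the rows have already been brought to the driver (for example with collect()). The names `Asset` and `rankWithinAccount` are hypothetical and are not part of Spark or of the original answer.

```scala
// Hypothetical row type mirroring the two columns from the question.
final case class Asset(accountNumber: String, assetValue: Int)

// Assigns a 1-based rank within each account, highest assetValue first.
// Like row_number, ties get distinct consecutive numbers; because sortBy
// is stable, tied rows keep their input order.
def rankWithinAccount(rows: Seq[Asset]): Seq[(Asset, Int)] =
  rows
    .groupBy(_.accountNumber)                 // one group per account
    .toSeq
    .sortBy { case (account, _) => account }  // keep accounts in a fixed order
    .flatMap { case (_, group) =>
      group
        .sortBy(_.assetValue)(Ordering[Int].reverse)
        .zipWithIndex
        .map { case (asset, idx) => (asset, idx + 1) }
    }

val sample = Seq(Asset("A100", 1000), Asset("A100", 500), Asset("B100", 600), Asset("B100", 200))
val ranked = rankWithinAccount(sample)

ranked.foreach { case (a, r) => println(s"${a.accountNumber} ${a.assetValue} $r") }
// A100 1000 1
// A100 500 2
// B100 600 1
// B100 200 2
```

This only makes sense while the data comfortably fits in driver memory. For larger data the window-based version above is the better fit.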
