How to sort the data on multiple columns in Apache Spark Scala?

Published: 2021/7/15 20:58:03 · Tags: scala, apache-spark

Problem description

I have a data set like this, which I read from a CSV file and convert into an RDD using Scala.

+--------+------+---------+
| recent | Freq | Monitor |
+--------+------+---------+
|      1 | 1234 |  199090 |
|      4 | 2553 |  198613 |
|      6 | 3232 |  199090 |
|      1 | 8823 |  498831 |
|      7 | 2902 |  890000 |
|      8 | 7991 |  081097 |
|      9 | 7391 |  432370 |
|     12 | 6138 |  864981 |
|      7 | 6812 |  749821 |
+--------+------+---------+

How can I sort the data on all columns?

Thanks.

Recommended answer

Suppose your input is a DataFrame called df. The column-based sort below is a DataFrame operation, so if you still have an RDD, convert it to a DataFrame first.

To sort recent in descending order and both Freq and Monitor in ascending order, you can do:

import org.apache.spark.sql.functions._

val sorted = df.sort(desc("recent"), asc("Freq"), asc("Monitor"))
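
For reference, here is a minimal end-to-end sketch of the same sort, starting from an RDD as in the question. The object name MultiColumnSort, the helper sortedFrame, the local master setting, and the choice to store Monitor as a String (so the leading zero in 081097 is preserved) are all assumptions made for illustration, not part of the original answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{asc, desc}

object MultiColumnSort {
  // Rows from the question. Monitor is kept as a String (an assumption)
  // so that the leading zero in "081097" survives.
  val rows: Seq[(Int, Int, String)] = Seq(
    (1, 1234, "199090"),
    (4, 2553, "198613"),
    (6, 3232, "199090"),
    (1, 8823, "498831"),
    (7, 2902, "890000"),
    (8, 7991, "081097"),
    (9, 7391, "432370"),
    (12, 6138, "864981"),
    (7, 6812, "749821")
  )

  // Hypothetical helper: builds an RDD, converts it to a DataFrame,
  // and applies the multi-column sort from the answer.
  def sortedFrame(spark: SparkSession): DataFrame = {
    import spark.implicits._ // enables .toDF on an RDD of tuples
    val rdd = spark.sparkContext.parallelize(rows)
    val df = rdd.toDF("recent", "Freq", "Monitor")
    df.sort(desc("recent"), asc("Freq"), asc("Monitor"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("multi-column-sort")
      .master("local[*]") // local mode, assumed for a quick test run
      .getOrCreate()
    try sortedFrame(spark).show()
    finally spark.stop()
  }
}
```

With this data, the row with recent = 12 comes first, and the two rows with recent = 7 appear with Freq 2902 before Freq 6812, because ties on the first sort key are broken by the next one.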

You can also use df.orderBy(...); it is an alias of sort().
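
As a sketch of the orderBy form, the same ordering can be written with Column methods instead of the asc/desc functions. The object name OrderByVariant, the helper byRecentDesc, and the three-row sample are hypothetical and only serve to illustrate the call.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object OrderByVariant {
  // Same ordering as sort(desc("recent"), asc("Freq"), asc("Monitor")),
  // expressed with orderBy and the Column methods .desc / .asc.
  def byRecentDesc(df: DataFrame): DataFrame =
    df.orderBy(col("recent").desc, col("Freq").asc, col("Monitor").asc)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("order-by-variant")
      .master("local[*]") // local mode, assumed for a quick test run
      .getOrCreate()
    import spark.implicits._
    try {
      // A few rows from the question, enough to exercise the tie on recent = 7.
      val df = Seq(
        (7, 6812, "749821"),
        (12, 6138, "864981"),
        (7, 2902, "890000")
      ).toDF("recent", "Freq", "Monitor")
      byRecentDesc(df).show()
    } finally spark.stop()
  }
}
```

Which spelling to use is mostly a matter of taste; the Column-method form can read more naturally when the sort keys are already Column expressions.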
