Custom sorting in pyspark dataframes

Views: 145 Published: 2021/4/8 20:30:11 Tags: python pandas apache-spark pyspark apache-spark-sql

Question

Are there any recommended methods for implementing a custom sort order for categorical data in pyspark? Ideally, I'm looking for the functionality that the pandas categorical data type offers.

So, given a dataset with a Speed column whose possible values are ["Super Fast", "Fast", "Medium", "Slow"], I want to implement a custom sort order that fits the context.

If I use the default sorting, the categories are sorted alphabetically. Pandas allows changing a column's data type to categorical, and the category definition includes a custom sort order: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Categorical.html
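
For reference, the pandas behaviour described above might look roughly like the following. This is a minimal sketch: the sample rows and the names pdf and speed_order are made up for illustration and are not part of the original question.

```python
import pandas as pd

# Desired custom order, taken from the question.
speed_order = ["Super Fast", "Fast", "Medium", "Slow"]

# Hypothetical sample data, in arbitrary order.
pdf = pd.DataFrame({"Speed": ["Slow", "Super Fast", "Medium", "Fast"]})

# An ordered Categorical sorts by category position instead of alphabetically.
pdf["Speed"] = pd.Categorical(pdf["Speed"], categories=speed_order, ordered=True)

sorted_pdf = pdf.sort_values("Speed")
sorted_values = sorted_pdf["Speed"].tolist()
print(sorted_values)
```

With a plain string column, the same sort_values call would instead order the values alphabetically (Fast, Medium, Slow, Super Fast).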

Recommended answer

You can use orderBy with a chain of when expressions that maps each category to a numeric rank:

from pyspark.sql.functions import col, when

# Map each Speed value to a numeric rank and sort by that rank.
df.orderBy(when(col("Speed") == "Super Fast", 1)
           .when(col("Speed") == "Fast", 2)
           .when(col("Speed") == "Medium", 3)
           .when(col("Speed") == "Slow", 4)
           )
