How could I order by sum, within a DataFrame in PySpark?

Question

Analogously to:

order_items.groupBy("order_item_order_id").count().orderBy(desc("count")).show()

I have tried:

order_items.groupBy("order_item_order_id").sum("order_item_subtotal").orderBy(desc("sum")).show()

But this gives an error:

Py4JJavaError: An error occurred while calling o501.sort. : org.apache.spark.sql.AnalysisException: cannot resolve 'sum' given input columns order_item_order_id, SUM(order_item_subtotal#429);

I have also tried:

order_items.groupBy("order_item_order_id").sum("order_item_subtotal").orderBy(desc("SUM(order_item_subtotal)")).show()

But I get the same error:

Py4JJavaError: An error occurred while calling o512.sort. : org.apache.spark.sql.AnalysisException: cannot resolve 'SUM(order_item_subtotal)' given input columns order_item_order_id, SUM(order_item_subtotal#429);

I get the right result when executing:

order_items.groupBy("order_item_order_id").sum("order_item_subtotal").orderBy(desc("SUM(order_item_subtotal#429)")).show()

But this was done a posteriori, after seeing the number that Spark appends to the sum column name, i.e. #429.

Is there a way to get the same result a priori, without knowing which number will be appended?

Recommended answer

You should use aliases for your columns, so that the aggregated column has a name you choose rather than one Spark generates:

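import pyspark.sql.functions as func

# Give the aggregated column an explicit name so it can be referenced
# without knowing the name Spark generates for it.
(order_items.groupBy("order_item_order_id")
            .agg(func.sum("order_item_subtotal").alias("sum_column_name"))
            # Ascending by default; use func.desc("sum_column_name") for descending.
            .orderBy("sum_column_name")
            .show())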
