PySpark equivalent of pandas groupby + apply on a column

Views: 77 | Published: 2020/10/17 0:01:36 | Tags: dataframe, group-by, pyspark

I have a Spark DataFrame and I would like to count the number of unique values of a variable within each group after a groupby.

In pandas I can obtain it with: df.groupby('UserName').apply(lambda x: x['Server'].nunique())
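For reference, here is a minimal runnable sketch of that pandas computation. The sample rows are made up for illustration; only the column names UserName and Server come from the question. It also shows the shorter column-selection form, which should give the same counts without going through apply.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "UserName": ["alice", "alice", "alice", "bob"],
    "Server": ["s1", "s2", "s1", "s3"],
})

# The form used in the question: apply a lambda to each group.
via_apply = df.groupby("UserName").apply(lambda x: x["Server"].nunique())

# Equivalent and usually faster: select the column, then call nunique directly.
direct = df.groupby("UserName")["Server"].nunique()

print(direct)
```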

How can I get the same result when df is a PySpark DataFrame?

You can use countDistinct with agg:

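from pyspark.sql.functions import countDistinct

# Count the distinct Server values within each UserName group.
df.groupBy('UserName').agg(countDistinct('Server').alias('Server'))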
