Spark count and filtered count in the same query

Views: 25 · Published: 2021/11/14 22:35:24 · Tags: sql apache-spark count apache-spark-sql


In SQL, something like

SELECT count(id), sum(if(column1 = 1, 1, 0)) FROM groupedTable


could be formulated to perform a count of the total records as well as filtered records in a single pass.


How can I do this with the Spark DataFrame API, i.e. without needing to join one of the counts back to the original data frame?


Just use count for both cases:

df.select(count($"id"), count(when($"column1" === 1, true)))
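For anyone who wants to try this outside a Spark shell, here is a minimal sketch of the same idea. The object name `CountSketch`, the helper `totalAndFiltered`, the local `SparkSession` and the four sample rows are all assumptions made for illustration; only the `count` / `when` expression comes from the answer above. It relies on `when` without `otherwise` producing null for non-matching rows, which `count` then skips.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, count, when}

object CountSketch {
  // Hypothetical helper: returns (non-null ids, rows where column1 == 1)
  // from a single aggregation, so no join back to the original frame is needed.
  def totalAndFiltered(df: DataFrame): (Long, Long) = {
    val row = df.select(
      count(col("id")),                              // counts non-null ids
      count(when(col("column1") === 1, true))        // non-matching rows become null and are skipped
    ).head()
    (row.getLong(0), row.getLong(1))
  }

  def main(args: Array[String]): Unit = {
    // Assumed local session, for demonstration only.
    val spark = SparkSession.builder().master("local[1]").appName("count-sketch").getOrCreate()
    import spark.implicits._

    // Assumed sample data: column1 is nullable.
    val df = Seq[(Int, Option[Int])]((1, Some(1)), (2, Some(0)), (3, None), (4, Some(1)))
      .toDF("id", "column1")

    println(totalAndFiltered(df)) // (4,2)
    spark.stop()
  }
}
```

Both aggregates sit in one `select`, so Spark can compute them in a single aggregation over the data.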


If the column is nullable, you should correct for that (for example with coalesce or IS NULL, depending on the desired output).
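As a rough illustration of why nullability matters, the sketch below counts null rows explicitly with `isNull` and uses `coalesce` to treat null as 0 before comparing. The object name `NullAwareCounts`, the helper `nullAndNotOne`, the choice of 0 as the default and the sample rows are assumptions, not part of the original answer; which correction is right depends on the output you want.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{coalesce, col, count, lit, when}

object NullAwareCounts {
  // Hypothetical helper: returns (rows where column1 is null,
  // rows where column1 != 1 when null is treated as 0).
  def nullAndNotOne(df: DataFrame): (Long, Long) = {
    val row = df.select(
      count(when(col("column1").isNull, true)),
      // Without coalesce, `null =!= 1` evaluates to null and the row would be skipped.
      count(when(coalesce(col("column1"), lit(0)) =!= 1, true))
    ).head()
    (row.getLong(0), row.getLong(1))
  }

  def main(args: Array[String]): Unit = {
    // Assumed local session and sample data, for demonstration only.
    val spark = SparkSession.builder().master("local[1]").appName("null-aware-counts").getOrCreate()
    import spark.implicits._

    val df = Seq[(Int, Option[Int])]((1, Some(1)), (2, Some(0)), (3, None), (4, Some(1)))
      .toDF("id", "column1")

    println(nullAndNotOne(df)) // (1,2)
    spark.stop()
  }
}
```

With the same data, a plain `column1 =!= 1` filter would count only one row, because the comparison against null is neither true nor false.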
