Count cumulative total in PostgreSQL

Views: 662  Published: 2020/5/29 19:51:24  Tags: sql postgresql aggregate-functions

Problem description

I am using count and group by to get the number of subscribers registered each day:

  SELECT created_at, COUNT(email)
    FROM subscriptions
GROUP BY created_at;

Result:

created_at  count
-----------------
04-04-2011  100
05-04-2011   50
06-04-2011   50
07-04-2011  300

I want to get the cumulative total of subscribers every day instead. How do I get this?

created_at  count
-----------------
04-04-2011  100
05-04-2011  150
06-04-2011  200
07-04-2011  500

Recommended answer

With larger datasets, window functions are the most efficient way to perform these kinds of queries -- the table will be scanned only once, instead of once for each date, like a self-join would do. It also looks a lot simpler. :) PostgreSQL 8.4 and up have support for window functions.

It looks like this:

SELECT created_at, sum(count(email)) OVER (ORDER BY created_at)
FROM subscriptions
GROUP BY created_at;

Here OVER creates the window; ORDER BY created_at means that it has to sum up the counts in created_at order.
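
To make the evaluation order easier to see, the same result can be written in two explicit steps. This is only a sketch under assumptions: the CTE name daily and the column aliases daily_count and cumulative_count are made up for illustration, and created_at is assumed to hold one value per day, as in the question's output. The window function runs after GROUP BY, so it sums the per-day counts rather than the raw rows.

```sql
-- Step 1: collapse the table to one row per day.
WITH daily AS (
    SELECT created_at, count(email) AS daily_count
    FROM subscriptions
    GROUP BY created_at
)
-- Step 2: running sum over the per-day rows, oldest first.
-- The frame is spelled out here; with unique created_at values it
-- matches the default frame used by OVER (ORDER BY created_at).
SELECT created_at,
       sum(daily_count) OVER (
           ORDER BY created_at
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS cumulative_count
FROM daily
ORDER BY created_at;
```

The final ORDER BY is added because the ORDER BY inside OVER only controls how the sum is accumulated; it does not by itself guarantee the order of the returned rows.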

If you want to remove duplicate emails within a single day, you can use sum(count(distinct email)). Unfortunately this won't remove duplicates that cross different dates.
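
Written out in full, that per-day deduplication might look like the following sketch (the alias cumulative_count and the trailing ORDER BY are additions for readability, not part of the original query):

```sql
-- count(DISTINCT email) removes duplicates inside each day's group only;
-- an email that appears on two different days is still counted twice.
SELECT created_at,
       sum(count(DISTINCT email)) OVER (ORDER BY created_at) AS cumulative_count
FROM subscriptions
GROUP BY created_at
ORDER BY created_at;
```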

If you want to remove all duplicates, I think the easiest is to use a subquery and DISTINCT ON. This will attribute emails to their earliest date (because I'm sorting by created_at in ascending order, it'll choose the earliest one):

SELECT created_at, sum(count(email)) OVER (ORDER BY created_at)
FROM (
    SELECT DISTINCT ON (email) created_at, email
    FROM subscriptions ORDER BY email, created_at
) AS subq
GROUP BY created_at;

If you create an index on (email, created_at), this query shouldn't be too slow either.

(If you want to test, this is how I created the sample dataset)

create table subscriptions as
   select date '2000-04-04' + (i/10000)::int as created_at,
          'foofoobar@foobar.com' || (i%700000)::text as email
   from generate_series(1,1000000) i;
create index on subscriptions (email, created_at);
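
One possible sanity check on this sample table, offered as a sketch rather than as part of the original answer: the last row of the fully deduplicated running total should equal the number of distinct emails in the table. The aliases distinct_emails and cumulative_count are illustrative.

```sql
-- Number of distinct emails in the whole table.
SELECT count(DISTINCT email) AS distinct_emails
FROM subscriptions;

-- Final value of the deduplicated running total.
-- The window sum is computed before ORDER BY ... DESC and LIMIT are applied,
-- so the row for the latest date carries the full total.
SELECT created_at,
       sum(count(email)) OVER (ORDER BY created_at) AS cumulative_count
FROM (
    SELECT DISTINCT ON (email) created_at, email
    FROM subscriptions
    ORDER BY email, created_at
) AS subq
GROUP BY created_at
ORDER BY created_at DESC
LIMIT 1;
```

With the generated data above, i % 700000 takes 700000 different values, so both queries should report 700000.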
