Count cumulative total in PostgreSQL

Question

I am using count and group by to get the number of subscribers registered each day:

  SELECT created_at, COUNT(email)
    FROM subscriptions
GROUP BY created_at;

Result:

created_at  count
-----------------
04-04-2011  100
05-04-2011   50
06-04-2011   50
07-04-2011  300

I want to get the cumulative total of subscribers every day instead. How do I get this?

created_at  count
-----------------
04-04-2011  100
05-04-2011  150
06-04-2011  200
07-04-2011  500


Recommended answer

With larger datasets, window functions are the most efficient way to perform these kinds of queries -- the table will be scanned only once, instead of once for each date, like a self-join would do. It also looks a lot simpler. :) PostgreSQL 8.4 and up have support for window functions.

Here is what it looks like:

SELECT created_at, sum(count(email)) OVER (ORDER BY created_at)
FROM subscriptions
GROUP BY created_at;

Here OVER creates the window; ORDER BY created_at means that it has to sum up the counts in created_at order.
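To make the two steps visible, the same query can be sketched with the daily count computed in a subquery and the running sum applied on top. The aliases daily, daily_count and cumulative_count are illustrative names, not part of the original answer. The frame clause is written out explicitly; since created_at is unique after grouping, it should behave the same as the default frame used above.

```sql
-- Step 1 (inner query): one row per day with that day's count.
-- Step 2 (outer query): running sum over those rows in date order.
SELECT created_at,
       daily_count,
       sum(daily_count) OVER (
           ORDER BY created_at
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS cumulative_count
FROM (
    SELECT created_at, count(email) AS daily_count
    FROM subscriptions
    GROUP BY created_at
) AS daily
ORDER BY created_at;
```

With the daily counts from the question (100, 50, 50, 300), the cumulative_count column would come out as 100, 150, 200, 500.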

If you want to remove duplicate emails within a single day, you can use sum(count(distinct email)). Unfortunately this won't remove duplicates that cross different dates.
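Written out in full, that variant might look like the sketch below. The cumulative_count alias and the final ORDER BY are additions for readability. DISTINCT here applies to the inner count aggregate, so an email is counted once per day but can still be counted again on another day.

```sql
-- Per-day deduplication only: the same email on two different days
-- still contributes to both days' counts.
SELECT created_at,
       sum(count(DISTINCT email)) OVER (ORDER BY created_at) AS cumulative_count
FROM subscriptions
GROUP BY created_at
ORDER BY created_at;
```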

If you want to remove all duplicates, I think the easiest is to use a subquery and DISTINCT ON. This will attribute emails to their earliest date (because I'm sorting by created_at in ascending order, it'll choose the earliest one):

SELECT created_at, sum(count(email)) OVER (ORDER BY created_at)
FROM (
    SELECT DISTINCT ON (email) created_at, email
    FROM subscriptions ORDER BY email, created_at
) AS subq
GROUP BY created_at;

If you create an index on (email, created_at), this query shouldn't be too slow either.
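One way to check whether the index actually helps on your data is to look at the query plan. This is only a sketch: the plan depends on table size, statistics and PostgreSQL version, so the planner may or may not pick the index. Note that EXPLAIN ANALYZE executes the query.

```sql
-- Show the plan and actual timings for the deduplicating query.
EXPLAIN ANALYZE
SELECT created_at, sum(count(email)) OVER (ORDER BY created_at)
FROM (
    SELECT DISTINCT ON (email) created_at, email
    FROM subscriptions
    ORDER BY email, created_at
) AS subq
GROUP BY created_at;
```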

(If you want to test, this is how I created the sample dataset)

create table subscriptions as
   select date '2000-04-04' + (i/10000)::int as created_at,
          'foofoobar@foobar.com' || (i%700000)::text as email
   from generate_series(1,1000000) i;
create index on subscriptions (email, created_at);
