Finding the highest after grouping by month

Question

In Postgres, I want to output the persons who have the highest number of "discussed" requests for each month, irrespective of the year, i.e. there should be 12 outputs.

ID  PERSON      REQUEST    DATE
4   datanoise   opened     2010-09-02
5   marsuboss   opened     2010-09-02
6   m3talsmith  opened     2010-09-06
7   sferik      opened     2010-09-08
8   sferik      opened     2010-09-09
8   dtrasbo     discussed  2010-09-09
8   brianmario  discussed  2010-09-09
8   sferik      discussed  2010-09-09
9   rsim        opened     2011-09-09
.....more tuples to follow

*This is just a small part of the database. Also assume that the dataset is big enough that all months are represented in the date column.

Recommended answer

Test data:

CREATE TEMPORARY TABLE foo( id SERIAL PRIMARY KEY, name INTEGER NOT NULL,
    dt DATE NULL, request BOOL NOT NULL );
INSERT INTO foo (name,dt,request) SELECT random()*1000, 
   '2010-01-01'::DATE+('1 DAY'::INTERVAL)*(random()*3650), random()>0.5 
   FROM generate_series(1,100000) n;
 SELECT * FROM foo LIMIT 10;
 id | name |     dt     | request
----+------+------------+---------
  1 |  110 | 2014-11-05 | f
  2 |  747 | 2015-03-12 | t
  3 |  604 | 2014-09-26 | f
  4 |  211 | 2011-12-14 | t
  5 |  588 | 2016-12-15 | f
  6 |   96 | 2012-02-19 | f
  7 |   17 | 2018-09-18 | t
  8 |  591 | 2018-02-15 | t
  9 |  370 | 2015-07-28 | t
 10 |  844 | 2019-05-16 | f

Now you have to get the count per name and month, then get the maximum count per month. That alone won't give you the name that has the maximum, which requires joining back with the previous result. To do the GROUP BY only once, the counts are computed in a CTE:

WITH totals AS (
     SELECT EXTRACT(month FROM dt) mon, name, count(*) cnt FROM foo 
      WHERE request=true GROUP BY name,mon
  )
SELECT * FROM 
   (SELECT mon, max(cnt) cnt FROM totals GROUP BY mon) x
   JOIN totals USING (mon,cnt);
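
To check the join-back query by eye, it can help to run it on a handful of hand-written rows instead of the random test data. The sketch below assumes a hypothetical temporary table foo_small with the same columns as foo; the rows are made up for illustration and deliberately contain a tie in January.

```sql
-- Hypothetical miniature of foo, small enough to verify by hand.
CREATE TEMPORARY TABLE foo_small (
    id SERIAL PRIMARY KEY, name INTEGER NOT NULL,
    dt DATE NULL, request BOOL NOT NULL );

INSERT INTO foo_small (name, dt, request) VALUES
    (1, '2010-01-05', true),
    (1, '2011-01-20', true),   -- name 1: two January requests, different years
    (2, '2010-01-07', true),
    (2, '2012-01-09', true),   -- name 2: also two in January, so a tie
    (3, '2010-01-11', true),   -- name 3: one in January
    (3, '2010-02-01', true),   -- name 3: one in February
    (3, '2010-02-02', false);  -- request = false, filtered out

WITH totals AS (
    SELECT EXTRACT(month FROM dt) AS mon, name, count(*) AS cnt
      FROM foo_small
     WHERE request
     GROUP BY name, mon
)
SELECT mon, cnt, name
  FROM (SELECT mon, max(cnt) AS cnt FROM totals GROUP BY mon) x
  JOIN totals USING (mon, cnt)
 ORDER BY mon, name;
-- Three rows: month 1 appears twice (names 1 and 2, cnt 2 each),
-- month 2 appears once (name 3, cnt 1).
```

Both January rows come back because the join matches every name whose count equals the monthly maximum.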

If several names share the same maximum count, all of them will be returned. To keep only one, you can use DISTINCT ON:

WITH totals AS (
     SELECT EXTRACT(month FROM dt) mon, name, count(*) cnt FROM foo 
      WHERE request=true GROUP BY name,mon
  )
SELECT DISTINCT ON (mon) * FROM
   (SELECT mon, max(cnt) cnt FROM totals GROUP BY mon) x
   JOIN totals USING (mon,cnt) ORDER BY mon,name;

You can also use DISTINCT ON on its own to keep only one row per month. Which row is kept is determined by the ORDER BY clause, in this case by count descending, so it keeps the highest count.

SELECT DISTINCT ON (mon) * FROM (
     SELECT EXTRACT(month FROM dt) mon, name, count(*) cnt FROM foo 
      WHERE request=true GROUP BY name,mon
  )x ORDER BY mon, cnt DESC;
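
The test table uses an integer name and a boolean request, while the question's table has a person name and a textual request type. A minimal sketch of the same DISTINCT ON query against the question's columns might look like the following. The table name requests is an assumption (the question never names the table), as are the lowercase column names person, request and date.

```sql
-- Assumed schema: requests(id, person, request, "date"); names are hypothetical.
SELECT DISTINCT ON (mon) mon, person, cnt
  FROM (
        SELECT EXTRACT(month FROM "date") AS mon, person, count(*) AS cnt
          FROM requests
         WHERE request = 'discussed'
         GROUP BY person, mon
       ) x
 -- The first row per mon is kept: highest count first,
 -- then person as a tie-breaker so the result is deterministic.
 ORDER BY mon, cnt DESC, person;
```

Adding person to the ORDER BY is a design choice, not part of the original answer: without it, which of several tied persons is returned is unspecified.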

...or you could hack an argmax() function by packing the count and the primary key into an array passed to max(). Arrays compare element by element, so max() returns the array with the highest count, and its second element is the id of a row that has that count:

SELECT mon, cntid[1] cnt, name FROM
(SELECT mon, max(ARRAY[cnt,id]) cntid FROM (
     SELECT EXTRACT(month FROM dt) mon, name, count(*) cnt, min(id) id FROM foo
      WHERE request=true GROUP BY name,mon
  ) x GROUP BY mon)y
 JOIN foo ON (foo.id=cntid[2]);
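
The trick relies on how PostgreSQL orders arrays: they are compared element by element, so the first element decides and later elements only break ties. A small standalone illustration, with made-up values:

```sql
-- max() over arrays: the first element (the count) is compared first,
-- the second element (the id) only matters when the counts are equal.
SELECT max(pair) AS best
  FROM (VALUES (ARRAY[3, 10]), (ARRAY[5, 2]), (ARRAY[5, 7])) AS v(pair);
-- best is {5,7}
```

This also shows that when counts tie, the array with the larger second element wins.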

Which one will be faster?...
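
One way to explore that question on your own data is to prefix each variant with EXPLAIN (ANALYZE, BUFFERS) and compare the reported plans and execution times. The sketch below does this for the DISTINCT ON variant against the foo test table; the same prefix can be put in front of the other queries. Results will depend on data volume, indexes and PostgreSQL version, so treat them as measurements of your setup only.

```sql
-- Runs the query for real and prints the plan with actual timings.
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT ON (mon) * FROM (
     SELECT EXTRACT(month FROM dt) mon, name, count(*) cnt FROM foo
      WHERE request=true GROUP BY name,mon
  ) x ORDER BY mon, cnt DESC;
```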
