Join a count query on generate_series() and retrieve Null values as '0'


Question

I want to count IDs per month using generate_series(). This query works in PostgreSQL 9.1:

SELECT (to_char(serie,'yyyy-mm')) AS year, sum(amount)::int AS eintraege FROM (
    SELECT  
       COUNT(mytable.id) as amount,   
       generate_series::date as serie   
       FROM mytable  
    
    RIGHT JOIN generate_series(     
       (SELECT min(date_from) FROM mytable)::date,   
       (SELECT max(date_from) FROM mytable)::date,  
       interval '1 day') ON generate_series = date(date_from)  
       WHERE version = 1   
       GROUP BY generate_series       
       ) AS foo
GROUP BY Year   
ORDER BY Year ASC;  

This is my output:

"2006-12" | 4  
"2007-02" | 1  
"2007-03" | 1  

But what I want to get is this output (a '0' value for January):

"2006-12" | 4  
"2007-01" | 0  
"2007-02" | 1  
"2007-03" | 1  

Months without an id should be listed nevertheless.
Any ideas how to solve this?

Sample data:

drop table if exists mytable;
create table mytable(id bigint, version smallint, date_from timestamp);
insert into mytable(id, version, date_from) values
(4084036, 1, '2006-12-22 22:46:35'),
(4084938, 1, '2006-12-23 16:19:13'),
(4084938, 2, '2006-12-23 16:20:23'),
(4084939, 1, '2006-12-23 16:29:14'),
(4084954, 1, '2006-12-23 16:28:28'),
(4250653, 1, '2007-02-12 21:58:53'),
(4250657, 1, '2007-03-12 21:58:53')
;

Recommended answer

Untangled, simplified and fixed, it might look like this:

SELECT to_char(s.tag,'yyyy-mm') AS monat
     , count(t.id) AS eintraege
FROM  (
   SELECT generate_series(min(date_from)::date
                        , max(date_from)::date
                        , interval '1 day'
          )::date AS tag
   FROM   mytable t
   ) s
LEFT   JOIN mytable t ON t.date_from::date = s.tag AND t.version = 1   
GROUP  BY 1
ORDER  BY 1;

db<>fiddle here

Among all the noise, misleading identifiers and unconventional formatting, the actual problem was hidden here:

WHERE version = 1

You made correct use of RIGHT [OUTER] JOIN. But adding a WHERE clause that requires an existing row from mytable effectively converts the RIGHT [OUTER] JOIN to an [INNER] JOIN.

Move that filter into the JOIN condition to make it work.
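A minimal sketch of the difference, assuming the sample data from the question is loaded. The three-day range and the aliases s, tag and ct are chosen only for illustration. 2006-12-24 has no rows in mytable, so it shows whether the outer join survives the filter:

```sql
-- Filter in WHERE: for 2006-12-24 the join produces a NULL-extended row,
-- "t.version = 1" is not true for it, and the day disappears from the result.
SELECT s.tag::date AS tag, count(t.id) AS ct
FROM   generate_series(timestamp '2006-12-22'
                     , timestamp '2006-12-24'
                     , interval '1 day') AS s(tag)
LEFT   JOIN mytable t ON t.date_from::date = s.tag::date
WHERE  t.version = 1
GROUP  BY 1
ORDER  BY 1;

-- Filter in the JOIN condition: rows with version <> 1 simply do not match,
-- but every generated day is kept. 2006-12-24 is listed with ct = 0.
SELECT s.tag::date AS tag, count(t.id) AS ct
FROM   generate_series(timestamp '2006-12-22'
                     , timestamp '2006-12-24'
                     , interval '1 day') AS s(tag)
LEFT   JOIN mytable t ON t.date_from::date = s.tag::date
                     AND t.version = 1
GROUP  BY 1
ORDER  BY 1;
```

With the sample data, the first query should return two rows (2006-12-22 and 2006-12-23) and the second one three, the last with a count of 0.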

I simplified some other things while at it.

SELECT to_char(mon, 'yyyy-mm') AS monat
     , COALESCE(t.ct, 0) AS eintraege
FROM  (
   SELECT date_trunc('month', date_from)::date AS mon
        , count(*) AS ct
   FROM   mytable
   WHERE  version = 1     
   GROUP  BY 1
   ) t
RIGHT JOIN (
   SELECT generate_series(date_trunc('month', min(date_from))
                        , max(date_from)
                        , interval '1 mon')::date
   FROM   mytable
   ) m(mon) USING (mon)
ORDER  BY mon;

db<>fiddle here

It's much cheaper to aggregate first and join later: you join one row per month instead of one row per day.

It's cheaper to base GROUP BY and ORDER BY on the date value instead of the rendered text.

count(*) is a bit faster than count(id), while equivalent in this query.
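The two are only interchangeable because the count runs inside the subquery, before the outer join, and because id is assumed to hold no NULL values (true for the sample data, though the table definition does not enforce it). A small sketch of what changes when counting after the outer join instead; the date range and the aliases count_star and count_id are illustrative:

```sql
-- After an outer join, count(*) also counts the NULL-extended row of a day
-- without entries, while count(t.id) ignores it.
SELECT s.tag::date AS tag
     , count(*)    AS count_star  -- at least 1 for every generated day
     , count(t.id) AS count_id    -- 0 for days without a matching row
FROM   generate_series(timestamp '2006-12-22'
                     , timestamp '2006-12-24'
                     , interval '1 day') AS s(tag)
LEFT   JOIN mytable t ON t.date_from::date = s.tag::date
                     AND t.version = 1
GROUP  BY 1
ORDER  BY 1;
```

For 2006-12-24 this should report count_star = 1 but count_id = 0, which is why the first query above has to use count(t.id).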

generate_series() is a bit faster and safer when based on timestamp instead of date.
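One way to see the difference is to check the type each call returns. This is a sketch with literal bounds picked for illustration. As far as I know there is no date variant of generate_series(), so date arguments have to be cast implicitly, and function type resolution then picks the timestamptz variant, which makes the result depend on the session's TimeZone setting:

```sql
-- timestamp input: matches the timestamp variant directly, no casts involved.
SELECT pg_typeof(g) AS result_type
FROM   generate_series(timestamp '2006-12-01'
                     , timestamp '2007-03-01'
                     , interval '1 month') g
LIMIT  1;

-- date input: the arguments are cast implicitly before the series is built.
SELECT pg_typeof(g) AS result_type
FROM   generate_series(date '2006-12-01'
                     , date '2007-03-01'
                     , interval '1 month') g
LIMIT  1;
```

In the second query above, date_trunc('month', min(date_from)) already yields a timestamp because date_from is a timestamp column, so the series is generated from timestamp values and only cast to date at the end.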
