GROUP BY + CASE statement


Problem description

I have a working query that is grouping data by hardware model and a result, but the problem is there are many "results". I have tried to reduce that down to "if result = 0 then keep as 0, else set it to 1". This generally works, but I end up having:

    day     |      name      | type | case | count
------------+----------------+------+------+-------
 2013-11-06 | modelA         |    1 |    0 |   972
 2013-11-06 | modelA         |    1 |    1 |    42
 2013-11-06 | modelA         |    1 |    1 |     2
 2013-11-06 | modelA         |    1 |    1 |    11
 2013-11-06 | modelB         |    1 |    0 |   456
 2013-11-06 | modelB         |    1 |    1 |    16
 2013-11-06 | modelB         |    1 |    1 |     8
 2013-11-06 | modelB         |    3 |    0 | 21518
 2013-11-06 | modelB         |    3 |    1 |     5
 2013-11-06 | modelB         |    3 |    1 |     7
 2013-11-06 | modelB         |    3 |    1 |   563

Instead of the aggregate I am trying to achieve, with only 1 row per type/case combo:

    day     |      name      | type | case | count
------------+----------------+------+------+-------
 2013-11-06 | modelA         |    1 |    0 |   972
 2013-11-06 | modelA         |    1 |    1 |    55
 2013-11-06 | modelB         |    1 |    0 |   456
 2013-11-06 | modelB         |    1 |    1 |    24
 2013-11-06 | modelB         |    3 |    0 | 21518
 2013-11-06 | modelB         |    3 |    1 |   575

Here is my query:

select CURRENT_DATE-1 AS day, model.name, attempt.type, 
       CASE WHEN attempt.result = 0 THEN 0 ELSE 1 END, 
       count(*) 
from attempt attempt, prod_hw_id prod_hw_id, model model
where time >= '2013-11-06 00:00:00'  
AND time < '2013-11-07 00:00:00'
AND attempt.hard_id = prod_hw_id.hard_id
AND prod_hw_id.model_id = model.model_id
group by model.name, attempt.type, attempt.result
order by model.name, attempt.type, attempt.result;

Any tips on how I can achieve this would be awesome.

Day will always be defined in the WHERE clause, so it will not vary. name, type, result(case) and count will vary. In short, for any given model I want only 1 row per "type + case" combo. As you can see in the first result set I have 3 rows for modelA that have type=1 and case=1 (because there are many "result" values that I have turned into 0=0 and anything else=1). I want that to be represented as 1 row with the count aggregated as in example data set 2.

Recommended answer

Your query would work already - except that you are running into naming conflicts or just confusing the output column (the CASE expression) with source column result, which has different content.

...
GROUP BY model.name, attempt.type, attempt.result
...

You need to GROUP BY your CASE expression instead of your source column:

...
GROUP BY model.name, attempt.type
       , CASE WHEN attempt.result = 0 THEN 0 ELSE 1 END
...
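
As a minimal, self-contained sketch of this fix (assuming PostgreSQL; the inline VALUES rows are made-up sample data standing in for the attempt table, not the asker's real data), grouping by the CASE expression collapses all non-zero results into one row per type:

```sql
-- Hypothetical sample data: (type, result) pairs in place of the attempt table.
SELECT a.type
     , CASE WHEN a.result = 0 THEN 0 ELSE 1 END AS result_flag
     , count(*) AS ct
FROM  (VALUES (1, 0), (1, 0), (1, 2), (1, 5), (1, 5)) AS a(type, result)
GROUP  BY a.type
        , CASE WHEN a.result = 0 THEN 0 ELSE 1 END  -- group by the expression, not a.result
ORDER  BY a.type, result_flag;
-- type | result_flag | ct
--    1 |           0 |  2
--    1 |           1 |  3
```

Grouping by a.result instead would return three rows here (for results 0, 2 and 5), which is the symptom shown in the question.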

Or provide a column alias that's different from any column name in the FROM list - or else that column takes precedence:

SELECT ...
     , CASE WHEN attempt.result = 0 THEN 0 ELSE 1 END AS result1
...
GROUP BY model.name, attempt.type, result1
...
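
A small sketch of the alias variant, again assuming PostgreSQL and made-up inline data. The alias result1 is the one used in the fragment above; because no input column carries that name, GROUP BY can only resolve it to the output column:

```sql
-- Hypothetical sample data: a single result column in place of attempt.result.
SELECT CASE WHEN a.result = 0 THEN 0 ELSE 1 END AS result1
     , count(*) AS ct
FROM  (VALUES (0), (0), (2), (5)) AS a(result)
GROUP  BY result1   -- no input column is named result1, so this is the CASE expression
ORDER  BY result1;
-- result1 | ct
--       0 |  2
--       1 |  2
```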

The SQL standard is rather peculiar in this respect. Quoting the manual here:

An output column's name can be used to refer to the column's value in ORDER BY and GROUP BY clauses, but not in the WHERE or HAVING clauses; there you must write out the expression instead.

And:

If an ORDER BY expression is a simple name that matches both an output column name and an input column name, ORDER BY will interpret it as the output column name. This is the opposite of the choice that GROUP BY will make in the same situation. This inconsistency is made to be compatible with the SQL standard.
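
The difference can be reproduced with a tiny example, assuming PostgreSQL (the behavior described in the quoted manual) and made-up inline data. Here the output alias deliberately collides with the input column name result:

```sql
-- Hypothetical sample data; the alias "result" shadows the input column on purpose.
SELECT CASE WHEN a.result = 0 THEN 0 ELSE 1 END AS result
     , count(*) AS ct
FROM  (VALUES (0), (0), (2), (5)) AS a(result)
GROUP  BY result   -- resolved to the INPUT column a.result: groups 0, 2 and 5
ORDER  BY result;  -- resolved to the OUTPUT column (the CASE expression)
-- result | ct
--      0 |  2
--      1 |  1
--      1 |  1
```

The two rows with result = 1 come from the separate input groups 2 and 5, which matches the duplicated rows in the question.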

Bold emphasis mine.

These conflicts can be avoided by using positional references (ordinal numbers) in GROUP BY and ORDER BY, referencing items in the SELECT list from left to right. See solution below.
The drawback is that this may be harder to read and is vulnerable to edits in the SELECT list (one might forget to adapt the positional references accordingly).

But you do not have to add the column day to the GROUP BY clause, as long as it holds a constant value (CURRENT_DATE-1).
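
A quick sketch of that point, assuming PostgreSQL and made-up inline data: the day expression references no table column, so it can appear in the SELECT list without being listed in GROUP BY.

```sql
-- Hypothetical sample data: a single type column in place of attempt.type.
SELECT CURRENT_DATE - 1 AS day   -- constant per query, not part of GROUP BY
     , a.type
     , count(*) AS ct
FROM  (VALUES (1), (1), (3)) AS a(type)
GROUP  BY a.type
ORDER  BY a.type;
-- Returns one row per type (ct = 2 for type 1, ct = 1 for type 3);
-- the day value depends on the date the query is run.
```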

Rewritten and simplified with proper JOIN syntax and positional references it could look like this:

SELECT m.name
     , a.type
     , CASE WHEN a.result = 0 THEN 0 ELSE 1 END AS result
     , CURRENT_DATE - 1 AS day
     , count(*) AS ct
FROM   attempt    a
JOIN   prod_hw_id p USING (hard_id)
JOIN   model      m USING (model_id)
WHERE  ts >= '2013-11-06 00:00:00'  
AND    ts <  '2013-11-07 00:00:00'
GROUP  BY 1,2,3
ORDER  BY 1,2,3;

Also note that I am avoiding the column name time. That's a reserved word in standard SQL and should not be used as an identifier. Besides, your "time" obviously is a timestamp or date, so the name is rather misleading.
