GROUP BY + CASE statement

Problem description

I have a working query that is grouping data by hardware model and a result, but the problem is there are many "results". I have tried to reduce that down to "if result = 0 then keep as 0, else set it to 1". This generally works, but I end up having:

    day     |      name      | type | case | count
------------+----------------+------+------+-------
 2013-11-06 | modelA         |    1 |    0 |   972
 2013-11-06 | modelA         |    1 |    1 |    42
 2013-11-06 | modelA         |    1 |    1 |     2
 2013-11-06 | modelA         |    1 |    1 |    11
 2013-11-06 | modelB         |    1 |    0 |   456
 2013-11-06 | modelB         |    1 |    1 |    16
 2013-11-06 | modelB         |    1 |    1 |     8
 2013-11-06 | modelB         |    3 |    0 | 21518
 2013-11-06 | modelB         |    3 |    1 |     5
 2013-11-06 | modelB         |    3 |    1 |     7
 2013-11-06 | modelB         |    3 |    1 |   563

Instead, I am trying to achieve the aggregate below, with only 1 row per type/case combo:

    day     |      name      | type | case | count
------------+----------------+------+------+-------
 2013-11-06 | modelA         |    1 |    0 |   972
 2013-11-06 | modelA         |    1 |    1 |    55
 2013-11-06 | modelB         |    1 |    0 |   456
 2013-11-06 | modelB         |    1 |    1 |    24
 2013-11-06 | modelB         |    3 |    0 | 21518
 2013-11-06 | modelB         |    3 |    1 |   575

Here is my query:

select CURRENT_DATE-1 AS day, model.name, attempt.type, 
       CASE WHEN attempt.result = 0 THEN 0 ELSE 1 END, 
       count(*) 
from attempt attempt, prod_hw_id prod_hw_id, model model
where time >= '2013-11-06 00:00:00'  
AND time < '2013-11-07 00:00:00'
AND attempt.hard_id = prod_hw_id.hard_id
AND prod_hw_id.model_id = model.model_id
group by model.name, attempt.type, attempt.result
order by model.name, attempt.type, attempt.result;

Any tips on how I can achieve this would be awesome.

Day will always be defined in the WHERE clause, so it will not vary. name, type, result(case) and count will vary. In short, for any given model I want only 1 row per "type + case" combo. As you can see in the first result set I have 3 rows for modelA that have type=1 and case=1 (because there are many "result" values that I have turned into 0=0 and anything else=1). I want that to be represented as 1 row with the count aggregated as in example data set 2.

Solution

Your query would work already, except that you are running into a naming conflict, or just confusing the output column (the CASE expression) with the source column result, which has different content. The query groups by the source column:

...
GROUP BY model.name, attempt.type, attempt.result
...

You need to GROUP BY your CASE expression instead of your source column:

...
GROUP BY model.name, attempt.type
       , CASE WHEN attempt.result = 0 THEN 0 ELSE 1 END
...
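
To see the effect in isolation, here is a minimal sketch assuming PostgreSQL. The inline VALUES list is a hypothetical, simplified stand-in for the three joined tables in the question, and the sample rows are made up for illustration:

```sql
-- Hypothetical sample data: one model/type with results 0, 0, 3, 7, 7, 9.
WITH attempt (name, type, result) AS (
   VALUES ('modelA', 1, 0), ('modelA', 1, 0), ('modelA', 1, 3)
        , ('modelA', 1, 7), ('modelA', 1, 7), ('modelA', 1, 9)
)
SELECT name, type
     , CASE WHEN result = 0 THEN 0 ELSE 1 END
     , count(*)
FROM   attempt
GROUP  BY name, type
        , CASE WHEN result = 0 THEN 0 ELSE 1 END   -- group on the collapsed value
ORDER  BY name, type
        , CASE WHEN result = 0 THEN 0 ELSE 1 END;
-- Two rows: (modelA, 1, 0, 2) and (modelA, 1, 1, 4)
```

Grouping by the raw result column instead would return four rows here, one for each distinct value 0, 3, 7 and 9.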

Or provide a column alias that's different from any column name in the FROM list - or else that column takes precedence:

SELECT ...
     , CASE WHEN attempt.result = 0 THEN 0 ELSE 1 END AS result1
...
GROUP BY model.name, attempt.type, result1
...
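
The alias variant can be tried the same way. This is a sketch assuming PostgreSQL, again with hypothetical inline sample data; result1 is simply a name that does not occur as a column in the FROM list:

```sql
-- Hypothetical sample data standing in for the joined tables.
WITH attempt (name, type, result) AS (
   VALUES ('modelA', 1, 0), ('modelA', 1, 3)
        , ('modelA', 1, 7), ('modelB', 3, 5)
)
SELECT name, type
     , CASE WHEN result = 0 THEN 0 ELSE 1 END AS result1
     , count(*) AS ct
FROM   attempt
GROUP  BY name, type, result1   -- no input column is called result1, so this is the output column
ORDER  BY name, type, result1;
-- modelA | 1 | 0 | 1
-- modelA | 1 | 1 | 2
-- modelB | 3 | 1 | 1
```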

The SQL standard is rather peculiar in this respect. Quoting the PostgreSQL manual here:

An output column's name can be used to refer to the column's value in ORDER BY and GROUP BY clauses, but not in the WHERE or HAVING clauses; there you must write out the expression instead.

And:

If an ORDER BY expression is a simple name that matches both an output column name and an input column name, ORDER BY will interpret it as the output column name. This is the opposite of the choice that GROUP BY will make in the same situation. This inconsistency is made to be compatible with the SQL standard.

Bold emphasis mine.
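
Both quoted rules can be observed with a small experiment. The following is a sketch assuming PostgreSQL; the temporary table attempt_demo and its rows are hypothetical and exist only for this illustration:

```sql
-- Hypothetical demo table, not part of the original schema.
CREATE TEMP TABLE attempt_demo (type int, result int);
INSERT INTO attempt_demo VALUES (1, 0), (1, 3), (1, 7), (1, 7);

-- The alias "result" collides with the input column attempt_demo.result.
SELECT type
     , CASE WHEN result = 0 THEN 0 ELSE 1 END AS result
     , count(*) AS ct
FROM   attempt_demo
GROUP  BY type, result    -- input column: groups on 0, 3 and 7, so three rows
ORDER  BY type, result;   -- output column: sorts on the CASE value

-- WHERE cannot see an output alias, so the expression is written out.
SELECT type
     , CASE WHEN result = 0 THEN 0 ELSE 1 END AS result1
     , count(*) AS ct
FROM   attempt_demo
WHERE  CASE WHEN result = 0 THEN 0 ELSE 1 END = 1   -- not: WHERE result1 = 1
GROUP  BY type, result1;
-- One row: (1, 1, 3)
```

The first query reproduces the symptom from the question: the collapsed value 1 appears on two separate rows because the grouping still happens on the raw values 3 and 7.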

These conflicts can be avoided by using positional references (ordinal numbers) in GROUP BY and ORDER BY, referencing items in the SELECT list from left to right. See solution below.
The drawback is that this may be harder to read and is vulnerable to edits in the SELECT list (one might forget to adapt the positional references accordingly).

But you do not have to add the column day to the GROUP BY clause, as long as it holds a constant value (CURRENT_DATE-1).
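
A minimal sketch of that point, assuming PostgreSQL and hypothetical inline sample data: the expression CURRENT_DATE - 1 references no table column, so it can appear in the SELECT list without being grouped.

```sql
-- Hypothetical sample data.
WITH attempt (type, result) AS (
   VALUES (1, 0), (1, 3), (3, 0)
)
SELECT CURRENT_DATE - 1 AS day   -- same value for every row, not listed in GROUP BY
     , type
     , count(*) AS ct
FROM   attempt
GROUP  BY type
ORDER  BY type;
-- Two rows, one for type 1 (ct = 2) and one for type 3 (ct = 1), each with yesterday's date
```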

Rewritten and simplified with proper JOIN syntax and positional references it could look like this:

SELECT m.name
     , a.type
     , CASE WHEN a.result = 0 THEN 0 ELSE 1 END AS result
     , CURRENT_DATE - 1 AS day
     , count(*) AS ct
FROM   attempt    a
JOIN   prod_hw_id p USING (hard_id)
JOIN   model      m USING (model_id)
WHERE  ts >= '2013-11-06 00:00:00'  
AND    ts <  '2013-11-07 00:00:00'
GROUP  BY 1,2,3
ORDER  BY 1,2,3;

Also note that I am avoiding the column name time (the query above uses ts instead). That's a reserved word in standard SQL and should not be used as an identifier. Besides, your "time" obviously is a timestamp or date, so the name is rather misleading.
