获取n个分组类别并将其他类别合计为一个 [英] Get n grouped categories and sum others into one

查看:73
本文介绍了获取n个分组类别并将其他类别合计为一个的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个具有以下结构的表:

I have a table with the following structure:

Contents (
  id
  name
  desc
  tdate
  categoryid
  ...
)

我需要对该表中的数据进行一些统计。例如,我想通过分组和该类别的ID获得具有相同类别的行数。另外,我想将它们限制为 n 行降序,如果有更多类别可用,我想将它们标记为其他。到目前为止,我对数据库进行了2个查询:

I need to do some statistics with the data in this table. For example I want to get number of rows with the same category by grouping and id of that category. Also I want to limit them for n rows in descending order and if there are more categories available I want to mark them as "Others". So far I have come out with 2 queries to database:

以降序选择 n 行:

SELECT COALESCE(ca.NAME, 'Unknown') AS label
    ,ca.id AS catid
    ,COUNT(c.id) AS data
FROM contents c
LEFT OUTER JOIN category ca ON ca.id = c.categoryid
GROUP BY label
    ,catid
ORDER BY data DESC LIMIT 7

选择其他行作为一个:

SELECT 'Others' AS label
    ,COUNT(c.id) AS data
FROM contents c
LEFT OUTER JOIN category ca ON ca.id = c.categoryid
WHERE c.categoryid NOT IN ($INCONDITION)

但是当我在db表中没有类别组时,我仍然会得到其他记录。

But when I have no category groups left in db table I still get an "Others" record. Is it possible to make it in one query and make the "Others" record optional?

推荐答案

特定难度:在 SELECT 列表中具有一个或多个聚合函数且没有 GROUP BY 子句的查询会产生即使在基础表中未找到行,也仅排成一行

The specific difficulty here: Queries with one or more aggregate functions in the SELECT list and no GROUP BY clause produce exactly one row, even if no row is found in the underlying table.

中,您无能为力子句取消显示该行。您必须在事实之后 ,即在 HAVING 子句或外部查询中排除此类行。

There is nothing you can do in the WHERE clause to suppress that row. You have to exclude such a row after the fact, i.e. in the HAVING clause, or in an outer query.

每个文档:


如果查询包含聚合函数调用,但没有 GROUP BY 子句,仍然会进行
分组:结果是单个组行(或者可能根本没有
行,如果然后通过 HAVING )。如果相同的
包含 HAVING 子句,即使没有任何汇总的
函数调用或 GROUP BY 子句。

If a query contains aggregate function calls, but no GROUP BY clause, grouping still occurs: the result is a single group row (or perhaps no rows at all, if the single row is then eliminated by HAVING). The same is true if it contains a HAVING clause, even without any aggregate function calls or GROUP BY clause.

应注意,添加 GROUP BY 子句也只有一个常量表达式(否则它是完全没有意义的!)也可以工作。 请参见下面的示例。但是,即使它简短,便宜且简单,我也不愿使用该技巧,因为它的作用尚不明显。

It should be noted that adding a GROUP BY clause with only a constant expression (which is otherwise completely pointless!) works, too. See example below. But I'd rather not use that trick, even if it's short, cheap and simple, because it's hardly obvious what it does.

以下查询只需要进行单表扫描,并返回按计数排序的前7个类别。如果(并且仅当)还有更多类别,其余的类别会汇总为其他:

The following query only needs a single table scan and returns the top 7 categories ordered by count. If (and only if) there are more categories, the rest is summarized into 'Others':

WITH cte AS (
   SELECT categoryid, count(*) AS data
        , row_number() OVER (ORDER BY count(*) DESC, categoryid) AS rn
   FROM   contents
   GROUP  BY 1
   )
(  -- parentheses required again
SELECT categoryid, COALESCE(ca.name, 'Unknown') AS label, data
FROM   cte
LEFT   JOIN category ca ON ca.id = cte.categoryid
WHERE  rn <= 7
ORDER  BY rn
)
UNION ALL
SELECT NULL, 'Others', sum(data)
FROM   cte
WHERE  rn > 7         -- only take the rest
HAVING count(*) > 0;  -- only if there actually is a rest
-- or: HAVING  sum(data) > 0




  • 如果多个类别可以有在第7/8位排名相同。在我的示例中,类别较小的 categoryid 赢得了这样的比赛。

    括号必须包含 LIMIT ORDER BY 子句对 UNION 查询。

    Parentheses are required to include a LIMIT or ORDER BY clause to an individual leg of a UNION query.

    您只需要加入表 category 的前7个类别。在这种情况下,通常先聚集再加入,通常比较便宜。因此,请勿在 CTE(普通表)中加入基本查询表达式)名为 cte ,仅加入 UNION的第一个 SELECT 查询,这样更便宜。

    You only need to join to table category for the top 7 categories. And it's generally cheaper to aggregate first and join later in this scenario. So don't join in the the base query in the CTE (common table expression) named cte, only join in the first SELECT of the UNION query, that's cheaper.

    不确定为什么需要 COALESCE 。如果您具有从 contents.categoryid category.id 以及两个的外键content.categoryid category.name 定义为 NOT NULL (就像它们应该是),那么您就不需要它。

    Not sure why you need the COALESCE. If you have a foreign key in place from contents.categoryid to category.id and both contents.categoryid and category.name are defined NOT NULL (like they probably should be), then you don't need it.

    这也可行:

    ...
    
    UNION ALL
    SELECT NULL , 'Others', sum(data)
    FROM   cte
    WHERE  rn > 7
    GROUP BY true; 

    我什至可以获得更快的查询计划。但这是一个相当奇怪的黑客……

    And I even get slightly faster query plans. But it's a rather odd hack ...

    SQL Fiddle 演示了所有内容。

    SQL Fiddle demonstrating all.

    有关 UNION ALL的更多解释 / LIMIT 技术:

    • Sum results of a few queries and then find top 5 in SQL

    这篇关于获取n个分组类别并将其他类别合计为一个的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆