获取n个分组类别并将其他类别合计为一个 [英] Get n grouped categories and sum others into one
问题描述
我有一个具有以下结构的表:
I have a table with the following structure:
Contents (
id
name
desc
tdate
categoryid
...
)
我需要对该表中的数据进行一些统计。例如,我想通过分组和该类别的ID获得具有相同类别的行数。另外,我想将它们限制为 n
行降序,如果有更多类别可用,我想将它们标记为其他。到目前为止,我对数据库进行了2个查询:
I need to do some statistics with the data in this table. For example I want to get number of rows with the same category by grouping and id of that category. Also I want to limit them for n
rows in descending order and if there are more categories available I want to mark them as "Others". So far I have come out with 2 queries to database:
以降序选择 n
行:
SELECT COALESCE(ca.NAME, 'Unknown') AS label
,ca.id AS catid
,COUNT(c.id) AS data
FROM contents c
LEFT OUTER JOIN category ca ON ca.id = c.categoryid
GROUP BY label
,catid
ORDER BY data DESC LIMIT 7
选择其他行作为一个:
SELECT 'Others' AS label
,COUNT(c.id) AS data
FROM contents c
LEFT OUTER JOIN category ca ON ca.id = c.categoryid
WHERE c.categoryid NOT IN ($INCONDITION)
但是当我在db表中没有类别组时,我仍然会得到其他记录。
But when I have no category groups left in db table I still get an "Others" record. Is it possible to make it in one query and make the "Others" record optional?
推荐答案
特定难度:在 SELECT
列表中具有一个或多个聚合函数且没有 GROUP BY
子句的查询会产生即使在基础表中未找到行,也仅排成一行。
The specific difficulty here: Queries with one or more aggregate functions in the SELECT
list and no GROUP BY
clause produce exactly one row, even if no row is found in the underlying table.
在中,您无能为力
子句取消显示该行。您必须在事实之后 ,即在 HAVING
子句或外部查询中排除此类行。
There is nothing you can do in the WHERE
clause to suppress that row. You have to exclude such a row after the fact, i.e. in the HAVING
clause, or in an outer query.
如果查询包含聚合函数调用,但没有
GROUP BY
子句,仍然会进行
分组:结果是单个组行(或者可能根本没有
行,如果然后通过HAVING $ c消除了单个行) $ c>)。如果相同的
包含HAVING
子句,即使没有任何汇总的
函数调用或GROUP BY $,也是如此。 c $ c>子句。
If a query contains aggregate function calls, but no
GROUP BY
clause, grouping still occurs: the result is a single group row (or perhaps no rows at all, if the single row is then eliminated byHAVING
). The same is true if it contains aHAVING
clause, even without any aggregate function calls orGROUP BY
clause.
应注意,添加 GROUP BY
子句也只有一个常量表达式(否则它是完全没有意义的!)也可以工作。 请参见下面的示例。但是,即使它简短,便宜且简单,我也不愿使用该技巧,因为它的作用尚不明显。
It should be noted that adding a GROUP BY
clause with only a constant expression (which is otherwise completely pointless!) works, too. See example below. But I'd rather not use that trick, even if it's short, cheap and simple, because it's hardly obvious what it does.
以下查询只需要进行单表扫描,并返回按计数排序的前7个类别。如果(并且仅当)还有更多类别,其余的类别会汇总为其他:
The following query only needs a single table scan and returns the top 7 categories ordered by count. If (and only if) there are more categories, the rest is summarized into 'Others':
WITH cte AS (
SELECT categoryid, count(*) AS data
, row_number() OVER (ORDER BY count(*) DESC, categoryid) AS rn
FROM contents
GROUP BY 1
)
( -- parentheses required again
SELECT categoryid, COALESCE(ca.name, 'Unknown') AS label, data
FROM cte
LEFT JOIN category ca ON ca.id = cte.categoryid
WHERE rn <= 7
ORDER BY rn
)
UNION ALL
SELECT NULL, 'Others', sum(data)
FROM cte
WHERE rn > 7 -- only take the rest
HAVING count(*) > 0; -- only if there actually is a rest
-- or: HAVING sum(data) > 0
-
如果多个类别可以有在第7/8位排名相同。在我的示例中,类别较小的
categoryid
赢得了这样的比赛。括号必须包含
LIMIT
或ORDER BY
子句对UNION $ c $的单个分支c>查询。
Parentheses are required to include a
LIMIT
orORDER BY
clause to an individual leg of aUNION
query.您只需要加入表
category
的前7个类别。在这种情况下,通常先聚集再加入,通常比较便宜。因此,请勿在 CTE(普通表)中加入基本查询表达式)名为cte
,仅加入UNION的第一个
查询,这样更便宜。SELECT
You only need to join to table
category
for the top 7 categories. And it's generally cheaper to aggregate first and join later in this scenario. So don't join in the the base query in the CTE (common table expression) namedcte
, only join in the firstSELECT
of theUNION
query, that's cheaper.不确定为什么需要
COALESCE
。如果您具有从contents.categoryid
到category.id
以及两个的外键content.categoryid
和category.name
定义为NOT NULL
(就像它们应该是),那么您就不需要它。Not sure why you need the
COALESCE
. If you have a foreign key in place fromcontents.categoryid
tocategory.id
and bothcontents.categoryid
andcategory.name
are definedNOT NULL
(like they probably should be), then you don't need it.这也可行:
... UNION ALL SELECT NULL , 'Others', sum(data) FROM cte WHERE rn > 7 GROUP BY true;
我什至可以获得更快的查询计划。但这是一个相当奇怪的黑客……
And I even get slightly faster query plans. But it's a rather odd hack ...
SQL Fiddle 演示了所有内容。
SQL Fiddle demonstrating all.
有关
UNION ALL的更多解释
/LIMIT
技术:- Sum results of a few queries and then find top 5 in SQL
这篇关于获取n个分组类别并将其他类别合计为一个的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!