MySQL group by number series
Question
I'm just starting to learn MySQL and I've run into this problem, for which I badly need a solution or at least the logic.
For example, I have this table:
id a
-- --
1 1
2 2
3 3
4 5
5 6
6 7
7 9
8 10
9 11
10 12
Now, what I want is to display all the data in column a, grouped by series. In this case the result should be:
series_start|series_end|count
------------+----------+-----
           1|         3|    3
           5|         7|    3
           9|        12|    4
This seems to need a lot of subqueries and joins. I just can't figure it out right now.
Recommended answer
This is a gaps-and-islands problem, and here's another method of solving it, which also uses variables:
SELECT
  MIN(a) AS series_start,
  MAX(a) AS series_end,
  MAX(a) - MIN(a) + 1 AS series_count
FROM (
  -- number the rows in ascending order of a
  SELECT
    a,
    @r := @r + 1 AS r
  FROM
    yourtable,
    (SELECT @r := 0) AS x
  ORDER BY
    a
) s
-- consecutive values share the same a - r
GROUP BY
  a - r
ORDER BY
  a - r
;
This is how it works.
The subquery assigns row numbers to the table rows and returns this row set:
a r
-- --
1 1
2 2
3 3
5 4
6 5
7 6
9 7
10 8
11 9
12 10
In this case the r column, which stores the row numbers, happens to match the id column in your data sample, but I'm assuming that in general the id column may have gaps, and for that reason it cannot be used here.
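To illustrate why id is not a safe substitute for a freshly computed row number, here is a small Python sketch. The rows below are made up for illustration (they are not the question's data): id has a gap, while the values in a form a single unbroken run.

```python
# Hypothetical rows (id, a): ids 3 and 4 are missing,
# while the values in a form one unbroken run 1..4.
rows = [(1, 1), (2, 2), (5, 3), (6, 4)]

# Using id directly: the difference changes at the gap in id.
diff_by_id = [a - id_ for id_, a in rows]

# Using a row number assigned in order of a: the difference stays constant.
ordered = sorted(a for _, a in rows)
diff_by_rownum = [a - r for r, a in enumerate(ordered, start=1)]

print(diff_by_id)      # [0, 0, -2, -2] -> one run wrongly split in two
print(diff_by_rownum)  # [0, 0, 0, 0]   -> one run, as expected
```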
The main query groups the results by the difference between a and r, which is the same for every value in a consecutive run:
a r a - r
-- -- -----
1 1 0
2 2 0
3 3 0
5 4 1
6 5 1
7 6 1
9 7 2
10 8 2
11 9 2
12 10 2
That allows us to group such rows together. All that remains at this point is to get the minimum, maximum and count, which gives you this output:
series_start series_end series_count
------------ ---------- ------------
1 3 3
5 7 3
9 12 4
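The same logic can be sketched outside SQL. The Python below is only an illustration of the row-number-minus-value idea, not part of the original answer; the function name find_series is hypothetical, and it assumes the values are distinct integers, as in the sample data.

```python
from itertools import groupby

def find_series(values):
    """Return (series_start, series_end, series_count) for each run of consecutive integers."""
    ordered = sorted(values)
    # Pair each value with its 1-based row number, like the subquery's r column.
    numbered = [(a, r) for r, a in enumerate(ordered, start=1)]
    result = []
    # Rows in the same run share the same a - r, like GROUP BY a - r.
    for _, group in groupby(numbered, key=lambda pair: pair[0] - pair[1]):
        run = [a for a, _ in group]
        result.append((min(run), max(run), max(run) - min(run) + 1))
    return result

series = find_series([1, 2, 3, 5, 6, 7, 9, 10, 11, 12])
print(series)  # [(1, 3, 3), (5, 7, 3), (9, 12, 4)]
```

Because the values are sorted and distinct, a - r never decreases, so grouping adjacent equal keys is enough here.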
A SQL Fiddle demonstration of this query, for which I've borrowed @sgeddes's schema, can be found here.
UPDATE
As numeric variables cannot be used (according to comments), you could assign row numbers using a triangular self-join, but it will be much less efficient than using a variable. Anyway, here's the modified version; only the subquery differs from the previous query:
SELECT
  MIN(a) AS series_start,
  MAX(a) AS series_end,
  MAX(a) - MIN(a) + 1 AS series_count
FROM (
  -- rank each row by counting the rows whose id is less than or equal to its own
  SELECT
    data.a,
    COUNT(*) AS r
  FROM
    yourtable AS data
  INNER JOIN
    yourtable AS tally
  ON
    data.id >= tally.id
  GROUP BY
    data.a
) s
GROUP BY
  a - r
ORDER BY
  a - r
;
The approach itself remains unchanged: the subquery returns a ranked row set, which is then processed the same way as before.
A SQL Fiddle demo for the modified query is available here.
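For a quick local check without a MySQL server, the self-join version can be tried against an in-memory SQLite database from Python. This is only a sketch under assumptions: SQLite stands in for MySQL, the table is filled with the question's sample rows, and the ranking is correct only while a is unique and increases together with id.

```python
import sqlite3

# In-memory database holding the sample table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (id INTEGER PRIMARY KEY, a INTEGER)")
sample = [(1, 1), (2, 2), (3, 3), (4, 5), (5, 6),
          (6, 7), (7, 9), (8, 10), (9, 11), (10, 12)]
conn.executemany("INSERT INTO yourtable (id, a) VALUES (?, ?)", sample)

# The triangular self-join: r counts the rows with an id up to the current one.
query = """
SELECT MIN(a) AS series_start,
       MAX(a) AS series_end,
       MAX(a) - MIN(a) + 1 AS series_count
FROM (
    SELECT data.a AS a, COUNT(*) AS r
    FROM yourtable AS data
    INNER JOIN yourtable AS tally ON data.id >= tally.id
    GROUP BY data.a
) s
GROUP BY a - r
ORDER BY a - r
"""
result = conn.execute(query).fetchall()
print(result)  # [(1, 3, 3), (5, 7, 3), (9, 12, 4)]
conn.close()
```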