SQL select only rows with max value on a column

Question

I have this table for documents (simplified version here):

+------+-------+--------------------------------------+
| id   | rev   | content                              |
+------+-------+--------------------------------------+
| 1    | 1     | ...                                  |
| 2    | 1     | ...                                  |
| 1    | 2     | ...                                  |
| 1    | 3     | ...                                  |
+------+-------+--------------------------------------+

How do I select one row per id, and only the one with the greatest rev?
With the above data, the result should contain two rows: [1, 3, ...] and [2, 1, ...]. I'm using MySQL.

Currently I use checks in a while loop to detect and overwrite old revs in the result set. But is this the only way to achieve the result? Isn't there a SQL solution?
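
For comparison, the client-side loop described above can be sketched in Python (the asker's actual loop and language are not shown). It walks every row and keeps, per id, only the row with the highest rev seen so far. The tuple row shape and the `latest_per_id` name are illustrative assumptions.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed row shape: (id, rev, content)
Row = Tuple[int, int, str]


def latest_per_id(rows: Iterable[Row]) -> List[Row]:
    """Keep only the highest-rev row per id, overwriting older revs as we go."""
    best: Dict[int, Row] = {}
    for row in rows:
        doc_id, rev, _content = row
        # Overwrite the stored row whenever a newer rev shows up.
        if doc_id not in best or rev > best[doc_id][1]:
            best[doc_id] = row
    # Sort only so the output order is predictable.
    return sorted(best.values())


rows = [(1, 1, "a"), (2, 1, "b"), (1, 2, "c"), (1, 3, "d")]
print(latest_per_id(rows))  # [(1, 3, 'd'), (2, 1, 'b')]
```

Note that this loop keeps a single row per id even when two rows tie on the greatest rev, and it has to fetch every row from the database before discarding most of them.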

Update
As the answers suggest, there is a SQL solution, and here is a sqlfiddle demo.

Update 2
I noticed that after I added the sqlfiddle above, the question started gaining upvotes faster than the answers. That was not my intention! The fiddle is based on the answers, especially the accepted one.

Recommended Answer

At first glance...

All you need is a GROUP BY clause with the MAX aggregate function:

SELECT id, MAX(rev)
FROM YourTable
GROUP BY id
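
One way to try this query is Python's built-in sqlite3 module with an in-memory copy of the sample table. SQLite stands in for MySQL here, the content values are placeholders, and ORDER BY is added only to make the output order predictable. The query returns the greatest rev per id, but no content:

```python
import sqlite3

# In-memory SQLite database holding the sample rows from the question;
# the content values are placeholders.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, rev INTEGER, content TEXT)")
conn.executemany(
    "INSERT INTO YourTable (id, rev, content) VALUES (?, ?, ?)",
    [(1, 1, "a"), (2, 1, "b"), (1, 2, "c"), (1, 3, "d")],
)

# One row per id with its highest rev; content is not part of the result.
max_revs = conn.execute(
    "SELECT id, MAX(rev) FROM YourTable GROUP BY id ORDER BY id"
).fetchall()
print(max_revs)  # [(1, 3), (2, 1)]
```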

It's never that simple, is it?

I just noticed you need the content column as well.

This is a very common question in SQL: find the whole data for the row with some max value in a column per some group identifier. I have heard it a lot during my career. In fact, it was one of the questions I answered in my current job's technical interview.

It is, actually, so common that the Stack Overflow community has created a tag just to deal with questions like that: greatest-n-per-group.

基本上,您有两种方法可以解决该问题:

Basically, you have two approaches to solve that problem:

Joining with a simple group-identifier, max-value-in-group sub-query

In this approach, you first find the group-identifier, max-value-in-group pairs (already solved above) in a sub-query. Then you join your table to the sub-query with equality on both group-identifier and max-value-in-group:

SELECT a.id, a.rev, a.content
FROM YourTable a
INNER JOIN (
    -- greatest rev for each id
    SELECT id, MAX(rev) AS rev
    FROM YourTable
    GROUP BY id
) b ON a.id = b.id AND a.rev = b.rev;
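
The join can be checked against the sample data with Python's sqlite3 module, with SQLite standing in for MySQL. The content values are placeholders, and the trailing ORDER BY is an addition for predictable output. Each id now comes back with its full latest row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, rev INTEGER, content TEXT)")
conn.executemany(
    "INSERT INTO YourTable (id, rev, content) VALUES (?, ?, ?)",
    [(1, 1, "a"), (2, 1, "b"), (1, 2, "c"), (1, 3, "d")],
)

# Approach 1: join the table to the per-id MAX(rev) sub-query.
JOIN_QUERY = """
SELECT a.id, a.rev, a.content
FROM YourTable a
INNER JOIN (
    SELECT id, MAX(rev) AS rev
    FROM YourTable
    GROUP BY id
) b ON a.id = b.id AND a.rev = b.rev
ORDER BY a.id
"""

latest = conn.execute(JOIN_QUERY).fetchall()
print(latest)  # [(1, 3, 'd'), (2, 1, 'b')]
```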

Left Joining with self, tweaking join conditions and filters

In this approach, you left join the table with itself. Equality goes on the group-identifier. Then come two smart moves:

  1. The second join condition requires the left-side value to be less than the right-side value.
  2. When you do step 1, the row(s) that actually have the max value will have NULL on the right side, because no row in the same group has a greater value to match (it's a LEFT JOIN, remember?). Then we filter the joined result, showing only the rows where the right side is NULL.
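
To see the two steps separately, the sketch below runs only the LEFT JOIN (step 1) on the sample data and then applies the NULL filter in Python (step 2). SQLite via Python's sqlite3 module stands in for MySQL, and the ORDER BY exists only to make the intermediate rows easy to read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, rev INTEGER, content TEXT)")
conn.executemany(
    "INSERT INTO YourTable (id, rev, content) VALUES (?, ?, ?)",
    [(1, 1, "a"), (2, 1, "b"), (1, 2, "c"), (1, 3, "d")],
)

# Step 1: the LEFT JOIN without any WHERE filter.
# Each left row is paired with every higher rev of the same id, or with NULL.
unfiltered = conn.execute(
    """
    SELECT a.id, a.rev, b.rev
    FROM YourTable a
    LEFT OUTER JOIN YourTable b
        ON a.id = b.id AND a.rev < b.rev
    ORDER BY a.id, a.rev, b.rev
    """
).fetchall()
for row in unfiltered:
    print(row)
# (1, 1, 2)
# (1, 1, 3)
# (1, 2, 3)
# (1, 3, None)
# (2, 1, None)

# Step 2: keep only rows whose right side is NULL (Python None).
survivors = [row for row in unfiltered if row[2] is None]
print(survivors)  # [(1, 3, None), (2, 1, None)]
```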

So you end up with:

SELECT a.*
FROM YourTable a
LEFT OUTER JOIN YourTable b
    ON a.id = b.id AND a.rev < b.rev
-- keep only rows of a that have no higher rev for the same id
WHERE b.id IS NULL;
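
Running the complete query on the same sample data (SQLite via Python's sqlite3 as a stand-in for MySQL, placeholder content values, ORDER BY added for predictable output) returns the same two rows as the sub-query join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, rev INTEGER, content TEXT)")
conn.executemany(
    "INSERT INTO YourTable (id, rev, content) VALUES (?, ?, ?)",
    [(1, 1, "a"), (2, 1, "b"), (1, 2, "c"), (1, 3, "d")],
)

# Approach 2: self LEFT JOIN on higher revs, then keep unmatched rows.
LEFT_JOIN_QUERY = """
SELECT a.*
FROM YourTable a
LEFT OUTER JOIN YourTable b
    ON a.id = b.id AND a.rev < b.rev
WHERE b.id IS NULL
ORDER BY a.id
"""

latest = conn.execute(LEFT_JOIN_QUERY).fetchall()
print(latest)  # [(1, 3, 'd'), (2, 1, 'b')]
```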

Conclusion

Both approaches produce exactly the same result.

If two rows in the same group share the max-value-in-group, both rows will be in the result with either approach.
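
The tie behavior can be checked with a small hypothetical data set in which id 1 has two rows sharing rev 3. This is a sketch using SQLite via Python's sqlite3 in place of MySQL; both queries also order by content, an addition made only so the two result lists can be compared directly.

```python
import sqlite3

JOIN_QUERY = """
SELECT a.id, a.rev, a.content
FROM YourTable a
INNER JOIN (
    SELECT id, MAX(rev) AS rev
    FROM YourTable
    GROUP BY id
) b ON a.id = b.id AND a.rev = b.rev
ORDER BY a.id, a.content
"""

LEFT_JOIN_QUERY = """
SELECT a.id, a.rev, a.content
FROM YourTable a
LEFT OUTER JOIN YourTable b
    ON a.id = b.id AND a.rev < b.rev
WHERE b.id IS NULL
ORDER BY a.id, a.content
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (id INTEGER, rev INTEGER, content TEXT)")
# Hypothetical data: id 1 has two rows tied on its greatest rev (3).
conn.executemany(
    "INSERT INTO YourTable (id, rev, content) VALUES (?, ?, ?)",
    [(1, 1, "a"), (1, 3, "x"), (1, 3, "y"), (2, 1, "b")],
)

via_join = conn.execute(JOIN_QUERY).fetchall()
via_left_join = conn.execute(LEFT_JOIN_QUERY).fetchall()
print(via_join)       # [(1, 3, 'x'), (1, 3, 'y'), (2, 1, 'b')]
print(via_left_join)  # [(1, 3, 'x'), (1, 3, 'y'), (2, 1, 'b')]
```

If exactly one row per group is required, an extra tie-breaking rule is needed; neither query picks one on its own.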

Both approaches are ANSI SQL compatible, so they should work with your favorite RDBMS, regardless of its "flavor".

Both approaches are also performance friendly, but your mileage may vary (RDBMS, DB structure, indexes, etc.). So when choosing one approach over the other, benchmark. And make sure you pick the one that makes the most sense to you.
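
A rough benchmarking harness might look like the sketch below. It fills an in-memory SQLite table with synthetic rows, adds an index on (id, rev) as an assumption about a typical setup, and reports the best of several runs for each query. SQLite timings say little about MySQL on real data, so treat this only as a template for measuring on your own RDBMS, schema, and indexes. The `build_table` and `time_query` helpers are hypothetical names.

```python
import sqlite3
import time
from typing import List, Tuple

JOIN_QUERY = """
SELECT a.id, a.rev, a.content
FROM YourTable a
INNER JOIN (
    SELECT id, MAX(rev) AS rev FROM YourTable GROUP BY id
) b ON a.id = b.id AND a.rev = b.rev
ORDER BY a.id
"""

LEFT_JOIN_QUERY = """
SELECT a.id, a.rev, a.content
FROM YourTable a
LEFT OUTER JOIN YourTable b
    ON a.id = b.id AND a.rev < b.rev
WHERE b.id IS NULL
ORDER BY a.id
"""


def build_table(n_ids: int, revs_per_id: int) -> sqlite3.Connection:
    """Hypothetical helper: synthetic documents, each with revs 1..revs_per_id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE YourTable (id INTEGER, rev INTEGER, content TEXT)")
    conn.executemany(
        "INSERT INTO YourTable (id, rev, content) VALUES (?, ?, ?)",
        [(i, r, f"doc {i} rev {r}")
         for i in range(1, n_ids + 1)
         for r in range(1, revs_per_id + 1)],
    )
    # Assumed index; whether and how it helps depends on the engine.
    conn.execute("CREATE INDEX idx_id_rev ON YourTable (id, rev)")
    return conn


def time_query(conn: sqlite3.Connection, sql: str,
               repeats: int = 5) -> Tuple[float, List[tuple]]:
    """Return the best wall-clock time over several runs, plus the rows."""
    best = float("inf")
    rows: List[tuple] = []
    for _ in range(repeats):
        start = time.perf_counter()
        rows = conn.execute(sql).fetchall()
        best = min(best, time.perf_counter() - start)
    return best, rows


conn = build_table(n_ids=200, revs_per_id=20)
join_time, join_rows = time_query(conn, JOIN_QUERY)
left_time, left_rows = time_query(conn, LEFT_JOIN_QUERY)
print(f"sub-query join: {join_time:.4f}s, self left join: {left_time:.4f}s")
```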
