Retrieving the last record in each group - MySQL

Views: 109 | Published: 2018/5/30 13:30:50 | Tags: sql mysql group-by greatest-n-per-group


Problem description

There is a table messages that contains data as shown below:

  Id   Name   Other_Columns
  -------------------------
  1    A      A_data_1
  2    A      A_data_2
  3    A      A_data_3
  4    B      B_data_1
  5    B      B_data_2
  6    C      C_data_1

If I run the query select * from messages group by name, I get the following result:

  1    A      A_data_1
  4    B      B_data_1
  6    C      C_data_1

What query will return the following result?

  3    A      A_data_3
  5    B      B_data_2
  6    C      C_data_1

That is, the last record in each group should be returned.

At present, this is the query that I use:

  SELECT
    *
  FROM (SELECT
    *
  FROM messages
  ORDER BY id DESC) AS x
  GROUP BY name

But this looks highly inefficient. Are there any other ways to achieve the same result?

Solution

MySQL 8.0 now supports windowing functions, like almost all popular SQL implementations. With this standard syntax, we can write greatest-n-per-group queries:

  WITH ranked_messages AS (
    SELECT m.*, ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) AS rn
    FROM messages AS m
  )
  SELECT * FROM ranked_messages WHERE rn = 1;
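
As a rough way to try the window-function query without a MySQL 8.0 server, the sketch below runs it from Python against an in-memory SQLite database. Using SQLite (version 3.25 or later, which supports ROW_NUMBER) in place of MySQL is an assumption made only to keep the example self-contained, and the helper names build_messages and last_per_group_window are hypothetical.

```python
import sqlite3

# Assumption: SQLite 3.25+ stands in for MySQL 8.0; both accept this
# CTE + ROW_NUMBER() syntax, but this is not a MySQL test.
def build_messages(conn):
    """Create the sample messages table from the question."""
    conn.execute(
        "CREATE TABLE messages (id INTEGER PRIMARY KEY, name TEXT, other_columns TEXT)"
    )
    conn.executemany(
        "INSERT INTO messages VALUES (?, ?, ?)",
        [
            (1, "A", "A_data_1"), (2, "A", "A_data_2"), (3, "A", "A_data_3"),
            (4, "B", "B_data_1"), (5, "B", "B_data_2"), (6, "C", "C_data_1"),
        ],
    )

def last_per_group_window(conn):
    """Number the rows of each name from highest id down, keep row number 1."""
    sql = """
        WITH ranked_messages AS (
          SELECT m.*, ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) AS rn
          FROM messages AS m
        )
        SELECT id, name, other_columns
        FROM ranked_messages
        WHERE rn = 1
        ORDER BY id
    """
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
build_messages(conn)
for row in last_per_group_window(conn):
    print(row)
```

Changing rn = 1 to rn <= 2 in the same query would keep the two most recent rows per name, which is why this form generalizes to greatest-n-per-group.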

Below is the original answer I wrote for this question in 2009:

I write the solution this way:

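  SELECT m1.*
  FROM messages m1 LEFT JOIN messages m2
    ON (m1.name = m2.name AND m1.id < m2.id)
  WHERE m2.id IS NULL;

Regarding performance, one solution or the other can be better, depending on the nature of your data. So you should test both queries and use the one that performs better on your database.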
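
To see why the LEFT JOIN form picks the last row of each group, the sketch below first lists the joined pairs and then applies the IS NULL filter. It runs on an in-memory SQLite database from Python, which is an assumption standing in for MySQL; the helper names are hypothetical.

```python
import sqlite3

def build_messages(conn):
    """Create the sample messages table from the question."""
    conn.execute(
        "CREATE TABLE messages (id INTEGER PRIMARY KEY, name TEXT, other_columns TEXT)"
    )
    conn.executemany(
        "INSERT INTO messages VALUES (?, ?, ?)",
        [
            (1, "A", "A_data_1"), (2, "A", "A_data_2"), (3, "A", "A_data_3"),
            (4, "B", "B_data_1"), (5, "B", "B_data_2"), (6, "C", "C_data_1"),
        ],
    )

def joined_pairs(conn):
    """Each m1 row paired with every later row (higher id) of the same name.
    A row with no later row in its group gets NULL (None) for m2.id."""
    sql = """
        SELECT m1.id, m2.id
        FROM messages m1 LEFT JOIN messages m2
          ON (m1.name = m2.name AND m1.id < m2.id)
        ORDER BY m1.id, m2.id
    """
    return conn.execute(sql).fetchall()

def last_per_group_left_join(conn):
    """Keep only the m1 rows for which no later row exists."""
    sql = """
        SELECT m1.id, m1.name, m1.other_columns
        FROM messages m1 LEFT JOIN messages m2
          ON (m1.name = m2.name AND m1.id < m2.id)
        WHERE m2.id IS NULL
        ORDER BY m1.id
    """
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
build_messages(conn)
print(joined_pairs(conn))
print(last_per_group_left_join(conn))
```

Rows 3, 5 and 6 are the only ones paired with None, so they are exactly the rows that survive the WHERE m2.id IS NULL filter.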

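For example, I have a copy of the StackOverflow August data dump. I'll use that for benchmarking. There are 1,114,357 rows in the Posts table. This is running on MySQL 5.0.75 on my Macbook Pro 2.40GHz.

I'll write a query to find the most recent post for a given user ID (mine).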

First, using the technique shown by @Eric with the GROUP BY in a subquery:

  SELECT p1.postid
  FROM Posts p1
  INNER JOIN (SELECT pi.owneruserid, MAX(pi.postid) AS maxpostid
              FROM Posts pi GROUP BY pi.owneruserid) p2
    ON (p1.postid = p2.maxpostid)
  WHERE p1.owneruserid = 20860;

  1 row in set (1 min 17.89 sec)

Even the EXPLAIN analysis (https://dev.mysql.com/doc/refman/5.7/en/using-explain.html) takes over 16 seconds:

  +----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+
  | id | select_type | table      | type   | possible_keys              | key         | key_len | ref          | rows    | Extra       |
  +----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+
  |  1 | PRIMARY     | <derived2> | ALL    | NULL                       | NULL        | NULL    | NULL         |   76756 |             |
  |  1 | PRIMARY     | p1         | eq_ref | PRIMARY,PostId,OwnerUserId | PRIMARY     | 8       | p2.maxpostid |       1 | Using where |
  |  2 | DERIVED     | pi         | index  | NULL                       | OwnerUserId | 8       | NULL         | 1151268 | Using index |
  +----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+
  3 rows in set (16.09 sec)

Now produce the same query result using my technique with LEFT JOIN:

  SELECT p1.postid
  FROM Posts p1 LEFT JOIN posts p2
    ON (p1.owneruserid = p2.owneruserid AND p1.postid < p2.postid)
  WHERE p2.postid IS NULL AND p1.owneruserid = 20860;

  1 row in set (0.28 sec)

The EXPLAIN analysis shows that both tables are able to use their indexes:

  +----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+
  | id | select_type | table | type | possible_keys              | key         | key_len | ref   | rows | Extra                                |
  +----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+
  |  1 | SIMPLE      | p1    | ref  | OwnerUserId                | OwnerUserId | 8       | const | 1384 | Using index                          |
  |  1 | SIMPLE      | p2    | ref  | PRIMARY,PostId,OwnerUserId | OwnerUserId | 8       | const | 1384 | Using where; Using index; Not exists |
  +----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+
  2 rows in set (0.00 sec)
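
Before comparing timings on your own data, it can help to confirm that the two query shapes return the same row. The sketch below does that on a tiny, made-up posts table in an in-memory SQLite database. The table contents, the index name idx_posts_owner and the use of SQLite instead of MySQL are all assumptions for illustration; it says nothing about the timings reported above.

```python
import sqlite3

# Query shape 1: GROUP BY in a derived table, then join back to posts.
GROUP_BY_SQL = """
    SELECT p1.postid
    FROM posts p1
    INNER JOIN (SELECT pi.owneruserid, MAX(pi.postid) AS maxpostid
                FROM posts pi GROUP BY pi.owneruserid) p2
      ON (p1.postid = p2.maxpostid)
    WHERE p1.owneruserid = ?
"""

# Query shape 2: LEFT JOIN against later posts by the same owner.
LEFT_JOIN_SQL = """
    SELECT p1.postid
    FROM posts p1 LEFT JOIN posts p2
      ON (p1.owneruserid = p2.owneruserid AND p1.postid < p2.postid)
    WHERE p2.postid IS NULL AND p1.owneruserid = ?
"""

def make_posts(conn):
    """Hypothetical miniature of the Posts table: only the two columns used."""
    conn.execute(
        "CREATE TABLE posts (postid INTEGER PRIMARY KEY, owneruserid INTEGER NOT NULL)"
    )
    conn.execute("CREATE INDEX idx_posts_owner ON posts (owneruserid)")
    conn.executemany(
        "INSERT INTO posts VALUES (?, ?)",
        [(1, 1), (3, 20860), (5, 1), (7, 20860), (9, 2), (12, 20860)],
    )

def latest_post(conn, sql, owner_id):
    """Run one of the two queries for a single owner and return its rows."""
    return conn.execute(sql, (owner_id,)).fetchall()

conn = sqlite3.connect(":memory:")
make_posts(conn)
print(latest_post(conn, GROUP_BY_SQL, 20860))
print(latest_post(conn, LEFT_JOIN_SQL, 20860))
```

Once both shapes agree on the result, each can be timed against the real table, as the answer recommends.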

Here's the DDL for my Posts table:

  CREATE TABLE `posts` (
    `PostId` bigint(20) unsigned NOT NULL auto_increment,
    `PostTypeId` bigint(20) unsigned NOT NULL,
    `AcceptedAnswerId` bigint(20) unsigned default NULL,
    `ParentId` bigint(20) unsigned default NULL,
    `CreationDate` datetime NOT NULL,
    `Score` int(11) NOT NULL default '0',
    `ViewCount` int(11) NOT NULL default '0',
    `Body` text NOT NULL,
    `OwnerUserId` bigint(20) unsigned NOT NULL,
    `OwnerDisplayName` varchar(40) default NULL,
    `LastEditorUserId` bigint(20) unsigned default NULL,
    `LastEditDate` datetime default NULL,
    `LastActivityDate` datetime default NULL,
    `Title` varchar(250) NOT NULL default '',
    `Tags` varchar(150) NOT NULL default '',
    `AnswerCount` int(11) NOT NULL default '0',
    `CommentCount` int(11) NOT NULL default '0',
    `FavoriteCount` int(11) NOT NULL default '0',
    `ClosedDate` datetime default NULL,
    PRIMARY KEY (`PostId`),
    UNIQUE KEY `PostId` (`PostId`),
    KEY `PostTypeId` (`PostTypeId`),
    KEY `AcceptedAnswerId` (`AcceptedAnswerId`),
    KEY `OwnerUserId` (`OwnerUserId`),
    KEY `LastEditorUserId` (`LastEditorUserId`),
    KEY `ParentId` (`ParentId`),
    CONSTRAINT `posts_ibfk_1` FOREIGN KEY (`PostTypeId`) REFERENCES `posttypes` (`PostTypeId`)
  ) ENGINE=InnoDB;
