Retrieving the last record in each group - MySQL


Problem description


There is a table messages that contains data as shown below:

Id   Name   Other_Columns
-------------------------
1    A       A_data_1
2    A       A_data_2
3    A       A_data_3
4    B       B_data_1
5    B       B_data_2
6    C       C_data_1

If I run the query select * from messages group by name, I get this result:

1    A       A_data_1
4    B       B_data_1
6    C       C_data_1

What query will return the following result?

3    A       A_data_3
5    B       B_data_2
6    C       C_data_1

That is, the last record in each group should be returned.

At present, this is the query that I use:

SELECT
  *
FROM (SELECT
  *
FROM messages
ORDER BY id DESC) AS x
GROUP BY name

But this looks highly inefficient. Any other ways to achieve the same result?

Solution

MySQL 8.0 now supports window functions, like almost all popular SQL implementations. With this standard syntax, we can write greatest-n-per-group queries:

WITH ranked_messages AS (
  SELECT m.*, ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) AS rn
  FROM messages AS m
)
SELECT * FROM ranked_messages WHERE rn = 1;
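
To see what the window function produces on the sample data from the question, here is a minimal sketch. It runs the same query through Python's built-in sqlite3 module, with an in-memory SQLite database standing in for MySQL 8.0. That substitution is an assumption made only so the example is self-contained, and it needs SQLite 3.25 or later for window functions. The column name other_columns is taken from the sample table header.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL 8.0 (needs SQLite 3.25+).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, name TEXT, other_columns TEXT)"
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [
        (1, "A", "A_data_1"),
        (2, "A", "A_data_2"),
        (3, "A", "A_data_3"),
        (4, "B", "B_data_1"),
        (5, "B", "B_data_2"),
        (6, "C", "C_data_1"),
    ],
)

# ROW_NUMBER() restarts at 1 for each name and counts down from the highest id,
# so rn = 1 marks the last record in each group.
window_query = """
WITH ranked_messages AS (
  SELECT m.*, ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) AS rn
  FROM messages AS m
)
SELECT id, name, other_columns FROM ranked_messages WHERE rn = 1 ORDER BY id
"""
window_rows = conn.execute(window_query).fetchall()
print(window_rows)
# [(3, 'A', 'A_data_3'), (5, 'B', 'B_data_2'), (6, 'C', 'C_data_1')]
```

Changing the filter to rn <= 2 would return the last two records per group, which is the general greatest-n-per-group case.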

Below is the original answer I wrote for this question in 2009:


I write the solution this way:

SELECT m1.*
FROM messages m1 LEFT JOIN messages m2
 ON (m1.name = m2.name AND m1.id < m2.id)
WHERE m2.id IS NULL;
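
The LEFT JOIN works as an anti-join: each row m1 is paired with every later row m2 in the same group, and only the rows with no later partner survive the IS NULL filter. The sketch below makes that visible on the question's sample data. It uses Python's sqlite3 module with an in-memory SQLite database as a stand-in for MySQL (an assumption for the sake of a runnable example), and the column name other_columns comes from the sample table header.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, name TEXT, other_columns TEXT)"
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [
        (1, "A", "A_data_1"),
        (2, "A", "A_data_2"),
        (3, "A", "A_data_3"),
        (4, "B", "B_data_1"),
        (5, "B", "B_data_2"),
        (6, "C", "C_data_1"),
    ],
)

# Step 1: the join before filtering. m2 is any row in the same group with a
# higher id; rows that have no such partner get NULL (None) for m2.id.
pairs = conn.execute(
    """
    SELECT m1.id, m2.id
    FROM messages m1 LEFT JOIN messages m2
      ON (m1.name = m2.name AND m1.id < m2.id)
    ORDER BY m1.id, m2.id
    """
).fetchall()
print(pairs)
# [(1, 2), (1, 3), (2, 3), (3, None), (4, 5), (5, None), (6, None)]

# Step 2: keep only the rows without a later partner.
last_rows = conn.execute(
    """
    SELECT m1.*
    FROM messages m1 LEFT JOIN messages m2
      ON (m1.name = m2.name AND m1.id < m2.id)
    WHERE m2.id IS NULL
    ORDER BY m1.id
    """
).fetchall()
print(last_rows)
# [(3, 'A', 'A_data_3'), (5, 'B', 'B_data_2'), (6, 'C', 'C_data_1')]
```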

Regarding performance, either solution can be faster, depending on the nature of your data. So you should test both queries and use the one that performs better on your database.

For example, I have a copy of the StackOverflow August data dump. I'll use that for benchmarking. There are 1,114,357 rows in the Posts table. This is running on MySQL 5.0.75 on my MacBook Pro 2.40GHz.

I'll write a query to find the most recent post for a given user ID (mine).

First using the technique shown by @Eric with the GROUP BY in a subquery:

SELECT p1.postid
FROM Posts p1
INNER JOIN (SELECT pi.owneruserid, MAX(pi.postid) AS maxpostid
            FROM Posts pi GROUP BY pi.owneruserid) p2
  ON (p1.postid = p2.maxpostid)
WHERE p1.owneruserid = 20860;

1 row in set (1 min 17.89 sec)

Even the EXPLAIN analysis takes over 16 seconds:

+----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+
| id | select_type | table      | type   | possible_keys              | key         | key_len | ref          | rows    | Extra       |
+----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+
|  1 | PRIMARY     | <derived2> | ALL    | NULL                       | NULL        | NULL    | NULL         |   76756 |             | 
|  1 | PRIMARY     | p1         | eq_ref | PRIMARY,PostId,OwnerUserId | PRIMARY     | 8       | p2.maxpostid |       1 | Using where | 
|  2 | DERIVED     | pi         | index  | NULL                       | OwnerUserId | 8       | NULL         | 1151268 | Using index | 
+----+-------------+------------+--------+----------------------------+-------------+---------+--------------+---------+-------------+
3 rows in set (16.09 sec)

Now produce the same query result using my technique with LEFT JOIN:

SELECT p1.postid
FROM Posts p1 LEFT JOIN posts p2
  ON (p1.owneruserid = p2.owneruserid AND p1.postid < p2.postid)
WHERE p2.postid IS NULL AND p1.owneruserid = 20860;

1 row in set (0.28 sec)

The EXPLAIN analysis shows that both tables are able to use their indexes:

+----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+
| id | select_type | table | type | possible_keys              | key         | key_len | ref   | rows | Extra                                |
+----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+
|  1 | SIMPLE      | p1    | ref  | OwnerUserId                | OwnerUserId | 8       | const | 1384 | Using index                          | 
|  1 | SIMPLE      | p2    | ref  | PRIMARY,PostId,OwnerUserId | OwnerUserId | 8       | const | 1384 | Using where; Using index; Not exists | 
+----+-------------+-------+------+----------------------------+-------------+---------+-------+------+--------------------------------------+
2 rows in set (0.00 sec)
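
The comparison above can be rehearsed in miniature before trying it on real data. The following is only a sketch under loud assumptions: it uses Python's sqlite3 module with an in-memory SQLite database instead of MySQL, a hypothetical two-column version of the Posts table, and synthetic rows with made-up user IDs. The timings it prints say nothing about MySQL's optimizer. The point is the pattern: run both queries with the same parameter, confirm they agree, and time each one.

```python
import sqlite3
import time

# Assumption: in-memory SQLite stands in for MySQL; timings are illustrative only.
conn = sqlite3.connect(":memory:")
# Hypothetical miniature of the Posts table: just the columns the queries touch.
conn.execute(
    "CREATE TABLE posts (postid INTEGER PRIMARY KEY, owneruserid INTEGER NOT NULL)"
)
conn.execute("CREATE INDEX owner_idx ON posts (owneruserid)")
# 20,000 synthetic posts spread evenly over 200 made-up user IDs (0..199).
conn.executemany(
    "INSERT INTO posts VALUES (?, ?)",
    [(post_id, post_id % 200) for post_id in range(1, 20001)],
)

group_by_query = """
SELECT p1.postid
FROM posts p1
INNER JOIN (SELECT pi.owneruserid, MAX(pi.postid) AS maxpostid
            FROM posts pi GROUP BY pi.owneruserid) p2
  ON (p1.postid = p2.maxpostid)
WHERE p1.owneruserid = ?
"""

left_join_query = """
SELECT p1.postid
FROM posts p1 LEFT JOIN posts p2
  ON (p1.owneruserid = p2.owneruserid AND p1.postid < p2.postid)
WHERE p2.postid IS NULL AND p1.owneruserid = ?
"""


def timed(query, user_id):
    """Run the query for one user and return (rows, elapsed seconds)."""
    start = time.perf_counter()
    rows = conn.execute(query, (user_id,)).fetchall()
    return rows, time.perf_counter() - start


group_by_rows, group_by_secs = timed(group_by_query, 60)
left_join_rows, left_join_secs = timed(left_join_query, 60)

print("GROUP BY subquery:", group_by_rows, f"{group_by_secs:.4f}s")
print("LEFT JOIN:        ", left_join_rows, f"{left_join_secs:.4f}s")
```

Both queries should return the same single row for the chosen user. If they disagree on your own data, check for ties or NULLs in the ordering column before comparing speed.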


Here's the DDL for my Posts table:

CREATE TABLE `posts` (
  `PostId` bigint(20) unsigned NOT NULL auto_increment,
  `PostTypeId` bigint(20) unsigned NOT NULL,
  `AcceptedAnswerId` bigint(20) unsigned default NULL,
  `ParentId` bigint(20) unsigned default NULL,
  `CreationDate` datetime NOT NULL,
  `Score` int(11) NOT NULL default '0',
  `ViewCount` int(11) NOT NULL default '0',
  `Body` text NOT NULL,
  `OwnerUserId` bigint(20) unsigned NOT NULL,
  `OwnerDisplayName` varchar(40) default NULL,
  `LastEditorUserId` bigint(20) unsigned default NULL,
  `LastEditDate` datetime default NULL,
  `LastActivityDate` datetime default NULL,
  `Title` varchar(250) NOT NULL default '',
  `Tags` varchar(150) NOT NULL default '',
  `AnswerCount` int(11) NOT NULL default '0',
  `CommentCount` int(11) NOT NULL default '0',
  `FavoriteCount` int(11) NOT NULL default '0',
  `ClosedDate` datetime default NULL,
  PRIMARY KEY  (`PostId`),
  UNIQUE KEY `PostId` (`PostId`),
  KEY `PostTypeId` (`PostTypeId`),
  KEY `AcceptedAnswerId` (`AcceptedAnswerId`),
  KEY `OwnerUserId` (`OwnerUserId`),
  KEY `LastEditorUserId` (`LastEditorUserId`),
  KEY `ParentId` (`ParentId`),
  CONSTRAINT `posts_ibfk_1` FOREIGN KEY (`PostTypeId`) REFERENCES `posttypes` (`PostTypeId`)
) ENGINE=InnoDB;
