SELECT several records for each DISTINCT one in a single query
Problem description
I need to select several rows for each distinct record. It is much like what was asked in "Select first n records for each distinct ID in SQL Server 2008", except that I use MySQL.
In this case the goal can be achieved by running 21 queries: 1 general query and 20 more to get the subrecords, i.e. something like this:
SELECT DISTINCT `user_id`
FROM `posts`
WHERE `deleted` = '0'
ORDER BY `user_id` ASC
LIMIT 20
... to select all rows needed, and then
SELECT *
FROM `posts`
WHERE `deleted` = '0'
AND `user_id` = ?
ORDER BY `id` DESC
LIMIT 5
... in a loop, once for each row selected by the first query.
Basically, I need to get 5 posts for each user, and I need this done in a single query. The posts setup is just an example; I made it up so that it is, hopefully, easier to understand what I need.
I started with the following query:
SELECT *
FROM `posts`
WHERE `user_id`
IN (
SELECT DISTINCT `user_id`
FROM `posts`
ORDER BY `user_id` DESC
LIMIT 4
)
LIMIT 5
But I got the error #1235 - This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'.
So I tried the JOIN idea, as suggested here:
SELECT posts.id,
posts.user_id,
NULL
FROM (
SELECT posts.user_id
FROM posts
WHERE posts.deleted = 0
LIMIT 20
) q
JOIN posts
ON posts.user_id = q.user_id
I've also tried several nested queries, as suggested here:
SELECT *
FROM posts
WHERE user_id IN (
SELECT * FROM (
SELECT user_id
FROM posts
LIMIT 20
)
as t);
I also tried other solutions found on the Internet, but they either do not work or simply select the first N rows from the table (regardless of the conditions and joins, for some reason). I tried LEFT JOIN, RIGHT JOIN, and even INNER JOIN, but still no success.
Please help.
UPDATE: I forgot to mention that the table is around 5GB in size.
UPDATE: I tried a sub-subquery:
SELECT *
FROM `posts`
WHERE
`user_id` IN ( SELECT `user_id` FROM (
SELECT DISTINCT `user_id`
FROM `posts`
ORDER BY `user_id` DESC
LIMIT 4 ) limit_users
)
LIMIT 5
Same as above, it returns the following:
+----+---------+------+
| id | user_id | post |
+----+---------+------+
|  1 |       1 | a    |
|  2 |       1 | b    |
|  3 |       1 | c    |
| .. |      .. | ..   |
+----+---------+------+
I.e. 5 rows (which is what the outer LIMIT is set to), all for the same user. The weird thing is that if I run the subquery and sub-subquery alone:
SELECT `user_id` FROM (
SELECT DISTINCT `user_id`
FROM `posts`
ORDER BY `user_id` DESC
LIMIT 4 ) limit_users
I get 4 distinct values:
+---------+
| user_id |
+---------+
|       1 |
|       2 |
|       3 |
|       4 |
+---------+
Recommended answer
You have to use variables, keeping two different counters over an ordered query: one for the number of posts per user, and one for the number of users:
SELECT posts_counts.*
FROM (
SELECT
posts.*,
@post_count:=case when @prec_user_id=user_id then @post_count+1 else 1 end as pc,
case when @prec_user_id<>user_id then @user_count:=@user_count+1 else @user_count end as uc,
@prec_user_id:=user_id
FROM
posts,
(select @prec_user_id:=0, @user_count:=0, @post_count:=0) counts
ORDER BY
posts.user_id ) posts_counts
WHERE pc<=5 and uc<=4 -- both counters start at 1: at most 5 posts for each of the first 4 users
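To make the two counters easier to follow, here is a minimal Python sketch of the same bookkeeping done outside the database. It is only an illustration, not part of the query: the function name top_posts_per_user, the tuple layout, and the sample data are hypothetical, and it assumes the rows arrive already sorted by user_id, which is what the ORDER BY in the query is meant to provide.

```python
def top_posts_per_user(rows, max_posts=5, max_users=4):
    """Keep at most max_posts rows for each of the first max_users users.

    rows: iterable of (id, user_id, post) tuples, already sorted by user_id.
    """
    prev_user_id = None  # plays the role of @prec_user_id
    post_count = 0       # plays the role of @post_count
    user_count = 0       # plays the role of @user_count
    kept = []
    for row in rows:
        user_id = row[1]
        if user_id == prev_user_id:
            # Same user as the previous row: one more post for this user.
            post_count += 1
        else:
            # A new user starts: reset the post counter, count the user.
            post_count = 1
            user_count += 1
        prev_user_id = user_id
        if post_count <= max_posts and user_count <= max_users:
            kept.append(row)
    return kept


# Made-up sample: users 1..5 with 7, 3, 6, 1 and 9 posts respectively.
user_ids = [1] * 7 + [2] * 3 + [3] * 6 + [4] * 1 + [5] * 9
rows = [(i, user_id, f"post {i}") for i, user_id in enumerate(user_ids, start=1)]
result = top_posts_per_user(rows)
```

Because both counters start at 1 on the first row, the limits are inclusive: a user with fewer posts than max_posts contributes all of them, and every user after the first max_users is skipped.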
EDIT: You might also consider trying this query:
SELECT *
FROM `posts`
WHERE
`user_id` IN ( SELECT user_id FROM (
SELECT DISTINCT `user_id`
FROM `posts`
ORDER BY `user_id` DESC
LIMIT 4 ) limit_users
)
LIMIT 5
(This will select just 5 posts in total, out of all the posts from the selected users, so it's still not what you need, but it shows a trick for using LIMIT inside a sub-subquery.)
The next query limits the result to 5 posts for each of the 20 users:
select posts_limited.*
from (
select
posts.*,
@row:=if(@last_user=posts.user_id, @row+1, 1) as row,
@last_user:=posts.user_id
from
posts inner join
(select user_id from
(select distinct user_id
from posts
order by user_id desc
LIMIT 20) limit_users
) limit_users
on posts.user_id = limit_users.user_id,
(select @last_user:=0, @row:=0) r
) posts_limited
where row<=5
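If MySQL 8.0 or later is available, window functions can express the same per-user limit without user variables. The sketch below is a different technique from the one in the answer and rests on several assumptions: it runs against an in-memory SQLite database (3.25 or later) from Python purely so that it is self-contained, the table contents are made up, and the aliases post_rank and user_rank are hypothetical. MySQL 8.0 supports ROW_NUMBER() and DENSE_RANK() with the same OVER syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, deleted INTEGER, post TEXT)"
)
# Made-up data: 30 users with 8 posts each, none deleted.
conn.executemany(
    "INSERT INTO posts (user_id, deleted, post) VALUES (?, 0, ?)",
    [(u, f"post {n} of user {u}") for u in range(1, 31) for n in range(1, 9)],
)

QUERY = """
SELECT id, user_id, post
FROM (
    SELECT p.*,
           -- position of the post within its user, newest (highest id) first
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY id DESC) AS post_rank,
           -- position of the user among all users, lowest user_id first
           DENSE_RANK() OVER (ORDER BY user_id ASC) AS user_rank
    FROM posts AS p
    WHERE deleted = 0
) AS ranked
WHERE post_rank <= 5 AND user_rank <= 20
ORDER BY user_id ASC, id DESC
"""
rows = conn.execute(QUERY).fetchall()
```

With this sample data, rows holds the 5 newest posts of each of the first 20 users. How such a query performs on a table of around 5GB would need to be measured; an index covering user_id and id is a reasonable thing to try first.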