SELECT several records for each DISTINCT one in a single query


Question

I need to select several rows for each distinct record. It is pretty much what was asked in "Select first n records for each distinct ID in SQL Server 2008", except that I use MySQL.

In this case the goal can be achieved by running 21 queries: 1 general query and 20 more to get the subrecords, i.e. something like this:

SELECT DISTINCT `user_id`
FROM `posts`
WHERE `deleted` = '0'
ORDER BY `user_id` ASC
LIMIT 20

... to select all rows needed, and then

SELECT *
FROM `posts`
WHERE `deleted` = '0'
AND `user_id` = ?
ORDER BY `id` DESC
LIMIT 5

... in a loop, once for each row selected by the first query.

Basically, I need to get 5 posts for each user, and I need this done in a single query. The posts setup is just an example I made up so that it is, hopefully, easier to understand what I need.

I started with the following query:

SELECT * 
FROM `posts` 
WHERE `user_id` 
IN (
    SELECT DISTINCT `user_id` 
    FROM `posts` 
    ORDER BY `user_id` DESC 
    LIMIT 4
) 
LIMIT 5

But I got the error #1235 - This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'.

So I've tried the JOIN idea, as suggested here:

SELECT  posts.id,
        posts.user_id,
        NULL
FROM    (
        SELECT  posts.user_id
        FROM    posts
        WHERE   posts.deleted = 0
        LIMIT 20
        ) q
JOIN    posts
ON      posts.user_id = q.user_id

I've also tried several nested queries, as suggested here:

SELECT *
FROM posts 
WHERE user_id IN (
      SELECT * FROM (
            SELECT user_id 
            FROM posts 
            LIMIT 20
      ) 
      as t);

I also tried other solutions found on the Internet. But they either do not work or simply select the first N rows from the table (for some reason, regardless of the conditions and joins). I tried LEFT JOIN, RIGHT JOIN, even INNER JOIN, but still no success.

Please help.

UPDATE: Forgot to mention that the table is around 5GB in size.

UPDATE: Tried a sub-subquery:

SELECT * 
FROM `posts` 
WHERE
  `user_id` IN ( SELECT `user_id` FROM (
     SELECT DISTINCT `user_id` 
     FROM `posts` 
    ORDER BY `user_id` DESC 
    LIMIT 4 ) limit_users
  ) 
LIMIT 5

Same as above, it returns the following:

+----+---------+------+
| id | user_id | post |
+----+---------+------+
|  1 |       1 |    a |
+----+---------+------+
|  2 |       1 |    b |
+----+---------+------+
|  3 |       1 |    c |
+----+---------+------+
| .. |      .. |   .. |

I.e. 5 rows (which is what the outer LIMIT is set to), all for the same user. The weird thing is that if I run the subquery and sub-subquery alone:

    SELECT `user_id` FROM (
     SELECT DISTINCT `user_id` 
     FROM `posts` 
    ORDER BY `user_id` DESC 
    LIMIT 4 ) limit_users

I get 4 distinct values:

+---------+
| user_id |
+---------+
|       1 |
+---------+
|       2 |
+---------+
|       3 |
+---------+
|       4 |
+---------+

Recommended answer

You have to use variables, keeping two different counts over an ordered query: one for the number of posts of each user, and one for the number of users:

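SELECT posts_counts.*
FROM (
  SELECT
    posts.*,
    -- pc: running count of posts for the current user (restarts at 1 when user_id changes)
    @post_count:=case when @prec_user_id=user_id then @post_count+1 else 1 end as pc,
    -- uc: running count of distinct users seen so far
    case when @prec_user_id<>user_id then @user_count:=@user_count+1 else @user_count end as uc,
    -- remember this row's user for the next row; must come after pc and uc
    @prec_user_id:=user_id
  FROM
    posts,
    (select @prec_user_id:=0, @user_count:=0, @post_count:=0) counts
  ORDER BY
    posts.user_id ) posts_counts
-- keep at most 5 posts for each of the first 4 users (the LIMIT 5 / LIMIT 4 from the question)
WHERE pc<=5 and uc<=4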
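
The two counters in the query above are a per-user row number and a running count of distinct users. As a hedged alternative that is not part of the original answer: assuming MySQL 8.0 or later, where window functions are available, the same two counts can be sketched with ROW_NUMBER() and DENSE_RANK(). The limits of 5 posts and 20 users, the `deleted` filter, and the newest-first ordering by `id` are taken from the question; the alias `ranked` is only illustrative.

```sql
-- Sketch only: requires MySQL 8.0+ (window functions).
SELECT ranked.*
FROM (
  SELECT
    posts.*,
    -- position of the post among its user's posts, newest first
    ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY id DESC) AS pc,
    -- 1 for the lowest user_id, 2 for the next distinct one, and so on
    DENSE_RANK() OVER (ORDER BY user_id) AS uc
  FROM posts
  WHERE deleted = 0
) ranked
-- at most 5 posts for each of the first 20 users
WHERE pc <= 5 AND uc <= 20
ORDER BY user_id, id DESC;
```

Unlike the user-variable version, this does not depend on the order in which MySQL evaluates the expressions in the select list.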

EDIT: You might also consider trying this query:

SELECT * 
FROM `posts` 
WHERE
  `user_id` IN ( SELECT user_id FROM (
     SELECT DISTINCT `user_id` 
     FROM `posts` 
    ORDER BY `user_id` DESC 
    LIMIT 4 ) limit_users
  ) 
LIMIT 5

(this will just select 5 posts in total, out of all the posts from the selected users, so it's still not what you need, but it shows a trick for using LIMIT in a sub-subquery)

The next query limits the result to 5 posts for each of the 20 users:

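select posts_limited.*
from (
  select
    posts.*,
    -- row_num: position of the post within its user (restarts at 1 when user_id changes);
    -- named row_num because ROW is a reserved word in MySQL 8.0
    @row:=if(@last_user=posts.user_id, @row+1, 1) as row_num,
    @last_user:=posts.user_id
  from
    posts inner join
    (select user_id from
      (select distinct user_id
       from posts
       order by user_id desc
       LIMIT 20) limit_users
    ) limit_users
    on posts.user_id = limit_users.user_id,
    (select @last_user:=0, @row:=0) r
  -- the counter only works if each user's rows are contiguous; newest posts first.
  -- Note: MySQL does not guarantee the evaluation order of user variables.
  order by posts.user_id, posts.id desc
  ) posts_limited
  where row_num<=5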
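
For comparison, the two-step loop described in the question (pick 20 users, then fetch 5 posts for each) can be written almost literally as one statement with a lateral derived table. This is a sketch and not part of the original answer; it assumes MySQL 8.0.14 or later, where LATERAL is supported, and the aliases `u` and `p` are illustrative.

```sql
-- Sketch only: requires MySQL 8.0.14+ (LATERAL derived tables).
SELECT p.*
FROM (
  -- step 1 from the question: the first 20 users
  SELECT DISTINCT user_id
  FROM posts
  WHERE deleted = 0
  ORDER BY user_id ASC
  LIMIT 20
) AS u,
LATERAL (
  -- step 2 from the question, evaluated once per user in u
  SELECT *
  FROM posts
  WHERE deleted = 0
    AND user_id = u.user_id
  ORDER BY id DESC
  LIMIT 5
) AS p;
```

Here the LIMIT sits inside a derived table rather than inside an IN subquery, so error #1235 does not apply.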
