从MySQL数据库中选择多个字段重复项 [英] Select multiple field duplicates from MySQL Database

查看:337
本文介绍了从MySQL数据库中选择多个字段重复项的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个旧论坛,该论坛包含带有重复的第一条帖子(可能有不同的回复)的主题.我要删除所有这些线程中的一个,但要删除其中一个(使线程的视图计数最高).

I've got an old forum which contains threads with duplicate first posts (perhaps differing replies). I want to delete all but one of these threads (leaving the thread with the highest view count).

我有以下SQL查询来帮助识别重复的线程,但是我找不到一种方法来仅列出xf_thread.view_count列的最小值最低的重复项:

I have the following SQL query to help identify duplicate threads, but I can't find a way for it to list only duplicates with the lowest value for the xf_thread.view_count column:

SELECT
    t.thread_id, MIN(t.view_count)
FROM
    xf_thread t
INNER JOIN
    xf_post p ON p.thread_id = t.thread_id
WHERE
    t.first_post_id = p.post_id
GROUP BY
    t.title,
    t.username,
    p.message
HAVING
    COUNT(t.title) > 1
    AND COUNT(t.username) > 1
    AND COUNT(p.message) > 1;

此查询目前已正确地对线程进行了分组,但它仅显示了一个随机的thread_id-而不是与min(view_count)对应的thread_id.

At the moment, this query correctly groups threads, but it only shows a random thread_id - rather than the thread_id corresponding to min(view_count).

我已经阅读了有关如何解决此问题的信息,但是我不知道该如何实现-因为看起来好像无法按查询对行进行排序.

I've read up on how to work around this, but I can't figure out how to achieve this - as it doesn't look like it's possible to order the row in group by queries.

修改

借助Madhur的帮助,查询现在返回所有要删除的线程ID.但是,我可以弄清楚如何删除具有匹配thread_id的行.这是我尝试使用的查询(它一直在运行,而选择查询( https://stackoverflow.com/a/52314208/2469308 )在几秒钟内运行:

Thanks to the help from Madhur, the query now returns all the of the thread IDs to be deleted. However, I can figure out how to delete the rows with matching thread_id's. Here's the query I tried to use (it just keeps running, whereas the select query (https://stackoverflow.com/a/52314208/2469308) runs in seconds:

DELETE FROM xf_thread 
WHERE  thread_id IN (SELECT Substring_index(Group_concat(DISTINCT t.thread_id 
                                            ORDER BY 
                                            t.view_count 
                                                                ASC 
                                            SEPARATOR ','), 
                                   ',', 1) AS 
                            thread_id_with_minimum_views 
                     FROM   (SELECT * 
                             FROM   xf_thread) t 
                            INNER JOIN xf_post p 
                                    ON p.thread_id = t.thread_id 
                     WHERE  t.first_post_id = p.post_id 
                            AND t.user_id = 0 
                            AND t.reply_count < 2 
                     GROUP  BY t.title, 
                               t.username, 
                               p.message 
                     HAVING Count(t.title) > 1 
                            AND Count(t.username) > 1 
                            AND Count(p.message) > 1 
                     ORDER  BY t.thread_id); 

推荐答案

一个非常棘手的解决方案是在

A very hacky solution is to get the thread_id sorted by view_count in a GROUP_CONCAT. Then, we can utilize string operations to get the thread_id with minimum view_count.

在您的SELECT子句中,而不是t.thread_id,您可以尝试以下操作:

In your SELECT clause, instead of t.thread_id, you can try the following:

SUBSTRING_INDEX(GROUP_CONCAT(DISTINCT t.thread_id 
                             ORDER BY t.view_count ASC 
                             SEPARATOR ','), 
                ',', 
                1) AS thread_id_with_minimum_views

现在,基于SELECT查询以标识具有最小视图的重复记录,DELETE查询将从xf_thread表中删除此类记录如下:

Now, based on the SELECT query to identify Duplicate records with Minimum view, DELETE query to delete such records from the xf_thread table will be as follows:

DELETE t_delete FROM xf_thread AS t_delete 
INNER JOIN (SELECT CAST(SUBSTRING_INDEX(GROUP_CONCAT(DISTINCT t.thread_id ORDER BY t.view_count ASC SEPARATOR ','), ',', 1) AS UNSIGNED) AS tid_min_view 
            FROM (SELECT * FROM xf_thread) t 
            INNER JOIN xf_post p ON p.thread_id = t.thread_id 
            WHERE t.first_post_id = p.post_id 
              AND t.user_id = 0 
              AND t.reply_count < 2 
            GROUP BY t.title, t.username, p.message 
            HAVING Count(t.title) > 1 
               AND Count(t.username) > 1 
               AND Count(p.message) > 1 
            ORDER BY t.thread_id) AS t_dup 
  ON t_delete.thread_id = t_dup.tid_min_view 

这篇关于从MySQL数据库中选择多个字段重复项的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆