从MySQL数据库中选择多个字段重复项 [英] Select multiple field duplicates from MySQL Database
问题描述
我有一个旧论坛,该论坛包含带有重复的第一条帖子(可能有不同的回复)的主题.我要删除所有这些线程中的一个,但要删除其中一个(使线程的视图计数最高).
I've got an old forum which contains threads with duplicate first posts (perhaps differing replies). I want to delete all but one of these threads (leaving the thread with the highest view count).
我有以下SQL查询来帮助识别重复的线程,但是我找不到一种方法来仅列出xf_thread.view_count列的最小值最低的重复项:
I have the following SQL query to help identify duplicate threads, but I can't find a way for it to list only duplicates with the lowest value for the xf_thread.view_count column:
SELECT
t.thread_id, MIN(t.view_count)
FROM
xf_thread t
INNER JOIN
xf_post p ON p.thread_id = t.thread_id
WHERE
t.first_post_id = p.post_id
GROUP BY
t.title,
t.username,
p.message
HAVING
COUNT(t.title) > 1
AND COUNT(t.username) > 1
AND COUNT(p.message) > 1;
此查询目前已正确地对线程进行了分组,但它仅显示了一个随机的thread_id-而不是与min(view_count)对应的thread_id.
At the moment, this query correctly groups threads, but it only shows a random thread_id - rather than the thread_id corresponding to min(view_count).
我已经阅读了有关如何解决此问题的信息,但是我不知道该如何实现-因为看起来好像无法按查询对行进行排序.
I've read up on how to work around this, but I can't figure out how to achieve this - as it doesn't look like it's possible to order the row in group by queries.
修改
借助Madhur的帮助,查询现在返回所有要删除的线程ID.但是,我可以弄清楚如何删除具有匹配thread_id的行.这是我尝试使用的查询(它一直在运行,而选择查询( https://stackoverflow.com/a/52314208/2469308 )在几秒钟内运行:
Thanks to the help from Madhur, the query now returns all the of the thread IDs to be deleted. However, I can figure out how to delete the rows with matching thread_id's. Here's the query I tried to use (it just keeps running, whereas the select query (https://stackoverflow.com/a/52314208/2469308) runs in seconds:
DELETE FROM xf_thread
WHERE thread_id IN (SELECT Substring_index(Group_concat(DISTINCT t.thread_id
ORDER BY
t.view_count
ASC
SEPARATOR ','),
',', 1) AS
thread_id_with_minimum_views
FROM (SELECT *
FROM xf_thread) t
INNER JOIN xf_post p
ON p.thread_id = t.thread_id
WHERE t.first_post_id = p.post_id
AND t.user_id = 0
AND t.reply_count < 2
GROUP BY t.title,
t.username,
p.message
HAVING Count(t.title) > 1
AND Count(t.username) > 1
AND Count(p.message) > 1
ORDER BY t.thread_id);
推荐答案
A very hacky solution is to get the thread_id
sorted by view_count
in a GROUP_CONCAT. Then, we can utilize string operations to get the thread_id
with minimum view_count
.
在您的SELECT
子句中,而不是t.thread_id
,您可以尝试以下操作:
In your SELECT
clause, instead of t.thread_id
, you can try the following:
SUBSTRING_INDEX(GROUP_CONCAT(DISTINCT t.thread_id
ORDER BY t.view_count ASC
SEPARATOR ','),
',',
1) AS thread_id_with_minimum_views
现在,基于SELECT
查询以标识具有最小视图的重复记录,DELETE
查询将从xf_thread
表中删除此类记录如下:
Now, based on the SELECT
query to identify Duplicate records with Minimum view, DELETE
query to delete such records from the xf_thread
table will be as follows:
DELETE t_delete FROM xf_thread AS t_delete
INNER JOIN (SELECT CAST(SUBSTRING_INDEX(GROUP_CONCAT(DISTINCT t.thread_id ORDER BY t.view_count ASC SEPARATOR ','), ',', 1) AS UNSIGNED) AS tid_min_view
FROM (SELECT * FROM xf_thread) t
INNER JOIN xf_post p ON p.thread_id = t.thread_id
WHERE t.first_post_id = p.post_id
AND t.user_id = 0
AND t.reply_count < 2
GROUP BY t.title, t.username, p.message
HAVING Count(t.title) > 1
AND Count(t.username) > 1
AND Count(p.message) > 1
ORDER BY t.thread_id) AS t_dup
ON t_delete.thread_id = t_dup.tid_min_view
这篇关于从MySQL数据库中选择多个字段重复项的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!