If I have a MySQL table with multiple column values the same, how do I delete all but the two most recent entries?

Problem description

I know this sounds like a duplicate of a few questions, and it may well be. I've searched through them and tried my own implementation of several possible solutions, but all of them seem to result in some form of infinite recursion that just chews 100% CPU and does nothing. That could be because I'm doing it wrong or because they aren't appropriate for my case; I don't know.

I have a MySQL table structured as follows:

        +--------+------+-----+-------+--------+--------+-------+--------+
        |   id   | fid  | bid | dec_a | varc_a | varc_b | dec_b | varc_c |
        +--------+------+-----+-------+--------+--------+-------+--------+
        | 106861 | 4192 |  22 | 1.40  | blah   | blahbr | 0.2   | blahca |
        | 108620 | 4192 |  22 | 1.55  | blah   | blahbe | 0.2   | blahca |
        | 108621 | 4192 |  22 | 1.55  | blah   | blahbq | 0.2   | blahca |
        | 108622 | 4192 |  22 | 1.55  | blah   | blahbw | 0.2   | blahca |
        | 108623 | 4192 |  22 | 1.55  | blah   | blahbe | 0.2   | blahca |
        | 108624 | 4192 |  22 | 1.55  | blah   | blahbf | 0.2   | blahca |
        | 106863 | 4192 |  33 | 1.40  | blah   | blahba | 0.2   | blahca |
        +--------+------+-----+-------+--------+--------+-------+--------+

The "id" value is a BIGINT auto-incrementing value and the data is added in proper chronological order from the source, so I am viewing this as the timestamp.

To establish which data is duplicated I am using the "fid", "bid", "varc_a", "dec_b" and "varc_c" columns. From the example above you can see that there are 6 duplicates based on those columns and those are the first six rows, the seventh row shows where there is variation in the "bid" column but obviously any variation in any of those columns excludes the row as a duplicate.
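
The duplicate rule described here can be checked with a grouped count before anything is deleted. This is a minimal sketch, assuming the table is named market_prices (the name that appears in the asker's final queries further down); the aliases copies and newest_id are illustrative only.

```sql
-- Sketch only: lists every duplicate group that holds more than two rows.
-- The table name market_prices is an assumption taken from the asker's later queries.
SELECT fid, bid, varc_a, dec_b, varc_c,
       COUNT(*) AS copies,      -- how many rows share these five values
       MAX(id)  AS newest_id    -- the most recent row in the group
FROM market_prices
GROUP BY fid, bid, varc_a, dec_b, varc_c
HAVING COUNT(*) > 2             -- only groups that would lose rows
ORDER BY copies DESC;
```

For the sample rows above, this should report a single group (fid 4192, bid 22) with six copies; the row with bid 33 forms a group of one and is filtered out.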

I can easily visualise what I want to do: there are potentially millions of entries in the database, and for each set of rows where the "fid", "bid", "varc_a", "dec_b" and "varc_c" column values are the same, I want to keep the 2 most recent rows (based on the entry id) and then sweep away what's left.

For the life of me I can't figure out how to do that using just MySQL and, as I say, all of the questions and answers I've looked at don't seem to be doing what I want to do or I'm not understanding what's proposed.

I know I can do this with PHP+MySQL by trawling through the data and removing the duplicates. But since that horribly inefficient approach is so easy to write, I suspect I'm missing something obvious and should be able to do it with MySQL alone.

Note:

Mike's answer is excellent and, with a little tweaking given the context of my question, it did precisely what I need. What I ended up using was this:

DROP TEMPORARY TABLE IF EXISTS keepers1, keepers2, keepers_all;

CREATE TEMPORARY TABLE keepers1 (KEY(id)) ENGINE=MEMORY AS 
SELECT fid, bid, varc_a, dec_b, varc_c, MAX(id) AS id
FROM market_prices
GROUP BY fid, bid, varc_a, dec_b, varc_c;

CREATE TEMPORARY TABLE keepers2 AS
SELECT fid, bid, varc_a, dec_b, varc_c, MAX(id) AS id
FROM market_prices AS k
WHERE NOT EXISTS (SELECT 1 FROM keepers1 WHERE id = k.id)
GROUP BY  fid, bid, varc_a, dec_b, varc_c;

CREATE TEMPORARY TABLE keepers_all (KEY(id)) ENGINE=MEMORY AS
SELECT id FROM keepers1
UNION ALL
SELECT id FROM keepers2;

DELETE k.* FROM market_prices AS k WHERE NOT EXISTS (SELECT 2 FROM keepers_all WHERE id = k.id);

When grouping, be sure to use only the columns that define a duplicate. I used SELECT 2 in that last statement, but the constant makes no difference: EXISTS only tests whether the subquery returns a row, so SELECT 1 behaves identically. Two rows per group are kept because keepers_all holds the newest and the second-newest id of each group.

Time to raise a glass to the man of the hour!

Recommended answer

This may be a solution for your problem.

However, since there is no date-time column, I am assuming that the id column is the primary key and that it is AUTO_INCREMENT. So my assumption is that the larger the number, the newer the record. (This should hold unless some old data was dumped into the table later.)

Make sure you back up your data before you delete, as this will permanently remove rows. Even better, make a copy of the current table and work on the new table to make sure the logic below is correct. To do that, change the queries below to read from tbl_new instead of tbl.

You can duplicate your table with something like

CREATE TABLE tbl_new LIKE tbl;            -- copies the structure (columns and indexes) only
INSERT INTO tbl_new SELECT * FROM tbl;    -- copies the rows

I have left comments for every query

DROP TEMPORARY TABLE IF EXISTS keepers1, keepers2, keepers_all;
-- get the #1 top records
CREATE TEMPORARY TABLE keepers1 (KEY(id)) ENGINE=MEMORY AS
SELECT fid, bid, dec_a, varc_a, varc_b, dec_b, varc_c, MAX(id) AS id
FROM tbl
GROUP BY fid, bid, dec_a, varc_a, varc_b, dec_b, varc_c;

-- get the #2 top records
CREATE TEMPORARY TABLE keepers2 AS
SELECT fid, bid, dec_a, varc_a, varc_b, dec_b, varc_c, MAX(id) AS id
FROM tbl AS k
WHERE NOT EXISTS (SELECT 1 FROM keepers1 WHERE id = k.id)
GROUP BY fid, bid, dec_a, varc_a, varc_b, dec_b, varc_c;


-- create a temp table that holds all the ids that you want to keep
CREATE TEMPORARY TABLE keepers_all (KEY(id)) ENGINE=MEMORY AS
SELECT id FROM keepers1
UNION ALL
SELECT id FROM keepers2;


-- delete all records that you don't want to keep
DELETE k.* FROM tbl AS k WHERE NOT EXISTS (SELECT 1 FROM keepers_all WHERE id = k.id);
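
On MySQL 8.0 or later, the same keep-the-two-newest rule can also be sketched as a single statement with a window function instead of temporary tables. This is an alternative technique, not the method used in the queries above. It assumes the table is called tbl and that duplicates are defined by the five columns named in the question (fid, bid, varc_a, dec_b, varc_c); adjust the PARTITION BY list to whichever columns define a duplicate for you.

```sql
-- Sketch only: requires MySQL 8.0+ for ROW_NUMBER(); table and column names are assumptions.
-- To preview, replace "DELETE t" with "SELECT t.*" and inspect the rows that would be removed.
DELETE t
FROM tbl AS t
JOIN (
    -- Number the rows inside each duplicate group, newest id first.
    SELECT id,
           ROW_NUMBER() OVER (
               PARTITION BY fid, bid, varc_a, dec_b, varc_c
               ORDER BY id DESC
           ) AS rn
    FROM tbl
) AS ranked ON ranked.id = t.id
WHERE ranked.rn > 2;   -- rows numbered 1 and 2 in each group are kept
```

Changing the 2 in the final WHERE clause changes how many rows per group survive. As with the temporary-table version, it is safer to run this against a copy of the table first.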

If this is a one-time cleanup job, you should be able to execute the queries from the console. If you are looking for a recurring job, you should probably take this code and put it in a procedure.
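
For the recurring case, the queries above could be wrapped roughly as follows. This is a sketch under assumptions: prune_tbl_duplicates is a hypothetical procedure name, tbl and the grouping columns are taken from the queries above, and the DELIMITER lines are a directive of the mysql command-line client rather than SQL.

```sql
-- Sketch only: prune_tbl_duplicates is a hypothetical name.
DELIMITER //

CREATE PROCEDURE prune_tbl_duplicates()
BEGIN
    DROP TEMPORARY TABLE IF EXISTS keepers1, keepers2, keepers_all;

    -- newest id in each duplicate group
    CREATE TEMPORARY TABLE keepers1 (KEY(id)) ENGINE=MEMORY AS
    SELECT MAX(id) AS id
    FROM tbl
    GROUP BY fid, bid, dec_a, varc_a, varc_b, dec_b, varc_c;

    -- second-newest id in each group (newest once keepers1 is excluded)
    CREATE TEMPORARY TABLE keepers2 AS
    SELECT MAX(id) AS id
    FROM tbl AS k
    WHERE NOT EXISTS (SELECT 1 FROM keepers1 WHERE keepers1.id = k.id)
    GROUP BY fid, bid, dec_a, varc_a, varc_b, dec_b, varc_c;

    -- every id that must survive
    CREATE TEMPORARY TABLE keepers_all (KEY(id)) ENGINE=MEMORY AS
    SELECT id FROM keepers1
    UNION ALL
    SELECT id FROM keepers2;

    -- remove everything else
    DELETE k FROM tbl AS k
    WHERE NOT EXISTS (SELECT 1 FROM keepers_all WHERE keepers_all.id = k.id);

    DROP TEMPORARY TABLE IF EXISTS keepers1, keepers2, keepers_all;
END //

DELIMITER ;

-- run the cleanup
CALL prune_tbl_duplicates();
```

The procedure can then be called by hand or from whatever scheduler you already use.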

Note: here I am using MEMORY temporary tables for better performance. You may run into an error that says "Table is full"; this happens when there are too many records to fit. In that case you can increase the value of max_heap_table_size for the session, with something like

SET SESSION tmp_table_size = 1024 * 1024 * 1024 * 2; -- this will set it to 2G
SET SESSION max_heap_table_size = 1024 * 1024 * 1024 * 2; -- this will set it to 2G

These will show you the current values

SHOW VARIABLES LIKE 'max_heap_table_size';
SHOW VARIABLES LIKE 'tmp_table_size';
