MySQL SELECT DISTINCT rows (not columns) to filter $_POST for duplicates

Problem description

I'm trying to filter rows from the MySQL table where all the $_POST data from an online form is stored. Sometimes the user's internet connection stalls or the browser screws up, and the new page after form submission is not displayed (though the INSERT worked and the table row was created). They then hit refresh and end up submitting the form twice, creating a duplicate row (identical except for the timestamp and auto-increment id columns).

I'd like to select unique form submissions. This has to be a really common task, but I can't seem to find a succinct way to apply DISTINCT to every column except the timestamp and id (sort of like SELECT id, timestamp, DISTINCT everything_else FROM table;). At the moment, I can do:

-- Pseudocode: "everything, except, id, and, timestamp" stands for every real
-- column other than the id and timestamp columns.
CREATE TEMPORARY TABLE IF NOT EXISTS temp1 AS (
  SELECT DISTINCT everything, except, id, and, timestamp
  FROM table1
);
-- Join the de-duplicated rows back to table1 to recover their id and timestamp.
SELECT * FROM table1 LEFT OUTER JOIN temp1
  ON table1.everything = temp1.everything
  ...
;
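
For a rough tally of how many submissions are duplicated, a grouped count over the same columns works; col_a, col_b and col_c below are hypothetical placeholders for the real form columns, not names from the actual table:

-- Count submissions that appear more than once, ignoring id and timestamp.
SELECT col_a, col_b, col_c, COUNT(*) AS copies
FROM table1
GROUP BY col_a, col_b, col_c
HAVING COUNT(*) > 1;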

My table has 20k rows and about 25 columns (classification features for a machine learning exercise). This query takes forever (I presume because it traverses the 20k rows 20k times?); I've never even let it run to completion. What's the standard-practice way to do this?

Note: This question suggests adding an index to the relevant columns, but an index can have at most 16 key parts. Should I just choose the columns most likely to be unique? I can find about 700 duplicates in 2 seconds this way, but I can't be sure I'm not throwing away a unique row, because I also have to leave some columns out when specifying the index.
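
If indexing a subset is the route, the statement would look roughly like this (hypothetical column names again; MySQL allows at most 16 key parts per index, and long string columns may need a prefix length):

-- Hypothetical: index the columns most likely to distinguish submissions.
-- col_c is assumed to be a string column, indexed by its first 100 characters.
ALTER TABLE table1
  ADD INDEX idx_submission (col_a, col_b, col_c(100));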

Recommended answer

If you have a UNIQUE key (other than an AUTO_INCREMENT), simply use INSERT IGNORE ... to silently avoid duplicate rows. If you don't have a UNIQUE key, do you never need to find a row again?
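
A minimal sketch of that approach, assuming col_a, col_b and col_c are the form columns that together identify a submission (the real column names will differ):

-- Declare the identifying combination of form columns UNIQUE
-- (an index can have at most 16 key parts, so choose accordingly).
ALTER TABLE table1
  ADD UNIQUE KEY uniq_submission (col_a, col_b, col_c);

-- A re-submitted form is now silently skipped instead of creating a duplicate row.
INSERT IGNORE INTO table1 (col_a, col_b, col_c)
VALUES ('answer one', 'answer two', 'answer three');

Note that INSERT IGNORE also downgrades other errors (such as data truncation) to warnings, so check that this is acceptable for the rest of the insert.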

If you have already allowed duplicates and you need to get rid of them, that is a different question.
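
For completeness, one common clean-up pattern for duplicates that already exist, sketched with the same hypothetical column names, is a self-join that deletes every row with a higher id than an otherwise identical row (back up the table, or run the equivalent SELECT, before trying it):

-- Keep the earliest copy (lowest id) of each duplicated submission.
-- The null-safe comparison <=> treats two NULLs as equal.
DELETE t2
FROM table1 AS t1
JOIN table1 AS t2
  ON  t2.id > t1.id
  AND t1.col_a <=> t2.col_a
  AND t1.col_b <=> t2.col_b
  AND t1.col_c <=> t2.col_c;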
