MySQL MIN GROUP BY on large tables (> 8000 rows)

Question

I have the following query:

SELECT contact_purl, contact_firstName, contact_lastName, MIN( contact_id ) AS MinID
FROM contacts
WHERE contact_client_id = 1
GROUP BY contact_purl
HAVING COUNT( contact_id ) > 1

The purpose is to find any contacts with a duplicate contact_purl and return the first entry of each.

I'm running into a very strange problem... If the table has fewer than 8,000 rows, the query runs in less than 1 second. HOWEVER, if the table has more than 8,000 rows, the query consistently takes around 338 seconds.

Here is the query plan for the table with ~5000 rows:

And for ~8000 rows:

The table...

  CREATE TABLE IF NOT EXISTS `contacts` (
  `contact_id` int(11) NOT NULL AUTO_INCREMENT,
  `contact_client_id` int(11) DEFAULT NULL,
  `contact_sales_id` int(11) DEFAULT NULL,
  `contact_campaign_id` int(11) DEFAULT NULL,
  `contact_purl` varchar(100) NOT NULL,
  `contact_purl1` varchar(50) DEFAULT NULL,
  `contact_purl2` varchar(50) DEFAULT NULL,
  `contact_firstName` varchar(50) NOT NULL,
  `contact_lastName` varchar(50) NOT NULL,
  `contact_organization` varchar(100) DEFAULT NULL,
  `contact_url_organization` varchar(200) DEFAULT NULL,
  `contact_position` varchar(50) DEFAULT NULL,
  `contact_email` varchar(100) DEFAULT NULL,
  `contact_phone` varchar(20) DEFAULT NULL,
  `contact_fax` varchar(20) NOT NULL,
  `contact_address1` varchar(100) DEFAULT NULL,
  `contact_address2` varchar(100) DEFAULT NULL,
  `contact_city` varchar(100) DEFAULT NULL,
  `contact_state` varchar(20) DEFAULT NULL,
  `contact_zip` varchar(10) DEFAULT NULL,
  `contact_IP` varchar(50) DEFAULT NULL,
  `contact_timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `contact_pw` varchar(200) NOT NULL,
  `contact_subscribed` varchar(1) NOT NULL DEFAULT 'Y',
  `contact_import` varchar(200) DEFAULT NULL,
  `contacts_c_1` varchar(500) DEFAULT NULL,
  `contacts_c_2` varchar(500) DEFAULT NULL,
  `contacts_c_3` varchar(500) DEFAULT NULL,
  `contacts_c_4` varchar(500) DEFAULT NULL,
  `contacts_c_5` varchar(500) DEFAULT NULL,
  `contacts_c_6` varchar(500) DEFAULT NULL,
  `contacts_c_7` varchar(500) DEFAULT NULL,
  `contacts_c_8` varchar(500) DEFAULT NULL,
  `contacts_c_9` varchar(500) DEFAULT NULL,
  `contacts_c_10` varchar(500) DEFAULT NULL,
  `contacts_c_11` varchar(500) DEFAULT NULL,
  `contacts_c_12` varchar(500) DEFAULT NULL,
  `contacts_c_13` varchar(500) DEFAULT NULL,
  `contacts_c_14` varchar(500) DEFAULT NULL,
  `contacts_c_15` varchar(500) DEFAULT NULL,
  `contacts_c_16` varchar(500) DEFAULT NULL,
  `contacts_c_17` varchar(500) DEFAULT NULL,
  `contacts_c_18` varchar(500) DEFAULT NULL,
  `contacts_c_19` varchar(500) DEFAULT NULL,
  `contacts_c_20` varchar(500) DEFAULT NULL,
  `contacts_c_21` varchar(500) DEFAULT NULL,
  `contacts_c_22` varchar(500) DEFAULT NULL,
  `contacts_c_23` varchar(500) DEFAULT NULL,
  `contacts_c_24` varchar(500) DEFAULT NULL,
  `contacts_c_25` varchar(500) DEFAULT NULL,
  `contacts_c_26` varchar(500) DEFAULT NULL,
  `contacts_c_27` varchar(500) DEFAULT NULL,
  `contacts_c_28` varchar(500) DEFAULT NULL,
  `contacts_c_29` varchar(500) DEFAULT NULL,
  `contacts_c_30` varchar(500) DEFAULT NULL,
  `contacts_c_31` varchar(500) DEFAULT NULL,
  `contacts_c_32` varchar(500) DEFAULT NULL,
  `contacts_c_33` varchar(500) DEFAULT NULL,
  `contacts_c_34` varchar(500) DEFAULT NULL,
  `contacts_c_35` varchar(500) DEFAULT NULL,
  `contacts_c_36` varchar(500) DEFAULT NULL,
  `contacts_c_37` varchar(500) DEFAULT NULL,
  `contacts_c_38` varchar(500) DEFAULT NULL,
  `contacts_c_39` varchar(500) DEFAULT NULL,
  `contacts_c_40` varchar(500) DEFAULT NULL,
  `contacts_c_41` varchar(500) DEFAULT NULL,
  `contacts_c_42` varchar(500) DEFAULT NULL,
  `contacts_c_43` varchar(500) DEFAULT NULL,
  `contacts_c_44` varchar(500) DEFAULT NULL,
  `contacts_c_45` varchar(500) DEFAULT NULL,
  `contacts_c_46` varchar(500) DEFAULT NULL,
  `contacts_c_47` varchar(500) DEFAULT NULL,
  `contacts_c_48` varchar(500) DEFAULT NULL,
  `contacts_c_49` varchar(500) DEFAULT NULL,
  `contacts_c_50` varchar(500) DEFAULT NULL,
  `contacts_i_1` varchar(100) DEFAULT NULL,
  `contacts_i_2` varchar(100) DEFAULT NULL,
  `contacts_i_3` varchar(100) DEFAULT NULL,
  `contacts_i_4` varchar(100) DEFAULT NULL,
  `contacts_i_5` varchar(100) DEFAULT NULL,
  `contacts_i_6` varchar(100) DEFAULT NULL,
  `contacts_i_7` varchar(100) DEFAULT NULL,
  `contacts_i_8` varchar(100) DEFAULT NULL,
  `contacts_i_9` varchar(100) DEFAULT NULL,
  `contacts_i_10` varchar(100) DEFAULT NULL,
  `contacts_i_11` varchar(100) DEFAULT NULL,
  `contacts_i_12` varchar(100) DEFAULT NULL,
  `contacts_i_13` varchar(100) DEFAULT NULL,
  `contacts_i_14` varchar(100) DEFAULT NULL,
  `contacts_i_15` varchar(100) DEFAULT NULL,
  PRIMARY KEY (`contact_id`),
  KEY `contact_campaign_id` (`contact_campaign_id`),
  KEY `contact_client_id` (`contact_client_id`),
  KEY `contact_purl2` (`contact_purl2`),
  KEY `contact_purl1` (`contact_purl1`),
  KEY `contact_purl` (`contact_purl`)
)

I have recently Optimized and Defragmented the table as well.

Any ideas on what would be causing this?

Solution

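First off, thank you for posting your table structure, query, and EXPLAIN output in your question. I think you're crossing the memory/disk temporary table size boundary, hence the large performance change. When EXPLAIN shows "Using temporary", MySQL materializes the GROUP BY in an internal temporary table; with the MEMORY engine (used for these tables before MySQL 8.0), that table is converted to an on-disk table once it outgrows the smaller of tmp_table_size and max_heap_table_size. The Created_tmp_disk_tables status counter shows how often that happens.

If you put a unique index on the contact_purl column, MySQL won't allow duplicates to be inserted, which would make this query unnecessary. Otherwise, I'd create an index on (contact_client_id, contact_purl) so MySQL can find the rows you want directly from the index.

You could also try separating the search for the duplicates from the retrieval of the other columns by using a subquery. Something like this maybe: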

SELECT c.contact_purl, c.contact_firstName, c.contact_lastName, c.contact_id
FROM contacts AS c
JOIN (SELECT MIN(contact_id) AS MinID
      FROM contacts
      WHERE contact_client_id = 1
      GROUP BY contact_purl
      HAVING COUNT(contact_id) > 1) AS nodups
  ON nodups.MinID = c.contact_id;
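The subquery and the two index suggestions can be sanity-checked on a toy table. This is a minimal sketch, assuming Python's built-in sqlite3 as a stand-in for MySQL: it checks query results and index behaviour only, not MySQL's temporary-table performance. The sample rows and the index names idx_client_purl and uq_purl are hypothetical. The subquery form also has a correctness benefit: in the question's original query, contact_firstName and contact_lastName are neither grouped nor aggregated, so MySQL may return them from any row of the group (and rejects the query outright when ONLY_FULL_GROUP_BY is enabled, the default since MySQL 5.7.5). Joining on MinID fetches them from exactly the first row.

```python
import sqlite3

# SQLite stands in for MySQL here: this checks results and index behaviour
# only, not MySQL's temporary-table performance.

# Hypothetical sample rows: (contact_id, client_id, purl, first, last)
SAMPLE = [
    (1, 1, "jsmith", "John", "Smith"),
    (2, 1, "adoe", "Ann", "Doe"),
    (3, 1, "jsmith", "Jane", "Smith"),
    (4, 2, "bwong", "Bo", "Wong"),    # client 2: excluded by the WHERE clause
    (5, 1, "adoe", "Al", "Doe"),
    (6, 1, "cking", "Cy", "King"),    # unique purl: dropped by HAVING
]


def make_db(rows):
    """Build an in-memory table with just the columns the query touches."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE contacts (
               contact_id        INTEGER PRIMARY KEY,  -- rowid alias, stored in every index
               contact_client_id INTEGER,
               contact_purl      TEXT NOT NULL,
               contact_firstName TEXT NOT NULL,
               contact_lastName  TEXT NOT NULL
           )"""
    )
    conn.executemany("INSERT INTO contacts VALUES (?, ?, ?, ?, ?)", rows)
    return conn


# 1) Explicit-JOIN form of the answer's query: the derived table picks the first
#    id per duplicated purl; the join fetches the other columns from that row.
DUPES = """
SELECT c.contact_purl, c.contact_firstName, c.contact_lastName, c.contact_id
FROM contacts AS c
JOIN (SELECT MIN(contact_id) AS MinID
      FROM contacts
      WHERE contact_client_id = ?
      GROUP BY contact_purl
      HAVING COUNT(contact_id) > 1) AS nodups
  ON nodups.MinID = c.contact_id
ORDER BY c.contact_id
"""
db = make_db(SAMPLE)
first_entries = db.execute(DUPES, (1,)).fetchall()

# 2) With an index on (contact_client_id, contact_purl), the inner query can be
#    answered from the index alone; SQLite reports this as a "COVERING INDEX".
db.execute("CREATE INDEX idx_client_purl ON contacts (contact_client_id, contact_purl)")
plan = db.execute(
    "EXPLAIN QUERY PLAN SELECT MIN(contact_id) FROM contacts "
    "WHERE contact_client_id = 1 GROUP BY contact_purl HAVING COUNT(contact_id) > 1"
).fetchall()
plan_text = " | ".join(str(row[-1]) for row in plan)  # last column is the plan detail

# 3) A UNIQUE index on contact_purl cannot be built while duplicates exist...
try:
    db.execute("CREATE UNIQUE INDEX uq_purl ON contacts (contact_purl)")
    unique_on_dirty_data = "created"
except sqlite3.IntegrityError:
    unique_on_dirty_data = "rejected"

# ...and on clean data it rejects any later duplicate insert.
clean = make_db([SAMPLE[0]])
clean.execute("CREATE UNIQUE INDEX uq_purl ON contacts (contact_purl)")
try:
    clean.execute("INSERT INTO contacts VALUES (7, 1, 'jsmith', 'Jo', 'Smith')")
    duplicate_insert = "accepted"
except sqlite3.IntegrityError:
    duplicate_insert = "rejected"

print(first_entries)  # [('jsmith', 'John', 'Smith', 1), ('adoe', 'Ann', 'Doe', 2)]
print(plan_text)
print(unique_on_dirty_data, duplicate_insert)  # rejected rejected
```

In MySQL the corresponding statements would be along the lines of ALTER TABLE contacts ADD INDEX idx_client_purl (contact_client_id, contact_purl) and ALTER TABLE contacts ADD UNIQUE INDEX uq_purl (contact_purl). If the table is InnoDB (the engine is not stated in the question), secondary indexes also store the primary key, so the composite index covers contact_id as well and EXPLAIN should report "Using index". As in the sketch, adding the unique index fails with a duplicate-entry error until the existing duplicates are cleaned up.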
