MySQL query painfully slow on large data

Problem description

I'm no MySQL whiz but I get it, I have just inherited a pretty large table (600,000 rows and around 90 columns (Please kill me...)) and I have a smaller table that I've created to link it with a categories table.

I'm trying to query said table with a left join so I have both sets of data in one object but it runs terribly slow and I'm not hot enough to sort it out; I'd really appreciate a little guidance and explanation as to why it's so slow.

SELECT 
    `products`.`Product_number`,
    `products`.`Price`,
    `products`.`Previous_Price_1`,
    `products`.`Previous_Price_2`,
    `products`.`Product_number`,
    `products`.`AverageOverallRating`,
    `products`.`Name`,
    `products`.`Brand_description`
FROM `product_categories`
LEFT OUTER JOIN `products`
ON `products`.`product_id`= `product_categories`.`product_id`
WHERE COALESCE(product_categories.cat4, product_categories.cat3,
product_categories.cat2, product_categories.cat1) = '123456'
AND `product_categories`.`product_id` != 0

The two tables are MyISAM, the products table has indexing on Product_number and Brand_Description and the product_categories table has a unique index on all columns combined; if this info is of any help at all.

Having inherited this system I need to get this working asap before I nuke it and do it properly so any help right now will earn you my utmost respect!

Here is the output of EXPLAIN EXTENDED:

+----+-------------+--------------------+-------+---------------+------+---------+------+---------+----------+--------------------------+
| id | select_type | table              | type  | possible_keys | key  | key_len | ref  | rows    | filtered | Extra                    |
+----+-------------+--------------------+-------+---------------+------+---------+------+---------+----------+--------------------------+
|  1 | SIMPLE      | product_categories | index | NULL          | cat1 | 23      | NULL | 1224419 |   100.00 | Using where; Using index |
|  1 | SIMPLE      | products           | ALL   | Product_id    | NULL | NULL    | NULL |  512376 |   100.00 |                          |
+----+-------------+--------------------+-------+---------------+------+---------+------+---------+----------+--------------------------+

Recommended answer

The meaning of your query seems to be "find all products that have the category '123456'." Is that correct?

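COALESCE is an extraordinarily expensive function to use in a WHERE clause, because it operates on index-hostile NULL values and, like any function wrapped around a column, it has to be evaluated for every row instead of being looked up in an index. Your EXPLAIN result shows this: product_categories is read with a full index scan (type index, roughly 1.2 million rows), so the query is not being selective on that table at all. In MySQL you need to avoid wrapping columns in functions in WHERE clauses if you want to exploit indexes to make your queries fast.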
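
To see the difference for yourself, compare the plans for a function-wrapped predicate and a bare column predicate. This is only a sketch: it assumes a single-column index on cat4 exists (the question does not say one does), and the exact plan will depend on your MySQL version and data.

```sql
-- Function-wrapped predicate: COALESCE has to be evaluated for every row,
-- so expect a full scan (type = index or ALL) in the EXPLAIN output.
EXPLAIN
SELECT product_id
  FROM product_categories
 WHERE COALESCE(cat4, cat3, cat2, cat1) = '123456';

-- Bare column predicate: with an index on cat4 (an assumption here),
-- MySQL is able to do an index lookup (type = ref) instead.
EXPLAIN
SELECT product_id
  FROM product_categories
 WHERE cat4 = '123456';
```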

The thing someone else said about 90-column tables being harmful is also true. But you're stuck with it, so let's just deal with it.

Can we rework your query to get rid of the function-based WHERE? Let's try this.

SELECT  /* some columns from the products table */
  FROM products
 WHERE product_id IN 
 (
     SELECT DISTINCT product_id 
                FROM product_categories
               WHERE product_id <> 0
                 AND (   cat1='123456'
                      OR cat2='123456'
                      OR cat3='123456'
                      OR cat4='123456')
 )

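For this to work fast you're going to need to create separate indexes on your four cat columns. The composite unique index ("on all columns combined") is not going to help you, because a composite index can only be used for lookups that start from its leftmost column. Even with the separate indexes, an OR across four columns may still not be very fast.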
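
A minimal sketch of those indexes follows. The index names (idx_cat1 through idx_cat4) are made up for this example; pick names that do not clash with indexes you already have. Adding all four in one ALTER TABLE should let MySQL rebuild the MyISAM table once rather than four times, though on a table this size it may still take a while and block writes while it runs.

```sql
-- One single-column index per category column, added in a single statement.
-- idx_cat1 .. idx_cat4 are hypothetical names chosen for illustration.
ALTER TABLE product_categories
  ADD INDEX idx_cat1 (cat1),
  ADD INDEX idx_cat2 (cat2),
  ADD INDEX idx_cat3 (cat3),
  ADD INDEX idx_cat4 (cat4);
```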

A better solution might be FULLTEXT searching IN BOOLEAN MODE. You're working with the MyISAM storage engine, which supports FULLTEXT indexes, so this is possible. It's definitely worth a try. It could be very fast indeed.

SELECT  /* some columns from the products table */
  FROM products
 WHERE product_id IN 
 (
     SELECT product_id 
       FROM product_categories
      WHERE MATCH(cat1,cat2,cat3,cat4) 
            AGAINST('123456' IN BOOLEAN MODE)
        AND product_id <> 0
 )

For this to work fast you're going to need to create a FULLTEXT index like so.

 CREATE FULLTEXT INDEX cat_lookup 
                    ON product_categories (cat1, cat2, cat3, cat4)
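
Two quick checks may save some head-scratching once the index is built. This is a sketch rather than part of the original recipe: MyISAM FULLTEXT indexes ignore words shorter than ft_min_word_len (4 by default), so very short category codes would never match, and the indexed columns need to be CHAR, VARCHAR or TEXT.

```sql
-- Minimum word length that MyISAM FULLTEXT indexes will store (default 4).
SHOW VARIABLES LIKE 'ft_min_word_len';

-- Confirm the index exists; the Index_type column should read FULLTEXT.
SHOW INDEX FROM product_categories WHERE Key_name = 'cat_lookup';
```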

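Note that neither of these suggested queries produces precisely the same results as your COALESCE query. COALESCE returns the first non-NULL value among cat4, cat3, cat2 and cat1, so your query only compares '123456' against the deepest category that is filled in, while my queries match '123456' in any of the four columns. Some combinations will therefore match my queries but not yours. For example: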

    cat1     cat2     cat3     cat4   
  123451   123453   123455   123456      matches both your query and mine
  123456   123455   123454   123452      matches my queries but not yours

But it's likely that my queries will produce a useful list of products, even if it has a few more items than yours.
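
If you do need exactly the rows the COALESCE query returns, the same condition can be spelled out without the function. This is a sketch of an equivalent predicate; whether MySQL uses indexes well for it depends on your indexes and version, so check it with EXPLAIN before relying on it.

```sql
-- Same matches as COALESCE(cat4, cat3, cat2, cat1) = '123456':
-- compare only the deepest category column that is not NULL.
SELECT DISTINCT product_id
  FROM product_categories
 WHERE product_id <> 0
   AND (   cat4 = '123456'
        OR (cat4 IS NULL AND cat3 = '123456')
        OR (cat4 IS NULL AND cat3 IS NULL AND cat2 = '123456')
        OR (cat4 IS NULL AND cat3 IS NULL AND cat2 IS NULL
            AND cat1 = '123456'));
```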

You can debug this stuff by just working with the inner queries on product_categories.
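
For instance, something along these lines shows how many products the inner query picks and how MySQL plans it. The plan you get will depend on which indexes you have created so far.

```sql
-- How many distinct products does the inner query select?
SELECT COUNT(DISTINCT product_id)
  FROM product_categories
 WHERE product_id <> 0
   AND (cat1 = '123456' OR cat2 = '123456'
        OR cat3 = '123456' OR cat4 = '123456');

-- How does MySQL plan that same inner query? Look at the type, key
-- and rows columns to see whether an index is being used.
EXPLAIN
SELECT DISTINCT product_id
  FROM product_categories
 WHERE product_id <> 0
   AND (cat1 = '123456' OR cat2 = '123456'
        OR cat3 = '123456' OR cat4 = '123456');
```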
