MySQL indexes - what are the best practices according to this table and queries


Question

I have this table (500,000 rows):

CREATE TABLE IF NOT EXISTS `listings` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `type` tinyint(1) NOT NULL DEFAULT '1',
  `hash` char(32) NOT NULL,
  `source_id` int(10) unsigned NOT NULL,
  `link` varchar(255) NOT NULL,
  `short_link` varchar(255) NOT NULL,
  `cat_id` mediumint(5) NOT NULL,
  `title` mediumtext NOT NULL,
  `description` mediumtext,
  `content` mediumtext,
  `images` mediumtext,
  `videos` mediumtext,
  `views` int(10) unsigned NOT NULL,
  `comments` int(11) DEFAULT '0',
  `comments_update` int(11) NOT NULL DEFAULT '0',
  `editor_id` int(11) NOT NULL DEFAULT '0',
  `auther_name` varchar(255) DEFAULT NULL,
  `createdby_id` int(10) NOT NULL,
  `createdon` int(20) NOT NULL,
  `editedby_id` int(10) NOT NULL,
  `editedon` int(20) NOT NULL,
  `deleted` tinyint(1) NOT NULL,
  `deletedon` int(20) NOT NULL,
  `deletedby_id` int(10) NOT NULL,
  `deletedfor` varchar(255) NOT NULL,
  `published` tinyint(1) NOT NULL DEFAULT '1',
  `publishedon` int(11) unsigned NOT NULL,
  `publishedby_id` int(10) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `hash` (`hash`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8;

I'm thinking of restricting every query to publishedon between x and y (the whole site shows only one month of records).

At the same time, I want to filter on published, cat_id, and source_id alongside publishedon in the WHERE clause, something like this:

SELECT * FROM listings 
WHERE (publishedon BETWEEN 1441105258 AND 1443614458) 
  AND (published = 1) 
  AND (cat_id in(1,2,3,4,5)) 
  AND (source_id  in(1,2,3,4,5))

That query is fine and fast so far without any index, but when I add ORDER BY publishedon it becomes too slow, so I used this index:

CREATE INDEX `listings_pcs` ON listings(
    `publishedon` DESC,
    `published` ,
    `cat_id` ,
    `source_id`
)

It worked, and ORDER BY publishedon became fast. Now I want to order by views, like this:

SELECT * FROM listings 
WHERE (publishedon BETWEEN 1441105258 AND 1443614458) 
  AND (published = 1) 
  AND (cat_id in(1,2,3,4,5)) 
  AND (source_id  in(1,2,3,4,5)) 
ORDER BY views DESC

This is the EXPLAIN output (not preserved in this copy). The query is slow because of ORDER BY views DESC.

Then I tried dropping the old index and adding this one:

CREATE INDEX `listings_pcs` ON listings(
    `publishedon` DESC,
    `published` ,
    `cat_id` ,
    `source_id`,
    `views` DESC
)

It is too slow as well.

What if I use just a single index on publishedon? What about a single index on cat_id, source_id, views, publishedon?

I can change what the query depends on, such as the one-month publishedon restriction, if there is another indexing approach that relies on other columns.

What about an index on (cat_id, source_id, publishedon, published)? In some cases, though, I will filter on source_id only.

What is the best indexing scheme for this table?

Recommended answer

One important general note on why your query isn't getting any faster despite your attempts: DESC on indexes is not supported in MySQL before 8.0. The keyword is parsed but ignored, so the index is still stored in ascending order. MySQL 8.0 adds descending indexes, but only for InnoDB, so a MyISAM table like this one would not benefit. See this SO thread, and the source from which it comes.

In this case, your largest problem is the sheer size of your records. If the engine decides that using an index wouldn't really be faster, it won't use one.

You have a few options, and all are actually pretty decent and can probably help you see significant improvement.

First, I want to make a quick note about indexing in SQL. While I don't think it's the solution to your woes, it was your main question, and it can help.

It usually helps me to think about indexing in three different buckets: the absolutely, the maybe, and the never. You certainly don't have anything in your indexing that belongs in the never bucket, but there are some I would consider "maybe" indexes.

absolutely: This is your primary key and any foreign keys. It is also any key you will reference on a very regular basis to pull a small set of data out of the massive data you have.

maybe: These are columns which, while you may reference them regularly, are not really referenced by themselves. In fact, through analysis and using EXPLAIN, as @Machavity recommends in his answer, you may find that by the time these columns are used to filter out rows, there aren't that many rows left anyway. An example of a column that would solidly be in this pile for me is the published column. Keep in mind that every index adds work to your writes, since each INSERT, UPDATE, and DELETE has to maintain it.

Also: Composite keys are a good choice when you're regularly searching for data based on two different columns. More on that later.
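
A toy model can make the composite-key idea concrete. The sketch below is plain Python, not MySQL: it assumes, purely for illustration, that an index on (cat_id, source_id) behaves like a sorted list of tuples. The data and function names are made up. It shows why a lookup on the leading column can jump straight to one contiguous slice, while a lookup on source_id alone has to look at every entry, which bears on the "in some cases I will use source_id only" part of the question.

```python
from bisect import bisect_left

# Toy stand-in for a composite index on (cat_id, source_id):
# a sorted list of (cat_id, source_id, row_id) tuples. Hypothetical data.
index = sorted([
    (1, 5, 101), (1, 7, 102), (2, 5, 103), (2, 9, 104), (3, 5, 105),
])


def lookup_by_cat(cat_id):
    """Leading column: binary search finds one contiguous slice."""
    lo = bisect_left(index, (cat_id,))
    hi = bisect_left(index, (cat_id + 1,))
    return [row_id for _, _, row_id in index[lo:hi]]


def lookup_by_source(source_id):
    """Non-leading column: matches are scattered, so every entry is checked."""
    return [row_id for _, source, row_id in index if source == source_id]


print(lookup_by_cat(2))     # rows found through the leading column
print(lookup_by_source(5))  # rows found only by scanning the whole list
```

If source_id is often used on its own, the model suggests it needs either its own index or to be the leading column of a composite one; testing with EXPLAIN on the real table is the only way to confirm what MySQL actually picks.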

There are a number of options to consider, and each one has some drawbacks. Ultimately I would consider each of these on a case-by-case basis, as I don't see any of them as a silver bullet. Ideally, you'd test a few different solutions against your current setup and see which one runs the fastest using a nice scientific test.


  1. Split your SQL table into two or more separate tables.

This is one of the few times where, despite the number of columns in your table, I wouldn't rush to split your table into smaller chunks. If you did decide to split it, however, I'd argue that your [action]edon, [action]edby_id, and [action]ed columns could easily be moved into another table, actions:

+------------+-------------+------+-----+-------------------+----------------+
| Field      | Type        | Null | Key | Default           | Extra          |
+------------+-------------+------+-----+-------------------+----------------+
| id         | int(11)     | NO   | PRI | NULL              | auto_increment |
| listing_id | int(11)     | NO   |     | NULL              |                |
| action     | varchar(45) | NO   |     | NULL              |                |
| actiondate | datetime    | NO   |     | CURRENT_TIMESTAMP |                |
| user_id    | int(11)     | NO   |     | NULL              |                |
+------------+-------------+------+-----+-------------------+----------------+

The downside to this is that, without a TRIGGER, it does not let you ensure there is only one creation date per listing. The upside is that you don't have to sort as many columns with as many indexes when you're sorting by date. Also, it lets you sort not only by creation date but also by all of your other actions.
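
As a rough sketch of what moving those columns would involve, the Python below turns one listings row into rows for an actions table. It is an illustration under assumptions, not a migration script: the target column names (listing_id, action, actiondate, user_id) follow the example query in this answer, a timestamp of 0 is assumed to mean "this action never happened", and timestamps are converted as UTC.

```python
from datetime import datetime, timezone

# Action name -> (timestamp column, user column) in the listings table.
# Column names come from the CREATE TABLE in the question.
ACTION_COLUMNS = {
    "created": ("createdon", "createdby_id"),
    "edited": ("editedon", "editedby_id"),
    "deleted": ("deletedon", "deletedby_id"),
    "published": ("publishedon", "publishedby_id"),
}


def listing_to_actions(listing):
    """Return one actions row per action that has a non-zero timestamp."""
    rows = []
    for action, (ts_col, user_col) in ACTION_COLUMNS.items():
        ts = listing[ts_col]
        if not ts:  # assumption: 0 means the action never happened
            continue
        when = datetime.fromtimestamp(ts, tz=timezone.utc)
        rows.append({
            "listing_id": listing["id"],
            "action": action,
            "actiondate": when.strftime("%Y-%m-%d %H:%M:%S"),
            "user_id": listing[user_col],
        })
    return rows


# Hypothetical listing that was created and published but never edited or deleted.
example = {
    "id": 42,
    "createdon": 1441105258, "createdby_id": 7,
    "editedon": 0, "editedby_id": 0,
    "deletedon": 0, "deletedby_id": 0,
    "publishedon": 1443614458, "publishedby_id": 9,
}
for row in listing_to_actions(example):
    print(row)
```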

Edit: As requested, here is an example sorting query:

SELECT * FROM listings
INNER JOIN actions ON actions.listing_id = listings.id
WHERE (actions.action = 'published')
  AND (listings.published = 1)
  AND (listings.cat_id IN (1,2,3,4,5))
  AND (listings.source_id IN (1,2,3,4,5))
  -- actiondate is a DATETIME, so convert the Unix timestamps first
  AND (actions.actiondate BETWEEN FROM_UNIXTIME(1441105258) AND FROM_UNIXTIME(1443614458))
ORDER BY listings.views DESC

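In theory, this should cut down the number of rows you're sorting, because it only pulls the relevant data. I don't have a dataset like yours, so I can't test it right now!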

If you put a composite key on actions.actiondate and actions.listing_id (the column joined to listings.id), this should help to increase speed.

As I said, I don't think this is the best solution for you right now because I'm not convinced it's going to give you the maximum optimization. This leads me to my next suggestion:


  2. Create a month field

I used this nifty tool to confirm what I thought I understood of your question: you are filtering by month here. Your example looks specifically between September 1st and September 30th, inclusive.
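
The same check can be done without an online tool. This short Python sketch converts the two Unix timestamps from the question to dates, assuming they are meant as UTC (the question does not say which time zone applies).

```python
from datetime import datetime, timezone


def to_utc(ts):
    """Convert a Unix timestamp (seconds) to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


start = to_utc(1441105258)
end = to_utc(1443614458)
print(start.isoformat())  # 2015-09-01T11:00:58+00:00
print(end.isoformat())    # 2015-09-30T12:00:58+00:00
```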

So another option is to split your integer timestamp into month, day, and year fields. You can still keep your timestamp, but timestamps aren't all that great for searching. Run an EXPLAIN on even a simple query and you'll see for yourself.

That way, you can just index the month and year fields and do a query like this:

SELECT * FROM listings 
WHERE (publishedmonth = 9)
  AND (publishedyear = 2015) 
  AND (published = 1) 
  AND (cat_id in(1,2,3,4,5)) 
  AND (source_id  in(1,2,3,4,5)) 
ORDER BY views DESC

Slap an EXPLAIN in front and you should see massive improvements.

Because you're planning on referring to a month and a year together, you may want to add a composite key on month and year, rather than a separate key on each, for added gains.
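
To fill the new fields, each existing publishedon value has to be split once. A minimal sketch of that step in Python follows. The names publishedyear and publishedmonth match the query above, publishedday is a hypothetical name for the day field, and UTC is an assumption. Use whatever time zone your site treats as "the month".

```python
from datetime import datetime, timezone


def published_parts(publishedon):
    """Split a Unix timestamp into the values for the year, month, and day columns."""
    dt = datetime.fromtimestamp(publishedon, tz=timezone.utc)  # assumption: UTC
    return {
        "publishedyear": dt.year,
        "publishedmonth": dt.month,
        "publishedday": dt.day,  # hypothetical column name
    }


print(published_parts(1441105258))
```

The same values would have to be written whenever a listing is published, otherwise the extra fields drift out of sync with publishedon, which is the price of denormalizing.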

Note: I want to be clear, this is not the "correct" way to do things. It is convenient, but denormalized. If you want the correct way to do things, you'd adapt something like this link, but I think that would require you to seriously reconsider your table. I haven't tried anything like it myself, having lacked the need and, frankly, the will to brush up on my geometry. I think it's a little overkill for what you're trying to do.


  3. Do your heavy sorting elsewhere

This was hard for me to come to terms with because I like to do things the "SQL" way wherever possible, but that is not always the best solution. Heavy computing, for example, is best done in your programming language, leaving SQL to handle relationships.

The former CTO of Digg sorted using PHP instead of MySQL and received a 4,000% performance increase (http://highscalability.com/blog/2010/3/23/digg-4000-performance-increase-by-sorting-in-php-rather-than.html). You're probably not scaling out to this level, of course, so the performance trade-offs won't be clear-cut unless you test it out yourself. Still, the concept is sound: the database is the bottleneck, and computer memory is dirt cheap by comparison.
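
As a sketch of what sorting in application code could look like, here is the idea in Python rather than PHP. It assumes the query is run without ORDER BY and that the driver hands back each row as a dict; the rows and the function name are made up for illustration.

```python
import heapq
from operator import itemgetter


def order_by_views(rows, limit=None):
    """Return rows ordered by views, highest first (like ORDER BY views DESC)."""
    by_views = itemgetter("views")
    if limit is not None:
        # Only the top `limit` rows are needed, so avoid a full sort.
        return heapq.nlargest(limit, rows, key=by_views)
    return sorted(rows, key=by_views, reverse=True)


# Hypothetical rows, as returned by the unsorted query.
rows = [
    {"id": 1, "views": 5},
    {"id": 2, "views": 20},
    {"id": 3, "views": 7},
]
for row in order_by_views(rows, limit=2):
    print(row["id"], row["views"])
```

Whether this beats letting MySQL sort depends on how many rows cross the wire, so it is worth timing both on real data.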

There are doubtless a lot more tweaks that can be done. Each of these has a drawback and requires some investment. The best answer is to test two or more of these and see which one helps you get the most improvement.
