Performance of query on indexed Boolean column vs Datetime column


Question

Is there a notable difference in query performance if the index is set on a DATETIME column instead of a boolean column (and the query filters on that column)?

In my current design I have two columns:

  • is_active TINYINT(1), indexed
  • deleted_at DATETIME

The query is SELECT * FROM table WHERE is_active = 1;

Would it be any slower if I created an index on the deleted_at column instead and ran queries like SELECT * FROM table WHERE deleted_at is null; ?

Recommended answer

Here is a MariaDB (10.0.19) benchmark with 10M rows (generated using the sequence plugin):

drop table if exists test;
CREATE TABLE `test` (
    `id` MEDIUMINT UNSIGNED NOT NULL,
    `is_active` TINYINT UNSIGNED NOT NULL,
    `deleted_at` TIMESTAMP NULL,
    PRIMARY KEY (`id`),
    INDEX `is_active` (`is_active`),
    INDEX `deleted_at` (`deleted_at`)
) ENGINE=InnoDB
    select seq id
        , rand(1)<0.5 as is_active
        , case when rand(1)<0.5 
            then null
            else '2017-03-18' - interval floor(rand(2)*1000000) second
        end as deleted_at
    from seq_1_to_10000000;

To measure the time, I use set profiling=1 and run show profile after executing a query. From the profiling result I take the value of Sending data, since all other stages together take less than one msec.
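The measurement procedure described above can be sketched roughly as follows. This is a minimal illustration that assumes a session connected to the database holding the test table; the exact stage names and timings in the output depend on the server version.

```sql
-- Enable per-session statement profiling.
SET profiling = 1;

-- Run the statement to be measured.
SELECT COUNT(*) FROM test WHERE is_active = 1;

-- Show the per-stage timings of the most recently profiled statement.
-- The row of interest here is the "Sending data" stage.
SHOW PROFILE;

-- Turn profiling off again when done.
SET profiling = 0;
```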

TINYINT index:

SELECT COUNT(*) FROM test WHERE is_active = 1;

Runtime: ~ 738 msec

TIMESTAMP index:

SELECT COUNT(*) FROM test WHERE deleted_at is null;

Runtime: ~ 748 msec

Index sizes:

select database_name, table_name, index_name, stat_value*@@innodb_page_size
from mysql.innodb_index_stats 
where database_name = 'tmp'
  and table_name = 'test'
  and stat_name = 'size'

Result:

database_name | table_name | index_name | stat_value*@@innodb_page_size
-----------------------------------------------------------------------
tmp           | test       | PRIMARY    | 275513344 
tmp           | test       | deleted_at | 170639360 
tmp           | test       | is_active  |  97107968 

Note that while TIMESTAMP (4 bytes) is four times as long as TINYINT (1 byte), the index is not even twice as large (each InnoDB secondary index entry also carries the primary key value and per-record overhead). But the index size can become significant if the index doesn't fit into memory. When I change innodb_buffer_pool_size from 1G to 50M, I get the following numbers:

  • TINYINT: ~ 960 msec
  • TIMESTAMP: ~ 1500 msec
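To see whether an index is likely to fit into memory, the index sizes can be compared with the configured buffer pool size. The following is a hedged sketch built on the same mysql.innodb_index_stats query used above; the column aliases are my own, and the schema name 'tmp' is the one from this benchmark.

```sql
-- Compare each index's on-disk size with the InnoDB buffer pool size.
-- exceeds_buffer_pool is 1 when the index alone is larger than the pool.
SELECT index_name,
       stat_value * @@innodb_page_size AS size_bytes,
       @@innodb_buffer_pool_size       AS buffer_pool_bytes,
       stat_value * @@innodb_page_size > @@innodb_buffer_pool_size
                                       AS exceeds_buffer_pool
FROM mysql.innodb_index_stats
WHERE database_name = 'tmp'
  AND table_name = 'test'
  AND stat_name = 'size';
```

This is only a rough indicator: the buffer pool is shared with table data and other indexes, so an index slightly smaller than the pool may still not stay cached.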

To address the question more directly, I made some changes to the data:

  • Instead of TIMESTAMP I use DATETIME.
  • Since entries are usually rarely deleted, I use rand(1)<0.99 (1% deleted) instead of rand(1)<0.5 (50% deleted).
  • Table size changed from 10M to 1M rows.
  • SELECT COUNT(*) changed to SELECT *.
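Applying the listed changes to the original setup script gives something like the following. This is a reconstruction from the description, not the author's exact script; it assumes the same sequence plugin and the same table layout as the first benchmark.

```sql
DROP TABLE IF EXISTS test;
CREATE TABLE `test` (
    `id` MEDIUMINT UNSIGNED NOT NULL,
    `is_active` TINYINT UNSIGNED NOT NULL,
    `deleted_at` DATETIME NULL,              -- DATETIME instead of TIMESTAMP
    PRIMARY KEY (`id`),
    INDEX `is_active` (`is_active`),
    INDEX `deleted_at` (`deleted_at`)
) ENGINE=InnoDB
    SELECT seq id
        , rand(1)<0.99 AS is_active          -- about 99% of rows active
        , CASE WHEN rand(1)<0.99
            THEN NULL                        -- active rows have no deletion time
            ELSE '2017-03-18' - INTERVAL floor(rand(2)*1000000) SECOND
        END AS deleted_at
    FROM seq_1_to_1000000;                   -- 1M rows instead of 10M
```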

Index sizes:

index_name | stat_value*@@innodb_page_size
------------------------------------------
PRIMARY    | 25739264
deleted_at | 12075008
is_active  | 11026432

Since 99% of the deleted_at values are NULL, there is no significant difference in index size, even though a non-NULL DATETIME requires 8 bytes (MariaDB).

SELECT * FROM test WHERE is_active = 1;      -- 782 msec
SELECT * FROM test WHERE deleted_at is null; -- 829 msec

With both indexes dropped, both queries execute in about 350 msec. With the is_active column dropped as well, the deleted_at is null query executes in 280 msec.
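The schema changes behind these two measurements could look like the statements below. The answer does not show them, so treat this as an assumed sketch; the index names are those from the CREATE TABLE statement above.

```sql
-- Step 1: drop both secondary indexes, then re-run both SELECT * queries.
ALTER TABLE test
    DROP INDEX is_active,
    DROP INDEX deleted_at;

-- Step 2: drop the is_active column, then re-run the deleted_at query.
ALTER TABLE test DROP COLUMN is_active;

SELECT * FROM test WHERE deleted_at IS NULL;
```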

Note that this is still not a realistic scenario. You are unlikely to want to select 990K rows out of 1M and deliver them to the user. You will probably also have more columns (maybe including text) in the table. But it shows that you probably don't need the is_active column (if it doesn't carry additional information), and that an index is at best useless for selecting non-deleted entries.

However, an index can be useful for selecting deleted rows:

SELECT * FROM test WHERE is_active = 0;

This executes in 10 msec with the index and in 170 msec without it.

SELECT * FROM test WHERE deleted_at is not null;

This executes in 11 msec with the index and in 167 msec without it.

With the is_active column dropped, it executes in 4 msec with the index and in 150 msec without it.
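To repeat the with-index and without-index comparison without actually dropping anything, an index hint is one option. This is a different technique from the one used for the numbers above (which were measured by dropping the index), shown here only as a sketch under the assumption that the deleted_at index from the CREATE TABLE statement exists.

```sql
-- Check which access path the optimizer picks for the deleted rows.
EXPLAIN SELECT * FROM test WHERE deleted_at IS NOT NULL;

-- Let the optimizer use the deleted_at index if it chooses to.
SELECT * FROM test WHERE deleted_at IS NOT NULL;

-- Forbid the deleted_at index for this one statement to compare timings.
SELECT * FROM test IGNORE INDEX (deleted_at) WHERE deleted_at IS NOT NULL;
```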

So if this scenario roughly fits your data, the conclusion would be: drop the is_active column, and don't create an index on the deleted_at column if you rarely select deleted entries. Otherwise, adjust the benchmark to your needs and draw your own conclusion.
