Why is MySQL InnoDB so much slower at full table scans than MyISAM?


Problem description

EDIT

OP has acknowledged a mistake when profiling PostgreSQL in his answer below. I am updating this question to reflect the comparison between MyISAM & InnoDB.

Hello,

I ran a test against MySQL InnoDB, MySQL MyISAM, and PostgreSQL to see how well each of these engines performs full table scans, so as to understand what the response time might be in the cases where we inevitably need to let one happen.

The tests were conducted on an Intel Core 2 Quad Q6600 @ 2.4 GHz with 4 GB of RAM and a 7200 RPM HD with a 16 MB cache.

MySQL Version was 5.0.67-community-nt-log 32-bit, PGSQL version was 8.4.

I wrote a small script to generate 5 million rows of data in a 4 column table. These are the create table statements used in MySQL and PGSQL:

-- InnoDB

CREATE TABLE sample_innodb (
id integer unsigned not null,
vc1 varchar(200) not null,
vc2 varchar(200) not null,
vc3 varchar(200) not null
) ENGINE=InnoDB;

-- MyISAM

CREATE TABLE sample_isam (
id integer unsigned not null,
vc1 varchar(200) not null,
vc2 varchar(200) not null,
vc3 varchar(200) not null
) ENGINE=MyISAM;

-- PostgreSQL

create table sample_pgsql (
id integer not null,
vc1 varchar(200) not null,
vc2 varchar(200) not null,
vc3 varchar(200) not null
);

This is the script I used to generate data for these tables:

// Alphanumeric character set used to build the random column values
var chars = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'.split('');

function randomString(length) {
    var str = '';
    for (var i = 0; i < length; i++) {
        str += chars[Math.floor(Math.random() * chars.length)];
    }
    return str;
}

// Build one CSV row from the four column values
function genrow(idv, vcv1, vcv2, vcv3) {
    return idv + "," + vcv1 + "," + vcv2 + "," + vcv3;
}

function gentable(numrows) {
    for (var i = 0; i < numrows; i++) {
        var row = genrow(i,
                         randomString(10),
                         randomString(20),
                         randomString(30));
        WScript.Echo(row);
    }
}

gentable(5000000);

I ran this script on Windows with the command:

cscript.exe /nologo test.js > data.csv
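The script above targets Windows Script Host. For anyone without WSH, a roughly equivalent Node.js port is sketched below (an illustration, not the script used for the measurements; the command-line argument for the row count is my addition, the original hard-coded it):

```javascript
// Hypothetical Node.js port of the WSH generator; writes CSV rows to stdout.
var chars = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'.split('');

function randomString(length) {
  var str = '';
  for (var i = 0; i < length; i++) {
    str += chars[Math.floor(Math.random() * chars.length)];
  }
  return str;
}

// Build one CSV row from the four column values
function genrow(idv, vcv1, vcv2, vcv3) {
  return idv + ',' + vcv1 + ',' + vcv2 + ',' + vcv3;
}

function gentable(numrows) {
  for (var i = 0; i < numrows; i++) {
    process.stdout.write(
      genrow(i, randomString(10), randomString(20), randomString(30)) + '\n');
  }
}

// Row count taken from the command line, e.g.: node gen.js 5000000 > data.csv
var numrows = parseInt(process.argv[2] || '0', 10);
if (numrows > 0) gentable(numrows);
```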

You can load this data into MySQL with these commands:

LOAD DATA LOCAL INFILE 'data.csv'
INTO TABLE sample_innodb
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(id, vc1, vc2, vc3);

LOAD DATA LOCAL INFILE 'data.csv'
INTO TABLE sample_isam
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(id, vc1, vc2, vc3);

You can load the data into PGSQL with this command:

copy sample_pgsql (id, vc1, vc2, vc3) from 'data.csv' with delimiter ','
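One caveat: server-side COPY reads the file on the database server's filesystem and, on 8.4, requires superuser privileges. From an ordinary client session, psql's \copy meta-command is the usual alternative; a sketch, assuming data.csv sits in the directory psql was started from:

```sql
-- Client-side equivalent using psql's \copy (file is read on the client)
\copy sample_pgsql (id, vc1, vc2, vc3) from 'data.csv' with delimiter ','
```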

I used this query for timing to try and force a worst case table scan scenario:

MySQL

select count(*) from [table] 
where vc1 like '%blah0%' and vc2 like '%blah1%' and vc3 like '%blah2%';

PostgreSQL

select count(*) from [table] 
where vc1 ilike '%blah0%' and vc2 ilike '%blah1%' and vc3 ilike '%blah2%';

I ran this query multiple times to get an average time to complete, leaving out the first run to get everything primed in memory.

The results were as follows:

  • InnoDB - 8.56s
  • MyISAM - 1.84s
  • PGSQL - 8.4s

Question

Why is it that InnoDB and MyISAM are so far apart in terms of time to complete a full table scan? Am I just missing something obvious in the MySQL config? I've used MySQL for years and have had no problems with it as long as my problems were constrained to the set of "indexes can solve this" problems.

Partitioning will obviously also solve this problem, but at a much higher cost.
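For reference, the kind of partitioning meant here (available from MySQL 5.1, not on the 5.0 server tested) might look roughly like the sketch below; note that it only pays off when the WHERE clause also constrains the partitioning column so partitions can be pruned:

```sql
-- Hypothetical range partitioning on id (MySQL 5.1+ syntax)
CREATE TABLE sample_part (
    id integer unsigned not null,
    vc1 varchar(200) not null,
    vc2 varchar(200) not null,
    vc3 varchar(200) not null
) ENGINE=InnoDB
PARTITION BY RANGE (id) (
    PARTITION p0 VALUES LESS THAN (1000000),
    PARTITION p1 VALUES LESS THAN (2500000),
    PARTITION p2 VALUES LESS THAN MAXVALUE
);
```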

For reference, here are my config files for MySQL and PGSQL:

MYSQL CONFIG

[client]
port=3306

[mysql]
default-character-set=utf8

[mysqld]
port=3306
basedir="C:/Program Files/MySQL/MySQL Server 5.0/"
datadir="C:/Program Files/MySQL/MySQL Server 5.0/Data/"
default-character-set=utf8
default-storage-engine=INNODB
log="c:/logs/mysql/mysqld.log"
sql-mode="STRICT_TRANS_TABLES,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"
max_connections=700
query_cache_size=0M
table_cache=1400
tmp_table_size=16M
thread_cache_size=34

myisam_max_sort_file_size=100G
myisam_sort_buffer_size=8M
key_buffer_size=200M
read_buffer_size=64K
read_rnd_buffer_size=256K
sort_buffer_size=208K

innodb_additional_mem_pool_size=2M
innodb_flush_log_at_trx_commit=1
innodb_log_buffer_size=1M
innodb_buffer_pool_size=200M
innodb_log_file_size=18M
innodb_thread_concurrency=10

PGSQL CONFIG

listen_addresses = '*'        # what IP address(es) to listen on;
                # comma-separated list of addresses;
                # defaults to 'localhost', '*' = all
                # (change requires restart)
port = 5432                # (change requires restart)
max_connections = 100            # (change requires restart)

shared_buffers = 32MB            # min 128kB
                # (change requires restart)
temp_buffers = 12MB            # min 800kB
maintenance_work_mem = 32MB        # min 1MB

log_destination = 'stderr'        # Valid values are combinations of
                # stderr, csvlog, syslog and eventlog,
                # depending on platform.  csvlog
                # requires logging_collector to be on.

logging_collector = on        # Enable capturing of stderr and csvlog
                # into log files. Required to be on for
                # csvlogs.
                # (change requires restart)

log_line_prefix = '%t'            # special values:
                #   %u = user name
                #   %d = database name
                #   %r = remote host and port
                #   %h = remote host
                #   %p = process ID
                #   %t = timestamp without milliseconds
                #   %m = timestamp with milliseconds
                #   %i = command tag
                #   %c = session ID
                #   %l = session line number
                #   %s = session start timestamp
                #   %v = virtual transaction ID
                #   %x = transaction ID (0 if none)
                #   %q = stop here in non-session
                #        processes
                #   %% = '%'
                # e.g. '<%u%%%d> '

datestyle = 'iso, mdy'
lc_messages = 'English_United States.1252'            # locale for system error message
                # strings
lc_monetary = 'English_United States.1252'            # locale for monetary formatting
lc_numeric = 'English_United States.1252'            # locale for number formatting
lc_time = 'English_United States.1252'                # locale for time formatting

default_text_search_config = 'pg_catalog.english'

Also to understand how large these data sets actually are in MySQL, here's a show table status \G on them, in case this helps:

*************************** 1. row ***************************
           Name: sample_innodb
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 5000205
 Avg_row_length: 100
    Data_length: 500154368
Max_data_length: 0
   Index_length: 149700608
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2010-02-02 17:27:50
    Update_time: NULL
     Check_time: NULL
      Collation: utf8_general_ci
       Checksum: NULL
 Create_options:
        Comment: InnoDB free: 497664 kB

*************************** 2. row ***************************
           Name: sample_isam
         Engine: MyISAM
        Version: 10
     Row_format: Dynamic
           Rows: 5000000
 Avg_row_length: 72
    Data_length: 360006508
Max_data_length: 281474976710655
   Index_length: 1024
      Data_free: 0
 Auto_increment: NULL
    Create_time: 2010-02-02 17:27:50
    Update_time: 2010-02-02 17:37:23
     Check_time: NULL
      Collation: utf8_general_ci
       Checksum: NULL
 Create_options:
        Comment:

Solution

On my server with your configuration my base performance is as follows:

  • InnoDB: 5.71s
  • MyISAM: 2.50s

This is not bad in my book, but it can use some tweaking.

Your InnoDB performance can be improved on several fronts:

Increase innodb_buffer_pool_size

  • This is the single most important InnoDB configuration variable. Ideally it should be 70-80% of your available RAM on a server dedicated to MySQL and InnoDB.
  • Increasing innodb_buffer_pool_size on my server to 2G (more than enough for this test) decreased the InnoDB time to 4.60s
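The change itself is a one-line edit under [mysqld] in my.cnf (2G matches the figure above; on a dedicated box you would size it toward 70-80% of RAM, and the server must be restarted for it to take effect):

```ini
[mysqld]
# Buffer pool large enough to hold the whole ~500 MB test table in memory
innodb_buffer_pool_size=2G
```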

Make id the PRIMARY KEY

  • InnoDB clusters its data based on the PRIMARY KEY. When you don't declare one, InnoDB generates a random one implicitly. Having a sequential primary key (id) is faster than a random one.
  • Making id the PRIMARY KEY on my server decreased the InnoDB time to 3.80s
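A sketch of both routes, declaring the key up front or adding it to the already-loaded table:

```sql
-- Declare the clustered key at creation time...
CREATE TABLE sample_innodb (
    id integer unsigned not null,
    vc1 varchar(200) not null,
    vc2 varchar(200) not null,
    vc3 varchar(200) not null,
    PRIMARY KEY (id)
) ENGINE=InnoDB;

-- ...or retrofit it (this rebuilds the table, so expect it to take a while)
ALTER TABLE sample_innodb ADD PRIMARY KEY (id);
```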

Upgrade your MySQL/InnoDB

MySQL 5.1+ supports pluggable storage engines, specifically the new InnoDB Plugin.

The new InnoDB engine provides numerous performance enhancements that may have a significant effect on this particular type of query.

Of note:

  • As of MySQL 5.1.38, the InnoDB Plugin is included in MySQL
  • As of MySQL 5.1.43, the InnoDB Plugin is not only included, but is the default engine for MySQL
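On 5.1 releases where the plugin ships with the server but is not active, it can be enabled in my.cnf; a sketch per the 5.1 documentation (Linux shared-library name shown, on Windows the file is ha_innodb_plugin.dll):

```ini
[mysqld]
# Replace the built-in InnoDB with the InnoDB Plugin
ignore-builtin-innodb
plugin-load=innodb=ha_innodb_plugin.so
default-storage-engine=InnoDB
```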
