传递大量数据的查询的最佳MySQL设置? [英] Optimal MySQL settings for queries that deliver large amounts of data?

查看:69
本文介绍了传递大量数据的查询的最佳MySQL设置?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我是一名科学家,我将MySQL用作数值模拟结果的存储库.通常,我有一组通过实验获得的数据和一个对照组.这两个数据集存储在一个表中.一个指示符字段告诉我记录是来自实验还是来自控件集.该表通常具有约1亿条记录. 5000万次实验和5000万个控件.

当我对数据进行后期处理时,我的典型任务是首先发出以下两个查询:

select b0,t0 from results_1mregr_c_ew_f where RC='E' and df>60  /// getting experiments data 

select b0,t0 from results_1mregr_c_ew_f where RC='C' and df>60 /// getting controls data

我在RC,df上有一个多列索引. 这些查询会花费大量时间,而查询会花费大部分时间来发送数据"

我正在具有12GB RAM的8core MacPro上运行此程序. 我是这台机器的一个用户,此任务是主要任务,因此我可以将所有RAM专用于MySQL.所有表都是MyISAM(如果可以提高查询速度,我可以将它们转换).

对于如何加快这些查询的任何建议,我将不胜感激. 我应该更改一些设置,索引,查询吗?...

在每一个查询中,我预计将获得约5000万条记录. 请注意,由于管理原因,不能将表格分为两个表格,一个表格包含实验,一个表格包含对照观察.

以下是输出:

explain select b0, t0 from results_1mregr_c_ew_f  where RC="C" and df>60
+----+-----------+---------------------+-----+-------------+---+-------+----+-------+-----------+
| id |select_type|table                |type |possible_keys|key|key_len|ref |rows   |Extra      |
+----+-----------+---------------------+-----+-------------+---+-------+----+-------+-----------+
|  1 |SIMPLE     |results_1mregr_c_ew_f|range|ff           |ff |11     |NULL|6251121|Using where|
+----+-----------+---------------------+-----+-------------+---+-------+----+-------+-----------+

以下是输出:

show indexes from results_1mregr_c_ew_f;
+-----------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table                 | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| results_1mregr_c_ew_f |          0 | PRIMARY  |            1 | id          | A         |    50793996 |     NULL | NULL   |      | BTREE      |         |
| results_1mregr_c_ew_f |          1 | ff       |            1 | RC          | A         |           3 |     NULL | NULL   |      | BTREE      |         |
| results_1mregr_c_ew_f |          1 | ff       |            2 | df          | A         |         120 |     NULL | NULL   |      | BTREE      |         |
+-----------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+

以下是输出:

CREATE TABLE `results_1mregr_c_ew_f` (
  `b0` double NOT NULL COMMENT '    ',
  `s0` double NOT NULL,
  `t0` double NOT NULL,
  `b1` double NOT NULL,
  `s1` double NOT NULL,
  `t1` double NOT NULL,
  `b2` double NOT NULL,
  `s2` double NOT NULL,
  `t2` double NOT NULL,
  `b3` double NOT NULL,
  `s3` double NOT NULL,
  `t3` double NOT NULL,
  `b4` double NOT NULL,
  `s4` double NOT NULL,
  `t4` double NOT NULL,
  `AD` char(4) NOT NULL,
  `chisq` double NOT NULL,
  `RC` char(7) NOT NULL,
  `colq` varchar(255) NOT NULL,
  `df` int(11) NOT NULL,
  `ncol` int(11) NOT NULL,
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `p1` float NOT NULL,
  `p2` float NOT NULL,
  `p3` float NOT NULL,
  `p4` float NOT NULL,
  PRIMARY KEY (`id`),
  KEY `ff` (`RC`,`df`)
) ENGINE=MyISAM AUTO_INCREMENT=50793997 DEFAULT CHARSET=ascii |

解决方案

当我可以在60秒内在相似的硬件上执行相同的操作时,您的查询要花2个小时才能执行,因此必须有一个严重的错误.

以下某些内容可能会有所帮助...

为您的引擎调整MySQL

检查服务器配置并进行相应的优化.以下一些资源应该是有用的.

现在不那么明显了...

考虑使用存储过程来处理数据服务器端

为什么不处理MySQL内部的所有数据,从而不必将大量数据发送到应用程序层?以下示例使用游标在2分钟内循环和处理服务器端5000万行.我不是游标的忠实拥护者,尤其是在游标非常有限的MySQL中,但是我猜想您会循环结果集并进行某种形式的数值分析,因此在这种情况下使用游标是合理的. /p>

简化的myisam结果表-基于您的密钥.

drop table if exists results_1mregr_c_ew_f;
create table results_1mregr_c_ew_f
(
id int unsigned not null auto_increment primary key,
rc tinyint unsigned not null,
df int unsigned not null default 0,
val double(10,4) not null default 0,
ts timestamp not null default now(),
key (rc, df)
)
engine=myisam;

我生成了1亿行数据,其中关键字段的基数与您的示例大致相同:

show indexes from results_1mregr_c_ew_f;

Table                   Non_unique  Key_name    Seq_in_index    Column_name Collation   Cardinality Index_type
=====                   ==========  ========    ============    =========== =========   =========== ==========
results_1mregr_c_ew_f       0       PRIMARY         1               id          A       100000000   BTREE   
results_1mregr_c_ew_f       1       rc              1               rc          A               2   BTREE   
results_1mregr_c_ew_f       1       rc              2               df          A             223   BTREE   

存储过程

我创建了一个简单的存储过程,该过程可以获取所需的数据并对其进行处理(使用与示例相同的where条件)

drop procedure if exists process_results_1mregr_c_ew_f;

delimiter #

create procedure process_results_1mregr_c_ew_f
(
in p_rc tinyint unsigned,
in p_df int unsigned
)
begin

declare v_count int unsigned default 0;
declare v_done tinyint default 0;
declare v_id int unsigned;
declare v_result_cur cursor for select id from results_1mregr_c_ew_f where rc = p_rc and df > p_df;
declare continue handler for not found set v_done = 1;

open v_result_cur;

repeat
    fetch v_result_cur into v_id;

    set v_count = v_count + 1;
    -- do work...

until v_done end repeat;
close v_result_cur;

select v_count as counter;

end #

delimiter ; 

观察到以下运行时:

call process_results_1mregr_c_ew_f(0,60);

runtime 1 = 03:24.999 Query OK (3 mins 25 secs)
runtime 2 = 03:32.196 Query OK (3 mins 32 secs)

call process_results_1mregr_c_ew_f(1,60);

runtime 1 = 04:59.861 Query OK (4 mins 59 secs)
runtime 2 = 04:41.814 Query OK (4 mins 41 secs)

counter
========
23000002 (23 million rows processed in each case)

嗯,性能有些令人失望,因此进入下一个想法.

考虑使用innodb引擎(震惊)

为什么是innodb?因为它具有聚簇索引!您会发现使用innodb插入速度较慢,但​​希望读取速度会更快,因此这是一个值得权衡的选择.

通过聚集索引访问行是快速的,因为行数据位于索引搜索所在的同一页上.如果表很大,则与使用不同于索引记录的页面存储行数据的存储组织相比,聚集索引体系结构通常可以节省磁盘I/O操作.例如,MyISAM将一个文件用于 数据行,另一个用于索引记录.

更多信息在这里:

简化的innodb结果表

drop table if exists results_innodb;
create table results_innodb
(
rc tinyint unsigned not null,
df int unsigned not null default 0,
id int unsigned not null, -- cant auto_inc this !!
val double(10,4) not null default 0,
ts timestamp not null default now(),
primary key (rc, df, id) -- note clustered (innodb only !) composite PK
)
engine=innodb;

innodb的一个问题是它不支持构成复合键一部分的auto_increment字段,因此您必须自己使用序列生成器,触发器或其他方法提供增量键值-可能是在填充结果表本身??

同样,我生成了1亿行数据,这些键字段的基数与您的示例大致相同.如果这些数字与myisam示例不匹配,请不要担心,因为innodb估计基数,因此它们不会完全相同. (但它们-使用相同的数据集)

show indexes from results_innodb;

Table           Non_unique  Key_name    Seq_in_index    Column_name Collation   Cardinality Index_type
=====           ==========  ========    ============    =========== =========   =========== ==========
results_innodb      0       PRIMARY         1               rc          A                18     BTREE   
results_innodb      0       PRIMARY         2               df          A                18     BTREE   
results_innodb      0       PRIMARY         3               id          A         100000294     BTREE   

存储过程

存储过程与上面的myisam示例完全相同,但是从innodb表中选择数据.

declare v_result_cur cursor for select id from results_innodb where rc = p_rc and df > p_df;

结果如下:

call process_results_innodb(0,60);

runtime 1 = 01:53.407 Query OK (1 mins 53 secs)
runtime 2 = 01:52.088 Query OK (1 mins 52 secs)

call process_results_innodb(1,60);

runtime 1 = 02:01.201 Query OK (2 mins 01 secs)
runtime 2 = 01:49.737 Query OK (1 mins 50 secs)

counter
========
23000002 (23 million rows processed in each case)

比myisam引擎实施快

2-3分钟! (innodb FTW)

分而治之

在使用游标的服务器端存储过程中处理结果可能不是最佳解决方案,尤其是因为MySQL不支持诸如数组和复杂数据结构之类的东西,而这些东西在3GL语言(例如C#等)中甚至是现成的版本中都是可用的.其他数据库,例如Oracle PL/SQL.

因此,这里的想法是将一批数据返回到应用程序层(无论是C#还是什么),然后可以将结果添加到基于集合的数据结构中,然后在内部处理数据.

存储过程

存储过程采用3个参数rc,df_low和df_high,可让您选择以下数据范围:

call list_results_innodb(0,1,1); -- df 1
call list_results_innodb(0,1,10); -- df between 1 and 10
call list_results_innodb(0,60,120); -- df between 60 and 120 etc...

显然,df范围越高,您将提取的数据越多.

drop procedure if exists list_results_innodb;

delimiter #

create procedure list_results_innodb
(
in p_rc tinyint unsigned,
in p_df_low int unsigned,
in p_df_high int unsigned
)
begin
    select rc, df, id from results_innodb where rc = p_rc and df between p_df_low and p_df_high;
end #

delimiter ; 

我还敲出了一个myisam版本,除了所使用的表格外,其他版本都相同.

call list_results_1mregr_c_ew_f(0,1,1);
call list_results_1mregr_c_ew_f(0,1,10);
call list_results_1mregr_c_ew_f(0,60,120);

基于上面的游标示例,我希望innodb版本的性能优于myisam版本.

我开发了一个快速且肮脏的多线程C#应用程序,该应用程序将调用存储过程并将结果添加到集合中以进行后查询处理.您不必使用线程,可以按顺序执行相同的批处理查询方法,而不会造成很多性能损失.

每个线程(QueryThread)选择一个df数据范围,循环结果集并将每个结果(行)添加到结果集合中.

class Program
    {
        static void Main(string[] args)
        {
            const int MAX_THREADS = 12; 
            const int MAX_RC = 120;

            List<AutoResetEvent> signals = new List<AutoResetEvent>();
            ResultDictionary results = new ResultDictionary(); // thread safe collection

            DateTime startTime = DateTime.Now;
            int step = (int)Math.Ceiling((double)MAX_RC / MAX_THREADS) -1; 

            int start = 1, end = 0;
            for (int i = 0; i < MAX_THREADS; i++){
                end = (i == MAX_THREADS - 1) ? MAX_RC : end + step;
                signals.Add(new AutoResetEvent(false));

                QueryThread st = new QueryThread(i,signals[i],results,0,start,end);
                start = end + 1;
            }
            WaitHandle.WaitAll(signals.ToArray());
            TimeSpan runTime = DateTime.Now - startTime;

            Console.WriteLine("{0} results fetched and looped in {1} secs\nPress any key", results.Count, runTime.ToString());
            Console.ReadKey();
        }
    }

运行时观察如下:

Thread 04 done - 31580517
Thread 06 done - 44313475
Thread 07 done - 45776055
Thread 03 done - 46292196
Thread 00 done - 47008566
Thread 10 done - 47910554
Thread 02 done - 48194632
Thread 09 done - 48201782
Thread 05 done - 48253744
Thread 08 done - 48332639
Thread 01 done - 48496235
Thread 11 done - 50000000
50000000 results fetched and looped in 00:00:55.5731786 secs
Press any key

因此在不到60秒的时间内获取了5000万行并将其添加到集合中.

我使用myisam存储过程尝试了同样的事情,该过程花了2分钟才能完成.

50000000 results fetched and looped in 00:01:59.2144880 secs

移动到innodb

在我的简化系统中,myisam表的性能不会太差,因此可能不值得迁移到innodb.如果您确实决定将结果数据复制到innodb表中,请执行以下操作:

start transaction;

insert into results_innodb 
 select <fields...> from results_1mregr_c_ew_f order by <innodb primary key>;

commit;

在将整个事物插入并包装在事务中之前,先通过innodb PK对结果进行排序将加快速度.

我希望其中的一些方法会有所帮助.

祝你好运

I work as a scientist and I have used MySQL as a storage for the results of my numerical simulations. Typically I have a set of data obtained by my experiment and a control set. These two data sets are stored in one table. One indicator field tells me if a record comes from experiment or from a control set. This table usually has ~ 100 million records. 50million experiments and 50 million controls.

When I do the post processing of my data my typical task consists of first issuing the following two queries:

select b0,t0 from results_1mregr_c_ew_f where RC='E' and df>60  /// getting experiments data 

and

select b0,t0 from results_1mregr_c_ew_f where RC='C' and df>60 /// getting controls data

I have a multi column index on RC,df. These queries take lots of time and the queries spend most of the time "Sending data"

I'm running this on 8core MacPro with 12GB of RAM. I'm a single user of this machine and this task is the primary task therefore I can dedicate all the RAM to MySQL. All tables are MyISAM (I can convert them if that would increase teh speed of my queries).

I would appreciate any recommendations on how to speed up these queries. Should I change some settings, indices, queries....

In each of these queries I expect to get back ~ 50 million records. Note that splitting the table into two tables one containing experimental and one containing control observation is not an option due to administrative reasons.

Here is the output of:

explain select b0, t0 from results_1mregr_c_ew_f  where RC="C" and df>60
+----+-----------+---------------------+-----+-------------+---+-------+----+-------+-----------+
| id |select_type|table                |type |possible_keys|key|key_len|ref |rows   |Extra      |
+----+-----------+---------------------+-----+-------------+---+-------+----+-------+-----------+
|  1 |SIMPLE     |results_1mregr_c_ew_f|range|ff           |ff |11     |NULL|6251121|Using where|
+----+-----------+---------------------+-----+-------------+---+-------+----+-------+-----------+

Here is the output from:

show indexes from results_1mregr_c_ew_f;
+-----------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table                 | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| results_1mregr_c_ew_f |          0 | PRIMARY  |            1 | id          | A         |    50793996 |     NULL | NULL   |      | BTREE      |         |
| results_1mregr_c_ew_f |          1 | ff       |            1 | RC          | A         |           3 |     NULL | NULL   |      | BTREE      |         |
| results_1mregr_c_ew_f |          1 | ff       |            2 | df          | A         |         120 |     NULL | NULL   |      | BTREE      |         |
+-----------------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+

Here is the output from:

CREATE TABLE `results_1mregr_c_ew_f` (
  `b0` double NOT NULL COMMENT '    ',
  `s0` double NOT NULL,
  `t0` double NOT NULL,
  `b1` double NOT NULL,
  `s1` double NOT NULL,
  `t1` double NOT NULL,
  `b2` double NOT NULL,
  `s2` double NOT NULL,
  `t2` double NOT NULL,
  `b3` double NOT NULL,
  `s3` double NOT NULL,
  `t3` double NOT NULL,
  `b4` double NOT NULL,
  `s4` double NOT NULL,
  `t4` double NOT NULL,
  `AD` char(4) NOT NULL,
  `chisq` double NOT NULL,
  `RC` char(7) NOT NULL,
  `colq` varchar(255) NOT NULL,
  `df` int(11) NOT NULL,
  `ncol` int(11) NOT NULL,
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `p1` float NOT NULL,
  `p2` float NOT NULL,
  `p3` float NOT NULL,
  `p4` float NOT NULL,
  PRIMARY KEY (`id`),
  KEY `ff` (`RC`,`df`)
) ENGINE=MyISAM AUTO_INCREMENT=50793997 DEFAULT CHARSET=ascii |

解决方案

Something has to be seriously wrong for your query to be taking 2 hours to execute when I can do the same thing in under 60 seconds on similar hardware.

Some of the following might prove helpful...

Tune MySQL for your engine

Check your server configuration and optimise accordingly. Some of the following resources should be useful.

Now for the less obvious...

Consider using a stored procedure to process the data server side

Why not process all the data inside of MySQL so you don't have to send vast quantities of data to your application layer ? The following example uses a cursor to loop and process 50M rows server side in under 2 minutes. I'm not a huge fan of cursors, especially in MySQL where they are very limited, but I'm guessing you'd be looping the resultset and doing some form of numerical analysis so use of a cursor is justifiable in this case.

Simplified myisam results table - keys based on yours.

drop table if exists results_1mregr_c_ew_f;
create table results_1mregr_c_ew_f
(
id int unsigned not null auto_increment primary key,
rc tinyint unsigned not null,
df int unsigned not null default 0,
val double(10,4) not null default 0,
ts timestamp not null default now(),
key (rc, df)
)
engine=myisam;

I generated 100M rows of data with the key fields having approximately the same cardinality as in your example:

show indexes from results_1mregr_c_ew_f;

Table                   Non_unique  Key_name    Seq_in_index    Column_name Collation   Cardinality Index_type
=====                   ==========  ========    ============    =========== =========   =========== ==========
results_1mregr_c_ew_f       0       PRIMARY         1               id          A       100000000   BTREE   
results_1mregr_c_ew_f       1       rc              1               rc          A               2   BTREE   
results_1mregr_c_ew_f       1       rc              2               df          A             223   BTREE   

Stored procedure

I created a simple stored procedure that fetches the required data and processes it (uses same where condition as your example)

drop procedure if exists process_results_1mregr_c_ew_f;

delimiter #

create procedure process_results_1mregr_c_ew_f
(
in p_rc tinyint unsigned,
in p_df int unsigned
)
begin

declare v_count int unsigned default 0;
declare v_done tinyint default 0;
declare v_id int unsigned;
declare v_result_cur cursor for select id from results_1mregr_c_ew_f where rc = p_rc and df > p_df;
declare continue handler for not found set v_done = 1;

open v_result_cur;

repeat
    fetch v_result_cur into v_id;

    set v_count = v_count + 1;
    -- do work...

until v_done end repeat;
close v_result_cur;

select v_count as counter;

end #

delimiter ; 

The following runtimes were observed:

call process_results_1mregr_c_ew_f(0,60);

runtime 1 = 03:24.999 Query OK (3 mins 25 secs)
runtime 2 = 03:32.196 Query OK (3 mins 32 secs)

call process_results_1mregr_c_ew_f(1,60);

runtime 1 = 04:59.861 Query OK (4 mins 59 secs)
runtime 2 = 04:41.814 Query OK (4 mins 41 secs)

counter
========
23000002 (23 million rows processed in each case)

Hmmmm, performance a bit disappointing so onto the next idea.

Consider using the innodb engine (shock horror)

Why innodb ?? because it has clustered indexes ! You will find inserting slower using innodb but hopefully it will be faster to read so it's a trade off that might be worth it.

Accessing a row through the clustered index is fast because the row data is on the same page where the index search leads. If a table is large, the clustered index architecture often saves a disk I/O operation when compared to storage organizations that store row data using a different page from the index record. For example, MyISAM uses one file for data rows and another for index records.

More info here :

Simplified innodb results table

drop table if exists results_innodb;
create table results_innodb
(
rc tinyint unsigned not null,
df int unsigned not null default 0,
id int unsigned not null, -- cant auto_inc this !!
val double(10,4) not null default 0,
ts timestamp not null default now(),
primary key (rc, df, id) -- note clustered (innodb only !) composite PK
)
engine=innodb;

One problem with innodb is that is doesnt support auto_increment fields that form part of a composite key so you'd have to provide the incrementing key value yourself using a sequence generator, trigger or some other method - perhaps in the application populating the result table itself ??

Again, I generated 100M rows of data with the key fields having approximately the same cardinality as in your example. Don't worry if these figures don't match the myisam example as innodb estimates the cardinalities so they wont be exactly the same. (but they are - same dataset used)

show indexes from results_innodb;

Table           Non_unique  Key_name    Seq_in_index    Column_name Collation   Cardinality Index_type
=====           ==========  ========    ============    =========== =========   =========== ==========
results_innodb      0       PRIMARY         1               rc          A                18     BTREE   
results_innodb      0       PRIMARY         2               df          A                18     BTREE   
results_innodb      0       PRIMARY         3               id          A         100000294     BTREE   

Stored procedure

The stored procedure is exactly the same as the myisam example above but selects data from the innodb table instead.

declare v_result_cur cursor for select id from results_innodb where rc = p_rc and df > p_df;

The results are as follows:

call process_results_innodb(0,60);

runtime 1 = 01:53.407 Query OK (1 mins 53 secs)
runtime 2 = 01:52.088 Query OK (1 mins 52 secs)

call process_results_innodb(1,60);

runtime 1 = 02:01.201 Query OK (2 mins 01 secs)
runtime 2 = 01:49.737 Query OK (1 mins 50 secs)

counter
========
23000002 (23 million rows processed in each case)

approx 2-3 mins faster than the myisam engine implementation ! (innodb FTW)

Divide and Conquer

Processing the results in a server side stored procedure that uses a cursor might not be an optimal solution especially as MySQL doesnt have support for things such arrays and complex data structures that are readily available in 3GL languages such as C# etc or even in other databases such as Oracle PL/SQL.

So the idea here is to return batches of data to an application layer (C# whatever) which can then add the results to a collection based data structure and then process the data internally.

Stored procedure

The stored procedure takes 3 paramaters rc, df_low and df_high which allows you to select a range of data as follows:

call list_results_innodb(0,1,1); -- df 1
call list_results_innodb(0,1,10); -- df between 1 and 10
call list_results_innodb(0,60,120); -- df between 60 and 120 etc...

obviously the higher the df range the more data you'll be extracting.

drop procedure if exists list_results_innodb;

delimiter #

create procedure list_results_innodb
(
in p_rc tinyint unsigned,
in p_df_low int unsigned,
in p_df_high int unsigned
)
begin
    select rc, df, id from results_innodb where rc = p_rc and df between p_df_low and p_df_high;
end #

delimiter ; 

I also knocked up a myisam version also which is identical except for the table that is used.

call list_results_1mregr_c_ew_f(0,1,1);
call list_results_1mregr_c_ew_f(0,1,10);
call list_results_1mregr_c_ew_f(0,60,120);

Based on the cursor example above I would expect the innodb version to out-perform the myisam one.

I devloped a quick and dirty multi-threaded C# application that will call the stored procedure and add the results to a collection for post query processing. You dont have to use threads, the same batched query approach could be done sequentially without much loss of performance.

Each thread (QueryThread) selects a range of df data, loops the resultset and adds each result (row) to the results collection.

class Program
    {
        static void Main(string[] args)
        {
            const int MAX_THREADS = 12; 
            const int MAX_RC = 120;

            List<AutoResetEvent> signals = new List<AutoResetEvent>();
            ResultDictionary results = new ResultDictionary(); // thread safe collection

            DateTime startTime = DateTime.Now;
            int step = (int)Math.Ceiling((double)MAX_RC / MAX_THREADS) -1; 

            int start = 1, end = 0;
            for (int i = 0; i < MAX_THREADS; i++){
                end = (i == MAX_THREADS - 1) ? MAX_RC : end + step;
                signals.Add(new AutoResetEvent(false));

                QueryThread st = new QueryThread(i,signals[i],results,0,start,end);
                start = end + 1;
            }
            WaitHandle.WaitAll(signals.ToArray());
            TimeSpan runTime = DateTime.Now - startTime;

            Console.WriteLine("{0} results fetched and looped in {1} secs\nPress any key", results.Count, runTime.ToString());
            Console.ReadKey();
        }
    }

Runtime observed as follows:

Thread 04 done - 31580517
Thread 06 done - 44313475
Thread 07 done - 45776055
Thread 03 done - 46292196
Thread 00 done - 47008566
Thread 10 done - 47910554
Thread 02 done - 48194632
Thread 09 done - 48201782
Thread 05 done - 48253744
Thread 08 done - 48332639
Thread 01 done - 48496235
Thread 11 done - 50000000
50000000 results fetched and looped in 00:00:55.5731786 secs
Press any key

So 50 million rows fetched and added to a collection in under 60 seconds.

I tried the same thing using the myisam stored procedure which took 2 minutes to complete.

50000000 results fetched and looped in 00:01:59.2144880 secs

Moving to innodb

In my simplified system the myisam table doesnt perform too badly so it might not be worth migrating to innodb. If you do decided to copy your result data to an innodb table then do it as follows:

start transaction;

insert into results_innodb 
 select <fields...> from results_1mregr_c_ew_f order by <innodb primary key>;

commit;

Ordering the result by the innodb PK before inserting and wrapping the whole thing up in a transaction will speed things up.

I hope some of this proves helpful.

Good luck

这篇关于传递大量数据的查询的最佳MySQL设置?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆