MySQL: Fill a table within a Stored Procedure efficiently


Problem Description

I am testing performance on a MySQL server, filling a table with more than 200 million records. The Stored Procedure is very slow at generating the big SQL string. Any help or comment is really welcome.

System Info:

  • Database: MySQL 5.6.10 InnoDB database (test).
  • Processor: AMD Phenom II X6 1090T, 3910MHz per core.
  • RAM: 16GB DDR3 1600MHz CL8.
  • HD: Windows 7 64-bit SP1 on SSD, MySQL installed on SSD, logs written to a mechanical hard disk.

The Stored Procedure builds an INSERT SQL query with all the values to be inserted into the table.

DELIMITER $$
USE `test`$$

DROP PROCEDURE IF EXISTS `inputRowsNoRandom`$$

CREATE DEFINER=`root`@`localhost` PROCEDURE `inputRowsNoRandom`(IN NumRows BIGINT)
BEGIN
    /* BUILD INSERT STATEMENT WITH A LOT OF ROWS TO INSERT */
    DECLARE i BIGINT;
    DECLARE nMax BIGINT;
    DECLARE squery LONGTEXT;
    DECLARE svalues LONGTEXT;

    SET i = 1;
    SET nMax = NumRows + 1;
    SET squery = 'INSERT INTO `entity_versionable` (fk_entity, str1, str2, bool1, double1, DATE) VALUES ';
    SET svalues = '("1", "a1", 100, 1, 500000, "2013-06-14 12:40:45"),';

    WHILE i < nMax DO
        SET squery = CONCAT(squery, svalues);
        SET i = i + 1;
    END WHILE;

    /*SELECT squery;*/
    SET squery = LEFT(squery, CHAR_LENGTH(squery) - 1);
    SET squery = CONCAT(squery, ";");
    SELECT squery;

    /* EXECUTE INSERT SENTENCE */
    /*START TRANSACTION;*/
    /*PREPARE stmt FROM squery;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;
    */

    /*COMMIT;*/
END$$
DELIMITER ;


Results:

  1. Concatenating 20000 strings takes about 45 seconds to be processed:

CALL test.inputRowsNoRandom(20000);

  2. Concatenating 100000 strings takes about 5 to 12+ minutes O_O:

CALL test.inputRowsNoRandom(100000);

Result (ordered by duration):

state           duration (summed) in sec    percentage
freeing items   0.00005                      50.00000
starting        0.00002                      20.00000
executing       0.00001                      10.00000
init            0.00001                      10.00000
cleaning up     0.00001                      10.00000
Total           0.00010                     100.00000

Change of status variables due to execution of query:

variable        value   description
Bytes_received  21      Bytes sent from the client to the server
Bytes_sent      97      Bytes sent from the server to the client
Com_select      1       Number of SELECT statements that have been executed
Questions       1       Number of statements executed by the server

Tests:
I have already tested different MySQL configurations, from 12 to 64 threads, with the cache on and off, and with the logs moved to another hard disk...
Also tested using TEXT, INT...

Questions:

  • Is something wrong in the code? If I send 100000 strings to build the final SQL string, the result of SELECT squery; is a NULL string. What's happening? (The error must be there, but I don't see it; see the note after this list.)
  • Can I improve the code in any way to speed it up?
  • I have read that some operations in Stored Procedures can be really slow; should I generate the file in C/Java/PHP... and send it to mysql?

    mysql -u mysqluser -p databasename < numbers.sql

  • MySQL seems to use only one core for a single SQL query; would nginx or other database systems (multithreaded DBs: Cassandra, Redis, MongoDB...) achieve better performance with stored procedures and use more than one CPU for one query? (My single query is using only 20% of total CPU, with about 150 threads.)
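
A likely explanation for the NULL result in the first question: MySQL string functions such as CONCAT return NULL whenever the result would be longer than max_allowed_packet, and 100000 value tuples of roughly 50 bytes each add up to about 5MB, past the 4MB default of MySQL 5.6. A minimal sketch of how to check and raise the limit:

-- CONCAT returns NULL once its result would exceed max_allowed_packet,
-- which would explain SELECT squery; showing NULL for 100000 rows.
SHOW VARIABLES LIKE 'max_allowed_packet';

-- Raise the limit (requires the SUPER privilege; only new connections
-- see the change, and it must also go into my.cnf to survive a restart).
SET GLOBAL max_allowed_packet = 128 * 1024 * 1024;  -- 128MB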

Solution

Don't use loops, especially on that scale, in an RDBMS. (This also explains the timings above: each CONCAT copies the entire accumulated string, so the build-up cost grows roughly quadratically with the row count.)

Try to quickly fill your table with 1M rows with a single query:

INSERT INTO `entity_versionable` (fk_entity, str1, str2, bool1, double1, date)
SELECT 1, 'a1', 100, 1, 500000, '2013-06-14 12:40:45'
  FROM
(
select a.N + b.N * 10 + c.N * 100 + d.N * 1000 + e.N * 10000 + f.N * 100000 + 1 N
from (select 0 as N union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) a
      , (select 0 as N union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) b
      , (select 0 as N union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) c
      , (select 0 as N union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) d
      , (select 0 as N union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) e
      , (select 0 as N union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) f
) t
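
Each derived table above supplies one decimal digit (0-9), so cross joining six of them yields every N from 1 to 1000000, one inserted row per number. On MySQL 8.0+ (not available on the 5.6 server in the question), a recursive CTE is a more compact way to generate the same series; a sketch under that assumption:

-- MySQL 8.0+ only: generate 1..1000000 with a recursive CTE instead of
-- the digit cross join. The recursion depth cap defaults to 1000.
SET SESSION cte_max_recursion_depth = 1000000;

INSERT INTO `entity_versionable` (fk_entity, str1, str2, bool1, double1, date)
WITH RECURSIVE seq (N) AS (
  SELECT 1
  UNION ALL
  SELECT N + 1 FROM seq WHERE N < 1000000
)
SELECT 1, 'a1', 100, 1, 500000, '2013-06-14 12:40:45'
  FROM seq;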

On my box (MacBook Pro, 16GB RAM, 2.6GHz Intel Core i7) it took ~8 sec to complete:

Query OK, 1000000 rows affected (7.63 sec)
Records: 1000000  Duplicates: 0  Warnings: 0

UPDATE1: Now a version of the stored procedure that uses a prepared statement:

DELIMITER $$
CREATE PROCEDURE `inputRowsNoRandom`(IN NumRows INT)
BEGIN
    DECLARE i INT DEFAULT 0;

    PREPARE stmt 
       FROM 'INSERT INTO `entity_versionable` (fk_entity, str1, str2, bool1, double1, date)
             VALUES(?, ?, ?, ?, ?, ?)';
    SET @v1 = 1, @v2 = 'a1', @v3 = 100, @v4 = 1, @v5 = 500000, @v6 = '2013-06-14 12:40:45';

    WHILE i < NumRows DO
        EXECUTE stmt USING @v1, @v2, @v3, @v4, @v5, @v6;
        SET i = i + 1;
    END WHILE;

    DEALLOCATE PREPARE stmt;
END$$
DELIMITER ;

Completed in ~3 min:

mysql> CALL inputRowsNoRandom(1000000);
Query OK, 0 rows affected (2 min 51.57 sec)

Feel the difference: 8 sec vs 3 min.
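
A plausible reason for the gap (an assumption; the answer does not spell it out): in autocommit mode each EXECUTE commits its own transaction, so InnoDB flushes the redo log once per row. Relaxing the per-commit flush is one way to test that, at the cost of durability:

-- Assumption: per-row commits dominate the runtime. Value 2 flushes the
-- redo log to disk about once per second instead of on every commit.
SET GLOBAL innodb_flush_log_at_trx_commit = 2;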

UPDATE2: To speed things up, we can explicitly use transactions and commit the insertions in batches. So here is an improved version of the SP.

DELIMITER $$
CREATE PROCEDURE inputRowsNoRandom1(IN NumRows BIGINT, IN BatchSize INT)
BEGIN
    DECLARE i INT DEFAULT 0;

    PREPARE stmt 
       FROM 'INSERT INTO `entity_versionable` (fk_entity, str1, str2, bool1, double1, date)
             VALUES(?, ?, ?, ?, ?, ?)';
    SET @v1 = 1, @v2 = 'a1', @v3 = 100, @v4 = 1, @v5 = 500000, @v6 = '2013-06-14 12:40:45';

    START TRANSACTION;
    WHILE i < NumRows DO
        EXECUTE stmt USING @v1, @v2, @v3, @v4, @v5, @v6;
        SET i = i + 1;
        IF i % BatchSize = 0 THEN 
            COMMIT;
            START TRANSACTION;
        END IF;
    END WHILE;
    COMMIT;
    DEALLOCATE PREPARE stmt;
END$$
DELIMITER ;

Results with different batch sizes:

mysql> CALL inputRowsNoRandom1(1000000,1000);
Query OK, 0 rows affected (27.25 sec)

mysql> CALL inputRowsNoRandom1(1000000,10000);
Query OK, 0 rows affected (26.76 sec)

mysql> CALL inputRowsNoRandom1(1000000,100000);
Query OK, 0 rows affected (26.43 sec)

See the difference for yourself. It is still more than 3 times slower than the cross join.
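
As for the question about generating the rows in C/Java/PHP and sending a file: LOAD DATA INFILE is the usual bulk-load path and typically beats row-by-row INSERTs, though it was not benchmarked in this thread. A sketch, assuming a hypothetical /tmp/rows.csv whose columns match the list below:

-- Hypothetical file; adjust path, delimiters, and column list to the CSV.
LOAD DATA INFILE '/tmp/rows.csv'
INTO TABLE `entity_versionable`
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(fk_entity, str1, str2, bool1, double1, date);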
