How to improve INSERT performance on a very large MySQL table


Problem description


I am working on a large MySQL database and I need to improve INSERT performance on a specific table. It contains about 200 million rows and its structure is as follows:

(a little premise: I am not a database expert, so the code I've written could be based on wrong foundations. Please help me to understand my mistakes :) )

CREATE TABLE IF NOT EXISTS items (
    id INT NOT NULL AUTO_INCREMENT,
    name VARCHAR(200) NOT NULL,
    `key` VARCHAR(10) NOT NULL,
    busy TINYINT(1) NOT NULL DEFAULT 1,
    created_at DATETIME NOT NULL,
    updated_at DATETIME NOT NULL,

    PRIMARY KEY (id, name),
    UNIQUE KEY name_key_unique_key (name, `key`),
    INDEX name_index (name)
) ENGINE=MyISAM
PARTITION BY LINEAR KEY(name)
PARTITIONS 25;

Every day I receive many CSV files in which each line consists of the pair "name;key", so I have to parse these files (adding the values created_at and updated_at for each row) and insert the values into my table. In this table, the combination of "name" and "key" MUST be UNIQUE, so I implemented the insert procedure as follows:

CREATE TEMPORARY TABLE temp_items (
    id INT NOT NULL AUTO_INCREMENT,
    name VARCHAR(200) NOT NULL, 
    `key` VARCHAR(10) NOT NULL,
    busy TINYINT(1) NOT NULL DEFAULT 1,  
    created_at DATETIME NOT NULL, 
    updated_at DATETIME NOT NULL,  
    PRIMARY KEY (id) 
    ) 
ENGINE=MyISAM;

LOAD DATA LOCAL INFILE 'file_to_process.csv' 
INTO TABLE temp_items
FIELDS TERMINATED BY ',' 
OPTIONALLY ENCLOSED BY '\"' 
(name, `key`, created_at, updated_at);

INSERT INTO items (name, `key`, busy, created_at, updated_at)
(
    SELECT temp_items.name, temp_items.`key`, temp_items.busy, temp_items.created_at, temp_items.updated_at
    FROM temp_items
) 
ON DUPLICATE KEY UPDATE busy=1, updated_at=NOW();

DROP TEMPORARY TABLE temp_items;

The code just shown lets me reach my goal, but it takes about 48 hours to complete, and this is a problem. I think this poor performance is caused by the fact that, for each insertion, the script must check against a very large table (200 million rows) whether the pair "name;key" is unique.

How can I improve the performance of my script?

Thanks to all in advance.

Solution

Your linear key on name and the large indexes slow things down.

LINEAR KEY needs to be calculated on every insert. http://dev.mysql.com/doc/refman/5.1/en/partitioning-linear-hash.html
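For illustration, a minimal sketch of what acting on these two points could look like, assuming the table can be taken offline long enough for the rebuilds (both statements below are illustrative, not something the answer spells out):

-- name_index is redundant: name is already the leftmost column of both
-- the primary key and the unique key, so dropping it removes one index
-- that must be maintained on every insert.
ALTER TABLE items DROP INDEX name_index;

-- Drop the LINEAR KEY partitioning; this rewrites the whole table
-- (about 200 million rows), so expect significant downtime.
ALTER TABLE items REMOVE PARTITIONING;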

Can you show us some example data from file_to_process.csv? Maybe a better schema should be built.

Edit: looking more closely at this part:

INSERT INTO items (name, `key`, busy, created_at, updated_at)
(
    SELECT temp_items.name, temp_items.`key`, temp_items.busy, temp_items.created_at, temp_items.updated_at
    FROM temp_items
) 

This will probably create an on-disk temporary table, which is very, very slow, so you should not use it if you want more performance. You should also check some MySQL config settings like tmp_table_size and max_heap_table_size; maybe these are misconfigured.
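For illustration, a minimal sketch of both suggestions, assuming the duplicate handling should stay driven by the unique key on (name, `key`) and that you have privileges to change global settings:

-- Feed the SELECT straight into the INSERT, without the parenthesized
-- subquery in between:
INSERT INTO items (name, `key`, busy, created_at, updated_at)
SELECT name, `key`, busy, created_at, updated_at
FROM temp_items
ON DUPLICATE KEY UPDATE busy = 1, updated_at = NOW();

-- Check the current in-memory temporary-table limits; internal
-- temporary tables spill to disk once they exceed the smaller of the two:
SHOW VARIABLES LIKE 'tmp_table_size';
SHOW VARIABLES LIKE 'max_heap_table_size';

-- Raise them if they are small. 268435456 (256 MB) is only an example
-- value; SET GLOBAL needs the SUPER privilege and affects new connections:
SET GLOBAL tmp_table_size = 268435456;
SET GLOBAL max_heap_table_size = 268435456;

Even with in-memory temporary tables, the dominant cost is likely still maintaining the unique key on the 200-million-row target table, so the index and partitioning changes sketched earlier probably matter more than the buffer sizes.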
