Slow MySQL inserts

Question

I am using and working on software that uses MySQL as its backend engine (it can use others, such as PostgreSQL, Oracle or SQLite, but MySQL is the main one we use). The software was designed so that the binary data we want to access is kept as BLOBs in individual columns: each table has one BLOB column, other columns hold integers/floats that characterize the BLOB, and one string column holds the BLOB's MD5 hash. The tables typically have 2, 3 or 4 indexes, one of which is always on the MD5 column, which is made UNIQUE. Some tables already have millions of entries and have grown to multiple gigabytes in size. We keep separate per-year MySQL databases on the same server (so far). The hardware is quite reasonable (I think) for general applications (a Dell PowerEdge 2U-form server).

MySQL SELECT queries are relatively fast. There's little complaint there, since these run (most of the time) in batch mode. However, INSERT queries take a long time, and the time increases with table size (number of rows). Admittedly, this is because the MD5 column has a UNIQUE index, so each INSERT has to check whether the new row's MD5 string has already been inserted. And it's not too strange (I think) that performance gets worse when there are other (non-unique) indexes. But I still can't shake the feeling that this architecture choice is not the best one (I suspect that keeping BLOBs in the table rows instead of on disk has a significant negative impact). Insertions are not critical, but it is an annoying feeling to have.

Does anyone have experience with similar situations, with MySQL or even other (preferably Linux-based) RDBMSes? Any insights you would care to provide, maybe some performance figures?

BTW, the working language is C++ (which wraps C calls to MySQL's API).

Recommended answer

It may be time for horizontal partitioning, and for moving the BLOB field into a separate table (vertical partitioning). In the article section 'A Quick Side Note on Vertical Partitioning', the author removes a large varchar field from a table, and this speeds up a query by about an order of magnitude.

The reason is that physically traversing the data on disk becomes significantly faster when there is less space to cover, so moving the bigger fields elsewhere improves performance.

Also (and you probably do this already), it is beneficial to reduce the size of your indexed column to its absolute minimum (char(32) in ASCII encoding for an MD5 hash), because the larger the key, the slower the index is to use.
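As a hedged illustration of shrinking the key further than the answer suggests: if the schema can be changed, the 32 hex characters of an MD5 digest can be stored as the 16 raw bytes they encode. The sketch below assumes a hypothetical BINARY(16) column and a helper named md5_hex_to_bytes; neither appears in the original question or answer.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Returns the numeric value of one hexadecimal digit; throws on anything else.
std::uint8_t hex_value(char c) {
    if (c >= '0' && c <= '9') return static_cast<std::uint8_t>(c - '0');
    if (c >= 'a' && c <= 'f') return static_cast<std::uint8_t>(c - 'a' + 10);
    if (c >= 'A' && c <= 'F') return static_cast<std::uint8_t>(c - 'A' + 10);
    throw std::invalid_argument("not a hex digit");
}

// Hypothetical helper: converts a 32-character hex MD5 string into the
// 16 raw bytes it encodes, suitable for binding to a 16-byte binary column
// (the BINARY(16) column is an assumption, not part of the original schema).
std::array<std::uint8_t, 16> md5_hex_to_bytes(const std::string& hex) {
    if (hex.size() != 32) {
        throw std::invalid_argument("MD5 hex string must be 32 characters");
    }
    std::array<std::uint8_t, 16> out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        // Two hex digits make one byte: high nibble first, then low nibble.
        out[i] = static_cast<std::uint8_t>(
            (hex_value(hex[2 * i]) << 4) | hex_value(hex[2 * i + 1]));
    }
    return out;
}
```

Whether the smaller key makes a measurable difference depends on the table and the workload, so it is worth benchmarking before changing the schema.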

If you do multiple inserts at a time into InnoDB tables, you can significantly speed them up by wrapping them in a transaction and inserting multiple rows in a single query:

START TRANSACTION;
INSERT INTO x (id, md5, field1, field2) VALUES (1, '123dab...', 'data1','data2'),(2,'ab2...','data3','data4'),.....;
COMMIT;
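Since the question mentions C++ wrapping the MySQL C API, here is a minimal sketch of how the text of such a multi-row statement might be generated. It emits `?` placeholders instead of literal values, on the assumption that the values are then bound through a prepared statement (for example with mysql_stmt_prepare, mysql_stmt_bind_param and mysql_stmt_execute) inside a transaction. The function name build_batch_insert is hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: builds
//   INSERT INTO <table> (c1, c2, ...) VALUES (?, ?, ...),(?, ?, ...)
// with one placeholder group per row. Table and column names are inserted
// verbatim, so they must come from trusted code, never from user input.
std::string build_batch_insert(const std::string& table,
                               const std::vector<std::string>& columns,
                               std::size_t row_count) {
    if (columns.empty() || row_count == 0) {
        throw std::invalid_argument("need at least one column and one row");
    }

    // Build the column list and a matching "(?, ?, ...)" group in one pass.
    std::string column_list;
    std::string group = "(";
    for (std::size_t i = 0; i < columns.size(); ++i) {
        if (i > 0) {
            column_list += ", ";
            group += ", ";
        }
        column_list += columns[i];
        group += "?";
    }
    group += ")";

    // Repeat the placeholder group once per row, separated by commas.
    std::string sql = "INSERT INTO " + table + " (" + column_list + ") VALUES ";
    for (std::size_t r = 0; r < row_count; ++r) {
        if (r > 0) sql += ",";
        sql += group;
    }
    return sql;
}
```

For example, build_batch_insert("x", {"id", "md5", "field1", "field2"}, 2) yields a statement with two placeholder groups. The batch size is a tuning choice: very large statements can exceed the server's max_allowed_packet setting, so batching a few hundred or a few thousand rows per statement is a common starting point rather than a rule.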
