Optimizing MySQL inserts to handle a data stream


Question

I am consuming a high-rate data stream and doing the following steps to store data in a MySQL database. For each newly arriving item:

  • (1) Parse the incoming item.
  • (2) Execute several "INSERT ... ON DUPLICATE KEY UPDATE" statements.

I have used INSERT ... ON DUPLICATE KEY UPDATE to eliminate one additional round-trip to the database.
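
For illustration, here is a minimal sketch of such a statement issued through Perl DBI; the items table, its columns, and the connection details are made up for the example:

    use strict;
    use warnings;
    use DBI;

    # Hypothetical table: items(id INT PRIMARY KEY, payload TEXT, hits INT).
    my $dbh = DBI->connect('DBI:mysql:database=test;host=localhost',
                           'user', 'password', { RaiseError => 1 });

    my ($id, $payload) = (42, 'example data');   # stands in for one parsed item

    # One round trip per item: insert a new row, or update the existing row
    # when the unique key is already present.
    $dbh->do(q{
        INSERT INTO items (id, payload, hits)
        VALUES (?, ?, 1)
        ON DUPLICATE KEY UPDATE payload = VALUES(payload), hits = hits + 1
    }, undef, $id, $payload);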

While trying to improve the overall performance, I have considered doing bulk updates in the following way:

  • (1) Parse the incoming item.
  • (2) Generate an "INSERT ... ON DUPLICATE KEY UPDATE" statement and append it to a file.
  • (3) Periodically flush the SQL statements in the file to the database (sketched below).
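
A rough sketch of that scheme in Perl DBI follows; the spool-file path, table, and columns are invented, error handling is minimal, and payloads are assumed not to contain newlines since the spool is read back one statement per line:

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('DBI:mysql:database=test;host=localhost',
                           'user', 'password', { RaiseError => 1 });

    my $spool = '/tmp/pending_inserts.sql';   # made-up spool file path

    # Step (2): for each parsed item, append one statement to the spool file.
    sub queue_item {
        my ($id, $payload) = @_;
        open my $fh, '>>', $spool or die "open $spool: $!";
        printf {$fh}
            "INSERT INTO items (id, payload, hits) VALUES (%d, %s, 1) "
          . "ON DUPLICATE KEY UPDATE payload = VALUES(payload), hits = hits + 1\n",
            $id, $dbh->quote($payload);
        close $fh;
    }

    # Step (3): periodically replay the spooled statements, then truncate the file.
    sub flush_spool {
        open my $fh, '<', $spool or return;
        while (my $sql = <$fh>) {
            chomp $sql;
            $dbh->do($sql) if length $sql;
        }
        close $fh;
        open my $trunc, '>', $spool or die "truncate $spool: $!";
        close $trunc;
    }

    queue_item(42, 'example data');
    flush_spool();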

Two questions:

  • (1) Will this have a positive impact on the database load?
  • (2) How should I flush the statements to the database so that indexes are only rebuilt after the complete flush? (Should I use transactions?)

UPDATE: I am using Perl DBI + MySQL MyISAM.

Thanks in advance for any comments.

Answer

You don't say what kind of database access environment (PERL DBI? JDBC? ODBC?) you're running in, or what kind of table storage engine (MyISAM? InnoDB?) you're using.

First of all, you're right to pick INSERT ... ON DUPLICATE KEY UPDATE. Good move, unless you can guarantee unique keys.

Secondly, if your database access environment allows it, you should use prepared statements. You definitely won't get good performance if you write a bunch of statements into a file, and then make a database client read the file once again. Do the INSERT operations directly from the software package that consumes the incoming data stream.
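
A sketch of that direct, prepared-statement approach with Perl DBI; the items table and the next_item() reader are placeholders for the real schema and stream parser:

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('DBI:mysql:database=test;host=localhost',
                           'user', 'password', { RaiseError => 1 });

    # Prepare once, then re-execute with fresh bind values for every item.
    my $sth = $dbh->prepare(q{
        INSERT INTO items (id, payload, hits)
        VALUES (?, ?, 1)
        ON DUPLICATE KEY UPDATE payload = VALUES(payload), hits = hits + 1
    });

    # Insert directly from the process that consumes the stream.
    while (my ($id, $payload) = next_item()) {
        $sth->execute($id, $payload);
    }

    sub next_item { return }   # placeholder for the real stream parser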

Thirdly, pick the right kind of table storage engine. MyISAM inserts are going to be faster than InnoDB, so if you're logging data and retrieving it later that will be a win. But InnoDB has better transactional integrity. If you're really handling tonnage of data, and you don't need to read it very often, consider the ARCHIVE storage engine.
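
The engine is a per-table choice made at creation time, so the trade-off above comes down to a single clause; here is a sketch with a hypothetical table definition:

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect('DBI:mysql:database=test;host=localhost',
                           'user', 'password', { RaiseError => 1 });

    # MyISAM for fastest inserts; writing ENGINE=InnoDB in the same place
    # trades some insert speed for transactional integrity.
    $dbh->do(q{
        CREATE TABLE IF NOT EXISTS items (
            id      INT UNSIGNED NOT NULL PRIMARY KEY,
            payload TEXT,
            hits    INT UNSIGNED NOT NULL DEFAULT 1
        ) ENGINE=MyISAM
    });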

Finally, consider doing a START TRANSACTION at the beginning of a batch of INSERT ... commands, then doing a COMMIT and another START TRANSACTION after a fixed number of rows, like 100 or so. If you're using InnoDB, this will speed things up a lot. If you're using MyISAM or ARCHIVE, it won't matter.
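
For the InnoDB case, one way to get that batching from Perl DBI is to turn off AutoCommit and commit every N rows; the table, the batch size of 100, and next_item() are placeholders:

    use strict;
    use warnings;
    use DBI;

    # With AutoCommit off, DBI opens a transaction implicitly; each commit
    # closes the current batch and the next execute starts a new one.
    my $dbh = DBI->connect('DBI:mysql:database=test;host=localhost',
                           'user', 'password',
                           { RaiseError => 1, AutoCommit => 0 });

    my $sth = $dbh->prepare(q{
        INSERT INTO items (id, payload, hits)
        VALUES (?, ?, 1)
        ON DUPLICATE KEY UPDATE payload = VALUES(payload), hits = hits + 1
    });

    my $rows = 0;
    while (my ($id, $payload) = next_item()) {
        $sth->execute($id, $payload);
        $dbh->commit if ++$rows % 100 == 0;   # commit every 100 rows or so
    }
    $dbh->commit;                             # flush the final partial batch

    sub next_item { return }   # placeholder for the real stream parser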

Your big wins will come from the prepared statement stuff and the best choice of storage engine.
