Very big MySQL table and reporting


Question

I am looking for options for dealing with large tables in MySQL. In my database a few tables have over 130 million rows (over 70 GB) and are growing very fast. For reporting and analysis purposes I have to run some aggregation functions, and the queries run very slowly despite the indexes. I tried building some tables with consolidated data, but that was not optimal. So I am looking for options for tools I can use to solve this problem.

Answer

Start by looking into partitioning your table if you haven't already:

http://dev.mysql.com/doc/refman/5.1/en/partitioning.html

http://www.slideshare.net/datacharmer/mysql-partitions-tutorial

http://blog.mayflower.de/archives/353-Is-MySQL-partitioning-useful-for-very-big-real-life-problems.html
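As a minimal sketch of what the links above describe (table and column names here are hypothetical, not from the question), a large fact table can be RANGE-partitioned by month so that reports and archiving only touch the relevant partitions:

```sql
-- Hypothetical fact table, RANGE-partitioned by month of the event time.
-- Note: in MySQL 5.1, every unique key (including the primary key) must
-- include the partitioning column, hence the composite primary key.
CREATE TABLE main_events (
    id          BIGINT   NOT NULL AUTO_INCREMENT,
    event_time  DATETIME NOT NULL,
    account_id  INT      NOT NULL,
    amount      DECIMAL(12,2) NOT NULL,
    PRIMARY KEY (id, event_time),
    KEY idx_account_time (account_id, event_time)
)
PARTITION BY RANGE (TO_DAYS(event_time)) (
    PARTITION p201101 VALUES LESS THAN (TO_DAYS('2011-02-01')),
    PARTITION p201102 VALUES LESS THAN (TO_DAYS('2011-03-01')),
    PARTITION pmax    VALUES LESS THAN MAXVALUE
);
```

Queries filtered on `event_time` then scan only the matching partitions (partition pruning) rather than the whole 130-million-row table.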

How are you 'consolidating' your data? Maybe the method you are using isn't optimal. One good approach (let me know if this is actually what you are doing) is to create a table that contains aggregated data, then set it up this way:

First, putting aside how the data is being dumped into your main table...

  • Create a job (cron or whatever you may have handy or already configured) that runs at a specified interval, relative to how the data is loaded into the main table (let's call it MAIN, moving forward). If your MAIN table gets loaded hourly, then sync hourly. Half-hourly? Doesn't matter. You can check the speed anyway, or, if your reports run near off-peak hours, schedule the job around that time.
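As one sketch of such a job (the procedure name is hypothetical), the schedule can even live inside MySQL itself via the event scheduler that shipped with 5.1, instead of an external cron entry:

```sql
-- Hypothetical: run the hourly aggregation with MySQL's event scheduler.
-- SET GLOBAL event_scheduler = ON;  -- must be enabled once by an admin
CREATE EVENT ev_agg_load_hourly
ON SCHEDULE EVERY 1 HOUR STARTS '2011-01-01 00:10:00'  -- 10 min after each load
DO CALL agg_load_for_interval(DATE_SUB(NOW(), INTERVAL 1 HOUR));
```

A plain cron job calling `mysql -e "CALL ..."` works just as well; the point is only that the interval matches how MAIN is loaded.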

Properly index your table for the consolidated data. Let's call it AGG, moving forward.
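A sketch of what AGG might look like for hourly aggregates (all names and columns are hypothetical), indexed on the columns the reports group and filter by:

```sql
-- Hypothetical aggregate table: one row per account per hour.
CREATE TABLE agg_events_hourly (
    agg_hour     DATETIME NOT NULL,          -- start of the aggregated hour
    account_id   INT      NOT NULL,
    event_count  INT      NOT NULL,
    total_amount DECIMAL(14,2) NOT NULL,
    PRIMARY KEY (agg_hour, account_id),      -- supports reload-by-interval
    KEY idx_account (account_id)             -- supports per-account reports
);
```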

Create a stored procedure that loads data from MAIN into AGG, which is basically an AGG LOAD FOR INTERVAL-?. Of course, you're the only one here who knows how or when the data gets inserted into MAIN, so you'll also be the one who knows the aggregation intent. You can also keep re-running the aggregating stored procedure while an aggregation interval is still incomplete (say it covers an entire day, so it runs cumulatively until the day is done).
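Such an AGG LOAD FOR INTERVAL procedure could be sketched as follows (table and column names are the hypothetical ones from this answer, not from the question). Deleting the interval's rows before inserting makes it safe to re-run on a still-incomplete interval:

```sql
DELIMITER //

-- Hypothetical "AGG LOAD FOR INTERVAL": aggregate one hour of MAIN into AGG.
CREATE PROCEDURE agg_load_for_interval(IN p_hour DATETIME)
BEGIN
    -- Truncate the parameter to the start of its hour.
    SET p_hour = DATE_FORMAT(p_hour, '%Y-%m-%d %H:00:00');

    -- Idempotent: wipe the interval first, then rebuild it.
    DELETE FROM agg_events_hourly WHERE agg_hour = p_hour;

    INSERT INTO agg_events_hourly (agg_hour, account_id, event_count, total_amount)
    SELECT p_hour, account_id, COUNT(*), SUM(amount)
    FROM   main_events
    WHERE  event_time >= p_hour
    AND    event_time <  p_hour + INTERVAL 1 HOUR
    GROUP BY account_id;
END//

DELIMITER ;
```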

Use STAGING tables. For me, they are the best.

Create a stored procedure that re-checks the data, so that any updates or additional insertions of records can be reflected in the AGG table by running this procedure. Include parameters for the range to update. If it's daily, then you have a DAILY AGG LOAD and a DAILY AGG RELOAD procedure. Include an AGG CHECK INTERVAL and an AGG CHECK DAILY procedure, which will help you sleep well at night. Oh, and not to mention an AGG DATA HOLE CHECK or a MISSING AGG DATA CHECK, applying business rules that check for a required minimum amount of data, which you can get from the aggregated table, from the main table, or (preferably) from the staging table.
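The data-hole check can be a simple query (again using the hypothetical names from this answer): find hours for which MAIN has rows but AGG has no matching interval, then re-run the load for each hour it reports.

```sql
-- Hypothetical "MISSING AGG DATA CHECK": hours in the last day present in
-- MAIN but absent from AGG.
SELECT DATE_FORMAT(m.event_time, '%Y-%m-%d %H:00:00') AS missing_hour
FROM   main_events m
LEFT JOIN agg_events_hourly a
       ON a.agg_hour = DATE_FORMAT(m.event_time, '%Y-%m-%d %H:00:00')
WHERE  m.event_time >= NOW() - INTERVAL 1 DAY
AND    a.agg_hour IS NULL
GROUP BY missing_hour;
```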

Of course, never modify the AGG table. Always only reload it.

How does this help? Wouldn't you then only need your reports to query the AGG table, which is smaller and faster (since the aggregation has already been done)? Performance issues may come in with the interval loading, but if you properly structure your table, its indexes and its maintenance, it should be worth it.
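With that setup, a report that would otherwise aggregate 130 million rows in MAIN becomes a small scan over pre-aggregated rows (names are the hypothetical ones used above):

```sql
-- Daily totals per account for the last 30 days, served entirely from AGG.
SELECT DATE(agg_hour) AS day,
       account_id,
       SUM(event_count)  AS events,
       SUM(total_amount) AS amount
FROM   agg_events_hourly
WHERE  agg_hour >= CURDATE() - INTERVAL 30 DAY
GROUP BY day, account_id;
```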

Where does partitioning come in? Archiving. Once a certain amount of time has passed (discuss what's acceptable with your team/boss/top man), you can archive the old data out of MAIN. I once had to keep a full year's worth of data in the production database. That felt like a bit of a drag, but because it was the client's request the company had no choice but to give me the disk space I needed (rubs hands), and boy did I play around with it until I got something running decently. I should mention that my experience was with Microsoft SQL Server 2005, and stored procedures and SSIS made it fun.
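In MySQL terms, archiving is where RANGE partitioning pays off: assuming a table partitioned by month (and a hypothetical archive table), the expired month can be copied out and its partition dropped, which is near-instant compared to a huge DELETE:

```sql
-- Hypothetical archive step for a month that has left the retention window.
INSERT INTO main_events_archive
SELECT * FROM main_events
WHERE  event_time >= '2011-01-01' AND event_time < '2011-02-01';

-- Dropping the partition discards those rows without a row-by-row DELETE.
ALTER TABLE main_events DROP PARTITION p201101;
```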

This is all in case you don't know it already, and for others who may want to consider their options. I'm not saying you didn't already know any of the above; I'm just stating what I have been able to do before, considering that I didn't have more information to work with from your post, except that you have a consolidation process that you tried.
