Aggregate data tables


Problem description

I am building a front-end to a largish database (tens of millions of rows). The data is water usage for loads of different companies, and the table looks something like this:

id | company_id | datetime            | reading | used | cost
=============================================================
1  | 1          | 2012-01-01 00:00:00 | 5000    | 5    | 0.50
2  | 1          | 2012-01-01 00:01:00 | 5015    | 15   | 1.50
....

On the front-end, users can select how they want to view the data, e.g. 6-hourly increments, daily increments, monthly, and so on. What would be the best way to do this quickly? Given how much the data changes and how few times any one set of data will be viewed, caching the query results in memcache or something similar is almost pointless, and there is no way to build the data beforehand because there are too many variables.

I figured using some kind of aggregate table would work: tables such as readings, readings_6h, readings_1d with exactly the same structure, just already aggregated.

If this is a viable solution, what is the best way to keep the aggregate tables up to date and accurate? Besides the data coming in from the meters, the table is read-only; users never update or write to it.

A number of possible solutions include:

1) stick to doing queries with group / aggregate functions on the fly
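
For illustration, a minimal sketch of what option 1 might look like for 6-hour buckets, assuming the readings table above (the company id and date range here are just placeholder values):

SELECT
    `company_id`,
    DATE(`datetime`)                 AS day,
    FLOOR(HOUR(`datetime`) / 6) * 6  AS bucket_hour,   -- 0, 6, 12 or 18
    MAX(`reading`)                   AS reading,
    SUM(`used`)                      AS used,
    SUM(`cost`)                      AS cost
FROM `readings`
WHERE `company_id` = 1                                  -- placeholder company
  AND `datetime` >= '2012-01-01'
  AND `datetime` <  '2012-02-01'
GROUP BY `company_id`, DATE(`datetime`), FLOOR(HOUR(`datetime`) / 6) * 6;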

2) doing a basic select and save

SELECT `company_id`, CONCAT_WS(' ', DATE(`datetime`), '23:59:59') AS datetime,
       MAX(`reading`) AS reading, SUM(`used`) AS used, SUM(`cost`) AS cost
FROM `readings`
WHERE `datetime` > '$lastUpdateDateTime'
GROUP BY `company_id`, DATE(`datetime`)

3) duplicate key update (not sure how the aggregation would be done here, and also how to make sure the data is accurate, i.e. nothing counted twice and no missing rows)

INSERT INTO `readings_6h` ... 
SELECT FROM `readings` .... 
ON DUPLICATE KEY UPDATE .. calculate...
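
Purely as a sketch of how those elided parts could be filled in, assuming readings_6h has a UNIQUE key on (company_id, datetime) and stores the bucket start in datetime; the sums are folded in incrementally, which is only correct if every source row is processed exactly once:

INSERT INTO `readings_6h` (`company_id`, `datetime`, `reading`, `used`, `cost`)
SELECT
    `company_id`,
    -- truncate the timestamp to the start of its 6-hour bucket
    DATE(`datetime`) + INTERVAL FLOOR(HOUR(`datetime`) / 6) * 6 HOUR AS bucket,
    MAX(`reading`),
    SUM(`used`),
    SUM(`cost`)
FROM `readings`
WHERE `datetime` > '$lastUpdateDateTime'   -- only rows not aggregated yet
GROUP BY `company_id`, bucket
ON DUPLICATE KEY UPDATE
    `reading` = GREATEST(`readings_6h`.`reading`, VALUES(`reading`)),  -- meter readings only increase
    `used`    = `readings_6h`.`used` + VALUES(`used`),
    `cost`    = `readings_6h`.`cost` + VALUES(`cost`);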

4) other ideas / recommendations?

I am currently doing option 2, which is taking around 15 minutes to aggregate +-100k rows into +-30k rows across the aggregate tables (_6h, _1d, _7d, _1m, _1y).

TL;DR: what is the best way to view / store aggregate data for loads of reports that can't be cached effectively?

Answer

This functionality would be best served by a feature called a materialized view, which MySQL unfortunately lacks. You could consider migrating to a different database system, such as PostgreSQL.
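
As a rough sketch of what that would buy you, in PostgreSQL (9.3 and later) the daily aggregate could be a real materialized view over the same schema, refreshed periodically:

-- PostgreSQL sketch: a real materialized view over the readings table
CREATE MATERIALIZED VIEW readings_1d AS
SELECT company_id,
       date_trunc('day', datetime) AS day,
       MAX(reading) AS reading,
       SUM(used)    AS used,
       SUM(cost)    AS cost
FROM readings
GROUP BY company_id, date_trunc('day', datetime);

-- re-run periodically (e.g. from cron) to bring the view up to date
REFRESH MATERIALIZED VIEW readings_1d;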

There are ways to emulate materialized views in MySQL using stored procedures, triggers, and events. You create a stored procedure that updates the aggregate data. If the aggregate data has to be updated on every insert, you could define a trigger to call the procedure. If the data only has to be updated every few hours, you could define a MySQL scheduler event or a cron job to do it.
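
A minimal sketch of the scheduled variant, assuming a hypothetical refresh_readings_1d() procedure that simply rebuilds the daily table from scratch (an incremental version would follow the same pattern):

DELIMITER //

-- hypothetical procedure that rebuilds the daily aggregate in full
CREATE PROCEDURE refresh_readings_1d()
BEGIN
    TRUNCATE TABLE `readings_1d`;
    INSERT INTO `readings_1d` (`company_id`, `datetime`, `reading`, `used`, `cost`)
    SELECT `company_id`,
           CONCAT_WS(' ', DATE(`datetime`), '23:59:59'),
           MAX(`reading`), SUM(`used`), SUM(`cost`)
    FROM `readings`
    GROUP BY `company_id`, DATE(`datetime`);
END //

DELIMITER ;

-- scheduler event that calls the procedure every 6 hours
-- (requires the event scheduler to be enabled: SET GLOBAL event_scheduler = ON)
CREATE EVENT `refresh_readings_1d_event`
ON SCHEDULE EVERY 6 HOUR
DO CALL refresh_readings_1d();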

There is a combined approach, similar to your option 3, that does not depend on the dates of the input data; imagine what would happen if some new data arrives a moment too late and does not make it into the aggregation. (You might not have this problem, I don't know.) You could define a trigger that inserts new data into a "backlog," and have the procedure update the aggregate table from the backlog only.
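
A rough sketch of that backlog idea, assuming a hypothetical readings_backlog table that only records the ids of not-yet-aggregated rows, and a readings_1d table with a UNIQUE key on (company_id, datetime):

-- backlog of reading ids that have not been aggregated yet
CREATE TABLE `readings_backlog` (
    `reading_id` INT UNSIGNED NOT NULL PRIMARY KEY
);

DELIMITER //

-- every new reading is queued in the backlog
CREATE TRIGGER `readings_after_insert`
AFTER INSERT ON `readings`
FOR EACH ROW
BEGIN
    INSERT INTO `readings_backlog` (`reading_id`) VALUES (NEW.`id`);
END //

-- fold the backlogged rows into the daily aggregate, then empty the backlog;
-- done in one transaction so rows cannot be lost between the two statements
CREATE PROCEDURE process_readings_backlog()
BEGIN
    START TRANSACTION;

    INSERT INTO `readings_1d` (`company_id`, `datetime`, `reading`, `used`, `cost`)
    SELECT r.`company_id`,
           CONCAT_WS(' ', DATE(r.`datetime`), '23:59:59'),
           MAX(r.`reading`), SUM(r.`used`), SUM(r.`cost`)
    FROM `readings` r
    JOIN `readings_backlog` b ON b.`reading_id` = r.`id`
    GROUP BY r.`company_id`, DATE(r.`datetime`)
    ON DUPLICATE KEY UPDATE
        `reading` = GREATEST(`readings_1d`.`reading`, VALUES(`reading`)),
        `used`    = `readings_1d`.`used` + VALUES(`used`),
        `cost`    = `readings_1d`.`cost` + VALUES(`cost`);

    DELETE FROM `readings_backlog`;

    COMMIT;
END //

DELIMITER ;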

All these methods are described in detail in this article: http://www.fromdual.com/mysql-materialized-views
