Preferred method for Materialized Views (Summary Tables) with MySQL


Question



I am developing a project at work for which I need to create and maintain Summary Tables for performance reasons. I believe the correct term for this is Materialized Views.

I have 2 main reasons to do this:

  1. Denormalization

    I normalized the tables as much as possible, so there are situations where I would have to join many tables to pull data. We work with MySQL Cluster, which has pretty poor performance when it comes to JOINs.

    So I need to create Denormalized Tables that allow faster SELECTs.

  2. Summarize Data

    For example, I have a Transactions table with a few million records. The transactions come from different websites. The application needs to generate a report that displays the daily or monthly transaction counts, and the total revenue amounts per website. I don't want the report script to calculate this every time, so I need to generate a Summary Table that will have a breakdown by [site,date] (see the sketch after this list).

    That is just one simple example. There are many different kinds of summary tables I need to generate and maintain.
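
For illustration, here is a minimal sketch of what such a [site,date] summary could look like in MySQL. The transactions table and its columns (id, site_id, amount, created_at) are hypothetical stand-ins for the real schema:

    -- Summary keyed on (site_id, txn_date): daily and monthly report queries
    -- become cheap lookups/rollups instead of scans over millions of rows.
    CREATE TABLE transactions_summary (
        site_id   INT           NOT NULL,
        txn_date  DATE          NOT NULL,
        txn_count INT UNSIGNED  NOT NULL DEFAULT 0,
        revenue   DECIMAL(12,2) NOT NULL DEFAULT 0.00,
        PRIMARY KEY (site_id, txn_date)
    );

    -- Full (re)build from the main table:
    INSERT INTO transactions_summary (site_id, txn_date, txn_count, revenue)
    SELECT site_id, DATE(created_at), COUNT(*), SUM(amount)
    FROM transactions
    GROUP BY site_id, DATE(created_at);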

In the past I have done these things by writing several cron scripts to keep each summary table updated. But in this new project, I am hoping to implement a more elegant and proper solution.

I would prefer a PHP-based solution, as I am not a server administrator, and I feel most comfortable when I can control everything through my application code.


Solutions that I have considered:

  1. Copying VIEW's

    If the resulting table can be represented as a single SELECT query, I can generate a VIEW. Since querying a VIEW directly is slow, there can be a cronjob that copies this VIEW into a real table (a sketch of this approach follows the list).

    However, some of these SELECT queries can be so slow that they are not acceptable even as cronjobs. Recreating the whole summary is also not very efficient if the older rows are not even being updated much.

  2. Custom Cronjobs for each Summary Table

    This is the solution I have used before, but now I am trying to avoid it if possible. With many summary tables, it can get messy to maintain.

  3. MySQL Triggers

    It is possible to add triggers to the main tables so that every time there is an INSERT, UPDATE or DELETE, the summary tables get updated accordingly (see the trigger sketch after this list).

    There would be no cronjobs and the summaries would be in real time. However if there is ever a need to rebuild a summary table from scratch, it would have to be done with another solution (probably #1 above).

  4. Using ORM Hooks/Triggers

    I am using Doctrine as my ORM. There is a way to add event listeners that will trigger stuff on INSERT/UPDATE/DELETE, which in turn can update the summary tables. In a sense this solution is similar to #3 above, but I will have better control over these triggers since they will be implemented in PHP.
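
A sketch of solution #1, reusing the hypothetical schema above: express the summary as a VIEW, then have the cronjob rebuild a real table from it. The rename swap is just one way to keep readers from seeing a half-built table:

    CREATE OR REPLACE VIEW v_transactions_summary AS
    SELECT site_id, DATE(created_at) AS txn_date,
           COUNT(*) AS txn_count, SUM(amount) AS revenue
    FROM transactions
    GROUP BY site_id, DATE(created_at);

    -- What the cronjob would run:
    CREATE TABLE transactions_summary_new LIKE transactions_summary;
    INSERT INTO transactions_summary_new SELECT * FROM v_transactions_summary;
    RENAME TABLE transactions_summary     TO transactions_summary_old,
                 transactions_summary_new TO transactions_summary;
    DROP TABLE transactions_summary_old;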
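
And a sketch of solution #3: a trigger that folds each new row into the summary as it is inserted. The names are again hypothetical, and matching AFTER UPDATE / AFTER DELETE triggers would be needed on tables where old rows can change:

    DELIMITER //
    CREATE TRIGGER trg_transactions_ai
    AFTER INSERT ON transactions
    FOR EACH ROW
    BEGIN
        -- Add this row's contribution to its (site, day) bucket.
        INSERT INTO transactions_summary (site_id, txn_date, txn_count, revenue)
        VALUES (NEW.site_id, DATE(NEW.created_at), 1, NEW.amount)
        ON DUPLICATE KEY UPDATE
            txn_count = txn_count + 1,
            revenue   = revenue + NEW.amount;
    END//
    DELIMITER ;

One caveat worth verifying for MySQL Cluster: trigger definitions live on the SQL node where they were created, so if writes arrive through several SQL nodes the trigger has to be created on each of them.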


Implementation Considerations:

  1. Complete Rebuilds

    For efficiency, I want to avoid rebuilding the summary tables and only update them with new data. But in case something goes wrong, I need the ability to rebuild a summary table from scratch using the existing data in the main tables.

  2. Ignoring UPDATE/DELETE on Old Data

    Some summaries can assume that older records will never be updated or deleted, but only new records will be inserted. The summary process can save a lot of work by making the assumption that it doesn't need to check for updates on older data.

    But of course this won't apply to all tables.

  3. Keeping a Log

    Let's assume that I won't have access to, or do not want to use, the MySQL binary logs.

    For summarizing new data, the summary process just needs to remember the primary key id of the last record it summarized. Next time it runs, it can summarize everything after that id. However, to keep track of older records that have been updated or deleted, it needs another log so it can go back and re-summarize that data (a sketch of the id-watermark approach follows this list).
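
A sketch of that id-watermark idea, using a small bookkeeping table as the log. All names are hypothetical, and the statements would be run by the PHP/cron job (ideally inside one transaction):

    CREATE TABLE summary_progress (
        summary_name VARCHAR(64) PRIMARY KEY,
        last_id      BIGINT UNSIGNED NOT NULL DEFAULT 0
    );
    INSERT INTO summary_progress VALUES ('txn_by_site_date', 0);

    -- Each run: capture the window first, so rows inserted while the job
    -- runs are picked up next time instead of being skipped.
    SELECT last_id INTO @from_id FROM summary_progress
    WHERE summary_name = 'txn_by_site_date';
    SELECT IFNULL(MAX(id), @from_id) INTO @to_id FROM transactions;

    INSERT INTO transactions_summary (site_id, txn_date, txn_count, revenue)
    SELECT site_id, DATE(created_at), COUNT(*), SUM(amount)
    FROM transactions
    WHERE id > @from_id AND id <= @to_id
    GROUP BY site_id, DATE(created_at)
    ON DUPLICATE KEY UPDATE
        txn_count = txn_count + VALUES(txn_count),
        revenue   = revenue + VALUES(revenue);

    UPDATE summary_progress SET last_id = @to_id
    WHERE summary_name = 'txn_by_site_date';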


I would appreciate any kind of strategies, suggestions or links that can help. Thank you!

Solution

As noted above, materialized views in Oracle are different from indexed views in SQL Server. They are very cool and useful. See http://download.oracle.com/docs/cd/B10500_01/server.920/a96567/repmview.htm for details.

MySQL, however, does not support these.

One thing you mention several times is poor performance. Have you checked your database design for proper indexing, and run EXPLAIN plans on the queries to see why they are slow? See http://dev.mysql.com/doc/refman/5.1/en/using-explain.html. This of course assumes that your server is tuned properly and that MySQL itself is set up and tuned (buffer caches and so on).
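
For example (table names hypothetical), EXPLAIN on a slow report query shows whether MySQL is doing full scans (type = ALL, no key chosen), and a compound index on the join/filter columns is often the fix:

    EXPLAIN
    SELECT s.name, DATE(t.created_at) AS txn_date, COUNT(*) AS txn_count
    FROM transactions t
    JOIN sites s ON s.id = t.site_id
    WHERE t.created_at >= '2010-01-01'
    GROUP BY s.name, DATE(t.created_at);

    -- One plausible index for the join column plus per-site date ranges:
    ALTER TABLE transactions ADD INDEX idx_site_created (site_id, created_at);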

To your direct question: what you describe is something we do often in a data warehouse situation. We have a production database and a DW that pulls in all sorts of information, aggregates it and pre-calculates it to speed up querying. This may be overkill for you, but you can decide. Depending on the latency you define for your reports, i.e. how often you need them, we normally run an ETL (extract, transform, load) process periodically (daily, weekly, etc.) to populate the DW from the production system. This keeps the impact on the production system low and moves all reporting to another set of servers, which also lessens the load.

On the DW side, I would normally design the schemas differently, i.e. using star schemas (http://www.orafaq.com/node/2286). Star schemas have fact tables (the things you want to measure) and dimensions (the things you want to aggregate the measures by: time, geography, product categories, etc.). SQL Server also includes an additional engine called SQL Server Analysis Services (SSAS) that looks at fact tables and dimensions, pre-calculates and builds OLAP data cubes. In these data cubes you can drill down and look at all types of patterns, and do data analysis and data mining. Oracle does things slightly differently, but the outcome is the same.
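
A purely illustrative star-schema sketch in MySQL terms (all names invented), just to make the fact/dimension idea concrete:

    CREATE TABLE dim_date (
        date_key  INT PRIMARY KEY,   -- e.g. 20100131
        full_date DATE NOT NULL,
        month     TINYINT NOT NULL,
        year      SMALLINT NOT NULL
    );

    CREATE TABLE dim_site (
        site_key  INT PRIMARY KEY,
        site_name VARCHAR(100) NOT NULL
    );

    -- Fact table: measures keyed by the dimensions.
    CREATE TABLE fact_transactions (
        date_key  INT NOT NULL,
        site_key  INT NOT NULL,
        txn_count INT UNSIGNED NOT NULL,
        revenue   DECIMAL(12,2) NOT NULL,
        PRIMARY KEY (date_key, site_key)
    );

    -- A typical roll-up: monthly revenue per site.
    SELECT d.year, d.month, s.site_name, SUM(f.revenue) AS revenue
    FROM fact_transactions f
    JOIN dim_date d ON d.date_key = f.date_key
    JOIN dim_site s ON s.site_key = f.site_key
    GROUP BY d.year, d.month, s.site_name;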

Whether you want to go that route really depends on the business need and how much value you get from data analysis. As I said, it is likely overkill if you just have a few summary tables, but you may find some of the concepts helpful as you think things through. If your business is moving toward a business intelligence solution, then this is something to consider.

PS: You can actually set a DW up to work in "real time" using something called ROLAP, if that is the business need. MicroStrategy has a good product that works well for this.

PPS: You may also want to look at PowerPivot from MS (http://www.powerpivot.com/learn.aspx). I have only played with it, so I cannot tell you how it performs on very large datasets.
