How can I ensure that a materialized view is always up to date?


Question

I'll need to invoke REFRESH MATERIALIZED VIEW on each change to the tables involved, right? I'm surprised to not find much discussion of this on the web.

How should I go about doing this?

I think the top half of the answer here is what I'm looking for: https://stackoverflow.com/a/23963969/168143

Are there any dangers to this? If updating the view fails, will the transaction on the invoking update, insert, etc. be rolled back? (this is what I want... I think)

Solution

I'll need to invoke REFRESH MATERIALIZED VIEW on each change to the tables involved, right?

Yes. PostgreSQL will never refresh the view automatically; you have to trigger the refresh yourself in some way.

How should I go about doing this?

There are many ways to achieve this. Before giving some examples, keep in mind that a plain REFRESH MATERIALIZED VIEW takes an ACCESS EXCLUSIVE lock on the view, so while it is running you can't even SELECT from the view.

However, on version 9.4 or newer you can give it the CONCURRENTLY option:

REFRESH MATERIALIZED VIEW CONCURRENTLY my_mv;

This acquires an EXCLUSIVE lock instead, which does not block SELECT queries, but it may have more overhead. That depends on the amount of data changed: if only a few rows have changed, it might even be faster. You still can't run two REFRESH commands on the same view concurrently, though.
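
One requirement worth knowing before relying on this: per the PostgreSQL documentation, CONCURRENTLY only works if the materialized view has at least one unique index that uses plain column names and covers all rows (no expressions, no WHERE clause), and the view must already be populated. A minimal sketch, where the orders table, its columns, and the index name are made up for illustration:

```sql
-- Hypothetical base table, for illustration only.
CREATE TABLE orders (
    customer_id integer NOT NULL,
    amount      numeric NOT NULL
);

-- The view is populated at creation time (no WITH NO DATA).
CREATE MATERIALIZED VIEW my_mv AS
SELECT customer_id, sum(amount) AS total
FROM orders
GROUP BY customer_id;

-- Unique index on a plain column, covering every row of the view.
CREATE UNIQUE INDEX my_mv_customer_id_idx ON my_mv (customer_id);

-- Readers can keep running SELECT on my_mv while this runs.
REFRESH MATERIALIZED VIEW CONCURRENTLY my_mv;
```

Without such an index the CONCURRENTLY form fails with an error, while the plain REFRESH still works.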

Refresh manually

This is an option to consider, especially for data loading or batch updates (e.g. a system that only loads tons of data after long periods of time). Such jobs commonly end with operations that modify or process the data, so you can simply include a REFRESH operation at the end.
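
As a sketch of that pattern, assuming a hypothetical staging_orders table that feeds a hypothetical orders table on which my_mv is built, the batch job could look like this:

```sql
-- Hypothetical tables; created here only so the sketch is self-contained.
CREATE TABLE IF NOT EXISTS orders (customer_id integer NOT NULL, amount numeric NOT NULL);
CREATE TABLE IF NOT EXISTS staging_orders (customer_id integer NOT NULL, amount numeric NOT NULL);

BEGIN;
-- Move the whole batch in one transaction.
INSERT INTO orders (customer_id, amount)
SELECT customer_id, amount FROM staging_orders;
TRUNCATE staging_orders;
COMMIT;

-- Refresh once, after the batch is committed, instead of once per change.
-- my_mv is assumed to exist already and to be defined over orders.
REFRESH MATERIALIZED VIEW my_mv;
```

If readers must not be blocked during the refresh, the last statement can use CONCURRENTLY, provided the view has a suitable unique index.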

Scheduling the REFRESH operation

The first and most widely used option is to use some scheduling system to invoke the refresh. For instance, you could configure something like this in a cron job:

*/30 * * * * psql -d your_database -c "REFRESH MATERIALIZED VIEW CONCURRENTLY my_mv"

Your materialized view will then be refreshed every 30 minutes.

Considerations

This option is really good, especially with the CONCURRENTLY option, but only if you can accept the data not being 100% up to date all the time. Keep in mind that, with or without CONCURRENTLY, the REFRESH command has to run the entire query, so account for the time the inner query takes when deciding how often to schedule the REFRESH.
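
One way to keep a slow refresh from piling up behind the next scheduled run is to bound how long the job may wait and run. This is a sketch of a script the cron job could execute with psql -f; the timeout values are arbitrary placeholders, not recommendations:

```sql
-- Give up if another session (for example a previous run that is still
-- refreshing) holds a conflicting lock for more than 5 seconds.
SET lock_timeout = '5s';

-- Abort the refresh itself if it runs longer than 10 minutes.
SET statement_timeout = '10min';

REFRESH MATERIALIZED VIEW CONCURRENTLY my_mv;
```

If either timeout fires, the statement fails and the view simply keeps its previous contents until the next run.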

Refreshing with a trigger

Another option is to call the REFRESH MATERIALIZED VIEW in a trigger function, like this:

CREATE OR REPLACE FUNCTION tg_refresh_my_mv()
RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    REFRESH MATERIALIZED VIEW CONCURRENTLY my_mv;
    RETURN NULL;
END;
$$;

Then, on every table whose changes affect the view, create the trigger:

CREATE TRIGGER tg_refresh_my_mv AFTER INSERT OR UPDATE OR DELETE
ON table_name
FOR EACH STATEMENT EXECUTE PROCEDURE tg_refresh_my_mv();

Considerations

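This approach has some critical pitfalls for performance and concurrency:

  1. Any INSERT/UPDATE/DELETE operation will have to execute the view's query (which is possibly slow, if you are considering an MV in the first place);
  2. Even with CONCURRENTLY, one REFRESH still blocks another one, so any INSERT/UPDATE/DELETE on the involved tables will be serialized.

The only situation where I can see this being a good idea is when changes are really rare. As for failures: the trigger runs inside the transaction of the statement that fired it, so if the REFRESH raises an error, the INSERT, UPDATE, or DELETE fails and is rolled back with it.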

Refresh using LISTEN/NOTIFY

The problem with the previous option is that it is synchronous and imposes a big overhead on each operation. To ameliorate that, you can use a trigger like before, but one that only issues a NOTIFY:

CREATE OR REPLACE FUNCTION tg_refresh_my_mv()
RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    NOTIFY refresh_mv, 'my_mv';
    RETURN NULL;
END;
$$;
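
A possible variation, as a sketch: the NOTIFY statement only accepts a literal payload, so a function meant to be shared by several views can call pg_notify() and take the view name from the trigger arguments. The names tg_notify_refresh_mv and tg_notify_refresh_my_mv are hypothetical, and table_name stands for each table the view depends on:

```sql
-- Generic variant: the view name is passed as a trigger argument.
CREATE OR REPLACE FUNCTION tg_notify_refresh_mv()
RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    -- pg_notify() accepts non-constant arguments, unlike the NOTIFY statement.
    PERFORM pg_notify('refresh_mv', TG_ARGV[0]);
    RETURN NULL;
END;
$$;

-- Attach it to a table, naming the view to refresh as the argument.
CREATE TRIGGER tg_notify_refresh_my_mv
AFTER INSERT OR UPDATE OR DELETE ON table_name
FOR EACH STATEMENT EXECUTE PROCEDURE tg_notify_refresh_mv('my_mv');
```

The notification is only delivered to listeners once the triggering transaction commits, so a rolled-back change never requests a refresh.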

You can then build an application that stays connected and uses LISTEN to find out when REFRESH needs to be called. One nice project you can use to test this is pgsidekick; with it you can do the LISTEN from a shell script, so you can run the REFRESH as:

pglisten --listen=refresh_mv --print0 | xargs -0 -n1 -I? psql -d your_database -c "REFRESH MATERIALIZED VIEW CONCURRENTLY ?;"

Or use pglater (also part of pgsidekick) to make sure you don't call REFRESH too often. For example, you can use the following trigger to make it REFRESH, but within 1 minute (60 seconds):

CREATE OR REPLACE FUNCTION tg_refresh_my_mv()
RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    NOTIFY refresh_mv, '60 REFRESH MATERIALIZED VIEW CONCURRENTLY my_mv';
    RETURN NULL;
END;
$$;

This way REFRESH calls will be at least 60 seconds apart, and if you NOTIFY many times within 60 seconds, the REFRESH will be triggered only once.

Considerations

Like the cron option, this one is only good if you can bear slightly stale data, but it has the advantage that REFRESH is called only when really needed, so there is less overhead and the data is updated closer to when it is needed.
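
One detail that helps keep the overhead low, independent of any external tool: PostgreSQL delivers notifications only at commit and collapses identical ones (same channel, same payload) sent within a single transaction. A small psql session to observe this, assuming the refresh_mv channel and 'my_mv' payload used above:

```sql
-- Run in psql.
LISTEN refresh_mv;

BEGIN;
NOTIFY refresh_mv, 'my_mv';
NOTIFY refresh_mv, 'my_mv';  -- same channel and payload as above
COMMIT;
-- Notifications are sent only at COMMIT; psql should then report a single
-- asynchronous notification on "refresh_mv" with payload "my_mv".
```

This only covers duplicates inside one transaction; notifications from separate transactions still arrive individually, which is where time-based throttling on the listener side comes in.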

Note: I haven't actually tried the code and examples yet, so if someone finds a mistake or a typo, or tries them and they work (or not), please let me know.
