关于Postgres track_commit_timestamp(pg_xact_commit_timestamp)的问题 [英] Questions about Postgres track_commit_timestamp (pg_xact_commit_timestamp)

查看:296
本文介绍了关于Postgres track_commit_timestamp(pg_xact_commit_timestamp)的问题的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在为并发安全的增量聚合汇总系统进行设计,track_commit_timestamp(pg_xact_commit_timestamp)听起来很完美。但是我发现对此的评论很少,并且无法从源代码中详细了解它的工作原理。

I'm working on a design for a concurrency-safe incremental aggregate rollup system,and track_commit_timestamp (pg_xact_commit_timestamp) sounds perfect. But I've found very little commentary on it generally, and couldn't figure out how it works in detail from the source code.

希望有人知道答案。我的一个或多个问题:

Hopefully, someone knows the answers to one or more of my questions:


  • 提交时间戳功能是否可能产生时间混乱?我要寻找的是一种识别自特定时间以来已更改的记录的方法,以便以后可以进行任何更改。如果有相同的时间戳,则不需要完美的提交顺序。

  • Is it possible for the commit timestamp feature to produce times out of order? What I'm after is a way to identify records that have been changed since a specific time so that I can get any later changes for processing. If there are identical timestamps, I don't need them in perfect commit sequence.

在最终实现中,每行增加了多少个字节?我看到的讨论似乎是12-24个字节。讨论过添加额外的字节以防万一。这是9.5之前的版本,所以是一个世界以前。

How many bytes are added to each row in the final implementation? The discussions I saw seemed to be ranging from 12-24 bytes. There was discussion of adding in extra bytes for "just in case." This is pre 9.5, so a world ago.

时间戳是否在内部索引?有B树吗?我要求进行容量规划的原因。

Are the timestamps indexed internally? With a B-tree? I ask for capacity-planning reasons.

我在StackOverflow和设计讨论中看到,时间戳记不会无限期保存,但是找不到

I've seen on StackOverflow and the design discussions that the timestamps are not kept indefinitely, but can't find the details on exactly how long they are stored.

有关启用track_commit_timestamp对性能的影响的任何经验法则?我不需要所有表上的数据,但是,在哪里呢,听起来好像可以正常工作。

Any rules of thumb on the performance impact of enabling track_commit_timestamp? I don't need the data on all tables but, where I do, it sounds like it might work perfectly.

有麻烦吗?我尝试在测试表上运行VACUUM FULL,但pg_xact_commit_timestamp均未更改。像VACUUM这样的物理操作似乎不应该改变任何东西,但是很容易有些我没想到的东西。而且,老实说,我的VACUUM快速测试甚至什么都没有。

Any gotchas? I tried running VACUUM FULL on a test table and none of the pg_xact_commit_timestamp changed. It seems like a physical operation like VACUUM shouldn't change anything, but there could easily be something I've not thought of. And, honestly, my quick VACUUM test might not even mean anything.

非常感谢您的帮助!

我已经编辑了问题以阐明我要完成的工作,我希望基于已处理和未处理的数据进行跟踪

I've edited my question to clarify what I'm trying to accomplish, I'm looking to track processed and unprocessed data based on update stamps.

select max(pg_xact_commit_timestamp(xmin)) from scan;--   2019-07-07 20:46:14.694288+10

update scan set quantity = 5 where quantity = 1; --       Change some data.

select max(pg_xact_commit_timestamp(xmin)) from scan; --  2019-07-10 09:38:17.920294+10

-- Find the changed row(s):
select * 
  from scan 
 where pg_xact_commit_timestamp(xmin) > '2019-07-07 20:46:14.694288+10'; 

这个想法是定期对行进行汇总。因此,

The idea is to do a rollup on rows incrementally and regularly. So,

-跟踪上一个汇总的时间戳。
-等待5分钟(或其他时间)。
-查找当前的最大提交时间戳。
-搜索提交时间戳介于最后处理的时间戳和最大值之间的行。
-将其汇总。

-- Track the last rolled up timestamp. -- Wait for 5 minutes (or whatever.) -- Find the current max commit timestamp. -- Search for rows where the commit timestamp is between the last processed timestamp and the max. -- Roll them up.

仅交易ID不能正常工作,因为它们很容易乱序提交。这个时间戳记系统不一定必须是100%完美的,但我的目标是非常接近完美的东西。因此,可能会容忍一些时钟摆动,甚至在开始/结束时间重叠时甚至有些混乱。

Transaction IDs alone can't work because they can commit out of order very easily. And this timestamp system doesn't have to be 100% perfect, but I'm aiming for something very close to perfect. So, a bit of clock wiggle and even a bit of confusion around overlapping start/end times is likely tolerable.

此计划中是否存在明显的缺陷?

Is there a glaring flaw in this plan?

推荐答案

由于该主题似乎在归档文件中并未出现太多,因此我想在继续之前添加一些细节。我在几个列表,论坛上以及通过直接沟通询问了相关问题。有几个人很友善地审查了源代码,提供了历史背景,并为我清除了这些内容。希望在这里保留一些细节可以帮助其他人。显然,错误都是我的错误,更不用说进行更正和增强了。

As this subject doesn't seem to show up in the archives very much, I want to add a bit of detail before moving on. I asked related questions on several lists, forums, and by direct communication. Several people were kind enough to review the source code, provide historical background, and clear this up for me. Hopefully, leaving some detail here will help someone else down the track. Errors are all mine, obviously, corrections and enhancements more than welcome.


  • 在事务工作完成后分配提交时间戳记,但与提交时不同。 WAL编写者不会更新标记以使其按时间顺序排列。

  • 因此,绝对提交时间戳不是可靠的机制,可以按顺序查找更改行

  • 多个时钟。自调时钟。哦,人类!

  • 如果您确实希望按顺序更改顺序,则可以选择逻辑解码或复制。 (我几周前通过实验对逻辑复制进行了试验。最酷的东西。有史以来。)

  • 时间戳跟踪的成本是每个交易 12字节,每行。所以,还不错。 (时间戳为8个字节,事务ID为4个字节。)

  • 这是现有事务系统的全部内容,因此事务ID滚动的实际情况也适用于此。 (对我而言,这并不可怕。)请参阅:

  • Commit timestamps are assigned when the transaction's work is completed, but that's not the same was when it is committed. The WAL writer doesn't update the stamps to keep them in chronological sequence.
  • Therefore, commit timestamps are definitely not a reliable mechanism for finding changes rows in order.
  • Multiple clocks. Self-adjusting clocks. Oh the humanity!
  • If you do want an in order-change sequence, logical decoding or replication are options. (I tried out logical replication a couple of weeks ago experimentally. Coolest. Thing. Ever.)
  • The cost of timestamp tracking is 12 bytes per transaction, not per row. So, not so bad. (Timestamps are 8 bytes, transaction IDs are 4 bytes.)
  • This is all part of the existing transaction system, so the realities of transaction ID rollaround apply here too. (Not scary in my case.) See:

https://www.postgresql.org/docs/current/routine-vacuuming.html

这篇关于关于Postgres track_commit_timestamp(pg_xact_commit_timestamp)的问题的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆