Reading only new rows from a log-like table in a database

Problem description

We have a situation where several servers insert chunks of rows into a table in a relational database, and one server reads the new data from that table once in a while. (The table is conceptually a kind of log file: data is only inserted, never modified, and the reading server shows the tail of the log.) Is there a way to have the reading server read only the new data? We are free to structure the table(s) however we want.

Some ideas that crossed my mind but do not work are:

  • Marking the rows as read does not fit our application: the reading server should not change the database. (Writing to the database just to display data is not a good idea, and several sessions might be displaying the data at once.)

  • We could insert a timestamp in each row, filled with the database system time. The problem is that this is the insert time, not the commit time. If you ask the database "give me all values between now-5 minutes and now", you cannot rely on all values being present, since there might be transactions in progress. You would have to ask again later for the values in this interval, which is what I wanted to avoid.

  • We could insert a running row count filled from a sequence. The same problem with running transactions occurs as when using timestamps.
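
To make the in-progress-transaction problem concrete, here is a minimal Python simulation. It is a sketch, not database code: an in-memory list stands in for the table, a counter stands in for the sequence, and the class and method names are hypothetical. A slow writer takes sequence value 1, a fast writer takes 2 and commits first, and a reader that only asks for values above the highest one it has seen skips row 1 for good:

```python
class InsertOrderedLog:
    """In-memory stand-in for a log table keyed by an insert-time sequence."""

    def __init__(self):
        self.next_seq = 1     # stands in for a database sequence
        self.committed = []   # rows visible to readers, as (seq, payload)

    def begin(self):
        return []             # a writer transaction buffers rows until commit

    def insert(self, txn, payload):
        # The sequence value is assigned at INSERT time, not at COMMIT time.
        txn.append((self.next_seq, payload))
        self.next_seq += 1

    def commit(self, txn):
        self.committed.extend(txn)
        txn.clear()

    def read_after(self, last_seq):
        # Reader: "everything with a sequence value above the highest I have seen".
        return sorted(row for row in self.committed if row[0] > last_seq)


log = InsertOrderedLog()
slow = log.begin()
log.insert(slow, "from slow writer")  # gets seq 1, stays uncommitted for a while
fast = log.begin()
log.insert(fast, "from fast writer")  # gets seq 2
log.commit(fast)

first = log.read_after(0)             # only seq 2 is visible yet
last_seen = max(seq for seq, _ in first)
log.commit(slow)                      # seq 1 becomes visible only now...
second = log.read_after(last_seen)    # ...but 1 <= last_seen, so it is skipped
print(first, second)  # [(2, 'from fast writer')] []
```

Insert timestamps fail the same way, because they are also assigned before the commit.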

Is there any solution to the problem, or do I have to apply some heuristics like assuming a maximum transaction time and always asking for values written after "now - maximum transaction time" and reading some data twice?
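
The heuristic described above (re-read a window of length "maximum transaction time" and drop rows already shown) could be sketched as follows. This is a Python simulation under stated assumptions: every row has a unique id, `inserted_at` comes from the same clock the reader uses for `now`, and no transaction stays open longer than `MAX_TXN`, a hypothetical 60-second bound. `OverlapReader` is an illustrative name, not an existing library:

```python
from datetime import datetime, timedelta

MAX_TXN = timedelta(seconds=60)  # assumed upper bound on transaction duration


class OverlapReader:
    """Polls a log table, re-reading a safety window and dropping duplicates."""

    def __init__(self, max_txn=MAX_TXN):
        self.max_txn = max_txn
        self.seen = {}          # row_id -> inserted_at, for rows already shown
        self.last_read = None   # clock value at the previous poll

    def poll(self, visible_rows, now):
        """visible_rows: (row_id, inserted_at) pairs committed at time `now`."""
        if self.last_read is None:
            cutoff = datetime.min
        else:
            # A transaction still open at the previous poll started no earlier
            # than last_read - max_txn, so its rows have inserted_at >= cutoff.
            cutoff = self.last_read - self.max_txn
        # Rows older than the cutoff are never re-read, so forget their ids.
        self.seen = {rid: ts for rid, ts in self.seen.items() if ts >= cutoff}
        new_ids = []
        for row_id, inserted_at in visible_rows:
            if inserted_at >= cutoff and row_id not in self.seen:
                self.seen[row_id] = inserted_at
                new_ids.append(row_id)
        self.last_read = now
        return sorted(new_ids)


t0 = datetime(2024, 1, 1, 12, 0, 0)
reader = OverlapReader()
table = [(1, t0)]
first = reader.poll(table, now=t0 + timedelta(seconds=5))
# Row 2 was inserted at t0+2s but committed only after the first poll.
table += [(2, t0 + timedelta(seconds=2)), (3, t0 + timedelta(seconds=20))]
second = reader.poll(table, now=t0 + timedelta(seconds=40))
print(first, second)  # [1] [2, 3]
```

The cost shows up in the second poll: row 1 falls inside the safety window, is fetched again, and is only discarded on the client side. That is the redundant reading the question wants to avoid.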

In case it matters: we use Oracle for this. But I assume answers that work only with other databases are of general interest as well.

Recommended answer

The database being used wasn't specified, so it's not clear whether the solution has to be hammered into an existing deployment. There are some queue engines that can be plugged into MySQL that could potentially work; one of them is Q4M. Some commercial databases, like Oracle, have temporal database functionality that allows distinguishing transaction time, valid time, and real time.

When using Oracle, either the pseudo-column ora_rowscn or the useful combination scn_to_timestamp(ora_rowscn) can effectively tell you when a row was committed: ora_rowscn gives the SCN of the transaction that last changed the row, and scn_to_timestamp maps that SCN to an approximate timestamp. Note that by default ora_rowscn is tracked per data block, so for an individual row it is only an upper bound; tables created with ROWDEPENDENCIES track it per row. Alternatively, Oracle Workspace Manager provides version-enabled tables. Basically it goes like this: you enable versioning on a table with DBMS_WM.EnableVersioning(...), rows are inserted with an additional WMSYS.WM_PERIOD(...) field specifying a valid time range, and the reader sets the valid time range it wants to see with DBMS_WM.SetValidTime(...).
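
Why filtering on the commit SCN avoids the in-progress-transaction problem from the question can be shown with a small Python simulation. It is not Oracle code: a counter stands in for the SCN, and every row is stamped with the SCN of the commit that made it visible, which is roughly what ora_rowscn gives on a ROWDEPENDENCIES table. The reader keeps the snapshot SCN of each read as its high-water mark and next asks only for rows committed after it. The class and method names are hypothetical:

```python
class CommitStampedLog:
    """Log table whose rows are stamped with the SCN of their commit."""

    def __init__(self):
        self.scn = 0      # stand-in for Oracle's system change number
        self.rows = []    # committed rows as (commit_scn, payload)

    def begin(self):
        return []         # a writer transaction buffers its rows until commit

    def insert(self, txn, payload):
        txn.append(payload)

    def commit(self, txn):
        self.scn += 1     # each commit gets a new, higher SCN
        self.rows.extend((self.scn, payload) for payload in txn)
        txn.clear()

    def read_since(self, last_scn):
        """Return rows committed after last_scn, plus the snapshot SCN used."""
        snapshot = self.scn  # a consistent read sees exactly the commits <= snapshot
        new = [p for scn, p in self.rows if last_scn < scn <= snapshot]
        return new, snapshot


log = CommitStampedLog()
slow = log.begin()
log.insert(slow, "from slow writer")   # inserted first...
fast = log.begin()
log.insert(fast, "from fast writer")
log.commit(fast)                       # ...but the fast writer commits first (SCN 1)

first, hwm = log.read_since(0)         # sees only the fast row; hwm == 1
log.commit(slow)                       # slow row is stamped with SCN 2 > hwm
second, hwm = log.read_since(hwm)      # so the next poll still picks it up
print(first, second, hwm)  # ['from fast writer'] ['from slow writer'] 2
```

Mapping this onto Oracle is an assumption to verify: the reader would presumably run its query at a known SCN (for example with a flashback query `AS OF SCN`) and filter on `ora_rowscn > :last_scn`. Without ROWDEPENDENCIES, the block-level ora_rowscn can make some rows show up again, so the reader would still have to tolerate duplicates.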

You could also fake this functionality to a certain degree by combining your timestamp idea with the commit-time heuristic. The idea is simply to store the "valid time" as a column along with the data, instead of using an arbitrary delta from now(). In other words, a secondary timestamp column would specify some future date (the "valid time") based on a heuristic of commit time plus some acceptable window of delay (perhaps the mean commit time plus twice the standard deviation). Alternatively, use some ceil()ing of the mean commit time ("at least the commit time, but rounded up to, say, 30-second intervals"). The latter would effectively quantize (coalesce?) the times at which log records are read. It doesn't seem too different, but this way you avoid reading redundant rows. It also solves the problem that the reading application cannot accurately know the commit times of the writing application without writing a lot more code.
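
The valid-time variant could be sketched like this in Python. The writer stamps each row with its insert time plus an estimated commit delay, optionally rounded up to a bucket; the reader asks only for rows whose valid time falls between its previous poll and now. The function names, the sample commit times, and the 30-second bucket are illustrative assumptions, and the reader's clock is assumed to match the database clock:

```python
import math
import statistics
from datetime import datetime, timedelta, timezone


def commit_delay(observed_commit_seconds):
    """Mean observed commit time plus twice the standard deviation, in seconds.

    Needs at least two samples (statistics.stdev raises otherwise).
    """
    mean = statistics.mean(observed_commit_seconds)
    return mean + 2 * statistics.stdev(observed_commit_seconds)


def valid_time(inserted_at, delay_seconds, bucket_seconds=None):
    """Writer side: insert time + delay, optionally rounded UP to a bucket.

    inserted_at must be timezone-aware so .timestamp() is unambiguous.
    """
    target = inserted_at.timestamp() + delay_seconds
    if bucket_seconds:
        target = math.ceil(target / bucket_seconds) * bucket_seconds
    return datetime.fromtimestamp(target, tz=timezone.utc)


def rows_to_read(rows, previous_poll, now):
    """Reader side: ids of rows whose valid time falls in (previous_poll, now]."""
    return sorted(row_id for row_id, vt in rows if previous_poll < vt <= now)


delay = commit_delay([1.0, 2.0, 3.0])  # mean 2.0 + 2 * stdev 1.0
t = datetime(2024, 1, 1, 12, 0, 10, tzinfo=timezone.utc)
vt = valid_time(t, delay, bucket_seconds=30)
print(delay, vt.isoformat())  # 4.0 2024-01-01T12:00:30+00:00

rows = [(1, vt), (2, vt + timedelta(seconds=30))]
print(rows_to_read(rows, vt - timedelta(seconds=30), vt))  # [1]
```

Because consecutive polls use adjacent, non-overlapping windows, no row is read twice. The trade-off is that a transaction that runs longer than the estimated delay commits after its window has already been read, and that row is missed.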
