Preventing update loops for multiple databases using CDC

Question

We have a number of legacy systems that we're unable to make changes to - however, we want to start taking data changes from these systems and applying them automatically to other systems.

We're thinking of some form of service bus (no specific tech picked yet) sitting in the middle, and a set of bus adapters (one per legacy application) to translate between database specific concepts and general update messages.

One area I've been looking at is using Change Data Capture (CDC) to monitor update activity in the legacy databases, and use that information to construct appropriate messages. However, I have a concern - how best could I, as a consumer of CDC information, distinguish changes applied by the application vs changes applied by the bus adapter on receipt of messages - because otherwise, the first update that gets distributed by the bus will get re-distributed by every receiver when they apply that change to their own system.

If I were implementing "poor man's" CDC, i.e. triggers, then those triggers would execute within the context/transaction/connection of the original DML statements, so I could either design them to ignore one particular user (the user applying incoming updates from the bus), or set and detect a session property to similarly ignore certain updates.
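
To make the trigger idea concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. It is an illustration under assumptions, not a recipe for any particular legacy database: the employee, employee_changes, and capture_control tables and the apply_from_bus helper are all hypothetical, and the one-row flag table stands in for a real per-session property, which SQLite does not offer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT);
-- Hypothetical change log filled by the trigger ("poor man's" CDC).
CREATE TABLE employee_changes (id INTEGER, name TEXT);
-- Hypothetical one-row flag table standing in for a session property.
CREATE TABLE capture_control (suppress INTEGER NOT NULL);
INSERT INTO capture_control VALUES (0);

-- The trigger only records a change while the flag is cleared.
CREATE TRIGGER employee_capture AFTER UPDATE ON employee
WHEN (SELECT suppress FROM capture_control) = 0
BEGIN
    INSERT INTO employee_changes VALUES (NEW.id, NEW.name);
END;
""")


def apply_from_bus(conn, emp_id, name):
    """Apply an incoming bus update without logging it as a local change."""
    with conn:  # commits on success, rolls back on error
        conn.execute("UPDATE capture_control SET suppress = 1")
        conn.execute("UPDATE employee SET name = ? WHERE id = ?", (name, emp_id))
        conn.execute("UPDATE capture_control SET suppress = 0")


conn.execute("INSERT INTO employee VALUES (1, 'Jason')")
# A change made by the legacy application: captured by the trigger.
conn.execute("UPDATE employee SET name = 'Nichole' WHERE id = 1")
# A change applied by the bus adapter: skipped by the trigger.
apply_from_bus(conn, 1, "Emma")

captured = conn.execute("SELECT id, name FROM employee_changes").fetchall()
print(captured)
```

Only the application's own update ends up in employee_changes. Unlike a true session property, the flag table is shared by every connection, so a real implementation would more likely key off the connecting user or a session-scoped setting, as described above.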

Any ideas?

Recommended answer

If I understand your question correctly, you're trying to define a message routing structure that works with a design you've already selected (using an enterprise service bus) and a message implementation that you can use to flow data off your legacy systems that only forward-ports changes to your newer systems.

The difficulty is that you're trying to apply changes in such a way that they don't themselves generate a CDC message from the clients receiving the data bundle from your legacy systems. In fact, all you're concerned about is having your newer systems consume the data and not propagate messages back to your bus, creating unnecessary crosstalk that might grow exponentially and overload your infrastructure.

The secret is how MSSQL's CDC features reconcile changes as they propagate through the network. Specifically, note this caveat:

All the changes are logged in terms of LSN or Log Sequence Number. SQL Server distinctly identifies each DML operation via a Log Sequence Number. Any committed modification on any table is recorded in the transaction log of the database with a specific LSN provided by SQL Server. The __$operation column values are: 1 = delete, 2 = insert, 3 = update (values before update), 4 = update (values after update).

cdc.fn_cdc_get_net_changes_dbo_Employee gives us the net changes for all records that fall between the LSNs we pass to the function. We have three records returned by the net_change function; there was a delete, an insert, and two updates, but the two updates were on the same record. For the updated record, it simply shows the net changed value after both updates are complete.

To get all the changes, execute cdc.fn_cdc_get_all_changes_dbo_Employee; you can pass either 'ALL' or 'ALL UPDATE OLD'. The 'ALL' option provides all the changes, but for updates it provides only the values after the update. Hence we find two records for updates: one showing the first update, when Jason was updated to Nichole, and one for when Nichole was updated to EMMA.
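
The difference between the "all changes" and "net changes" views quoted above can be sketched in plain Python. This is a simplified model and not the exact semantics of the SQL Server functions: LSNs are shown as small integers (real LSNs are binary values), the rows and the net_changes helper are made up for illustration, and the collapse rule is simply "keep the last change per key".

```python
# __$operation codes as listed in the quoted documentation.
OPERATION_NAMES = {
    1: "delete",
    2: "insert",
    3: "update (before)",
    4: "update (after)",
}


def net_changes(all_changes):
    """Keep only the last change per primary key, in LSN order."""
    latest = {}
    for row in sorted(all_changes, key=lambda r: r["lsn"]):
        if row["op"] == 3:  # before-images carry no new state
            continue
        latest[row["id"]] = row
    return sorted(latest.values(), key=lambda r: r["lsn"])


# Illustrative data: one delete, one insert, and two updates to the same row.
all_changes = [
    {"lsn": 10, "op": 1, "id": 7, "name": "Old row"},
    {"lsn": 11, "op": 2, "id": 8, "name": "New row"},
    {"lsn": 12, "op": 4, "id": 1, "name": "Nichole"},
    {"lsn": 13, "op": 4, "id": 1, "name": "EMMA"},
]

net = net_changes(all_changes)
summary = [(r["id"], OPERATION_NAMES[r["op"]], r["name"]) for r in net]
for line in summary:
    print(line)
```

Four logged changes collapse to three net records, and the twice-updated row appears once with its final value, which mirrors the example in the quoted text.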

While this documentation is somewhat terse and difficult to understand, it appears that changes are logged and reconciled in LSN order. Competing changes should be discarded by this system, allowing your consistency model to work effectively.

Also note:

CDC is by default disabled and must be enabled at the database level followed by enabling on the table.

Option B then becomes obvious: institute CDC on your legacy systems, then use your service bus to translate these changes into updates that aren't bound to CDC (using, for example, raw transactional update statements). This should allow for the one-way flow of data that you seek from the design of your system.
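
As a rough sketch of that translation step, the adapter on the receiving side could turn a generic change message into a plain parameterized statement. Everything here is an assumption made for illustration: the message shape ({"op": ..., "row": ...}), the to_statement helper, and the employee table are hypothetical, and an in-memory SQLite database stands in for the newer target system.

```python
import sqlite3


def to_statement(message, table, key):
    """Translate a generic change message into a parameterized SQL statement.

    `table` and `key` must come from trusted adapter configuration, because
    identifiers cannot be bound as query parameters.
    """
    row = message["row"]
    if message["op"] == "delete":
        return f"DELETE FROM {table} WHERE {key} = ?", (row[key],)
    columns = list(row)
    if message["op"] == "insert":
        placeholders = ", ".join("?" for _ in columns)
        sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
        return sql, tuple(row[c] for c in columns)
    if message["op"] == "update":
        targets = [c for c in columns if c != key]
        assignments = ", ".join(f"{c} = ?" for c in targets)
        sql = f"UPDATE {table} SET {assignments} WHERE {key} = ?"
        return sql, tuple(row[c] for c in targets) + (row[key],)
    raise ValueError(f"unknown operation: {message['op']!r}")


# Stand-in for a newer system that has no change capture on this table.
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT)")

messages = [
    {"op": "insert", "row": {"id": 1, "name": "Jason"}},
    {"op": "update", "row": {"id": 1, "name": "Nichole"}},
    {"op": "insert", "row": {"id": 2, "name": "Temp"}},
    {"op": "delete", "row": {"id": 2}},
]
for message in messages:
    target.execute(*to_statement(message, "employee", "id"))

rows = target.execute("SELECT id, name FROM employee ORDER BY id").fetchall()
print(rows)
```

The flow stays one-way only as long as the target table is not itself being captured. If the receiving database also publishes its changes, these statements would be picked up again, which is the loop the question describes.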

For additional methods of reconciling changes, consider the concepts raised by this Wikipedia article on "eventual consistency". Best of luck with your internal database messaging system.
