单向数据库同步 [英] One-way Database Synchronization

查看:160
本文介绍了单向数据库同步的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

通常需要将数据从一个数据库中的主表同步到其他数据库(通常位于其他服务器上)的克隆表。例如,考虑后端系统管理库存数据并且最终必须将库存数据推送到作为网站应用程序的一部分的一个或多个数据库的情况。

There is frequently the need to synchronize data from master tables in one database to clone tables in other databases, often on other servers. For example, consider the case where a backend system manages inventory data and that inventory data ultimately must be pushed to one or more databases that are part of a web site application.

后端系统中的源数据被严格标准化,具有数十个表和外键约束。它是一个设计良好的OLTP RDBMS系统。许多表中包含数百万行。需要将这些数据定期推送到其他数据库。尽可能频繁;可以容忍延迟。最重要的是,后端和远程数据库的最大正常运行时间是必要的。

The source data in the backend system is heavily normalized, with dozens of tables and foreign key constraints. It is a well-designed OLTP RDBMS system. Many of the tables in question contain millions of rows. The need is to push this data out to the other databases regularly. As frequently as feasible; latency can be tolerated. Above all, maximum uptime of both the backend and remote databases is imperative.

我使用SQL Server并熟悉更改跟踪,rowversion,触发器等。我知道Microsoft为这些场景大量推送复制,SyncFx和SSIS。然而,在推荐技术的供应商白皮书和概述与解决方案的实际实施,部署和维护之间存在很大差异。在SQL Server世界中,复制通常被视为交钥匙解决方案,但我试图探讨替代解决方案。 (有一些担心复制难以管理,难以更改模式,并且如果需要重新初始化,关键系统将会有大的停机时间。)

I am using SQL Server and am familiar with change tracking, rowversion, triggers, and so on. I know that Microsoft pushes replication, SyncFx, and SSIS heavily for these scenarios. However, there is quite a difference between vendor whitepapers and overviews recommending technologies and the actual implementation, deployment, and maintenance of the solution. In the SQL Server world, replication is often viewed as the turnkey solution, but I am trying to explore alternate solutions. (There is some fear that replication is difficult to administer, makes it hard to change schema, and in the event that a re-initialize is ever required there would be large downtime for critical systems.)

有很多gotchas。由于大量表之间的复杂外键关系,确定执行捕获或应用更新的顺序不是微不足道的。由于唯一索引,两行可能会互锁,使得每次一行更新甚至无法工作(需要在最后更新之前对每行执行中间更新)。这些不一定是显示停止,因为唯一索引通常可以更改为常规索引,外键可以禁用(虽然禁用外键是非常不受欢迎的)。通常,你会听到,只是使用SQL 2008更改跟踪和SSIS或SyncFx。这些答案对于实际困难并不公正。 (当然,客户真的很难回答如何复制数据可能是如此困难,使困难的情况下更糟糕!)

There are lots of gotchas. Due to the complex foreign key relationships among large numbers of tables, determining what order to perform captures or to apply updates is not trivial. Due to unique indexes, two rows might be interlocked in such a way that row-at-a-time update will not even work (need to perform intermediate updates to each row before the final update). These are not necessarily show-stoppers, as unique indexes can often be changed to regular indexes and foreign keys can be disabled (though disabling the foreign keys is extremely undesirable). Often, you will hear, "just" use SQL 2008 change tracking and SSIS or SyncFx. These kinds of answers really do not do justice to the practical difficulties. (And of course, clients really have a hard time wrapping their heads over how copying data could be so difficult, making a difficult situation all the worse!)

这个问题是最终非常通用:执行与许多严重相关的数据库表的大量行的单向同步。几乎每个参与数据库的人都必须处理这种问题。白皮书是常见的,实用的专业知识很难找到。我们知道这可能是一个困难的问题,但工作必须完成。让我们听听什么对你有用(和避免什么)。告诉您对Microsoft产品或其他供应商的产品的体验。但如果你个人没有对大量相关的表和行进行战斗测试的解决方案,请不要回答。

This issue is ultimately very generic: perform one-way synchronization of many heavily related database tables with lots of rows. Almost everyone involved in databases has to deal with this kind of issue. Whitepapers are common, practical expertise hard to find. We know this can be a difficult issue, but the job must get done. Let's hear about what has worked for you (and what to avoid). Tell your experience with Microsoft products or products from other vendors. But if you personally have not battle-tested the solution with large numbers of heavily-related tables and rows, please refrain from answering. Let's keep this practical -- not theoretical.

推荐答案

更好地询问serverfault.com(我不能发布评论,脚本

Better ask on serverfault.com (I can't post comments, scripts are broken in SO, so I have to post a full answer)

更新:(切换到Safari,脚本再次运行,我可以正常发布)

Update: (switched to Safari, scripts work again, I can post properly)

没有银弹。为了易于使用和一键转部署,没有什么可以击败复制。是唯一涵盖深度冲突检测和解决的解决方案,支持推送模式更改,并提供了一套全面的工具来设置和监控它。在这个议程被.Net人群接管之前,它已经是数据同步的MS海报孩子多年了。复制在我看来有两个基本问题:

There is no silver bullet. For ease of use and 'one key turn' deployment nothing can beat replication. Is the only solution that covers deeply conflict detection and resolution, has support for pushing schema changes and comes with a comprehensive set of tools for setting it up and monitoring it. It has been the MS poster child of data synchronization for many years before this 'agenda' was taken over by the .Net crowd. Replication has two underlying problems in my opinion:


  • 用于推送更改的技术是原始,慢和不可靠。它需要文件共享来启动副本,它依赖于T-SQL实际上复制数据,导致各种可扩展性问题:复制线程使用服务器工作线程,并且它们与任意表交互并且应用程序查询导致阻塞和死锁。我听说过的最大的部署大约有400-500个网站,由超人MVP和顶尖美元顾问完成。这就停止了在1500个站点上启动的许多项目(超出最大部署的复制项目)。我很好奇听到如果我错了,你知道一个SQL Server复制解决方案部署超过500个网站。

  • 复制隐喻太数据中心。它没有考虑分布式应用程序的需求:需要版本化和形式化的合同,数据自主性 fiefdoms ',从可用性和安全pov松散耦合。因此,基于复制的解决方案解决了让数据可用的迫切需求,但无法解决我的应用需要与您的应用交谈的真正问题。

  • The technology used to pushing changes is primitive, slow and unreliable. It requires file shares to initiate the replicas and it depends on T-SQL to actually replicate data, resulting in all sort of scalability problems: the replication threads use server worker threads and the fact that they interact with arbitrary tables and application queries lead to blocking and deadlocks. The biggest deployments I've heard of are around 400-500 sites and are done by superhuman MVPs and top dollar consultants. This stops on its track many projects that start at 1500 sites (way beyond largest deployed replication projects). I'm curious to hear if I'm wrong and you know of a SQL Server replication solution deployed with more than 500 sites.
  • The replication metaphor is too data centric. It does not take into account the requirements of distributed applications: need of versioned and formalized contracts, autonomy of data 'fiefdoms', loose coupling from availability and security pov. As a result replication based solution solve the immediate need to 'make data available there', but fail to solve the true problem of 'my app needs to talk with your app'.

在频谱的另一端,您将找到真正解决应用程序通信问题的解决方案,例如基于队列消息传递的服务。但是要么是痛苦地慢,充满了根源于通信机制(web服务和或msmq)和数据存储(DTC事务在comm和db之间,没有共同的高可用性故事,没有共同的可恢复性故事等)的分离的问题。在MS中存在快速且完全集成DB的解决方案堆栈,但没有人知道如何使用它们。在这些和复制之间,你会发现各种中间解决方案,如OCS / Synch框架和基于SSIS的自定义解决方案。没有可以轻松地设置和监控复制,但它们可以扩展和执行更好。

At the other end of the spectrum you'll find solutions that truly address the problem of application communication, like services based on queued messaging. But are either painfully slow and riddled with problems rooted in the separation of the communication mechanism (web services and or msmq) and the data storage (DTC transactions between comm and db, no common high availability story, no common recoverability story etc etc). Solutions that are blazingly fast and fully integrated with DB exists in the MS stack, but nobody knows how to use them. Somewhere in between these and replication you'll find various intermediate solutions, like OCS/Synch framework and SSIS based custom solutions. None will offer the ease of setup and monitoring of replication, but they might scale and perform better.

我参与了几个项目,需要'数据同步'大规模(+1200站点,+1600站点),我的解决方案是解决应用程序通信问题的问题。一旦心态改变为此,并且数据流不再被视为用表Y的键X记录,而是由消费者Y传达项目X的购买的消息,则解决方案变得更容易理解和应用。你不再认为在X-Y-Z顺序插入记录,因此FK关系不打破,而是在消息XYZ描述的过程购买方面。

I was involved with several projects that required 'data synchronization' on a very large scale (+1200 sites, +1600 sites) and my solution was to turn the problem on a 'application communication' problem. Once the mindset is changed to this and the data flow is no longer seen as 'record with key X of table Y' but instead 'message communicating the purchase of item X by customer Y' the solution becomes easier to understand and apply. You no longer think in terms of 'insert records in order X-Y-Z so FK relations don't break' but instead in terms of 'process purchase as described by message XYZ'.

在我看来,复制及其衍生物(即数据跟踪和数据库传输)是解决方案,锚定在数据/应用的'80技术和视图。过时的恐龙(并没有变成鸟类)。

In my view replication, and it derivatives (ie. data tracking and data-gram shipping), are solutions anchored in the '80 technologies and view of the data/applications. Obsolete dinosaurs (and by no way turning into birds).

我知道这甚至不会解决你所有的(非常合法的)问题,但写出所有我要说/ rant / rable这个主题将填充卷的平装...

I know this does not even begin to address all your (very legit) concerns, but writing out all I have to say/rant/rable on this topic would fill volumes of paperback...

这篇关于单向数据库同步的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆