Best practice for synchronizing common distributed data

Question

I have an internet application that supports an offline mode, where users might create data that will be synchronized with the server when they come back online. Because of this, I'm using UUIDs for identity in my database, so disconnected clients can generate new objects without fear of using an ID already used by another client. However, while this works great for objects that are owned by a single user, there are also objects that are shared by multiple users. For example, tags used by a user might be global, and there's no possible way the remote database could hold all possible tags in the universe.

Suppose an offline user creates an object and adds some tags to it. Let's say those tags don't exist in the user's local database, so the software generates a UUID for each of them. When those tags are synchronized, there needs to be a resolution process to resolve any overlap: some way to match up existing tags in the remote database with the local versions.

One way is to use some process by which global objects are resolved by a natural key (the name, in the case of a tag), and the local database has to replace its existing object with the one from the global database. This can be messy when there are many connections to other objects. Something tells me to avoid this.
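To make the natural-key idea concrete, here is a minimal sketch of what such a resolution pass might look like. Everything in it is an assumption for illustration: the function name `resolve_tags`, the dictionary shapes, and the lowercase-and-strip normalization rule are hypothetical and not part of any existing system.

```python
def resolve_tags(local_tags, local_links, global_tags_by_name):
    """Remap locally created tags onto global tags that share the same name.

    local_tags:          {local_tag_id: name}
    local_links:         list of (object_id, tag_id) pairs
    global_tags_by_name: {normalized_name: global_tag_id}, updated in place
    """
    remap = {}
    for local_id, name in local_tags.items():
        key = name.strip().lower()  # assumed normalization of the natural key
        global_id = global_tags_by_name.get(key)
        if global_id is None:
            # No global tag with this name yet: promote the local tag unchanged.
            global_tags_by_name[key] = local_id
            global_id = local_id
        remap[local_id] = global_id

    # Re-point every link from the local tag ID to the resolved global ID.
    new_links = [(obj_id, remap.get(tag_id, tag_id)) for obj_id, tag_id in local_links]
    return remap, new_links
```

The messy part mentioned above is the last step: every table that references a tag needs the same re-pointing, which is why the number of connections matters.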

Another way to handle this is to use two IDs: one global ID and one local ID. I was hoping that using UUIDs would help avoid this, but I keep going back and forth between using a single UUID and using two split IDs. Considering this option makes me wonder if I've let the problem get out of hand.

Another approach is to track all changes through the non-shared objects. In this example, that is the object the user assigned the tags to. When the user synchronizes their offline changes, the server might replace the local tag with the global one. The next time this client synchronizes with the server, it detects a change in the non-shared object. When the client pulls down that object, it receives the global tag. The software then simply resaves the non-shared object, pointing it to the server's tag and orphaning the local version. Some issues with this are the extra round trips needed to fully synchronize and the extra data in the local database that is just orphaned. Are there other issues or bugs that could happen when the system is in between synchronization states (e.g. trying to talk to the server and sending it local UUIDs for objects)?

Another alternative is to avoid common objects. In my software that could be an acceptable answer. I'm not doing a lot of sharing of objects across users, but that doesn't mean I won't be doing it in the future. Choosing this option could therefore cripple my software later, should I need to add these types of features. There are consequences to this choice, and I'm not sure I've completely explored them.

So I'm looking for any sort of best practice, existing algorithms for handling this type of system, guidance on choices, etc.

Recommended answer

Depending on what application semantics you want to offer to users, you may pick different solutions. For example, if you are actually talking about tagging objects created by an offline user with a keyword, and want to share the tags across multiple objects created by different users, then using the "text" as the tag is fine, as you suggested. Once everyone's changes are merged, tags with the same "text", say "THIS IS AWESOME", will be shared.
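As a rough illustration of treating the tag text itself as the identity, the sketch below merges per-user tag assignments. The names `normalize` and `merge_tagged_objects` are hypothetical, and the normalization (collapse whitespace, ignore case) is an assumption; your application may want a stricter or looser rule.

```python
def normalize(text):
    """Assumed rule: collapse runs of whitespace and ignore case."""
    return " ".join(text.split()).casefold()


def merge_tagged_objects(*replicas):
    """Merge replicas of the form {object_id: set_of_tag_texts}.

    Returns (objects, index) where objects maps each object to its normalized
    tags and index maps each normalized tag to the objects that carry it.
    """
    objects, index = {}, {}
    for replica in replicas:
        for obj_id, tags in replica.items():
            for tag in tags:
                key = normalize(tag)
                objects.setdefault(obj_id, set()).add(key)
                index.setdefault(key, set()).add(obj_id)
    return objects, index


alice = {"a1": {"THIS IS AWESOME"}}
bob = {"b1": {"this is  awesome", "sync"}}
objects, index = merge_tagged_objects(alice, bob)
# Both objects end up under the same tag because the normalized text matches.
```

Because the text is the key, no ID remapping is needed after a merge; the trade-off is that renaming a tag becomes a change of identity.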

There are other ways to handle disconnected updates to shared objects. SVN, CVS, and other version control systems try to resolve conflicts automatically and, when they cannot, just tell the user there is a conflict. You can do the same: tell the user there have been concurrent updates, and let the user handle the resolution.
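One common way to detect that a concurrent update has happened is a version check on write. The sketch below shows only the detection half (it makes no attempt at the automatic merging that version control systems do), and the `SharedObject` and `Conflict` names are hypothetical.

```python
class Conflict(Exception):
    """Raised when an update was based on a stale version of the object."""

    def __init__(self, server_value, client_value):
        super().__init__("concurrent update detected")
        self.server_value = server_value
        self.client_value = client_value


class SharedObject:
    def __init__(self, value):
        self.value = value
        self.version = 0

    def update(self, base_version, new_value):
        """Apply new_value only if the client saw the latest version."""
        if base_version != self.version:
            # Someone else updated the object while this client was offline.
            raise Conflict(self.value, new_value)
        self.value = new_value
        self.version += 1
        return self.version


obj = SharedObject("draft")
obj.update(0, "alice's edit")      # accepted, version becomes 1
try:
    obj.update(0, "bob's edit")    # bob was still on version 0
except Conflict as conflict:
    # Hand both values to the user and let them decide.
    pending = (conflict.server_value, conflict.client_value)
```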

Alternatively, you can also log updates as units of change and try to compose the changes together. For example, if your shared object is a canvas, and your application semantics allow shared drawing on the same canvas, then a disconnected update that draws a line from point A to point B and another disconnected update that draws a line from point C to point D can be composed. In this case, if you keep those two updates as just two operations, you can order them, and on reconnection each user uploads all of their disconnected operations and applies the missing operations from other users. You probably want some kind of ordering rule, perhaps based on version number.
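The canvas example could be sketched roughly as follows. The `DrawLine` record, its fields, and the ordering rule (base version first, then user name, then operation ID as tie-breakers) are all assumptions chosen for illustration; any rule works as long as every client applies the same one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrawLine:
    op_id: str         # unique per operation, e.g. a UUID generated offline
    user: str
    base_version: int  # canvas version the user saw when drawing
    start: tuple
    end: tuple


def merge_logs(*logs):
    """Union several operation logs and put them in one deterministic order."""
    seen = {}
    for log in logs:
        for op in log:
            seen.setdefault(op.op_id, op)  # drop duplicates already exchanged
    return sorted(seen.values(), key=lambda op: (op.base_version, op.user, op.op_id))


def replay(ops):
    """Rebuild the canvas as a list of line segments."""
    return [(op.start, op.end) for op in ops]


a = DrawLine("op-a", "alice", 3, (0, 0), (1, 1))
b = DrawLine("op-b", "bob", 3, (2, 2), (3, 3))
canvas_on_alice = replay(merge_logs([a], [b]))
canvas_on_bob = replay(merge_logs([b], [a, b]))
```

Since both sides sort the same set of operations with the same key, they converge on the same canvas regardless of the order in which logs arrive. This only holds for operations that compose cleanly, as drawing independent lines does.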

Another alternative: if updates to shared objects cannot be automatically reconciled, and your application semantics do not support notifying the user and asking them to resolve conflicts caused by disconnected updates, then you can use a version tree to handle this. Each update to a shared object creates a new version, with the previous version as its parent. When there are disconnected updates to a shared object from two different users, two separate child versions (leaf nodes) result from the same parent version. If your application's internal representation of state is this version tree, then the internal state remains consistent despite disconnected updates, and you can handle the two branches of the version tree in some other way (e.g. letting users know about the branches and creating tools for them to merge branches, as in source control systems).
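A version tree of this kind can be sketched in a few lines. The `VersionTree` class and its method names are hypothetical, and the sketch deliberately leaves out merging, which would need a version with two parents.

```python
import uuid


class VersionTree:
    """Every update adds a child version; concurrent updates become siblings."""

    def __init__(self, root_value):
        self.nodes = {}     # version_id -> (parent_id, value)
        self.children = {}  # version_id -> list of child version_ids
        self.root = self._add(None, root_value)

    def _add(self, parent_id, value):
        version_id = uuid.uuid4().hex
        self.nodes[version_id] = (parent_id, value)
        self.children[version_id] = []
        if parent_id is not None:
            self.children[parent_id].append(version_id)
        return version_id

    def commit(self, parent_id, value):
        """Record an update made on top of parent_id."""
        if parent_id not in self.nodes:
            raise KeyError(f"unknown parent version: {parent_id}")
        return self._add(parent_id, value)

    def leaves(self):
        """Versions with no children; more than one means the tree has branched."""
        return [v for v, kids in self.children.items() if not kids]

    def has_branches(self):
        return len(self.leaves()) > 1


tree = VersionTree("v0")
alice_version = tree.commit(tree.root, "alice's offline edit")
bob_version = tree.commit(tree.root, "bob's offline edit")
# Both edits are kept as sibling leaves under the same parent.
```

Nothing is overwritten on sync, so the stored state stays consistent; the decision about which branch wins, or how to merge them, is deferred to whatever tooling you build on top.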

Just a few options. Hope this helps.
