NoSQL and eventual consistency: real-world examples

Problem description

I'm looking for good examples of NoSQL apps that show how to work without transactionality as we know it in relational databases. I'm mostly interested in write-intensive code, as for mostly read-only code this is a much easier task. I've read a number of things about NoSQL in general, about the CAP theorem, eventual consistency, etc. However, those things tend to concentrate on the database architecture for its own sake and not on the design patterns to use with it. I do understand that it's impossible to achieve full transactionality within a distributed app. This is exactly why I would like to understand where and how requirements should be lowered in order to make the task feasible.

It's not that eventual consistency is my goal in its own right. For the time being I don't really see how to use NoSQL for certain things that are write-intensive. Say I have a simplistic auction system, where there are offers. In theory, the first person to accept an offer wins. In practice, I would like at least to guarantee that there is only a single winner and that people get their results in the same request. That's probably not feasible. But how would you solve it in practice? Maybe some requests could take longer than usual because something went wrong. Maybe some requests should be automatically refreshed. It's just an example.

Recommended answer

Let me explain CAP in purely intuitive terms. First, what C, A and P mean:

  • Consistency: From the standpoint of an external observer, each "transaction" either fully completed or is fully rolled back. For example, when making an Amazon purchase, the purchase confirmation, order status update, inventory reduction, etc. should all appear "in sync", regardless of the internal partitioning into sub-systems.

  • Availability: 100% of requests are completed successfully.

  • Partition Tolerance: Any given request can be completed even if a subset of nodes in the system is unavailable (for example, because a network failure has cut them off).

What do these imply from a system design standpoint? What is the tension that CAP defines?

To achieve P, we need replicas. Lots of them! The more replicas we keep, the better the chances that any piece of data we need will be available even if some nodes are offline. For absolute "P" we would replicate every single data item to every node in the system. (Obviously, in real life we compromise on 2, 3, etc.)
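A quick back-of-the-envelope check of "the more replicas, the better the chances", under a simplifying assumption that is not part of the original answer: each node is offline independently with probability p. An item stored on k replicas is then unreachable only if all k are down at once, so it is available with probability 1 - p^k. Real failures are often correlated (a whole rack or datacenter going dark), so real-world numbers tend to be worse than this sketch suggests.

```python
def availability(p_down: float, replicas: int) -> float:
    """Probability that at least one of `replicas` copies is reachable,
    assuming each node is down independently with probability p_down."""
    if not 0.0 <= p_down <= 1.0:
        raise ValueError("p_down must be between 0 and 1")
    if replicas < 1:
        raise ValueError("need at least one replica")
    # The item is unreachable only if every replica is down at the same time.
    return 1.0 - p_down ** replicas


# Hypothetical per-node downtime of 1%.
for k in (1, 2, 3):
    print(f"{k} replica(s): {availability(0.01, k):.6f}")
```

With p = 0.01, going from one to three replicas shrinks unavailability from 10^-2 to 10^-6, which suggests why a handful of copies is usually considered enough.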

To achieve A, we need no single point of failure. That means "primary/secondary" or "master/slave" replication configurations go out the window, since the master/primary is a single point of failure. We need to go with multi-master configurations. To achieve absolute "A", any single replica must be able to handle reads and writes independently of the other replicas. (In reality we compromise with asynchronous or queue-based replication, quorums, etc.)

To achieve C, we need a "single version of truth" in the system. This means that if I write to node A and then immediately read back from node B, node B should return the up-to-date value. Obviously, this can't be guaranteed in a truly distributed multi-master system.
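A toy model of why C is hard in a multi-master setup. The `Replica` class is hypothetical, not any real database's API: each replica acknowledges writes locally and ships them to its peers asynchronously, modeled here as an outbox that is flushed explicitly. A read from node B right after a write to node A returns the old value until replication runs.

```python
from __future__ import annotations

from collections import deque


class Replica:
    """Toy multi-master replica: writes are acknowledged locally and
    queued for asynchronous delivery to peers."""

    def __init__(self, name: str):
        self.name = name
        self.data: dict[str, str] = {}
        self.peers: list[Replica] = []
        self.outbox: deque[tuple[str, str]] = deque()

    def write(self, key: str, value: str) -> None:
        self.data[key] = value            # acknowledged right away (favors A)
        self.outbox.append((key, value))  # shipped to peers later

    def read(self, key: str) -> str | None:
        return self.data.get(key)         # answers from local state only

    def replicate(self) -> None:
        """Flush queued writes to every peer: the 'eventual' part."""
        while self.outbox:
            key, value = self.outbox.popleft()
            for peer in self.peers:
                peer.data[key] = value


node_a, node_b = Replica("A"), Replica("B")
node_a.peers, node_b.peers = [node_b], [node_a]

node_a.write("price", "v1")
node_a.replicate()
node_a.write("price", "v2")
stale = node_b.read("price")   # v2 has not been shipped to B yet
node_a.replicate()
fresh = node_b.read("price")
print(stale, fresh)  # v1 v2
```

If both nodes accepted different writes for the same key before replicating, the last flush would silently overwrite the other write, which is the kind of conflict the answer returns to further down.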

So, what is the solution to your question? Probably to loosen up some of the constraints, and to compromise on the others.

For example, to achieve a "full write consistency" guarantee in a system with n replicas, the number of replicas read (r) plus the number of replicas written (w) must be greater than n: r + w > n. This is easy to explain with an example: if I store each item on 3 replicas, then I have a few options to guarantee consistency:

A) I can write the item to all 3 replicas and then read from any one of the 3, and be confident I'm getting the latest version.
B) I can write the item to one of the replicas, and then read all 3 replicas and choose the most recent of the 3 results.
C) I can write to 2 out of the 3 replicas and read from 2 out of the 3 replicas, and I am guaranteed that at least one of the replicas I read has the latest version.
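The quorum rule can be checked exhaustively for small n. This sketch uses a hypothetical helper and assumes every replica starts at version 1, a write moves exactly w of them to version 2, and the reader keeps the highest version it sees. It tries every possible write set and read set, confirming options A, B and C, and showing why the inequality must be strict: with r + w = n, the reader can land entirely on replicas the write never reached.

```python
from itertools import combinations

N = 3  # replicas per item, as in the example above


def read_sees_latest(w: int, r: int, n: int = N) -> bool:
    """True if, for every write set of size w and every read set of
    size r, the reader observes the newest version."""
    for write_set in combinations(range(n), w):
        # All replicas start at version 1; the write moves w of them to version 2.
        versions = [2 if i in write_set else 1 for i in range(n)]
        for read_set in combinations(range(n), r):
            if max(versions[i] for i in read_set) != 2:
                return False  # this read missed the write entirely
    return True


# Options A, B, C from the answer, then two settings with r + w == n.
for w, r in [(3, 1), (1, 3), (2, 2), (1, 2), (2, 1)]:
    print(f"w={w} r={r} r+w={w + r} consistent={read_sees_latest(w, r)}")
```

The last two rows come out `False`: by the pigeonhole principle the read and write sets are forced to overlap only when r + w > n.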

Of course, the rule above assumes that no nodes have gone down in the meantime. To ensure P + C you will need to be even more paranoid...

There is also a near-infinite number of "implementation" hacks. For example, the storage layer might fail the call if it can't write to a minimal quorum, but might continue to propagate the updates to additional nodes even after returning success. Or it might loosen the semantic guarantees and push the responsibility of merging versioning conflicts up to the business layer (this is what Amazon's Dynamo did).
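Pushing conflicts up to the business layer, as Dynamo does, relies on version vectors to tell "newer" apart from "concurrent". The sketch below uses plain dicts as version vectors (replica name to counter). The replica names and the cart-union merge rule are illustrative assumptions in the spirit of the shopping-cart example from the Dynamo paper, not a real library API.

```python
from __future__ import annotations

VersionVector = dict[str, int]  # replica name -> counter


def increment(vv: VersionVector, replica: str) -> VersionVector:
    """Return a copy of vv with the given replica's counter bumped."""
    out = dict(vv)
    out[replica] = out.get(replica, 0) + 1
    return out


def descends(a: VersionVector, b: VersionVector) -> bool:
    """True if a has seen every event b has (a >= b component-wise)."""
    return all(a.get(k, 0) >= v for k, v in b.items())


def compare(a: VersionVector, b: VersionVector) -> str:
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a_newer"
    if descends(b, a):
        return "b_newer"
    return "concurrent"  # neither dominates: a real conflict


def merge_vv(a: VersionVector, b: VersionVector) -> VersionVector:
    """Component-wise maximum: a vector that descends from both inputs."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}


# Hypothetical scenario: replicas A and B start from the same cart and then
# accept different updates while cut off from each other.
base_vv: VersionVector = {"A": 1}
base_cart = {"book"}
vv_a, cart_a = increment(base_vv, "A"), base_cart | {"lamp"}
vv_b, cart_b = increment(base_vv, "B"), base_cart | {"pen"}
print(compare(vv_a, vv_b))  # concurrent: the store cannot pick a winner

# Business-layer resolution (assumed rule): union the carts, merge the
# vectors, and record the resolution as a new write on replica A.
resolved_cart = cart_a | cart_b
resolved_vv = increment(merge_vv(vv_a, vv_b), "A")
print(sorted(resolved_cart), compare(resolved_vv, vv_a), compare(resolved_vv, vv_b))
```

Set union never loses an "add to cart", but, as the Dynamo paper itself points out, a deleted item can resurface; the right merge rule is application-specific.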

Different subsets of data can have different guarantees (e.g. a single point of failure might be OK for critical data, or it might be OK to block on your write request until the minimal number of write replicas have successfully written the new version).
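One way to express per-dataset guarantees is a policy table of (n, w, r) per kind of data. The dataset names and numbers below are hypothetical, chosen to echo the auction example from the question. The point is the trade-off: bids get overlapping read/write quorums and therefore refuse writes when too many replicas are unreachable, while a page-view counter keeps accepting writes but may serve stale reads. A real store would wait (up to a timeout) for acknowledgements; here reachability is decided up front for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    n: int  # replicas per item
    w: int  # acknowledgements required before a write succeeds
    r: int  # replicas consulted on each read

    @property
    def reads_see_latest_write(self) -> bool:
        # Read and write quorums are guaranteed to overlap.
        return self.r + self.w > self.n


# Hypothetical per-dataset settings.
POLICIES = {
    "bids": Policy(n=3, w=2, r=2),
    "page_views": Policy(n=3, w=1, r=1),
}


def quorum_write(policy: Policy, reachable: set[int]) -> bool:
    """Accept the write only if at least w of replicas 0..n-1 are reachable."""
    acks = len(reachable & set(range(policy.n)))
    return acks >= policy.w


for name, policy in POLICIES.items():
    print(name, policy.reads_see_latest_write,
          quorum_write(policy, {0, 1}),  # one replica down
          quorum_write(policy, {0}))     # two replicas down
```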

There is more to talk about, but let me know if this was helpful, and if you have any follow-up questions we can continue from there...

[Continued...]

The patterns for solving the 90% case already exist, but each NoSQL solution applies them in different configurations. The patterns are things like partitioning (stable/hash-based or variable/lookup-based), redundancy and replication, in-memory caches, and distributed algorithms such as map/reduce.
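Stable, hash-based partitioning is often implemented as a consistent-hashing ring, which is also the idea behind DHTs. This minimal sketch assumes MD5 as the ring hash, 64 virtual nodes per physical node, and made-up node names. Each key maps to the first n distinct nodes clockwise from its hash, which doubles as the key's replica list; removing a node only moves the keys whose primary owner was that node.

```python
import bisect
import hashlib


def ring_hash(s: str) -> int:
    # MD5 is used only to spread positions evenly, not for security.
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:8], "big")


class HashRing:
    """Minimal consistent-hashing ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes: list[str], vnodes: int = 64):
        self.vnodes = vnodes
        self.ring: list[tuple[int, str]] = []  # sorted (position, node) pairs
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self.ring, (ring_hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self.ring = [(pos, n) for pos, n in self.ring if n != node]

    def replicas(self, key: str, n: int = 3) -> list[str]:
        """First n distinct nodes found walking clockwise from the key."""
        if n > len({name for _, name in self.ring}):
            raise ValueError("not enough nodes on the ring")
        start = bisect.bisect(self.ring, (ring_hash(key),))
        found: list[str] = []
        for step in range(len(self.ring)):
            node = self.ring[(start + step) % len(self.ring)][1]
            if node not in found:
                found.append(node)
                if len(found) == n:
                    break
        return found


ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
keys = [f"item{i}" for i in range(1000)]
before = {k: ring.replicas(k, 1)[0] for k in keys}
ring.remove("node-d")
after = {k: ring.replicas(k, 1)[0] for k in keys}
moved = sum(before[k] != after[k] for k in keys)
print(f"{moved} of {len(keys)} keys changed owner")
```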

When you drill down into those patterns, the underlying algorithms are also fairly universal: version vectors, Merkle trees, DHTs, gossip protocols, etc.
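Merkle trees let two replicas find out which data diverged without shipping all of it (Dynamo, for example, uses them for anti-entropy). This sketch assumes both replicas split the key space into the same power-of-two number of buckets and hash with SHA-256. In a real exchange each side would send hashes level by level; here both trees sit in memory. Only subtrees whose hashes differ are descended into.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(buckets: list[bytes]) -> list[list[bytes]]:
    """Return the tree as a list of levels: leaf hashes first, root last."""
    count = len(buckets)
    if count == 0 or count & (count - 1):
        raise ValueError("number of buckets must be a power of two")
    levels = [[sha256(b) for b in buckets]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([sha256(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def differing_buckets(a: list[list[bytes]], b: list[list[bytes]]) -> list[int]:
    """Descend from the root, following only subtrees whose hashes differ."""
    top = len(a) - 1
    suspects = [0] if a[top][0] != b[top][0] else []
    for level in range(top, 0, -1):
        suspects = [
            child
            for i in suspects
            for child in (2 * i, 2 * i + 1)
            if a[level - 1][child] != b[level - 1][child]
        ]
    return suspects


# Hypothetical replicas with 8 buckets each; bucket 5 has diverged.
replica_1 = [f"bucket{i}:v1".encode() for i in range(8)]
replica_2 = list(replica_1)
replica_2[5] = b"bucket5:v2"

tree_1, tree_2 = build_tree(replica_1), build_tree(replica_2)
print(differing_buckets(tree_1, tree_2))  # [5]
```

Matching roots end the comparison after a single hash; otherwise the work grows with the number of diverged buckets times the tree depth, not with the total amount of data.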

The same can be said for most SQL solutions: they all implement indexes (usually B-trees under the hood), have relatively smart query optimizers based on well-known CS algorithms, and all use in-memory caching to reduce disk IO. The differences are mostly in implementation, management experience, toolset support, etc.

Unfortunately, I can't point to a central repository of wisdom that contains everything you will need to know. In general, start by asking yourself which NoSQL characteristics you really need. That will guide you in choosing between a key-value store, a document store or a column store (those are the 3 main categories of NoSQL offerings). From there you can start comparing the various implementations.

[Updated again on 14 April 2011]

OK, here's the part that actually justifies the bounty. I just found the following 120-page whitepaper on NoSQL systems. It comes very close to being the "NoSQL bible" which I told you earlier doesn't exist. Read it and rejoice :-)

NoSQL Databases, Christof Strauch