NoSQL and eventual consistency - real world examples

Question

I'm looking for good examples of NoSQL apps that show how to work without transactionality as we know it from relational databases. I'm mostly interested in write-intensive code, since mostly read-only code is a much easier task. I've read a fair amount about NoSQL in general, the CAP theorem, eventual consistency and so on. However, those sources tend to concentrate on database architecture for its own sake rather than on the design patterns to use with it. I do understand that it's impossible to achieve full transactionality within a distributed app. That is exactly why I would like to understand where and how requirements should be lowered to make the task feasible.

Edit:

It's not that eventual consistency is my goal on its own. For the time being I don't really see how to use NoSQL for certain things that are write-intensive. Say I have a simplistic auction system, where there are offers. In theory the first person to accept an offer wins. In practice I would like at least to guarantee that there is only a single winner and that people get their results in the same request. That's probably not feasible. But how do you solve it in practice? Maybe some requests could take longer than usual because something went wrong. Maybe some requests should be automatically refreshed. It's just an example.

Recommended answer

Let me explain CAP in purely intuitive terms. First, what C, A and P mean:


  • Consistency: From the standpoint of an external observer, each "transaction" is either fully completed or fully rolled back. For example, when making an Amazon purchase, the purchase confirmation, order status update, inventory reduction etc. should all appear 'in sync', regardless of the internal partitioning into sub-systems.

  • Availability: 100% of requests are completed successfully.

  • Partition Tolerance: Any given request can be completed even if a subset of nodes in the system is unavailable.

What do these imply from a system design standpoint? What is the tension that CAP defines?

To achieve P, we need replicas. Lots of 'em! The more replicas we keep, the better the chances are that any piece of data we need will be available even if some nodes are offline. For absolute "P" we should replicate every single data item to every node in the system. (Obviously in real life we compromise on 2, 3, etc.)

To achieve A, we need no single point of failure. That means that "primary/secondary" or "master/slave" replication configurations go out the window, since the master/primary is a single point of failure. We need to go with multi-master configurations. To achieve absolute "A", any single replica must be able to handle reads and writes independently of the other replicas. (In reality we compromise with async replication, queue-based approaches, quorums, etc.)

To achieve C, we need a "single version of truth" in the system. Meaning that if I write to node A and then immediately read back from node B, node B should return the up-to-date value. Obviously this can't happen in a truly distributed multi-master system.
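To make the write-to-A, read-from-B problem concrete, here is a minimal sketch of a two-node multi-master setup with asynchronous replication. The AsyncCluster and Replica classes and the "bid" key are made up for illustration and do not correspond to any particular NoSQL product.

```python
from collections import deque


class Replica:
    """One node holding its own local copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class AsyncCluster:
    """Hypothetical two-node multi-master cluster with async replication."""

    def __init__(self):
        self.nodes = {"A": Replica("A"), "B": Replica("B")}
        self.pending = deque()  # replication messages not yet delivered

    def write(self, node, key, value):
        # The write is acknowledged as soon as the local node has it.
        self.nodes[node].data[key] = value
        for other in self.nodes:
            if other != node:
                self.pending.append((other, key, value))

    def read(self, node, key):
        return self.nodes[node].data.get(key)

    def deliver_all(self):
        # Simulates the replication queue eventually draining.
        while self.pending:
            target, key, value = self.pending.popleft()
            self.nodes[target].data[key] = value


if __name__ == "__main__":
    cluster = AsyncCluster()
    cluster.write("A", "bid", 100)
    print("read from B before replication:", cluster.read("B", "bid"))
    cluster.deliver_all()
    print("read from B after replication:", cluster.read("B", "bid"))
```

The window between write() and deliver_all() is exactly the "eventual" part of eventual consistency: node B answers, but with stale data.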

So, what is the solution to your question? Probably to loosen up some of the constraints, and to compromise on the others.

For example, to achieve a "full write consistency" guarantee in a system with n replicas, the number of replicas you read from (r) plus the number of replicas you write to (w) must be strictly greater than n: r + w > n. That way every read set overlaps every write set in at least one replica. This is easy to explain with an example: if I store each item on 3 replicas, then I have a few options to guarantee consistency:

A) I can write the item to all 3 replicas and then read from any one of the 3 and be confident I'm getting the latest version.
B) I can write the item to one of the replicas, and then read all 3 replicas and choose the most recent of the 3 results.
C) I can write to 2 out of the 3 replicas, and read from 2 out of the 3 replicas, and I am guaranteed that at least one of the replicas I read has the latest version.
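The three options can be checked by brute force. The sketch below enumerates every possible write set and read set and tests whether they always share a replica; the function name quorums_always_overlap is my own. Note that the overlap is only guaranteed when r + w is strictly greater than n. With r + w equal to n (say w=2, r=1 on 3 replicas) a read can land entirely on a replica the write never reached.

```python
from itertools import combinations


def quorums_always_overlap(n: int, w: int, r: int) -> bool:
    """True if every write set of size w shares at least one replica
    with every read set of size r, out of n replicas in total."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)  # an empty intersection is falsy
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


if __name__ == "__main__":
    n = 3
    # Options A, B and C from the text, then two configurations that are too weak.
    for w, r in [(3, 1), (1, 3), (2, 2), (2, 1), (1, 1)]:
        ok = quorums_always_overlap(n, w, r)
        print(f"n={n} w={w} r={r} -> overlap guaranteed: {ok}")
```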

Of course, the rule above assumes that no nodes have gone down in the meantime. To ensure P + C you will need to be even more paranoid...

There are also a near-infinite number of 'implementation' hacks. For example, the storage layer might fail the call if it can't write to a minimal quorum, but might continue to propagate the updates to additional nodes even after returning success. Or it might loosen the semantic guarantees and push the responsibility of merging versioning conflicts up to the business layer (this is what Amazon's Dynamo did).
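Here is a rough sketch of the first hack: fail the call if the minimal write quorum can't be reached, otherwise return success and leave the remaining replicas for later propagation. All names (Replica, quorum_write, propagate, QuorumNotReached) are hypothetical, and a real storage layer would do the propagation in the background rather than through an explicit call.

```python
class ReplicaDown(Exception):
    pass


class QuorumNotReached(Exception):
    pass


class Replica:
    def __init__(self, up=True):
        self.up = up
        self.data = {}  # key -> (version, value)

    def put(self, key, version, value):
        if not self.up:
            raise ReplicaDown()
        current = self.data.get(key)
        # Keep only the newest version, so replaying a write is harmless.
        if current is None or version > current[0]:
            self.data[key] = (version, value)


def quorum_write(replicas, key, version, value, w):
    """Write until w replicas have acknowledged. Returns the replicas that
    still need the update; raises QuorumNotReached if fewer than w ack.
    Replicas that accepted a failed write are not rolled back here."""
    acked, backlog = 0, []
    for replica in replicas:
        if acked >= w:
            backlog.append(replica)  # quorum reached, defer the rest
            continue
        try:
            replica.put(key, version, value)
            acked += 1
        except ReplicaDown:
            backlog.append(replica)
    if acked < w:
        raise QuorumNotReached(f"only {acked} of {w} required acks")
    return backlog


def propagate(backlog, key, version, value):
    """Retry the deferred replicas; returns those that are still unreachable."""
    still_pending = []
    for replica in backlog:
        try:
            replica.put(key, version, value)
        except ReplicaDown:
            still_pending.append(replica)
    return still_pending


if __name__ == "__main__":
    nodes = [Replica(), Replica(up=False), Replica()]
    pending = quorum_write(nodes, "offer:1", 1, "accepted by alice", w=2)
    print("write succeeded, replicas still pending:", len(pending))
    nodes[1].up = True
    pending = propagate(pending, "offer:1", 1, "accepted by alice")
    print("after propagation, replicas still pending:", len(pending))
```

The caller sees success as soon as w replicas have the data, which keeps write latency down at the cost of a window in which some replicas are stale.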

Different subsets of data can have different guarantees (e.g. a single point of failure might be OK for critical data, or it might be OK to block your write request until the minimum number of write replicas have successfully written the new version).

There is more to talk about, but let me know if this was helpful, and if you have any follow-up questions we can continue from there...

[Continued...]

The patterns for solving the 90% case already exist, but each NoSQL solution applies them in different configurations. The patterns are things like partitioning (stable/hash-based or variable/lookup-based), redundancy and replication, in-memory caches, and distributed algorithms such as map/reduce.
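As an illustration of the two partitioning styles, here is a small sketch. hash_partition derives the partition from the key alone (stable, no directory needed), while the hypothetical LookupPartitioner keeps an explicit directory, so individual keys can be moved at the cost of one extra lookup. Both names are mine, not taken from any specific product.

```python
import hashlib


def hash_partition(key: str, num_partitions: int) -> int:
    """Stable/hash-based: the partition is a pure function of the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


class LookupPartitioner:
    """Variable/lookup-based: a directory maps each key to a partition."""

    def __init__(self, default_partition: int):
        self.directory = {}
        self.default_partition = default_partition

    def assign(self, key: str, partition: int) -> None:
        # Moving a key only requires updating the directory entry.
        self.directory[key] = partition

    def partition_for(self, key: str) -> int:
        return self.directory.get(key, self.default_partition)


if __name__ == "__main__":
    print("hash-based:", hash_partition("user:42", 4))
    lookup = LookupPartitioner(default_partition=0)
    lookup.assign("user:42", 3)
    print("lookup-based:", lookup.partition_for("user:42"))
```

One caveat with the plain modulo scheme: changing num_partitions remaps most keys, which is why many systems use consistent hashing instead.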

When you drill down into those patterns, the underlying algorithms are also fairly universal: version vectors, Merkle trees, DHTs, gossip protocols, etc.
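To give a flavour of one of these, here is a minimal version vector sketch. Each vector maps a node name to the number of updates that node has made to an item; comparing two vectors tells you whether one version supersedes the other or whether they are concurrent and need a merge. The dict representation and the function names compare and merge are assumptions made for illustration.

```python
def compare(a: dict, b: dict) -> str:
    """Compare two version vectors (node name -> update counter).
    Returns 'equal', 'before', 'after' or 'concurrent' for a relative to b."""
    nodes = set(a) | set(b)
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        return "concurrent"  # neither saw the other's update: a conflict
    if a_ahead:
        return "after"
    if b_ahead:
        return "before"
    return "equal"


def merge(a: dict, b: dict) -> dict:
    """Element-wise maximum: the vector of a version that has seen both."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


if __name__ == "__main__":
    v1 = {"A": 2, "B": 1}
    v2 = {"A": 1, "B": 2}
    print(compare(v1, v2))
    print(merge(v1, v2))
```

When compare returns "concurrent", the store cannot pick a winner on its own, which is the point where a Dynamo-style system hands both versions to the application.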

The same can be said for most SQL solutions: they all implement indexes (which use B-trees under the hood), have relatively smart query optimizers based on known CS algorithms, and use in-memory caching to reduce disk IO. The differences are mostly in implementation, management experience, toolset support, etc.

Unfortunately I can't point to some central repository of wisdom which contains all you will need to know. In general, start by asking yourself what NoSQL characteristics you really need. That will guide you in choosing between a key-value store, a document store or a column store (those are the 3 main categories of NoSQL offerings). From there you can start comparing the various implementations.

[Updated again, April 14, 2011]

OK, here's the part which actually justifies the bounty. I just found the following 120-page whitepaper on NoSQL systems. This is very close to being the "NoSQL bible" which I told you earlier doesn't exist. Read it and rejoice :-)

NoSQL Databases by Christof Strauch
