Hadoop 3.0 erasure coding - determining the number of acceptable node failures?


Problem description


In Hadoop 2.0 the default replication factor is 3, so the number of acceptable node failures is 3-1=2.
So on a 100-node cluster, if a file is divided into, say, 10 parts (blocks), then with a replication factor of 3 the total number of storage blocks required is 30. If any 3 nodes containing a block X and its replicas fail, the file is not recoverable. Even if the cluster had 1000 nodes, or the file were split into 20 parts, the failure of 3 nodes on the cluster could still be disastrous for the file.

Now stepping into Hadoop 3.0.
With erasure coding, Hadoop says it provides the same durability with only 50% storage overhead. And this follows from how the Reed-Solomon method works: for k data blocks and n parity blocks, at least k of the (k+n) blocks must be accessible for the file to be recoverable/readable.
So for the same file above, there are 10 data blocks, and to keep the storage overhead at 50%, 5 parity blocks can be added. From those 10+5 blocks, at least any 10 must be available for the file to be accessible. And on the 100-node cluster, if each of the 15 blocks is stored on a separate node, then, as you can see, a total of 5 node failures is acceptable. Storing the same file (i.e. 15 blocks) on a 1000-node cluster would not make any difference with respect to the number of acceptable node failures - it's still 5.
But the interesting part is this: if the same file (or another file) were divided into 20 blocks and 10 parity blocks were added, then, with the total of 30 blocks saved on the 100-node cluster, the acceptable number of node failures is 10.
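
To make the arithmetic above concrete, here is a minimal Python sketch (my own illustration, not Hadoop code) comparing the two layouts, assuming one block per node; the helper names replication_layout and ec_layout are made up for this example:

def replication_layout(data_blocks, replication=3):
    # Total stored blocks and tolerated node failures under plain replication.
    total = data_blocks * replication
    # Losing all `replication` copies of any one block loses the file.
    tolerated = replication - 1
    return total, tolerated

def ec_layout(k, p):
    # Total stored blocks per set and tolerated node failures under RS(k data, p parity).
    total = k + p
    # Any k of the k+p blocks are enough to rebuild the set.
    tolerated = p
    overhead = p / k  # parity overhead relative to the raw data
    return total, tolerated, overhead

print(replication_layout(10))   # (30, 2)       -> 200% overhead, 2 failures tolerated
print(ec_layout(10, 5))         # (15, 5, 0.5)  -> 50% overhead, 5 failures tolerated
print(ec_layout(20, 10))        # (30, 10, 0.5) -> 50% overhead, 10 failures tolerated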

The point I want to make here is -
in Hadoop 2 the number of acceptable node failures is ReplicationFactor-1 and is clearly based on the replication factor. And this is a cluster-wide property.

But in Hadoop 3, if the storage efficiency were, say, fixed at 50%, then the number of acceptable node failures seems to differ from file to file, based on the number of blocks each file is divided into.

So can anyone comment on whether the above inference is correct? And how is the number of acceptable node failures for any given cluster determined?

(I did not want to make things complicated above, so I did not discuss the edge case of a file with only one block. But the algorithm, I guess, will be smart enough either to replicate it as-is or to add parity data so that the data durability settings are still guaranteed.)

Edit: This question is part of a series of questions I have on EC - Others as below -
Hadoop 3.0 erasure coding: impact on MR jobs performance?

Solution

Using your numbers for Hadoop 2.0, each block of data is stored on 3 different nodes. As long as at least one of the 3 nodes holding a specific block can still be read, that block of data is recoverable.

Again using your numbers, for Hadoop 3.0, every set of 10 data blocks and 5 parity blocks is stored on 15 different nodes. So the data space requirement is reduced to 50% overhead, but the number of nodes the data and parities are written to has increased by a factor of 5, from 3 nodes for Hadoop 2.0 to 15 nodes for Hadoop 3.0. Since the redundancy is based on Reed-Solomon erasure correction, as long as any 10 of the 15 nodes holding a specific set of blocks can still be read, that set of blocks is recoverable (the maximum allowed failure for a set of blocks is 5 nodes). If it's 20 data blocks and 10 parity blocks, then the data and parity blocks are distributed over 30 different nodes (the maximum allowed failure for a set of blocks is 10 nodes).
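
As a sketch of that per-set rule (my own toy code, not Hadoop's implementation): a set of k data + p parity blocks spread over k+p distinct nodes survives as long as at least k of those nodes remain readable.

def set_survives(set_nodes, failed_nodes, k):
    # A Reed-Solomon set survives if at least k of its nodes are still readable.
    readable = len(set(set_nodes) - set(failed_nodes))
    return readable >= k

# RS set of 10 data + 5 parity blocks on nodes 0..14: any 5 failures are fine.
nodes_of_one_set = range(15)
print(set_survives(nodes_of_one_set, {0, 3, 7, 9, 14}, k=10))      # True  (5 failures)
print(set_survives(nodes_of_one_set, {0, 2, 3, 7, 9, 14}, k=10))   # False (6 failures)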

For a cluster-wide view, failure can occur whenever more than n-k nodes fail (here n is the total number of blocks in a set and k the number of data blocks, so n-k is the parity count), regardless of the number of nodes, since there is some chance that a set of data and parity blocks will happen to include all of the failing nodes. To avoid this, n should be increased along with the number of nodes in the cluster. For 100 nodes, each set could be 80 data blocks and 20 parity blocks (25% redundancy). Note that a set spanning 100 nodes would be unusually large. The example from the web page below is 14 nodes with RS(14,10) (for each set: 10 data blocks, 4 parity blocks).

https://hadoop.apache.org/docs/r3.0.0

With your numbers, the cluster would need at least 15 (10+5) or 30 (20+10) nodes.
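
To put a number on the cluster-wide point (my own sketch, assuming each set's k+p blocks are placed on distinct nodes chosen uniformly at random), the hypergeometric tail below gives the chance that one particular set loses more than p of its blocks once `failed` nodes in the cluster have gone down:

from math import comb

def p_set_lost(cluster_nodes, failed, k, p):
    # P(more than p of the set's k+p randomly placed blocks land on failed nodes).
    width = k + p
    total = comb(cluster_nodes, width)
    lost = sum(comb(failed, x) * comb(cluster_nodes - failed, width - x)
               for x in range(p + 1, min(width, failed) + 1))
    return lost / total

# RS(10 data, 5 parity) sets on a 100-node cluster: with 6 failed nodes a given
# set can already be lost, even though 6 is far fewer than 100.
print(p_set_lost(100, failed=6, k=10, p=5))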

For a file with 1 block, or fewer than k blocks, n-k parity blocks would still be needed to ensure that more than n-k nodes must fail before data is lost. For Reed-Solomon encoding, this can be done by emulating leading blocks of zeroes for the "missing" blocks.
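
A toy illustration of that zero-padding idea (not Hadoop's codec; BLOCK_SIZE and pad_to_k are made up for this sketch): a short file is simply treated as if the missing data blocks were all zeroes, so a full set of parity blocks can still be computed.

BLOCK_SIZE = 4  # tiny blocks, just for illustration

def pad_to_k(blocks, k):
    # Treat missing data blocks as all-zero blocks so a full RS set can be encoded.
    zero_block = bytes(BLOCK_SIZE)
    return blocks + [zero_block] * (k - len(blocks))

print(pad_to_k([b"abcd"], k=3))
# [b'abcd', b'\x00\x00\x00\x00', b'\x00\x00\x00\x00']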


I thought I'd add some probability figures versus the number of nodes in a cluster.

Assume node failure rate is 1%.

15 nodes, 10 for data, 5 for parities, using comb(a,b) for combinations of a things b at a time:

Probability of exactly x node failures is:

6 => ((.01)^6) ((.99)^9) (comb(15,6)) ~= 4.572 × 10^-9
7 => ((.01)^7) ((.99)^8) (comb(15,7)) ~= 5.938 × 10^-11
8 => ((.01)^8) ((.99)^7) (comb(15,8)) ~= 5.998 × 10^-13
...

Probability of 6 or more failures ~= 4.632 × 10^-9

30 nodes, 20 for data, 10 for parities

Probability of exactly x node failures is:

11 => ((.01)^11) ((.99)^19) (comb(30,11)) ~= 4.513 × 10^-15
12 => ((.01)^12) ((.99)^18) (comb(30,12)) ~= 7.218 × 10^-17
13 => ((.01)^13) ((.99)^17) (comb(30,13)) ~= 1.010 × 10^-18
14 => ((.01)^14) ((.99)^16) (comb(30,14)) ~= 1.238 × 10^-20

Probability of 11 or more failures ~= 4.586 × 10^-15

To show that the relative parity overhead needed decreases as the number of nodes grows, consider the extreme case of 100 nodes, 80 for data, 20 for parities (25% redundancy):

Probability of exactly x node failures is:

21 => ((.01)^21) ((.99)^79) (comb(100,21)) ~= 9.230 × 10^-22
22 => ((.01)^22) ((.99)^78) (comb(100,22)) ~= 3.348 × 10^-23
23 => ((.01)^23) ((.99)^77) (comb(100,23)) ~= 1.147 × 10^-24

Probability of 21 or more failures ~= 9.577 × 10^-22
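
The three tables above can be reproduced with a few lines of Python (my own script, assuming independent node failures at the 1% rate used above):

from math import comb

P_FAIL = 0.01

def p_exactly(nodes, x, p=P_FAIL):
    # Binomial probability that exactly x of `nodes` nodes fail.
    return (p ** x) * ((1 - p) ** (nodes - x)) * comb(nodes, x)

def p_at_least(nodes, threshold, p=P_FAIL):
    # Probability that `threshold` or more nodes fail, i.e. the set is lost.
    return sum(p_exactly(nodes, x, p) for x in range(threshold, nodes + 1))

for nodes, parity in [(15, 5), (30, 10), (100, 20)]:
    # A set is lost once more than `parity` of its nodes have failed.
    print(nodes, "nodes, parity", parity, "->", p_at_least(nodes, parity + 1))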
