HDFS Replication - Data Stored


Problem description



I am a relative newbie to Hadoop and want to get a better understanding of how replication works in HDFS.

Say that I have a 10-node system (1 TB per node), giving me a total capacity of 10 TB. If I have a replication factor of 3, then I have 1 original copy and 3 replicas of each file. So, in essence, only 25% of my storage is original data, and my 10 TB cluster effectively holds only 2.5 TB of original (un-replicated) data.

Please let me know if my train of thought is correct.

Solution

Your thinking is a little off. A replication factor of 3 means that there are 3 copies of your data in total, not 1 original plus 3 replicas. More specifically, HDFS keeps 3 copies of each block of a file, so if your file is made up of 10 blocks there will be 30 block replicas in total across your 10 nodes, or about 3 per node on average.
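To make the block arithmetic concrete, here is a minimal Python sketch. The function name `replica_layout` and its parameters are made up for illustration and are not part of any Hadoop API; it also assumes replicas are spread evenly, which a real cluster only approximates.

```python
def replica_layout(num_blocks, replication_factor, num_nodes):
    """Return (total block replicas, average replicas per node).

    Hypothetical helper: assumes an even spread of replicas across nodes.
    """
    total_replicas = num_blocks * replication_factor
    return total_replicas, total_replicas / num_nodes


# The example from the answer: a 10-block file, replication factor 3, 10 nodes.
total, per_node = replica_layout(num_blocks=10, replication_factor=3, num_nodes=10)
print(total, per_node)  # 30 3.0
```

Note that the replication factor counts every copy, so a factor of 3 yields 30 block replicas here, not 40.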

You are correct in thinking that a 10x1TB cluster holds less than 10 TB of data. With a replication factor of 3, its functional capacity is about 3.3 TB (10 TB / 3), and in practice a little less, because some space is needed for processing, temporary files, and so on.
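The capacity estimate can be sketched the same way. This is a rough back-of-the-envelope calculation, not an HDFS API: `usable_capacity_tb` is a hypothetical helper, and `reserved_fraction` is an assumed knob standing in for the space lost to temporary files and processing. The 20% used below is purely illustrative.

```python
def usable_capacity_tb(num_nodes, tb_per_node, replication_factor,
                       reserved_fraction=0.0):
    """Estimate how much un-replicated data a cluster can hold, in TB.

    reserved_fraction is an assumption: the share of raw disk set aside
    for temporary files and other non-HDFS use.
    """
    raw_tb = num_nodes * tb_per_node
    available_tb = raw_tb * (1 - reserved_fraction)
    return available_tb / replication_factor


# 10 nodes x 1 TB, replication factor 3, nothing reserved.
print(round(usable_capacity_tb(10, 1.0, 3), 2))  # 3.33

# Same cluster with an illustrative 20% of raw space reserved.
print(round(usable_capacity_tb(10, 1.0, 3, reserved_fraction=0.2), 2))  # 2.67
```

By comparison, the 2.5 TB figure in the question comes from dividing by 4 (1 original plus 3 replicas) instead of by 3.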
