HDFS Replication - Data Stored
Problem description
I am a relative newbie to Hadoop and want to get a better understanding of how replication works in HDFS.
Say that I have a 10-node system (1 TB per node), giving me a total capacity of 10 TB. If I have a replication factor of 3, then I have 1 original copy and 3 replicas for each file. So, in essence, only 25% of my storage is original data. So my 10 TB cluster in effect holds only 2.5 TB of original (un-replicated) data.
Please let me know if my train of thought is correct.
Answer
Your thinking is a little off. A replication factor of 3 means that you have 3 copies of your data in total, not an original plus 3 replicas. More specifically, there will be 3 copies of each block of your file, so if your file is made up of 10 blocks there will be 30 block replicas in total across your 10 nodes, or about 3 per node.
You are correct in thinking that a 10x1TB cluster has less than 10 TB of capacity: with a replication factor of 3, its functional capacity is about 3.3 TB (10 TB / 3), and slightly less in practice because of the space needed for processing, holding temporary files, etc.
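To make the arithmetic concrete, here is a minimal Python sketch of the two calculations above. The function names and the `reserved_fraction` parameter are hypothetical, introduced only for illustration; they are not part of any Hadoop API. The replication factor itself is normally set through the `dfs.replication` property, whose default is 3.

```python
def usable_capacity_tb(nodes, tb_per_node, replication, reserved_fraction=0.0):
    """Approximate space available for un-replicated data, in TB.

    reserved_fraction is a hypothetical knob for the share of raw disk
    set aside for temporary files and processing (an assumption, not a
    value taken from the question or answer).
    """
    raw_tb = nodes * tb_per_node
    return raw_tb * (1.0 - reserved_fraction) / replication


def total_block_replicas(file_blocks, replication):
    """Number of block copies stored cluster-wide for one file."""
    return file_blocks * replication


if __name__ == "__main__":
    # 10 nodes x 1 TB with replication factor 3
    print(round(usable_capacity_tb(10, 1, 3), 2))  # 3.33
    # A 10-block file stored with replication factor 3
    replicas = total_block_replicas(10, 3)
    print(replicas)        # 30
    print(replicas / 10)   # 3.0 replicas per node on average
```

Under the question's assumption of one original plus three replicas, the divisor would be 4, which gives 2.5 TB; with three copies in total, the divisor is 3.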