HDFS Replication - Data Stored

Views: 115 Published: 2018/5/31 18:50:55 Tags: hadoop hdfs

Question

I am relatively new to Hadoop and want to get a better understanding of how replication works in HDFS.

Say that I have a 10-node system (1 TB per node), giving me a total capacity of 10 TB. If I have a replication factor of 3, then I have 1 original copy and 3 replicas of each file. So, in essence, only 25% of my storage is original data, and my 10 TB cluster in effect holds only 2.5 TB of original (un-replicated) data.

Please let me know if my train of thought is correct.

Answer

Your thinking is a little off. A replication factor of 3 means that you have 3 copies of your data in total, not 1 original plus 3 replicas. More specifically, there will be 3 copies of each block of your file, so if your file is made up of 10 blocks, there will be 30 blocks in total across your 10 nodes, or about 3 blocks per node.
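To make the block arithmetic concrete, here is a minimal Python sketch. The function name `replica_layout` and its parameters are made up for illustration and are not part of any Hadoop API. It also assumes replicas end up spread evenly across nodes, which a real cluster only approximates.

```python
def replica_layout(num_blocks: int, replication_factor: int, num_nodes: int):
    """Return (total stored block copies, average block copies per node).

    Hypothetical helper: assumes an even spread of replicas across nodes.
    """
    total_copies = num_blocks * replication_factor
    return total_copies, total_copies / num_nodes


# The example from the answer: a 10-block file, replication factor 3, 10 nodes.
total, per_node = replica_layout(num_blocks=10, replication_factor=3, num_nodes=10)
print(total, per_node)  # 30 3.0
```

Note that the replication factor counts every copy, so a factor of 3 multiplies the block count by 3, not by 4.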

You are correct in thinking that a 10x1TB cluster holds less than 10TB of data. With a replication factor of 3, it has a functional capacity of about 3.3TB (10TB / 3), and slightly less in practice because some space is needed for processing, holding temporary files, etc.
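The capacity estimate can be sketched the same way. This is a rough back-of-the-envelope calculation, not how HDFS itself reports capacity. The function `usable_capacity_tb` and the `reserved_fraction` parameter are assumptions introduced here to stand in for the space set aside for processing and temporary files; the 25% value used below is purely illustrative.

```python
def usable_capacity_tb(num_nodes: int, tb_per_node: float,
                       replication_factor: int,
                       reserved_fraction: float = 0.0) -> float:
    """Estimate how much un-replicated data a cluster can hold, in TB.

    Hypothetical helper: reserved_fraction is the share of raw disk assumed
    to be kept free for temporary files and processing.
    """
    raw_tb = num_nodes * tb_per_node
    return raw_tb * (1.0 - reserved_fraction) / replication_factor


# The cluster from the question: 10 nodes x 1 TB, replication factor 3.
ideal = usable_capacity_tb(10, 1.0, 3)
print(f"{ideal:.2f} TB")  # 3.33 TB

# Same cluster with an assumed 25% of raw disk held back for scratch space.
with_reserve = usable_capacity_tb(10, 1.0, 3, reserved_fraction=0.25)
print(f"{with_reserve:.2f} TB")  # 2.50 TB
```

With no reserve, one third of the raw storage is unique data, rather than the 25% assumed in the question.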
