Confusion over Hadoop namenode memory usage


Problem Description

I have a silly doubt about Hadoop namenode memory calculation. It is mentioned in the Hadoop book (The Definitive Guide) as:

"Since the namenode holds filesystem metadata in memory, the limit to the number of files in a filesystem is governed by the amount of memory on the namenode. As a rule of thumb, each file, directory, and block takes about 150 bytes. So, for example, if you had one million files, each taking one block, you would need at least 300 MB of memory. While storing millions of files is feasible, billions is beyond the capability of current hardware."

Since each file takes one block, shouldn't the namenode's minimum memory be 150 MB and not 300 MB? Please help me understand why it is 300 MB.

Solution

I guess you read the second edition of Tom White's book. I have the third edition, which references the post Scalability of the Hadoop Distributed File System. In that post, I read the following sentence:

"Estimates show that the name-node uses less than 200 bytes to store a single metadata object (a file inode or a block)."

A file in the HDFS NameNode is a file inode plus a block, and each of these objects takes about 150 bytes. So 1,000,000 files = 1,000,000 inodes + 1,000,000 block references (in this example, each file occupies one block).

2,000,000 * 150 bytes ~= 300 MB
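
To make the arithmetic concrete, here is a minimal Python sketch (my own illustration, not from the book or the post; the function name and parameters are hypothetical) that reproduces the rule-of-thumb estimate:

# Rule-of-thumb namenode memory estimate, assuming ~150 bytes per
# metadata object (file inode, directory, or block), per the book.
BYTES_PER_OBJECT = 150

def namenode_memory_bytes(files, blocks_per_file=1, directories=0):
    # Each file contributes one inode object plus one object per block;
    # each directory contributes one object.
    objects = files + files * blocks_per_file + directories
    return objects * BYTES_PER_OBJECT

# 1,000,000 files, one block each:
# 1,000,000 inodes + 1,000,000 blocks = 2,000,000 objects
# 2,000,000 * 150 bytes = 300,000,000 bytes ~= 300 MB
print(namenode_memory_bytes(files=1_000_000) / 1e6, "MB")  # -> 300.0 MB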

I put the link there so you can verify whether I made a mistake in my argumentation.
