Hadoop namenode memory usage


Question





I am confused by a Hadoop namenode memory problem.

  1. When namenode memory usage is higher than a certain percentage (say 75%), reading and writing HDFS files through the Hadoop API fails (for example, calling open() throws an exception). What is the reason? Has anyone seen the same thing? P.S. At the time, the namenode's disk I/O is not high and the CPU is relatively idle.

  2. What determines the namenode's QPS (queries per second)?

Thanks very much!

Solution

Since the namenode is basically just an RPC server managing a HashMap of the blocks, you have two major memory problems:

  1. A Java HashMap is quite costly, and its collision resolution (the separate chaining algorithm) is costly as well, because it stores collided elements in a linked list.
  2. The RPC server needs threads to handle requests. Hadoop ships with its own RPC framework, and you can configure the handler count for datanodes with dfs.namenode.service.handler.count; it is set to 10 by default. Alternatively, you can configure dfs.namenode.handler.count for other clients, such as MapReduce JobClients that want to run a job. When a request comes in and the server wants to create a new handler, it may go out of memory (new threads also allocate a good chunk of stack space; maybe you need to increase this).
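As a sketch, the two handler-count properties mentioned above live in hdfs-site.xml. The values below are purely illustrative, not recommendations, and note that the service handler count generally only takes effect when a separate service RPC address is also configured:

```xml
<!-- hdfs-site.xml: illustrative values only -->
<property>
  <!-- Handler threads for client RPC (DFSClient, JobClients, etc.) -->
  <name>dfs.namenode.handler.count</name>
  <value>32</value>
</property>
<property>
  <!-- Dedicated handler threads for datanode RPC (heartbeats, block reports) -->
  <name>dfs.namenode.service.handler.count</name>
  <value>20</value>
</property>
```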

So these are the reasons why your namenode needs so much memory.
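To make the memory pressure concrete, here is a back-of-envelope sketch using the commonly cited rule of thumb of roughly 150 bytes of namenode heap per namespace object (file, directory, or block). Both the constant and the example counts are illustrative assumptions, not measurements from the asker's cluster:

```java
// Rough namenode heap estimate: every file, directory, and block is
// an in-memory object on the namenode, each costing on the order of
// 150 bytes of heap (a rule of thumb, not an exact figure).
public class NamenodeHeapEstimate {
    static final long BYTES_PER_OBJECT = 150; // rough rule of thumb

    static long estimateHeapBytes(long files, long dirs, long blocks) {
        return (files + dirs + blocks) * BYTES_PER_OBJECT;
    }

    public static void main(String[] args) {
        // Hypothetical namespace: 10M files, 1M directories, 12M blocks.
        long bytes = estimateHeapBytes(10_000_000L, 1_000_000L, 12_000_000L);
        System.out.println(bytes / (1024 * 1024) + " MB"); // → 3290 MB
    }
}
```

The point is that heap usage scales with namespace size, on top of which each RPC handler thread adds its own stack; that combination is why a busy namenode can run out of memory.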

What determines the namenode's QPS (queries per second)?

I haven't benchmarked it yet, so I can't give you very good tips on that. Certainly you should tune the handler counts higher than the number of tasks that can run in parallel, plus speculative execution. Depending on how you submit your jobs, you may have to fine-tune other properties as well.

Of course you should always give the namenode enough memory, so it has headroom and does not fall into full garbage-collection cycles.
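A minimal sketch of giving the namenode JVM more heap, assuming the usual hadoop-env.sh mechanism; the 4g size is a placeholder, and the right value depends on your namespace size:

```
# hadoop-env.sh: illustrative heap settings for the namenode JVM.
# Setting -Xms equal to -Xmx avoids heap resizing pauses.
export HADOOP_NAMENODE_OPTS="-Xms4g -Xmx4g ${HADOOP_NAMENODE_OPTS}"
```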
