Hadoop Datanode, namenode, secondary-namenode, job-tracker and task-tracker


Problem description


I am new to Hadoop, so I have some doubts. If the master node fails, what happens to the Hadoop cluster? Can we recover that node without any loss? Is it possible to keep a secondary master node that switches to master automatically when the current one fails?

We have a backup of the namenode (the secondary namenode), so we can restore the namenode from the secondary namenode when it fails. Likewise, how can we restore the data in a datanode when the datanode fails? The secondary namenode is a backup of the namenode only, not of the datanodes, right? If a node fails before completion of a job, so that the job is left pending in the job tracker, does that job continue or restart from the beginning on a free node?

How can we restore the entire cluster's data if anything happens?

And my final question: can we use a C program in MapReduce (for example, Bubble sort in MapReduce)?

Thanks in advance

Solution

Although it is too late to answer your question, it may still help others.

First of all, let me introduce you to the Secondary Name Node:

It contains the namespace image and a backup of the edit log files for the past hour (configurable). Its job is to merge the latest Name Node namespace image and edit log files and upload the result back to the Name Node as a replacement for the old image. Having a Secondary NN in a cluster is not mandatory.
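In the Hadoop 1.x generation this answer describes (the JobTracker/TaskTracker era), that "past one hour (configurable)" interval is a plain configuration property. A sketch of the relevant `core-site.xml` entries might look like this (the path is illustrative, not a recommendation; 3600 seconds is the default and matches the one hour mentioned above):

```xml
<!-- core-site.xml (Hadoop 1.x): checkpointing by the Secondary Name Node -->
<property>
  <name>fs.checkpoint.period</name>
  <value>3600</value> <!-- seconds between checkpoints; the "past one hour" above -->
</property>
<property>
  <name>fs.checkpoint.dir</name>
  <value>/var/hadoop/dfs/namesecondary</value> <!-- illustrative local path -->
</property>
```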

Now, coming to your concerns:

• If the master-node fails, what happens to the Hadoop cluster?

Supporting Frail's answer: yes, Hadoop has a single point of failure, so everything currently running, such as a Map-Reduce task or anything else using the failed master node, will stop. The whole cluster, including clients, will stop working.

• Can we recover that node without any loss?

That is hypothetical; recovery without loss is unlikely, because all the data (block reports) sent by the data nodes to the name node after the last backup taken by the secondary name node will be lost. I say "unlikely" rather than "impossible" because if the name node fails just after a successful backup run by the secondary name node, then it is in a safe state.

• Is it possible to keep a secondary master-node to switch automatically to the master when the current one fails?

A manual switch-over by an administrator (user) is straightforward. To switch automatically, you would have to write your own code outside the cluster: code that monitors the cluster, configures the secondary name node appropriately, and restarts the cluster with the new name node address.

• We have the backup of the namenode (Secondary namenode), so we can restore the namenode from the Secondary namenode when it fails. Like this, how can we restore the data in a datanode when the datanode fails?

That is what the replication factor is for: we keep 3 replicas (the default, as best practice; configurable) of each file block, all on different data nodes. So in case of a failure we still have, for the time being, 2 data nodes holding backup copies. Later, the name node will create one more replica of the data that the failed data node contained.
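The replication factor mentioned above is again just a configuration property; a minimal sketch of the `hdfs-site.xml` entry (the value 3 is the default):

```xml
<!-- hdfs-site.xml: number of copies kept of each file block -->
<property>
  <name>dfs.replication</name>
  <value>3</value> <!-- default; each block lives on 3 different data nodes -->
</property>
```

The replication of files that already exist can also be changed after the fact with `hadoop fs -setrep`.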

• The secondary namenode is the backup of the namenode only, not of the datanodes, right?

Right. It only holds metadata about the data nodes, such as each data node's address and properties, including its block report.

• If a node fails before completion of a job, so there is a job pending in the job tracker, does that job continue or restart from the first in a free node?

Hadoop will try hard to continue the job. But again, it depends on the replication factor, rack awareness, and the other configuration made by the admin. If Hadoop's best practices for HDFS are followed, the job will not fail; the JobTracker will get a replicated node's address to continue on.

• How can we restore the entire cluster data if anything happens?

By restarting it.

• And my final question: can we use a C program in MapReduce (for example, Bubble sort in MapReduce)?

Yes, you can use any programming language which supports standard file read and write operations.
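The answer does not name a mechanism, but Hadoop Streaming is the standard one: you compile the C program and pass it to the streaming jar as the mapper or reducer, and Streaming feeds input records to the process on stdin and reads tab-separated key/value lines back from its stdout. A minimal word-count mapper sketched in C (the function name and buffer sizes are illustrative):

```c
/* A minimal Hadoop Streaming mapper sketched in C. Streaming talks to the
   mapper over standard input/output, which is why any language with plain
   file I/O can take part in a MapReduce job. */
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Read text records from `in`; emit one "word<TAB>1" line per token to `out`.
   Hadoop Streaming treats everything before the first tab as the key. */
void run_mapper(FILE *in, FILE *out) {
    char line[4096];
    while (fgets(line, sizeof line, in) != NULL) {
        char word[256];
        size_t n = 0;
        for (char *p = line; ; ++p) {
            if (*p != '\0' && !isspace((unsigned char)*p)) {
                /* accumulate one lowercased token */
                if (n + 1 < sizeof word)
                    word[n++] = (char)tolower((unsigned char)*p);
            } else {
                if (n > 0) { /* token ended: emit key<TAB>count */
                    word[n] = '\0';
                    fprintf(out, "%s\t1\n", word);
                    n = 0;
                }
                if (*p == '\0')
                    break;
            }
        }
    }
}
```

A two-line `int main(void) { run_mapper(stdin, stdout); return 0; }` turns this into the executable, which could then be submitted along the lines of `hadoop jar hadoop-streaming.jar -input in -output out -mapper ./mapper -file ./mapper` (the jar and paths here are placeholders for your cluster's layout).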

I just gave it a try. Hope it helps you as well as others.

*Suggestions/Improvements are welcome.*

