Hadoop file write


Problem description

Referring to Tom White's book, Hadoop: The Definitive Guide. My question (assuming a replication factor of 3 and data being written to nodes D1, D2, D3): if I understand correctly, when writing to the first location D1 itself fails, the whole process is restarted with a new pipeline. But what if writing to the second node D2 fails? The book says that "any packets in the ack queue are added to the front of the data queue so that datanodes that are downstream from the failed node will not miss any packets" and that "the current block on the good datanodes is given a new identity". I am not clear on the following points (a minimal sketch of such a write follows the list below):

1. The block getting a new identity
2. Who gives this new identity?
3. Why is it needed?
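
For context, here is a minimal sketch of the kind of write the question describes, using the standard org.apache.hadoop.fs.FileSystem API. The path, buffer size, and block size below are illustrative assumptions, not values from the book:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsWriteExample {
        public static void main(String[] args) throws Exception {
            // Assumes fs.defaultFS points at the cluster's NameNode,
            // e.g. hdfs://namenode:8020 (hypothetical address).
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            Path file = new Path("/tmp/example.txt"); // hypothetical path
            short replication = 3;                    // the factor assumed in the question
            long blockSize = 128L * 1024 * 1024;      // 128 MB, a common default

            // create(path, overwrite, bufferSize, replication, blockSize)
            FSDataOutputStream out = fs.create(file, true, 4096, replication, blockSize);
            out.writeBytes("hello hdfs\n");
            // close() returns only after the D1 -> D2 -> D3 pipeline has
            // acknowledged all packets.
            out.close();
            fs.close();
        }
    }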

Solution

To answer your question, I would like to highlight one point first: both read and write operations are initiated by the client (the HDFS client).

Have a look at this diagram (the client/NameNode/DataNode interaction figure; the image is not reproduced here).

In the entire process, the client reads from or writes to the datanodes directly, not through the NameNode. The NameNode just sends the client the list of datanodes to contact for the read or write operation.
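
This division of labour is visible from ordinary client code: the metadata call below is answered by the NameNode, and the returned list names the DataNodes the client then talks to directly. A minimal sketch using the standard FileSystem API (the path is hypothetical):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ShowBlockLocations {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path file = new Path("/tmp/example.txt"); // hypothetical path
            FileStatus status = fs.getFileStatus(file);

            // This metadata query goes to the NameNode; the block contents
            // themselves are streamed to/from the DataNodes it lists.
            BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
            for (BlockLocation block : blocks) {
                System.out.println("offset=" + block.getOffset()
                        + " length=" + block.getLength()
                        + " hosts=" + String.join(",", block.getHosts()));
            }
            fs.close();
        }
    }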

Coming back to your query,

      "any packets in the ack queue are added to the front of the data queue so that datanodes that are downstream from the failed node will not miss any packets"

Right after this line, you can find the following:

The current block on the good datanodes is given a new identity, which is communicated to the namenode, so that the partial block on the failed datanode will be deleted if the failed datanode recovers later on. The failed datanode is removed from the pipeline, and a new pipeline is constructed from the two good datanodes.
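
The recovery steps in that passage can be sketched in pseudocode. This is purely illustrative: the Packet class, the queues, and the printed "identity" below are hypothetical stand-ins, not the real DFSOutputStream internals; only the three numbered steps mirror the book:

    import java.util.ArrayDeque;
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Deque;
    import java.util.List;

    public class PipelineRecoverySketch {
        static class Packet {
            final int seq;
            Packet(int seq) { this.seq = seq; }
            public String toString() { return "packet-" + seq; }
        }

        static long identity = 1; // stands in for the namenode issuing block identities

        static void handleDatanodeFailure(Deque<Packet> ackQueue, Deque<Packet> dataQueue,
                                          List<String> pipeline, String failedNode) {
            // 1. Unacknowledged packets go back to the front of the data
            //    queue, oldest first, so downstream datanodes miss nothing.
            while (!ackQueue.isEmpty()) {
                dataQueue.addFirst(ackQueue.removeLast());
            }
            // 2. The current block on the good datanodes is given a new
            //    identity, which is communicated to the namenode (simulated).
            identity++;
            System.out.println("block re-identified as " + identity);
            // 3. The failed datanode is removed and a new pipeline is built
            //    from the remaining good datanodes.
            pipeline.remove(failedNode);
            System.out.println("new pipeline: " + pipeline);
        }

        public static void main(String[] args) {
            Deque<Packet> ackQueue = new ArrayDeque<>(Arrays.asList(new Packet(1), new Packet(2)));
            Deque<Packet> dataQueue = new ArrayDeque<>(Arrays.asList(new Packet(3)));
            List<String> pipeline = new ArrayList<>(Arrays.asList("D1", "D2", "D3"));
            handleDatanodeFailure(ackQueue, dataQueue, pipeline, "D2");
            System.out.println("data queue now: " + dataQueue); // [packet-1, packet-2, packet-3]
        }
    }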

The above passage answers your first query: 1. The block getting a new identity.

2. Who gives this new identity: even though the book is not explicit about it, we can conclude that the HDFS client is responsible for providing the new identity and informing the NameNode about it.

3. Why is it needed?

Since only partial data was written to the problematic datanode, that partial block has to be removed completely. The same is explained in the next lines of the book:

The current block on the good datanodes is given a new identity, which is communicated to the namenode, so that the partial block on the failed datanode will be deleted if the failed datanode recovers later on.
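
As a hypothetical sketch of why the new identity helps (HDFS realises this idea with block generation stamps; the method below is illustrative, not a real API): when the failed datanode recovers, its partial replica still carries the old identity, fails the comparison, and can be deleted.

    public class StaleReplicaCheck {
        // A replica whose identity is older than the one the namenode now
        // expects is a leftover partial block and can be deleted.
        static boolean isStale(long replicaIdentity, long expectedIdentity) {
            return replicaIdentity < expectedIdentity;
        }

        public static void main(String[] args) {
            long before = 7; // identity before the failure (illustrative values)
            long after = 8;  // identity issued during pipeline recovery
            System.out.println(isStale(before, after)); // true: delete partial block
        }
    }

That is all the new identity buys: a cheap way to tell the complete replicas apart from the stale partial one.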
