How does HDFS append work?


Question

Let's assume one is using the default block size (128 MB), and there is a file of 130 MB; so it uses one full-size block and one block with 2 MB. Then 20 MB needs to be appended to the file (the total should now be 150 MB). What happens?

Does HDFS actually resize the last block from 2 MB to 22 MB, or create a new block?
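The numbers in the question can be made concrete with a few lines of Python; this is plain arithmetic to illustrate the scenario, not tied to any HDFS API:

```python
BLOCK_SIZE = 128  # MB, the default HDFS block size

def split_into_blocks(file_size_mb):
    """Return the sizes of the blocks a file of the given size occupies."""
    full, remainder = divmod(file_size_mb, BLOCK_SIZE)
    blocks = [BLOCK_SIZE] * full
    if remainder:
        blocks.append(remainder)
    return blocks

# A 130 MB file occupies one full 128 MB block plus one 2 MB block.
print(split_into_blocks(130))  # [128, 2]
```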

How does appending to a file in HDFS deal with concurrency? Is there a risk of data loss?

Does HDFS create a third block, put the 20 + 2 MB in it, and delete the block with 2 MB? If yes, how does this work concurrently?

Answer

According to the latest design document in the Jira issue mentioned before, we find the following answers to your questions:


  1. HDFS will append to the last block, not create a new block and copy the data from the old last block. This is not difficult, because HDFS just uses a normal filesystem to write these block files as normal files, and normal filesystems have mechanisms for appending new data. Of course, if you fill up the last block, a new block is created.
  2. Only a single write or append to any file is allowed at the same time in HDFS, so there is no concurrency to handle. This is managed by the namenode. You need to close a file if you want someone else to be able to start writing to it.
  3. If the last block in a file is not replicated, the append will fail. The append is written to a single replica, which pipelines it to the other replicas, similar to a normal write. It seems to me that there is no extra risk of data loss compared to a normal write.
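The behaviour described in point 1 can be sketched as a toy model of the block accounting; the function below is invented for illustration and is not the actual HDFS implementation:

```python
BLOCK_SIZE = 128  # MB, the default HDFS block size

def append(blocks, size_mb):
    """Append size_mb to a file's block list, filling the last block first."""
    blocks = list(blocks)
    if blocks and blocks[-1] < BLOCK_SIZE:
        room = BLOCK_SIZE - blocks[-1]
        used = min(room, size_mb)
        blocks[-1] += used          # grow the existing last block in place
        size_mb -= used
    while size_mb > 0:              # only overflow goes into fresh blocks
        chunk = min(BLOCK_SIZE, size_mb)
        blocks.append(chunk)
        size_mb -= chunk
    return blocks

# Appending 20 MB to a [128, 2] file grows the last block to 22 MB;
# no third block is created and nothing is copied or deleted.
print(append([128, 2], 20))  # [128, 22]
```

A larger append first fills the partial last block up to 128 MB, then spills the remainder into new blocks, matching the "if you fill up the last block, a new block is created" case.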

