How does HDFS with append work


Question

Let's assume one is using the default block size (128 MB), and there is a file of 130 MB; so it uses one full-size block and one block with 2 MB. Then 20 MB needs to be appended to the file (the total should now be 150 MB). What happens?
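The block arithmetic in the question can be sketched in plain Python (this is an illustrative simulation, not HDFS code; `block_layout` is a hypothetical helper, not part of any HDFS API):

```python
BLOCK_SIZE = 128 * 1024 * 1024  # default HDFS block size: 128 MB
MB = 1024 * 1024

def block_layout(file_size, block_size=BLOCK_SIZE):
    """Return the sizes of the blocks HDFS would use to store a file."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])

# 130 MB file: one full 128 MB block plus one 2 MB block.
print(block_layout(130 * MB))
# 150 MB file: one full 128 MB block plus one 22 MB block.
print(block_layout(150 * MB))
```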

Does HDFS actually resize the last block from 2 MB to 22 MB? Or does it create a new block?

How does appending to a file in HDFS deal with concurrency? Is there a risk of data loss?

Does HDFS create a third block, put the 20+2 MB in it, and delete the 2 MB block? If yes, how does this work concurrently?

Answer

According to the latest design document in the Jira issue mentioned before, we find the following answers to your question:

  1. HDFS will append to the last block, not create a new block and copy the data from the old last block. This is not difficult because HDFS just uses a normal filesystem to write these block files as normal files, and normal filesystems have mechanisms for appending new data. Of course, if you fill up the last block, a new block is created.
  2. Only a single write or append to any file is allowed at a time in HDFS, so there is no concurrency to handle. This is managed by the namenode. You need to close a file if you want someone else to begin writing to it.
  3. If the last block in a file is not fully replicated, the append will fail. The append is written to a single replica, which pipelines it to the other replicas, similar to a normal write. There appears to be no extra risk of data loss compared to a normal write.
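The append behavior from point 1 can be sketched as a plain-Python simulation (this is not HDFS code; the `append` function below is a hypothetical model of how the last, partially filled block grows in place, with only the overflow opening new blocks):

```python
BLOCK_SIZE = 128 * 1024 * 1024  # default HDFS block size: 128 MB
MB = 1024 * 1024

def append(blocks, n_bytes, block_size=BLOCK_SIZE):
    """Model an HDFS append: fill the last block first, then open new blocks."""
    blocks = list(blocks)
    if blocks and blocks[-1] < block_size:
        # Room left in the last block: grow it in place.
        take = min(n_bytes, block_size - blocks[-1])
        blocks[-1] += take
        n_bytes -= take
    while n_bytes > 0:
        # Remaining data spills into fresh blocks.
        take = min(n_bytes, block_size)
        blocks.append(take)
        n_bytes -= take
    return blocks

# 130 MB file is [128 MB, 2 MB]; appending 20 MB grows the last block to 22 MB.
print(append([128 * MB, 2 * MB], 20 * MB))
```

No block is copied or deleted: the 2 MB block simply becomes a 22 MB block, which answers the first and third questions above.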
