Accessing a file that is being written


Problem Description


You use the hadoop fs -put command to write a 300 MB file using an HDFS block size of 64 MB. Just after this command has finished writing 200 MB of this file, what would another user see when trying to access this file?

a.) They would see Hadoop throw a ConcurrentFileAccessException when they try to access this file.
b.) They would see the current state of the file, up to the last bit written by the command.
c.) They would see the current state of the file through the last completed block.
d.) They would see no content until the whole file is written and closed.

From what I understand about the hadoop fs -put command, the answer is D, however some say it is C.

Could anyone provide a constructive explanation for either of the options?

Thanks xx

Solution

The reason why the file will not be accessible until the whole file is written and closed (option D) is that, in order to access a file, the request is first sent to the NameNode to obtain metadata relating to the different blocks that compose the file. This metadata is written by the NameNode only after it receives confirmation that all blocks of the file were written successfully.

Therefore, even though the blocks are available, the user can't see the file until the metadata is updated, which is done after all blocks are written.
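The metadata-gating behavior described above can be sketched with a small toy model. This is an illustration of the reasoning, not real Hadoop code: the class and method names (`ToyNameNode`, `add_block`, `close_file`) are invented for this example, and actual HDFS read semantics are more nuanced than this simplification.

```python
# Toy model of HDFS write visibility: block locations are tracked while a
# writer streams data, but readers can only see metadata that the NameNode
# has committed, which happens when the file is closed.

class ToyNameNode:
    def __init__(self):
        self._committed = {}  # path -> finalized block list, visible to readers
        self._inflight = {}   # path -> blocks written so far, not yet visible

    def start_write(self, path):
        self._inflight[path] = []

    def add_block(self, path, block_id):
        # Blocks are reported as they complete, but the file's metadata
        # is not published to other clients yet.
        self._inflight[path].append(block_id)

    def close_file(self, path):
        # Only on close does the metadata become visible to readers.
        self._committed[path] = self._inflight.pop(path)

    def get_blocks(self, path):
        # Readers consult committed metadata only.
        if path not in self._committed:
            raise FileNotFoundError(path)
        return self._committed[path]


nn = ToyNameNode()
nn.start_write("/data/file.bin")
for b in range(3):  # 3 of 5 blocks written, i.e. ~200 of 300 MB
    nn.add_block("/data/file.bin", b)

try:
    nn.get_blocks("/data/file.bin")
    visible = True
except FileNotFoundError:
    visible = False
print(visible)  # False: nothing visible mid-write, per option D

nn.add_block("/data/file.bin", 3)
nn.add_block("/data/file.bin", 4)
nn.close_file("/data/file.bin")
print(nn.get_blocks("/data/file.bin"))  # [0, 1, 2, 3, 4]
```

In this model a reader mid-write gets no content at all (option D), while option C would correspond to `get_blocks` also consulting the in-flight list up to the last completed block.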

