Reading a file from HDFS only after it is fully written and closed


Problem description

I have two processes running. One writes files to HDFS and the other loads those files.

The first process (the one that writes the files) uses:

private void writeFileToHdfs(byte[] sourceStream, Path outFilePath) {
    FSDataOutputStream out = null;
    try {
        // create the file
        out = getFileSystem().create(outFilePath);
        out.write(sourceStream);
    } catch (Exception e) {
        LOG.error("Error while trying to write a file to hdfs", e);
    } finally {
        try {
            if (null != out)
                out.close();
        } catch (IOException e) {
            LOG.error("Could not close output stream to hdfs", e);
        }
    }
}

The second process reads those files for further processing. A file is first created and then populated with content. This takes time (only a few milliseconds, but still), and during that window the second process may pick up the file before it is fully written and closed.

Note that HDFS does not keep locking information in the NameNode, so there is no daemon that can check whether a file is locked before it is accessed.

I wonder what the best way to resolve this issue is.

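Here are my thoughts:


  1. Copy the files to a new folder once they are fully written and closed; the second process
    then reads from this new folder.

  2. Rename the files according to some naming convention once they are fully written and closed;
    the second process then reads only files that match this convention.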

I have a feeling I'm trying to solve a well-known problem and I'm missing something. Is there a best practice for such a problem?
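The second idea above (write under a temporary name, rename once the file is closed) can be sketched as follows. This is only an illustration on the local file system using java.nio, not code from the original post: the class name RenameWhenDone, the ".tmp" suffix, and both method names are assumptions. On HDFS the corresponding calls would presumably be FileSystem.create for the temporary path and FileSystem.rename(Path, Path) for the final step.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not from the original post): illustrates
// "write under a temporary name, rename when closed" on the local file system.
final class RenameWhenDone {
    // Assumed naming convention: in-progress files carry this suffix.
    static final String TMP_SUFFIX = ".tmp";

    // Writer side: the final name appears only after the content is written and closed.
    static void publish(byte[] content, Path finalPath) throws IOException {
        Path tmp = finalPath.resolveSibling(finalPath.getFileName() + TMP_SUFFIX);
        Files.write(tmp, content); // opens, writes, and closes the temporary file
        Files.move(tmp, finalPath, StandardCopyOption.ATOMIC_MOVE);
    }

    // Reader side: return only regular files that do not carry the in-progress suffix.
    static List<Path> completedFiles(Path dir) throws IOException {
        List<Path> done = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path p : entries) {
                if (Files.isRegularFile(p)
                        && !p.getFileName().toString().endsWith(TMP_SUFFIX)) {
                    done.add(p);
                }
            }
        }
        return done;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("rename-demo");
        publish("hello".getBytes(), dir.resolve("data.bin"));
        System.out.println(completedFiles(dir));
    }
}
```

The reader never has to guess whether a file is still being written: anything without the suffix was renamed only after its stream was closed.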

Recommended answer

Apache Commons IO has something for that. Just touch the file and an error will tell you if it's already locked.

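import java.io.File;
import java.io.IOException;
import org.apache.commons.io.FileUtils;

boolean fileAvail = false;

try {
    // Assumes fileName is a String path. FileUtils.touch takes a java.io.File,
    // so this checks a local file system path, not an HDFS Path.
    FileUtils.touch(new File(fileName)); // throws IOException on failure, e.g. if the file is in use
    fileAvail = true;
} catch (IOException e) {
    fileAvail = false;
}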

(Also) try-with-resources

In Java 7 and later you can use this on anything that implements AutoCloseable (which includes Closeable), such as files, sockets, and database connections. The resource is closed automatically as soon as the try block ends:

 try (FSDataOutputStream out = getFileSystem().create(outFilePath))
 {
   //use out in here
 }
 //No finally required - catch is optional

...which saves all that extra code.
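To make the automatic close visible, here is a minimal, self-contained sketch. TrackedResource is a hypothetical stand-in for a stream such as FSDataOutputStream; it only records the order in which things happen.

```java
import java.util.ArrayList;
import java.util.List;

final class TryWithResourcesDemo {
    // Hypothetical resource that records events so the close order is visible.
    static final class TrackedResource implements AutoCloseable {
        private final List<String> log;

        TrackedResource(List<String> log) {
            this.log = log;
            log.add("open");
        }

        void use() {
            log.add("use");
        }

        @Override
        public void close() {
            log.add("close");
        }
    }

    static List<String> run() {
        List<String> log = new ArrayList<>();
        try (TrackedResource r = new TrackedResource(log)) {
            r.use();
        } // close() is called here automatically, even if use() had thrown
        log.add("after");
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [open, use, close, after]
    }
}
```

Because close() runs before any code that follows the try block, a rename or other "file is ready" signal placed after the block only happens once the stream is closed.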
