Why is File.exists() behaving flakily in a multithreaded environment?

Question

I have a batch process running under Java JDK 1.7. It runs on a RHEL system with kernel 2.6.18-308.el5 #1 SMP.

This process gets a list of metadata objects from a database. From this metadata it extracts a path to a file. This file may or may not actually exist.

The process uses the ExecutorService (Executors.newFixedThreadPool()) to launch multiple threads. Each thread runs a Callable that launches a process that reads that file and writes another file if that input file exists (and logs the result) and does nothing if the file does not exist (except log that result).
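
For readers who want to reproduce the setup, the arrangement described above can be sketched roughly as follows. This is a minimal illustration, not the original batch code: `BatchSketch`, `findMissing` and the `poolSize` parameter are hypothetical names, and the external conversion step is left out so that only the existence check runs on the pool. It is written in Java 7 style to match the JDK mentioned above.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of the batch layout described in the question.
final class BatchSketch {

    /** Checks every path on a fixed thread pool and returns those reported as missing. */
    static List<String> findMissing(List<String> paths, int poolSize)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            // Submit one Callable per path; each only performs the existence check.
            List<Future<Boolean>> results = new ArrayList<Future<Boolean>>();
            for (final String path : paths) {
                results.add(pool.submit(new Callable<Boolean>() {
                    @Override
                    public Boolean call() {
                        return new File(path).exists();
                    }
                }));
            }
            // Collect results in submission order.
            List<String> missing = new ArrayList<String>();
            for (int i = 0; i < paths.size(); i++) {
                if (!results.get(i).get()) {
                    missing.add(paths.get(i));
                }
            }
            return missing;
        } finally {
            pool.shutdown();
        }
    }
}
```

Running `findMissing` over the same list with different `poolSize` values (for example 30 and 1) and comparing the returned lists is one way to check whether the pool size really correlates with the false negatives.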

I find the behavior is nondeterministic. Although the existence of each file is constant throughout, running this process does not give consistent results. It usually gives correct results, but occasionally it reports that a few files which really do exist are missing. If I run the same process again, it finds the files that it previously said did not exist.

Why might this be happening, and is there an alternative way of doing this that would be more reliable? Is it a mistake to be writing files in a multithreaded process while other threads are attempting to read the directory? Would a smaller thread pool help (it is currently 30)?

UPDATE: Here is the actual code of the unix process called by the worker threads in this scenario:

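// Needs java.io.IOException, java.util.LinkedList and java.util.List.
public int convertOutputFile(String inputFile, String outputFile)
throws IOException
{
    // Final command line: sox [inputArguments] inputFile [outputArguments] outputFile
    List<String> args = new LinkedList<String>();
    args.add("sox");
    args.add(inputFile);
    args.add(outputFile);
    // Insert the output options just before outputFile, then the input
    // options just before inputFile.
    args.addAll(2, this.outputArguments);
    args.addAll(1, this.inputArguments);
    long pStart = System.currentTimeMillis();
    int status = -1;
    Process soxProcess = new ProcessBuilder(args).start();

    try {
        // if we don't wait for the process to complete, player won't
        // find the converted file.
        status = soxProcess.waitFor();
        if (status == 0) {
            logger.debug(String.format("SoX conversion process took %d ms.",
                    System.currentTimeMillis() - pStart));
        } else {
            logger.error("SoX conversion process returned an error status of " + status);
        }
    } catch (InterruptedException e) {
        // Restore the interrupt flag so the calling thread pool can see the
        // interruption; status stays at -1.
        Thread.currentThread().interrupt();
        logger.error("Interrupted while waiting for the SoX conversion process", e);
    }
    return status;
}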
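
A note on the code above: with the default ProcessBuilder settings, each Process holds three pipes (stdin, stdout, stderr), and those occupy file descriptors until the streams are closed or the Process object is garbage collected. Whether that matters here is an assumption, not something the question confirms. The sketch below shows one way to drain and release those streams explicitly; `ProcessRunner` and `runAndRelease` are hypothetical names and this is not the original code.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.util.List;

// Hypothetical helper: runs a command, drains its output, and releases its pipes.
final class ProcessRunner {

    static int runAndRelease(List<String> command)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Merge stderr into stdout so only one stream has to be drained.
        builder.redirectErrorStream(true);
        Process process = builder.start();
        try {
            // The child gets EOF on stdin; assumes the command reads nothing from it.
            process.getOutputStream().close();
            // Drain the output so the child cannot block on a full pipe buffer.
            InputStream out = process.getInputStream();
            byte[] buffer = new byte[8192];
            while (out.read(buffer) != -1) {
                // Output is discarded in this sketch.
            }
            return process.waitFor();
        } finally {
            // Release the pipe file descriptors without waiting for GC.
            closeQuietly(process.getInputStream());
            closeQuietly(process.getErrorStream());
            closeQuietly(process.getOutputStream());
            process.destroy();
        }
    }

    private static void closeQuietly(Closeable stream) {
        try {
            stream.close();
        } catch (IOException ignored) {
            // Nothing useful to do if closing fails.
        }
    }
}
```

With 30 worker threads each launching processes, explicit cleanup like this keeps the number of open descriptors bounded by the number of processes actually running at any moment.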

UPDATE #2: I have tried the experiment of switching from java.io.File.exists() to java.nio.file.Files.exists(), and this seems to provide more reliability. I have yet to see the failure condition over multiple attempts, whereas before it occurred approximately 10% of the time. So I guess I'm looking to know whether the nio version is somehow more robust in how it handles the underlying file system. This finding was later proven false. nio is no help here.

UPDATE #3: Upon further review I still find the same failure condition occurring, so switching to nio is not a panacea. I've obtained better results by reducing the thread pool size of the executor service to 1. This seems to be more reliable, and that way there is no chance of one thread reading the directory while another thread is launching a process that writes to the same directory.

One further possibility that I have not yet investigated is whether I would be better served by putting my output files in a different directory than the input files. I put them in the same directory because it was easier to code, but that may be confusing things, since the output file creation is affecting the same directory as the input directory scan.

UPDATE #4: Recoding so that the output files are written to a different directory than the input files (whose existence is being checked for) does not particularly help things. The only change that helps things is having an ExecutorService thread pool size of 1, in other words, not multithreading this operation.

Recommended answer

Your application may well be correctly multithreaded, but access to the file system has its own limits. In your case, I would bet that too many threads are accessing it at the same time, with the consequence that the file system runs out of file handles. File instances have no way to tell you that: exists() does not throw an exception, so it simply returns false, even if the file or directory exists.
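
If the guess above is right and an underlying I/O error is being reported as "false", one way to surface it is to ask for the file's attributes instead of calling exists(). Files.readAttributes throws NoSuchFileException when the file is absent and some other IOException when the lookup itself fails, so the two cases can be logged separately. The following is a minimal sketch under that assumption; `ExistenceCheck`, `Result` and `check` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper that separates "missing" from "could not be determined".
final class ExistenceCheck {

    enum Result { EXISTS, MISSING, UNKNOWN }

    static Result check(Path path) {
        try {
            // Succeeds only if the attributes could actually be read.
            Files.readAttributes(path, BasicFileAttributes.class);
            return Result.EXISTS;
        } catch (NoSuchFileException e) {
            // The file system reported that the file is not there.
            return Result.MISSING;
        } catch (IOException e) {
            // Any other failure: existence is unknown. A real caller
            // would log e here to see what the underlying error is.
            return Result.UNKNOWN;
        }
    }
}
```

If the flaky runs start producing UNKNOWN rather than MISSING, the logged exception should say what the file system is actually complaining about. If they still produce MISSING, the resource-exhaustion theory becomes less likely.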
