Why does usage of java.nio.file.Files::list cause this breadth-first file traversal program to crash with the "Too many open files" error?


Problem Description

Streams are lazy, hence the following statement does not load the entire children of the directory referenced by the path into memory; instead it loads them one by one, and after each invocation of forEach, the directory referenced by p is eligible for garbage collection, so its file descriptor should also become closed:

Files.list(path).forEach(p -> 
   absoluteFileNameQueue.add(
      p.toAbsolutePath().toString()
   )
);


Based on this assumption, I have implemented a breadth-first file traversal tool:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayDeque;
import java.util.Queue;

import static java.lang.Math.max;

public class FileSystemTraverser {

    public void traverse(String path) throws IOException {
        traverse(Paths.get(path));
    }

    public void traverse(Path root) throws IOException {
        // BFS queue of absolute path names, seeded with the root directory
        final Queue<String> absoluteFileNameQueue = new ArrayDeque<>();
        absoluteFileNameQueue.add(root.toAbsolutePath().toString());

        int maxSize = 0;
        int count = 0;

        while (!absoluteFileNameQueue.isEmpty()) {
            maxSize = max(maxSize, absoluteFileNameQueue.size());
            count += 1;
            Path path = Paths.get(absoluteFileNameQueue.poll());

            if (Files.isDirectory(path)) {
                // enqueue the children of this directory
                Files.list(path).forEach(p ->
                        absoluteFileNameQueue.add(
                                p.toAbsolutePath().toString()
                        )
                );
            }

            if (count % 10_000 == 0) {
                System.out.println("maxSize = " + maxSize);
                System.out.println("count = " + count);
            }
        }

        System.out.println("maxSize = " + maxSize);
        System.out.println("count = " + count);
    }

}

And I use it in a fairly straightforward way:

public class App {

    public static void main(String[] args) throws IOException {
        FileSystemTraverser traverser = new FileSystemTraverser();
        traverser.traverse("/media/Backup");
    }

}

The disk mounted in /media/Backup has about 3 million files.

For some reason, around the 140,000 mark, the program crashes with this stack trace:

Exception in thread "main" java.nio.file.FileSystemException: /media/Backup/Disk Images/Library/Containers/com.apple.photos.VideoConversionService/Data/Documents: Too many open files
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixFileSystemProvider.newDirectoryStream(UnixFileSystemProvider.java:427)
    at java.nio.file.Files.newDirectoryStream(Files.java:457)
    at java.nio.file.Files.list(Files.java:3451)

It seems to me for some reason the file descriptors are not getting closed or the Path objects are not garbage collected that causes the app to eventually crash.

  • OS: Ubuntu 15.04
  • Kernel: 4.4.0-28-generic
  • ulimit: unlimited
  • File system: btrfs
  • Java runtime: tested with both OpenJDK 1.8.0_91 and Oracle JDK 1.8.0_91

Any ideas what I am missing here and how I can fix this problem (without resorting to java.io.File::list, i.e. by staying within the realm of NIO.2 and Paths)?

I doubt that the JVM is keeping the file descriptors open. I took a heap dump around the 120,000 file mark.

I installed a file descriptor probing plugin in VisualVM, and indeed it revealed that the FDs are not getting disposed of (as correctly pointed out by cerebrotecnologico and k5).
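
The same observation can also be reproduced from inside the process: on HotSpot/OpenJDK the platform OperatingSystemMXBean can be cast to com.sun.management.UnixOperatingSystemMXBean, which exposes the number of currently open file descriptors and the per-process limit. A minimal sketch (the FdMonitor class name and the idea of printing these values alongside the count/maxSize statistics are only for illustration):

import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

import com.sun.management.UnixOperatingSystemMXBean;

public class FdMonitor {

    // Number of file descriptors currently open in this JVM process,
    // or -1 if the platform MXBean does not expose this information
    public static long openFileDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            return ((UnixOperatingSystemMXBean) os).getOpenFileDescriptorCount();
        }
        return -1;
    }

    // Maximum number of file descriptors this process may open
    // (roughly what ulimit -n reports), or -1 if unavailable
    public static long maxFileDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            return ((UnixOperatingSystemMXBean) os).getMaxFileDescriptorCount();
        }
        return -1;
    }
}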

Recommended Answer

It seems that the Stream returned from Files.list(Path) is not being closed correctly. In addition, you should not use forEach on a stream unless you are certain it is not parallel (hence the .sequential()):

try (Stream<Path> stream = Files.list(path)) {
    stream.map(p -> p.toAbsolutePath().toString())
          .sequential()
          .forEach(absoluteFileNameQueue::add);
}
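
Applied to the traverse method in the question, the directory branch of the loop would then look roughly like this (a sketch that changes only the listing step, leaves the rest of the loop as it is, and additionally needs an import of java.util.stream.Stream):

if (Files.isDirectory(path)) {
    // try-with-resources closes the underlying DirectoryStream (and hence
    // its file descriptor) as soon as the listing has been consumed
    try (Stream<Path> children = Files.list(path)) {
        children.map(p -> p.toAbsolutePath().toString())
                .sequential()
                .forEach(absoluteFileNameQueue::add);
    }
}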

