为什么 java DirectoryStream 执行得这么慢? [英] Why does the java DirectoryStream perform so slow?

查看:38
本文介绍了为什么 java DirectoryStream 执行得这么慢?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我已经使用 nio 包的 DirectoryStreams 对 Streams 进行了一些测试.我只是尝试获取按上次修改日期和大小排序的目录中所有文件的列表.

I've done some testing with Streams in special with DirectoryStreams of the nio-package. I simply try to get a list of all files in a directory sorted by last modified date and size.

旧 File.listFiles() 的 JavaDoc 声明了一个 注意文件s中的方法:

The JavaDoc of old File.listFiles() stated a Note to the method in Files:

注意 Files 类定义了 newDirectoryStream 方法来打开一个目录并遍历目录中的文件名目录.在处理非常大的文件时,这可能会使用较少的资源目录.

Note that the Files class defines the newDirectoryStream method to open a directory and iterate over the names of the files in the directory. This may use less resources when working with very large directories.

我在下面多次运行代码(下面的前三遍):

I run the code down below a lot of times (first three times below):

首次运行:

Run time of Arrays.sort: 1516
Run time of Stream.sorted as Array: 2912
Run time of Stream.sorted as List: 2875

第二次运行:

Run time of Arrays.sort: 1557
Run time of Stream.sorted as Array: 2978
Run time of Stream.sorted as List: 2937

第三次运行:

Run time of Arrays.sort: 1563
Run time of Stream.sorted as Array: 2919
Run time of Stream.sorted as List: 2896

我的问题是:为什么流的表现如此糟糕?

My question is: Why do the streams perform so bad?

import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FileSorter {

  // This sorts from old to young and from big to small
  Comparator<Path> timeSizeComparator = (Path o1, Path o2) -> {
    int sorter = 0;
    try {
      FileTime lm1 = Files.getLastModifiedTime(o1);
      FileTime lm2 = Files.getLastModifiedTime(o2);
      if (lm2.compareTo(lm1) == 0) {
        Long s1 = Files.size(o1);
        Long s2 = Files.size(o2);
        sorter = s2.compareTo(s1);
      } else {
        sorter = lm1.compareTo(lm2);
      }
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
    return sorter;
  };

  public String[] getSortedFileListAsArray(Path dir) throws IOException {
    Stream<Path> stream = Files.list(dir);
    return stream.sorted(timeSizeComparator).
            map(Path::getFileName).map(Path::toString).toArray(String[]::new);
  }

  public List<String> getSortedFileListAsList(Path dir) throws IOException {
    Stream<Path> stream = Files.list(dir);
    return stream.sorted(timeSizeComparator).
            map(Path::getFileName).map(Path::toString).collect(Collectors.
            toList());
  }

  public String[] sortByDateAndSize(File[] fileList) {
    Arrays.sort(fileList, (File o1, File o2) -> {
      int r = Long.compare(o1.lastModified(), o2.lastModified());
      if (r != 0) {
        return r;
      }
      return Long.compare(o1.length(), o2.length());
    });
    String[] fileNames = new String[fileList.length];
    for (int i = 0; i < fileNames.length; i++) {
      fileNames[i] = fileList[i].getName();
    }
    return fileNames;
  }

  public static void main(String[] args) throws IOException {
    // File (io package)
    File f = new File("C:\Windows\system32");
    // Path (nio package)
    Path dir = Paths.get("C:\Windows\system32");

    FileSorter fs = new FileSorter();

    long before = System.currentTimeMillis();
    String[] names = fs.sortByDateAndSize(f.listFiles());
    long after = System.currentTimeMillis();
    System.out.println("Run time of Arrays.sort: " + ((after - before)));

    long before2 = System.currentTimeMillis();
    String[] names2 = fs.getSortedFileListAsArray(dir);
    long after2 = System.currentTimeMillis();
    System.out.
            println("Run time of Stream.sorted as Array: " + ((after2 - before2)));

    long before3 = System.currentTimeMillis();
    List<String> names3 = fs.getSortedFileListAsList(dir);
    long after3 = System.currentTimeMillis();
    System.out.
            println("Run time of Stream.sorted as List: " + ((after3 - before3)));
  }
}

更新

应用 Peter 的代码后,我得到了以下结果:

Update

After applying the code from Peter I have this results:

Run time of Arrays.sort: 1615
Run time of Stream.sorted as Array: 3116
Run time of Stream.sorted as List: 3059
Run time of Stream.sorted as List with caching: 378

更新 2

在对Peter的解决方案做了一些研究之后,我可以说,使用for ex读取文件属性.Files.getLastModified 必须是一个沉重的危机.仅将 Comparator 中的该部分更改为:

Update 2

After doing some research on the solution of Peter, I can say, that reading file attributes with for ex. Files.getLastModified must be a heavy crunch. Changing only that part in Comparator to:

  Comparator<Path> timeSizeComparator = (Path o1, Path o2) -> {
    File f1 = o1.toFile();
    File f2 = o2.toFile();
    long lm1 = f1.lastModified();
    long lm2 = f2.lastModified();
    int cmp = Long.compare(lm2, lm1);
    if (cmp == 0) {
      cmp = Long.compare(f2.length(), f1.length());
    }
    return cmp;
  };

在我的电脑上获得更好的结果:

Gets the even better result on my computer:

Run time of Arrays.sort: 1968
Run time of Stream.sorted as Array: 1999
Run time of Stream.sorted as List: 1975
Run time of Stream.sorted as List with caching: 488

但是正如你所看到的,缓存对象是最好的方法.正如 jtahlborn 所说,它是一种稳定的排序.

But as you can see, caching the object is the much best way. And as jtahlborn mentioned, it is a kind of stable sort.

经过更多研究后,我发现 Files.lastModified 和 Files.size 方法在同一件事上做了大量工作:属性.所以我做了三个版本的PathInfo类来测试:

After a bit more research, I've seen, that the methods Files.lastModified and Files.size, both do a huge job on a same thing: Attributes. So I made three versions of the PathInfo class to test:

  1. Peters 版本如下所述
  2. 旧式文件版本,我在构造函数中执行一次 Path.toFile() 并使用 f.lastModified 和 f.length 从该文件中获取所有值
  3. Peters 解决方案的一个版本,但现在我使用 Files.readAttributes(path,BasicFileAttributes.class) 读取了一个 Attribute 对象,并对此做了一些处理.

将所有内容放在一个循环中,每次执行 100 次,我得出了以下结果:

Putting it all in a loop for doing it 100 times each, I came up with these results:

After doing all hundred times
Mean performance of Peters solution: 432.26
Mean performance of old File solution: 343.11
Mean performance of read attribute object once solution: 255.66

PathInfo 构造函数中用于最佳解决方案的代码:

Code in constructor of PathInfo for the best solution:

public PathInfo(Path path) {
  try {
    // read the whole attributes once
    BasicFileAttributes bfa = Files.readAttributes(path, BasicFileAttributes.class);
    fileName = path.getFileName().toString();
    modified = bfa.lastModifiedTime().toMillis();
    size = bfa.size();
  } catch (IOException ex) {
    throw new UncheckedIOException(ex);
  }
}

我的结果:从不读取属性两次,对象中的缓存会导致性能爆发.

My result: Never read attributes twice and caching in an Object is bursting performance.

推荐答案

Files.list() 是一个 O(N) 操作,而排序是 O(N log N).排序中的操作更有可能是重要的.鉴于比较不会做同样的事情,这是最可能的解释.C:/Windows/System32 下有很多修改日期相同的文件,这意味着会经常检查大小.

Files.list() is a O(N) operation whereas sorting is O(N log N). It is far more likely that the operations inside the sorting which matter. Given the comparisons don't do the same thing, this is the most likely explanation. There is a lot of files with the same modification date under C:/Windows/System32 meaning the size would be checked quite often.

为了显示大部分时间没有花在 FIles.list(dir) Stream 上,我优化了比较,因此每个文件只获取一次文件的数据.

To show that most of the time is not spent in FIles.list(dir) Stream, I have optimise the comparison so the data about a file is only obtained once per file.

import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FileSorter {

    // This sorts from old to young and from big to small
    Comparator<Path> timeSizeComparator = (Path o1, Path o2) -> {
        int sorter = 0;
        try {
            FileTime lm1 = Files.getLastModifiedTime(o1);
            FileTime lm2 = Files.getLastModifiedTime(o2);
            if (lm2.compareTo(lm1) == 0) {
                Long s1 = Files.size(o1);
                Long s2 = Files.size(o2);
                sorter = s2.compareTo(s1);
            } else {
                sorter = lm1.compareTo(lm2);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return sorter;
    };

    public String[] getSortedFileListAsArray(Path dir) throws IOException {
        Stream<Path> stream = Files.list(dir);
        return stream.sorted(timeSizeComparator).
                map(Path::getFileName).map(Path::toString).toArray(String[]::new);
    }

    public List<String> getSortedFileListAsList(Path dir) throws IOException {
        Stream<Path> stream = Files.list(dir);
        return stream.sorted(timeSizeComparator).
                map(Path::getFileName).map(Path::toString).collect(Collectors.
                toList());
    }

    public String[] sortByDateAndSize(File[] fileList) {
        Arrays.sort(fileList, (File o1, File o2) -> {
            int r = Long.compare(o1.lastModified(), o2.lastModified());
            if (r != 0) {
                return r;
            }
            return Long.compare(o1.length(), o2.length());
        });
        String[] fileNames = new String[fileList.length];
        for (int i = 0; i < fileNames.length; i++) {
            fileNames[i] = fileList[i].getName();
        }
        return fileNames;
    }

    public List<String> getSortedFile(Path dir) throws IOException {
        return Files.list(dir).map(PathInfo::new).sorted().map(p -> p.getFileName()).collect(Collectors.toList());
    }

    static class PathInfo implements Comparable<PathInfo> {
        private final String fileName;
        private final long modified;
        private final long size;

        public PathInfo(Path path) {
            try {
                fileName = path.getFileName().toString();
                modified = Files.getLastModifiedTime(path).toMillis();
                size = Files.size(path);
            } catch (IOException ex) {
                throw new UncheckedIOException(ex);
            }
        }

        @Override
        public int compareTo(PathInfo o) {
            int cmp = Long.compare(modified, o.modified);
            if (cmp == 0)
                cmp = Long.compare(size, o.size);
            return cmp;
        }

        public String getFileName() {
            return fileName;
        }
    }

    public static void main(String[] args) throws IOException {
        // File (io package)
        File f = new File("C:\Windows\system32");
        // Path (nio package)
        Path dir = Paths.get("C:\Windows\system32");

        FileSorter fs = new FileSorter();

        long before = System.currentTimeMillis();
        String[] names = fs.sortByDateAndSize(f.listFiles());
        long after = System.currentTimeMillis();
        System.out.println("Run time of Arrays.sort: " + ((after - before)));

        long before2 = System.currentTimeMillis();
        String[] names2 = fs.getSortedFileListAsArray(dir);
        long after2 = System.currentTimeMillis();
        System.out.println("Run time of Stream.sorted as Array: " + ((after2 - before2)));

        long before3 = System.currentTimeMillis();
        List<String> names3 = fs.getSortedFileListAsList(dir);
        long after3 = System.currentTimeMillis();
        System.out.println("Run time of Stream.sorted as List: " + ((after3 - before3)));
        long before4 = System.currentTimeMillis();
        List<String> names4 = fs.getSortedFile(dir);
        long after4 = System.currentTimeMillis();
        System.out.println("Run time of Stream.sorted as List with caching: " + ((after4 - before4)));
    }
}

这会打印在我的笔记本电脑上.

This prints on my laptop.

Run time of Arrays.sort: 1980
Run time of Stream.sorted as Array: 1295
Run time of Stream.sorted as List: 1228
Run time of Stream.sorted as List with caching: 185

如您所见,大约 85% 的时间用于重复获取文件的修改日期和大小.

As you can see, about 85% of the time is spent obtaining the modification date and size of the files repeatedly.

这篇关于为什么 java DirectoryStream 执行得这么慢?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆