为什么java DirectoryStream的执行速度如此之慢? [英] Why does the java DirectoryStream perform so slow?

查看:181
本文介绍了为什么java DirectoryStream的执行速度如此之慢?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我已经使用nio-package的DirectoryStreams对Streams进行了一些特殊测试。我只是尝试获取目录中按上次修改日期和大小排序的所有文件的列表。

I've done some testing with Streams in special with DirectoryStreams of the nio-package. I simply try to get a list of all files in a directory sorted by last modified date and size.

旧的File.listFiles()的JavaDoc声明了注意文件中的方法 s

The JavaDoc of old File.listFiles() stated a Note to the method in Files:


请注意,Files类将newDirectoryStream方法定义为
打开一个目录并迭代
目录中文件的名称。使用非常大的
目录时,这可能会占用更少的资源。

Note that the Files class defines the newDirectoryStream method to open a directory and iterate over the names of the files in the directory. This may use less resources when working with very large directories.

我在很多时候运行代码(前三次):

I run the code down below a lot of times (first three times below):

首次运行:

Run time of Arrays.sort: 1516
Run time of Stream.sorted as Array: 2912
Run time of Stream.sorted as List: 2875

第二轮:

Run time of Arrays.sort: 1557
Run time of Stream.sorted as Array: 2978
Run time of Stream.sorted as List: 2937

第三次运行:

Run time of Arrays.sort: 1563
Run time of Stream.sorted as Array: 2919
Run time of Stream.sorted as List: 2896

我的问题是:为什么流表现得如此糟糕?

My question is: Why do the streams perform so bad?

import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FileSorter {

  // This sorts from old to young and from big to small
  Comparator<Path> timeSizeComparator = (Path o1, Path o2) -> {
    int sorter = 0;
    try {
      FileTime lm1 = Files.getLastModifiedTime(o1);
      FileTime lm2 = Files.getLastModifiedTime(o2);
      if (lm2.compareTo(lm1) == 0) {
        Long s1 = Files.size(o1);
        Long s2 = Files.size(o2);
        sorter = s2.compareTo(s1);
      } else {
        sorter = lm1.compareTo(lm2);
      }
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
    return sorter;
  };

  public String[] getSortedFileListAsArray(Path dir) throws IOException {
    Stream<Path> stream = Files.list(dir);
    return stream.sorted(timeSizeComparator).
            map(Path::getFileName).map(Path::toString).toArray(String[]::new);
  }

  public List<String> getSortedFileListAsList(Path dir) throws IOException {
    Stream<Path> stream = Files.list(dir);
    return stream.sorted(timeSizeComparator).
            map(Path::getFileName).map(Path::toString).collect(Collectors.
            toList());
  }

  public String[] sortByDateAndSize(File[] fileList) {
    Arrays.sort(fileList, (File o1, File o2) -> {
      int r = Long.compare(o1.lastModified(), o2.lastModified());
      if (r != 0) {
        return r;
      }
      return Long.compare(o1.length(), o2.length());
    });
    String[] fileNames = new String[fileList.length];
    for (int i = 0; i < fileNames.length; i++) {
      fileNames[i] = fileList[i].getName();
    }
    return fileNames;
  }

  public static void main(String[] args) throws IOException {
    // File (io package)
    File f = new File("C:\\Windows\\system32");
    // Path (nio package)
    Path dir = Paths.get("C:\\Windows\\system32");

    FileSorter fs = new FileSorter();

    long before = System.currentTimeMillis();
    String[] names = fs.sortByDateAndSize(f.listFiles());
    long after = System.currentTimeMillis();
    System.out.println("Run time of Arrays.sort: " + ((after - before)));

    long before2 = System.currentTimeMillis();
    String[] names2 = fs.getSortedFileListAsArray(dir);
    long after2 = System.currentTimeMillis();
    System.out.
            println("Run time of Stream.sorted as Array: " + ((after2 - before2)));

    long before3 = System.currentTimeMillis();
    List<String> names3 = fs.getSortedFileListAsList(dir);
    long after3 = System.currentTimeMillis();
    System.out.
            println("Run time of Stream.sorted as List: " + ((after3 - before3)));
  }
}



更新



应用Peter的代码后我得到了这样的结果:

Update

After applying the code from Peter I have this results:

Run time of Arrays.sort: 1615
Run time of Stream.sorted as Array: 3116
Run time of Stream.sorted as List: 3059
Run time of Stream.sorted as List with caching: 378



更新2



在对Peter的解决方案进行一些研究后,我可以比方说,读取文件的属性为ex。 Files.getLastModified必须是一个沉重的紧缩。仅将比较器中的该部分更改为:

Update 2

After doing some research on the solution of Peter, I can say, that reading file attributes with for ex. Files.getLastModified must be a heavy crunch. Changing only that part in Comparator to:

  Comparator<Path> timeSizeComparator = (Path o1, Path o2) -> {
    File f1 = o1.toFile();
    File f2 = o2.toFile();
    long lm1 = f1.lastModified();
    long lm2 = f2.lastModified();
    int cmp = Long.compare(lm2, lm1);
    if (cmp == 0) {
      cmp = Long.compare(f2.length(), f1.length());
    }
    return cmp;
  };

在我的电脑上获得更好的结果:

Gets the even better result on my computer:

Run time of Arrays.sort: 1968
Run time of Stream.sorted as Array: 1999
Run time of Stream.sorted as List: 1975
Run time of Stream.sorted as List with caching: 488

但正如您所见,缓存对象是最好的方法。正如jtahlborn所说,它是一种稳定的类型。

But as you can see, caching the object is the much best way. And as jtahlborn mentioned, it is a kind of stable sort.

经过一番研究后,我发现,Files.lastModified和Files.size这两个方法在同一个方面做了大量工作:属性。所以我做了三个版本的PathInfo类来测试:

After a bit more research, I've seen, that the methods Files.lastModified and Files.size, both do a huge job on a same thing: Attributes. So I made three versions of the PathInfo class to test:


  1. Peters版本如下所述

  2. 旧样式文件版本,我在构造函数中执行一次Path.toFile()并使用f.lastModified和f.length获取该文件中的所有值

  3. Peters的一个版本解决方案,但现在我用Files.readAttributes(path,BasicFileAttributes.class)读取了一个Attribute对象并对此做了一些事情。

Putting这一切都在一个循环中,每次100次,我想出了这些结果:

Putting it all in a loop for doing it 100 times each, I came up with these results:

After doing all hundred times
Mean performance of Peters solution: 432.26
Mean performance of old File solution: 343.11
Mean performance of read attribute object once solution: 255.66

最佳解决方案的PathInfo构造函数中的代码:

Code in constructor of PathInfo for the best solution:

public PathInfo(Path path) {
  try {
    // read the whole attributes once
    BasicFileAttributes bfa = Files.readAttributes(path, BasicFileAttributes.class);
    fileName = path.getFileName().toString();
    modified = bfa.lastModifiedTime().toMillis();
    size = bfa.size();
  } catch (IOException ex) {
    throw new UncheckedIOException(ex);
  }
}

我的结果:从不读取属性两次,在对象中缓存是爆破性能。

My result: Never read attributes twice and caching in an Object is bursting performance.

推荐答案

Files.list()是一个O(N)操作,而排序是O(N log N)。排序中的操作更有可能是重要的。鉴于比较不做同样的事情,这是最可能的解释。在C:/ Windows / System32下有很多具有相同修改日期的文件,这意味着将经常检查大小。

Files.list() is a O(N) operation whereas sorting is O(N log N). It is far more likely that the operations inside the sorting which matter. Given the comparisons don't do the same thing, this is the most likely explanation. There is a lot of files with the same modification date under C:/Windows/System32 meaning the size would be checked quite often.

要显示大部分时间是没有在FIles.list(dir)Stream中花费,我已经优化了比较,因此每个文件只获取一次文件的数据。

To show that most of the time is not spent in FIles.list(dir) Stream, I have optimise the comparison so the data about a file is only obtained once per file.

import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FileSorter {

    // This sorts from old to young and from big to small
    Comparator<Path> timeSizeComparator = (Path o1, Path o2) -> {
        int sorter = 0;
        try {
            FileTime lm1 = Files.getLastModifiedTime(o1);
            FileTime lm2 = Files.getLastModifiedTime(o2);
            if (lm2.compareTo(lm1) == 0) {
                Long s1 = Files.size(o1);
                Long s2 = Files.size(o2);
                sorter = s2.compareTo(s1);
            } else {
                sorter = lm1.compareTo(lm2);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return sorter;
    };

    public String[] getSortedFileListAsArray(Path dir) throws IOException {
        Stream<Path> stream = Files.list(dir);
        return stream.sorted(timeSizeComparator).
                map(Path::getFileName).map(Path::toString).toArray(String[]::new);
    }

    public List<String> getSortedFileListAsList(Path dir) throws IOException {
        Stream<Path> stream = Files.list(dir);
        return stream.sorted(timeSizeComparator).
                map(Path::getFileName).map(Path::toString).collect(Collectors.
                toList());
    }

    public String[] sortByDateAndSize(File[] fileList) {
        Arrays.sort(fileList, (File o1, File o2) -> {
            int r = Long.compare(o1.lastModified(), o2.lastModified());
            if (r != 0) {
                return r;
            }
            return Long.compare(o1.length(), o2.length());
        });
        String[] fileNames = new String[fileList.length];
        for (int i = 0; i < fileNames.length; i++) {
            fileNames[i] = fileList[i].getName();
        }
        return fileNames;
    }

    public List<String> getSortedFile(Path dir) throws IOException {
        return Files.list(dir).map(PathInfo::new).sorted().map(p -> p.getFileName()).collect(Collectors.toList());
    }

    static class PathInfo implements Comparable<PathInfo> {
        private final String fileName;
        private final long modified;
        private final long size;

        public PathInfo(Path path) {
            try {
                fileName = path.getFileName().toString();
                modified = Files.getLastModifiedTime(path).toMillis();
                size = Files.size(path);
            } catch (IOException ex) {
                throw new UncheckedIOException(ex);
            }
        }

        @Override
        public int compareTo(PathInfo o) {
            int cmp = Long.compare(modified, o.modified);
            if (cmp == 0)
                cmp = Long.compare(size, o.size);
            return cmp;
        }

        public String getFileName() {
            return fileName;
        }
    }

    public static void main(String[] args) throws IOException {
        // File (io package)
        File f = new File("C:\\Windows\\system32");
        // Path (nio package)
        Path dir = Paths.get("C:\\Windows\\system32");

        FileSorter fs = new FileSorter();

        long before = System.currentTimeMillis();
        String[] names = fs.sortByDateAndSize(f.listFiles());
        long after = System.currentTimeMillis();
        System.out.println("Run time of Arrays.sort: " + ((after - before)));

        long before2 = System.currentTimeMillis();
        String[] names2 = fs.getSortedFileListAsArray(dir);
        long after2 = System.currentTimeMillis();
        System.out.println("Run time of Stream.sorted as Array: " + ((after2 - before2)));

        long before3 = System.currentTimeMillis();
        List<String> names3 = fs.getSortedFileListAsList(dir);
        long after3 = System.currentTimeMillis();
        System.out.println("Run time of Stream.sorted as List: " + ((after3 - before3)));
        long before4 = System.currentTimeMillis();
        List<String> names4 = fs.getSortedFile(dir);
        long after4 = System.currentTimeMillis();
        System.out.println("Run time of Stream.sorted as List with caching: " + ((after4 - before4)));
    }
}

这在我的笔记本电脑上打印。

This prints on my laptop.

Run time of Arrays.sort: 1980
Run time of Stream.sorted as Array: 1295
Run time of Stream.sorted as List: 1228
Run time of Stream.sorted as List with caching: 185

如您所见,大约85%的时间用于重复获取文件的修改日期和大小。

As you can see, about 85% of the time is spent obtaining the modification date and size of the files repeatedly.

这篇关于为什么java DirectoryStream的执行速度如此之慢?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆