Is it OK (performance-wise) to have hundreds or thousands of files in the same Linux directory?


Problem description


    It's well known that in Windows a directory with too many files will have a terrible performance when you try to open one of them. I have a program that is to execute only in Linux (currently it's on Debian-Lenny, but I don't want to be specific about this distro) and writes many files to the same directory (which acts somewhat as a repository). By "many" I mean tens each day, meaning that after one year I expect to have something like 5000-10000 files. They are meant to be kept (once a file is created, it's never deleted) and it is assumed that the hard disk has the required capacity (if not, it should be upgraded). Those files have a wide range of sizes, from a few KB to tens of MB (but not much more than that). The names are always numeric values, incrementally generated. I'm worried about long-term performance degradation, so I'd ask:

    • Is it OK to write all to the same directory? Or should I think about creating a set of subdirectories for every X files?
    • Should I require a specific filesystem to be used for such directory?
    • What would be the more robust alternative? Specialized filesystem? Which?
    • Any other considerations/recommendations?

    Solution

    It depends very much on the file system.

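    ext2 and ext3 have a hard limit of roughly 32,000 subdirectories per directory. The cap comes from the inode link count, so it applies to subdirectories rather than to regular files. It will not stop you from storing 10,000 plain files, but it is close enough to matter if you later split them into one flat level of subdirectories. Also, ext2, and ext3 without the dir_index feature enabled, will perform a linear scan every time you access a file by name in the directory.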

    ext4 supposedly fixes these problems, but I cannot vouch for it personally.

    XFS was designed for this sort of thing from the beginning and will work well even if you put millions of files in the directory.

    So if you really need a huge number of files, I would use XFS or maybe ext4.

    Note that no file system will make "ls" run fast if you have an enormous number of files (unless you use "ls -f"), since "ls" will read the entire directory and then sort the names. A few tens of thousands is probably not a big deal, but a good design should scale beyond what you think you need at first glance...
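To illustrate the difference between a sorted and an unsorted listing, here is a minimal Python sketch. The function names `iter_names_unsorted` and `names_sorted` are made up for this example. `os.scandir` yields entries in whatever order the file system returns them, which is roughly what "ls -f" does (except that "ls -f" also shows "." and ".."), while the second function has to collect every name before it can sort.

```python
import os
from typing import Iterator, List


def iter_names_unsorted(directory: str) -> Iterator[str]:
    """Yield entry names in directory order, without sorting.

    Names are produced one at a time, so the caller can start
    working before the whole directory has been read.
    """
    with os.scandir(directory) as entries:
        for entry in entries:
            yield entry.name


def names_sorted(directory: str) -> List[str]:
    """Return all entry names sorted, similar to plain "ls".

    The entire directory must be read into memory first, then sorted.
    """
    return sorted(os.listdir(directory))
```

For a program that only needs to visit each file once, the unsorted iterator avoids both the sort and holding the full name list in memory.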

    For the application you describe, I would probably create a hierarchy instead, since it is hardly any additional coding or mental effort for someone looking at it. Specifically, you can name your first file "00/00/01" instead of "000001".
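The "00/00/01" naming scheme could be sketched in Python as follows. This is only one possible way to do it: the helper names `sharded_path` and `store`, the six-digit width, and the two-digit grouping are assumptions chosen to match the example above, not something the question specifies.

```python
from pathlib import Path


def sharded_path(file_id: int, width: int = 6, group: int = 2) -> Path:
    """Map a numeric ID to a nested relative path, e.g. 1 -> 00/00/01."""
    if file_id < 0:
        raise ValueError("file_id must be non-negative")
    digits = str(file_id).zfill(width)
    remainder = len(digits) % group
    if remainder:
        # Pad on the left so the digits split evenly into groups.
        digits = digits.zfill(len(digits) + group - remainder)
    parts = [digits[i:i + group] for i in range(0, len(digits), group)]
    return Path(*parts)


def store(root: Path, file_id: int, data: bytes) -> Path:
    """Write data under root at the sharded location, creating directories."""
    target = root / sharded_path(file_id)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target
```

With two digits per level, no directory ever holds more than 100 entries, and the original numeric name can be recovered by joining the path components.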
