Storing & accessing up to 10 million files in Linux

Question

I'm writing an app that needs to store a large number of files, up to approximately 10 million.

They are currently named with a UUID and will be around 4 MB each, always the same size. Reads from and writes to these files will always be sequential.

There are two main questions I'm seeking answers to:

1) Which filesystem would be best for this: XFS or ext4?
2) Would it be necessary to store the files in subdirectories to reduce the number of files in any single directory?

For question 2, I note that people have tried to find XFS's limit on the number of files in a single directory and haven't reached it; whatever the limit is, it exceeds millions. They noted no performance problems. What about ext4?

While searching for people doing similar things, I saw some suggest storing the inode number, rather than the filename, as the reference to each file for performance (the reference lives in a database index, which I'm also using). However, I don't see a usable API for opening a file by inode number. The suggestion seemed aimed more at improving performance under ext3, which, incidentally, I don't intend to use.
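Since there is no ordinary call for opening a file by its inode number, one way to reconcile that suggestion with a normal open() is to keep the path for opening and store the inode number only as a sort key. The sketch below assumes SQLite (Python's standard sqlite3 module) stands in for the database index; the schema and the helper names open_index, register and paths_in_inode_order are hypothetical.

```python
import os
import sqlite3
import tempfile

# Hypothetical schema: the UUID is the key, the path is what gets opened,
# and the inode number is kept only so that reads can be ordered by it.
SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    uuid TEXT PRIMARY KEY,
    path TEXT NOT NULL,
    ino  INTEGER NOT NULL
);
CREATE INDEX IF NOT EXISTS files_ino ON files (ino);
"""


def open_index(db_path: str = ":memory:") -> sqlite3.Connection:
    """Hypothetical helper: open the index database and create the table."""
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


def register(conn: sqlite3.Connection, file_uuid: str, path: str) -> None:
    """Hypothetical helper: record a file's path and its current inode number."""
    # Replacing the file (e.g. via rename) gives it a new inode, so the
    # entry must be re-registered in that case.
    ino = os.stat(path).st_ino
    conn.execute(
        "INSERT OR REPLACE INTO files (uuid, path, ino) VALUES (?, ?, ?)",
        (file_uuid, path, ino),
    )


def paths_in_inode_order(conn: sqlite3.Connection, uuids: list) -> list:
    """Hypothetical helper: resolve UUIDs to paths, sorted by inode number."""
    if not uuids:
        return []
    placeholders = ",".join("?" * len(uuids))
    rows = conn.execute(
        f"SELECT path FROM files WHERE uuid IN ({placeholders}) ORDER BY ino",
        list(uuids),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        conn = open_index()
        ids = [f"id-{i}" for i in range(5)]
        for file_id in ids:
            p = os.path.join(root, file_id)
            with open(p, "wb") as f:
                f.write(b"x")
            register(conn, file_id, p)
        for p in paths_in_inode_order(conn, ids):
            print(p)
```

The files are still opened by path; the inode column only decides the order in which a batch of files is visited.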

What are the ext4 and XFS limits? What performance advantages does each have over the other, and can you see a reason to use ext4 over XFS in my case?

Answer

You should definitely store the files in subdirectories.

ext4 and XFS both use efficient lookup methods for file names, but if you ever need to run tools such as ls or find over the directories, you will be very glad to have the files in manageable chunks of 1,000 to 10,000 files.
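A minimal sketch of such a layout, assuming the files keep their UUID names: the first three hex digits of the UUID pick one of 16^3 = 4,096 subdirectories, so 10 million files average roughly 2,400 per directory, inside the 1,000 to 10,000 range. The prefix length and the helper names shard_path and store are illustrative choices, not part of the original answer.

```python
import os
import tempfile
import uuid

# Illustrative choice: 3 hex digits -> 16**3 = 4096 subdirectories.
PREFIX_LEN = 3


def shard_path(root: str, file_id: str, prefix_len: int = PREFIX_LEN) -> str:
    """Hypothetical helper: map a UUID to root/<first hex digits>/<uuid>."""
    # uuid.UUID validates the id (raising ValueError if malformed) and str()
    # normalizes it to lowercase hyphenated form, so a given file always
    # lands in the same subdirectory.
    name = str(uuid.UUID(file_id))
    return os.path.join(root, name[:prefix_len], name)


def store(root: str, file_id: str, data: bytes) -> str:
    """Hypothetical helper: write data to its sharded path, creating the subdirectory."""
    path = shard_path(root, file_id)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return path


if __name__ == "__main__":
    # Average number of files per subdirectory for 10 million files.
    print(10_000_000 / 16**PREFIX_LEN)
    with tempfile.TemporaryDirectory() as root:
        print(store(root, str(uuid.uuid4()), b"\0" * 16))
```

For example, shard_path("/data", "3F2504E0-4F89-11D3-9A0C-0305E82C3301") returns "/data/3f2/3f2504e0-4f89-11d3-9a0c-0305e82c3301". Because UUIDs are close to uniformly distributed, a fixed-length prefix spreads the files evenly without any bookkeeping; two hex digits (256 directories) would leave about 39,000 files per directory, which is above the suggested range.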

The inode-number suggestion is about improving sequential access performance on the EXT filesystems. File metadata is stored in inodes, and if you access the files out of inode order, the metadata reads become random. By reading your files in inode order, you make the metadata access sequential too.
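In Python, for instance, os.scandir exposes each entry's inode number through DirEntry.inode(), which on Unix is taken from the directory listing without a separate system call, so a directory can be read in inode order with a small helper. The helper names files_in_inode_order and read_all_in_inode_order are hypothetical, and whether the ordering pays off on a particular filesystem is worth measuring rather than assuming.

```python
import os
import tempfile
from typing import Iterator


def files_in_inode_order(directory: str) -> Iterator[str]:
    """Hypothetical helper: yield regular files in directory, sorted by inode number."""
    with os.scandir(directory) as it:
        entries = [e for e in it if e.is_file(follow_symlinks=False)]
    # Sort on the inode number reported by the directory entry itself.
    entries.sort(key=lambda e: e.inode())
    for entry in entries:
        yield entry.path


def read_all_in_inode_order(directory: str, chunk_size: int = 1 << 20) -> int:
    """Hypothetical helper: read every file sequentially in inode order; return bytes read."""
    total = 0
    for path in files_in_inode_order(directory):
        with open(path, "rb") as f:
            # Each file is itself read front to back in fixed-size chunks.
            while chunk := f.read(chunk_size):
                total += len(chunk)
    return total


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for i in range(3):
            with open(os.path.join(d, f"f{i}"), "wb") as f:
                f.write(b"x" * 10)
        print(read_all_in_inode_order(d))
```

Combined with a subdirectory layout, the same helper would be applied to one subdirectory at a time.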
