Storing a Large Number of Files in a File System
I have millions of audio files, named after generated GUIDs (http://en.wikipedia.org/wiki/Globally_Unique_Identifier). How can I store these files in the file system so that I can efficiently add more files to it and efficiently search for a particular file? It should also be scalable in the future.
Files are named by their GUID (a unique file name).
E.g.:
[1] 63f4c070-0ab2-102d-adcb-0015f22e2e5c
[2] ba7cd610-f268-102c-b5ac-0013d4a7a2d6
[3] d03cf036-0ab2-102d-adcb-0015f22e2e5c
[4] d3655a36-0ab3-102d-adcb-0015f22e2e5c
Please give your views.
PS: I have already gone through "Storing a large number of images". I need the particular data structure/algorithm/logic so that it can also scale in the future.
EDIT1: There are around 1-2 million files, and the file system is ext3 (CentOS).
Thanks,
Naveen
That's very easy: build a folder tree based on parts of the GUID value.
For example, make 256 folders, each named after one possible value of the first byte (the first two hex digits of the GUID), and store in each folder only the files whose GUID starts with that byte. If that still leaves too many files in one folder, do the same inside each folder for the second byte of the GUID. Add more levels if needed. Searching for a file will be very fast, because its full path can be computed from its name without listing any directory.
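A minimal Python sketch of this layout, assuming the file name is exactly the GUID (no extension), two levels of one byte (two hex characters) each, and a hypothetical root such as `/data/audio`. The helper names `guid_to_path`, `store_file` and `find_file` are illustrative, not part of the answer. Storing creates only the folders a file needs; lookup just computes the path and checks whether it exists.

```python
import os
import shutil
import uuid


def guid_to_path(root, guid, levels=2, chars_per_level=2):
    """Return root/<prefix1>/<prefix2>/.../<guid> for a GUID file name.

    With the defaults each level is one byte (two hex characters), so each
    folder has at most 256 subfolders: 63f4c070-... -> root/63/f4/63f4c070-...
    """
    u = uuid.UUID(guid)      # validates the GUID string
    canonical = str(u)       # lowercase, dashed form used as the file name
    hex_digits = u.hex       # the same GUID as 32 hex characters, no dashes
    if levels * chars_per_level > len(hex_digits):
        raise ValueError("not enough hex characters for that many levels")
    parts = [hex_digits[i * chars_per_level:(i + 1) * chars_per_level]
             for i in range(levels)]
    return os.path.join(root, *parts, canonical)


def store_file(root, src_path, guid, levels=2):
    """Move src_path into its GUID-derived folder, creating folders on demand."""
    dest = guid_to_path(root, guid, levels)
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    shutil.move(src_path, dest)  # GUIDs are unique, so no name clash is expected
    return dest


def find_file(root, guid, levels=2):
    """Compute the path directly (no directory scan); None if it is absent."""
    path = guid_to_path(root, guid, levels)
    return path if os.path.exists(path) else None


if __name__ == "__main__":
    for g in ("63f4c070-0ab2-102d-adcb-0015f22e2e5c",
              "ba7cd610-f268-102c-b5ac-0013d4a7a2d6"):
        print(guid_to_path("/data/audio", g))
```

With these defaults, `63f4c070-0ab2-102d-adcb-0015f22e2e5c` ends up at `<root>/63/f4/63f4c070-0ab2-102d-adcb-0015f22e2e5c`. The prefix is taken from the start of the GUID on purpose: in the sample names the trailing `0015f22e2e5c` group repeats, so folders keyed on the end of the name would all collect the same files.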
By choosing how many bytes you use at each level, you can tailor the tree structure to your scenario.
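To pick the number of levels and the characters per level, a rough estimate is: files per leaf folder = N / (16^c)^k for N files, c hex characters per level and k levels, assuming GUID prefixes are evenly distributed. The sketch below computes that; the per-folder target passed to the hypothetical `levels_needed` helper is a tuning assumption, not a figure from the answer.

```python
def avg_files_per_leaf(total_files, levels, chars_per_level=2):
    """Average number of files per leaf folder, assuming GUID prefixes are
    evenly distributed (true for random GUIDs; check it for time-based ones)."""
    folders_per_level = 16 ** chars_per_level  # 2 hex chars (1 byte) -> 256 folders
    leaf_folders = folders_per_level ** levels
    return total_files / leaf_folders


def levels_needed(total_files, max_per_folder, chars_per_level=2):
    """Smallest number of levels whose average leaf size is <= max_per_folder.

    max_per_folder is an assumed tuning target, not a limit imposed by ext3.
    """
    if max_per_folder <= 0:
        raise ValueError("max_per_folder must be positive")
    levels = 0
    while avg_files_per_leaf(total_files, levels, chars_per_level) > max_per_folder:
        levels += 1
    return levels


if __name__ == "__main__":
    for files in (1_000_000, 2_000_000):
        for levels in (1, 2):
            print(f"{files:>9,} files, {levels} level(s): "
                  f"{avg_files_per_leaf(files, levels):,.1f} files per folder")
    print("levels for <= 1000 files/folder:", levels_needed(2_000_000, 1000))
```

For the 1-2 million files in the question, one level of 256 folders still averages about 7,800 files per folder at 2 million files, while two levels (65,536 leaf folders) bring that down to about 30. The sample names look like time-based (version 1) GUIDs, which are not guaranteed to spread evenly, so it is worth checking the real distribution before fixing the layout.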