Does storing a lot of images in a single directory slow down image retrieval?

Question

If I have a site where users can upload as many images as they want (think Photobucket), what is the best way to set up file storage? (All uploads also get a unique random timestamp.)

siteroot
--username
----image1.jpg
----image2.jpg
----image3.jpg
--anotheruser
----image1.jpg
----image2.jpg
----image3.jpg
...

or

siteroot
--uploads
----image1.jpg
----image2.jpg
----image3.jpg
----image4.jpg
----image6.jpg
...
----image50000.jpg

I think the first method is more organized, but I believe the second (keeping all uploads in the same directory) is the standard one. What I wonder is whether retrieving an image would be slower if there are thousands of images in the same directory.

--- edit ---

Thanks for the great answers so far. Also, I will be creating thumbnails, so I would also have to insert a directory for them somewhere, or create a naming convention such as thumb_whatever.jpg.

There are so many different ways to do this. Yes, disk space will be a problem, but for now I am concerned with retrieval time. When I have to output an image to the browser and that image is in a directory with 10,000 other images, I am worried about how slow that could get.

Solution

The number of files in a directory should have no effect at all on the time required to read a file's data - but it can massively affect the amount of time needed to find the file before you can start to read it.

The exact breakpoints where the major issues start will vary from one filesystem type to another, but in general, if you're talking about a few hundred files, you don't need to worry much about it. If you're talking about a few thousand, it's worth thinking about and maybe doing a little benchmarking to see how your filesystem and hardware handle it. If you're talking about tens of thousands of files, then you really need to start breaking things up. (I once had a Linux/e2fs print server where CUPS wasn't deleting its job control files after it finished printing, and it accumulated around 100,000 files in one directory. Just getting a directory listing took over half an hour before it even started to display any filenames.)
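The "little benchmarking" mentioned above could look something like the following rough sketch. The function name `time_lookups` and the file naming are made up for illustration. It creates empty files in a temporary directory and times `os.stat` calls on a sample of them. Note the big assumption: the files have just been created, so the directory is almost certainly in the OS cache, and the numbers will be more optimistic than a cold lookup on a busy server.

```python
import os
import tempfile
import time


def time_lookups(num_files: int, probes: int = 1000) -> float:
    """Create num_files empty files in one fresh temp directory and
    return the mean time in seconds of an os.stat() on a sample of them.

    Hypothetical helper for illustration; measures warm-cache lookups only.
    """
    if num_files < 1:
        raise ValueError("num_files must be at least 1")
    with tempfile.TemporaryDirectory() as tmp:
        # Populate a single flat directory, like the "uploads" layout.
        for i in range(num_files):
            open(os.path.join(tmp, f"image{i}.jpg"), "w").close()

        # Probe an evenly spaced sample of the files (about `probes` of them).
        step = max(1, num_files // probes)
        names = [os.path.join(tmp, f"image{i}.jpg")
                 for i in range(0, num_files, step)]

        start = time.perf_counter()
        for name in names:
            os.stat(name)  # forces a name lookup in the directory
        elapsed = time.perf_counter() - start
    return elapsed / len(names)
```

Calling it with, say, 100, 1,000 and 10,000 files on the filesystem you actually plan to use, and comparing the per-lookup times, gives a first impression of where that filesystem starts to struggle.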

Separating them by user name may not be the best choice, though, since you'll likely have a lot of users uploading very few images and perhaps a couple who upload hundreds or thousands of images, potentially creating access time issues in those users' storage directories. The bigger problem in that scenario is that you'd likely end up (assuming a successful site) with thousands or tens of thousands of users, and a large number of subdirectories is just as bad as a large number of files for slowing down access to your data.

Since you're going to have a timestamp on them, what I would probably do is put them into subdirectories based on the last three digits of the timestamp. That will distribute the files relatively evenly across 1000 subdirectories and should keep the number of files in each directory reasonably small. (Using the first three digits would cause one directory to be filled before moving to the next instead of distributing them evenly.) If you're still ending up with too many files in each subdirectory (which would likely mean you're dealing with several million uploaded images), you could add a second level for the previous three digits, so upload-1234567890.jpg would end up at /567/890/upload-1234567890.jpg.
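As a minimal sketch of that layout: the helper below (the name `sharded_path` and its parameters are invented for this example) pulls the digits out of the file name and uses the trailing ones as directory names. It assumes the timestamp is the last run of digits in the name, as in upload-1234567890.jpg.

```python
from pathlib import PurePosixPath


def sharded_path(filename: str, levels: int = 1, width: int = 3) -> str:
    """Return a relative storage path whose directories are taken from
    the last `levels * width` digits of the timestamp in `filename`.

    Hypothetical helper; assumes the name ends with the timestamp digits.
    """
    stem = PurePosixPath(filename).stem            # e.g. "upload-1234567890"
    digits = "".join(ch for ch in stem if ch.isdigit())
    needed = levels * width
    if len(digits) < needed:
        raise ValueError(f"need at least {needed} digits in {filename!r}")

    tail = digits[-needed:]                        # e.g. "567890" for 2 levels
    # Split the tail into fixed-width chunks, one per directory level.
    parts = [tail[i:i + width] for i in range(0, needed, width)]
    return str(PurePosixPath(*parts, filename))


print(sharded_path("upload-1234567890.jpg"))            # 890/upload-1234567890.jpg
print(sharded_path("upload-1234567890.jpg", levels=2))  # 567/890/upload-1234567890.jpg
```

With one level the file lands in one of 1,000 directories; with two levels the innermost directory is still chosen by the last three digits, which matches the /567/890/ example above.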
