Efficiently storing user-uploaded images on the file system

Question

Scenario

Users can post an item and include up to 5 images with the post. Each uploaded image needs to be resampled and resized, creating a total of 4 extra images. This means that if a user uploads 5 images, we end up with 25 images in total to store.

Assumptions

  • The images have been properly checked and they're valid image files
  • The system has to scale (let's assume 1,000 posts in the first instance, so a maximum of 5,000 uploaded images, or 25,000 files once the resized copies are included)
  • Each image is renamed in relation to the auto_increment id of the db post entry and includes a relevant suffix, e.g. 12345_1_1.jpg, 12345_2_1.jpg, so there are no issues with duplicates
  • The images aren't of a sensitive nature, so there's no issue with having them directly accessible (although directory listing would be disabled)
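
A minimal sketch of the naming scheme described above. The question does not define the two numeric suffixes, so this assumes the first is the image's position within the post (1 to 5) and the second is the size variant (1 for the original, 2 to 5 for the resized copies). The function name `image_filename` is hypothetical.

```python
def image_filename(post_id: int, image_no: int, variant: int, ext: str = "jpg") -> str:
    """Build a name like 12345_1_1.jpg from the post's auto_increment id.

    Assumed meaning of the suffixes (not stated in the question):
      image_no: position of the image within the post, 1-5
      variant:  1 = original, 2-5 = the four resized copies
    """
    if not 1 <= image_no <= 5:
        raise ValueError("a post holds at most 5 images")
    if not 1 <= variant <= 5:
        raise ValueError("each upload yields 1 original + 4 resized variants")
    return f"{post_id}_{image_no}_{variant}.{ext}"


# Every file one post can produce: 5 images x 5 variants = 25 unique names.
all_names = [image_filename(12345, i, v) for i in range(1, 6) for v in range(1, 6)]
```

Because the post id is unique, the names cannot collide across posts, which is what makes any of the directory layouts below safe.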

Possible approaches

  • Given the ids are unique, we could just drop them all into one folder (inefficient after a certain point).
  • Could create a folder for each post and place all of its images in it, i.e. ROOT/images/12345 (again, this would end up with a multitude of folders).
  • Could do an image store based on date, i.e. each day a new folder is created and that day's images are stored in it.
  • Could store the images based on the resized type, i.e. all the original files could be stored in one folder, images/orig, and all the thumbnails in images/thumb (I think Gumtree uses an approach like this).
  • Could allow X files to be stored in one folder before creating another one.
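
The layouts listed above differ only in how the directory is derived. The sketch below expresses each one as a path function; the storage root `/var/www/images`, the folder names and the `per_folder` default of 1,000 are hypothetical choices for illustration. For the last option it computes the bucket from the id instead of counting files already on disk, so the location of any image can be derived without a lookup.

```python
from datetime import date
from pathlib import PurePosixPath

ROOT = PurePosixPath("/var/www/images")  # hypothetical storage root


def per_post_path(post_id: int, filename: str) -> PurePosixPath:
    # One folder per post: ROOT/12345/12345_1_1.jpg
    return ROOT / str(post_id) / filename


def per_date_path(upload_day: date, filename: str) -> PurePosixPath:
    # One folder per day: ROOT/2012/03/15/12345_1_1.jpg
    return ROOT / f"{upload_day:%Y/%m/%d}" / filename


def per_size_path(size_name: str, filename: str) -> PurePosixPath:
    # One folder per resize type: ROOT/orig/..., ROOT/thumb/...
    return ROOT / size_name / filename


def bucketed_path(post_id: int, filename: str, per_folder: int = 1000) -> PurePosixPath:
    # At most `per_folder` posts (so 25 * per_folder files) per folder:
    # ids 0-999 -> ROOT/0, 1000-1999 -> ROOT/1, 12345 -> ROOT/12
    return ROOT / str(post_id // per_folder) / filename
```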

Anyone have experience on the best practices / approaches when it comes to storing images scalably?

Note: I expect someone will mention S3, so to preempt that: let's assume we want to keep the images locally for the time being.

Thanks for looking.

Accepted answer

We have such a system in heavy production with 30,000+ files and 20+ GB to date...

   Column    |            Type             |                        Modifiers                         
-------------+-----------------------------+----------------------------------------------------------
 File_ID     | integer                     | not null default nextval('"ACRM"."File_pseq"'::regclass)
 CreateDate  | timestamp(6) with time zone | not null default now()
 FileName    | character varying(255)      | not null default NULL::character varying
 ContentType | character varying(128)      | not null default NULL::character varying
 Size        | integer                     | not null
 Hash        | character varying(40)       | not null
Indexes:
    "File_pkey" PRIMARY KEY, btree ("File_ID")

The files are just stored in a single directory, with the integer File_ID as the file name. We're past 30,000 files with no problems, and I've tested higher counts without issue.

This is using RHEL 5 x86_64 with ext3 as the file system.

Would I do it this way again? No. Let me share some thoughts on a redesign.

  1. The database is still the "master source" of information on the files.

Each file is sha1() hashed and stored in a filesystem hierarchy based on that hash: /FileData/ab/cd/abcd4548293827394723984723432987.jpg
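
A minimal sketch of that hash-to-path mapping, assuming the first two directory levels are the first and second pairs of hex characters of the SHA-1 digest, as in the example path. Two levels of two hex characters give at most 256 × 256 = 65,536 leaf directories, and because SHA-1 output is effectively uniform, files spread evenly across them. The function name `hashed_path` is hypothetical.

```python
import hashlib
from pathlib import PurePosixPath

DATA_ROOT = PurePosixPath("/FileData")


def hashed_path(data: bytes, ext: str) -> PurePosixPath:
    """Map file content to /FileData/<h[0:2]>/<h[2:4]>/<h>.<ext>, where h is its SHA-1."""
    digest = hashlib.sha1(data).hexdigest()  # 40 lowercase hex characters
    return DATA_ROOT / digest[0:2] / digest[2:4] / f"{digest}.{ext}"


# The SHA-1 of empty input is da39a3ee..., so an empty .txt file lands in /FileData/da/39/.
empty_txt_path = hashed_path(b"", "txt")
```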

The database is a bit smarter about storing meta-information on each file. It would be a three-table system:

File: stores info such as name, date, IP address, owner, and a pointer to a Blob (sha1)
File_Meta: stores key/value pairs on the file, depending on the type of file. This may include information such as Image_Width, etc.
Blob: stores a reference to the sha1 along with its size.
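
One possible shape for those three tables, sketched with Python's built-in `sqlite3` so it runs without a server; the answer's own system uses PostgreSQL. All column names are assumptions, and the Blob key stores the SHA-1 together with the file extension, following the point about extensions made further on. `abcd8765.jpg` is a placeholder, not a real digest.

```python
import sqlite3

# Illustrative schema only; column names are assumptions, not the answer's schema.
SCHEMA = """
CREATE TABLE "Blob" (
    name TEXT PRIMARY KEY,      -- sha1 hex digest plus extension
    size INTEGER NOT NULL       -- size in bytes
);
CREATE TABLE "File" (
    file_id    INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,   -- name the user uploaded
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    ip         TEXT,
    owner      TEXT,
    blob_name  TEXT NOT NULL REFERENCES "Blob"(name)
);
CREATE TABLE "File_Meta" (
    file_id    INTEGER NOT NULL REFERENCES "File"(file_id),
    meta_key   TEXT NOT NULL,   -- e.g. Image_Width
    meta_value TEXT,
    PRIMARY KEY (file_id, meta_key)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Two uploads with identical bytes point at a single Blob row.
blob = "abcd8765.jpg"  # placeholder; a real name uses the full 40-char sha1
conn.execute('INSERT INTO "Blob" (name, size) VALUES (?, ?)', (blob, 48213))
for upload in ("beach.jpg", "beach-copy.jpg"):
    cur = conn.execute(
        'INSERT INTO "File" (name, ip, owner, blob_name) VALUES (?, ?, ?, ?)',
        (upload, "192.0.2.10", "alice", blob),
    )
    conn.execute(
        'INSERT INTO "File_Meta" (file_id, meta_key, meta_value) VALUES (?, ?, ?)',
        (cur.lastrowid, "Image_Width", "800"),
    )
conn.commit()
```

Keeping per-type attributes such as Image_Width in a key/value table means new file types do not require schema changes.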

This system would de-duplicate file content by storing the data under its hash (multiple files could reference the same file data). It would also be very easy to back up and sync the file store using rsync.
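
A minimal sketch of a de-duplicating write, assuming the /FileData/ab/cd/ layout above. If a file with the same hash-based name already exists, nothing is written; otherwise the data goes to a temporary file that is then renamed into place, so readers never see a half-written image (rename is atomic on POSIX when both paths are on the same filesystem). The function name `store_blob` is hypothetical, and the sketch assumes one writer per blob at a time; concurrent uploads of identical bytes would need a unique temporary name.

```python
import hashlib
from pathlib import Path


def store_blob(root: Path, data: bytes, ext: str) -> str:
    """Store data at root/ab/cd/<sha1>.<ext> unless identical content is already there.

    Returns the blob name to record in the database.
    """
    digest = hashlib.sha1(data).hexdigest()
    name = f"{digest}.{ext}"
    target = root / digest[0:2] / digest[2:4] / name
    if target.exists():
        return name  # same bytes already on disk: de-duplicated, nothing to write
    target.parent.mkdir(parents=True, exist_ok=True)
    tmp = target.parent / (name + ".tmp")
    tmp.write_bytes(data)
    tmp.replace(target)  # rename into place so the final name never holds partial data
    return name
```

Because the directory tree contains nothing but immutable, content-addressed files, an `rsync` of the root to a backup host only ever has new files to copy.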

Also, the limitations of a given directory containing a lot of files would be eliminated.

The file extension would be stored as part of the unique hash-based file name. For example, if the hash of an empty file were abcd8765..., an empty .txt file and an empty .php file would refer to the same hash. Instead, they should refer to abcd8765.txt and abcd8765.php. Why?

Apache and other web servers can be configured to choose the content type and caching rules automatically based on the file extension. It is important to store the files with a valid name and an extension that reflects the content of the file.
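
The sketch below illustrates why the extension belongs in the blob name: identical bytes saved as .txt and .html become two distinct names, and a purely extension-based lookup (Python's `mimetypes` module here, standing in for a web server's type map) yields a different Content-Type for each. The helper names `blob_name` and `content_type` are hypothetical.

```python
import hashlib
import mimetypes

# A fresh MimeTypes instance uses only Python's built-in extension table,
# so the result does not depend on the host's /etc/mime.types.
_types = mimetypes.MimeTypes()


def blob_name(data: bytes, ext: str) -> str:
    # Same bytes with a different extension give a different name on disk.
    return f"{hashlib.sha1(data).hexdigest()}.{ext}"


def content_type(name: str) -> str:
    # Like a web server's type map: decided by the extension alone.
    guessed, _encoding = _types.guess_type(name)
    return guessed or "application/octet-stream"


empty_txt = blob_name(b"", "txt")
empty_html = blob_name(b"", "html")
```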

You see, this system could really boost performance by delegating file delivery to nginx. See http://wiki.nginx.org/XSendfile.
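
nginx implements this through the X-Accel-Redirect response header, its counterpart to X-Sendfile: the application returns that header pointing at a location declared `internal`, and nginx then reads and sends the file itself. A minimal sketch of the application side, assuming a hypothetical internal location `/protected/` that nginx maps onto /FileData; the function name and the name-validation rule are also assumptions. Validating the blob name keeps user input from steering the redirect outside the store.

```python
import re

INTERNAL_PREFIX = "/protected"  # hypothetical nginx `internal` location mapped to /FileData

# Blob names are a 40-char sha1 hex digest plus an extension, nothing else.
_BLOB_RE = re.compile(r"[0-9a-f]{40}\.[A-Za-z0-9]+")


def accel_redirect_headers(blob_name: str) -> dict[str, str]:
    """Response headers asking nginx to serve /FileData/ab/cd/<blob_name> itself.

    The application still does the database lookup and any checks;
    nginx only streams the bytes from disk.
    """
    if not _BLOB_RE.fullmatch(blob_name):
        raise ValueError(f"not a blob name: {blob_name!r}")
    internal_uri = f"{INTERNAL_PREFIX}/{blob_name[0:2]}/{blob_name[2:4]}/{blob_name}"
    return {"X-Accel-Redirect": internal_uri}
```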

I hope this helps in some way. Take care.