使用BLOB VS OS文件系统的MySQL二进制存储:大文件,大批量,大问题 [英] MySQL Binary Storage using BLOB VS OS File System: large files, large quantities, large problems

查看:154
本文介绍了使用BLOB VS OS文件系统的MySQL二进制存储:大文件,大批量,大问题的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在运行的版本(基本上
最新的一切):

PHP:5.3.1

MySQL:5.1.41

Apache :2.2.14

操作系统:CentOS(最新)



这是情况。



<我有成千上万个非常重要的文件,从客户合同到语音签名(合同的客户授权记录),文件类型包括但不限于jpg,gif,png,tiff,doc,docx,xl​​s,wav ,mp3,pdf等



所有这些文档目前都存储在多个服务器上,包括Windows 32位,CentOS和Mac等。有些文件也存储在员工台式电脑和笔记本电脑上,有些文件仍存放在数百个盒子和文件柜中。



现在因为客户或律师可以要求提供证据在任何时候,我的公司必须能够有效地搜索和定位正确的文件,因此所有这些文件都必须被数字化(如果还没有)并且与搜索的某种顺序相关联。访问。



作为程序员,我创建了一个完整的客户关系管理工具,供整个公司使用。这包括客户资料管理,订单和工作跟踪工具,工作/销售创建和管理模块等,以及目前客户资料级别(驾驶执照,信贷机构等)或工作中所需的任何文件/销售级别(合同,语音签名等)可以上传到服务器并位于父/子层次结构中,就像Windows资源管理器或任何其他典型的文件管理模型一样。



结构显示如下:



drivers_license

| - DL_123.jpg

voice_signatures

| - VS_123.wav

| - VS_4567.wav

contract



因此文件使用PHP上传Apache,并存储在OS的文件系统中。在上载时,关于文件的某些信息存储在MySQL数据库中。存储的一些信息是:



表:FileUploads

FileID

CustomerID(该文件所属的客户ID,它们都有此。)

JobID / SaleID(相关作业/销售的ID,如果有的话)。
FileSize

FileType

UploadedDateTime

UploadedBy

FilePath(文件存储的目录路径。)

FileName(当前文件上传文件的名称,CustomerID和JobID / SaleID的组合(如果适用)。)

FileDescription

OriginalFileName(上传时源文件的原始名称,包括扩展名。)



如您所见,文件通过文件名链接到数据库。当我想提供客户的文件供下载给用户时,我所要做的就是SELECT * FROM FileUploads WHERE CustomerID = 123 OR JobID = 2345;这将输出我需要的所有文件详细信息,并且使用FilePath和FileName,我可以提供下载链接。



http ... server / FilePath / FileName



此方法存在许多问题:


  1. 在此数据库无意识环境中存储文件意味着不保留数据完整性。如果删除记录,也可能不会删除该文件,反之亦然。

  2. 文件遍布整个地方,不同的服务器,计算机等。

  3. 文件名是二进制文件与数据库,客户资料和客户记录相匹配的唯一东西。

等,有很多原因,其中一些在这里描述: http://www.dreamwerx.net/ site / article01 。另外这里也有一篇有趣的文章:sietch.net/ViewNewsItem.aspx?NewsItemID=124。



所以经过大量研究后我几乎已经决定要去将所有这些文件存储在数据库中,作为BLOB或LONGBLOB,但在我这样做之前还有很多注意事项。



我知道将它们存储在数据库中是一个可行的选择,但有许多方法存储它们。我也知道存储它们是一回事;以可管理的方式关联和访问它们是另一回事。



此链接提供的文章:dreamwerx.net/site/article01描述了一种拆分上传的二进制文件的方法将文件分成64kb块并使用FileID存储每个块,然后使用标头将实际二进制文件流式传输到客户端。这是一个非常酷的主意,因为它减轻了对服务器内存的预感;而不是将整个100mb文件加载到RAM然后将其发送到客户端,它一次只做64kb。我已经尝试了这个(并更新了他的脚本),这在一个非常小的测试框架中是完全成功的。



所以如果你同意这个方法是一个可存储的,稳定的和强大的长期选项,用于存储中等大小的文件(1kb到几百megs),以及大量的这些文件,让我知道你有什么其他注意事项或想法。



此外,我正在考虑获取当前的文件管理PHP脚本,该脚本提供了一个界面,用于管理存储在文件系统中的文件,并将其转换为管理存储在数据库中的文件。如果已经有任何软件可以做到这一点,请告诉我。



我想我可以提出很多问题,所有信息都在那里^ ^所以请讨论这方面的所有方面,我们可以来回传递想法并相互教授。



干杯,



Quantico773

解决方案

我在一个大型软件系统上工作,该系统已经完成了存储附件和其他内容的两种机制。系统的第一次迭代将所有数据存储在DB中的BLOB中。我当时诅咒它。作为一名程序员,我可以编写侧面脚本来立即对数据进行操作并随时更改。



提前10年左右,我仍然管理相同的软件但是架构已经改变,它是用文件系统指针编写的。我现在诅咒它并希望它回到数据库中。我有几年的额外好处,并且已经在更多和更多更大的情况下以更大的能力工作这个应用程序,我觉得我的观点现在受到更好的教育。应用程序的升级或系统迁移需要大量脚本和数百万个文件的复制。有一次我们更改了操作系统,所有文件指针都有错误的目录分隔符,或者服务器名称在文件所在的地方发生了变化,我们不得不在周末编写并安排简单的SQL更新语句与DBA进行修复。另一个是文件系统和数据库记录不同步,为什么不确定,但经过数千天的操作,有时非事务性系统(文件系统和数据库不共享事务上下文)只是变得不同步。有时文件会神秘地丢失。



当所有这些都在数据库中时,迁移或环境升级是转储和导入数据库的问题。可以正确审核行更改,同步和日志中的所有内容都可以在必要时重播到时间点。当然数据库变大了,但它是2011年,这对数据库来说根本不是一个挑战。



对于它的价值,我们遇到了大数据缓冲区的类似问题当流式传输一些数据时,但A)我们可以使用JDBC和B中的Input | OutputStreams在字节缓冲区中抽取数据当使用其他工具时,我们编写了一个存储过程,将BLOB块化为临时表并迭代地为块提供服务来自临时表。效果很好。



我不在乎将这些内容放入数据库的技术原因,但它是所以更容易在一个整合的位置进行管理我可以将顾问和客户在短时间内管理不同文件所浪费时间的硬件或网格加倍和三倍。






更新:在评论者上轻松一下,他们只是就此事发表意见。


Versions I am running (basically latest of everything):
PHP: 5.3.1
MySQL: 5.1.41
Apache: 2.2.14
OS: CentOS (latest)

Here is the situation.

I have thousands of very important documents, ranging from customer contracts to voice signatures (recordings of customer authorisation for contracts), with file types including, but not limited to jpg, gif, png, tiff, doc, docx, xls, wav, mp3, pdf, etc.

All of these documents are currently stored on several servers including Windows 32 bit, CentOS and Mac, among others. Some files are also stored on employees desktop computers and laptops, and some are still hard copies stored in hundreds of boxes and filing cabinets.

Now because customers or lawyers could demand evidence of contracts at any time, my company has to be able to search and locate the correct document(s) effectively, for this reason ALL of these files have to be digitised (if not already) and correlated into some sort of order for searching and accessing.

As the programmer, I have created a full Customer Relations Management tool that the whole company uses. This includes Customer Profiles management, Order and job Tracking tools, Job/sale creation and management modules, etc, and at the moment any file that is needed at a customer profile level (drivers licence, credit authority, etc) or at a job/sale level (contracts, voice signatures, etc) can be uploaded to the server and sits in a parent/child hierarchy structure, just like Windows Explorer or any other typical file managment model.

The structure appears as such:

drivers_license
|- DL_123.jpg
voice_signatures
|- VS_123.wav
|- VS_4567.wav
contracts

So the files are uplaoded using PHP and Apache, and are stored in the file system of the OS. At the time of uploading, certain information about the file(s) is stored in a MySQL database. Some of the information stored is:

TABLE: FileUploads
FileID
CustomerID (the customer id that the file belongs to, they all have this.)
JobID/SaleID (the id of the job/sale associated, if any.)
FileSize
FileType
UploadedDateTime
UploadedBy
FilePath (the directory path the file is stored in.)
FileName (current file name of uploaded file, combination of CustomerID and JobID/SaleID if applicable.)
FileDescription
OriginalFileName (original name of the source file when uploaded, including extension.)

So as you can see, the file is linked to the database by the File Name. When I want to provide a customers' files for download to a user all I have to do is "SELECT * FROM FileUploads WHERE CustomerID = 123 OR JobID = 2345;" and this will output all the file details I require, and with the FilePath and FileName I can provide the link for download.

http... server / FilePath / FileName

There are a number of problems with this method:

  1. Storing files in this "database unconcious" environment means data integrity is not kept. If a record is deleted, the file may not be deleted also, or vice versa.
  2. Files are strewn all over the place, different servers, computers, etc.
  3. The file name is the ONLY thing matching the binary to the database and customer profile and customer records.

etc, etc. There are so many reasons, some of which are described here: http://www.dreamwerx.net/site/article01 . Also there is an interesting article here too: sietch.net/ViewNewsItem.aspx?NewsItemID=124 .

SO, after much research I have pretty much decided I am going to store ALL of these files in the database, as a BLOB or LONGBLOB, but there are still many considerations before I do this.

I know that storing them in the database is a viable option, however there are a number of methods of storing them. I also know storing them is one thing; correlating and accessing them in a manageable way is another thing entirely.

The article provided at this link: dreamwerx.net/site/article01 describes a way of splitting the uploaded binary files into 64kb chunks and storing each chunk with the FileID, and then streaming the actual binary file to the client using headers. This is a really cool idea since it alleviates preassure on the servers memory; instead of loading an entire 100mb file into the RAM and then sending it to the client, it is doing it 64kb at a time. I have tried this (and updated his scripts) and this is totally successful, in a very small frame of testing.

So if you are in agreeance that this method is a viable, stable and robust long-term option to store moderately large files (1kb to couple hundred megs), and large quantities of these files, let me know what other considerations or ideas you have.

Also, I am considering getting a current "File Management" PHP script that gives an interface for managing files stored in the File System and converting it to manage files stored in the database. If there is already any software out there that does this, please let me know.

I guess there are many questions I could ask, and all the information is up there ^^ so please, discuss all aspects of this and we can pass ideas back and forth and teach each other.

Cheers,

Quantico773

解决方案

I work on a large software system that has done both mechanisms for storing attachments and other content. The first iteration of the system stored all data in BLOBs in the DB. I cursed it at the time. As a programmer, I could write side scripts to immediately operate on the data and change it whenever I wanted to.

Advance about 10 years and I still manage the same software but the architecture has changed and it was written with filesystem pointers. I curse it now and wish it were back in the DB. I have the added benefit of several years and having worked this application in much greater capacity in many more and many larger situations, I feel my opinion now is better educated. Promotion or system migration of the application requires extensive scripting and copying of millions of files. On one occasion we changed the OS and all the file pointers had the wrong directory separator, or the server name changes where the file was and we had to write and schedule simple SQL update statements with the DBA on the weekend to fix. Another is that the filesystem and DB records get out of sync, why is uncertain but after thousands of days of operation, sometimes non-transactional systems (filesystem and DB don't share transactional contexts) simply become out of sync. Sometimes files mysteriously go missing.

When all this was in the DB, migration or environment promotion was a matter of dump and import the DB. Row changes could be properly audited, everything in sync and logs can be replayed to point-in-time if necessary. Sure the DB gets big, but it is 2011 and this stuff is simply not a challenge for databases.

For what it is worth we had some similar issues with large data buffers when streaming some data, but A) we could pump the data in byte buffers with the Input|OutputStreams in JDBC and B) when using other tools, we wrote a stored procedure that would chunk the BLOB into a temp table and iteratively serve the chunks from the temp table. Works great.

I don't care what the technical reason for not putting this stuff in the DB, but it is so much easier to manage in a consolidated location I could double and triple the hardware or grid the DB for the time wasted by consultants and customers just in a short period of time managing the disparate files.


Update: go easy on the commenters, they're just giving their opinion on the matter.

这篇关于使用BLOB VS OS文件系统的MySQL二进制存储:大文件,大批量,大问题的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆