Best practices for mixed usage of RDBMS and files on filesystem


Question

In one of the tables in the schema I am working on, I need to deal with a couple of thousand "data-sheets", which are mostly PDF documents and sometimes graphic image files such as PNG or JPG. The schema models an electronics distributor's portal, where new products are added to the portfolio frequently.

These documents (data-sheets) are added when a new product is introduced, but they need updates from time to time (for example, because a newer version of the document is released, not because the product itself changes), so I'd expect the update to be an asynchronous procedure.

Given this, should I keep only the file name/path of the data-sheets (and similar documents) in my table, with the actual file on the filesystem, or should I take the BLOB approach? I am almost certain that it should be the former, but I still wanted to get the community's advice and see whether there are any pitfalls to watch out for.

Recommended answer

For completeness, let me just mention that some databases allow you to have a "hybrid" of these two approaches, for example Oracle BFILE or MS SQL Server FILESTREAM.

There is also an interesting discussion at Ask Tom on storing files in Oracle BLOBs (in a nutshell: "BLOBs are better than files").

BTW, you don't necessarily need to choose one over the other... If you can afford the storage overhead and you are operating in a read-mostly environment, you could store the "master" data in the BLOB for integrity but "cache" that same data in a file for quick read-only access. Some considerations:


  • You'd need to make sure the file is updated/removed if BLOB is updated/removed.
  • Consider creating/updating the file on-demand.
  • Consider evicting old files from the "cache" even if corresponding BLOBs still exist.
  • Consider using several "caches" (e.g. if you have a middle tier that is distributed across multiple physical machines, each machine could have its own file cache).
  • And finally, you'd need to make sure all this works robustly in a concurrent environment.
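
The considerations above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a reference implementation: it uses Python's built-in `sqlite3` as a stand-in for the real RDBMS, and the `datasheets` table, its `version` column, and the helpers `open_db`, `put_datasheet` and `cached_path` are all hypothetical names invented for the example. The version number is embedded in the cache file name so that a stale file is never served, files are created on demand, and each file is written to a temporary name and then renamed so that concurrent readers never see a half-written file.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def open_db(path=":memory:"):
    """Open the database and create the (hypothetical) master table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS datasheets ("
        " id INTEGER PRIMARY KEY,"
        " content BLOB NOT NULL,"
        " version INTEGER NOT NULL)"
    )
    return conn


def put_datasheet(conn, sheet_id, data):
    """Insert or update the master BLOB; every update bumps the version."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE datasheets SET content = ?, version = version + 1"
            " WHERE id = ?",
            (data, sheet_id),
        )
        if cur.rowcount == 0:
            conn.execute(
                "INSERT INTO datasheets (id, content, version)"
                " VALUES (?, ?, 1)",
                (sheet_id, data),
            )


def cached_path(conn, cache_dir, sheet_id):
    """Return a file holding the current version, creating it on demand."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)

    # Cheap check first: fetch only the version, not the BLOB itself.
    row = conn.execute(
        "SELECT version FROM datasheets WHERE id = ?", (sheet_id,)
    ).fetchone()
    if row is None:
        raise KeyError(sheet_id)
    target = cache_dir / f"{sheet_id}.v{row[0]}.bin"
    if target.exists():
        return target

    # Cache miss: read version and content together so they match.
    row = conn.execute(
        "SELECT version, content FROM datasheets WHERE id = ?", (sheet_id,)
    ).fetchone()
    if row is None:
        raise KeyError(sheet_id)
    version, content = row
    target = cache_dir / f"{sheet_id}.v{version}.bin"

    # Write to a temporary file, then rename it into place atomically.
    fd, tmp = tempfile.mkstemp(dir=cache_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(content)
        os.replace(tmp, target)
    except Exception:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise

    # Evict files left over from older versions of this data-sheet.
    for old in cache_dir.glob(f"{sheet_id}.v*.bin"):
        if old != target:
            try:
                old.unlink()
            except FileNotFoundError:
                pass  # another process evicted it first
    return target
```

A web tier could call `cached_path(conn, "/some/cache/dir", product_sheet_id)` and serve the returned file directly. Because the cache holds nothing that is not also in the BLOB, it can be deleted wholesale at any time, and each middle-tier machine can keep its own directory.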

So, this is not the simplest approach but, depending on your needs, it may be a good trade-off between integrity, performance and implementation effort.
