Are flat file databases any good?


Question

Informed opinions needed about the merits of flat file databases. I'm considering using a flat file database scheme to manage data for a custom blog. It would be deployed on a Linux OS variant and written in Java.

What are the possible negatives or positives regarding read and write performance for both articles and comments?

Would article retrieval crap out because it's a flat file rather than an RDBMS if it were to get slashdotted? (Wishful thinking)

I'm not against using an RDBMS, just asking the community for its opinion on the viability of such a software architecture scheme.

Follow up: In the case of this question I would see "flat file" == "file system-based". For example, each blog entry and its accompanying metadata would be in a single file, making for many files organized by a date-based folder structure (blogs\testblog2\2008\12\01 == 12/01/2008).

Recommended answer

Flat file databases have their place and are quite workable for the right domain.

Mail servers and NNTP servers of the past really pushed the limits of how far you can take these things (which is actually quite far -- file systems can hold millions of files and directories).

Flat file DBs' two biggest weaknesses are indexing and atomic updates, but if the domain is suitable these may not be an issue.

But you can, for example, with proper locking, do an "atomic" index update using basic file system commands, at least on Unix.

A simple case is having the indexing process run through the data to create the new index file under a temporary name. Then, when you are done, you simply rename the new file over the old one (using either the rename(2) system call or the shell mv command). rename(2) and mv are atomic operations on a Unix system (i.e. the operation either works or it doesn't, and there's never a missing "in-between" state).

Same with creating new entries. Basically, write the file fully to a temp file, then rename or mv it into its final place. Then you never have an "intermediate" file in the "DB". Otherwise you can have a race condition, such as a process reading a file that is still being written and reaching the end before the writing process is complete.
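A rough sketch of that write-then-rename pattern in Java (the question mentions Java), with hypothetical paths and file names; java.nio.file.Files.move with ATOMIC_MOVE gives you rename(2) behaviour, provided the temp file and the final location are on the same file system:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class AtomicEntryWriter {

    /**
     * Writes a blog entry to a temp file in the same directory, then
     * atomically renames it into place so readers never see a half-written file.
     */
    public static void writeEntry(Path finalPath, String content) throws IOException {
        Files.createDirectories(finalPath.getParent());

        // The temp file must live on the same file system for the move to be atomic.
        Path temp = Files.createTempFile(finalPath.getParent(), "entry-", ".tmp");
        Files.write(temp, content.getBytes(StandardCharsets.UTF_8));

        // rename(2) semantics: on Unix any existing target is replaced in one step.
        Files.move(temp, finalPath, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical layout from the question: blogs\testblog2\2008\12\01
        Path entry = Paths.get("blogs", "testblog2", "2008", "12", "01", "my_java_post.txt");
        writeEntry(entry, "Title: My Java Post\n\nBody text...\n");
    }
}
```

The same trick covers the index rebuild described above: build the new index under a temporary name, then move it over the old one in a single step.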

If your primary indexing works well with directory names, then that works just fine. You can use a hashing scheme, for example, to create directories and subdirectories to locate new files.
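As an illustration only (the fan-out and names here are made up), a hashing scheme like this keeps any single directory small by spreading entries over two levels of subdirectories derived from a hash of the key:

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HashedLayout {

    /** Maps an entry key (e.g. a slug) to root/ab/cd/<key>.txt using an MD5 prefix. */
    public static Path pathFor(Path root, String key) {
        String hex = md5Hex(key);
        return root.resolve(hex.substring(0, 2))   // first fan-out level
                   .resolve(hex.substring(2, 4))   // second fan-out level
                   .resolve(key + ".txt");
    }

    private static String md5Hex(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Prints something like articles/3f/a2/my_java_post.txt
        System.out.println(pathFor(Paths.get("articles"), "my_java_post"));
    }
}
```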

Finding a file using the file name and directory structure is very fast as most filesystems today index their directories.

If you're putting a million files in a directory, there may well be tuning issues you'll want to look into, but out of the box most file systems will handle tens of thousands easily. Just remember that if you need to SCAN the directory, there are going to be a lot of files to scan. Partitioning via directories helps prevent that.

But that all depends on your indexing and searching techniques.

Effectively, a stock, off-the-shelf web server serving up static content is a large, flat file database, and the model works pretty well.

Finally, of course, you have the plethora of free Unix file-system-level tools at your disposal, but all of them have issues with zillions of files (forking grep 1,000,000 times to find something in a file will have performance tradeoffs -- the overhead simply adds up).

If all of your files are on the same file system, then hard links also give you options (since they, too, are atomic) in terms of putting the same file in different places (basically for indexing).
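A minimal sketch of that hard-link idea in Java, with hypothetical directory names; java.nio.file.Files.createLink creates a hard link and requires that both paths be on the same file system:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class HardLinkIndex {
    public static void main(String[] args) throws IOException {
        // Canonical location of the post (layout borrowed from the answer below).
        Path article = Paths.get("articles", "2008", "12", "01", "my_java_post.txt");

        // "Index" directories that should also list this post.
        Path todayLink = Paths.get("today", "my_java_post.txt");
        Path javaTagLink = Paths.get("tags", "java", "my_java_post.txt");

        Files.createDirectories(todayLink.getParent());
        Files.createDirectories(javaTagLink.getParent());

        // Each hard link is just another name for the same inode;
        // creating a link is atomic, and all paths must share one file system.
        Files.createLink(todayLink, article);
        Files.createLink(javaTagLink, article);
    }
}
```

Deleting one of these links later just removes that one name; the article stays reachable through its other names.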

For example, you could have a "today" directory, a "yesterday" directory, a "java" directory, and the actual message directory.

So, a post could be linked in the "today" directory, the "java" directory (because the post is tagged with "java", say), and in its final place (say /articles/2008/12/01/my_java_post.txt). Then, at midnight, you run two processes. The first one takes all files in the "today" directory, checks their creation date to make sure they're not from "today" (since the process can take several seconds and a new file might sneak in), and moves those links into the "yesterday" directory. Next, you do the same thing for the "yesterday" directory, except there you simply delete the links that are out of date.
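A hedged sketch of that midnight rotation, again with hypothetical "today"/"yesterday" directory names; it skips anything created since midnight and reuses the same atomic move (creation-time support varies by file system, so last-modified time would work just as well here):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.FileTime;
import java.time.LocalDate;
import java.time.ZoneId;

public class MidnightRotation {

    public static void rotate(Path today, Path yesterday) throws IOException {
        Files.createDirectories(yesterday);
        LocalDate currentDay = LocalDate.now();

        // Step 1: move links that are no longer from "today" into "yesterday".
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(today)) {
            for (Path link : stream) {
                if (creationDay(link).isBefore(currentDay)) {
                    Files.move(link, yesterday.resolve(link.getFileName()),
                            StandardCopyOption.ATOMIC_MOVE);
                }
            }
        }

        // Step 2: delete "yesterday" links that are now two or more days old.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(yesterday)) {
            for (Path link : stream) {
                if (creationDay(link).isBefore(currentDay.minusDays(1))) {
                    // Removes only this link; the article keeps its other names.
                    Files.delete(link);
                }
            }
        }
    }

    private static LocalDate creationDay(Path p) throws IOException {
        FileTime created = (FileTime) Files.getAttribute(p, "creationTime");
        return created.toInstant().atZone(ZoneId.systemDefault()).toLocalDate();
    }

    public static void main(String[] args) throws IOException {
        rotate(Paths.get("today"), Paths.get("yesterday"));
    }
}
```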

Meanwhile, the file is still in the "java" and the ".../12/01" directories. Since you're using a Unix file system and hard links, the "file" only exists once; these are all just pointers to it. None of them is "the" file; they're all the same.

You can see that while each individual file move is atomic, the bulk operation is not. For example, while the "today" script is running, the "yesterday" directory can well contain files from both "yesterday" and "the day before", because the "yesterday" script has not yet run.

In a transactional DB, you would do that all at once.

But, simply, it is a tried and true method. Unix, in particular, works VERY well with that idiom, and modern file systems support it quite well.
