How to store articles or other large texts in a database


Question

I am currently designing a database-driven website for myself. The main reason is to learn, but I won't lie, there is a small amount of vanity involved!

While I believe my database design is pretty good so far, I am still not entirely sure of the best way to store articles or other large texts. I know most DBMSs have a TEXT datatype or equivalent that can hold a massive amount of text. However, a full article stored as one long unformatted string makes for unhappy reading, so formatting is going to be needed.

Do I store the article text along with all of the HTML or BBCode tags, or is it better to create the page as an HTML or XML document and store the path to that file in the DB?

I quite like the idea of storing each article as an XML document, as I can easily mark up an article with custom tags and use PHP's XML and XSLT functions to transform the XML to HTML [or indeed, any other format]. It also allows the author to dictate where line and page breaks go. This approach would of course require extra coding [which I am not afraid of], but it does present a problem with making articles searchable.

I know MySQL, for example, has SQL syntax for searching for specific terms or phrases inside strings held in a text field. If I were to store the text in separate files, how might I approach making these articles searchable?

I have written quite a lot here for such a simple question, so I will break it down:

1: Is there a "best" way of storing large amounts of formatted text directly in a database, or
2: is it better to hold paths to that text in the form of HTML/XML/whatever documents?

If 2, is there an elegant way of making that text searchable?

Thanks for your time :)

Recommended answer

Store everything in one big text field, as Alex suggested. For searching, don't hammer your database; use Lucene or htdig to create an index of your output. This way searches are very fast. As a side effect, your pages become a little more search-engine friendly: you take your keywords field (as backslash suggested) and put its contents in the meta keywords tag.

Edit

Unless you are only searching keywords, having the db do the searches will be horribly slow (ever searched a forum and it takes FOREVER?). A pattern with a leading wildcard cannot use an ordinary index, so the database has to scan every row for a query like:

  select.. where FULLTEXTFIELD like '%cookies%'.  
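
To illustrate why a prebuilt index beats scanning every row, here is a minimal sketch of an inverted index in Python. It is not how Lucene or htdig are implemented, only the basic idea they build on; the sample articles and the function names are assumptions made up for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(articles):
    """Map each word to the set of article ids that contain it."""
    index = defaultdict(set)
    for article_id, body in articles.items():
        for word in tokenize(body):
            index[word].add(article_id)
    return index


def search(index, query):
    """Return the ids of articles that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample data standing in for rows of an articles table.
articles = {
    1: "How to bake cookies at home",
    2: "Storing large texts in a database",
    3: "Cookies and sessions in web applications",
}
index = build_index(articles)
```

The index is built once (and updated when an article changes), so each search is a few dictionary lookups rather than a pass over every article body.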

It is frustrating to look for an article and have the search miss it because the terms you typed weren't in the keyword field! Htdig allows you to search the full text of the article efficiently. Your searches will come back instantly, and EVERY term in the article is fully searchable. Putting the keywords in the meta tags will make matches on those terms rank higher on the results page.

Another benefit is fuzzy matching. If you search for 'activate', htdig will match pages that contain active, activation, activity, etc. (configurable). Or if the user misspells a word, it will still be matched. You want your users to have a Google-like experience, not an annoying one. :)
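
As a rough illustration of the misspelling case only, the sketch below uses Python's standard `difflib` module to map a mistyped query word onto terms that exist in the index. This is not htdig's algorithm, and it does not do the stemming-style matching (active/activation) described above; the term list, the function name, and the cutoff value are assumptions for the example.

```python
import difflib


def suggest_terms(query_word, indexed_terms, cutoff=0.75):
    """Return indexed terms that closely resemble the (possibly misspelled) word."""
    return difflib.get_close_matches(
        query_word.lower(), indexed_terms, n=5, cutoff=cutoff
    )


# Hypothetical list of terms taken from a search index.
indexed_terms = ["activate", "activation", "activity", "database", "article"]
```

A search front end could run the suggested terms through the normal lookup when the word the user typed has no exact match.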

You do need a script that generates a list of links to all of your pages from the database. Have htdig crawl that list automatically and you never have to think about it again.
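
A minimal sketch of such a script, assuming a hypothetical `articles` table with `id` and `title` columns and a hypothetical `/article/<id>` URL scheme. It uses an in-memory SQLite database only so that it runs on its own; the real site would connect to its own database and write the page somewhere the crawler can reach.

```python
import sqlite3
from html import escape


def build_link_page(conn, base_url):
    """Return an HTML page that links to every article in the database."""
    rows = conn.execute("SELECT id, title FROM articles ORDER BY id").fetchall()
    base = escape(base_url, quote=True)
    items = [
        f'<li><a href="{base}/article/{article_id}">{escape(title)}</a></li>'
        for article_id, title in rows
    ]
    return "<html><body><ul>\n" + "\n".join(items) + "\n</ul></body></html>"


# In-memory database with made-up rows so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO articles (title, body) VALUES (?, ?)",
    [("Storing large texts", "..."), ("Cookies & sessions", "...")],
)
page = build_link_page(conn, "http://example.com")
```

Regenerating this page on a schedule, or whenever an article is published, keeps the crawler's starting point in step with the database.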

Htdig will also crawl your non-database pages, so your whole site is searchable through the same simple interface.

As for the keyword field, you should ideally have a separate table called keywords that holds the id of the article and a keyword field (one keyword per row). But for simplicity, having a single field in the db isn't a terrible idea; it makes updating the keywords pretty easy if you put it in a form.
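
The separate-table layout could look like the sketch below. The table names, column names, helper functions, and the comma-separated form field are all assumptions for illustration, and SQLite stands in for whatever database the site actually uses.

```python
import sqlite3
from html import escape

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    body  TEXT NOT NULL
);
-- One keyword per row, tied to the article it describes.
CREATE TABLE keywords (
    article_id INTEGER NOT NULL REFERENCES articles(id),
    keyword    TEXT NOT NULL,
    PRIMARY KEY (article_id, keyword)
);
CREATE INDEX idx_keywords_keyword ON keywords (keyword);
""")


def set_keywords(conn, article_id, keywords_field):
    """Replace an article's keywords from a comma-separated form field."""
    words = sorted({w.strip().lower() for w in keywords_field.split(",") if w.strip()})
    with conn:  # commit both statements together
        conn.execute("DELETE FROM keywords WHERE article_id = ?", (article_id,))
        conn.executemany(
            "INSERT INTO keywords (article_id, keyword) VALUES (?, ?)",
            [(article_id, w) for w in words],
        )


def articles_with_keyword(conn, keyword):
    """Return (id, title) for every article tagged with the keyword."""
    return conn.execute(
        "SELECT a.id, a.title FROM articles a "
        "JOIN keywords k ON k.article_id = a.id "
        "WHERE k.keyword = ? ORDER BY a.id",
        (keyword.lower(),),
    ).fetchall()


def meta_keywords_tag(conn, article_id):
    """Build the meta keywords tag for an article's page."""
    rows = conn.execute(
        "SELECT keyword FROM keywords WHERE article_id = ? ORDER BY keyword",
        (article_id,),
    ).fetchall()
    content = escape(", ".join(k for (k,) in rows), quote=True)
    return f'<meta name="keywords" content="{content}">'


# Made-up sample row and form input.
conn.execute(
    "INSERT INTO articles (title, body) VALUES (?, ?)",
    ("How to bake cookies", "..."),
)
set_keywords(conn, 1, "Cookies, baking, cookies")
```

The form can still present the keywords as one text field; `set_keywords` splits it into rows on save, so exact keyword lookups use the index instead of a wildcard match.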

If you don't want to fuss with all of that, you can try Google Custom Search. It is far less work, but you have no guarantee that all of your pages will get indexed.

Good luck!
