PHP, MySQL search for a website


Question

I need a search engine for a website I am building, and I decided to try writing my own with PHP and MySQL. Currently, the viable option looks like creating three tables.

One for words, one for pages, and one reference table. When I insert a new article, I would scan the text, put the individual words in the words table, and reference those words in the third table.

Then, when a search is made, the script should return the pages in which the given word is indexed most often.
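The indexing and lookup described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the three tables are collapsed into one in-memory PHP array that plays the role of the reference table, the function names `tokenize`, `indexPage`, and `searchWord` are hypothetical, and the tokenizer is a simplification that only handles ASCII letters and digits.

```php
<?php
// Hypothetical in-memory stand-in for the reference table:
// $index[word][pageId] = number of times the word occurs on that page.

/** Split text into lowercase words (simplification: ASCII letters and digits only). */
function tokenize(string $text): array
{
    $parts = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    return $parts === false ? [] : $parts;
}

/** Index one page: store an occurrence count per (word, page) pair. */
function indexPage(array &$index, int $pageId, string $text): void
{
    foreach (array_count_values(tokenize($text)) as $word => $count) {
        $index[$word][$pageId] = $count;
    }
}

/** Return the IDs of pages containing the word, most occurrences first. */
function searchWord(array $index, string $word): array
{
    $hits = $index[strtolower($word)] ?? [];
    arsort($hits); // sort by count, descending, keeping page IDs as keys
    return array_keys($hits);
}

$index = [];
indexPage($index, 1, 'PHP search with PHP and MySQL');
indexPage($index, 2, 'A short note on PHP');

print_r(searchWord($index, 'php')); // page 1 (two occurrences) before page 2 (one)
```

In a real MySQL schema the same ranking would be an `ORDER BY` on the stored count in the reference table, which is exactly the raw-count ordering the question goes on to criticize.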

However, it looks like this approach can only rank results by the number of keyword occurrences. The more often a keyword is used in an article, the higher the article appears on the results page. So an article that uses the keyword less often may be more relevant to the search but will be placed lower in the results.

So the question is: is there a better way to create a custom search engine using PHP and MySQL? Also, if you do not have access to the server to install a search engine like Sphinx, what is the best way to tackle this problem?

Recommended answer

I've built a search engine in much the same way, but I built a cross table linking each word to each page in which it occurred. In that table, I also stored the number of times the word appeared on the page relative to the length of the page. In other words, I calculated the percentage of the words on the page that were that word. That makes it easier to apply a weight to your search results. Unfortunately, it is hard to determine whether a page is more relevant in other ways. Google uses some tricks, like the distance between two keywords on a page: if they are close to each other, they are probably related. If a keyword appears higher up on the page, it is probably more important, and so on.
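The percentage weight described above could be computed along these lines. This is a minimal sketch, not the answerer's actual code: `splitWords` and `wordWeights` are hypothetical names, the tokenizer only handles ASCII letters and digits, and the sample texts are made up for illustration.

```php
<?php
/** Split text into lowercase words (simplification: ASCII letters and digits only). */
function splitWords(string $text): array
{
    $parts = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    return $parts === false ? [] : $parts;
}

/** Map each word to its share of all words on the page (between 0.0 and 1.0). */
function wordWeights(string $text): array
{
    $words = splitWords($text);
    $total = count($words);
    $weights = [];
    // An empty page yields no counts, so the division below never runs with $total = 0.
    foreach (array_count_values($words) as $word => $count) {
        $weights[$word] = $count / $total;
    }
    return $weights;
}

// A short, focused page versus a longer page that mentions the word more often.
$short = wordWeights('MySQL search tips');
$long  = wordWeights('MySQL is a database and MySQL is popular but this page is mostly about cooking');

printf("short: %.3f, long: %.3f\n", $short['mysql'], $long['mysql']);
```

Here the longer page contains "mysql" twice and the short page only once, yet the short page gets the higher weight (1/3 versus 2/15), which addresses the raw-count problem raised in the question. Storing this value in the cross table lets the ranking be a plain `ORDER BY` on the weight column.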

Also, Google uses a totally different database structure that is better suited to these kinds of queries. It may be hard to build that in MySQL.

You can check whether MySQL's full-text indexing helps you. It indexes your pages, and you can query them using MATCH, which returns a relevance score for each row. I don't know exactly what formulas are used there, but it seems to be pretty smart.
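A full-text query of this kind might look roughly like the sketch below. The table `pages`, its columns `title` and `body`, the index name `ft_pages`, and the function `searchPages` are all assumptions made for illustration. `FULLTEXT` indexes and `MATCH ... AGAINST` are standard MySQL features (InnoDB supports them from MySQL 5.6), and the column list in `MATCH` has to be the same as the one in the index.

```php
<?php
// Hypothetical schema: a pages table with a FULLTEXT index over title and body.
const CREATE_PAGES_SQL = <<<'SQL'
CREATE TABLE pages (
    id    INT AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    body  TEXT NOT NULL,
    FULLTEXT KEY ft_pages (title, body)
) ENGINE=InnoDB
SQL;

// MATCH ... AGAINST returns a relevance score per row; using it in WHERE
// keeps only matching rows. Two placeholders are used because some PDO
// configurations do not allow one named placeholder to appear twice.
const SEARCH_PAGES_SQL = <<<'SQL'
SELECT id, title,
       MATCH(title, body) AGAINST(:q_score IN NATURAL LANGUAGE MODE) AS score
FROM pages
WHERE MATCH(title, body) AGAINST(:q_filter IN NATURAL LANGUAGE MODE)
ORDER BY score DESC
LIMIT 20
SQL;

/** Run the full-text search and return rows ordered by MySQL's relevance score. */
function searchPages(PDO $pdo, string $query): array
{
    $stmt = $pdo->prepare(SEARCH_PAGES_SQL);
    $stmt->bindValue(':q_score', $query, PDO::PARAM_STR);
    $stmt->bindValue(':q_filter', $query, PDO::PARAM_STR);
    $stmt->execute();
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

// Usage (requires a reachable MySQL server; DSN and credentials are placeholders):
// $pdo = new PDO('mysql:host=localhost;dbname=site;charset=utf8mb4', 'user', 'password');
// $pdo->exec(CREATE_PAGES_SQL);
// $rows = searchPages($pdo, 'custom search engine');
```

Because indexing and scoring happen inside MySQL, this needs no separate words or reference tables and works on hosts where installing Sphinx is not possible.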

If all of your pages are public, you might want to consider using Google Custom Search or something similar. It will save you a lot of time.
