How to add search functionality to a PHP website without using Java or Google site search?


Question

I would like to add a search field to my site. The site is based on PHP and the Yii framework. The web server assembles several pieces of data (from files and APIs) before serving the resulting web page. Sooner or later the web server will read this data from a MySQL database, but at the moment it is just files and API results.

Apache Lucene could solve the problem, but there is no way to run Java on the server: I am on a shared Linux host.

Google site search (or Bing's, etc.) could solve the problem, but I would like a fully customizable search box, and I want to add some results of my own to the ones it proposes.

I could write my own search engine, indexing pages and weighting each piece of data according to where it comes from, to get precise results. But I think there must be something out there that is more efficient and quicker to implement.

What would be a way to add quick search functionality to a PHP-based website without using Java or Google site search?

Recommended answer

I use Zend Framework and consequently Zend_Search_Lucene. It is a pure-PHP implementation of the Lucene full-text search engine. You can define your own "document" (as an aggregate of your data), weight its fields, and build indexes in a relatively straightforward way. The downside, in my experience, is that it is much slower at indexing and querying than (e.g.) Solr.

Update 1: In response to a comment, here's a link: how I use Zend_Search_Lucene for spatial searches. The code there demonstrates a few things:

  • Lines 54-62 show how to add a "document" to the index. In this example the document has only two fields (longitude and latitude), but you get the idea. Just put this in a loop and add documents to your index. In production, I keep track of changes to the data and update the index whenever any data that goes into an indexed document changes. The initial import is very slow: empirically, I found the build to be at least O(n log n) with a pretty big constant K, while Solr was more like O(log n).
  • Lines 42-52 show how to search an index. This search is a bit more complicated than usual, because I have to encode longitude and latitude in the same way they are encoded in the index. The article explains why this has to be done, but suffice it to say: if you just have text data, searching the index is not this hard.
  • Line 40 creates the index, which both the "add" and the "search" in the previous two bullets require. Note that keeping the index on a fast medium (like SSD storage) lowers the K in the algorithm, but it is still (empirically, not analytically) O(n log n).
  • Lines 1-38 are the helpers needed to normalize a longitude and latitude into a format that Zend_Search_Lucene supports. Again, if you have only text data, this complication isn't necessary.
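For plain text data, the create/add/search cycle described in the bullets might look roughly like the sketch below. It is not the code from the linked article. It assumes Zend Framework 1.x is on the include path, and the `buildDocument()` helper, the `$pages` array, the field names, the boost value, and the index path are all hypothetical choices made for illustration.

```php
<?php
// Sketch only: assumes Zend Framework 1.x is on the include path.
require_once 'Zend/Search/Lucene.php';

/**
 * Hypothetical helper: turn one page (an associative array) into an
 * index document with three fields.
 */
function buildDocument(array $page)
{
    $doc = new Zend_Search_Lucene_Document();

    // Keyword: stored and indexed as a single term, not tokenized.
    $doc->addField(Zend_Search_Lucene_Field::Keyword('url', $page['url']));

    // Text: tokenized, indexed, and stored. The boost is an arbitrary
    // example of weighting one field above the others.
    $title = Zend_Search_Lucene_Field::Text('title', $page['title']);
    $title->boost = 2.0;
    $doc->addField($title);

    // UnStored: tokenized and indexed, but the text itself is not kept
    // in the index, which keeps the index smaller.
    $doc->addField(Zend_Search_Lucene_Field::UnStored('body', $page['body']));

    return $doc;
}

// Hypothetical sample data standing in for the assembled page content.
$pages = array(
    array('url' => '/pies', 'title' => 'Blueberry pie', 'body' => 'A recipe for blueberry pie.'),
    array('url' => '/jam',  'title' => 'Plum jam',      'body' => 'How to make plum jam.'),
);

// Create a fresh index in a throwaway directory (path is an assumption).
$indexPath = sys_get_temp_dir() . '/site-index-' . uniqid();
$index = Zend_Search_Lucene::create($indexPath);

foreach ($pages as $page) {
    $index->addDocument(buildDocument($page));
}
$index->commit();

// Search: each hit exposes its score and the stored fields.
foreach ($index->find('blueberry') as $hit) {
    printf("%.3f  %s  %s\n", $hit->score, $hit->title, $hit->url);
}
```

Because `body` is added as an unstored field, it can be searched but not read back from a hit; only `title` and `url` are available on the result.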

Update 2: Responding to the comment on performance. Putting the index on a fast medium (SSD, RAM disk with sync, whatever) speeds it up a bit. Using unstored fields also helps a bit. Both of these reduce the constant in the empirical O(n log n), but the dominant problem is still that n multiplier. What Zend appears to do, on each add, is reshuffle most or all of the previous adds into the index. As far as I can tell, this is the algorithm in play during the index build, and it can't be modified.

The way I got around that n multiplier was to use a Zend page cache keyed on the stemmed query (so if someone types "blueberries", "blueberry", "blue berry", "blu bary", etc., they all get stemmed and reduced to the Soundex phonetic "blue-bear-ee"). Common queries get almost instant results, and since this particular domain was read-heavy and tolerant of insert latency, this was an acceptable solution. Obviously, in general it is not.
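As a rough illustration of the cache-key idea, the sketch below collapses spelling variants of a query into one phonetic key using PHP's built-in `soundex()`. The `searchCacheKey()` function and the `search_` prefix are hypothetical, there is no real stemmer here (stripping non-letters plus Soundex's truncation is a crude stand-in), and the cache itself is not shown.

```php
<?php
/**
 * Hypothetical helper: collapse a raw query into a phonetic cache key,
 * so that spelling variants share one cached result page.
 */
function searchCacheKey($query)
{
    // Keep ASCII letters only, so "blue berry" and "blueberry" coincide.
    $letters = preg_replace('/[^a-z]/', '', strtolower($query));
    if ($letters === '') {
        return 'search_empty';
    }

    // soundex() keeps the first letter plus three digits, which also
    // discards most suffix differences ("blueberry" vs "blueberries").
    return 'search_' . soundex($letters);
}

foreach (array('blueberries', 'blueberry', 'blue berry', 'blu bary') as $q) {
    echo $q, ' => ', searchCacheKey($q), "\n";
}
```

All four sample queries map to the same key, `search_B416`. The trade-off is that Soundex is coarse, so unrelated words can collide on one key; that is tolerable only where serving the same cached page for near-sounding queries is acceptable.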

In other circumstances, there is the static Zend_Search_Lucene::setResultSetLimit() method, which makes find() return sooner by capping the number of hits it collects. If you don't need every possible result, only N of them, this is the way to go. One caveat: as far as I recall from the Zend Framework 1.x documentation, the limit yields the first N matches found rather than the best N by score.
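A minimal sketch of capping the result set follows. It assumes Zend Framework 1.x is on the include path; the throwaway index, the sample documents, and the limit of 10 are made up for illustration.

```php
<?php
// Sketch only: assumes Zend Framework 1.x is on the include path.
require_once 'Zend/Search/Lucene.php';

// Hypothetical throwaway index in the system temp directory.
$indexPath = sys_get_temp_dir() . '/limit-demo-index-' . uniqid();
$index = Zend_Search_Lucene::create($indexPath);

// Add 50 documents that all match the same query term.
for ($i = 0; $i < 50; $i++) {
    $doc = new Zend_Search_Lucene_Document();
    $doc->addField(Zend_Search_Lucene_Field::Text('title', 'blueberry recipe'));
    $index->addDocument($doc);
}
$index->commit();

// Static setting: it applies to every later find() call; 0 means no limit.
Zend_Search_Lucene::setResultSetLimit(10);

$hits = $index->find('blueberry');
echo count($hits), " hits returned\n";
```

Since the setting is static, it affects all indexes used in the same request, so it may be worth resetting it to 0 after a query that needs the cap.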

Finally, all of this experience is with Zend Framework 1.x. I do not know whether this has been addressed in 2.x.
