Creating and updating Zend_Search_Lucene indexes


Problem description

I'm using Zend_Search_Lucene to create an index of articles so that they can be searched on my website. Whenever an administrator updates, creates, or deletes an article in the admin area, the index is rebuilt:

$config = Zend_Registry::get("config");
$cache = $config->lucene->cache;
$path = $cache . "/articles";

try
{
    $index = Zend_Search_Lucene::open($path);
}
catch (Zend_Search_Lucene_Exception $e)
{
    $index = Zend_Search_Lucene::create($path);
}

$model = new Default_Model_Articles();
$select = $model->select();
$articles = $model->fetchAll($select);

foreach ($articles as $article)
{
    $doc = new Zend_Search_Lucene_Document();
    $doc->addField(Zend_Search_Lucene_Field::Text("title", $article->title));
    $index->addDocument($doc);
}

$index->commit();

My question is this. Since I am reindexing all the articles and handling deleted articles as well, why would I not just use "create" every time (instead of "open" and update)? With the method above, I think every article would be added again with addDocument on each run, so there would be duplicates. How would I prevent that? Is there a way to check whether a Document already exists in the index?

Also, I don't think I fully understand how indexing works when you "open" an index and update it. It seems to create a new #.cfs file in the index folder every time (so I have _0.cfs, _1.cfs, _2.cfs), but when I use "create", it replaces the existing files with a single new #.cfs file with the # incremented (for example, just _2.cfs). Can you please explain what these segment files are?

Recommended answer

Yes, you can check whether a Document is already in the index; have a look at the Zend_Search_Lucene manual. You can then delete that specific Document from the index via $index->delete($id);, where $id is a document id returned by the termDocs method (it returns an array of ids, so delete each one). After that you can simply add the new version of the Document.
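The delete-then-add step described above can be sketched as follows. This is a minimal sketch, not code from the original answer: it assumes Zend Framework 1 is autoloadable as in the question, and it introduces a hypothetical Keyword field named article_id so that each article can be found by an exact term. The function name replaceArticle is also only illustrative.

```php
<?php
// Sketch only: assumes Zend Framework 1 is autoloadable, as in the question,
// and that every document carries a hypothetical Keyword field "article_id".

/**
 * Delete any indexed copies of an article, then add the fresh version.
 * Returns the number of stale documents that were removed.
 */
function replaceArticle($index, $articleId, $title)
{
    // termDocs() returns an array of internal document ids for an exact term match.
    $term = new Zend_Search_Lucene_Index_Term((string) $articleId, 'article_id');
    $removed = 0;
    foreach ($index->termDocs($term) as $docId) {
        // Skip ids that are already flagged as deleted.
        if (!$index->isDeleted($docId)) {
            $index->delete($docId);
            $removed++;
        }
    }

    $doc = new Zend_Search_Lucene_Document();
    // Keyword fields are indexed without tokenizing, so they suit exact lookups.
    $doc->addField(Zend_Search_Lucene_Field::Keyword('article_id', (string) $articleId));
    $doc->addField(Zend_Search_Lucene_Field::Text('title', $title));
    $index->addDocument($doc);

    return $removed;
}
```

Inside the foreach loop from the question, this would be called as replaceArticle($index, $article->id, $article->title), assuming the article rows expose an id column. A Text field such as title is tokenized, which is why a separate untokenized identifier is used for the lookup here.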

About the multiple index files that Lucene creates: every time you modify an existing index, Lucene does not really change the existing files; it adds a partial index (a segment) for each change you make. A large number of small segments is bad for search performance, but there is a simple way around this. After every change you make to the index, call $index->optimize(); this merges all the partial files into a single index, improving search times dramatically.
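As a rough illustration of the commit-then-optimize sequence, the sketch below wraps a batch of changes in one helper. It assumes Zend Framework 1 is autoloadable as in the question; the helper name updateAndOptimize and the callback parameter are hypothetical and not part of the library.

```php
<?php
// Sketch only: assumes Zend Framework 1 is autoloadable, as in the question.

/**
 * Apply a batch of changes, then commit once and merge all segments.
 * $applyChanges is a hypothetical callback that receives the index.
 */
function updateAndOptimize($index, $applyChanges)
{
    call_user_func($applyChanges, $index);
    $index->commit();         // flush buffered documents to a new segment
    $index->optimize();       // merge all segments into a single one
    return $index->numDocs(); // number of live (non-deleted) documents
}
```

A call might look like updateAndOptimize(Zend_Search_Lucene::open($path), function ($index) { /* addDocument / delete calls */ });. Merging rewrites the whole index, so grouping several changes into one call like this keeps the cost to a single merge.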
