What is better: one big field or many small fields?


Question

I'm about to write a search engine based on Zend Search Lucene.

My objects have many different fields (10 of them are text fields), and I would like to know which of the following approaches is best. (All fields are unstored, only indexed; I don't need to retrieve their contents.)

One big field (a concatenation of many small fields):

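// Join with a space so the last word of one field does not fuse with the first word of the next.
$content = $textfield1 . ' ' . $textfield2 . ' ' . $textfield3 . ' ' . $textfield4; // ... and so on
Zend_Search_Lucene_Field::unStored("content", $content);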

Many small fields:

Zend_Search_Lucene_Field::unStored("content", $textfield1);
Zend_Search_Lucene_Field::unStored("content2", $textfield2);
Zend_Search_Lucene_Field::unStored("content3", $textfield3);
// ... and so on, one field per text

Each field may contain a lot of text (about 500 words or more).

Recommended answer

If the content of these fields is similar, then performance-wise it's better to have one field than several (assuming that most of the time you want to search across all of them).

Lucene stores the terms of all fields in one big dictionary, where each entry is the concatenation {field}{term}. So if you don't need to treat the fields separately, it's better to throw them into one bag. This way you get a much smaller dictionary (especially if the fields share similar terms) and fewer disk seeks during search, while the total length of the postings lists scanned stays roughly the same.
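To make the dictionary-size argument concrete, here is a minimal sketch in plain PHP. It is a toy model only: the buildTermDictionary helper, the naive tokenizer, the "field:term" key format, and the sample texts are all assumptions made for illustration, and none of it uses Zend Search Lucene or reproduces Lucene's actual on-disk format. It just counts distinct (field, term) pairs for both layouts.

```php
<?php
// Toy model of a term dictionary keyed by (field, term).
// Illustration only: this is NOT Lucene's real index format.

/**
 * Collect the distinct "field:term" keys produced by a set of fields.
 * (Hypothetical helper, not part of any library.)
 *
 * @param array<string, string> $fields map of field name => text
 * @return array<string, true>  set of dictionary keys
 */
function buildTermDictionary(array $fields): array
{
    $dictionary = [];
    foreach ($fields as $fieldName => $text) {
        // Naive tokenizer: lowercase, then split on anything that is not a-z or 0-9.
        $terms = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
        foreach ($terms as $term) {
            $dictionary[$fieldName . ':' . $term] = true;
        }
    }
    return $dictionary;
}

// Hypothetical sample texts with overlapping vocabulary.
$textfield1 = 'fast search engine for text';
$textfield2 = 'text search with a fast index';
$textfield3 = 'index text for fast search';

// Layout 1: many small fields, each term is stored once per field it occurs in.
$separate = buildTermDictionary([
    'content'  => $textfield1,
    'content2' => $textfield2,
    'content3' => $textfield3,
]);

// Layout 2: one big field, each term is stored once overall.
$combined = buildTermDictionary([
    'content' => $textfield1 . ' ' . $textfield2 . ' ' . $textfield3,
]);

echo 'many small fields: ', count($separate), " dictionary entries\n";
echo 'one big field:     ', count($combined), " dictionary entries\n";
```

With these sample texts the separate layout yields twice as many dictionary entries as the combined one, because terms such as "fast", "search" and "text" are repeated under every field name. The more the fields' vocabularies overlap, the larger the saving; with completely disjoint vocabularies the two layouts would produce the same number of entries.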
