Solr index and search multilingual data
Question
With my Solr schema, Solr detects the language of the data being indexed and applies different indexing rules according to the detected language. All data is stored in language-specific fields, for example:
- English titles are stored in the title_en field.
- Spanish titles are stored in the title_es field.
<field name="title_en" type="text_en" indexed="true" stored="true"/>
<field name="title_es" type="text_es" indexed="true" stored="true"/>
All searches are made against a single catch-all field, "text":
<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
All language-specific fields are copied to the "text" field so that they are available to search queries:
<copyField source="title_en" dest="text"/>
<copyField source="title_es" dest="text"/>
My concern is this: the "text" field does its own indexing, presumably applying the "text_general" rules, so the content is analyzed again for that field. I suspect that the language-specific indexing rules of the source fields (title_en, title_es) are therefore lost in "text".
If so, how can I search across all data in one query while preserving the language-specific indexing?
Recommended answer
Yes, the data stored in text (defined as text_general) is processed only according to the rules for that field, and is not affected by title_en or title_es. copyField happens before any processing of the value, since you usually (as in this case) want to perform different tokenization and analysis on the destination field.
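The ordering described above can be illustrated with a toy model in Python. This is not Solr code: the two analyzer functions are deliberately crude, hypothetical stand-ins for the text_en and text_general field types (the real analysis chains are defined in the schema and are not shown in the question), and index_document is a made-up helper. The sketch only shows that the raw value is copied first and each field then analyzes it independently.

```python
def analyze_general(value):
    """Toy stand-in for text_general: lowercase and split on whitespace."""
    return value.lower().split()


def analyze_en(value):
    """Toy stand-in for text_en: like analyze_general, plus a naive
    plural-stripping step in place of a real English stemmer."""
    tokens = value.lower().split()
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]


# Hypothetical mini-schema mirroring the question's setup.
ANALYZERS = {"title_en": analyze_en, "text": analyze_general}
COPY_FIELDS = [("title_en", "text")]


def index_document(doc):
    """Copy raw values to destination fields first, then analyze every
    field with its own analyzer only."""
    raw = {name: [value] for name, value in doc.items()}
    for source, dest in COPY_FIELDS:
        if source in doc:
            raw.setdefault(dest, []).append(doc[source])

    indexed = {}
    for name, values in raw.items():
        terms = []
        for value in values:
            terms.extend(ANALYZERS[name](value))
        indexed[name] = terms
    return indexed


indexed = index_document({"title_en": "Running Shoes"})
print(indexed["title_en"])  # terms produced by the toy English analyzer
print(indexed["text"])      # terms produced by the toy general analyzer
```

In this model the "text" field never sees the output of the English analyzer, which mirrors why the language-specific rules do not carry over to the catch-all field.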
An easy solution, if you want to search both, is to query the title_en and title_es fields directly by using the query fields parameter: qf=title_en title_es. Note that qf is a parameter of the DisMax and eDisMax query parsers, and the Solr documentation separates its fields with whitespace rather than commas. This searches both the English and the Spanish version of your processed content according to your query.
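As a minimal sketch of such a request, the following Python snippet builds a select URL that searches both title fields. The base URL, the core name "mycore", and the helper name build_title_query are assumptions made for illustration. The snippet also assumes the eDisMax query parser, since qf is a DisMax/eDisMax parameter, and it separates the qf fields with a space, as the Solr documentation does.

```python
from urllib.parse import urlencode


def build_title_query(user_query,
                      base_url="http://localhost:8983/solr/mycore/select"):
    """Build a Solr select URL that searches title_en and title_es.

    base_url is a placeholder; point it at your own Solr core or collection.
    """
    params = {
        "defType": "edismax",       # qf is only honored by DisMax/eDisMax
        "q": user_query,
        "qf": "title_en title_es",  # whitespace-separated list of fields
    }
    return base_url + "?" + urlencode(params)


url = build_title_query("casa blanca")
print(url)
```

Each field in qf is searched with its own query-time analysis, so the English and Spanish rules are both applied to the user's query without going through the catch-all field.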