Solr index and search multilingual data

Question

In my Solr setup, Solr detects the language of the data being indexed and applies different indexing rules depending on the detected language. All data is stored in language-specific fields, for example:


  • English titles are stored in the title_en field.
  • Spanish titles are stored in the title_es field.

<field name="title_en" type="text_en" indexed="true" stored="true"/>
<field name="title_es" type="text_es" indexed="true" stored="true"/>

All searches are made against one catch-all field "text":

<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>

All language-specific fields are copied to the "text" field so that they are available to search queries:

<copyField source="title_en" dest="text"/>
<copyField source="title_es" dest="text"/>

My concern is this: the "text" field does its own indexing, applying (I assume) the text_general rules. So the copied content is indexed again, and I guess the language-specific indexing rules of the source fields (title_en, title_es) are lost for it.

If so, how do I search across all data in one query while preserving the language-specific indexing?

Recommended answer

Yes, the content indexed in text (defined as text_general) is processed only according to the rules for that field and is not affected by the analysis of title_en or title_es. copyField copies the original value before any processing takes place, since you usually (as in this case) want to perform different tokenization and analysis on the destination field.
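
To make the ordering concrete, here is a toy sketch in Python. It is not Solr code: the two analyzer functions are hypothetical stand-ins for the text_general and text_en analysis chains, and the crude plural stripping only imitates stemming. It shows that the copy destination receives the raw value and then applies its own rules.

```python
# Toy model only: these functions stand in for Solr analyzer chains.
# They are assumptions made for illustration, not Solr APIs.

def analyze_general(value):
    # Stand-in for text_general: lowercase and split on whitespace.
    return value.lower().split()

def analyze_en(value):
    # Stand-in for text_en: same as above, plus a crude plural strip
    # that imitates English stemming.
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t
            for t in analyze_general(value)]

def index_document(raw_title):
    # copyField passes the raw input value to "text"; each field then
    # runs its own analyzer on that same raw value.
    return {
        "title_en": analyze_en(raw_title),
        "text": analyze_general(raw_title),
    }

indexed = index_document("Running Shoes")
print(indexed)
```

In this model the tokens under "text" never pass through the English stand-in, which mirrors why the language-specific rules do not carry over to the catch-all field.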

An easy solution, if you want to search both, is to query the title_en and title_es fields directly by using the query fields parameter of the DisMax or eDisMax query parser: qf=title_en title_es (the field list is separated by whitespace). This searches both the English and the Spanish version of your processed content with your query.
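
As a rough sketch of what such a request could look like, the Python snippet below only builds the query URL. The host, port and core name ("mycore") are assumptions, as is the choice of edismax as the query parser; qf is written with a space between the fields, which is how the Solr reference guide documents the parameter.

```python
from urllib.parse import urlencode

def build_title_query(user_query,
                      base_url="http://localhost:8983/solr/mycore/select"):
    # base_url is a hypothetical local Solr core; adjust it to your setup.
    params = {
        "defType": "edismax",       # qf is a DisMax/eDisMax parameter
        "q": user_query,
        "qf": "title_en title_es",  # search both language-specific fields
    }
    return base_url + "?" + urlencode(params)

url = build_title_query("casa")
print(url)
```

Each field listed in qf is searched with its own query-time analysis, so the English and Spanish rules both stay in effect.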
