Elasticsearch: Use a separate index for each language of the same data record


Question

I have a data record with a field called title. A record may have titles in several languages at the same time. Such a record also has other fields whose values do not vary with language, so I do not list them in the following two examples:

Record #1:
Title (English): Hello

Record #2:
Title (English): World
Title (Spanish): mundo

Currently there are four possible languages for the title: English, Spanish, French, and Chinese. There will be more languages supported when the system grows.

I am new to Elasticsearch. I am thinking about having a separate index for each language. So for record #2, I will create two Elasticsearch documents (one for each language) and send each document to the index corresponding to its language.
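
As a rough illustration of the design described above, the fan-out from one record to per-language documents might look like the following Python sketch. The index naming scheme (`records_<language>`), the `record_id` and `year` fields, and the helper name `fan_out` are assumptions made for the example, not something prescribed by Elasticsearch.

```python
from typing import Any, Dict, List, Tuple


def fan_out(record_id: str,
            titles: Dict[str, str],
            shared: Dict[str, Any]) -> List[Tuple[str, Dict[str, Any]]]:
    """Return one (index_name, document) pair per language of the title."""
    pairs = []
    for language, title in titles.items():
        doc = dict(shared)            # language-independent fields are copied into every document
        doc["record_id"] = record_id  # hypothetical key linking the copies of one record
        doc["title"] = title
        pairs.append((f"records_{language}", doc))
    return pairs


# Record #2 from the example: two titles, so two documents in two indices.
record_2 = fan_out("2", {"english": "World", "spanish": "mundo"}, {"year": 2015})
```

Each pair would then be sent to its index. Note that an update or delete of a language-independent field has to touch every language index that holds a copy of the record.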

Is this a good/acceptable design with indexing, update, delete, and search in mind? Are there any problems?

I believe this design has at least the following benefits:

  • I can easily decide how many shards are needed for each language independently
  • I can decide the number and locations of shards for each index (language)
  • I can easily add an index for a new language when the system grows, without destroying or re-indexing existing data.
  • The system can maximally take advantage of distributed computing power

Thanks for any input!

Best.

Solution

Your solution would likely work fine, but you can run into issues with duplicate documents if you start allowing multi-language searches.
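
To make the duplicate issue concrete: with one document per language, a search across several language indices can return the same record more than once, so the application has to collapse the hits itself. A minimal Python sketch, assuming each hit carries a hypothetical `record_id` field and a relevance `score` (the hit shape is simplified for illustration and is not the actual Elasticsearch response format):

```python
from typing import Any, Dict, List


def collapse_by_record(hits: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only the best-scoring hit per record_id, ordered by score descending."""
    best: Dict[Any, Dict[str, Any]] = {}
    for hit in hits:
        current = best.get(hit["record_id"])
        if current is None or hit["score"] > current["score"]:
            best[hit["record_id"]] = hit
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)


# Record 2 matched in both the English and the Spanish index.
hits = [
    {"index": "records_english", "record_id": "2", "score": 1.2},
    {"index": "records_spanish", "record_id": "2", "score": 0.7},
    {"index": "records_english", "record_id": "1", "score": 0.9},
]
unique = collapse_by_record(hits)
```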

It might be better to have multiple possible values per field, e.g.:

  • title.english
  • title.spanish

You can have completely different analysis rules for each language without duplicating the document.
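
A sketch of what such a mapping could look like, written as Python dictionaries that would be sent as JSON request bodies. It assumes `title` is mapped as an object with one sub-field per language, uses Elasticsearch's built-in `english` and `spanish` language analyzers, and uses the typeless mapping format of recent Elasticsearch versions; the variable names are made up for illustration.

```python
# Index creation body: one sub-field per language, each with its own analyzer.
index_body = {
    "mappings": {
        "properties": {
            "title": {
                "properties": {
                    "english": {"type": "text", "analyzer": "english"},
                    "spanish": {"type": "text", "analyzer": "spanish"},
                }
            }
        }
    }
}

# Record #2 is stored once, with both titles in the same document.
record_2 = {"title": {"english": "World", "spanish": "mundo"}}

# A multi-language search queries several sub-fields of that one document,
# so the record can appear at most once in the results.
search_body = {
    "query": {
        "multi_match": {
            "query": "mundo",
            "fields": ["title.english", "title.spanish"],
        }
    }
}
```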

This approach will also allow you to add new title.whatever fields to documents later, each with its own analysis rules. Be warned, though: last I checked, if you use a completely new custom analyzer you need to close and reopen the index for it to take effect, which will result in a few seconds of downtime.
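
The steps for adding a language later could be sketched as a list of REST calls, built here in Python. The index name `records`, the `german` sub-field, and the custom analyzer `my_german` are hypothetical; the close/open steps are only included when new analysis settings are supplied, and the typeless mapping body assumes a recent Elasticsearch version.

```python
from typing import Any, Dict, List, Optional, Tuple

Call = Tuple[str, str, Optional[Dict[str, Any]]]  # (HTTP method, path, JSON body)


def add_language_calls(index: str, language: str, analyzer: str,
                       custom_analysis: Optional[Dict[str, Any]] = None) -> List[Call]:
    """List the REST calls needed to add a title.<language> sub-field."""
    calls: List[Call] = []
    if custom_analysis is not None:
        # New analyzers can only be added while the index is closed.
        calls.append(("POST", f"/{index}/_close", None))
        calls.append(("PUT", f"/{index}/_settings", {"analysis": custom_analysis}))
        calls.append(("POST", f"/{index}/_open", None))
    # Adding a new sub-field to an existing object field does not need a close.
    calls.append(("PUT", f"/{index}/_mapping", {
        "properties": {
            "title": {"properties": {language: {"type": "text", "analyzer": analyzer}}}
        }
    }))
    return calls


# Built-in analyzer: a single mapping update.
builtin = add_language_calls("records", "french", "french")

# Hypothetical custom analyzer: close, update settings, reopen, then map the field.
custom = add_language_calls("records", "german", "my_german", {
    "analyzer": {
        "my_german": {"type": "custom", "tokenizer": "standard", "filter": ["lowercase"]}
    }
})
```

Documents indexed before the mapping change simply lack the new sub-field until they are updated with a value for it.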

I'll try to find some time to expand this answer with an end-to-end example.
