Partial update on field that is not indexed


Question

Consider the following situation: an "article" document has two fields, content (string) and views (int). The views field is not indexed. It records how many times the article has been read.

From the official documentation:

We also said that documents are immutable: they cannot be changed, only replaced. The update API must obey the same rules. Externally, it appears as though we are partially updating a document in place. Internally, however, the update API simply manages the same retrieve-change-reindex process that we have already described.

But what if we do a partial update of a field that is not indexed: will Elasticsearch reindex the entire document? For example, I want to update views every time someone reads an article. If the entire document is reindexed, I can't do real-time updates (the operation is too heavy). So I would have to work with a delay, for example updating all the articles that visitors have read every 3, 5, or 10 minutes. Or am I misunderstanding something?

Solution

But what if we do a partial update of a field that is not indexed: will Elasticsearch reindex the entire document?

Yes. Although the views field is not indexed individually, it is part of the _source field. The _source field contains the original JSON you sent to Elasticsearch when you indexed the document, and it is returned in the results when the document matches a search. The _source field is stored with the document in Lucene. Your update script changes the _source field, so the whole document is re-indexed.
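The retrieve-change-reindex cycle can be illustrated with a toy model. This is a minimal sketch and not Elasticsearch or Lucene code: the ToyIndex class and all of its methods are hypothetical names, and it assumes only what the answer states, namely that written documents are immutable and that an update writes a complete new copy while marking the old one as deleted.

```python
import copy


class ToyIndex:
    """Hypothetical append-only store that mimics immutable documents."""

    def __init__(self):
        self.versions = []    # every document copy ever written, in order
        self.deleted = set()  # positions in self.versions marked as deleted
        self.live = {}        # doc id -> position of the current copy

    def index(self, doc_id, source):
        # Writing a document always appends a complete new copy.
        if doc_id in self.live:
            self.deleted.add(self.live[doc_id])  # the old copy is only marked
        self.versions.append((doc_id, copy.deepcopy(source)))
        self.live[doc_id] = len(self.versions) - 1

    def get(self, doc_id):
        return copy.deepcopy(self.versions[self.live[doc_id]][1])

    def partial_update(self, doc_id, change):
        source = self.get(doc_id)   # retrieve
        source.update(change)       # change
        self.index(doc_id, source)  # reindex the whole document

    def merge(self):
        # Loosely analogous to a segment merge: drop copies marked as deleted.
        kept = [v for i, v in enumerate(self.versions) if i not in self.deleted]
        self.versions = kept
        self.deleted = set()
        self.live = {doc_id: i for i, (doc_id, _) in enumerate(kept)}


idx = ToyIndex()
idx.index("a1", {"content": "some long text", "views": 0})
for _ in range(1000):
    views = idx.get("a1")["views"]
    idx.partial_update("a1", {"views": views + 1})

# Copies written so far and how many of them are marked as deleted.
print(len(idx.versions), len(idx.deleted))
idx.merge()
print(len(idx.versions), idx.get("a1")["views"])
```

Even though only the small views value changes, each update in this model rewrites the content field as well, which is the cost the question is worried about.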

Could you then evaluate the following strategy? Every time someone reads the article, I send an update to Elasticsearch, but I set refresh_interval to 30 seconds. Is this strategy reasonable if about 1000 users read one article during a 30-second interval?

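You are still indexing 1000 documents: one is indexed as the current document, and the other 999 are indexed, marked as deleted, and removed from the index during a later Lucene merge. The refresh_interval setting only controls when changes become visible to search; it does not reduce the indexing work.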
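If per-view updates prove too heavy, the delayed approach raised in the question could be sketched as an in-process buffer that collapses many views into one update per article. This is only a sketch under assumptions: the ViewBuffer class, the "articles" index name, and the flush cadence are hypothetical, and the scripted-update body assumes a recent Elasticsearch version with Painless scripting. The code only builds the newline-delimited body for the _bulk endpoint; sending it is left out.

```python
import json
from collections import Counter


class ViewBuffer:
    """Hypothetical helper: accumulate view counts, emit one update per article."""

    def __init__(self, index_name="articles"):  # index name is an assumption
        self.index_name = index_name
        self.pending = Counter()

    def record_view(self, article_id):
        self.pending[article_id] += 1

    def flush(self):
        """Return an NDJSON body for the _bulk API and clear the buffer."""
        lines = []
        for article_id, count in self.pending.items():
            # Action line: which document to update.
            lines.append(json.dumps(
                {"update": {"_index": self.index_name, "_id": article_id}}))
            # Payload line: increment views by the buffered count.
            lines.append(json.dumps({"script": {
                "source": "ctx._source.views += params.count",
                "lang": "painless",
                "params": {"count": count},
            }}))
        self.pending.clear()
        # The bulk API expects every line, including the last, to end with \n.
        return "".join(line + "\n" for line in lines)


buf = ViewBuffer()
for _ in range(1000):
    buf.record_view("a1")
buf.record_view("a2")

body = buf.flush()  # call this on a timer, e.g. every few minutes
print(body)
```

With this buffer, 1000 views of one article within a flush window lead to a single reindex of that document instead of 1000. The trade-off is that counts held in memory are lost if the process stops before a flush.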
