How can I tell when documents have been indexed?

Question

This follows on from a question I asked yesterday, which showed that Elasticsearch running as a service on Windows 10 takes some time after the service starts before it accepts requests, even several seconds after the Python script has obtained an Elasticsearch object. I now find that if I add documents to an index and immediately query the index, I get no results. If I wait a few seconds, I do get the expected results.

I am reading a book on ES as I learn, and it mentions that index updates happen only once a second (the book covers ES 1.7; I'm using 7.10).

The question is: after adding documents, is there some command I can run (from the Python elasticsearch module, or possibly a REST URL) which will either not return until the new documents have been indexed, or somehow indicate how many documents are now in the index?

NB: I am using this sort of command to index:

es_obj.index( index='my_index', body=record_as_json_string )

Recommended answer

Yes. You can achieve this in several ways, either with the refresh query parameter on the indexing request or with the refresh API.

For instance, each of the requests below indexes a document and refreshes immediately.

curl -X PUT "localhost:9200/test/_doc/1?refresh&pretty" -H 'Content-Type: application/json' -d'
{"test": "test"}
'
curl -X PUT "localhost:9200/test/_doc/2?refresh=true&pretty" -H 'Content-Type: application/json' -d'
{"test": "test"}
'

From the documentation:

Refresh the relevant primary and replica shards (not the whole index) immediately after the operation occurs, so that the updated document appears in search results immediately. This should ONLY be done after careful thought and verification that it does not lead to poor performance, both from an indexing and a search standpoint.

Source: https://www.elastic.co/guide/zh-CN/elasticsearch/reference/current/docs-refresh.html
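Since the question uses the Python client, the same refresh parameter can be sketched there as well. This is a minimal sketch, assuming the 7.x elasticsearch Python client, where index() accepts refresh as a query parameter. The helper name index_and_wait is hypothetical. Besides True, the parameter also accepts "wait_for", which does not force a refresh but makes the call return only after a refresh has made the document visible to search.

```python
def index_and_wait(es, index_name, record_as_json_string, refresh="wait_for"):
    """Index one document and return once it is visible to search.

    `es` is assumed to be an elasticsearch.Elasticsearch client (7.x).
    refresh="wait_for" blocks until a refresh has made the document searchable;
    refresh=True forces an immediate refresh of the affected shards.
    """
    return es.index(
        index=index_name,
        body=record_as_json_string,
        refresh=refresh,
    )


# Usage (assumes a node reachable with the client's default settings):
#   from elasticsearch import Elasticsearch
#   es_obj = Elasticsearch()
#   index_and_wait(es_obj, "my_index", record_as_json_string)
```

With "wait_for" the call can take up to the index's refresh interval to return, which is the trade-off for not forcing extra refreshes.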

Should you do it?

The defaults are set the way they are to give better performance. ES is mostly used to store large data sets, and a refresh is a relatively costly operation, so refreshing after every insert can lead to unexpected delays and performance problems. The source cited above explains when to use which option. For more on performance tuning, refer to the Elasticsearch documentation.
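One way to limit that cost when adding several documents is to index them without a per-request refresh and then refresh once at the end. The sketch below assumes the 7.x Python client, whose indices.refresh() and count() calls are used here. The helper name index_batch_then_refresh is hypothetical. The returned count also addresses the question's second option, since it reports how many documents are searchable after the refresh.

```python
def index_batch_then_refresh(es, index_name, records):
    """Index several JSON strings, refresh once, and return the document count.

    `es` is assumed to be an elasticsearch.Elasticsearch client (7.x).
    """
    for record in records:
        # No refresh parameter here, so each request returns as soon as possible.
        es.index(index=index_name, body=record)

    # One explicit refresh makes the whole batch visible to search.
    es.indices.refresh(index=index_name)

    # The count API reports how many documents are currently searchable.
    return es.count(index=index_name)["count"]
```

For large batches the client's bulk helpers would likely be a better fit than a loop of single index calls. The loop is kept here only to stay close to the command in the question.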
