Reindexing Elasticsearch via the Bulk API, scan and scroll


Question

I am trying to re-index my Elasticsearch setup and am currently working from the Elasticsearch documentation and an example that uses the Python API.

I'm a little confused about how this all works, though. I was able to obtain the scroll ID from the Python API:

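from elasticsearch import Elasticsearch

es = Elasticsearch("myhost")

index = "myindex"
query = {"query": {"match_all": {}}}
# Start a scan-type search and keep the scroll context alive for 10 minutes.
response = es.search(index=index, doc_type="my-doc-type", body=query,
                     search_type="scan", scroll="10m")

# The scroll ID identifies this search session in later requests.
scroll_id = response["_scroll_id"]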

Now my question is: what use is this to me? What does knowing the scroll ID even give me? The documentation says to use the "Bulk API", but I have no idea how the scroll_id factors into this; it was a little confusing.

Could anyone give a brief example showing me how to re-index from this point, assuming I've got the scroll_id correctly?

Recommended answer

Hi, you can use the scroll API to go through all the documents in the most efficient way. The scroll_id identifies a session that the server stores for your specific scroll request, so you need to provide the scroll_id with each subsequent request to obtain more items.
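To make the role of the scroll_id concrete, here is a minimal Python sketch of the scroll loop. It assumes `es` is an `elasticsearch.Elasticsearch` client used in the same call style as the question (`body=`, `scroll=`), and the helper name `iter_scroll_hits` is made up for illustration. It also assumes a regular scroll search rather than `search_type="scan"`: with scan, the first response carries no hits, so the loop would have to call `es.scroll` once before checking for an empty page.

```python
def iter_scroll_hits(es, index, query, scroll="10m", size=500):
    """Yield every hit matching `query` by following the scroll cursor.

    `es` is assumed to be an elasticsearch.Elasticsearch client;
    this helper is illustrative and not part of the library.
    """
    # The first request opens the scroll session on the server.
    response = es.search(index=index, body=query, scroll=scroll, size=size)
    while True:
        hits = response["hits"]["hits"]
        if not hits:
            break  # an empty page means the scroll is exhausted
        for hit in hits:
            yield hit
        # Always send the most recent scroll ID to fetch the next page.
        response = es.scroll(scroll_id=response["_scroll_id"], scroll=scroll)
```

Each hit is a dict with keys such as `_id` and `_source`, which is what you need to re-create the document in another index.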

The bulk API is for indexing documents more efficiently. When copying an index you need both, but they are not really related: scroll reads the documents from the old index, and bulk writes them to the new one.
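As a rough illustration of the bulk side, the sketch below turns a page of search hits into the alternating action/source lines that a bulk request carries, then sends them in one call. The function names `to_bulk_body` and `index_batch` are hypothetical, and `es` is again assumed to be an `elasticsearch.Elasticsearch` client whose `bulk` method accepts a `body` argument, matching the call style in the question.

```python
def to_bulk_body(hits, target_index):
    """Build the action/source pairs for a bulk request from search hits."""
    body = []
    for hit in hits:
        meta = {"_index": target_index, "_id": hit["_id"]}
        # Keep the document type when the hit has one (older clusters).
        if "_type" in hit:
            meta["_type"] = hit["_type"]
        body.append({"index": meta})   # action line: what to do and where
        body.append(hit["_source"])    # source line: the document itself
    return body


def index_batch(es, hits, target_index):
    """Send one bulk request for `hits`; return None if there is nothing to send."""
    body = to_bulk_body(hits, target_index)
    if not body:
        return None
    return es.bulk(body=body)
```

Nothing here needs a scroll_id: the bulk request only cares about the documents it is given, which is why the two APIs are independent.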

I do have some Java code that might help you get a better idea of how it works.

public void reIndex() {
    logger.info("Start creating a new index based on the old index.");

    SearchResponse searchResponse = client.prepareSearch(MUSIC_INDEX)
            .setQuery(matchAllQuery())
            .setSearchType(SearchType.SCAN)
            .setScroll(createScrollTimeoutValue())
            .setSize(SCROLL_SIZE).execute().actionGet();

    BulkProcessor bulkProcessor = BulkProcessor.builder(client,
            createLoggingBulkProcessorListener()).setBulkActions(BULK_ACTIONS_THRESHOLD)
            .setConcurrentRequests(BULK_CONCURRENT_REQUESTS)
            .setFlushInterval(createFlushIntervalTime())
            .build();

    while (true) {
        searchResponse = client.prepareSearchScroll(searchResponse.getScrollId())
                .setScroll(createScrollTimeoutValue()).execute().actionGet();

        if (searchResponse.getHits().getHits().length == 0) {
            logger.info("Closing the bulk processor");
            bulkProcessor.close();
            break; //Break condition: No hits are returned
        }

        for (SearchHit hit : searchResponse.getHits()) {
            IndexRequest request = new IndexRequest(MUSIC_INDEX_NEW, hit.type(), hit.id());
            request.source(hit.sourceRef());
            bulkProcessor.add(request);
        }
    }
}
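
Since the question uses the Python API, here is a rough Python counterpart to the Java method above. It is a sketch under assumptions, not a definitive implementation: `es` is assumed to be an `elasticsearch.Elasticsearch` client, the function name `reindex` and its parameters are made up for illustration, and it uses a regular scroll (first page already contains hits) instead of `search_type="scan"`. The `bulk_actions` threshold plays the role of `setBulkActions` on the Java `BulkProcessor`, but requests are sent one at a time rather than concurrently.

```python
def reindex(es, source_index, target_index,
            scroll="10m", page_size=500, bulk_actions=1000):
    """Copy all documents from source_index to target_index; return the count."""
    response = es.search(index=source_index,
                         body={"query": {"match_all": {}}},
                         scroll=scroll, size=page_size)
    pending = []  # buffered bulk lines (two per document)
    copied = 0
    while response["hits"]["hits"]:
        for hit in response["hits"]["hits"]:
            meta = {"_index": target_index, "_id": hit["_id"]}
            if "_type" in hit:  # keep the type on clusters that still use it
                meta["_type"] = hit["_type"]
            pending.append({"index": meta})
            pending.append(hit["_source"])
            copied += 1
            if len(pending) >= 2 * bulk_actions:
                es.bulk(body=pending)  # flush a full batch
                pending = []
        # Ask for the next page using the latest scroll ID.
        response = es.scroll(scroll_id=response["_scroll_id"], scroll=scroll)
    if pending:
        es.bulk(body=pending)  # flush the final partial batch
    return copied
```

A call would look like `reindex(es, "myindex", "myindex_v2")`, where the target index name is only an example. The sketch ignores the per-item results in the bulk response; real code should check its `errors` flag before treating the copy as complete.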
