Elasticsearch - Java RestHighLevelClient - how to get all documents using the scroll API

Question

My Elasticsearch index holds about 30,000 entities. I'd like to get all of their IDs using RestHighLevelClient. I've read that the best way to do this is to use the scroll API. However, when I try it, I receive only about 10 entities instead of 30k. How can I solve this?

final class ElasticRepo {
    private final RestHighLevelClient restHighLevelClient;

    List<ListingsData> getAllListingsDataIds() {
        val request = new SearchRequest(ELASTICSEARCH_LISTINGS_INDEX);
        request.types(ELASTICSEARCH_TYPE);
        val searchSourceBuilder = new SearchSourceBuilder()
                .query(matchAllQuery())
                .fetchSource(new String[]{"listing_id"}, new String[]{"backoffice_data", "search_and_match_data"});
        request.source(searchSourceBuilder);
        request.scroll(TimeValue.timeValueMinutes(3));
        return executeQuery(request);
    }

    private List<ListingsData> executeQuery(final SearchRequest searchQuery) {
        try {
            val hits = restHighLevelClient.search(searchQuery, RequestOptions.DEFAULT).getHits().getHits();
            return Arrays.stream(hits).map(SearchHit::getSourceAsString).map(ElasticRepo::toListingsData).collect(Collectors.toList());
        } catch (IOException e) {
            e.printStackTrace();
            throw new RuntimeException("");
        }
    }

}

When I do this, executeQuery returns only about 11 entities. How can I solve that and obtain all documents in the index?

Answer

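Try following this example; I am using this code and it works. The first search returns only one batch of hits (10 by default when no size is set), so the remaining batches have to be fetched with follow-up scroll requests until an empty batch comes back: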

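        import org.elasticsearch.action.search.ClearScrollRequest;
        import org.elasticsearch.action.search.ClearScrollResponse;
        import org.elasticsearch.action.search.SearchRequest;
        import org.elasticsearch.action.search.SearchResponse;
        import org.elasticsearch.action.search.SearchScrollRequest;
        import org.elasticsearch.client.RequestOptions;
        import org.elasticsearch.common.unit.TimeValue; // org.elasticsearch.core.TimeValue in newer 7.x clients
        import org.elasticsearch.index.query.QueryBuilder;
        import org.elasticsearch.index.query.QueryBuilders;
        import org.elasticsearch.index.query.QueryStringQueryBuilder;
        import org.elasticsearch.search.Scroll;
        import org.elasticsearch.search.SearchHit;
        import org.elasticsearch.search.builder.SearchSourceBuilder;

        // client is an existing RestHighLevelClient
        String query = "your query here";
        QueryBuilder matchQueryBuilder = QueryBuilders.boolQuery().must(new QueryStringQueryBuilder(query));

        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        searchSourceBuilder.query(matchQueryBuilder);
        searchSourceBuilder.size(5000); // hits per scroll batch; max is 10000 by default (index.max_result_window)

        SearchRequest searchRequest = new SearchRequest();
        searchRequest.indices("your index here");
        searchRequest.source(searchSourceBuilder);

        // keep the scroll context alive for 10 minutes between requests
        final Scroll scroll = new Scroll(TimeValue.timeValueMinutes(10L));
        searchRequest.scroll(scroll);

        SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
        String scrollId = searchResponse.getScrollId();
        SearchHit[] allHits = new SearchHit[0];
        SearchHit[] searchHits = searchResponse.getHits().getHits();

        // keep scrolling until a batch comes back empty
        while (searchHits != null && searchHits.length > 0)
        {
            allHits = Helper.concatenate(allHits, searchHits); // create a function which concatenates two arrays
            SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
            scrollRequest.scroll(scroll);
            searchResponse = client.scroll(scrollRequest, RequestOptions.DEFAULT);
            scrollId = searchResponse.getScrollId();
            searchHits = searchResponse.getHits().getHits();
        }

        // release the scroll context on the server once all batches are read
        ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
        clearScrollRequest.addScrollId(scrollId);
        ClearScrollResponse clearScrollResponse = client.clearScroll(clearScrollRequest, RequestOptions.DEFAULT);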
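
The code above calls Helper.concatenate but leaves its implementation to the reader. Below is a minimal sketch of what such a helper might look like. Only the name Helper.concatenate comes from the answer; the generic signature and the body are assumptions, not the author's actual code.

```java
import java.util.Arrays;

/** Hypothetical helper: the name comes from the answer, the body is an assumed implementation. */
final class Helper {
    private Helper() {
        // static utility class, not meant to be instantiated
    }

    /** Returns a new array holding the elements of first followed by the elements of second. */
    static <T> T[] concatenate(T[] first, T[] second) {
        // Arrays.copyOf keeps the runtime element type of first and pads the tail with nulls
        T[] result = Arrays.copyOf(first, first.length + second.length);
        // overwrite the padded tail with the elements of second
        System.arraycopy(second, 0, result, first.length, second.length);
        return result;
    }
}
```

Because every call copies everything accumulated so far, collecting the batches into a List<SearchHit> may be cheaper when the index holds many documents.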
