Retrieving big data sets with ElasticsearchTemplate


Question

I am new to ElasticsearchTemplate. I want to retrieve 1000 documents from Elasticsearch based on my query. I built the query with QueryBuilder, and it works perfectly. I have gone through the following links, which state that large data sets can be retrieved using scan and scroll.

link one
link two

I am trying to implement this in the following section of code, which I copied from one of the links mentioned above. But I am getting the following error:

The type ResultsMapper is not generic; it cannot be parameterized with arguments <MyInputDto>

MyInputDto is a class annotated with @Document in my project. In the end, I just want to retrieve 1000 documents from Elasticsearch. I looked for a size parameter, but I think it is not supported.

String scrollId = esTemplate.scan(searchQuery, 1000, false);
List<MyInputDto> sampleEntities = new ArrayList<MyInputDto>();
boolean hasRecords = true;
while (hasRecords) {
    Page<MyInputDto> page = esTemplate.scroll(scrollId, 5000L,
            new ResultsMapper<MyInputDto>() {
                @Override
                public Page<MyInputDto> mapResults(SearchResponse response) {
                    List<MyInputDto> chunk = new ArrayList<MyInputDto>();
                    for (SearchHit searchHit : response.getHits()) {
                        if (response.getHits().getHits().length <= 0) {
                            return null;
                        }
                        MyInputDto user = new MyInputDto();
                        user.setId(searchHit.getId());
                        user.setMessage((String) searchHit.getSource().get("message"));
                        chunk.add(user);
                    }
                    return new PageImpl<MyInputDto>(chunk);
                }
            });
    if (page != null) {
        sampleEntities.addAll(page.getContent());
        hasRecords = page.hasNextPage();
    } else {
        hasRecords = false;
    }
}

What is the issue here? Is there an alternative way to achieve this? I would be thankful if somebody could explain how this code works behind the scenes.

Recommended answer

Solution 1

If you want to use ElasticsearchTemplate, it is simpler and more readable to use CriteriaQuery, because it lets you set the page size with the setPageable method. With scrolling, you can then fetch the subsequent batches of data:

CriteriaQuery criteriaQuery = new CriteriaQuery(Criteria.where("productName").is("something"));
criteriaQuery.addIndices("prods");
criteriaQuery.addTypes("prod");
// The page size is the number of hits returned per scroll round trip.
criteriaQuery.setPageable(PageRequest.of(0, 1000));

// Open the scroll; the first argument is the scroll keep-alive time in milliseconds.
ScrolledPage<TestDto> scroll = (ScrolledPage<TestDto>) esTemplate.startScroll(3000, criteriaQuery, TestDto.class);
// An empty page means all matching documents have been read.
while (scroll.hasContent()) {
    LOG.info("Next page with 1000 elem: " + scroll.getContent());
    scroll = (ScrolledPage<TestDto>) esTemplate.continueScroll(scroll.getScrollId(), 3000, TestDto.class);
}
// Release the scroll context once it is no longer needed.
esTemplate.clearScroll(scroll.getScrollId());
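
The question also asks how scrolling works behind the scenes. As a rough mental model only: the server keeps a cursor for each scroll ID, every continue call returns the next batch and advances that cursor, an empty batch signals that the results are exhausted, and clearing the scroll releases the cursor. The sketch below is a toy in-memory stand-in written for illustration, not the Elasticsearch or Spring Data API. The names ToyScrollServer, ScrollSketch and fetchAll are hypothetical, and unlike the real startScroll, the toy version returns only an ID rather than the first page.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory stand-in for a scroll-capable search backend.
// Every name here is hypothetical; this is NOT the Elasticsearch API.
class ToyScrollServer {
    private final List<String> docs;
    private final int batchSize;
    // scrollId -> index of the next unread document
    private final Map<String, Integer> cursors = new HashMap<>();
    private int nextId = 0;

    ToyScrollServer(List<String> docs, int batchSize) {
        this.docs = docs;
        this.batchSize = batchSize;
    }

    // Opens a cursor at the first document and returns its id.
    String startScroll() {
        String scrollId = "scroll-" + (nextId++);
        cursors.put(scrollId, 0);
        return scrollId;
    }

    // Returns the next batch and advances the cursor; empty once exhausted.
    List<String> continueScroll(String scrollId) {
        Integer from = cursors.get(scrollId);
        if (from == null) {
            throw new IllegalStateException("unknown or cleared scroll id: " + scrollId);
        }
        int to = Math.min(from + batchSize, docs.size());
        cursors.put(scrollId, to);
        return new ArrayList<>(docs.subList(from, to));
    }

    // Releases the cursor, as clearing a scroll does on a real server.
    void clearScroll(String scrollId) {
        cursors.remove(scrollId);
    }

    int openCursors() {
        return cursors.size();
    }
}

class ScrollSketch {
    // Drains the cursor batch by batch until an empty batch signals the end.
    static List<String> fetchAll(ToyScrollServer server) {
        List<String> all = new ArrayList<>();
        String scrollId = server.startScroll();
        try {
            List<String> batch = server.continueScroll(scrollId);
            while (!batch.isEmpty()) {
                all.addAll(batch);
                batch = server.continueScroll(scrollId);
            }
        } finally {
            // Always release the cursor, even if processing fails.
            server.clearScroll(scrollId);
        }
        return all;
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            docs.add("doc-" + i);
        }
        ToyScrollServer server = new ToyScrollServer(docs, 1000);
        System.out.println(fetchAll(server).size());
    }
}
```

With 2500 documents and a batch size of 1000, the loop receives batches of 1000, 1000 and 500, then an empty batch that ends it, so main prints 2500.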

Solution 2

If you would rather use org.elasticsearch.client.Client instead of ElasticsearchTemplate, the request builder returned by prepareSearch lets you set the number of search hits returned per batch with setSize:

QueryBuilder prodBuilder = ...;

SearchResponse scrollResp = client.prepareSearch("prods")
        .setScroll(new TimeValue(60000))
        .setSize(1000)
        .setTypes("prod")
        .setQuery(prodBuilder)
        .execute().actionGet();

ObjectMapper mapper = new ObjectMapper();
List<TestDto> products = new ArrayList<>();

try {
    do {
        for (SearchHit hit : scrollResp.getHits().getHits()) {
            products.add(mapper.readValue(hit.getSourceAsString(), TestDto.class));
        }
        LOG.info("Next page with 1000 elem: " + products);
        products.clear();
        scrollResp = client.prepareSearchScroll(scrollResp.getScrollId())
                .setScroll(new TimeValue(60000))
                .execute()
                .actionGet();
    } while (scrollResp.getHits().getHits().length != 0);
} catch (IOException e) {
    // Pass the exception as the last argument so the stack trace is logged.
    LOG.error("Exception while executing query", e);
}
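
Both loops above keep scrolling until the whole result set is exhausted. If only the first 1000 documents are needed, as in the question, the loop can stop as soon as enough hits have been collected. The following is a minimal sketch of that stopping rule, independent of any Elasticsearch class: the batches are modeled as a plain Iterator of lists, and the names LimitedFetch and takeUpTo are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class LimitedFetch {
    // Collects items from successive batches and stops at `limit` items,
    // at the first empty batch, or when there are no more batches.
    static <T> List<T> takeUpTo(Iterator<List<T>> batches, int limit) {
        List<T> result = new ArrayList<>();
        while (result.size() < limit && batches.hasNext()) {
            List<T> batch = batches.next();
            if (batch.isEmpty()) {
                break; // an empty batch means the source is exhausted
            }
            int remaining = limit - result.size();
            // Take only as many items as are still needed from this batch.
            result.addAll(batch.subList(0, Math.min(remaining, batch.size())));
        }
        return result;
    }

    public static void main(String[] args) {
        // Three batches of 400 items each stand in for scroll pages.
        List<List<Integer>> pages = new ArrayList<>();
        for (int p = 0; p < 3; p++) {
            List<Integer> page = new ArrayList<>();
            for (int i = 0; i < 400; i++) {
                page.add(p * 400 + i);
            }
            pages.add(page);
        }
        System.out.println(takeUpTo(pages.iterator(), 1000).size());
    }
}
```

Here the third batch contributes only 200 of its 400 items, so main prints 1000. In a real scroll loop the same check would sit in the loop condition, and the scroll should still be cleared afterwards.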
