Getting all records from Elasticsearch using Java API


Question

I am trying to get all the records from Elasticsearch using the Java API, but I receive the following error:

n[[Wild Thing][localhost:9300][indices:data/read/search[phase/dfs]]]; nested: QueryPhaseExecutionException[Result window is too large, from + size must be less than or equal to: [10000] but was [10101].

My code is as follows:

Client client;
try {
    client = TransportClient.builder().build()
            .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("localhost"), 9300));
    int from = 1;
    int to = 100; // used as the page size
    while (from <= 131881) {
        SearchResponse response = client
                .prepareSearch("demo_risk_data")
                .setSearchType(SearchType.DFS_QUERY_THEN_FETCH).setFrom(from)
                .setQuery(QueryBuilders.boolQuery().mustNot(QueryBuilders.termQuery("user_agent", "")))
                .setSize(to).setExplain(true).execute().actionGet();
        if (response.getHits().getHits().length > 0) {
            for (SearchHit searchData : response.getHits().getHits()) {
                JSONObject value = new JSONObject(searchData.getSource());
                System.out.println(value.toString());
            }
        }
        from += to; // next page; requests fail once from + size exceeds 10000
    }
} catch (UnknownHostException e) {
    e.printStackTrace();
}

There are currently 131881 records in total, so I start with from = 1 and to = 100 and fetch 100 records at a time while from <= 131881. Is there a way to fetch records in batches of, say, 100 until there are no more records left in Elasticsearch?
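The error comes down to simple arithmetic: Elasticsearch rejects any from/size request where `from + size` exceeds the index's result window, which defaults to 10000 (the limit reported in the error). The plain-Java sketch below has no Elasticsearch dependency; the class `ResultWindowCheck` and its methods are hypothetical names for illustration. It checks the request from the error message and shows that, with the default window, from/size paging can reach at most 10000 of the 131881 records however the pages are chosen.

```java
public class ResultWindowCheck {
    // Assumed default result window (the limit quoted in the error message)
    static final int MAX_RESULT_WINDOW = 10000;

    // A from/size request is rejected when from + size exceeds the window
    static boolean isRejected(int from, int size) {
        return from + size > MAX_RESULT_WINDOW;
    }

    // Upper bound on hits reachable through from/size paging alone
    static int reachableHits(int totalHits) {
        return Math.min(totalHits, MAX_RESULT_WINDOW);
    }

    public static void main(String[] args) {
        System.out.println(isRejected(10001, 100)); // true: 10101 > 10000, as in the error
        System.out.println(isRejected(9900, 100));  // false: 10000 is still allowed
        System.out.println(reachableHits(131881));  // 10000 of the 131881 records
    }
}
```

This is why incrementing `from` cannot work past the window unless that index setting is raised; a cursor-style API is needed instead.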

Solution

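Yes, you can do so using the scroll API, which the Java client also supports. Unlike from/size paging, a scroll is not limited by the 10000-hit result window: each response returns a scroll ID that you pass back to fetch the next batch.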

You can do it like this:

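import java.net.InetAddress;
import java.net.UnknownHostException;

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.sort.SortOrder;
import org.elasticsearch.search.sort.SortParseElement;
import org.json.JSONObject;

Client client;
try {
    client = TransportClient.builder().build()
            .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("localhost"), 9300));

    QueryBuilder qb = QueryBuilders.boolQuery().mustNot(QueryBuilders.termQuery("user_agent", ""));
    // Sorting by _doc is the most efficient order when scrolling; each scroll context lives 60 s
    SearchResponse scrollResp = client.prepareSearch("demo_risk_data")
        .addSort(SortParseElement.DOC_FIELD_NAME, SortOrder.ASC)
        .setScroll(new TimeValue(60000))
        .setQuery(qb)
        .setSize(100).execute().actionGet();

    // Scroll until no hits are returned
    while (true) {
        // Break condition: no hits are returned
        if (scrollResp.getHits().getHits().length == 0) {
            break;
        }

        // Otherwise read the results of this batch
        for (SearchHit hit : scrollResp.getHits().getHits()) {
            JSONObject value = new JSONObject(hit.getSource());
            System.out.println(value.toString());
        }

        // Prepare the next query, renewing the scroll timeout
        scrollResp = client.prepareSearchScroll(scrollResp.getScrollId()).setScroll(new TimeValue(60000)).execute().actionGet();
    }

    // Release the scroll context on the server instead of waiting for it to time out
    client.prepareClearScroll().addScrollId(scrollResp.getScrollId()).execute().actionGet();
    client.close();
} catch (UnknownHostException e) {
    e.printStackTrace();
}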
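The loop's termination rule (keep requesting batches until one comes back empty) does not depend on Elasticsearch itself. The self-contained sketch below replays that control flow against a hypothetical in-memory cursor, `FakeScroll`, standing in for the server-side scroll context, assuming 131881 documents and a batch size of 100. It illustrates the loop only and is not the Elasticsearch client API.

```java
import java.util.ArrayList;
import java.util.List;

public class ScrollLoopDemo {
    // Hypothetical stand-in for a scroll context: each call returns the next batch,
    // and an empty batch once every document has been handed out.
    static class FakeScroll {
        private final int total;
        private final int pageSize;
        private int position = 0;

        FakeScroll(int total, int pageSize) {
            this.total = total;
            this.pageSize = pageSize;
        }

        List<Integer> nextPage() {
            List<Integer> page = new ArrayList<>();
            int end = Math.min(position + pageSize, total);
            for (int id = position; id < end; id++) {
                page.add(id);
            }
            position = end;
            return page;
        }
    }

    /** Drains the cursor with the same "stop on an empty batch" rule; returns {batches, hits}. */
    static int[] drain(int total, int pageSize) {
        FakeScroll scroll = new FakeScroll(total, pageSize);
        int pages = 0;
        int hits = 0;
        List<Integer> page = scroll.nextPage(); // plays the role of the initial search
        while (!page.isEmpty()) {
            pages++;
            hits += page.size();
            page = scroll.nextPage(); // plays the role of the follow-up scroll request
        }
        return new int[] {pages, hits};
    }

    public static void main(String[] args) {
        int[] result = drain(131881, 100);
        // prints: 1319 non-empty batches, 131881 hits
        System.out.println(result[0] + " non-empty batches, " + result[1] + " hits");
    }
}
```

With 131881 documents, the first 1318 batches are full, the 1319th holds the remaining 81, and the following empty batch ends the loop, so no total count has to be hard-coded.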
