elasticsearch java bulk batch size


I want to use the Elasticsearch bulk API from Java and am wondering how I can set the batch size.

Currently I am using it as:

BulkRequestBuilder bulkRequest = getClient().prepareBulk();
while(hasMore) {
    bulkRequest.add(getClient().prepareIndex(indexName, indexType, artist.getDocId()).setSource(json));
    hasMore = checkHasMore();
}
BulkResponse bResp = bulkRequest.execute().actionGet();
//To check failures
log.info("Has failures? {}", bResp.hasFailures());

Any idea how I can set the bulk/batch size?

Solution

It mainly depends on the size of your documents, the resources available on the client, and the type of client (transport client or node client).

The node client is aware of how shards are distributed over the cluster and sends each document directly to the node that holds the shard where it is supposed to be indexed. The transport client, on the other hand, is an ordinary client that sends its requests to a list of nodes in round-robin fashion. The whole bulk request is then sent to a single node, which acts as your gateway while indexing.
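To make the round-robin behaviour concrete, here is a hypothetical standalone sketch (not the Elasticsearch implementation) of how a transport client might pick the next node from its configured list for each request:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of round-robin node selection, as a transport client
// performs it: each request goes to the next node in the configured list.
class RoundRobinNodes {
    private final List<String> nodes;
    private final AtomicInteger cursor = new AtomicInteger();

    RoundRobinNodes(List<String> nodes) {
        this.nodes = nodes;
    }

    String next() {
        // floorMod keeps the index valid even after the counter overflows
        int i = Math.floorMod(cursor.getAndIncrement(), nodes.size());
        return nodes.get(i);
    }

    public static void main(String[] args) {
        RoundRobinNodes rr = new RoundRobinNodes(List.of("node-a", "node-b", "node-c"));
        // a whole bulk request lands on whichever node comes up next
        System.out.println(rr.next()); // node-a
        System.out.println(rr.next()); // node-b
        System.out.println(rr.next()); // node-c
        System.out.println(rr.next()); // node-a again
    }
}
```

The node that receives the bulk then fans the individual index operations out to the shards, which is the extra hop the node client avoids.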

Since you're using the Java API, I would suggest you have a look at the BulkProcessor, which makes bulk indexing much easier and more flexible. You can define a maximum number of actions, a maximum size, and a maximum time interval since the last bulk execution. It's going to execute the bulk automatically for you whenever one of those thresholds is crossed. You can also set a maximum number of concurrent bulk requests.

First create the BulkProcessor like this:

BulkProcessor bulkProcessor = BulkProcessor.builder(client, new BulkProcessor.Listener() {
    @Override
    public void beforeBulk(long executionId, BulkRequest request) {
        logger.info("Going to execute new bulk composed of {} actions", request.numberOfActions());
    }

    @Override
    public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
        logger.info("Executed bulk composed of {} actions", request.numberOfActions());
    }

    @Override
    public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
        logger.warn("Error executing bulk", failure);
    }
}).setBulkActions(bulkSize).setConcurrentRequests(maxConcurrentBulk).build();

You just have to add your requests to it:

bulkProcessor.add(indexRequest);

and close it at the end to flush any remaining requests that have not been executed yet:

bulkProcessor.close();

To finally answer your question: another nice thing about the BulkProcessor is that it has sensible defaults: 5 MB bulk size, 1000 actions, 1 concurrent request, and no flush interval (which might be worth setting).
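To illustrate the thresholds the BulkProcessor applies, here is a minimal standalone sketch of the same flush-on-threshold logic. The class and field names are hypothetical, for illustration only; the real BulkProcessor would send a BulkRequest where this sketch only counts flushes:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of flush-on-threshold batching: buffer actions
// and flush when either the action count or the byte size crosses its limit.
// Not the Elasticsearch implementation.
class MiniBulkBatcher {
    private final int maxActions;
    private final long maxBytes;
    private final List<String> buffer = new ArrayList<>();
    private long bufferedBytes = 0;
    int flushes = 0; // how many "bulk executions" happened

    MiniBulkBatcher(int maxActions, long maxBytes) {
        this.maxActions = maxActions;
        this.maxBytes = maxBytes;
    }

    void add(String jsonDoc) {
        buffer.add(jsonDoc);
        bufferedBytes += jsonDoc.getBytes().length;
        if (buffer.size() >= maxActions || bufferedBytes >= maxBytes) {
            flush();
        }
    }

    void close() { // flush whatever is left, like bulkProcessor.close()
        if (!buffer.isEmpty()) {
            flush();
        }
    }

    private void flush() {
        // here the real processor would execute a BulkRequest
        flushes++;
        buffer.clear();
        bufferedBytes = 0;
    }

    public static void main(String[] args) {
        MiniBulkBatcher batcher = new MiniBulkBatcher(3, 1024);
        for (int i = 0; i < 7; i++) {
            batcher.add("{\"artist\":\"doc-" + i + "\"}");
        }
        batcher.close();
        // 7 docs with 3 actions per bulk: two full flushes plus one final partial flush
        System.out.println("flushes=" + batcher.flushes);
    }
}
```

The real BulkProcessor additionally tracks a flush interval, so a partially filled batch still gets sent after the configured time elapses.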
