Bulk Indexing in Elasticsearch using the ElasticLowLevelClient client

Question

I'm using the ElasticLowLevelClient client to index Elasticsearch data. The documents have to be indexed as raw JSON strings because I don't have access to the POCO objects. I can successfully index an individual document by calling:

client.Index<object>(indexName, message.MessageType, message.Id, 
    new Elasticsearch.Net.PostData<object>(message.MessageJson));

How can I do a bulk insert into the index using the ElasticLowLevelClient client? The bulk insert APIs all seem to require a POCO for the document being indexed, which I don't have, e.g.:

 ElasticsearchResponse<T> Bulk<T>(string index, PostData<object> body,
      Func<BulkRequestParameters, BulkRequestParameters> requestParameters = null)

I could make the API calls in parallel for each object, but that seems inefficient.

Recommended answer

The generic type parameter on the low level client is the type of the expected response, not the type of the document.

If you're using the low level client exposed on the high level client through the .LowLevel property, you can send a bulk request in which the documents are JSON strings. In 5.x this looks as follows:

var client = new ElasticClient(settings);


var messages = new [] 
{
    new Message 
    { 
        Id = "1", 
        MessageType = "foo", 
        MessageJson = "{\"name\":\"message 1\",\"content\":\"foo\"}" 
    },  
    new Message 
    { 
        Id = "2", 
        MessageType = "bar", 
        MessageJson = "{\"name\":\"message 2\",\"content\":\"bar\"}" 
    }   
};

var indexName = "my-index";

var bulkRequest = messages.SelectMany(m => 
    new[]
    {
        client.Serializer.SerializeToString(new
            {
                index = new
                {
                    _index = indexName,
                    _type = m.MessageType,
                    _id = m.Id
                }
            }, SerializationFormatting.None),
        m.MessageJson
    });

var bulkResponse = client.LowLevel.Bulk<BulkResponse>(string.Join("\n", bulkRequest) + "\n");

This sends the following bulk request:

POST http://localhost:9200/_bulk
{"index":{"_index":"my-index","_type":"foo","_id":"1"}}
{"name":"message 1","content":"foo"}
{"index":{"_index":"my-index","_type":"bar","_id":"2"}}
{"name":"message 2","content":"bar"}
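
Each document in the body above has to occupy exactly one line. If a MessageJson value might arrive pretty-printed, one option is to compact it before joining the lines. The following is a minimal sketch, not part of the original answer: the JsonLines class and ToSingleLine method are hypothetical names, and it assumes System.Text.Json is available (.NET Core 3.0 or later, or the NuGet package of the same name).

```csharp
using System;
using System.Text.Json;

public static class JsonLines
{
    // Parses the document and writes it back without indentation, which
    // removes any line breaks between tokens. Line breaks inside string
    // values are already escaped as \n in valid JSON, so they stay escaped.
    public static string ToSingleLine(string json)
    {
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            return JsonSerializer.Serialize(doc.RootElement);
        }
    }

    public static void Main()
    {
        // A pretty-printed document that would break the bulk body as-is.
        string pretty = "{\n  \"name\": \"message 1\",\n  \"content\": \"foo\"\n}";
        string compact = ToSingleLine(pretty);

        Console.WriteLine(compact.Contains("\n")); // False
        Console.WriteLine(compact);
    }
}
```

In the bulk request code, such a helper would be applied to m.MessageJson before it is added to the request lines. Parsing also has the side effect of rejecting invalid JSON early, since JsonDocument.Parse throws on malformed input.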

Some important points:

  1. We need to build the bulk request ourselves to use the low level bulk API call. Since our documents are already strings, it makes sense to build the request as a string.
  2. We serialize an anonymous type, with no indenting, for the action and metadata line of each bulk item.
  3. The MessageJson cannot contain any literal (unescaped) newline characters, as this will break the bulk API; newline characters are the delimiters between JSON objects within the body.
  4. Because we're using the low level client exposed on the high level client, we can still take advantage of the high level requests, responses and serializer. The bulk request returns a BulkResponse, which you can work with as you normally would when sending a bulk request with the high level client.
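
A bulk request can succeed at the HTTP level while individual items fail, so the response is worth inspecting either way. With a BulkResponse that information is available on the response object. The sketch below is an illustration rather than part of the original answer: it performs the same check on the raw response JSON, in case the response is read as a string instead. The BulkResult class, the FailedItems method and the sample response are all made up for the example. It assumes System.Text.Json is available and that each item's error is a JSON object, as it is in Elasticsearch 2.x and later.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class BulkResult
{
    // Returns "id: reason" for every item in a bulk response that carries an error.
    public static List<string> FailedItems(string responseJson)
    {
        var failures = new List<string>();
        using (JsonDocument doc = JsonDocument.Parse(responseJson))
        {
            JsonElement root = doc.RootElement;

            // The top-level "errors" flag is true if at least one item failed.
            if (!root.GetProperty("errors").GetBoolean())
                return failures;

            foreach (JsonElement item in root.GetProperty("items").EnumerateArray())
            {
                // Each item is an object with a single property named after
                // the action that was performed (index, create, update, delete).
                foreach (JsonProperty action in item.EnumerateObject())
                {
                    if (action.Value.TryGetProperty("error", out JsonElement error))
                    {
                        string id = action.Value.GetProperty("_id").GetString();
                        string reason = error.GetProperty("reason").GetString();
                        failures.Add(id + ": " + reason);
                    }
                }
            }
        }
        return failures;
    }

    public static void Main()
    {
        // Illustrative response only: the first item succeeded, the second failed.
        string sample =
            "{\"took\":5,\"errors\":true,\"items\":[" +
            "{\"index\":{\"_index\":\"my-index\",\"_type\":\"foo\",\"_id\":\"1\",\"status\":201}}," +
            "{\"index\":{\"_index\":\"my-index\",\"_type\":\"bar\",\"_id\":\"2\",\"status\":400," +
            "\"error\":{\"type\":\"mapper_parsing_exception\",\"reason\":\"failed to parse\"}}}]}";

        foreach (string failure in FailedItems(sample))
            Console.WriteLine(failure); // 2: failed to parse
    }
}
```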
