Bulk Indexing in Elasticsearch using the ElasticLowLevelClient client

Question

I'm using the ElasticLowLevelClient client to index Elasticsearch data. The data has to be indexed as raw strings because I don't have access to the POCO objects. I can successfully index an individual object by calling:

client.Index<object>(indexName, message.MessageType, message.Id, 
    new Elasticsearch.Net.PostData<object>(message.MessageJson));

How can I do a bulk insert into the index using the ElasticLowLevelClient client? The bulk insert APIs all require a POCO of the document being indexed, which I don't have, e.g.:

 ElasticsearchResponse<T> Bulk<T>(string index, PostData<object> body,
      Func<BulkRequestParameters, BulkRequestParameters> requestParameters = null)

I could make the API calls in parallel for each object but that seems inefficient.

Recommended answer

The generic type parameter on the low level client methods is the type of the expected response, not the type of the document being indexed.

If you're using the low level client exposed on the high level client, through the .LowLevel property, you can send a bulk request where your documents are JSON strings. In 5.x this looks as follows:

var client = new ElasticClient(settings);


var messages = new [] 
{
    new Message 
    { 
        Id = "1", 
        MessageType = "foo", 
        MessageJson = "{\"name\":\"message 1\",\"content\":\"foo\"}" 
    },  
    new Message 
    { 
        Id = "2", 
        MessageType = "bar", 
        MessageJson = "{\"name\":\"message 2\",\"content\":\"bar\"}" 
    }   
};

var indexName = "my-index";

var bulkRequest = messages.SelectMany(m => 
    new[]
    {
        client.Serializer.SerializeToString(new
            {
                index = new
                {
                    _index = indexName,
                    _type = m.MessageType,
                    _id = m.Id
                }
            }, SerializationFormatting.None),
        m.MessageJson
    });

// Each line of the bulk body, including the last one, must end with a newline.
var bulkResponse = client.LowLevel.Bulk<BulkResponse>(string.Join("\n", bulkRequest) + "\n");

This sends the following bulk request:

POST http://localhost:9200/_bulk
{"index":{"_index":"my-index","_type":"foo","_id":"1"}}
{"name":"message 1","content":"foo"}
{"index":{"_index":"my-index","_type":"bar","_id":"2"}}
{"name":"message 2","content":"bar"}

A few important points:

  1. We need to build the bulk request ourselves to use the low level bulk API call. Since our documents are already strings, it makes sense to build a string request.
  2. We serialize an anonymous type with no indenting for the action and metadata for each bulk item.
  3. The MessageJson cannot contain any newline characters, as this will break the bulk API; newline characters are the delimiters between JSON objects within the body.
  4. Because we're using the low level client exposed on the high level client, we can still take advantage of the high level requests, responses and serializer. The bulk request returns a BulkResponse, which you can work with as you normally do when sending a bulk request with the high level client.
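
Point 3 can be a problem when the incoming JSON strings are pretty-printed. The following is a minimal sketch of one way to guard against that, not part of the original answer: it re-serializes each document onto a single line before assembling the newline-delimited body. It assumes System.Text.Json is available (.NET Core 3.0 or later) and does not use the NEST serializer; the class name BulkBodyBuilder and its method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Text.Json;

// Hypothetical helper, not part of NEST or Elasticsearch.Net.
public static class BulkBodyBuilder
{
    // Parses the JSON and writes it back without indentation, so the
    // document occupies exactly one line. Throws JsonException on invalid JSON.
    public static string ToSingleLine(string json)
    {
        using var document = JsonDocument.Parse(json);
        return JsonSerializer.Serialize(document.RootElement);
    }

    // Builds the newline-delimited bulk body: one action/metadata line
    // followed by one document line per item, each terminated by '\n'.
    public static string Build(
        string indexName,
        IEnumerable<(string Id, string Type, string Json)> items)
    {
        var body = new StringBuilder();
        foreach (var item in items)
        {
            var action = JsonSerializer.Serialize(new
            {
                index = new { _index = indexName, _type = item.Type, _id = item.Id }
            });
            body.Append(action).Append('\n');
            body.Append(ToSingleLine(item.Json)).Append('\n');
        }
        return body.ToString();
    }
}
```

The resulting string could then be passed to client.LowLevel.Bulk<BulkResponse>(...) in place of the string.Join expression above. Parsing every document adds some overhead, so this is only worth doing when the source of the JSON strings is not guaranteed to produce single-line output.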
