Why does Elasticsearch bulk insert use \n delimiters instead of an array of JSON objects?


Question

Here is a sample bulk insertion provided by the Elasticsearch docs at https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html:

POST _bulk
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }

They mention that "Because this format uses literal \n's as delimiters, please be sure that the JSON actions and sources are not pretty printed".
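
For illustration, here is a minimal sketch of how a client typically builds such a newline-delimited body. It assumes the Python requests library and a local cluster at http://localhost:9200, neither of which is part of the original question.

import json
import requests  # assumed HTTP client; any would do

# Each action line and each source line becomes its own single-line JSON document.
actions = [
    {"index": {"_index": "test", "_type": "type1", "_id": "1"}},
    {"field1": "value1"},
    {"delete": {"_index": "test", "_type": "type1", "_id": "2"}},
]

# Join with literal \n characters; the body must also end with a trailing newline.
body = "\n".join(json.dumps(action) for action in actions) + "\n"

response = requests.post(
    "http://localhost:9200/_bulk",  # assumed local cluster URL
    data=body,
    headers={"Content-Type": "application/x-ndjson"},
)
print(response.json()["errors"])  # True if any individual action failed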

I would like to know the reason behind such an input format, and why they did not choose an array of JSON objects instead.

For example:

POST _bulk
    [
      { "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } },
      { "field1" : "value1" },
      { "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } },
      { "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } },
      { "field1" : "value3" },
      { "update" : { "_id" : "1", "_type" : "type1", "_index" : "test" } },
      { "doc" : { "field2" : "value2" } }
    ]

The above structure is not exactly correct, but something along those lines. Is there something common in REST API design standards that I am missing here? Delimiters instead of an array?

Answer

This allows the Bulk endpoint to process the body line by line, in a streaming fashion. If it were a JSON array, ES would have to load and parse the whole JSON body into memory before it could extract the individual array elements.
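
To make the difference concrete, here is a rough Python sketch of the two parsing strategies. It is purely illustrative; Elasticsearch itself is written in Java and does not use this code.

import json
from typing import IO, Iterator

def parse_ndjson(stream: IO[str]) -> Iterator[dict]:
    # Streaming: parse and hand over one line at a time; memory use is
    # bounded by the size of a single action or source line.
    for line in stream:
        if line.strip():
            yield json.loads(line)

def parse_json_array(stream: IO[str]) -> list:
    # Array variant: the whole body has to be read and parsed up front
    # before the first element can be processed.
    return json.loads(stream.read())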

Given that the bulk body can be quite large (e.g. hundreds of MB), this is an optimisation that prevents your ES server from crashing when huge bulk requests are sent.
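
The same reasoning applies on the client side: rather than sending one enormous request, official clients usually split the action stream into fixed-size batches and issue several smaller bulk requests. A hypothetical sketch of that idea follows (the batch size of 500 is an arbitrary choice for illustration, not an Elasticsearch default):

from itertools import islice
from typing import Iterable, Iterator

def batched(actions: Iterable[dict], size: int = 500) -> Iterator[list]:
    # Yield successive batches so that no single bulk request grows unbounded.
    iterator = iter(actions)
    while batch := list(islice(iterator, size)):
        yield batch

# Each batch would then be serialized to NDJSON and POSTed to _bulk separately.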
