Elasticsearch Bulk Index JSON Data


Question

I am trying to bulk index a JSON file into a new Elasticsearch index, but the request fails. The JSON file contains the following sample data:

[{"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"},
{"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"},
{"Amount": "2107", "Quantity": "3", "Id": "974920111", "Client_Store_sk": "1109"},
{"Amount": "2115", "Quantity": "2", "Id": "975463798", "Client_Store_sk": "1109"},
{"Amount": "2116", "Quantity": "1", "Id": "975463827", "Client_Store_sk": "1109"},
{"Amount": "648", "Quantity": "3", "Id": "975464139", "Client_Store_sk": "1109"},
{"Amount": "2126", "Quantity": "2", "Id": "975464805", "Client_Store_sk": "1109"},
{"Amount": "2133", "Quantity": "1", "Id": "975464061", "Client_Store_sk": "1109"},
{"Amount": "1339", "Quantity": "4", "Id": "974919458", "Client_Store_sk": "1109"},
{"Amount": "1196", "Quantity": "5", "Id": "974920538", "Client_Store_sk": "1109"},
{"Amount": "1198", "Quantity": "4", "Id": "975463638", "Client_Store_sk": "1109"},
{"Amount": "1345", "Quantity": "4", "Id": "974919522", "Client_Store_sk": "1109"},
{"Amount": "1347", "Quantity": "2", "Id": "974919563", "Client_Store_sk": "1109"},
{"Amount": "673", "Quantity": "2", "Id": "975464359", "Client_Store_sk": "1109"},
{"Amount": "2153", "Quantity": "1", "Id": "975464511", "Client_Store_sk": "1109"},
{"Amount": "3896", "Quantity": "4", "Id": "977289342", "Client_Store_sk": "1109"},
{"Amount": "3897", "Quantity": "4", "Id": "974920602", "Client_Store_sk": "1109"}]

I am using:

 curl -XPOST localhost:9200/index_local/my_doc_type/_bulk --data-binary --data @/home/data1.json 

When I try to use the standard bulk index API from Elasticsearch, I get this error:

 error: {"message":"ActionRequestValidationException[Validation Failed: 1: no requests added;]"}

Can anyone help with indexing this type of JSON?

Recommended answer

You need to read that JSON file and then build a bulk request in the format expected by the _bulk endpoint: one line for the command and one line for the document, each terminated by a newline character. Repeat this pair for every document:

curl -XPOST localhost:9200/your_index/_bulk -d '
{"index": {"_index": "your_index", "_type": "your_type", "_id": "975463711"}}
{"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"}
{"index": {"_index": "your_index", "_type": "your_type", "_id": "975463943"}}
{"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"}
... etc for all your documents
'

Just make sure to replace your_index and your_type with the actual index and type names you're using.
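The same request body can also be generated programmatically instead of being typed by hand. The following is a minimal Python sketch, assuming the documents are already available as a list of dictionaries and that the Id field should double as the document _id. The function name build_bulk_body is hypothetical and not part of any Elasticsearch client.

```python
import json


def build_bulk_body(docs, index, doc_type, id_field="Id"):
    """Return a bulk body: one action line followed by one document line per document."""
    lines = []
    for doc in docs:
        # Action line: tells the _bulk endpoint what to do with the next line.
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc[id_field]}}
        lines.append(json.dumps(action))
        # Source line: the document itself, serialized on a single line.
        lines.append(json.dumps(doc))
    # The body must end with a newline character.
    return "\n".join(lines) + "\n"


docs = [
    {"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"},
    {"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"},
]
body = build_bulk_body(docs, "your_index", "your_type")
print(body, end="")
```

The printed text has the same shape as the payload in the curl command above and can be saved to a file and sent with --data-binary.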

Update

Note that the command line (the action line that precedes each document) can be shortened by removing _index and _type if those are specified in your URL. It is also possible to remove _id if you specify the path to your id field in your mapping (note, though, that this feature will be deprecated in ES 2.0). At the very least, the command line can look like {"index":{}} for every document, but it is always mandatory, because it specifies which kind of operation you want to perform (in this case, indexing the document).

Update 2

curl -XPOST localhost:9200/index_local/my_doc_type/_bulk --data-binary  @/home/data1.json

/home/data1.json should look like this:

{"index":{}}
{"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"}
{"index":{}}
{"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"}
{"index":{}}
{"Amount": "2107", "Quantity": "3", "Id": "974920111", "Client_Store_sk": "1109"}
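Before sending such a file, it can help to check that it really follows this two-lines-per-document layout. The sketch below is one possible sanity check, assuming a body that contains only index actions (delete actions, for example, have no document line and would be flagged incorrectly). The name check_bulk_text is hypothetical.

```python
import json

EXAMPLE_ACTION = '{"index":{}}'


def check_bulk_text(text):
    """Return a list of problems found in an index-only bulk body (empty means none found)."""
    problems = []
    if not text.endswith("\n"):
        problems.append("body does not end with a newline")
    lines = text.splitlines()
    if len(lines) % 2 != 0:
        problems.append("expected an even number of lines (action + document pairs)")
    for number, line in enumerate(lines, start=1):
        try:
            parsed = json.loads(line)
        except json.JSONDecodeError:
            problems.append(f"line {number} is not a complete JSON value")
            continue
        # Odd lines are action lines, even lines are documents.
        is_action_line = number % 2 == 1
        if is_action_line and not (isinstance(parsed, dict) and "index" in parsed):
            problems.append(f"line {number} should be an action line such as {EXAMPLE_ACTION}")
    return problems


good = '{"index":{}}\n{"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"}\n'
bad = '[{"Amount": "480", "Id": "975463711"},\n{"Amount": "2105", "Id": "975463943"}]'
good_problems = check_bulk_text(good)
bad_problems = check_bulk_text(bad)
print(good_problems)
print(bad_problems)
```

A JSON array split across lines, like the file in the question, fails this check because no individual line is a complete JSON value and none is an action line.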

Update 3

You can refer to this answer to see how to generate the new JSON-style file mentioned in Update 2.
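As an illustration of that conversion, here is a minimal Python sketch that rewrites a JSON array file into the action/document layout from Update 2. It is not the method from the linked answer; it assumes the whole array fits in memory, and the function name convert_array_to_bulk and the output file name are hypothetical. The demo runs in a temporary directory so that it does not touch real data.

```python
import json
import os
import tempfile


def convert_array_to_bulk(src_path, dst_path):
    """Rewrite a JSON array file as a bulk file with a bare index action per document."""
    with open(src_path, encoding="utf-8") as src:
        docs = json.load(src)  # loads the whole array into memory
    with open(dst_path, "w", encoding="utf-8") as dst:
        for doc in docs:
            dst.write('{"index":{}}\n')        # action line
            dst.write(json.dumps(doc) + "\n")  # document on a single line
    return len(docs)


# Demo: write a small array file, convert it, and read the result back.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "data1.json")
dst_path = os.path.join(workdir, "data1.bulk.json")
sample = [
    {"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"},
    {"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"},
]
with open(src_path, "w", encoding="utf-8") as f:
    json.dump(sample, f)

count = convert_array_to_bulk(src_path, dst_path)
with open(dst_path, encoding="utf-8") as f:
    bulk_lines = f.read().splitlines()
print(count, "documents written")
```

The resulting file can then be passed to curl with --data-binary @path, as in Update 2.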

Update 4

As of ES 7.x, the doc_type is not necessary anymore and should simply be _doc instead of my_doc_type. As of ES 8.x, the doc type will be removed completely. You can read more about this here.
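For a typeless request, the URL reduces to the index name followed by _bulk. The sketch below builds such a request with Python's standard library without sending it. It assumes a node at http://localhost:9200 and the index name index_local from the question; the helper make_bulk_request is hypothetical. Recent Elasticsearch versions also expect an explicit Content-Type header on requests with a body, so the sketch sets application/x-ndjson.

```python
import urllib.request


def make_bulk_request(body, index, host="http://localhost:9200"):
    """Build (but do not send) a typeless bulk request for the given index."""
    return urllib.request.Request(
        url=f"{host}/{index}/_bulk",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )


body = (
    '{"index":{}}\n'
    '{"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"}\n'
)
request = make_bulk_request(body, "index_local")
print(request.get_method(), request.full_url)

# To send it against a running cluster, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Keeping the request construction separate from the network call makes it possible to inspect the URL and headers before anything is sent to the cluster.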
