Elasticsearch Bulk Index JSON Data
Question
I am trying to bulk index a JSON file into a new Elasticsearch index, but the request fails. The file contains the following sample data:
[{"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"},
{"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"},
{"Amount": "2107", "Quantity": "3", "Id": "974920111", "Client_Store_sk": "1109"},
{"Amount": "2115", "Quantity": "2", "Id": "975463798", "Client_Store_sk": "1109"},
{"Amount": "2116", "Quantity": "1", "Id": "975463827", "Client_Store_sk": "1109"},
{"Amount": "648", "Quantity": "3", "Id": "975464139", "Client_Store_sk": "1109"},
{"Amount": "2126", "Quantity": "2", "Id": "975464805", "Client_Store_sk": "1109"},
{"Amount": "2133", "Quantity": "1", "Id": "975464061", "Client_Store_sk": "1109"},
{"Amount": "1339", "Quantity": "4", "Id": "974919458", "Client_Store_sk": "1109"},
{"Amount": "1196", "Quantity": "5", "Id": "974920538", "Client_Store_sk": "1109"},
{"Amount": "1198", "Quantity": "4", "Id": "975463638", "Client_Store_sk": "1109"},
{"Amount": "1345", "Quantity": "4", "Id": "974919522", "Client_Store_sk": "1109"},
{"Amount": "1347", "Quantity": "2", "Id": "974919563", "Client_Store_sk": "1109"},
{"Amount": "673", "Quantity": "2", "Id": "975464359", "Client_Store_sk": "1109"},
{"Amount": "2153", "Quantity": "1", "Id": "975464511", "Client_Store_sk": "1109"},
{"Amount": "3896", "Quantity": "4", "Id": "977289342", "Client_Store_sk": "1109"},
{"Amount": "3897", "Quantity": "4", "Id": "974920602", "Client_Store_sk": "1109"}]
I am using:
curl -XPOST localhost:9200/index_local/my_doc_type/_bulk --data-binary --data @/home/data1.json
When I try to use the standard bulk index API from Elasticsearch, I get this error:
error: {"message":"ActionRequestValidationException[Validation Failed: 1: no requests added;]"}
Can anyone help with indexing this type of JSON?
Recommended answer
You need to read that JSON file and then build a bulk request in the format the _bulk endpoint expects, i.e. one line for the command and one line for the document, separated by a newline character. Repeat that pair for each document:
curl -XPOST localhost:9200/your_index/_bulk -d '
{"index": {"_index": "your_index", "_type": "your_type", "_id": "975463711"}}
{"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"}
{"index": {"_index": "your_index", "_type": "your_type", "_id": "975463943"}}
{"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"}
... etc for all your documents
'
Just make sure to replace your_index and your_type with the actual index and type names you're using.
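The pairing of command line and document line can be sketched in Python. This is a minimal illustration, not part of the original answer: the function name build_bulk_body and its parameters are hypothetical, and it assumes every document carries the field named by id_field.

```python
import json


def build_bulk_body(docs, index, doc_type=None, id_field=None):
    """Return a bulk body: one command line plus one document line per doc."""
    lines = []
    for doc in docs:
        meta = {"_index": index}
        if doc_type is not None:
            meta["_type"] = doc_type  # only meaningful on versions that still use types
        if id_field is not None:
            meta["_id"] = doc[id_field]  # assumes the field exists in every doc
        lines.append(json.dumps({"index": meta}))
        lines.append(json.dumps(doc))
    # Every line, including the last one, must be terminated by a newline
    return "\n".join(lines) + "\n"


docs = [
    {"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"},
    {"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"},
]
body = build_bulk_body(docs, "your_index", doc_type="your_type", id_field="Id")
print(body, end="")
```

Serializing each document with json.dumps keeps it on a single line, which matters because the endpoint treats every newline as a separator.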
Update
Note that the command line (the line that precedes each document) can be shortened by removing _index and _type if those are already specified in your URL. It is also possible to remove _id if you specify the path to your id field in your mapping (note that this feature will be deprecated in ES 2.0, though). At the very least, the command line can be {"index":{}} for every document, but it is always mandatory because it specifies which kind of operation you want to perform (in this case, index the document).
Update 2
curl -XPOST localhost:9200/index_local/my_doc_type/_bulk --data-binary @/home/data1.json
/home/data1.json should look like this:
{"index":{}}
{"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"}
{"index":{}}
{"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"}
{"index":{}}
{"Amount": "2107", "Quantity": "3", "Id": "974920111", "Client_Store_sk": "1109"}
Update 3
You can refer to this answer to see how to generate the new JSON-style file mentioned in Update 2.
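One possible way to produce such a file is sketched below in Python. It is not the method from the linked answer: the function name convert_array_to_bulk and the file names are hypothetical, and the demo writes to a temporary directory rather than to /home.

```python
import json
import os
import tempfile


def convert_array_to_bulk(src_path, dst_path):
    """Rewrite a JSON array file as a bulk file; return the document count."""
    with open(src_path, encoding="utf-8") as src:
        docs = json.load(src)  # expects a top-level JSON array
    with open(dst_path, "w", encoding="utf-8") as dst:
        for doc in docs:
            dst.write('{"index":{}}\n')        # command line
            dst.write(json.dumps(doc) + "\n")  # document on a single line
    return len(docs)


# Demo with two of the sample records, using temporary files
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "data1_array.json")
dst_path = os.path.join(workdir, "data1.json")
sample = [
    {"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"},
    {"Amount": "2105", "Quantity": "2", "Id": "975463943", "Client_Store_sk": "1109"},
]
with open(src_path, "w", encoding="utf-8") as f:
    json.dump(sample, f)

count = convert_array_to_bulk(src_path, dst_path)
with open(dst_path, encoding="utf-8") as f:
    print(f.read(), end="")
```

The resulting file can then be passed to curl with --data-binary, which, unlike -d, sends the file without stripping its newlines.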
Update 4
As of ES 7.x, the doc_type is no longer necessary and should simply be _doc instead of my_doc_type. As of ES 8.x, the doc type will be removed completely. You can read more about this here.
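For a typeless request, a minimal Python sketch using only the standard library might look as follows. The helper name make_bulk_request and the default host are assumptions for illustration. The Content-Type header is included because Elasticsearch 6.0 and later reject request bodies without one. The request is only built here, not sent, so the snippet runs without a cluster.

```python
import urllib.request


def make_bulk_request(body, index, host="http://localhost:9200"):
    """Build (but do not send) a POST to the typeless _bulk endpoint."""
    return urllib.request.Request(
        url=f"{host}/{index}/_bulk",  # no doc type in the path
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )


body = (
    '{"index":{}}\n'
    '{"Amount": "480", "Quantity": "2", "Id": "975463711", "Client_Store_sk": "1109"}\n'
)
req = make_bulk_request(body, "index_local")
print(req.get_method(), req.full_url)

# To send it against a running cluster (not executed here):
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

With curl, the equivalent is to drop my_doc_type from the URL and add -H 'Content-Type: application/x-ndjson'.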