Elasticsearch Bulk JSON Data

Problem description

This question arises from this SO thread.

Since my query is similar but not identical, @Val suggested asking it as a separate question so that others can benefit from it.

Similar to that thread, I need to insert a large amount of data into an index (my initial test is about 10,000 documents, but that is just for a proof of concept; there are many more). The data I would like to insert is in a .json file and looks something like this (snippet):

[ { "fileName": "filename", "data":"massive string text data here" }, 
  { "fileName": "filename2", "data":"massive string text data here" } ]

By my own admission I am new to Elasticsearch. However, from reading through the documentation, I assumed I could take a .json file and create an index from the data within it. I have since learnt that each item in the JSON apparently needs a "header", something like:

{"index":{}}
{ "fileName": "filename", "data":"massive string text data here" }

In other words, this is not actually JSON as such, but rather a manipulated string.

I would like to know whether there is a way to import my JSON data as is, without having to manipulate the text manually first (my test data has 10,000 entries, so I'm sure you can see why I'd prefer not to do this by hand).

Are there any suggestions or automated tools that could help with this?

PS: I am using the Windows installer and Postman to make the calls.

Solution

You can transform your file very easily with a single shell command like this. Provided that your file is called input.json, you can do this:

jq -c ".[]" input.json | while IFS= read -r line; do echo '{"index":{}}'; printf '%s\n' "$line"; done > bulk.json

After this you'll have a file called bulk.json which is properly formatted to be sent to the bulk endpoint.
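The same conversion can also be done by jq alone, without a shell loop, so no word splitting or globbing can touch the data. A minimal sketch, assuming (as in the question) that the top level of input.json is an array; `to_bulk` is a hypothetical helper name. jq's comma operator emits two values per array element (the action line, then the element), and `-c` prints each value on its own line:

```shell
#!/usr/bin/env bash
# Hypothetical helper: to_bulk FILE
# Reads a JSON file whose top level is an array and writes bulk NDJSON to
# stdout: for every element, an {"index":{}} action line followed by the
# element itself, one compact JSON value per line.
to_bulk() {
  # ".[]" iterates the array; ", ." emits the element after the action object.
  jq -c '.[] | {"index": {}}, .' "$1"
}

# Usage (file names as in the answer):
#   to_bulk input.json > bulk.json
```

If `fileName` is unique, the filter `.[] | {"index": {"_id": .fileName}}, .` would use it as the document ID, so re-running the import overwrites documents instead of duplicating them.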

Then you can call your bulk endpoint like this:

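curl -XPOST localhost:9200/your_index/_bulk -H "Content-Type: application/x-ndjson" --data-binary @bulk.json

(On Elasticsearch 6.x and earlier, which still require a mapping type, include the type in the path: localhost:9200/your_index/your_type/_bulk.)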
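The bulk endpoint can answer with HTTP 200 even when some documents were rejected: failures are reported inside the response, where the top-level `errors` field is true if any action failed and each entry in `items` carries its own `status` and, on failure, an `error` object. A small sketch for checking this with jq; `bulk_ok` and `bulk_failures` are hypothetical helper names that read the response body on stdin:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if the bulk response reports no errors.
bulk_ok() {
  # jq -e sets the exit status from the last output: 0 for true,
  # 1 for false or null (so a missing "errors" field also counts as failure).
  jq -e '.errors == false' > /dev/null
}

# Hypothetical helper: prints "STATUS REASON" for every rejected document.
bulk_failures() {
  # Each item is an object keyed by the action name ("index", ...);
  # ".[]" unwraps it to the per-document result.
  jq -r '.items[] | .[] | select(.error) | "\(.status) \(.error.reason)"'
}
```

To use them, add `-s` to the curl command in the answer and pipe its output into `bulk_ok`; if that fails, piping the saved response into `bulk_failures` lists what went wrong.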

Note: You need to install jq first if you don't have it already.
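The question mentions that the real data set is much larger than the 10,000-document test. Elasticsearch caps the size of a single HTTP request (the `http.max_content_length` setting, 100mb by default), so a very large bulk file is usually sent as several smaller requests. A sketch using `split`, assuming only `index` actions (two lines per document); `split_bulk` is a hypothetical helper name, and the best chunk size depends on document size and is worth experimenting with:

```shell
#!/usr/bin/env bash
# Hypothetical helper: split_bulk FILE LINES_PER_CHUNK PREFIX
# Splits a bulk NDJSON file into smaller files. With "index" actions every
# document occupies two lines (action + source), so LINES_PER_CHUNK must be
# a positive even number to keep each action line next to its document.
split_bulk() {
  local file=$1 lines=$2 prefix=$3
  if [ $((lines % 2)) -ne 0 ] || [ "$lines" -le 0 ]; then
    echo "LINES_PER_CHUNK must be a positive even number" >&2
    return 1
  fi
  # split -l copies whole lines, so each chunk keeps its trailing newline,
  # which the bulk API requires after the last line.
  split -l "$lines" "$file" "$prefix"
}
```

For example, `split_bulk bulk.json 10000 chunk_` writes 5,000 documents per file (chunk_aa, chunk_ab, ...), and each file can be sent with the curl command above by replacing `@bulk.json` with `@chunk_aa` and so on.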
