Elasticsearch data binary ran out of memory


Problem description

I'm trying to upload an 800GB file to Elasticsearch, but I keep getting a memory error telling me the data binary is out of memory. I have 64GB of RAM on my system and 3TB of storage.

curl -XPOST 'http://localhost:9200/carrier/doc/1/_bulk' --data-binary @carrier.json

I'm wondering if there is a setting in the config file to increase the amount of memory so I can upload this file.

Thanks

Recommended answer

800GB is quite a lot to send in one shot. ES has to put all of that content into memory in order to process it, so that's probably too big for the amount of memory you have.
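As an aside on the config question: the heap itself can be raised, but no realistic heap setting will fit a single 800GB request body in memory, so splitting the upload, as described next, is the practical fix. A minimal sketch, assuming a 5.x-or-later install where extra JVM options can be passed via the ES_JAVA_OPTS environment variable or set in config/jvm.options (older releases used ES_HEAP_SIZE); the exact values here are illustrative only:

# Illustration only: start Elasticsearch with a larger heap, e.g. ~half of the 64GB of RAM
ES_JAVA_OPTS="-Xms31g -Xmx31g" ./bin/elasticsearch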

One way around this is to split your file into several smaller files and send them one after another. You can achieve this with a small shell script like the one below.

#!/bin/sh

# split the main file into files containing 10,000 lines max
split -l 10000 -a 10 carrier.json /tmp/carrier_bulk

# send each split file
BULK_FILES=/tmp/carrier_bulk*
for f in $BULK_FILES; do
    curl -s -XPOST http://localhost:9200/_bulk --data-binary @$f
done
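Note that this assumes carrier.json is already in the bulk (NDJSON) format, i.e. alternating action and source lines, so splitting on an even line count such as 10,000 keeps each action/source pair in the same chunk. A hypothetical two-line excerpt would look like:

{ "index" : { "_index" : "carrier", "_type" : "doc" } }
{ "field1" : "value1", "field2" : "value2" }

On Elasticsearch 6.0 and later the bulk endpoint also requires a content type header, so each curl call would additionally need -H 'Content-Type: application/x-ndjson'.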

UPDATE

If you want to interpret the ES response, you can do so easily by piping the response into a small Python one-liner like this:

curl -s -XPOST $ES_HOST/_bulk --data-binary @$f | python -c 'import json,sys; obj=json.load(sys.stdin); print("    <- Took %s ms with errors: %s" % (obj["took"], obj["errors"]))'
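Putting the two pieces together, the check can be dropped straight into the loop from the script above, for example:

for f in /tmp/carrier_bulk*; do
    echo "-> $f"
    curl -s -XPOST http://localhost:9200/_bulk --data-binary @"$f" \
      | python -c 'import json,sys; obj=json.load(sys.stdin); print("    <- Took %s ms with errors: %s" % (obj["took"], obj["errors"]))'
done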
