Indexing speed of Elasticsearch for 10 million events

Problem description

I am trying to figure out why Elasticsearch is so slow at indexing. I am not sure whether it is a limitation of Elasticsearch itself, but I will share what I have so far.

I have a single Elasticsearch node and a Logstash instance running on one box. My documents have about 15 fields, and I have an Elasticsearch mapping set up with the correct types (although I have tried without the mapping and get pretty much identical results).

I am indexing roughly 8 - 10 million events at a time and have taken the following approaches.

Bulk API with the following format (I converted the CSV to JSON and placed it into a file which I send with curl):

{"create" : {}}
{"field1" : "value1", "field2" : "value2", ...}
{"create" : {}}
{"field1" : "value1", "field2" : "value2", ...}
{"create" : {}}
{"field1" : "value1", "field2" : "value2", ...}
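The CSV-to-bulk conversion described above can be sketched in Python. This is a minimal sketch, not the asker's actual script: the field names are placeholders, and the resulting body would be POSTed to the index's `_bulk` endpoint.

```python
import csv
import io
import json

def csv_to_bulk_body(csv_text):
    """Convert CSV text (header row + data rows) into an Elasticsearch
    bulk-API NDJSON body: one {"create": {}} action line per document,
    followed by the document's source line."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        lines.append(json.dumps({"create": {}}))
        lines.append(json.dumps(row))
    # The bulk body must end with a trailing newline.
    return "\n".join(lines) + "\n"

# Example: two events with two fields each.
body = csv_to_bulk_body("field1,field2\na,1\nb,2\n")
print(body)
```

The body produced here is what the curl command in the question would send to `/<index>/_bulk`.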

I have also tried Logstash, both with a TCP input receiving the original CSV and with a file input, cat-ing the CSV onto the end of a file Logstash is watching.
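For reference, the file-input approach might use a Logstash pipeline along these lines. This is a sketch only: the path, column names, and index name are placeholders, and exact option names can vary between Logstash versions.

```
input {
  file {
    path => "/tmp/events.csv"
    start_position => "beginning"
  }
}
filter {
  csv {
    columns => ["field1", "field2"]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "events"
  }
}
```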

All three of these methods seem to ingest around 10,000 events per second, which is very slow.

Am I doing something wrong? Should I be explicitly assigning an ID in my bulk ingest rather than letting Elasticsearch auto-generate one?

When ingesting through the bulk API, I split the events up into 50,000- and 100,000-event files and ingested each separately.
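The splitting step described above can be sketched as a simple chunking generator. A minimal sketch, assuming the events are available as an iterable; the chunk size is the tunable the question mentions (50,000 or 100,000).

```python
def chunked(events, chunk_size=50_000):
    """Yield successive lists of at most chunk_size events, so each
    list can become one bulk request (or one file to curl)."""
    chunk = []
    for event in events:
        chunk.append(event)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # final partial chunk
        yield chunk

# 10 million dummy events split into 50,000-event chunks.
n_chunks = sum(1 for _ in chunked(range(10_000_000), 50_000))
print(n_chunks)  # prints 200
```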

Solution

You'll find I've done some research on this here; you can download the Indexing Scripts file, which has some useful scripts to maximise indexing performance. It really does vary with the hardware and with how Elasticsearch is optimised for indexing, i.e. removal of replicas etc.
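The kind of optimisation the answer alludes to (dropping replicas during a bulk load, and commonly also disabling the periodic refresh) is applied through the index settings API. A sketch of the settings payload, assuming the defaults are restored once the load finishes; the endpoint path is the standard `_settings` API, not something from the linked scripts.

```python
import json

# Settings commonly applied before a large bulk load (and reverted after):
# no replica copies to maintain, and no periodic refresh while indexing.
bulk_load_settings = {
    "index": {
        "number_of_replicas": 0,
        "refresh_interval": "-1",
    }
}

# This JSON body would be PUT to /<index>/_settings.
print(json.dumps(bulk_load_settings))
```

After the load, the replica count and refresh interval would be set back to their normal values so the index is searchable and redundant again.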

Hope this helps you somewhat.
