How to improve Elasticsearch performance
Question
I write data to Elasticsearch with the parallel_bulk function in Python, but the performance is very low: indexing 10000 documents takes 180 s. I set these index settings:
"settings": {
"number_of_shards": 5,
"number_of_replicas": 0,
"refresh_interval": "30s",
"index.translog.durability": "async",
"index.translog.sync_interval": "30s"
}
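If it helps, here is a minimal sketch of applying these settings at index creation through the official Python client (elasticsearch-py); the host and index name are placeholders, not values from the question. Note that a static setting like number_of_shards can only be set when the index is created:

```python
bulk_index_settings = {
    "settings": {
        "number_of_shards": 5,
        "number_of_replicas": 0,
        "refresh_interval": "30s",
        "index.translog.durability": "async",
        "index.translog.sync_interval": "30s",
    }
}


def create_index(host="localhost:9200", name="your_index_name"):
    # Imported inside the function so the sketch can be read without the client installed.
    from elasticsearch import Elasticsearch
    es = Elasticsearch([host])
    # number_of_shards is static and must be set at creation time;
    # refresh_interval could also be changed later via the update-settings API.
    es.indices.create(index=name, body=bulk_index_settings)
```

Recreating the index with these settings before a bulk load (and restoring refresh_interval afterwards) is the usual pattern.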
and in elasticsearch.yml I set:
bootstrap.memory_lock: true
indices.memory.index_buffer_size: 20%
indices.memory.min_index_buffer_size: 96mb
# Search pool
thread_pool.search.size: 5
thread_pool.search.queue_size: 100
thread_pool.bulk.queue_size: 300
thread_pool.index.queue_size: 300
indices.fielddata.cache.size: 40%
discovery.zen.fd.ping_timeout: 120s
discovery.zen.fd.ping_retries: 6
discovery.zen.fd.ping_interval: 30s
But it doesn't improve the performance. What can I do? I use Elasticsearch 6.5.4 on Windows 10 with a single node, and I yield data from Oracle into Elasticsearch.
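Before touching more server settings, it may be worth tuning parallel_bulk itself: its defaults (chunk_size=500, thread_count=4) are often too conservative. A minimal sketch, assuming the Oracle rows arrive as dicts with a tbl_id key; the index and doc-type names are placeholders:

```python
def gen_actions(rows, index_name="your_index_name", doc_type="your_doc_type"):
    # Turn each row dict into a bulk action; tbl_id is assumed to be the key column.
    for row in rows:
        yield {
            "_index": index_name,
            "_type": doc_type,
            "_id": row.get("tbl_id"),
            "_source": row,
        }


def index_all(es, rows):
    # Imported inside the function so the sketch stays readable without the client installed.
    from elasticsearch import helpers
    # Larger chunks and more threads usually help on a single local node;
    # try a few combinations and keep the one that saturates the node.
    for ok, info in helpers.parallel_bulk(es, gen_actions(rows),
                                          thread_count=8, chunk_size=2000):
        if not ok:
            print(info)  # surface per-document failures instead of silently dropping them
```

Consuming the generator returned by parallel_bulk is required: the requests are only sent as you iterate over the results.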
Answer
Based on the code from yesterday's post, you can try to create an ES dump of the Oracle DB:
import json


class CreateDump(object):
    def __init__(self, cursor):
        # cursor: an open Oracle cursor (e.g. from cx_Oracle)
        self.cursor = cursor
        self.output = r"/home/littlely/Scrivania/oracle_dump.json"
        self.index_name = "your_index_name"
        self.doc_type = "your_doc_type"

    def _gen_data(self):
        sql = """select * from tem_search_engine_1 where rownum <= 10000"""
        self.cursor.execute(sql)
        col_name_list = [col[0].lower() for col in self.cursor.description]
        col_name_len = len(col_name_list)
        for row in self.cursor:
            source = {}
            tbl_id = ""
            for i in range(col_name_len):
                source[col_name_list[i]] = str(row[i])
                if col_name_list[i] == "tbl_id":
                    tbl_id = row[i]
            self.writeOnFS(source, tbl_id)

    def writeOnFS(self, source, tbl_id):
        # Bulk NDJSON format: one action line, then one source line per document
        with open(self.output, 'a') as f:
            prep = json.dumps({"index": {"_index": self.index_name,
                                         "_type": self.doc_type,
                                         "_id": tbl_id}})
            f.write(prep + "\n")
            f.write(json.dumps(source) + "\n")
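For reference, each document in the dump has to occupy exactly two newline-terminated lines: the action metadata, then the source. A quick self-contained check of that pair format (the values are illustrative, not from the question):

```python
import json

# Placeholders standing in for real values from CreateDump
index_name, doc_type, tbl_id = "your_index_name", "your_doc_type", "42"
source = {"tbl_id": "42", "title": "example row"}

# One action line + one source line, each terminated by a plain "\n"
action_line = json.dumps({"index": {"_index": index_name,
                                    "_type": doc_type,
                                    "_id": tbl_id}})
source_line = json.dumps(source)
pair = action_line + "\n" + source_line + "\n"

# The _bulk endpoint parses every line as a standalone JSON document,
# so no pretty-printing and no trailing commas anywhere.
```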
Then you will find the Oracle dump at the self.output path. So you only need to bulk-index your JSON file; the --data-binary path is the self.output path:
curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/<your_index_name>/<your_doc_type>/_bulk --data-binary @/home/littlely/Scrivania/oracle_dump.json
Or, if the file is too big, install GNU Parallel. On Ubuntu:
sudo apt-get install parallel
Then:
cat /home/littlely/Scrivania/oracle_dump.json | parallel --pipe -L 2 -N 2000 -j3 'curl -H "Content-Type: application/x-ndjson" -s http://localhost:9200/<your_index_name>/_bulk --data-binary @- > /dev/null'
Enjoy!