Elasticsearch performance using pyes
Question
Sorry for cross-posting. This question is also posted on the Elasticsearch Google group.
In short, I am trying to find out why I am not getting optimal performance when searching an ES index that contains about 1.5 million records.
Currently I get about 500-1000 searches in 2 seconds. I would think this should be orders of magnitude faster. Also, I am currently not using Thrift.
Here is how I am checking the performance.
I am using version 0.19.1 of pyes (I tried both the stable and dev versions from GitHub) and version 0.13.8 of requests.
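import time
from pyes import ES, TermQuery

conn = ES(['localhost:9201'], timeout=20, bulk_size=1000)
# Note: in Python 2, time.clock() is CPU time on Unix and wall-clock time on Windows.
loop_start = time.clock()
q1 = TermQuery("tax_name", "cellvibrio")
for x in xrange(1000000):
    # Report the cumulative elapsed time every 1000 searches.
    if x % 1000 == 0 and x > 0:
        loop_check_point = time.clock()
        print 'took %s secs to search %d records' % (loop_check_point - loop_start, x)
    results = conn.search(query=q1)
    if results:
        # Iterate over the hits without doing anything with them.
        for r in results:
            pass
        # print len(results)
    else:
        pass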
I would appreciate any help you can give me in scaling up the searches.
Thanks!
Recommended answer
Isn't this just a matter of concurrency?
You're running all your queries in sequence, so each query has to finish before the next one can start. If you have a 1 ms round-trip time (RTT) to the server, that alone limits you to 1000 requests per second.
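The arithmetic behind that limit can be sketched in a few lines. This is only an illustration: the function names are made up for this example, and the 1 ms figure is the hypothetical RTT from the answer, not a measurement. The second function runs the same calculation backwards on the numbers from the question.

```python
def max_sequential_rate(round_trip_seconds):
    """Upper bound on requests per second when each request must
    finish before the next one is sent (one request in flight)."""
    return 1.0 / round_trip_seconds


def implied_latency(requests, elapsed_seconds):
    """Average time per request implied by an observed sequential run."""
    return elapsed_seconds / float(requests)


# Hypothetical 1 ms round trip from the answer.
print(max_sequential_rate(0.001))

# The question reports 500-1000 searches in 2 seconds.
print(implied_latency(500, 2.0), implied_latency(1000, 2.0))
```

Read this way, the figures in the question correspond to roughly 2-4 ms per search, which is in line with a loop that is bound by per-request latency rather than by the server's capacity.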
Try running a few instances of your script in parallel and see what kind of performance you get.
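The effect of parallelism can be tried without a cluster. The sketch below is a rough simulation under stated assumptions, not a pyes benchmark: `fake_search` is a hypothetical stand-in for `conn.search(query=q1)` that only sleeps for an assumed 5 ms round trip, and it uses threads in one Python 3 process instead of the separate script instances suggested above. Whether a single pyes connection can be shared safely between threads is not verified here, so a real test may need one connection per worker.

```python
import time
from concurrent.futures import ThreadPoolExecutor

SIMULATED_RTT = 0.005  # assumed round-trip time in seconds (illustrative)


def fake_search(_):
    """Hypothetical stand-in for conn.search(query=q1): it only waits."""
    time.sleep(SIMULATED_RTT)
    return 1


def run_searches(total, workers):
    """Issue `total` simulated searches on `workers` threads and
    return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        completed = sum(pool.map(fake_search, range(total)))
    assert completed == total
    return time.perf_counter() - start


sequential = run_searches(40, workers=1)
parallel = run_searches(40, workers=8)
print('1 worker:  %.3f s' % sequential)
print('8 workers: %.3f s' % parallel)
```

With one worker the run takes at least 40 times the simulated round trip, while eight workers overlap the waiting. A real cluster will stop scaling at some point, so the useful number of workers has to be found by measurement.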