使用solr搜索时的CPU使用率 [英] CPU usage when searching using solr

查看:1159
本文介绍了使用solr搜索时的CPU使用率的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我们有一个solr云设置4个碎片(每个物理机一个碎片),有大约1亿个文档。 Zookeeper在这4台机器之一。我们遇到带有通配符和邻近搜索的复杂查询,有时需要超过15秒才能获得前100个文档。查询流量目前非常低(每分钟2-3个查询)。 4托管云的服务器有以下规范:
(2个服务器 - > 64 GB RAM,24个CPU内核,2.4 GHz)+(2个服务器 - > 48 GB RAM,24个CPU内核,2.4GHz)。

We have a solr cloud setup of 4 shards (one shard per physical machine) having ~100 million documents. Zookeeper is on one of those 4 machines. We encounter complex queries having wild cards and proximity searches together and it sometimes takes more than 15 secs to get top 100 documents. Query traffic is very very low at the moment (2-3 queries every minute). 4 Servers hosting cloud have following specs: (2 servers -> 64 GB RAM, 24 CPU cores, 2.4 GHz) + (2 servers -> 48 GB RAM, 24 CPU cores, 2.4GHz).

我们为每个shard提供8 GB的JVM内存。我们在每台机器SSD(总计4 * 510 GB = 2.4TB)上的510GB索引映射到每个服务器上其余RAM上的操作系统磁盘缓存。所以我认为RAM不是我们的问题。

We are providing 8 GB JVM memory per shard. Our 510GB index on SSDs per machine (totalling to 4*510 GB = 2.4TB) is mapped into OS disk cache on remaining RAM on each server. So I suppose RAM is not an issue for us.

现在有趣的事情要注意的是:当一个查询被触发到云时,只有一个CPU内核被利用到100 %,其余均为0%。相同的行为在所有机器上复制。这些计算机上没有运行其他进程。

Now Interesting thing to note is: When a query is fired to the cloud, only one CPU core is utilized to 100% and rest are all at 0%. Same behaviour is replicated on all the machines. No other processes are running on these machines.

不应该使用某种类型的多线程来利用CPU内核?我可以随时增加每个查询的CPU消耗,因为流量不是问题。如果是,如何?

Shouldn't solr be doing multi-threading of some-kind to utilize the CPU cores? Can I anyhow increase CPU consumption for each queries as traffic is not a problem. If so, how?

推荐答案

对Solr shard的单个请求主要是单线程处理在多个字段)。经验法则是将碎片的文档计数保持在不超过几亿。你远远低于25M / shard,但正如你所说,你的查询是复杂的。你看到的是简单的单线程处理的效果。

A single request to a Solr shard is largely processed single-threaded (you can set threads for faceting on multiple fields). Rule of thumb is to keep document count for shards to no more than a very few hundreds of millions. You are well below that with 25M/shard, but as you say, your queries are complex. What you see is simple the effect of single-threaded processing.

您的问题的解决方案是使用更多的碎片,因为所有碎片被并行查询。由于你有很多可用的CPU内核和很少的流量,你可能想尝试在每台机器上运行10个碎片。 SolrCloud总共使用40个分片并不是一个问题,与重型查询相比,增加的合并开销应该是微不足道的。

The solution to your problem is to use more shards, as all shards are queried in parallel. As you have a lot of free CPU cores and very little traffic, you might want to try running 10 shards on each machine. It is not a problem for SolrCloud to use 40 shards in total and the increased merging overhead should be insignificant compared to your heavy queries.

这篇关于使用solr搜索时的CPU使用率的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆