When do we need a large heap with Elasticsearch?
Problem description
Running ES 1.5.2, Java 1.8_45, Windows 2008.
4 nodes, each with 32 cores, 128GB RAM, and 5TB of SSD.
My goal is to index about 2.5 billion documents. I am up to 810 million so far, at an average of about 30k per document.
I currently have ES_HEAP_SIZE=30g
But I have been experiencing lots of memory pressure and STW (stop-the-world) pauses. For example, one node is currently always above 90% heap usage while the rest coast anywhere between 30% and 40%. So it seems that one node won't GC?
Only two things are happening on the cluster: bulk indexing (no errors logged) and some scroll searches.
I am using doc values where I can. Currently there is no field data cache (except for Marvel's, which is very small) and the filter cache is minimal, about 100MB per node.
The nodes are still trying to recover, so I don't want to stop the cluster fully. Should I reset the RAM to 10GB?
How I connect to the cluster for both bulk indexing and scroll searches:
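import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

// Do this once at application startup and re-use the client instance.
Settings settings = ImmutableSettings
        .settingsBuilder()
        .put("cluster.name", "xxxx")
        .build();
// One transport address per node in the cluster.
client = new TransportClient(settings)
        .addTransportAddress(new InetSocketTransportAddress("xxxx", 9300))
        .addTransportAddress(new InetSocketTransportAddress("xxxx", 9300))
        .addTransportAddress(new InetSocketTransportAddress("xxxx", 9300))
        .addTransportAddress(new InetSocketTransportAddress("xxxx", 9300));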
Answer
Don't send the bulk requests to only one node. The same goes for the search requests.
A bulk request is kept in a memory buffer on the node that receives it, so sending every request of any kind to a single node is not a good idea. Round-robin the requests instead, either through a proxy server (if you have one) or by sending them to a client node. The client node knows how to do the round-robin distribution.
You can also look at other options (depending on the clients accessing the cluster) and check whether those clients support automatic round-robin or load balancing of requests.
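For a client that has no built-in balancing, the round-robin idea above can be sketched in a few lines of plain Java. This is only an illustration: RoundRobinSelector is a hypothetical helper, not part of the Elasticsearch API, and the host names are placeholders. A proxy or client node would normally do this work for you.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of the Elasticsearch API: hands out node
// addresses in rotation so that no single node receives every request.
class RoundRobinSelector<T> {
    private final List<T> targets;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinSelector(List<T> targets) {
        if (targets.isEmpty()) {
            throw new IllegalArgumentException("at least one target is required");
        }
        this.targets = new ArrayList<>(targets); // defensive copy
    }

    // Returns the next target. Safe to call from several threads; floorMod
    // keeps the index valid even after the counter overflows.
    T next() {
        int index = Math.floorMod(counter.getAndIncrement(), targets.size());
        return targets.get(index);
    }

    public static void main(String[] args) {
        // Placeholder host names; substitute the real node addresses.
        RoundRobinSelector<String> nodes = new RoundRobinSelector<>(
                Arrays.asList("node1:9300", "node2:9300", "node3:9300", "node4:9300"));
        for (int i = 0; i < 6; i++) {
            System.out.println("bulk request " + i + " -> " + nodes.next());
        }
    }
}
```

Each call to next() returns a different node until the list wraps around, so successive bulk or search requests are spread evenly across the cluster instead of piling up in one node's memory buffer.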