Hazelcast - OperationTimeoutException

Views: 323 Published: 2020/6/11 19:48:02 Tags: hazelcast, distributed-cache


Problem description

I am using Hazelcast version 3.3.1.
I have a 9-node cluster running on AWS using c3.2xlarge servers.
I am using a distributed executor service and a distributed map.
The distributed executor service uses a single thread. The distributed map is configured with no replication and no near-cache, and stores about 1 million objects of 1-2 KB each using a Kryo serializer.
My use case is as follows:

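All 9 nodes constantly execute synchronous remote operations on the distributed executor service and generate about 20k hits per second (about 2k per node).
Invocations are executed using the Hazelcast API: com.hazelcast.core.IExecutorService#executeOnKeyOwner.
Each operation accesses the distributed map on the node that owns the partition, does some calculation using the stored object, and then stores the object back into the map. (For that I use the get and set API of the IMap object.)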
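The get, compute, set pattern described above can be sketched locally to make the flow concrete. This is only a stand-in built on java.util.concurrent: a single-threaded ExecutorService and a ConcurrentHashMap take the place of Hazelcast's IExecutorService and IMap, and the class and method names (KeyOwnerSketch, updateSync) are hypothetical. It does not reproduce partition routing, serialization, or the timeout itself.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class KeyOwnerSketch implements AutoCloseable {
    // Local stand-in for the distributed map (assumption: the real one is a Hazelcast IMap).
    private final Map<String, Long> store = new ConcurrentHashMap<>();

    // Local stand-in for the distributed executor service, which uses a single thread.
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    // Submits one task and blocks until it completes, mirroring a synchronous remote call.
    public long updateSync(String key, long delta) {
        Future<Long> result = executor.submit(() -> {
            long current = store.getOrDefault(key, 0L); // get the stored object
            long updated = current + delta;             // placeholder for "some calculation"
            store.put(key, updated);                    // set the object back into the map
            return updated;
        });
        try {
            return result.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        }
    }

    @Override
    public void close() {
        executor.shutdown();
    }

    public static void main(String[] args) {
        try (KeyOwnerSketch node = new KeyOwnerSketch()) {
            for (int i = 0; i < 1000; i++) {
                node.updateSync("key-" + (i % 10), 1);
            }
            System.out.println("key-0 = " + node.updateSync("key-0", 0));
        }
    }
}
```

Because every task runs on the one executor thread, a task that blocks holds up everything queued behind it on that node, which is worth keeping in mind when reading the timeout below.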

Every once in a while Hazelcast encounters timeout exceptions such as:
com.hazelcast.core.OperationTimeoutException: No response for 120000 ms. Aborting invocation! BasicInvocationFuture{invocation=BasicInvocation{ serviceName='hz:impl:mapService', op=GetOperation{}, partitionId=212, replicaIndex=0, tryCount=250, tryPauseMillis=500, invokeCount=1, callTimeout=60000, target=Address[172.31.44.2]:5701, backupsExpected=0, backupsCompleted=0}, response=null, done=false} No response has been received! backups-expected:0 backups-completed: 0
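When these exceptions recur, it may help to tabulate which partition and target member they point at. The following is a minimal sketch that pulls the numeric fields and the target address out of a message, assuming the message format shown above. The class TimeoutMessageParser is hypothetical and not part of Hazelcast.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TimeoutMessageParser {
    // Matches name=number pairs such as partitionId=212 or callTimeout=60000.
    private static final Pattern NUMERIC_FIELD = Pattern.compile("(\\w+)=(\\d+)\\b");

    // Matches target=Address[host]:port and captures host and port.
    private static final Pattern TARGET = Pattern.compile("target=Address\\[([^\\]]+)\\]:(\\d+)");

    // Returns the numeric fields in order of appearance, plus "target" as host:port if present.
    public static Map<String, String> parse(String message) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher numeric = NUMERIC_FIELD.matcher(message);
        while (numeric.find()) {
            fields.put(numeric.group(1), numeric.group(2));
        }
        Matcher target = TARGET.matcher(message);
        if (target.find()) {
            fields.put("target", target.group(1) + ":" + target.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        String message = "BasicInvocation{ serviceName='hz:impl:mapService', op=GetOperation{}, "
                + "partitionId=212, replicaIndex=0, tryCount=250, tryPauseMillis=500, invokeCount=1, "
                + "callTimeout=60000, target=Address[172.31.44.2]:5701, backupsExpected=0, backupsCompleted=0}";
        System.out.println(parse(message));
    }
}
```

Grouping the parsed results by target or partitionId across many log lines would show whether the timeouts concentrate on one member or are spread over the cluster.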

In some cases I see map partitions start to migrate, which makes things even worse: nodes constantly leave and re-join the cluster, and the only way I can overcome the problem is by restarting the entire cluster.

I am wondering what may cause Hazelcast to block a map-get operation for 120 seconds?
I am pretty sure it's not network related since other services on the same servers operate just fine. Also note that the servers are mostly idle (~70%).

Any feedback on my use case would be highly appreciated.
