How do you programmatically configure hazelcast for the multicast discovery mechanism?

Question

How do you programmatically configure hazelcast for the multicast discovery mechanism?

Details:

The documentation only supplies an example for TCP/IP and is out-of-date: it uses Config.setPort(), which no longer exists.

My configuration looks like this, but discovery does not work (i.e. I get the output "Members: 1"):

Config cfg = new Config();                  
NetworkConfig network = cfg.getNetworkConfig();
network.setPort(PORT_NUMBER);

JoinConfig join = network.getJoin();
join.getTcpIpConfig().setEnabled(false);
join.getAwsConfig().setEnabled(false);
join.getMulticastConfig().setEnabled(true);

join.getMulticastConfig().setMulticastGroup(MULTICAST_ADDRESS);
join.getMulticastConfig().setMulticastPort(PORT_NUMBER);
join.getMulticastConfig().setMulticastTimeoutSeconds(200);

HazelcastInstance instance = Hazelcast.newHazelcastInstance(cfg);
System.out.println("Members: "+hazelInst.getCluster().getMembers().size());

Update 1, taking asimaslan's answer into account

If I get the MulticastTimeout wrong, I either get "Members: 1" or the following:

Dec 05, 2013 8:50:42 PM com.hazelcast.nio.ReadHandler
WARNING: [192.168.0.9]:4446 [dev] hz._hzInstance_1_dev.IO.thread-in-0 Closing socket to endpoint Address[192.168.0.7]:4446, Cause:java.io.EOFException: Remote socket closed!
Dec 05, 2013 8:57:24 PM com.hazelcast.instance.Node
SEVERE: [192.168.0.9]:4446 [dev] Could not join cluster, shutting down! com.hazelcast.core.HazelcastException: Failed to join in 300 seconds!


Update 2, taking pveentjer's answer about using tcp/ip into account

If I change the configuration to the following, I still only get 1 member:

Config cfg = new Config();                  
NetworkConfig network = cfg.getNetworkConfig();
network.setPort(PORT_NUMBER);

JoinConfig join = network.getJoin();

join.getMulticastConfig().setEnabled(false);
join.getTcpIpConfig().addMember("192.168.0.1").addMember("192.168.0.2").
addMember("192.168.0.3").addMember("192.168.0.4").
addMember("192.168.0.5").addMember("192.168.0.6").
addMember("192.168.0.7").addMember("192.168.0.8").
addMember("192.168.0.9").addMember("192.168.0.10").
addMember("192.168.0.11").setRequiredMember(null).setEnabled(true);

//this sets the allowed connections to the cluster? necessary for multicast, too?
network.getInterfaces().setEnabled(true).addInterface("192.168.0.*");

HazelcastInstance instance = Hazelcast.newHazelcastInstance(cfg);
System.out.println("debug: joined via "+join+" with "+hazelInst.getCluster()
.getMembers().size()+" members.");

More precisely, this run produces the output

debug: joined via JoinConfig{multicastConfig=MulticastConfig [enabled=false, multicastGroup=224.2.2.3, multicastPort=54327, multicastTimeToLive=32, multicastTimeoutSeconds=2, trustedInterfaces=[]], tcpIpConfig=TcpIpConfig [enabled=true, connectionTimeoutSeconds=5, members=[192.168.0.1, 192.168.0.2, 192.168.0.3, 192.168.0.4, 192.168.0.5, 192.168.0.6, 192.168.0.7, 192.168.0.8, 192.168.0.9, 192.168.0.10, 192.168.0.11], requiredMember=null], awsConfig=AwsConfig{enabled=false, region='us-east-1', securityGroupName='null', tagKey='null', tagValue='null', hostHeader='ec2.amazonaws.com', connectionTimeoutSeconds=5}} with 1 members.

My non-Hazelcast implementation uses UDP multicast and works fine. So can a firewall really be the problem?

Since I do not have permissions for iptables or to install iperf, I am using com.hazelcast.examples.TestApp to check whether my network is working, as described in Getting Started With Hazelcast in Chapter 2, Section "Showing Off Straight Away":

I call java -cp hazelcast-3.1.2.jar com.hazelcast.examples.TestApp on 192.168.0.1 and get the output

...Dec 10, 2013 11:31:21 PM com.hazelcast.instance.DefaultAddressPicker
INFO: Prefer IPv4 stack is true.
Dec 10, 2013 11:31:21 PM com.hazelcast.instance.DefaultAddressPicker
INFO: Picked Address[192.168.0.1]:5701, using socket ServerSocket[addr=/0:0:0:0:0:0:0:0,localport=5701], bind any local is true
Dec 10, 2013 11:31:22 PM com.hazelcast.system
INFO: [192.168.0.1]:5701 [dev] Hazelcast Community Edition 3.1.2 (20131120) starting at Address[192.168.0.1]:5701
Dec 10, 2013 11:31:22 PM com.hazelcast.system
INFO: [192.168.0.1]:5701 [dev] Copyright (C) 2008-2013 Hazelcast.com
Dec 10, 2013 11:31:22 PM com.hazelcast.instance.Node
INFO: [192.168.0.1]:5701 [dev] Creating MulticastJoiner
Dec 10, 2013 11:31:22 PM com.hazelcast.core.LifecycleService
INFO: [192.168.0.1]:5701 [dev] Address[192.168.0.1]:5701 is STARTING
Dec 10, 2013 11:31:24 PM com.hazelcast.cluster.MulticastJoiner
INFO: [192.168.0.1]:5701 [dev] 

Members [1] {
    Member [192.168.0.1]:5701 this
}

Dec 10, 2013 11:31:24 PM com.hazelcast.core.LifecycleService
INFO: [192.168.0.1]:5701 [dev] Address[192.168.0.1]:5701 is STARTED

I then call java -cp hazelcast-3.1.2.jar com.hazelcast.examples.TestApp on 192.168.0.2 and get the output

...Dec 10, 2013 9:50:22 PM com.hazelcast.instance.DefaultAddressPicker
INFO: Prefer IPv4 stack is true.
Dec 10, 2013 9:50:22 PM com.hazelcast.instance.DefaultAddressPicker
INFO: Picked Address[192.168.0.2]:5701, using socket ServerSocket[addr=/0:0:0:0:0:0:0:0,localport=5701], bind any local is true
Dec 10, 2013 9:50:23 PM com.hazelcast.system
INFO: [192.168.0.2]:5701 [dev] Hazelcast Community Edition 3.1.2 (20131120) starting at Address[192.168.0.2]:5701
Dec 10, 2013 9:50:23 PM com.hazelcast.system
INFO: [192.168.0.2]:5701 [dev] Copyright (C) 2008-2013 Hazelcast.com
Dec 10, 2013 9:50:23 PM com.hazelcast.instance.Node
INFO: [192.168.0.2]:5701 [dev] Creating MulticastJoiner
Dec 10, 2013 9:50:23 PM com.hazelcast.core.LifecycleService
INFO: [192.168.0.2]:5701 [dev] Address[192.168.0.2]:5701 is STARTING
Dec 10, 2013 9:50:23 PM com.hazelcast.nio.SocketConnector
INFO: [192.168.0.2]:5701 [dev] Connecting to /192.168.0.1:5701, timeout: 0, bind-any: true
Dec 10, 2013 9:50:23 PM com.hazelcast.nio.TcpIpConnectionManager
INFO: [192.168.0.2]:5701 [dev] 38476 accepted socket connection from /192.168.0.1:5701
Dec 10, 2013 9:50:28 PM com.hazelcast.cluster.ClusterService
INFO: [192.168.0.2]:5701 [dev] 

Members [2] {
    Member [192.168.0.1]:5701
    Member [192.168.0.2]:5701 this
}

Dec 10, 2013 9:50:30 PM com.hazelcast.core.LifecycleService
INFO: [192.168.0.2]:5701 [dev] Address[192.168.0.2]:5701 is STARTED

So multicast discovery is generally working on my cluster, right? Is 5701 also the port for discovery? Is 38476 in the last output an ID or a port?

Joining still does not work for my own code with programmatic configuration :(

The modified TestApp gives the output

joinConfig{multicastConfig=MulticastConfig [enabled=true, multicastGroup=224.2.2.3, 
multicastPort=54327, multicastTimeToLive=32, multicastTimeoutSeconds=2, 
trustedInterfaces=[]], tcpIpConfig=TcpIpConfig [enabled=false, 
connectionTimeoutSeconds=5, members=[], requiredMember=null], 
awsConfig=AwsConfig{enabled=false, region='us-east-1', securityGroupName='null', 
tagKey='null', tagValue='null', hostHeader='ec2.amazonaws.com', connectionTimeoutSeconds=5}}

and does detect other members after a couple of seconds (each instance at first lists only itself as a member if all are started at the same time), whereas

myProgram gives the output

joined via JoinConfig{multicastConfig=MulticastConfig [enabled=true, multicastGroup=224.2.2.3, multicastPort=54327, multicastTimeToLive=32, multicastTimeoutSeconds=2, trustedInterfaces=[]], tcpIpConfig=TcpIpConfig [enabled=false, connectionTimeoutSeconds=5, members=[], requiredMember=null], awsConfig=AwsConfig{enabled=false, region='us-east-1', securityGroupName='null', tagKey='null', tagValue='null', hostHeader='ec2.amazonaws.com', connectionTimeoutSeconds=5}} with 1 members.

and does not detect members within its runtime of about 1 minute (I am counting the members about every 5 seconds).

BUT if at least one instance of TestApp runs concurrently on the cluster, all TestApp instances and all myProgram instances are detected and my program works fine. If I start TestApp once and then myProgram twice in parallel, TestApp gives the following output:

java -cp ~/CaseStudy/jtorx-1.10.0-beta8/lib/hazelcast-3.1.2.jar:. TestApp
Dec 12, 2013 12:02:15 PM com.hazelcast.instance.DefaultAddressPicker
INFO: Prefer IPv4 stack is true.
Dec 12, 2013 12:02:15 PM com.hazelcast.instance.DefaultAddressPicker
INFO: Picked Address[192.168.180.240]:5701, using socket ServerSocket[addr=/0:0:0:0:0:0:0:0,localport=5701], bind any local is true
Dec 12, 2013 12:02:15 PM com.hazelcast.system
INFO: [192.168.180.240]:5701 [dev] Hazelcast Community Edition 3.1.2 (20131120) starting at Address[192.168.180.240]:5701
Dec 12, 2013 12:02:15 PM com.hazelcast.system
INFO: [192.168.180.240]:5701 [dev] Copyright (C) 2008-2013 Hazelcast.com
Dec 12, 2013 12:02:15 PM com.hazelcast.instance.Node
INFO: [192.168.180.240]:5701 [dev] Creating MulticastJoiner
Dec 12, 2013 12:02:15 PM com.hazelcast.core.LifecycleService
INFO: [192.168.180.240]:5701 [dev] Address[192.168.180.240]:5701 is STARTING
Dec 12, 2013 12:02:21 PM com.hazelcast.cluster.MulticastJoiner
INFO: [192.168.180.240]:5701 [dev] 


Members [1] {
    Member [192.168.180.240]:5701 this
}

Dec 12, 2013 12:02:22 PM com.hazelcast.core.LifecycleService
INFO: [192.168.180.240]:5701 [dev] Address[192.168.180.240]:5701 is STARTED
Dec 12, 2013 12:02:22 PM com.hazelcast.management.ManagementCenterService
INFO: [192.168.180.240]:5701 [dev] Hazelcast will connect to Management Center on address: http://localhost:8080/mancenter-3.1.2/
Join: JoinConfig{multicastConfig=MulticastConfig [enabled=true, multicastGroup=224.2.2.3, multicastPort=54327, multicastTimeToLive=32, multicastTimeoutSeconds=2, trustedInterfaces=[]], tcpIpConfig=TcpIpConfig [enabled=false, connectionTimeoutSeconds=5, members=[], requiredMember=null], awsConfig=AwsConfig{enabled=false, region='us-east-1', securityGroupName='null', tagKey='null', tagValue='null', hostHeader='ec2.amazonaws.com', connectionTimeoutSeconds=5}}
Dec 12, 2013 12:02:22 PM com.hazelcast.partition.PartitionService
INFO: [192.168.180.240]:5701 [dev] Initializing cluster partition table first arrangement...
hazelcast[default] > Dec 12, 2013 12:03:27 PM com.hazelcast.nio.SocketAcceptor
INFO: [192.168.180.240]:5701 [dev] Accepting socket connection from /192.168.0.8:38764
Dec 12, 2013 12:03:27 PM com.hazelcast.nio.TcpIpConnectionManager
INFO: [192.168.180.240]:5701 [dev] 5701 accepted socket connection from /192.168.0.8:38764
Dec 12, 2013 12:03:27 PM com.hazelcast.nio.SocketAcceptor
INFO: [192.168.180.240]:5701 [dev] Accepting socket connection from /192.168.0.7:54436
Dec 12, 2013 12:03:27 PM com.hazelcast.nio.TcpIpConnectionManager
INFO: [192.168.180.240]:5701 [dev] 5701 accepted socket connection from /192.168.0.7:54436
Dec 12, 2013 12:03:32 PM com.hazelcast.partition.PartitionService
INFO: [192.168.180.240]:5701 [dev] Re-partitioning cluster data... Migration queue size: 181
Dec 12, 2013 12:03:32 PM com.hazelcast.cluster.ClusterService
INFO: [192.168.180.240]:5701 [dev] 

Members [3] {
    Member [192.168.180.240]:5701 this
    Member [192.168.0.8]:5701
    Member [192.168.0.7]:5701
}

Dec 12, 2013 12:03:43 PM com.hazelcast.partition.PartitionService
INFO: [192.168.180.240]:5701 [dev] Re-partitioning cluster data... Migration queue size: 181
Dec 12, 2013 12:03:45 PM com.hazelcast.partition.PartitionService
INFO: [192.168.180.240]:5701 [dev] All migration tasks has been completed, queues are empty.
Dec 12, 2013 12:03:46 PM com.hazelcast.nio.TcpIpConnection
INFO: [192.168.180.240]:5701 [dev] Connection [Address[192.168.0.8]:5701] lost. Reason: Socket explicitly closed
Dec 12, 2013 12:03:46 PM com.hazelcast.cluster.ClusterService
INFO: [192.168.180.240]:5701 [dev] Removing Member [192.168.0.8]:5701
Dec 12, 2013 12:03:46 PM com.hazelcast.cluster.ClusterService
INFO: [192.168.180.240]:5701 [dev] 

Members [2] {
    Member [192.168.180.240]:5701 this
    Member [192.168.0.7]:5701
}

Dec 12, 2013 12:03:48 PM com.hazelcast.partition.PartitionService
INFO: [192.168.180.240]:5701 [dev] Partition balance is ok, no need to re-partition cluster data... 
Dec 12, 2013 12:03:48 PM com.hazelcast.nio.TcpIpConnection
INFO: [192.168.180.240]:5701 [dev] Connection [Address[192.168.0.7]:5701] lost. Reason: Socket explicitly closed
Dec 12, 2013 12:03:48 PM com.hazelcast.cluster.ClusterService
INFO: [192.168.180.240]:5701 [dev] Removing Member [192.168.0.7]:5701
Dec 12, 2013 12:03:48 PM com.hazelcast.cluster.ClusterService
INFO: [192.168.180.240]:5701 [dev] 

Members [1] {
    Member [192.168.180.240]:5701 this
}

Dec 12, 2013 12:03:48 PM com.hazelcast.partition.PartitionService
INFO: [192.168.180.240]:5701 [dev] Partition balance is ok, no need to re-partition cluster data... 

The only difference I see in TestApp's configuration is

config.getManagementCenterConfig().setEnabled(true);
config.getManagementCenterConfig().setUrl("http://localhost:8080/mancenter-"+version);

for(int k=1;k<= LOAD_EXECUTORS_COUNT;k++){
    config.addExecutorConfig(new ExecutorConfig("e"+k).setPoolSize(k));
}

so, in a desperate attempt, I added it to myProgram, too. But it does not solve the problem: each instance still only detects itself as a member during the whole run.

Could it be that the program is not running long enough (as pveentjer put it)?

My experiments seem to confirm this: If the time t between Hazelcast.newHazelcastInstance(cfg); and initializing cleanUp() (i.e. no longer communicating via hazelcast and no longer checking the number of members) is

  • less than 30 seconds: no communication, and "Members: 1"
  • more than 30 seconds: all members are found and communication happens (which, weirdly, seems to go on for much longer than t - 30 seconds).

Is 30 seconds a realistic time span that a hazelcast cluster needs, or is there something strange going on? Here is a log from 4 myPrograms running concurrently (their windows for looking for hazelcast members overlap by more than 30 seconds, e.g. for instance 1 and instance 3):

instance 1: 2013-12-19T12:39:16.553+0100 LOG 0 (START) engine started 
looking for members between 2013-12-19T12:39:21.973+0100 and 2013-12-19T12:40:27.863+0100  
2013-12-19T12:40:28.205+0100 LOG 35 (Torx-Explorer) Model  SymToSim is about to  exit

instance 2: 2013-12-19T12:39:16.592+0100 LOG 0 (START) engine started 
looking for members between 2013-12-19T12:39:22.192+0100 and 2013-12-19T12:39:28.429+0100 
2013-12-19T12:39:28.711+0100 LOG 52 (Torx-Explorer) Model  SymToSim is about to  exit

instance 3: 2013-12-19T12:39:16.593+0100 LOG 0 (START) engine started 
looking for members between 2013-12-19T12:39:22.145+0100 and 2013-12-19T12:39:52.425+0100  
2013-12-19T12:39:52.639+0100 LOG 54 (Torx-Explorer) Model  SymToSim is about to  exit

instance 4: 2013-12-19T12:39:16.885+0100 LOG 0 (START) engine started 
looking for members between 2013-12-19T12:39:21.478+0100 and 2013-12-19T12:39:35.980+0100  
2013-12-19T12:39:36.024+0100 LOG 34 (Torx-Explorer) Model  SymToSim is about to  exit

How do I best start my actual distributed algorithm only after enough members are present in the hazelcast cluster? Can I set hazelcast.initial.min.cluster.size programmatically? https://groups.google.com/forum/#!topic/hazelcast/sa-lmpEDa6A sounds like this would block Hazelcast.newHazelcastInstance(cfg); until the initial.min.cluster.size is reached. Correct? How synchronously (within which time span) will the different instances unblock?
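
For reference, below is a minimal sketch of one alternative way to gate the start of the algorithm without relying on hazelcast.initial.min.cluster.size: simply poll the member count after startup. The MIN_MEMBERS constant and the timeout are illustrative assumptions, not values from the question.

import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class WaitForClusterSketch {

    // Illustrative values, not taken from the question.
    private static final int MIN_MEMBERS = 3;
    private static final long TIMEOUT_MS = 60000;

    public static void main(String[] args) throws InterruptedException {
        Config cfg = new Config();
        cfg.getNetworkConfig().getJoin().getMulticastConfig().setEnabled(true);

        HazelcastInstance instance = Hazelcast.newHazelcastInstance(cfg);

        // Poll the current member count until enough nodes have joined
        // or the deadline passes.
        long deadline = System.currentTimeMillis() + TIMEOUT_MS;
        while (instance.getCluster().getMembers().size() < MIN_MEMBERS
                && System.currentTimeMillis() < deadline) {
            Thread.sleep(1000);
        }

        if (instance.getCluster().getMembers().size() >= MIN_MEMBERS) {
            // start the actual distributed algorithm here
            System.out.println("Cluster ready: " + instance.getCluster().getMembers());
        } else {
            System.out.println("Timed out waiting for " + MIN_MEMBERS + " members.");
        }
    }
}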

Answer

The problem apparently is that the cluster starts (and stops) without waiting until enough members are in the cluster. You can set the hazelcast.initial.min.cluster.size property to prevent this from happening.

You can set 'hazelcast.initial.min.cluster.size' programmatically using:

Config config = new Config(); 
config.setProperty("hazelcast.initial.min.cluster.size","3");
