hazelcast-jet deployment and data ingestion


Question

I have a distributed system running on AWS EC2 instances. My cluster has around 2000 nodes. I want to introduce a stream processing model that can process the metadata each node publishes periodically (CPU usage, memory usage, I/O, etc.). My system only cares about the latest data, and it is OK to miss a couple of data points while the processing model is down. I therefore picked hazelcast-jet, which is an in-memory processing model with great performance. I have a few questions about the model:


  1. What is the best way to deploy hazelcast-jet to multiple ec2 instances?
  2. How to ingest data from thousands of sources? The sources push data instead of being pulled.
  3. How to config client so that it knows where to submit the tasks?

A comprehensive example that I can learn from would be very useful.

Recommended answer

What is the best way to deploy hazelcast-jet to multiple ec2 instances?




  1. Download and unzip the Hazelcast Jet distribution on each machine:

$ wget https://download.hazelcast.com/jet/hazelcast-jet-3.1.zip
$ unzip hazelcast-jet-3.1.zip
$ cd hazelcast-jet-3.1


  • Go to the lib directory of the unzipped distribution and download the hazelcast-aws module:

    $ cd lib
    $ wget https://repo1.maven.org/maven2/com/hazelcast/hazelcast-aws/2.4/hazelcast-aws-2.4.jar
    


  • Edit bin/common.sh to add the module to the classpath. Towards the end of the file is this line:

    CLASSPATH="$JET_HOME/lib/hazelcast-jet-3.1.jar:$CLASSPATH"
    

    You can duplicate this line and replace -jet-3.1 with -aws-2.4.
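
    As a rough sketch of the result, the end of bin/common.sh would then contain two lines like the ones below. The jar names come from the steps above. The JET_HOME default is only an assumption so the snippet runs on its own; the real script already has JET_HOME defined by the time it reaches this line.

```shell
# Assumption for standalone use only: point JET_HOME at the unzipped distribution.
JET_HOME="${JET_HOME:-$HOME/hazelcast-jet-3.1}"

# Existing line in bin/common.sh
CLASSPATH="$JET_HOME/lib/hazelcast-jet-3.1.jar:$CLASSPATH"
# Duplicated line, with -jet-3.1 replaced by -aws-2.4
CLASSPATH="$JET_HOME/lib/hazelcast-aws-2.4.jar:$CLASSPATH"

echo "$CLASSPATH"
```

    Each line prepends to CLASSPATH, so the hazelcast-aws jar ends up ahead of the Jet jar; the order should not matter here because the two jars contain different classes.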

    Edit config/hazelcast.xml to enable the AWS cluster discovery. The details are here. In this step you'll have to deal with IAM roles, EC2 security groups, regions, etc. There's also a best practices guide for AWS deployment.
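
    For orientation, a minimal sketch of what the discovery section in config/hazelcast.xml might look like with the hazelcast-aws plugin for Hazelcast 3.x. The region, security group name, and tag values are placeholders invented for illustration, and the sketch assumes the EC2 instances carry an IAM role that allows describing instances, so no access keys appear in the file. Check the element names against the plugin documentation for the version you install.

```xml
<network>
    <join>
        <!-- Multicast does not work on EC2, so turn it off -->
        <multicast enabled="false"/>
        <aws enabled="true">
            <!-- Placeholder values: replace with your own region, group and tags -->
            <region>us-east-1</region>
            <security-group-name>jet-cluster-sg</security-group-name>
            <tag-key>cluster</tag-key>
            <tag-value>jet</tag-value>
        </aws>
    </join>
</network>
```

    Filtering by security group or tag keeps the Jet members from trying to join unrelated EC2 instances in the same account and region.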

    Start the cluster with jet-start.sh.




    How to config client so that it knows where to submit the tasks?

    A straightforward approach is to specify the public IPs of the machines where Jet is running, for example:

    ClientConfig clientConfig = new ClientConfig();
    clientConfig.getGroupConfig().setName("jet");
    clientConfig.addAddress("54.224.63.209", "34.239.139.244");
    

    However, depending on your AWS setup, these addresses may not be stable, so you can configure the client to discover them as well. This is explained here.


    How to ingest data from thousands of sources? The sources push data instead of being pulled.

    I think your best option for this is to put the data into a Hazelcast Map, and use a mapJournal source to get the update events from it.
