Middleware to build data-gathering and monitoring for a distributed system


Question

I am currently looking for a good middleware to build a solution for a monitoring and maintenance system. We are tasked with the challenge of monitoring, gathering data from and maintaining a distributed system consisting of up to 10,000 individual nodes.

The system is clustered into groups of 5-20 nodes. Each group produces data (as a team) by processing incoming sensor data. Each group has a dedicated node (blue boxes) acting as a facade/proxy for the group, exposing data and state from the group to the outside world. These clusters are geographically separated and may connect to the outside world over different networks (one may run over fiber, another over 3G/Satellite). It is likely we will experience both shorter (seconds/minutes) and longer (hours) outages. The data is persisted by each cluster locally.
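Since each cluster persists its data locally and links may be down for hours, one way to picture the proxy side is a store-and-forward outbox that keeps every message until the server acknowledges it. The following is only a minimal in-memory sketch under that assumption; the `Outbox` class, its method names and the cumulative-ack scheme are hypothetical and not part of any particular middleware, and a real proxy would back the queue with its local storage.

```python
from collections import OrderedDict


class Outbox:
    """Hypothetical store-and-forward queue: keeps messages until acknowledged."""

    def __init__(self):
        self._next_seq = 1
        self._pending = OrderedDict()  # seq -> payload, oldest first

    def enqueue(self, payload):
        """Assign the next sequence number and hold the payload for sending."""
        seq = self._next_seq
        self._next_seq += 1
        self._pending[seq] = payload
        return seq

    def ack(self, up_to_seq):
        """Cumulative acknowledgement: drop every message with seq <= up_to_seq."""
        for seq in [s for s in self._pending if s <= up_to_seq]:
            del self._pending[seq]

    def unacked(self):
        """Messages to (re)send after a reconnect, in sequence order."""
        return list(self._pending.items())


outbox = Outbox()
for reading in (b"t=20.1", b"t=20.3", b"t=20.2"):
    outbox.enqueue(reading)

outbox.ack(1)              # server confirmed message 1 before the link dropped
replay = outbox.unacked()  # messages 2 and 3 are replayed on reconnect
```

With this shape, an outage of any length only delays delivery: whatever was not acknowledged is still in the outbox when the connection comes back.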

This data needs to be collected (continuously and reliably) by external, centralized server(s) (green boxes) for further processing, analysis and viewing by various clients (orange boxes). Also, we need to monitor the state of all nodes through each group's proxy node. It is not required to monitor each node directly, although it would be good if the middleware could support that (handling heartbeat/state messages from ~10,000 nodes). In case of proxy failure, other methods are available to pinpoint individual nodes.
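The heartbeat/state monitoring described above can be sketched on the server side as a table of last-seen timestamps, with any proxy that has been silent for longer than a timeout flagged as overdue. This is an illustrative sketch only: `HeartbeatMonitor`, the 30-second timeout and the group names are assumptions, and timestamps are passed in explicitly to keep the example deterministic (a real server would probably use `time.monotonic()`).

```python
class HeartbeatMonitor:
    """Hypothetical tracker of the last heartbeat seen from each proxy node."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self._last_seen = {}  # proxy id -> timestamp of last heartbeat

    def beat(self, proxy_id, now):
        """Record a heartbeat received from proxy_id at time `now` (seconds)."""
        self._last_seen[proxy_id] = now

    def stale(self, now):
        """Return the proxies whose last heartbeat is older than the timeout."""
        return sorted(
            proxy_id
            for proxy_id, seen in self._last_seen.items()
            if now - seen > self.timeout_s
        )


monitor = HeartbeatMonitor(timeout_s=30.0)
monitor.beat("group-a", now=0.0)
monitor.beat("group-b", now=0.0)
monitor.beat("group-a", now=25.0)   # group-a reports again, group-b stays silent
overdue = monitor.stale(now=40.0)   # only group-b has exceeded the timeout
```

A dictionary of this kind stays small even with one entry per node, so the same structure would also cover the optional case of ~10,000 nodes reporting directly.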

Furthermore, we need to be able to interact with each node to tweak settings etc., but that seems easier to solve, since it is mostly handled manually per node when needed. Some batch tweaking may be needed, but all in all it looks like a standard RPC situation (Web Service or similar). Of course, if the middleware can handle this too, via some request/response mechanism, that would be a plus.

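Requirements:

  • 1000+ nodes publishing/offering continuous data
  • The data needs to be reliably (in some way) and continuously gathered to one or more servers. This will likely be built on top of the middleware, using some kind of explicit request/response to ask for lost data. If this could be handled automatically by the middleware, that is of course a plus.
  • More than one server/subscriber needs to be able to connect to the same data producer/publisher and receive the same data
  • The maximum data rate is in the range of 10-20 messages per second per group
  • Message sizes range from maybe ~100 bytes to 4-5 KB
  • Nodes range from constrained embedded systems to normal COTS Linux/Windows boxes
  • Nodes generally use C/C++; servers and clients generally use C++/C#
  • Nodes should (preferably) not need to install additional software or servers, i.e. one dedicated broker or extra service per node is expensive
  • Security will be message-based, i.e. no transport security is needed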
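The requirement about asking for lost data via explicit request/response presupposes that the receiver can tell what is missing. A common way to do that is to number each publisher's messages and have the server watch for gaps. The sketch below assumes such per-publisher sequence numbers; `GapDetector` and its `observe` method are hypothetical names, and the actual re-request would go over whatever request/response channel the chosen middleware offers.

```python
class GapDetector:
    """Hypothetical receiver-side check for missing sequence numbers."""

    def __init__(self):
        self._expected = {}  # publisher id -> next expected sequence number

    def observe(self, publisher, seq):
        """Return the sequence numbers skipped before `seq` (empty if none).

        Duplicates and late arrivals (seq below the expected value) are
        reported as having no gap and do not move the expected counter.
        """
        expected = self._expected.get(publisher, seq)
        missing = list(range(expected, seq))
        if seq >= expected:
            self._expected[publisher] = seq + 1
        return missing


detector = GapDetector()
# Messages 3 and 4 from group-a were lost during a short outage.
results = [detector.observe("group-a", seq) for seq in (1, 2, 5, 6)]
to_request = [seq for gap in results for seq in gap]
```

Here `to_request` holds the sequence numbers the server would ask the proxy to resend; because each cluster persists its data locally, the proxy should be able to serve them later.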

We are looking for a solution that can handle the communication primarily between proxy nodes (blue) and servers (green) for data publishing/polling/downloading, and from clients (orange) to individual nodes (RPC style) for tweaking settings.

There seem to be plenty of discussions and recommendations for the reverse situation, distributing data from server(s) to many clients, but it has been harder to find information related to the situation described here. The general solution seems to be to use SNMP, Nagios, Ganglia etc. to monitor and modify large numbers of nodes, but the tricky part for us is the data gathering.

We have briefly looked at solutions like DDS, ZeroMQ, RabbitMQ (broker needed on all nodes?), SNMP, various monitoring tools, Web Services (JSON-RPC, REST/Protocol Buffers) etc.

So, do you have any recommendations for an easy-to-use, robust, stable, light, cross-platform, cross-language middleware (or other) solution that would fit the bill? As simple as possible but not simpler.

Recommended answer

Seems ZeroMQ will fit the bill easily, with no central infrastructure to manage. Since your monitoring servers are fixed, it's really quite a simple problem to solve. This section in the 0MQ Guide may help:

http://zguide.zeromq.org/page:all#Distributed-Logging-and-Monitoring

You mention "reliability", but could you specify the actual set of failures you want to recover from? If you are using TCP then the network is by definition "reliable" already.
