Node.js + Socket.IO scaling with redis + cluster


Problem description

Currently, I'm faced with the task where I must scale a Node.js app using Amazon EC2. From what I understand, the way to do this is to have each child server use all available processes using cluster, and have sticky connections to ensure that every user connecting to the server is "remembered" as to what worker their data is currently on from previous sessions.

After doing this, the next best move from what I know is to deploy as many servers as needed, and use nginx to load balance between all of them, again using sticky connections to know which "child" server each user's data is on.

So when a user connects to the server, is this what happens?

Client connection -> Find/Choose server -> Find/Choose process -> Socket.IO handshake/connection etc.

If not, please allow me to better understand this load balancing task. I also do not understand the importance of redis in this situation.

Below is the code I'm using to use all CPUs on one machine for a separate Node.js process:

var express = require('express'),
    cluster = require('cluster'),
    net = require('net'),
    sio = require('socket.io'),
    sio_redis = require('socket.io-redis');

var port = 3502,
    num_processes = require('os').cpus().length;

if (cluster.isMaster) {
    // This stores our workers. We need to keep them to be able to reference
    // them based on source IP address. It's also useful for auto-restart,
    // for example.
    var workers = [];

    // Helper function for spawning a worker at index 'i'.
    var spawn = function(i) {
        workers[i] = cluster.fork();

        // Optional: restart the worker on exit.
        workers[i].on('exit', function(code, signal) {
            console.log('respawning worker', i);
            spawn(i);
        });
    };

    // Spawn workers.
    for (var i = 0; i < num_processes; i++) {
        spawn(i);
    }

    // Helper function for getting a worker index based on IP address.
    // This is a hot path so it should be really fast. The way it works
    // is by converting the IP address to a number by removing the dots,
    // then compressing it to the number of slots we have.
    //
    // Compared against "real" hashing (from the sticky-session code) and
    // "real" IP number conversion, this function is on par in terms of
    // worker index distribution, only much faster.
    var worker_index = function(ip, len) {
        var s = '';
        for (var i = 0, _len = ip.length; i < _len; i++) {
            if (ip[i] !== '.') {
                s += ip[i];
            }
        }

        return Number(s) % len;
    };

    // Create the outside-facing server listening on our port.
    var server = net.createServer({ pauseOnConnect: true }, function(connection) {
        // We received a connection and need to pass it to the appropriate
        // worker. Get the worker for this connection's source IP and pass
        // it the connection.
        var worker = workers[worker_index(connection.remoteAddress, num_processes)];
        worker.send('sticky-session:connection', connection);
    }).listen(port);
} else {
    // Note we don't use a port here because the master listens on it for us.
    var app = express();

    // Here you might use middleware, attach routes, etc.

    // Don't expose our internal server to the outside.
    var server = app.listen(0, 'localhost'),
        io = sio(server);

    // Tell Socket.IO to use the redis adapter. By default, the redis
    // server is assumed to be on localhost:6379. You don't have to
    // specify them explicitly unless you want to change them.
    io.adapter(sio_redis({ host: 'localhost', port: 6379 }));

    // Here you might use Socket.IO middleware for authorization etc.

    console.log("Listening");
    // Listen to messages sent from the master. Ignore everything else.
    process.on('message', function(message, connection) {
        if (message !== 'sticky-session:connection') {
            return;
        }

        // Emulate a connection event on the server by emitting the
        // event with the connection the master sent us.
        server.emit('connection', connection);

        connection.resume();
    });
}

Answer

I believe your general understanding is correct, although I'd like to make a few comments:

You're correct that one way to do load balancing is having nginx load balance between the different instances, and inside each instance have cluster balance between the worker processes it creates. However, that's just one way, and not necessarily always the best one.

For one, if you're using AWS anyway, you might want to consider using ELB. It was designed specifically for load balancing EC2 instances, and it makes the problem of configuring load balancing between instances trivial. It also provides a lot of useful features, and (with Auto Scaling) can make scaling extremely dynamic without requiring any effort on your part.

One feature ELB has, which is particularly pertinent to your question, is that it supports sticky sessions out of the box - just a matter of marking a checkbox.

However, I have to add a major caveat, which is that ELB can break socket.io in bizarre ways. If you just use long polling you should be fine (assuming sticky sessions are enabled), but getting actual websockets working is somewhere between extremely frustrating and impossible.

While there are a lot of alternatives to using cluster, both within Node and without, I tend to agree cluster itself is usually perfectly fine.

However, one case where it does not work is when you want sticky sessions behind a load balancer, as you apparently do here.

First off, it should be made explicit that the only reason you even need sticky sessions in the first place is because socket.io relies on session data stored in-memory between requests to work (during the handshake for websockets, or basically throughout for long polling). In general, relying on data stored this way should be avoided as much as possible, for a variety of reasons, but with socket.io you don't really have a choice.
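To make this concrete, here's a minimal sketch (plain Node, with two hypothetical worker objects standing in for separate processes; the names are illustrative, not socket.io's actual internals) of why in-memory handshake state breaks once requests can land on different workers:

```javascript
// Each worker process keeps its own private memory, emulated here by
// giving each "worker" its own session map, invisible to the other.
function makeWorker() {
    var sessions = {}; // in-memory store, local to this process

    return {
        handshake: function(sid) { sessions[sid] = { ok: true }; },
        poll: function(sid) { return sessions[sid] !== undefined; }
    };
}

var workerA = makeWorker();
var workerB = makeWorker();

// The handshake happens on worker A...
workerA.handshake('sid-123');

// ...so a follow-up polling request only succeeds if it is routed
// back to A. On B, the session is simply unknown.
console.log(workerA.poll('sid-123')); // true
console.log(workerB.poll('sid-123')); // false
```

Sticky sessions exist precisely to guarantee that follow-up requests go back to worker A.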

Now, this doesn't seem too bad, since cluster can support sticky sessions, using the sticky-session module mentioned in socket.io's documentation, or the snippet you seem to be using.

The thing is, since these sticky sessions are based on the client's IP, they won't work behind a load balancer, be it nginx, ELB, or anything else, since all that's visible inside the instance at that point is the load balancer's IP. The remoteAddress your code tries to hash isn't actually the client's address at all.
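To see the failure concretely, here's a small sketch reusing the `worker_index` helper from the question's code (the 10.0.0.5 address is a made-up internal load-balancer IP):

```javascript
// Same IP-hashing helper as in the question's snippet.
function worker_index(ip, len) {
    var s = '';
    for (var i = 0; i < ip.length; i++) {
        if (ip[i] !== '.') {
            s += ip[i];
        }
    }
    return Number(s) % len;
}

// Behind a load balancer, every connection's remoteAddress is the
// balancer's own address, so the hash input never varies, no matter
// which real client connected.
var balancerIp = '10.0.0.5'; // hypothetical internal balancer IP

var first = worker_index(balancerIp, 4);
var allSame = [1, 2, 3].every(function() {
    return worker_index(balancerIp, 4) === first;
});

console.log(allSame); // true - every client lands on the same worker
```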

That is, when your Node code tries to act as a load balancer between processes, the IP it tries to use will just always be the IP of the other load balancer, that balances between instances. Therefore, all requests will end up at the same process, defeating cluster's whole purpose.

You can see a more detailed explanation of this in this question.

As I mentioned earlier, once you have multiple instances/processes receiving requests from your users, in-memory storage of session data is no longer sufficient. Sticky sessions are one way to go, although other, arguably better solutions exist, among them central session storage, which Redis can provide. See this post for a pretty comprehensive review of the subject.
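As a sketch of the central-store idea (a hypothetical async get/set interface; the `saveSession`/`loadSession` names are made up, and in production the two functions would wrap Redis SET/GET calls rather than a local object):

```javascript
// A shared store every worker talks to, instead of process memory.
// Backed here by a plain object for illustration; with Redis, the two
// functions would issue SET/GET against the same server from any worker.
var central = {};

function saveSession(sid, data, cb) {
    central[sid] = JSON.stringify(data); // Redis equivalent: SET sid value
    cb(null);
}

function loadSession(sid, cb) {
    var raw = central[sid]; // Redis equivalent: GET sid
    cb(null, raw ? JSON.parse(raw) : null);
}

// Any worker can now resume a session that another worker created:
saveSession('sid-123', { user: 'alice' }, function() {
    loadSession('sid-123', function(err, session) {
        console.log(session.user); // alice
    });
});
```

With a store like this, which process handles a given request stops mattering, which is exactly why central session storage can replace sticky sessions for ordinary HTTP traffic.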

Seeing as your question is about socket.io, though, I'll assume you probably meant Redis's specific importance for websockets, so:

When you have multiple socket.io servers (instances/processes), a given user will be connected to only one such server at any given time. However, any of the servers may, at any time, wish to emit a message to a given user, or even a broadcast to all users, regardless of which server they're currently under.

To that end, socket.io supports "Adapters", of which Redis is one, that allow the different socket.io servers to communicate among themselves. When one server emits a message, it goes into Redis, and then all servers see it (Pub/Sub) and can send it to their users, making sure the message will reach its target.

This, again, is explained in socket.io's documentation regarding multiple nodes, and perhaps even better in this Stack Overflow answer.
