NodeJS: What is the proper way to handle TCP socket streams? Which delimiter should I use?


Question

From what I understood here, "V8 has a generational garbage collector. Moves objects around randomly. Node can't get a pointer to raw string data to write to socket." So I shouldn't store data that comes from a TCP stream in a string, especially if that string grows larger than Math.pow(2,16) bytes. (I hope I'm right so far.)

What, then, is the best way to handle all the data coming from a TCP socket? So far I've been trying to use _:_:_ as a delimiter, because I think it's reasonably unique and won't clash with anything else in the data.

A sample of the incoming data would be something_:_:_maybe a large text_:_:_ maybe tons of lines_:_:_more and more data

This is what I tried to do:

net = require('net');
var server = net.createServer(function (socket) {
    socket.on('connect',function() {
        console.log('someone connected');
        buf = new Buffer(Math.pow(2,16));  //new buffer with size 2^16
        socket.on('data',function(data) {
            if (data.toString().search('_:_:_') === -1) {    // If there's no separator in the data that just arrived...
                buf.write(data.toString());   // ... write it on the buffer. it's part of another message that will come.
            } else {        // if there is a separator in the data that arrived
                parts = data.toString().split('_:_:_'); // the first part is the end of a previous message, the last part is the start of a message to be completed in the future. Parts between separators are independent messages
                if (parts.length == 2) {
                    msg = buf.toString('utf-8',0,4) + parts[0];
                    console.log('MSG: '+ msg);
                    buf = (new Buffer(Math.pow(2,16))).write(parts[1]);
                } else {
                    msg = buf.toString() + parts[0];
                    for (var i = 1; i <= parts.length -1; i++) {
                        if (i !== parts.length-1) {
                            msg = parts[i];
                            console.log('MSG: '+msg);
                        } else {
                            buf.write(parts[i]);
                        }
                    }
                }
            }
        });
    });
});

server.listen(9999);

Whenever I try console.log('MSG' + msg), it prints the whole buffer, so I can't tell whether anything worked.

How can I handle this data the proper way? Would the lazy module work even though this data is not line-oriented? Is there some other module for handling streams that are not line-oriented?

Recommended answer

It has indeed been said that there's extra work going on, because Node has to take that buffer and then push it into V8 and cast it to a string. However, calling toString() on the buffer yourself isn't any better. As far as I know there's no good solution to this right now, especially if your end goal is to get a string and work with it. It's one of the things Ryan mentioned at NodeConf as an area where work needs to be done.
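One way to limit the conversions is to keep incoming chunks as Buffers and call toString() only once a complete message has arrived. The following is a minimal sketch under that assumption, not a definitive implementation: createDelimiterParser and feedDelimited are hypothetical names, the _:_:_ delimiter is the one from the question, and the messages are assumed to be UTF-8 text. It does not remove the string conversion cost discussed above. It only avoids converting partial chunks, which also keeps a delimiter or a multi-byte character that is split across two chunks intact.

```javascript
// Hypothetical helper (not part of Node): splits a byte stream into
// messages on a delimiter, buffering incomplete data between chunks.
function createDelimiterParser(delimiter, onMessage) {
  const delim = Buffer.from(delimiter, 'utf8');
  // Bytes received so far that are not yet terminated by a delimiter.
  let pending = Buffer.alloc(0);

  return function feed(chunk) {
    pending = Buffer.concat([pending, chunk]);
    let index;
    // Emit every complete message currently in the buffer.
    while ((index = pending.indexOf(delim)) !== -1) {
      // Only the complete message is converted to a string.
      onMessage(pending.toString('utf8', 0, index));
      pending = pending.subarray(index + delim.length);
    }
  };
}

// Usage sketch: the delimiter is deliberately split across two chunks.
const received = [];
const feedDelimited = createDelimiterParser('_:_:_', (msg) => received.push(msg));
feedDelimited(Buffer.from('something_:_'));
feedDelimited(Buffer.from(':_maybe a large text_:_:_more'));
console.log(received); // [ 'something', 'maybe a large text' ]; 'more' stays buffered
```

In a server, each connection would get its own parser, for example by calling socket.on('data', createDelimiterParser('_:_:_', handleMessage)) inside the net.createServer callback, where handleMessage is whatever function consumes the messages. Because Buffer.concat copies on every chunk, a production parser would probably also cap the size of the pending data.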

As for the delimiter, you can choose whatever you want. Many binary protocols instead include a fixed-size header, so that the data follows a regular structure, and that header often contains a length. This way you slice off a known header and learn about the rest of the data without having to iterate over the entire buffer. With a scheme like that, one can use a tool like:

As an aside, buffers can be accessed via array syntax, and they can also be sliced apart with .slice().
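The fixed-header scheme and the slicing mentioned above can be sketched as follows. This is an illustration under stated assumptions rather than any particular protocol: the header is assumed to be a 4-byte big-endian length, the body is assumed to be UTF-8 text, and encodeFrame and createFrameParser are hypothetical names. The sketch uses subarray, which current Node versions document as the replacement for the Buffer form of .slice().

```javascript
// Assumed wire format: 4-byte big-endian unsigned length, then the body.
const HEADER_SIZE = 4;

// Hypothetical helper: wraps a UTF-8 string in a length-prefixed frame.
function encodeFrame(text) {
  const body = Buffer.from(text, 'utf8');
  const header = Buffer.alloc(HEADER_SIZE);
  header.writeUInt32BE(body.length, 0);
  return Buffer.concat([header, body]);
}

// Hypothetical helper: reassembles frames from arbitrarily split chunks.
function createFrameParser(onMessage) {
  let pending = Buffer.alloc(0);

  return function feed(chunk) {
    pending = Buffer.concat([pending, chunk]);
    while (pending.length >= HEADER_SIZE) {
      // The header says how long the body is, so no scanning is needed.
      const bodyLength = pending.readUInt32BE(0);
      const frameEnd = HEADER_SIZE + bodyLength;
      if (pending.length < frameEnd) break; // body not fully received yet
      onMessage(pending.subarray(HEADER_SIZE, frameEnd).toString('utf8'));
      pending = pending.subarray(frameEnd);
    }
  };
}

// Usage sketch: two frames arrive split at an arbitrary byte offset.
const frames = [];
const feedFrames = createFrameParser((msg) => frames.push(msg));
const wire = Buffer.concat([encodeFrame('hello'), encodeFrame('maybe a large text')]);
feedFrames(wire.subarray(0, 6));
feedFrames(wire.subarray(6));
console.log(frames); // [ 'hello', 'maybe a large text' ]
```

Compared with a delimiter, the length prefix lets the body contain any byte sequence, including something that looks like a separator, at the cost of both ends having to agree on the header layout.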

Lastly, check the module list (https://github.com/joyent/node/wiki/modules): find a module that parses a simple TCP protocol and seems to do it well, and read some of its code.
